Below is a copy and paste of my Mantis bug report. It details how my 3x1TB RAID 5 array got completely wiped out.
I fell into the trap of thinking that my data (family pictures, etc.) was invincible on a RAID system. Not long ago I moved all of this important data onto my RAID array for this reason, and now it's all gone. What a hard lesson learned! I am posting this not to complain (I realize my faults in this) but to help others so that this does not happen to them. A RAID array isn't invincible, and you don't need massive damage to hard drives or anything major to lose your data.
Begin mantis bug report paste:
I have been using a 3-drive (3 x 1TB SATA II drives) RAID array with LinuxMCE for the last 6 months. It held over 300 movies, my entire music collection, and all of our home movies and pictures. I put all of our "sensitive data" on the RAID array to protect it from data loss. As of today, it is all gone: all my family's pictures, home movies, and over 6 months of invested time building my media library. I will explain how this happened, and what needs improved to keep it from happening to other people.
Yesterday, I made a new device template for my Vizio VW37L TV. After I installed it with the setup wizard, my core/hybrid (an Asus M2NPV-VM with an AMD BE-2400 and 2 GB RAM) started acting funny, and eventually the screen went black. Looking through the web admin, I saw that my software RAID 5 was listed as "Failed". Sure enough, none of my movies were available, and none of my pictures either (which is why the screensaver went to a pure blank screen). Looking at the individual drives of the RAID array in the web admin, I saw they were all listed as "Spare" and "Removed" (or similar wording). I let it sit overnight to see if it would pick back up, rebooted and reloaded several times, and still nothing. I unplugged the RAID array, rebooted, shut down, then plugged it all back in. Still nothing on reboot. The RAID screen on the web admin still showed the array as "Failed", with no more information and no options to fix it. In desperation, I deleted the RAID array device and created a new one using the same 3 disks (hoping that this one would be detected and all of my media would be back). Well, after several hours, still nothing.
Looking at the web admin at this point, the only thing left to try was the "Create Raid Array" button that was sitting next to the new RAID device that I had re-created earlier. I hit this button, obviously thinking that it would try to re-initialize the array or otherwise try to start using it again. After hitting the button, the RAID status changed to "Damaged/Repairing", so I figured that it was checking things over and was going to start using the RAID array again. After about 6 hours of "Repairing" I found out that it had wiped out my entire RAID array (Did it reformat it??). I can now access and use my newly created RAID array just fine and an fsck shows no errors, so I don't think that my array had actually failed in the first place. (Was it a database error? I've been having a ton of database problems lately...)
In my opinion, here is what needs fixed with the current system:
- Currently the admin shows no information as to what disk failed, why it failed or any useful information to help find the problem. This information is essential when a scenario like mine happens.
- Currently, there are no options to repair a reported failed/damaged array or try to redetect it etc. Once it fails, there is simply no recourse to fix it.
- Currently, after creating a new Raid Array device, the user has to hit a "Create Raid Array" type of button. As I learned the hard way, this completely reformatted all of my disks, with absolutely no messages or prompting. At a bare minimum, the user should be informed that all data will be wiped from their drives. Furthermore, it should also try to detect if there is an array present on the disks and remind the user that this set of disks already appears to have a RAID setup, and that if they continue, the disks will be reformatted.
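As a rough illustration of the last item, here is a minimal Python sketch of the kind of check and prompt being asked for. It is not LinuxMCE code. The function names `has_md_superblock` and `confirm_create` are made up for this example, and it assumes that `mdadm --examine <device>` exits with a non-zero status when it finds no RAID metadata on the device (running it normally requires root).

```python
import subprocess


def has_md_superblock(device, run=subprocess.run):
    """Return True if `mdadm --examine` reports RAID metadata on the device.

    Assumption: mdadm exits with status 0 when a superblock is found and
    non-zero otherwise. `run` is injectable so the check can be tested
    without touching real disks.
    """
    result = run(
        ["mdadm", "--examine", device],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def confirm_create(devices, probe=has_md_superblock, ask=input):
    """Warn about data loss and return True only on an explicit 'YES'."""
    in_use = [d for d in devices if probe(d)]
    if in_use:
        print("WARNING: these disks already appear to belong to a RAID array: "
              + ", ".join(in_use))
    print("Creating a new array will destroy ALL data on: " + ", ".join(devices))
    answer = ask("Type YES to continue: ")
    return answer.strip() == "YES"


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    if confirm_create(["/dev/sdb", "/dev/sdc", "/dev/sdd"]):
        print("User confirmed; array creation would start here.")
    else:
        print("Aborted; nothing was changed.")
```

The point of the design is that nothing destructive happens by default: anything other than a typed "YES" aborts, and disks that already carry RAID metadata are named in the warning.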
So in the end I found that just because there is a Raid system available, it doesn't guarantee that your data is safe. However, the above listed features are necessary to ensure that accidents don't happen, and that if there is a drive failure, there are things you can do to keep your data from being lost.
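On the first item in the list (no information about which disk failed): on Linux software RAID, the kernel reports per-disk state in /proc/mdstat, where a member marked `(F)` is faulty and `(S)` is a spare. The sketch below shows one way an admin page could turn that into something readable. It is only an illustration; `member_states` is a made-up name, and `SAMPLE` is invented text in the /proc/mdstat layout, not output from my system.

```python
import re

# Invented sample in the /proc/mdstat layout (not real output).
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3](S) sdc1[1](F) sdb1[0]
      1953519872 blocks level 5, 64k chunk, algorithm 2 [3/1] [U__]
unused devices: <none>
"""

# A member token looks like "sdb1[0]", optionally followed by flags
# such as "(F)" for faulty or "(S)" for spare.
MEMBER = re.compile(r"^([\w-]+)\[(\d+)\]((?:\([A-Z]\))*)$")
ARRAY_LINE = re.compile(r"^(md\w+) : (\S+)\s+(.*)$")


def member_states(mdstat_text):
    """Map each array name to a dict of {member device: state}."""
    arrays = {}
    for line in mdstat_text.splitlines():
        match = ARRAY_LINE.match(line)
        if not match:
            continue
        name, _array_state, rest = match.groups()
        members = {}
        for token in rest.split():
            member = MEMBER.match(token)
            if not member:
                continue  # e.g. the RAID level word "raid5"
            device, _slot, flags = member.groups()
            if "(F)" in flags:
                members[device] = "failed"
            elif "(S)" in flags:
                members[device] = "spare"
            else:
                members[device] = "active"
        arrays[name] = members
    return arrays


if __name__ == "__main__":
    for array, members in member_states(SAMPLE).items():
        for device, state in members.items():
            print(f"{array}: {device} is {state}")
```

On a real system the text would come from reading /proc/mdstat instead of `SAMPLE`.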
Now the good news:
I'm downloading LMCE 0710 rc2, and plan on videotaping a full install (with lots of devices - CM11A, USBUIRTs, 4 Media Directors, 4 15" Touchscreen orbiters, lots of X10 switches and outlets, template creation for my A/V gear, remaking my scenarios and redoing my timed events....) I'm also going to try to get a SATA port multiplier and add 2 more 1TB drives and run all 5 drives off of one internal SATA port. In the end, this tragedy will turn into something positive. It's just going to take months to get all of my movies copied again.