Author Topic: Very sad day - (How I lost about 2TB of media on my RAID array) - Hard Lesson  (Read 5107 times)

jondecker76

  • Alumni
  • wants to work for LinuxMCE
  • *
  • Posts: 763
    • View Profile
Below is a copy and paste of my Mantis bug report. It details how my 3x1TB RAID 5 array got completely wiped out.
I fell into the trap of thinking that my data (family pictures, etc.) was invincible on a RAID system. Not long ago I moved all of this important data onto my RAID array for this reason, and now it's all gone. What a hard lesson learned! I'm posting this not to complain (I realize my faults in this) - but to help others so that this does not happen to them. A RAID array isn't invincible, and you don't need massive damage to hard drives or anything major to lose your data.

Begin mantis bug report paste:
I have been using a 3-drive RAID array (3 x 1TB SATA II drives) with LinuxMCE for the last 6 months. It held over 300 movies, my entire music collection, and all of our home movies and pictures. I put all of our "sensitive data" on the raid array to be protected from data loss - as of today, it is now all gone - all my family's pictures, home movies, and over 6 months of invested time building my media library. I will explain how this happened, and what needs to be improved to keep it from happening to other people.

Yesterday, I made a new device template for my Vizio VW37L TV. After I installed it with the setup wizard, my core/hybrid (an Asus M2NPV-VM with an AMD BE-2400 and 2 GB RAM) started acting funny, and eventually the screen went black. Looking through the web admin, I saw that my software raid 5 was listed as "Failed". Sure enough, none of my movies were available, none of my pictures (which is why the screensaver went to a pure blank screen) - nothing. Looking at the individual drives of the raid array in the web admin, they were all listed as "Spare" and "Removed" (or similar wording). I let it sit overnight to see if it would pick back up, rebooted and reloaded several times, and still nothing. I unplugged the raid array, rebooted, shut down, then plugged it all back in. Still nothing on reboot. The raid screen on the web admin still showed the array as "Failed", with no more information and no options to fix it. In desperation, I deleted the raid array device and created a new one using the same 3 disks, hoping that this one would be detected and all of my media would be back. After several hours, still nothing.

Looking at the web admin at this point, the only thing left to try was the "Create Raid Array" button that was sitting next to my new Raid device that I had re-created earlier. I hit this button obviously thinking that it would try to re-initialize the array or otherwise try to start using it again. After hitting the button, the raid status changed to "Damaged/Repairing" - so I figured that it was checking things over and was going to start using the raid array again. After about 6 hours of "Repairing" I found out that it had wiped out my entire raid array (Did it reformat it??). I can now access and use my newly created raid array just fine and an fdsk shows no errors - so I don't think that my array had actually failed in the first place. (Was it a database error?  I've been having a ton of database problems lately...)

In my opinion, here is what needs to be fixed in the current system:
- Currently the admin shows no information as to what disk failed, why it failed or any useful information to help find the problem. This information is essential when a scenario like mine happens.
- Currently, there are no options to repair a reported failed/damaged array or try to redetect it etc. Once it fails, there is simply no recourse to fix it.
- Currently, after creating a new Raid Array device, the user has to hit a "Create Raid Array" type of button. As I learned the hard way, this completely reformatted all of my disks, with absolutely no messages or prompting. At a bare minimum, the user should be informed that all data will be wiped from their drives. Furthermore, it should also try to detect whether there is an array present on the disks and remind the user that this set of disks already appears to have a Raid setup, and that if they continue, the disks will be reformatted.
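
To illustrate the first point (showing which disk failed), here is a rough Python sketch that reads the text format of /proc/mdstat, where the Linux software RAID driver marks failed members with "(F)" and spares with "(S)". The function names and the sample text are made up for illustration, and this is not how the LinuxMCE web admin works internally; it only shows that the information is there to be displayed.

```python
import re

# Sample in the style of /proc/mdstat for a 3-disk RAID 5 with one failed member.
# The device names and block count are invented for this example.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[2] sdc1[1](F) sdb1[0]
      1953519872 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]

unused devices: <none>
"""

HEADER = re.compile(r"^(md\d+)\s*:\s*(\S+)\s*(.*)$")
MEMBER = re.compile(r"^(\w+)\[(\d+)\]((?:\([A-Z]\))*)$")
SLOTS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return {array name: info dict} parsed from /proc/mdstat-style text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            name, state, rest = header.groups()
            members = []
            for token in rest.split():
                m = MEMBER.match(token)
                if m:  # tokens such as "raid5" have no [n] suffix and are skipped
                    flags = m.group(3)
                    members.append({
                        "device": m.group(1),
                        "failed": "(F)" in flags,
                        "spare": "(S)" in flags,
                    })
            current = {"state": state, "members": members,
                       "slots": None, "degraded": False}
            arrays[name] = current
            continue
        slots = SLOTS.search(line)
        if slots and current is not None:
            # "U" means the slot is up, "_" means the slot is missing or failed.
            current["slots"] = slots.group(3)
            current["degraded"] = "_" in slots.group(3)
    return arrays


def describe(arrays):
    """Build one human-readable status line per array."""
    lines = []
    for name, info in sorted(arrays.items()):
        failed = [m["device"] for m in info["members"] if m["failed"]]
        lines.append("%s: state=%s slots=%s failed=%s"
                     % (name, info["state"], info["slots"], failed or "none"))
    return lines


for line in describe(parse_mdstat(SAMPLE)):
    print(line)
```

On a live system the text would come from reading /proc/mdstat instead of the SAMPLE string.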

So in the end I found that just because there is a Raid system available, it doesn't guarantee that your data is safe. However, the above listed features are necessary to ensure that accidents don't happen, and that if there is a drive failure, there are things you can do to keep your data from being lost.
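
As a sketch of the kind of safety check I mean for the "Create Raid Array" button, here is a minimal Python example. Everything in it is hypothetical: `plan_create_array`, `has_raid_metadata` and `confirm` are names I made up, and the probe is passed in as a function because I don't know how the web admin would actually detect an existing array (something like `mdadm --examine` on each disk might work, but that is an assumption).

```python
def plan_create_array(devices, has_raid_metadata, confirm):
    """Return True only if the user agreed to create an array on `devices`.

    has_raid_metadata: hypothetical probe, callable(device) -> bool, that
        reports whether a disk already looks like a RAID member.
    confirm: callable(message) -> bool that shows the message to the user
        and returns their answer.
    """
    in_use = [d for d in devices if has_raid_metadata(d)]
    if in_use:
        # Stronger wording when the disks already seem to carry an array.
        message = ("These disks already appear to belong to a RAID array: "
                   + ", ".join(in_use)
                   + ". Creating a new array will destroy all data on them."
                   + " Continue?")
    else:
        message = ("Creating a new array will erase all data on: "
                   + ", ".join(devices) + ". Continue?")
    # Nothing destructive happens unless the user explicitly says yes.
    return bool(confirm(message))


# Example with stand-in functions: pretend sdb and sdc carry RAID metadata
# and the user answers "no".
shown = []


def answer_no(message):
    shown.append(message)
    return False


proceed = plan_create_array(["sdb", "sdc", "sdd"],
                            lambda dev: dev in ("sdb", "sdc"),
                            answer_no)
print(proceed)
print(shown[0])
```

The point of the design is simply that the destructive step sits behind an explicit yes, and the warning names the disks that would be wiped.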




Now the good news:
I'm downloading LMCE 0710 rc2, and plan on videotaping a full install (with lots of devices - CM11A, USBUIRTs, 4 Media Directors, 4 15" Touchscreen orbiters, lots of X10 switches and outlets, template creation for my A/V gear, remaking my scenarios and redoing my timed events....) I'm also going to try to get a SATA port multiplier and add 2 more 1TB drives and run all 5 drives off of one internal SATA port. In the end, this tragedy will turn into something positive. It's just going to take months to get all of my movies copied again.
« Last Edit: June 12, 2008, 09:28:01 pm by jondecker76 »

bulek

  • Administrator
  • wants to work for LinuxMCE
  • *****
  • Posts: 909
  • Living with LMCE
    • View Profile
Hi,

I'm sorry about your case. I also learned a few things about media collections the hard way (some time ago the UpdateMedia daemon was polluting mp3 files, plus a few mistakes of my own, etc...)... Since then, I always keep a copy of the whole media collection on a separate NAS, but mine is much smaller than yours...

Anyway, I'm thinking about keeping my media collection on a networked NAS and using the Core only for basic features..... With all this fluctuation with upgrades, bug fixes, etc... But I'll probably have a copy somewhere also...

Regards,

Bulek.
Thanks in advance,

regards,

Bulek.

jondecker76

  • Alumni
  • wants to work for LinuxMCE
  • *
  • Posts: 763
    • View Profile
I think that I'm leaning towards a NAS also. But I will be sure to keep a backup schedule for any data that isn't replaceable - and even my music library, which would probably fit on a few DVDs. Anyways, I'm going to move forward with a positive attitude and try to learn from this.

cirion

  • Guru
  • ****
  • Posts: 353
    • View Profile
Ouch! That's a big loss!

I have a 4x750GB raid 5 in my LinuxMCE, and I must say I have experienced the same error of not getting my media up and the Raid being failed. I have actually gotten this quite often...

It usually happens when I fuck up something.... Like when I am trying different DVB card setups and crash Linux, or the power goes out in my apartment (I now have a UPS). I also got it when I installed Mythbuntu on a second partition on my boot drive. I had carefully planned not to wipe the raid, so I disconnected it. When I later tried mounting the Raid in Mythbuntu nothing worked... Booted LinuxMCE and the raid was failed and nothing seemed to work. I even reinstalled LinuxMCE... I tried lots of commandline tools and in the end I found I had swapped 2 SATA cables... After that I marked the cables 1 - 4 with a permanent marker :)

But unlike you, I have always been able to recover my Raid again after a failure. Usually LinuxMCE starts a rebuild after a day, or after a few reboots. The process takes 14 hours on my drives. 3 times it has not recovered by itself. The first 2 times it happened I just disconnected my Raid and reinstalled LinuxMCE, and it did a rebuild and found all media. Yesterday it happened again... I pulled the wrong cable... Poor LinuxMCE crashed my raid again and did not start it again... So this time I tried disconnecting the Raid, booting LinuxMCE with no raid drives, and deleting the raid from the web admin. I then turned it off again and connected the Raid. After a boot and a quick reload my media appeared again :)

From earlier experiences with Raid5, I know it's not bulletproof... It will crash, and a rebuild probably saves all data, but just to be sure I always save my pictures, documents or any other media I care about in more than one place. From experience Raid5 has been the safest place so far for me. I started using it in 1999 with 6x300GB drives. Now I use a Thecus N5200 with 4x1TB and LinuxMCE with 4x750GB (my old Thecus drives).

The LinuxMCE raid setup is great and incredibly easy... But it does lack support for recovery, checking and resizing. I have successfully expanded my Raid5 from 3x750GB to 4x750GB in LinuxMCE with all data intact. It required commandline tools and I have posted it in another thread.

jondecker76

  • Alumni
  • wants to work for LinuxMCE
  • *
  • Posts: 763
    • View Profile
I have also had the same thing happen in the past (also after doing some kind of configuration to the system), but it always redetected the array. This time it didn't.

I'm now going to move to a NAS server for my RAID drives, go back to a backup system, and make better use of the resources available to me (for example, there are plenty of helpful and experienced people in this community that I could have asked for advice instead of trying to mess with it on my own).

Anyways, while it's not good, I'm glad I'm not the only person that has ever had this problem.

nite_man

  • NEEDS to work for LinuxMCE
  • ***
  • Posts: 1019
  • Want to work with LinuxMCE
    • View Profile
    • Smart Home Blog
It's really not good. I have only one 250 GB HDD on my core :). Family photos and movies are stored on my home PC and I made backups on DVDs. I'm thinking about a NAS (the D-Link DNS-323 looks nice to me). Probably the best solution is to store everything there and just attach it to LinuxMCE. In that case all data will be better protected than with software RAID.
Michael Stepanov,
My setup: http://wiki.linuxmce.org/index.php/User:Nite_man#New_setup
Russian LinuxMCE community: http://linuxmce.ru

samuelmukoti

  • Regular Poster
  • **
  • Posts: 49
    • View Profile
@johndecker

Very sorry about your loss! I was wondering when your video footage of the setup will be available. I am keen to do a similar setup but am a bit stuck on how to set up TVs and STBs that only use IR using the UIRT. If you could please document how you do your templates etc, that would be great!

Best regards,

Sam

jondecker76

  • Alumni
  • wants to work for LinuxMCE
  • *
  • Posts: 763
    • View Profile
I recorded all day yesterday. I have a lot more to do, but I will start editing it down for the first couple of videos very soon. So far it has been fairly smooth - but there have been some problems (which I am documenting). In the end, what I don't want is another sugar-coated video making everything look easy. I want it to show a real experience with a mixture of supported and unsupported hardware.

gazzzman

  • Veteran
  • ***
  • Posts: 118
    • View Profile
hi there johndecker !
I am SO sorry to hear what happened to you!
it is horrible and there is always SOMETHING you can't replace..
as I said.. I feel really sorry for you :(

AFAIC software (and the "sort of hardware" raid solutions on motherboards and the like) are NEVER really bulletproof!
I store all my stuff on a NAS (using Freenas) at the moment
and I am looking at Openfiler..  looks like a really good NAS very stable and configurable!
it will perform scheduled backup and all manner of other tasty things :)
I would HIGHLY recommend a hardware raid solution if you can find one.. if not Openfiler supports multiple software raid configurations

it is great to see you come out of this with a positive attitude and not shouting at everyone :)
good luck with your rebuild!
Garry
 
-----BEGIN GEEK CODE BLOCK-----
Version 3.1
GCC@GE@GIT@GO dpu S-: a+ C+++ L++ E-- W+++ N+ o+++ w-- O M+ PS+++ PE-- Y++
PGP+ t++ 5 X++ R- tv b+ DI++ D---- G e++* h*++ r+++ Y++++
-----END GEEK CODE BLOCK-----

bulek

  • Administrator
  • wants to work for LinuxMCE
  • *****
  • Posts: 909
  • Living with LMCE
    • View Profile
Hi,

while feeling sorry for JonDecker, I also lost my media disk with all data (luckily, majority was backed up)....

My story goes like this: LMCE started to give warnings about low disk space... I was absent for two days, and when I came back the disk was full and the /tmp directory was flooded with "tmp.lksdflv123rfl" and similar entries, so I tried to wipe them. What I didn't see is that those were mount points for my regular media disk, and so I deleted all my media content....
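
For anyone cleaning up a directory like this, a small check before deleting can help. Below is a minimal Python sketch (the function name and the scratch directory are only for illustration) that uses os.path.ismount to separate entries that are mount points from ordinary directories, so that only the ordinary ones are considered for removal.

```python
import os
import tempfile


def split_by_mount(paths):
    """Split paths into (mount_points, ordinary) using os.path.ismount."""
    mount_points = [p for p in paths if os.path.ismount(p)]
    ordinary = [p for p in paths if not os.path.ismount(p)]
    return mount_points, ordinary


# Illustration only: a fresh scratch directory is an ordinary directory,
# while "/" is a mount point on a typical Linux system.
scratch = tempfile.mkdtemp(prefix="tmp.")
mounted, ordinary = split_by_mount([scratch, "/"])
print("do not delete (mount points):", mounted)
print("candidates for removal:", ordinary)
```

This only checks the listed entries themselves; a mount point nested deeper inside one of them would need a recursive check.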

I've written this so maybe this won't happen to anyone else.....

Regards,

Bulek...