[SOLVED]: URGENT 810 Software RAID failed after power outage

Beeker · June 14, 2013, 10:50:26 PM

Hi All,
We had a power failure at home that took down my 810 system. When the power came back and it booted up again, the LMCE software RAID showed all the drives as removed. I have tried su mdadm -D /dev/md1 and it comes back with mdadm: md device /dev/md1 does not appear to be active.

I have very limited command-line knowledge, so any help would be appreciated. Is there anything I can try to recover the RAID and its data? All our family photos are on it. I am currently building a new QNAP NAS with RAID6, as I was advised to do.

I have attached a photo from LMCE of the RAID.

Kind regards
Beeker

Crumble · June 16, 2013, 09:22:01 PM

Read this

If the md driver detects a write error on a device in a RAID1, RAID4,
RAID5, RAID6, or RAID10 array, it immediately disables that device
(marking it as faulty) and continues operation on the remaining
devices. If there are spare drives, the driver will start recreating
on one of the spare drives the data which was on that failed drive,
either by copying a working drive in a RAID1 configuration, or by doing
calculations with the parity block on RAID4, RAID5 or RAID6, or by
finding and copying originals for RAID10.

In kernels prior to about 2.6.15, a read error would cause the same
effect as a write error. In later kernels, a read-error will instead
cause md to attempt a recovery by overwriting the bad block. i.e. it
will find the correct data from elsewhere, write it over the block that
failed, and then try to read it back again. If either the write or the
re-read fail, md will treat the error the same way that a write error
is treated, and will fail the whole device.
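
The "calculations with the parity block" in the quoted text come down to XOR for RAID5. A minimal sketch, assuming a toy stripe of three data bytes plus one parity byte on a 4-device array; the byte values are made up for illustration, and real md works on whole chunks rather than single bytes:

```shell
#!/usr/bin/env bash
# Toy RAID5 stripe: three data bytes (hypothetical values) plus XOR parity.
d0=0x5A; d1=0xC3; d2=0x0F
parity=$(( d0 ^ d1 ^ d2 ))

# Pretend the device holding d1 failed: XOR the survivors with the parity.
rebuilt=$(( d0 ^ d2 ^ parity ))

printf 'parity=0x%02X rebuilt_d1=0x%02X\n' "$parity" "$rebuilt"
```

Because XOR is its own inverse, any single missing member can be recomputed from the others, which is why RAID5 survives one failed device but not two.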

Since all the drives seem to be removed, read this:

http://linuxexpresso.wordpress.com/2010/03/31/repair-a-broken-ext4-superblock-in-ubuntu/

Obviously ignore the parts about Parted Magic and TestDisk. fsck is already in Linux (if you didn't know that). You have not explained what we are working with here, by the way. RAID5 external NAS, I assume?

Just run these two commands (/dev/xxx being each of the RAID partitions, of course) and report back the info, unless you feel comfortable fixing it yourself. I am not sure it's a bad superblock, and it's not a good idea to start trying to fix things without knowing what the problem is first. Just a guess. Good luck!

sudo fdisk -l

mdadm -E /dev/xxx (on all the RAID partitions)
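
Running the examine step "on all the RAID partitions" can be done with a small loop. This is only a sketch: the partition names in `members` are placeholders to be replaced with whatever fdisk -l reports, and `DRY_RUN` is a hypothetical switch added here so the loop prints the commands by default instead of running them. mdadm -E only reads the superblock and does not modify the drive.

```shell
#!/usr/bin/env bash
# Member partitions are placeholders; substitute the ones fdisk -l reports.
members="/dev/sdb1 /dev/sdc1 /dev/sde1"

# DRY_RUN=1 (the default here) only prints the commands instead of running them.
examine_all() {
    for dev in $members; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "mdadm -E $dev"
        else
            mdadm -E "$dev"
        fi
    done
}

examine_all
```

Setting DRY_RUN=0 (as root) would run the real commands; redirecting the output to a file makes it easier to paste into a post.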

Crumble · June 16, 2013, 10:00:49 PM

Oh, and this, derp: mdadm --detail /dev/mdX, where X is whatever RAID number is at the end of md.

Beeker · June 17, 2013, 01:20:26 AM

Thanks very much for the info. I will try it now and report back. Sorry, it's a software RAID5 in LMCE.

Kind regards
Beeker

Beeker · June 17, 2013, 01:39:32 AM

Hi Crumble

This is what I got from sudo fdisk -l. From memory, sdf was always the spare drive.

dcerouter_1024641:/home/bruce# sudo fdisk -l

Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x39686ed2

Device     Boot   Start     End      Blocks  Id  System
/dev/sda1  *          1   90446   726507463+ 83  Linux
/dev/sda2          90447   91201     6064537+  5  Extended
/dev/sda5          90447   91201     6064506  82  Linux swap / Solaris

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0006a5c8

Device     Boot   Start     End      Blocks  Id  System
/dev/sdb1              1  182401  1465136001  83  Linux

Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xb217d64b

Device     Boot   Start     End      Blocks  Id  System
/dev/sdc1              1  182401  1465136001  83  Linux

Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x43b7a284

Device     Boot   Start     End      Blocks  Id  System
/dev/sde1              1  182401  1465136001  83  Linux

Disk /dev/sdf: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xc8efaf55

Disk /dev/sdf doesn't contain a valid partition table

I also tried

dcerouter_1024641:/home/bruce# mdadm -E /dev/md1
mdadm: No md superblock detected on /dev/md1.

Any help on how to proceed would be greatly appreciated. I wasn't sure where to go from here.

Regards
Beeker

Crumble · June 17, 2013, 02:17:32 AM

What about mdadm --detail /dev/md1? It may be in the process of rebuilding, which takes some time.

Beeker · June 17, 2013, 02:26:10 AM

I tried
mdadm --detail /dev/md1 and got

dcerouter_1024641:/home/bruce# mdadm -detail /dev/md1
mdadm: -d does not set the mode, and so cannot be the first option.

So then I tried mdadm -D /dev/md1 and got
dcerouter_1024641:/home/bruce# mdadm -D /dev/md1
mdadm: md device /dev/md1 does not appear to be active.

Crumble · June 17, 2013, 03:01:31 AM

Run each command and post the results, please:

cat /proc/mdstat

mdadm -E /dev/sdb1
mdadm -E /dev/sdc1
mdadm -E /dev/sde1

Beeker · June 17, 2013, 03:20:18 AM

Here you go
cat /proc/mdstat

dcerouter_1024641:/home/bruce# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdb1[0](S) sdf[4](S) sde1[3](S) sdc1[1](S)
5860546304 blocks
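
The mdstat output above can be summarised mechanically. A small sketch, assuming the sample text is pasted into a variable (on a live system the function could read /proc/mdstat instead); the function name `inactive_summary` is made up for this example:

```shell
#!/usr/bin/env bash
# Sample copied from the output above; on a live system, read /proc/mdstat instead.
mdstat_sample='Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdb1[0](S) sdf[4](S) sde1[3](S) sdc1[1](S)
5860546304 blocks'

# Print each inactive array with the number of members carrying the (S) flag.
inactive_summary() {
    awk '$2 == ":" && $3 == "inactive" {
        spares = 0
        for (i = 4; i <= NF; i++) if ($i ~ /\(S\)$/) spares++
        printf "%s inactive members=%d flagged_S=%d\n", $1, NF - 3, spares
    }'
}

printf '%s\n' "$mdstat_sample" | inactive_summary
```

Here all four listed members carry the (S) flag while the array is inactive, which is consistent with the GUI showing every drive as a spare. Note that sdd1 does not appear in the member list at all.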

mdadm -E /dev/sdb1
dcerouter_1024641:/home/bruce# mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : 5747d0ac:31c15bff:bd9f1658:0a1d2015 (local to host dcerouter)
Creation Time : Sun Dec 27 10:14:43 2009
Raid Level : raid5
Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
Array Size : 4395407808 (4191.79 GiB 4500.90 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1

Update Time : Thu Jun 13 09:02:55 2013
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ee358c49 - correct
Events : 1802

Layout : left-symmetric
Chunk Size : 64K

      Number  Major  Minor  RaidDevice  State
this       0      8     17           0  active sync  /dev/sdb1

   0       0      8     17           0  active sync  /dev/sdb1
   1       1      8     33           1  active sync  /dev/sdc1
   2       2      8     49           2  active sync
   3       3      8     65           3  active sync  /dev/sde1
   4       4      8     80           4  spare        /dev/sdf

mdadm -E /dev/sdc1
dcerouter_1024641:/home/bruce# mdadm -E /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : 5747d0ac:31c15bff:bd9f1658:0a1d2015 (local to host dcerouter)
Creation Time : Sun Dec 27 10:14:43 2009
Raid Level : raid5
Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
Array Size : 4395407808 (4191.79 GiB 4500.90 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1

Update Time : Thu Jun 13 09:02:55 2013
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ee358c5b - correct
Events : 1802

Layout : left-symmetric
Chunk Size : 64K

      Number  Major  Minor  RaidDevice  State
this       1      8     33           1  active sync  /dev/sdc1

   0       0      8     17           0  active sync  /dev/sdb1
   1       1      8     33           1  active sync  /dev/sdc1
   2       2      8     49           2  active sync
   3       3      8     65           3  active sync  /dev/sde1
   4       4      8     80           4  spare        /dev/sdf

mdadm -E /dev/sde1
dcerouter_1024641:/home/bruce# mdadm -E /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 00.90.00
UUID : 5747d0ac:31c15bff:bd9f1658:0a1d2015 (local to host dcerouter)
Creation Time : Sun Dec 27 10:14:43 2009
Raid Level : raid5
Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
Array Size : 4395407808 (4191.79 GiB 4500.90 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1

Update Time : Thu Jun 13 09:02:55 2013
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ee358c7f - correct
Events : 1802

Layout : left-symmetric
Chunk Size : 64K

      Number  Major  Minor  RaidDevice  State
this       3      8     65           3  active sync  /dev/sde1

   0       0      8     17           0  active sync  /dev/sdb1
   1       1      8     33           1  active sync  /dev/sdc1
   2       2      8     49           2  active sync
   3       3      8     65           3  active sync  /dev/sde1
   4       4      8     80           4  spare        /dev/sdf
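
Two quick sanity checks can be made on the three superblock dumps above. This is a sketch using values copied by hand from that output: for RAID5 the usable size should be the per-device size times one fewer than the number of RAID devices, and members that were last updated together should report the same event counter.

```shell
#!/usr/bin/env bash
# Values copied from the mdadm -E output above.
used_dev_size=1465135936   # "Used Dev Size", in 1 KiB blocks
raid_devices=4             # "Raid Devices"
events="1802 1802 1802"    # "Events" from sdb1, sdc1, sde1

# RAID5 keeps one device's worth of parity, so usable size is (n - 1) members.
array_size=$(( used_dev_size * (raid_devices - 1) ))
echo "expected Array Size: $array_size"

# Members whose event counters match were last updated together.
distinct=$(printf '%s\n' $events | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
    echo "event counters agree"
else
    echo "event counters differ"
fi
```

The computed size matches the reported Array Size of 4395407808, and the three members examined so far agree on Events 1802 and the same Update Time. The fourth active member (RaidDevice 2) has not been examined yet.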

Crumble · June 17, 2013, 03:45:42 AM

OK, looks like all the drives are clean. Your RAID is showing all the drives as spare (like in the pic); I wasn't sure how accurate the GUI is. This may not be a superblock problem, but let's find out for sure.

Run

mdadm --assemble --scan -v

This will let us know which drives have bad superblocks, and whether that is indeed the problem.

Crumble · June 17, 2013, 03:49:23 AM

Wait a sec, don't run that. Wish there was an edit option. Gimme a sec.

Beeker · June 17, 2013, 03:50:08 AM

ok

Crumble · June 17, 2013, 04:24:22 AM

OK, I just noticed disk sdd1 did not appear in the fdisk -l report. This is odd; we need to figure out what is going on there. Run mdadm -E /dev/sdd1 and post what it says.
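
The superblock tables posted earlier point at the same missing drive: RaidDevice 2 is listed with Major 8, Minor 49 and no device name. On Linux, SCSI/SATA disks use major 8 with 16 minors per disk, so the minor number can be turned back into a name. A minimal sketch; `sd_name` is a helper written for this example and only covers the first 16 disks (sda to sdp):

```shell
#!/usr/bin/env bash
# Map a major-8 (sd) minor number to its conventional device name.
# Each disk owns 16 minors: 0 is the whole disk, 1-15 are partitions.
sd_name() {
    minor=$1
    disk=$(( minor / 16 ))
    part=$(( minor % 16 ))
    letter=$(printf "\\$(printf '%03o' $(( 97 + disk )))")
    if [ "$part" -eq 0 ]; then
        echo "/dev/sd$letter"
    else
        echo "/dev/sd$letter$part"
    fi
}

for m in 17 33 49 65 80; do
    echo "minor $m -> $(sd_name "$m")"
done
```

Minor 49 maps to /dev/sdd1, so the unnamed slot 2 in each superblock is the partition that fdisk -l no longer sees.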

Beeker · June 17, 2013, 04:43:17 AM

This is what I got

dcerouter_1024641:/home/bruce# mdadm -E /dev/sdd1
mdadm: cannot open /dev/sdd1: No such file or directory

Crumble · June 17, 2013, 04:53:16 AM

This is odd. If that drive is or was an active hot spare, it should be partitioned and ready to write to if a drive failure occurs. I would think b, c, d would be the active drives and e the hot spare, with f your backup hot spare. Is this correct?

LinuxMCE Forums
