Hi All,
We had a power failure at home while my system was running 810. When the power came back and it booted up again, the LMCE software RAID started showing all the drives as removed. I have tried su mdadm -D /dev/md1 and it comes back with: mdadm: md device /dev/md1 does not appear to be active.
I have very limited command-line knowledge, so any help would be appreciated to see if there is anything I can try to recover the RAID and its data, as all our family photos are on it. I am currently building up a new QNAP NAS with RAID 6, as I was advised to do.
I have attached a photo from LMCE of the RAID
Kind regards
Beeker
Read this
If the md driver detects a write error on a device in a RAID1, RAID4,
RAID5, RAID6, or RAID10 array, it immediately disables that device
(marking it as faulty) and continues operation on the remaining
devices. If there are spare drives, the driver will start recreating
on one of the spare drives the data which was on that failed drive,
either by copying a working drive in a RAID1 configuration, or by doing
calculations with the parity block on RAID4, RAID5 or RAID6, or by
finding and copying originals for RAID10.
In kernels prior to about 2.6.15, a read error would cause the same
effect as a write error. In later kernels, a read-error will instead
cause md to attempt a recovery by overwriting the bad block. i.e. it
will find the correct data from elsewhere, write it over the block that
failed, and then try to read it back again. If either the write or the
re-read fail, md will treat the error the same way that a write error
is treated, and will fail the whole device.
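When md does kick off a rebuild onto a spare as described above, you can watch its progress (a quick read-only check, assuming the array is /dev/md1 as in this thread) with:
watch cat /proc/mdstat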
Since all of them seem to be removed, read this:
http://linuxexpresso.wordpress.com/2010/03/31/repair-a-broken-ext4-superblock-in-ubuntu/
Obviously ignore the parts about Parted Magic and TestDisk; fsck is already in Linux (if you didn't know that). You have not explained what we are working with here, btw. RAID 5 external NAS, I assume?
Just run these two commands (/dev/xxx being one/all of the RAID partitions, of course) and report back the info, unless you feel comfortable fixing it yourself. I am not sure it's a bad superblock; it's not a good idea to start trying to fix things without knowing what the problem is first. Just a guess. Good luck! :)
sudo fdisk -l
mdadm -E /dev/xxx (on all the RAID partitions)
Oh, and this one too, derp: mdadm --detail /dev/mdX (whatever RAID number is at the end of md)
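If typing that out for every member gets tedious, a small shell loop does the same thing (just a sketch, assuming the RAID members are the first partitions on sdb through sdf, which is what they turn out to be here):
for d in /dev/sd[b-f]1; do mdadm -E "$d"; done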
Thanks very much for the info, I will try it now and report back. Sorry, it's a software RAID 5 in LMCE.
Kind regards
Beeker
Hi Crumble
This is what I got from sudo fdisk -l. From memory, sdf was always the spare drive.
dcerouter_1024641:/home/bruce# sudo fdisk -l
Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x39686ed2
Device Boot Start End Blocks Id System
/dev/sda1 * 1 90446 726507463+ 83 Linux
/dev/sda2 90447 91201 6064537+ 5 Extended
/dev/sda5 90447 91201 6064506 82 Linux swap / Solaris
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0006a5c8
Device Boot Start End Blocks Id System
/dev/sdb1 1 182401 1465136001 83 Linux
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xb217d64b
Device Boot Start End Blocks Id System
/dev/sdc1 1 182401 1465136001 83 Linux
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x43b7a284
Device Boot Start End Blocks Id System
/dev/sde1 1 182401 1465136001 83 Linux
Disk /dev/sdf: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xc8efaf55
Disk /dev/sdf doesn't contain a valid partition table
I also tried
dcerouter_1024641:/home/bruce# mdadm -E /dev/md1
mdadm: No md superblock detected on /dev/md1.
Any help on how to proceed would be greatly appreciated; I wasn't sure where to go from here.
Regards
Beeker
What about mdadm --detail /dev/md1? It may be in the process of rebuilding, which takes some time.
I tried
mdadm -detail /dev/md1 and got
dcerouter_1024641:/home/bruce# mdadm -detail /dev/md1
mdadm: -d does not set the mode, and so cannot be the first option.
So then I tried mdadm -D /dev/md1 and got
dcerouter_1024641:/home/bruce# mdadm -D /dev/md1
mdadm: md device /dev/md1 does not appear to be active.
Run each command and post the results, please:
cat /proc/mdstat
mdadm -E /dev/sdb1
mdadm -E /dev/sdc1
mdadm -E /dev/sde1
Here you go
cat /proc/mdstat
dcerouter_1024641:/home/bruce# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdb1[0](S) sdf[4](S) sde1[3](S) sdc1[1](S)
5860546304 blocks
mdadm -E /dev/sdb1
dcerouter_1024641:/home/bruce# mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : 5747d0ac:31c15bff:bd9f1658:0a1d2015 (local to host dcerouter)
Creation Time : Sun Dec 27 10:14:43 2009
Raid Level : raid5
Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
Array Size : 4395407808 (4191.79 GiB 4500.90 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1
Update Time : Thu Jun 13 09:02:55 2013
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ee358c49 - correct
Events : 1802
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 17 0 active sync /dev/sdb1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync
3 3 8 65 3 active sync /dev/sde1
4 4 8 80 4 spare /dev/sdf
mdadm -E /dev/sdc1
dcerouter_1024641:/home/bruce# mdadm -E /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : 5747d0ac:31c15bff:bd9f1658:0a1d2015 (local to host dcerouter)
Creation Time : Sun Dec 27 10:14:43 2009
Raid Level : raid5
Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
Array Size : 4395407808 (4191.79 GiB 4500.90 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1
Update Time : Thu Jun 13 09:02:55 2013
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ee358c5b - correct
Events : 1802
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 33 1 active sync /dev/sdc1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync
3 3 8 65 3 active sync /dev/sde1
4 4 8 80 4 spare /dev/sdf
mdadm -E /dev/sde1
dcerouter_1024641:/home/bruce# mdadm -E /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 00.90.00
UUID : 5747d0ac:31c15bff:bd9f1658:0a1d2015 (local to host dcerouter)
Creation Time : Sun Dec 27 10:14:43 2009
Raid Level : raid5
Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
Array Size : 4395407808 (4191.79 GiB 4500.90 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1
Update Time : Thu Jun 13 09:02:55 2013
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ee358c7f - correct
Events : 1802
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 65 3 active sync /dev/sde1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync
3 3 8 65 3 active sync /dev/sde1
4 4 8 80 4 spare /dev/sdf
OK, looks like all the drives are clean. Your RAID is showing all the drives as spare (like in the pic); I wasn't sure how accurate the GUI is. This may not be a superblock problem, but let's find out for sure.
Run
mdadm --assemble --scan -v
This will let us know which ones have bad superblocks, and whether that is indeed the problem.
Wait a sec, don't run that. Wish there was an edit option. Gimme a sec.
ok
OK, I just noticed that disk sdd1 did not appear in the fdisk -l report. This is odd; we need to figure out what is going on there. Run mdadm -E /dev/sdd1 and post what it says.
This is what I got
dcerouter_1024641:/home/bruce# mdadm -E /dev/sdd1
mdadm: cannot open /dev/sdd1: No such file or directory
This is odd; if that drive is or was an active hot spare, it should be partitioned and ready to write to if a drive failure occurs. I would think b, c, d would be the active drives and e the hot spare, with f your backup hot spare. Is this correct?
No, I have 4 x HDDs with one hot spare, so it should be b, c, d & e with f as the hot spare. Maybe either the power or the SATA lead has come out; I will check these when I get home and let you know.
Oops, didn't see your post. So yeah, that is a good start. Once you've got d showing again, give these two commands a go, assuming it doesn't rebuild itself on its own.
mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sd[b,c,d,e,f]
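Before forcing anything, it is worth capturing the current metadata somewhere safe, purely as a precaution; the output file names here are just examples:
mdadm --examine --scan > /root/md-scan.txt
mdadm -E /dev/sd[b-f]1 > /root/md-members.txt 2>&1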
I got home and checked all the HDDs and found one making a slow beeping noise, which I can only assume is sdd1, so a couple of questions.
Should I leave it plugged in and try to rebuild the RAID with the commands you gave me, hoping the spare drive will now form part of the RAID? Though I think the spare drive is just that, a "spare drive", and not a hot spare. The other choice would be to unplug sdd1, plug that lead into sdf, then reassemble the RAID with your commands and hopefully the data will come back.
I really appreciate all your help with trying to fix this, thanks very much.
Best regards
Beeker
Once you're back up and running, we need to make a ticket so that we can improve the RAID UI in the web admin, to help deal with these issues.
-Thom
Will do, happy to provide any information or help.
please make a ticket @ http://svn.linuxmce.org/ ... thanks :)
-Thom
Done Thom... excuse any mistakes, as it's the first ticket I have ever created, so I hope it's correct :)
Please let me know if I need to make any changes.
Beeker
Well, I do not recommend setting up the RAID again without a spare, unless you back up the data. Remember, RAID 5 will survive one drive failure. If more than that fails, that is it, donezo; all that data is gone forever. I recommend replacing the drive. I know it sucks, but better safe than sorry. If you do not want to heed this warning, I would leave it as is and run those commands to see if it will rebuild. I really stress being patient if you need another drive and can't get one right away, though.
Just so I fully understand, could I clarify a couple of your suggestions? As you point out, I'd rather not lose all the data.
The original RAID was set up with 4 x HDDs, and the 5th one LinuxMCE automatically marked as a spare. Can I buy another 1.5TB HDD, replace the failed one in the RAID and hopefully have it rebuild? From my understanding, you are saying that if I replace the HDD I can't just have 4 HDDs, I need the spare as well. If that is correct, once I replace the faulty HDD will it rebuild on its own, or will I have to use one of the commands that you sent?
I have a backup of some of the data, though as it's a RAID array I assume I can't just take out each HDD, put it in my QNAP and copy it.
Really appreciate all your help, and sorry for the dumb questions. I just need to recover the family photos or life won't be worth living; I have certainly learned a lesson here about backing data up.
Regards
Beeker
RAID does not like changes being made. Problems can arise in Linux just from trying to rebuild the array with one disk missing. If the drive is there and, say, Linux sees a write error or read error, it will mark that drive dirty and should start using the spare. In your case it knows that f is part of the array; if d just isn't there, that can create a problem, even though it sees f and knows it's a spare. I don't know why, honestly, just my experience. Your best bet is to replace d and leave f where it is. Then run the very first command you posted to verify it is rebuilding. If it does not automagically rebuild, you will need to run the two commands I posted to force it to rebuild. That would be the smartest way of doing this. Although I should mention, there is always a chance that something else could go very wrong. RAID 5 is a piece of garbage; I do not understand why it was so popular. It spreads the data across the disks like a tic-tac-toe game, making it difficult to retrieve data from the individual drives. There are tools that can do this, though; Knoppix has one. What you do is make images of the disks, put the images on four more disks, then have the tools try to rebuild the data (a rough sketch of the imaging step is below). In your case I don't think that will get you far, since with RAID 5 the parity is striped across the member drives along with the data, so the individual images aren't much use on their own. So, on second thought, that kind of backup probably isn't a practical option for you. You could still try; there is no harm in making images of the drives before you try to rebuild the array. At least that way you can always send the images off for recovery if your significant other is making death threats. Then just blame it on the company, they screwed up, act really mad. :P
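The imaging step mentioned above would look roughly like this, assuming a spare disk big enough for the images is mounted at /mnt/backup and GNU ddrescue is installed (plain dd works too, it is just less forgiving of read errors):
ddrescue /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map
# or, without ddrescue:
dd if=/dev/sdb of=/mnt/backup/sdb.img bs=64K conv=noerror,sync
Repeat per member drive, one image file each.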
Thanks, I will purchase a new Seagate 1.5TB HDD, see what it does and let you know... it's going to take a couple of days to organise a new HDD.
Thanks again for your patience and assistance with this issue... just confirming: I should first try su mdadm -D /dev/md1, and then if that does nothing, use the two commands that you suggested?
Yes, that is exactly right. Good Luck Beeker.
Thanks, I think I am going to need it :)
How did you make out, Beeker?
Hi Crumble,
I finally found a used Seagate 1.5TB on eBay for the right price of $61 and it's on its way, so I should have it next week. I am hoping that when I plug it in it will automatically rebuild the software RAID; then I can copy all the content over to my new QNAP NAS, which is ready to go with RAID 6 and 19TB.
I will have everything crossed next week when I put the replacement HDD in, so wish me luck. I will definitely let you know how it goes, as you have been a fantastic help, and if it all goes according to plan I won't end up being killed, since I'll have recovered the wedding photos and all the other data :)
Thanks again
Kind regards
Beeker
Hi Crumble,
Installed the new HDD and tried
mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sd[b,c,d,e,f]
And got
dcerouter_1024641:/home/bruce# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
dcerouter_1024641:/home/bruce# mdadm --assemble --force /dev/md1 /dev/sd[b,c,d,e,f]
mdadm: no recogniseable superblock on /dev/sdb
mdadm: /dev/sdb has no superblock - assembly aborted
dcerouter_1024641:/home/bruce#
Any thoughts?
Regards
Beeker
Also I tried the first command of mdadm -D /dev/md1
And got
dcerouter_1024641:/home/bruce# mdadm -D /dev/md1
mdadm: md device /dev/md1 does not appear to be active.
Maybe look up Beeker in the funeral notices :-[
Regards
Beeker
OK, this is good actually. We finally know what the problem is, besides a bad HDD. I will show you where to look in a few hours; doing some work this Friday. Don't panic, this is fixable.
Awesome news, means I can cancel the funeral director :)
OK, first let's do this.
Run fdisk -l to make sure the partitions are named the same after replacing the drive, and post the output.
Here you go
dcerouter_1024641:/home/bruce# fdisk -l
Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x39686ed2
Device Boot Start End Blocks Id System
/dev/sda1 * 1 90446 726507463+ 83 Linux
/dev/sda2 90447 91201 6064537+ 5 Extended
/dev/sda5 90447 91201 6064506 82 Linux swap / Solaris
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0006a5c8
Device Boot Start End Blocks Id System
/dev/sdb1 1 182401 1465136001 83 Linux
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xb217d64b
Device Boot Start End Blocks Id System
/dev/sdc1 1 182401 1465136001 83 Linux
Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x43b7a284
Device Boot Start End Blocks Id System
/dev/sdd1 1 182401 1465136001 83 Linux
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
240 heads, 63 sectors/track, 193801 cylinders
Units = cylinders of 15120 * 512 = 7741440 bytes
Disk identifier: 0x2848762e
Device Boot Start End Blocks Id System
/dev/sde1 * 14 193802 1465033728 7 HPFS/NTFS
Haha, OK, geez, something has changed. I'm trying to figure out what happened.
Thanks I will hold off contacting the funeral director :)
FYI
The 4 x HDDs in the RAID are still showing as removed.
Ah OK, so we have d now but no f, LOL. Maybe a wire got jiggled loose? And it looks like e is formatted as HPFS/NTFS. Uh, HPFS... o.0 what in the wild world of sports? I have not seen that before! That has got to be reformatted first, and we need /dev/sdf back. I hope I just never noticed that Linux reports NTFS as HPFS/NTFS. HPFS has been dead since NT 4.0 came out. Read this quote from Microsoft, LOL!
Disadvantages of HPFS
Because of the overhead involved in HPFS, it is not a very efficient choice for a volume of under approximately 200 MB. In addition, with volumes larger than about 400 MB, there will be some performance degradation. You cannot set security on HPFS under Windows NT.
HPFS is only supported under Windows NT versions 3.1, 3.5, and 3.51. Windows NT 4.0 cannot access HPFS partitions.
And that's OK if the drives are still showing as removed. We have a bad superblock to fix; I am assuming we will find some more, but we will get to that in a bit. Gotta get those two things sorted first.
I have removed sdd1 and am just formatting it with ext3 from my Win 7 PC using Mini Partition Wizard, then I will put it back in, run fdisk -l again and post the results.
NO, DON'T DO THAT
Sorry, oops, too late, as that was the replacement HDD that was showing no valid partition, so I figured it needed to be formatted as ext3.
Have I stuffed things up?
Oh, maybe it was e. Can you run fdisk -l again and post it?
Here you go after formatting sdd1 with ext3
dcerouter_1024641:/home/bruce# fdisk -l
Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x39686ed2
Device Boot Start End Blocks Id System
/dev/sda1 * 1 90446 726507463+ 83 Linux
/dev/sda2 90447 91201 6064537+ 5 Extended
/dev/sda5 90447 91201 6064506 82 Linux swap / Solaris
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0006a5c8
Device Boot Start End Blocks Id System
/dev/sdb1 1 182401 1465136001 83 Linux
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xb217d64b
Device Boot Start End Blocks Id System
/dev/sdc1 1 182401 1465136001 83 Linux
Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xc8efaf55
Disk /dev/sdd doesn't contain a valid partition table
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x43b7a284
Device Boot Start End Blocks Id System
/dev/sde1 1 182401 1465136001 83 Linux
Disk /dev/sdf: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x2848762e
Device Boot Start End Blocks Id System
/dev/sdf1 1 182401 1465134080 83 Linux
dcerouter_1024641:/home/bruce#
If it was the replacement drive, it probably was e.
Woohoo! OK, we're safe. Didn't mean to scare you, Beeker; it would be a shame to get this far and then format your data off, LOL.
Phew, sdd1 is definitely the replacement HDD, which is currently showing no valid partition.
OK, looks like you got f working again. This is probably why the drive assignment changed, and why I peed myself a little when you formatted d. Good stuff. OK, let's work on the superblocks. Gimme a couple of minutes to get the right strategy going; working at the moment too.
Thanks much appreciated :)
Hey Beeker, sorry I haven't replied in a while. I have been very busy at work and had no internet at home. If you still need to fix that RAID, follow the instructions at this link for /dev/sdb.
First run this command, just to be safe:
mdadm --stop /dev/md1
then follow this
http://linuxexpresso.wordpress.com/2010/03/31/repair-a-broken-ext4-superblock-in-ubuntu/
You're using ext3, I think?
Once you have replaced the bad superblock with a backup, run
mdadm --assemble --force /dev/md1 /dev/sd[b,c,d,e,f]
Let me know if you have any problems.
If you do get the RAID up and running again, you will need to run fsck,
but NOT until it is rebuilt and ready to use/online.
On second thought, don't run fsck. Back up your data once you get it running, and then run fsck if you want to keep using that RAID. Sometimes fsck will throw stuff into the lost+found folder; better to keep your organization the way it is for photos, I am assuming.
Thanks Crumble,
Just in the car stuck in traffic; will be home in about 40 mins, so I will let you know how it goes. Thanks again :)
Hey Crumble,
Reading the WordPress documentation, it says to run
sudo fsck.ext3 -v /dev/sdd
Is it OK to run this command?
Regards
Beeker
You want sudo fsck.ext3 -v /dev/sdb
Do everything to /dev/sdb.
If I am correct, that is where the bad superblock was.
Make sure to run mdadm --stop /dev/md1 before anything.
I have stopped it already, and I am sure it was sdd, as that was the HDD I replaced. I assume it won't hurt if it's the wrong HDD, and if it comes back OK then I will check sdb.
sdd should be blank from the formatting.
dcerouter_1024641:/home/bruce# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
dcerouter_1024641:/home/bruce# mdadm --assemble --force /dev/md1 /dev/sd[b,c,d,e,f]
mdadm: no recogniseable superblock on /dev/sdb
mdadm: /dev/sdb has no superblock - assembly aborted
dcerouter_1024641:/home/bruce#
I started checking sdd and currently it says this
dcerouter_1024641:/home/bruce# fsck.ext3 -v /dev/sdd
e2fsck 1.41.3 (12-Oct-2008)
/dev/sdd has gone 1270 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Should I cancel it and check sdb?
No, it won't hurt anything; it just shouldn't have any valid superblocks on it, as it was never part of the RAID. Should be blank.
Thanks. So do you think sdb will have a bad superblock even though sdd was the faulty HDD?
This is what I got below from sdd, so now I am going to check sdb.
dcerouter_1024641:/home/bruce# fsck.ext3 -v /dev/sdd
e2fsck 1.41.3 (12-Oct-2008)
/dev/sdd has gone 1270 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
11 inodes used (0.00%)
0 non-contiguous inodes (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
5798288 blocks used (1.58%)
0 bad blocks
1 large file
0 regular files
2 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
2 files
dcerouter_1024641:/home/bruce#
This is what I get for sdb, sdc, sde
dcerouter_1024641:/home/bruce# fsck.ext3 -v /dev/sdb
e2fsck 1.41.3 (12-Oct-2008)
fsck.ext3: Superblock invalid, trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/sdb
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
OK, you will have to do one disk at a time. Start with sdb.
Try to reassemble after you fix the superblock on sdb; you may not have to do all the disks. If that does not work, then fix the superblocks on c and e and try the reassemble command again. It will work eventually.
Sorry to be a pain in the you-know-where... I have tried a number of the backup superblocks and I am getting the same error as per below. I also tried e3fsck and that didn't work either, as it came back as command not found.
dcerouter_1024641:/home/bruce# e2fsck -b 32768 /dev/sdb
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Bad magic number in super-block while trying to open /dev/sdb
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
Did you try them all?
No. Did you want me to try and restore a superblock on sdc & sde as well?
No, start with the sdb drive. After you restore one of the superblocks, try the assemble command and see if that works.
Tried all the superblocks for sdb and none of them want to restore.
Post the output of this again:
sudo mdadm --assemble --force /dev/md1 /dev/sd[bcdef]1
I was looking through the thread again and there was no 1 on the end of that assemble command. That could cause a problem.
Haha, I think that was the problem with the assemble and superblock commands. You always have to use the full partition name, and that includes the number.
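A harmless way to double-check which disk and partition nodes the kernel actually sees, so you know the names you are feeding to mdadm exist:
cat /proc/partitions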
here you go
dcerouter_1024641:/home/bruce# mdadm --assemble --force /dev/md1 /dev/sd[bcdef]1
mdadm: no RAID superblock on /dev/sdf1
mdadm: /dev/sdf1 has no superblock - assembly aborted
Tried putting a 1 on the end and testing sdb, c, d & e again; below is what I got.
dcerouter_1024641:/home/bruce# fsck.ext3 -v /dev/sdb1
e2fsck 1.41.3 (12-Oct-2008)
fsck.ext3: Group descriptors look bad... trying backup blocks...
Superblock has an invalid ext3 journal (inode 8).
Clear<y>? no
fsck.ext3: Illegal inode number while checking ext3 journal for /dev/sdb1
dcerouter_1024641:/home/bruce# fsck.ext3 -v /dev/sdc1
e2fsck 1.41.3 (12-Oct-2008)
fsck.ext3: Superblock invalid, trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/sdc1
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
dcerouter_1024641:/home/bruce# fsck.ext3 -v /dev/sdd1
e2fsck 1.41.3 (12-Oct-2008)
fsck.ext3: No such file or directory while trying to open /dev/sdd1
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
dcerouter_1024641:/home/bruce# fsck.ext3 -v /dev/sde1
e2fsck 1.41.3 (12-Oct-2008)
/dev/sde1 has unsupported feature(s): FEATURE_I26 FEATURE_R26
e2fsck: Get a newer version of e2fsck!
dcerouter_1024641:/home/bruce#
OK, try
mdadm --assemble --force /dev/md1 /dev/sd[bcde]1
got this
dcerouter_1024641:/home/bruce# mdadm --assemble --force /dev/md1 /dev/sd[bcde]1
mdadm: /dev/md1 has been started with 3 drives (out of 4).
dcerouter_1024641:/home/bruce#
LMCE now reports the RAID status as damaged, and when I check the status of each HDD:
sdb is ok
sdc is ok
sdd is still saying removed
sde is ok
sdf is still saying removed
There you go. You only need the three. But once it's done, get that data off; one more drive failure and it's kablooey, data gone. You can check the rebuild with
cat /proc/mdstat
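For the copy itself, something along these lines is one option; the paths are assumptions (LMCE mounts the array under /mnt/device/<id>, and the QNAP share would need to be mounted somewhere first, /mnt/qnap here is just a placeholder):
rsync -avP /mnt/device/31/ /mnt/qnap/rescue/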
Thanks Crumble, you are awesome and I really appreciate your patience and help.
Kind regards
Beeker
:)
No problem Beeker. You owe me the life debt now :-p
You bet I do... and there was going to be a special mention of you on my headstone if all that hard work didn't pay off :)
I just browsed to the RAID array and all the data is there, so I will be madly copying it all off tonight.
Hey Crumble,
Do you know if it will repair itself eventually or will it stay in damaged mode?
Don't want to push my luck, though: I can't get any help with getting LMCE to recognise the QNAP NAS. If you see my other recent post, I have tried everything I know, which isn't much :)
It will stay in damaged mode until the missing drive is added back in. I will check the other thread; never used a QNAP. Once you get that data off, we can add it back in if you like.
Thanks... it's a circus of HDD lights flashing in my office, copying all that data off.
Hi Crumble,
Sorry, I've been travelling. I finally got all the data off the RAID array, and now I want to try to add sdd1 back in, so what should we do?
Cheers
Beeker
Hey Beeker, give me a fdisk -l and a df -h
No problems, I will get it shortly and come back to you.
Cheers
Beeker :)
Hi Crumble,
Very sorry, I've been out of action with work and an arm injury... below is the fdisk -l and the df -h.
Cheers
Beeker
--------------------------------------------------------------------------------------------------------------
dcerouter_1024641:/home/# fdisk -l
Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x39686ed2
Device Boot Start End Blocks Id System
/dev/sda1 * 1 90446 726507463+ 83 Linux
/dev/sda2 90447 91201 6064537+ 5 Extended
/dev/sda5 90447 91201 6064506 82 Linux swap / Solaris
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0006a5c8
Device Boot Start End Blocks Id System
/dev/sdb1 1 182401 1465136001 83 Linux
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xb217d64b
Device Boot Start End Blocks Id System
/dev/sdc1 1 182401 1465136001 83 Linux
Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x43b7a284
Device Boot Start End Blocks Id System
/dev/sdd1 1 182401 1465136001 83 Linux
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x2848762e
Device Boot Start End Blocks Id System
/dev/sde1 1 182401 1465134080 83 Linux
dcerouter_1024641:/home/#
---------------------------------------------------------------------------------------------------
dcerouter_1024641:/home/# df -h
Filesystem Size Used Avail Use% Mounted on
rootfs 682G 44G 604G 7% /
udev 1012M 2.9M 1009M 1% /dev
/dev/disk/by-uuid/50ef0102-13de-4f96-8c0f-23a2e75ee41f
682G 44G 604G 7% /
/dev/disk/by-uuid/50ef0102-13de-4f96-8c0f-23a2e75ee41f
682G 44G 604G 7% /dev/.static/dev
tmpfs 1012M 0 1012M 0% /lib/init/rw
varrun 1012M 392K 1012M 1% /var/run
varlock 1012M 0 1012M 0% /var/lock
tmpfs 1012M 2.2M 1010M 1% /lib/modules/2.6.27-17-generic/volatile
tmpfs 1012M 0 1012M 0% /dev/shm
/dev/md1 4.1T 3.2T 703G 83% /mnt/device/31
/dev/md1 4.1T 3.2T 703G 83% /tmp/tmp.RZtaM22685
dcerouter_1024641:/home/#
OK, now I need
mdadm -D /dev/md1
Will grab it now
Hi Crumble,
Here you go
dcerouter_1024641:/home/# mdadm -D /dev/md1
/dev/md1:
Version : 00.90
Creation Time : Sun Dec 27 10:14:43 2009
Raid Level : raid5
Array Size : 4395407808 (4191.79 GiB 4500.90 GB)
Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sun Aug 25 19:10:32 2013
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 5747d0ac:31c15bff:bd9f1658:0a1d2015 (local to host dcerouter)
Events : 0.625036
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 0 0 2 removed
3 8 49 3 active sync /dev/sdd1
mdadm --manage /dev/md1 --add /dev/sde1
then watch the rebuild with
watch cat /proc/mdstat
Let me know when it is done; we may need to change the mdadm.conf.
Doubtful though, just don't reboot till we check it.
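Checking that is easy enough: compare what the kernel reports against the config file (on Kubuntu it lives at /etc/mdadm/mdadm.conf); the ARRAY line UUIDs should match:
mdadm --detail --scan
cat /etc/mdadm/mdadm.conf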
Tried that and got this message
dcerouter_1024641:/home/# mdadm --manage /dev/md1 --add /dev/sde1
mdadm: /dev/sde1 not large enough to join array
mdadm --add /dev/md1 /dev/sde1
Try this; if it says it added it, watch with
watch cat /proc/mdstat
you should see something like this
md1 : active raid5 sdb2[4] sdd2[3] sdc2[2] sda2[0]
1464765696 blocks level 5, 256k chunk, algorithm 2 [4/3] [U_UU]
[>....................] recovery = 0.0% (84068/488255232) finish=193.4min speed=42034K/sec
Tried that and still got the same thing as per below
dcerouter_1024641:/home/# mdadm --add /dev/md1 /dev/sde1
mdadm: /dev/sde1 not large enough to join array
dcerouter_1024641:/home/#
Checked this as well just to give you some more info
dcerouter_1024641:/home/# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdb1[0] sdd1[3] sdc1[1]
4395407808 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
unused devices: <none>
Derp. I forgot, the superblocks on that drive are bad, so the partition needs to be rebuilt. Since it's only part of one RAID, just format sde1 as ext3, then run the add command. It will rebuild itself, and mdadm.conf looks good.
Can I format it without taking it out and putting it into another PC?
Yeah, do this:
mdadm --fail /dev/md1 /dev/sde1
mdadm: set /dev/sde1 faulty in /dev/md1
then format ext3
then run the add command
Sorry Crumble, no luck with that command, see below.
dcerouter_1024641:/home/# mdadm --fail /dev/md1 /dev/sde1
mdadm: set device faulty failed for /dev/sde1: No such device
dcerouter_1024641:/home/# mdadm: set /dev/sde1 faulty in /dev/md1
bash: mdadm:: command not found
Cheers
Beeker
It's already marked as faulty then. Just format it ext3, then run the add command. There are four drives in the config, so it should grow/rebuild itself.
Just confirming
mkfs.ext3 /dev/sde1
That is the one :-)
Hey Beeker, quick question. Since you got all that data off and have four disks, why not go with a RAID 10? You are using that QNAP for your media now, right?
Not a bad idea; I assume no other data will magically appear after rebuilding the RAID 5. And spot on, I moved all the data to my QNAP with 16TB of storage in RAID 6.
Can I rebuild the existing RAID 5 as RAID 10 once I get that extra HDD back into the original RAID array?
Hey Crumble,
Tried formatting sde1 and got the following, then tried to add it and still get that same error. I have also included a fdisk -l report, just as an FYI.
dcerouter_1024641:/home/# mkfs.ext3 /dev/sde1
mke2fs 1.41.3 (12-Oct-2008)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
91578368 inodes, 366283520 blocks
18314176 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
11179 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
dcerouter_1024641:/home/#
dcerouter_1024641:/home/# mdadm --add /dev/md1 /dev/sde1
mdadm: /dev/sde1 not large enough to join array
dcerouter_1024641:/home/# fdisk -l
Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x39686ed2
Device Boot Start End Blocks Id System
/dev/sda1 * 1 90446 726507463+ 83 Linux
/dev/sda2 90447 91201 6064537+ 5 Extended
/dev/sda5 90447 91201 6064506 82 Linux swap / Solaris
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0006a5c8
Device Boot Start End Blocks Id System
/dev/sdb1 1 182401 1465136001 83 Linux
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xb217d64b
Device Boot Start End Blocks Id System
/dev/sdc1 1 182401 1465136001 83 Linux
Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x43b7a284
Device Boot Start End Blocks Id System
/dev/sdd1 1 182401 1465136001 83 Linux
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x2848762e
Device Boot Start End Blocks Id System
/dev/sde1 1 182401 1465134080 83 Linux
Your disks appear to be identical sizes, but the number of blocks available on sde is less than on the others, which is the cause of the error. More than likely your older disks were partitioned with a starting sector of 63, which was the old standard; the userland disk-partitioning tools have since had their defaults changed to accommodate newer disks, where you want to align partitions on a different boundary (usually 2048). Change your units to sectors and see what the starting sector is for your sde vs sdd. You'll likely have to delete the partition again and re-create it while forcing the starting sector to 63 to match your existing disks.
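For example, one way to compare the starting sectors directly is to ask fdisk for sector units:
fdisk -lu /dev/sdd
fdisk -lu /dev/sde
and compare the Start column for sdd1 vs sde1.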
Hope that helps!
/Mike
mkbrown is right, the partition on sde is slightly smaller than the others. That was sdf, which was never part of the RAID; I checked the thread again and that has been the case all along for that drive. Looks like we need to delete the partition with fdisk, not just format it. Follow these instructions to just delete the partition.
Do not create a new partition; just save and exit once it is deleted.
http://www.howtogeek.com/106873/how-to-use-fdisk-to-manage-partitions-on-linux/
stop the raid mdadm --stop /dev/md1
then run this command, this is the easy way.
sfdisk -d /dev/sdd | sfdisk /dev/sde
then format it using mkfs
then add
Thanks guys, will give it a go and let you know how it goes.
Kind regards
Beeker
Quote from: Crumble on August 26, 2013, 03:54:01 PM
then run this command, this is the easy way.
sfdisk -d /dev/sdd | sfdisk /dev/sde
then format it using mkfs
then add
Actually, you don't need to format the individual members of the RAID array. They contain the striped blocks of the md device, not an actual file system. It's the /dev/mdX that actually gets a file system, which in this case already exists, so DON'T do a mkfs on /dev/md1.
Just copy the partition table as per the sfdisk command above, and then add /dev/sde1 into the array.
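Put together, the whole sequence would look roughly like this (same device names as above, and only if sde is definitely the blank replacement):
sfdisk -d /dev/sdd | sfdisk /dev/sde
mdadm --add /dev/md1 /dev/sde1
watch cat /proc/mdstat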
Things should work a lot better this time around.
HTH!
/Mike
OK thanks Mike :)
All fixed, thanks to Crumble for a superhuman effort that helped me avoid an early funeral by recovering the wedding photos :)
Thanks to Mike as well
Kind regards
Beeker
Beeker,
Glad to hear you recovered your treasured memories! RAID is meant for availability in the case of a hardware failure in one of the "spinning platters of rust", to quote a colleague. It won't ever replace a good backup, which is something you may wish to do in the near future...
Take care!
/Mike
Thanks Mike... I do have a backup in place now to avoid sudden death in the future :)