Recovering a RAID5 mdadm array with two failed devices

Got into an interesting situation with my parents’ home server today (Ubuntu 10.04). Hardware-wise it’s not the best setup – two of the drives are in an external enclosure connected with eSATA cables. I did encourage Dad to buy a proper enclosure, but was unsuccessful. This is a demonstration of why eSATA is a very bad idea for RAID devices.

What happened was that one of the cables had been bumped, disconnecting one of the drives. Thus the array was running in a degraded state for over a month – not good. Anyway, I noticed this when logging in one day to fix something else. The device wasn’t visible so I told Dad to check the cable, but unfortunately when he went to secure it, he must have somehow disconnected another one. This caused a second drive to fail, so the array immediately stopped.

Despite having no hardware failure, the situation is similar to someone replacing the wrong drive in a raid array. Recovering it was an interesting experience, so here I’ve documented the process.

Gathering information

The information you’ll need should be contained in the superblocks of the raid devices. First you need to find out which drive failed first, with the mdadm --examine command. My example was a raid5 array of 4 devices, sdb1, sdc1, sdd1 and sde1:

root@server:~# mdadm --examine /dev/sdb1
mdadm: metadata format 01.02 unknown, ignored.
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : 87fa9a4d:d26c14f1:01f9e43d:ac30fbff (local to host server)
Creation Time : Mon Oct 11 00:13:02 2010
Raid Level : raid5
Used Dev Size : 625128960 (596.17 GiB 640.13 GB)
Array Size : 1875386880 (1788.51 GiB 1920.40 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0

Update Time : Mon Mar 21 00:03:26 2011
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 2
Spare Devices : 0
Checksum : 713f331d - correct
Events : 3910

Layout : left-symmetric
Chunk Size : 512K

Number Major Minor RaidDevice State
this 0 8 17 0 active sync /dev/sdb1

0 0 8 17 0 active sync /dev/sdb1
1 1 0 0 1 faulty removed
2 2 8 49 2 active sync /dev/sdd1
3 3 0 0 3 faulty removed

Look at the last part. Here we can see that this drive is in sync with /dev/sdd1 but out of sync with the other two (sdc1 and sde1) – the data indicates that sdc1 and sde1 have failed. These drives are the two in the external enclosure… but I digress.

Performing an examine on sdc1 shows “active sync” for all the other drives. Clearly this disk has no idea what’s going on. Also note the update time of February 5 (it is now March!!):

root@server:~# mdadm --examine /dev/sdc1
[...]
Update Time : Sat Feb 5 11:22:29 2011
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 7105b39b - correct
Events : 218

Layout : left-symmetric
Chunk Size : 512K

Number Major Minor RaidDevice State
this 1 8 33 1 active sync /dev/sdc1

0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1

This indicates that it was the first drive to be disconnected, as the drives were all in sync the last time this drive was part of the array. That leaves sde1:

root@server:~# mdadm --examine /dev/sde1
[...]
Update Time : Sun Mar 20 23:53:07 2011
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Checksum : 713f30d1 - correct
Events : 3904

Layout : left-symmetric
Chunk Size : 512K

Number Major Minor RaidDevice State
this 3 8 65 3 active sync /dev/sde1

0 0 8 17 0 active sync /dev/sdb1
1 1 0 0 1 faulty removed
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1

When this drive was last part of the array, sdc1 was faulty but the other two were fine. This indicates that it was the second drive to be disconnected.
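
Comparing the superblocks by eye works fine for four drives, but the same ordering falls out of the Events counter: the lower the count, the earlier the drive dropped out. Here is a rough sketch of that comparison, assuming the 0.90-style --examine output shown above. The function names events_of and rank_devices are my own invention for illustration, not part of mdadm.

```shell
#!/bin/sh
# Sketch only: assumes an "Events : N" line as printed by `mdadm --examine`.

# Print the Events counter from --examine output read on stdin.
events_of() {
    awk -F' : ' '/^ *Events :/ { print $2; exit }'
}

# Print "events device" for each device given, highest event count first.
# The device at the bottom of the list left the array earliest.
rank_devices() {
    for dev in "$@"; do
        printf '%s %s\n' "$(mdadm --examine "$dev" 2>/dev/null | events_of)" "$dev"
    done | sort -rn
}

# Example (as root):
#   rank_devices /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
```

With the counts shown above, sdc1 (218) would sort last and sde1 (3904) just below sdb1 (3910), which matches the order worked out by hand.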

Scary stuff

Although /dev/sde1 is marked as faulty, we have to assume that its data is crash-consistent with sdb1 and sdd1, as the array stopped immediately upon failure. The original array won’t start because it only has two active devices. But we can create a new array with 3/4 of the drives as members and one missing.

This sounds scary and it should. If you have critical data that you’re trying to recover from this situation I would honestly be buying a whole new set of drives, cloning the data across to them and working from those. Having said that, the likelihood of permanently erasing the data is low if you’re careful and don’t trigger a rebuild with an incorrectly configured array (like I almost did).

Important information to note is the configuration of the array, in particular device order, layout and chunk size. If you’re using defaults (in hindsight probably a good idea to lessen the chance of something going wrong in situations like this), you don’t need to specify them. However you’ll note that in my example the chunk size is 512K, which differs from the default of 64K.
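
To avoid copying these values by hand, they can be pulled straight out of a surviving superblock. The following is a minimal sketch, assuming the 0.90-style --examine layout shown earlier, where the line starting with “this” carries the RaidDevice number in its fifth column. array_params is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: summarise the settings that must match when re-creating an array.
# Reads `mdadm --examine` output (0.90 metadata layout assumed) on stdin.
array_params() {
    awk '
        /^ *Raid Level :/ { level = $NF }   # e.g. raid5
        /^ *Layout :/     { layout = $NF }  # e.g. left-symmetric
        /^ *Chunk Size :/ { chunk = $NF }   # e.g. 512K
        $1 == "this"      { slot = $5 }     # RaidDevice column of this member
        END {
            printf "level=%s layout=%s chunk=%s slot=%s\n", level, layout, chunk, slot
        }
    '
}

# Example (as root):
#   mdadm --examine /dev/sdb1 2>/dev/null | array_params
```

Running this against each surviving member gives the slot number for every device, which is exactly the piece of information I got wrong on my first attempt below.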

Creating a new array with old data

Here is the command I used to recreate the array:

root@server:~# mdadm --verbose --create /dev/md1 --chunk=512 --level=5 --raid-devices=4 /dev/sdb1 /dev/sdd1 /dev/sde1 missing
mdadm: metadata format 01.02 unknown, ignored.
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=4 ctime=Mon Oct 11 00:13:02 2010
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=4 ctime=Mon Oct 11 00:13:02 2010
mdadm: /dev/sde1 appears to be part of a raid array:
level=raid5 devices=4 ctime=Mon Oct 11 00:13:02 2010
mdadm: size set to 625128960K
Continue creating array? y
mdadm: array /dev/md1 started.

Oops.

Can you see what I did there? I created the array with the missing drive at position [3], when in actual fact the missing drive is [1] (the device numbering starts at 0). Thus when I tried to mount:

root@server:/# mount -r /dev/md1p1 /mnt -t ext4
mount: wrong fs type, bad option, bad superblock on /dev/md1p1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

!!

Upon realising this I looked at mdstat then stopped the array:

root@server:/# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sde1[2] sdd1[1] sdb1[0]
1875386880 blocks level 5, 512k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
root@server:/# mdadm -D /dev/md1
mdadm: metadata format 01.02 unknown, ignored.
/dev/md1:
Version : 00.90
Creation Time : Mon Mar 21 02:00:54 2011
Raid Level : raid5
Array Size : 1875386880 (1788.51 GiB 1920.40 GB)
Used Dev Size : 625128960 (596.17 GiB 640.13 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Mon Mar 21 02:00:54 2011
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

UUID : e469103f:2ddf45e9:01f9e43d:ac30fbff (local to host server)
Events : 0.1

Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 49 1 active sync /dev/sdd1
2 8 65 2 active sync /dev/sde1
3 0 0 3 removed
root@server:/# mdadm --stop /dev/md1
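
The quickest tell in the mdstat output above is the [UUU_] pattern: the underscore sits in slot 3, whereas the original superblocks said slot 1 was the one that should be empty. A small sketch for turning that pattern into slot numbers follows; missing_slots is a hypothetical helper, and it expects the pattern with the square brackets already removed.

```shell
#!/bin/sh
# Sketch only: print the zero-based positions of "_" in an mdstat status
# pattern such as U_UU (pass it without the surrounding brackets).
missing_slots() {
    pattern=$1
    i=0
    out=""
    while [ -n "$pattern" ]; do
        rest=${pattern#?}              # everything after the first character
        first=${pattern%"$rest"}       # the first character itself
        if [ "$first" = "_" ]; then
            out="$out${out:+ }$i"
        fi
        pattern=$rest
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Example:
#   missing_slots UUU_
#   missing_slots U_UU
```

If the number printed doesn’t match the RaidDevice of the drive you deliberately left out, stop the array before doing anything else.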

I then recreated the array with the missing drive in the correct position:

root@server:/# mdadm --verbose --create /dev/md1 --chunk=512 --level=5 --raid-devices=4 /dev/sdb1 missing /dev/sdd1 /dev/sde1
mdadm: metadata format 01.02 unknown, ignored.
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=4 ctime=Mon Mar 21 02:00:54 2011
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=4 ctime=Mon Mar 21 02:00:54 2011
mdadm: /dev/sde1 appears to be part of a raid array:
level=raid5 devices=4 ctime=Mon Mar 21 02:00:54 2011
mdadm: size set to 625128960K
Continue creating array? y
mdadm: array /dev/md1 started.

And examined the situation:

root@server:/# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sde1[3] sdd1[2] sdb1[0]
1875386880 blocks level 5, 512k chunk, algorithm 2 [4/3] [U_UU]

unused devices: <none>
root@server:/# fdisk /dev/md1
GNU Fdisk 1.2.4
Copyright (C) 1998 - 2006 Free Software Foundation, Inc.
This program is free software, covered by the GNU General Public License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

Using /dev/md1
Command (m for help): p

Disk /dev/md1: 1920 GB, 1920389022720 bytes
255 heads, 63 sectors/track, 233474 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/md1p1 1 233475 1875387906 83 Linux
Warning: Partition 1 does not end on cylinder boundary.
Command (m for help): q
root@server:/# mount -r /dev/md1p1 /mnt
root@server:/# ls /mnt
Alex Garth Hamish Jenny lost+found Public Simon
root@server:/# umount /mnt

Phew!

So despite creating a bad array I was still able to stop it and create a new array with the correct configuration. I don’t believe there is any corruption as no writes occurred, and the array didn’t rebuild.
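
My first attempt went wrong because I listed the devices in the order I happened to think of them rather than by RaidDevice number. One way to take the guesswork out is to build the device list from the slot numbers and let a script fill the gap. This is only a sketch; create_order and its slot=device argument format are my own illustration, not an mdadm feature.

```shell
#!/bin/sh
# Sketch only: build the device list for `mdadm --create` from slot=device
# pairs, writing the keyword "missing" for any slot that was not supplied.
# Usage: create_order RAID_DEVICES slot=device [slot=device ...]
create_order() {
    n=$1
    shift
    i=0
    out=""
    while [ "$i" -lt "$n" ]; do
        dev=missing
        for pair in "$@"; do
            if [ "${pair%%=*}" = "$i" ]; then
                dev=${pair#*=}
            fi
        done
        out="$out${out:+ }$dev"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Example, using the slots reported by --examine in this post:
#   create_order 4 0=/dev/sdb1 2=/dev/sdd1 3=/dev/sde1
```

For my array that prints the same device order I used in the second, successful --create command. It only prints the list; running mdadm itself is still a deliberate manual step.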

Adding the first-disconnected drive back in

The array is of course still in a degraded state at this point and has no more redundancy than RAID0. We still need to add the disk that was disconnected first back into the array. Compared to the rest of the saga this is straightforward:

root@server:/# mdadm -a /dev/md1 /dev/sdc1
mdadm: metadata format 01.02 unknown, ignored.
mdadm: added /dev/sdc1
root@server:/# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdc1[4] sde1[3] sdd1[2] sdb1[0]
1875386880 blocks level 5, 512k chunk, algorithm 2 [4/3] [U_UU]
[>....................] recovery = 0.0% (442368/625128960) finish=164.7min speed=63196K/sec

unused devices: <none>

Here we can see a happily rebuilding RAID5 array. Note that you will need to update the /etc/mdadm/mdadm.conf file with the new UUID. The line can be generated simply with:

root@server:/# mdadm --detail --scan
mdadm: metadata format 01.02 unknown, ignored.
ARRAY /dev/md1 level=raid5 num-devices=4 metadata=00.90 spares=1 UUID=7271bab9:23a4b554:01f9e43d:ac30fbff
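
Swapping the stale entry for the new one can be done in an editor, but here is a rough sketch of doing it with a script. It assumes the old entry is an ARRAY line for the previous device name (the original superblock had Preferred Minor 0, so /dev/md0 in my case; check your own file). replace_array_line is a made-up helper, and you should back up the config file before trying anything like this.

```shell
#!/bin/sh
# Sketch only: drop the ARRAY line for one md device from a config file and
# append a replacement line. The new content is built in a temporary file
# and copied over the original only once it is complete.
# Usage: replace_array_line CONF OLD_DEVICE NEW_LINE
replace_array_line() {
    conf=$1
    dev=$2
    newline=$3
    tmp=$(mktemp) || return 1
    # Keep every line except "ARRAY <dev> ..."
    awk -v d="$dev" '!($1 == "ARRAY" && $2 == d)' "$conf" > "$tmp" || return 1
    printf '%s\n' "$newline" >> "$tmp"
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Example (as root, after backing up the file):
#   replace_array_line /etc/mdadm/mdadm.conf /dev/md0 \
#       "$(mdadm --detail --scan 2>/dev/null)"
```

The example assumes --detail --scan prints a single ARRAY line, as it does above; with several arrays on the box you would want to pick out the right line first.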

You can keep an eye on the rebuild with ‘watch cat /proc/mdstat’.
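
If you would rather have a script tell you when the rebuild is done, the percentage can be scraped from the same file. A minimal sketch, assuming the “recovery = N%” line format shown in the mdstat output above; recovery_progress is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print the recovery percentage found in mdstat-style text on
# stdin, or nothing at all if no recovery line is present.
recovery_progress() {
    sed -n 's/.*recovery *= *\([0-9.]*\)%.*/\1/p'
}

# Example: poll once a minute until the recovery line disappears.
#   while pct=$(recovery_progress < /proc/mdstat) && [ -n "$pct" ]; do
#       echo "rebuild at ${pct}%"
#       sleep 60
#   done
#   echo "no recovery in progress"
```

Note that the loop treats “no recovery line” as finished, so it will also exit straight away if the rebuild never started.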

2 Responses to Recovering a RAID5 mdadm array with two failed devices

  1. Steven F says:

    I’d broaden a bit and say eSATA is a risky choice for any permanent use – RAID or not.

    As someone who used much of your prior ubuntu server post as reference, I decided to go with RAID6 instead. Even though I’m only running 4 drives at the moment and RAID6 causes me to sacrifice 2 of 4, the redundancy of RAID5 is not sufficient for me. Given most RAIDs are built with drives of the same model, similar age, often the same Lot #, and experience nearly identical usage, multiple simultaneous failures are not that farfetched. I believe the odds of two drives dying at the exact same moment are low, but a full rebuild will stress-test the remaining drives at a time that I can least afford to have a second drive go.

    I do like eSATA for performing back-ups. I’d be interested in a solution that can back up the entire RAID. Is it reasonable to run a tape drive at home?

    • Alex says:

      Totally agree re eSATA.

      For me however the security of the daily backup offsets the risk of multiple drives failing, so while one failing might indicate that an additional failure from the same batch is more likely, at most you lose a day’s worth of data.

      IMHO, the only reasons to go with tape are portability and durability of the media. You can get more storage on a hard drive these days for much lower cost, and the speed and flexibility of the backups is incomparable (can’t rsync to a tape…). If you need to keep your backups for a long time and your data set isn’t too large, tapes can make sense, but for someone who just wants to ensure their data is safe a couple of 2TB (or 3TB) hard drives on rotation with the aforementioned RAID array is hard to beat.
