With RAID 1, if one disk fails, your server stays up because the second drive still works. But what do you actually do when a drive fails?
The replacement process obviously requires you (or your hosting company) to replace the physical drive.
But both before and after the physical drive replacement, you’ve got some work to do!
Note: All the following commands should be run as root, so just pretend there’s ‘sudo’ in front of all of them!
Check if a RAID drive is failing
This part’s easy: just SSH into the server, and run:
# cat /proc/mdstat
You’ll see something like this:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 nvme0n1p2 nvme1n1p2
      32488448 blocks super 1.2 [2/1] [_U]

md2 : active raid1 nvme0n1p4 nvme1n1p4
      966253888 blocks super 1.2 [2/1] [_U]
      bitmap: 7/8 pages [28KB], 65536KB chunk

md1 : active raid1 nvme0n1p3 nvme1n1p3
      1046528 blocks super 1.2 [2/1] [_U]

unused devices: <none>
This shows you which drive has failed, and also which drives are part of the RAID array. In our case, /dev/nvme0n1 has failed, but /dev/nvme1n1 is okay. If both drives are good, you’d see [UU] instead of [_U]. The underscore indicates a boo-boo!
Note also which partitions are involved in each RAID group:
md0 : active raid1 nvme0n1p2 nvme1n1p2
So now we know that, for example, md0 is /dev/nvme0n1p2 and /dev/nvme1n1p2. Since we also see [_U], we know that the first partitions listed have failed – IOW, those on disk /dev/nvme0n1. You’ll need this info below!
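If you look after several boxes, you can script this check rather than eyeballing mdstat. Here's a minimal sketch, run against a sample line copied from the output above (on a live server you'd grep /proc/mdstat directly):

```shell
#!/bin/sh
# A degraded mirror shows an underscore inside the status brackets (e.g. [_U]).
# Sample line taken from the mdstat output above; on a real server, use:
#   grep -q '\[U*_[U_]*\]' /proc/mdstat
sample='md0 : active raid1 nvme0n1p2 nvme1n1p2 [2/1] [_U]'

if printf '%s\n' "$sample" | grep -q '\[U*_[U_]*\]'; then
    echo "degraded"
else
    echo "healthy"
fi
```

The pattern only matches brackets containing an underscore, so a healthy `[UU]` array stays quiet.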
You may also see the older SATA device names, like:
/dev/sda /dev/sdb /dev/sda1 /dev/sdb1
Where /dev/sda and /dev/sdb are the drives, and /dev/sda1 and /dev/sdb1 are partitions on those drives.
You can see more drive/partition info by doing:
# fdisk -l
In any case, from /proc/mdstat above, we now know that the RAID partitions are as follows:
md0 : nvme0n1p2 + nvme1n1p2
md1 : nvme0n1p3 + nvme1n1p3
md2 : nvme0n1p4 + nvme1n1p4
You can also run:
# mdadm --detail /dev/md0
# mdadm --detail /dev/md1
# mdadm --detail /dev/md2
That will also show you (at the bottom) which partitions are used in md0, md1, etc.
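If you want to pull that md-to-partition mapping out programmatically, a small awk sketch does the trick. Here it's fed a sample line from the output above; on a live box you'd feed it /proc/mdstat instead:

```shell
#!/bin/sh
# Print "md-device : member partitions" for each array line in mdstat.
# Sample input copied from the output above; on a real server, use:
#   awk '...' /proc/mdstat
sample='md0 : active raid1 nvme0n1p2 nvme1n1p2'

printf '%s\n' "$sample" | awk '/^md/ {
    printf "%s :", $1            # array name (md0, md1, ...)
    for (i = 5; i <= NF; i++)    # fields 5+ are the member partitions
        printf " %s", $i
    print ""
}'
```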
Since this is software RAID 1, the drives are mirrored, and we can just remove the failed drive now.
Remove the failed drive from the RAID
To remove the drive in preparation for its physical replacement, we need to actually tell linux to stop using it.
In our example, we’ll run:
# mdadm /dev/md0 -r /dev/nvme0n1p2
# mdadm /dev/md1 -r /dev/nvme0n1p3
# mdadm /dev/md2 -r /dev/nvme0n1p4
This tells linux to stop including partitions 2, 3, and 4 on drive /dev/nvme0n1 – our failed disk.
Now, this command may not work for certain partitions. If the entire disk hasn't failed, but only some partitions, mdadm will refuse to remove any partition that's still active (U instead of _). You'll first need to mark those partitions as failed before you can remove them:
# mdadm --manage /dev/md1 --fail /dev/nvme0n1p3
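If you need to fail and remove several partitions, you can loop over the md/partition pairs. This sketch only echoes the mdadm commands as a dry run; remove the `echo` to actually execute them, and swap in your own device names (these pairs are from our example layout):

```shell
#!/bin/sh
# Dry run: print the fail-then-remove commands for every partition
# on the failing disk. Remove 'echo' to actually run them.
for pair in md0:nvme0n1p2 md1:nvme0n1p3 md2:nvme0n1p4; do
    md=${pair%%:*}      # array name, e.g. md0
    part=${pair##*:}    # member partition, e.g. nvme0n1p2
    echo mdadm --manage "/dev/$md" --fail "/dev/$part"
    echo mdadm --manage "/dev/$md" --remove "/dev/$part"
done
```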
Ta-DA! Now you’re ready for…
Replace the failed physical drive
One last step: You need the drive serial number so you (or your hosting company) can replace the correct drive!
You can either do:
# apt install smartmontools
# smartctl -a /dev/nvme0n1 | grep Serial
Alternatively, for SATA drives you can do:
# hdparm -i /dev/sda
Armed with the drive’s serial number, and if the server is right in front of you, you can now replace the failed physical disk yourself.
If you’re using a remote server, you’ll have to submit a support request and explain the situation.
Be advised that some hosting companies will also want you to provide your blood type, your grandmother’s shoe size, and various other bits of information… so check that out first and do not power down your server yourself!
The new drive is installed: Time to reconfigure RAID!
Okay, now comes the really fun part. If you screw this up, you’ll cry. No pressure… 🙂
First, back up the existing partition table. Most likely, you have a GPT partition table. Older machines may use MBR.
The idea here is just to take a backup of the partition table of the GOOD disk so that if anything goes wrong, you can restore it.
For GPT (try this first):
# sgdisk --backup=nvme1n1_parttable_gpt.bak /dev/nvme1n1
For MBR, you would use:
# sfdisk --dump /dev/nvme1n1 > nvme1n1_parttable_mbr.bak
All we need to do is copy the partition table of the GOOD disk to the new, empty disk:
# sgdisk --backup=nvme1n1_parttable_gpt.bak /dev/nvme1n1
# sgdisk --load-backup=nvme1n1_parttable_gpt.bak /dev/nvme0n1
In our example, the first drive failed (nvme0n1), so we copy the partition table from nvme1n1 (the 2nd drive) to the new first drive.
Next, you need to assign a new random UUID to the new disk:
# sgdisk -G /dev/nvme0n1
For MBR, instead you’d do:
# sfdisk -d /dev/nvme1n1 | sfdisk /dev/nvme0n1
Now that the new drive has a copy of the partition table of the good drive, we need to integrate those partitions into the mirrored RAID array, like so:
# mdadm /dev/md0 -a /dev/nvme0n1p2
# mdadm /dev/md1 -a /dev/nvme0n1p3
# mdadm /dev/md2 -a /dev/nvme0n1p4
VOILA! RAID should now automatically start copying data from the good drive over to the new drive. You can see the status using the following command:
# cat /proc/mdstat
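If you'd rather not keep re-running that by hand, here's a small polling sketch. It assumes a Linux /proc/mdstat; if no resync is running (or the file is missing), it exits immediately:

```shell
#!/bin/sh
# Poll /proc/mdstat every 30 seconds until no recovery/resync line remains.
while grep -qE 'recovery|resync' /proc/mdstat 2>/dev/null; do
    grep -E 'recovery|resync' /proc/mdstat   # show progress line(s)
    sleep 30
done
echo "resync finished (or none running)"
```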
DO NOT REBOOT YET! If you do, you’ll be in big trouble.
Installing the bootloader on the new drive
If you reboot the machine now, it won't come back up. First, we need to make sure GRUB knows what's going on.
Since the disks have changed, first we regenerate GRUB's device map:
# grub-mkdevicemap -n
Then, make sure the bootloader is on the new drive by doing:
# grub-install /dev/nvme0n1
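On EFI systems, some distros want grub-install pointed at the mounted EFI system partition explicitly. A hedged sketch, printed as a dry run so nothing is executed; the --bootloader-id value here is an assumption (it matches the \EFI\ubuntu\ path used later in this article), so check your distro's docs:

```shell
#!/bin/sh
# Dry run: print the EFI-flavored grub-install invocation instead of
# executing it. Remove 'echo' to actually run it inside the target system.
echo grub-install --target=x86_64-efi --efi-directory=/boot/efi \
    --bootloader-id=ubuntu /dev/nvme0n1
```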
You SHOULD be good to go now – including being able to reboot!
Oh crap, I rebooted and my server is down
Congratulations, you did exactly what I did recently! Don’t worry, it’s fixable.
You’ll need a ‘rescue disk/system’ that you can boot your server from in order to repair things. Most hosting companies offer such a thing. If it’s a local server, you can use a live CD or whatever floats your boat.
Once that’s fired up and you are in your ‘rescue’ system, you need to use a chrooted environment. What this means is that since you’re technically running the OS of your rescue system, you need to mount your broken linux and then execute commands as if you were actually booted into that normal, broken linux. We do this with chroot!
WARNING: Now, this is gonna get a little hairy, and you have to pay close attention to what you’re doing. Depending on your flavor of linux, some of the following commands may need to be different. But this should get you close enough that if it doesn’t work, you can figure the rest out on your own!!
Continuing with our example server above, we run:
# lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme0n1       259:0    0 953.9G  0 disk
├─nvme0n1p1   259:1    0   256M  0 part  /boot/efi
├─nvme0n1p2   259:2    0    31G  0 part
│ └─md0         9:0    0    31G  0 raid1 [SWAP]
├─nvme0n1p3   259:3    0     1G  0 part
│ └─md1         9:1    0  1022M  0 raid1 /boot
└─nvme0n1p4   259:4    0 921.6G  0 part
  └─md2         9:2    0 921.5G  0 raid1 /
nvme1n1       259:5    0 953.9G  0 disk
├─nvme1n1p1   259:6    0   256M  0 part
├─nvme1n1p2   259:7    0    31G  0 part
│ └─md0         9:0    0    31G  0 raid1 [SWAP]
├─nvme1n1p3   259:8    0     1G  0 part
│ └─md1         9:1    0  1022M  0 raid1 /boot
└─nvme1n1p4   259:9    0 921.6G  0 part
  └─md2         9:2    0 921.5G  0 raid1 /
Okay, now here’s the tricky part: You may NOT see /boot. You may not even see /boot/efi.
What you should notice is that, in this case, /dev/md2 is my main linux install – IOW, it’s “/”. That’s where linux lives.
Next, notice the following:
- nvme0n1p1 and nvme1n1p1 are NOT part of RAID, and they’re 256MB in size (could also be 512MB)
- md0 is 31GB and may (or may not) be labeled SWAP, so we’re pretty sure that’s our swap partition
- md2 is mounted at /, and is 921GB, so this is our linux install
- md1 is 1G in size, so we can safely conclude this is our /boot
So, we need to mount:
- /dev/md2 at /mnt (our linux install we’re saving)
- /dev/md1 at /mnt/boot
- /dev/nvme1n1p1 at /mnt/boot/efi
This will let us do chroot and use our broken linux as if we were booted into it – and then we can fix stuff. We also need to bind certain directories, but I won’t go into painful detail here. Without further ado:
# mount /dev/md2 /mnt
# mount /dev/md1 /mnt/boot
# mount --bind /dev /mnt/dev
# mount --bind /proc /mnt/proc
# mount --bind /sys /mnt/sys
# mount --bind /run /mnt/run
# mount /dev/nvme1n1p1 /mnt/boot/efi
# chroot-prepare /mnt
# chroot /mnt
# mount -t efivarfs none /sys/firmware/efi/efivars
PHEW! That was crazy. The last command is very important. If you don’t do that one, you’ll have problems with error messages about “No EFI” or “EFI variables not found” and that kind of thing. This is assuming, of course, that you’re working on a modern server that uses EFI – which is most of them these days.
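Before running that efivars mount, it's worth confirming the rescue kernel actually booted in EFI mode at all. A quick check; if the directory is missing, you're on legacy BIOS and can skip the efivars mount:

```shell
#!/bin/sh
# The kernel exposes /sys/firmware/efi only when booted in EFI mode.
if [ -d /sys/firmware/efi ]; then
    echo "EFI system"
else
    echo "legacy BIOS (skip the efivars mount)"
fi
```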
Next, we need to copy the good EFI partition (256MB on nvme1n1p1) to the not-working EFI partition on the new disk (nvme0n1p1):
# dd if=/dev/nvme1n1p1 of=/dev/nvme0n1p1 bs=4096
If you see that /dev/nvme0n1p1 and /dev/nvme1n1p1 have the same UUID AND the same PARTUUID, you need to set a new PARTUUID for /dev/nvme0n1p1 (on the new drive):
# gdisk /dev/nvme0n1
The sequence of commands to enter into the gdisk prompt to set a new partition ID for partition 1 on nvme0n1 is (you can read more about this by doing ‘man gdisk’):
x, c, [partition number, in our case: 1], r, m, w, then y to confirm the write
If you run blkid again, you should see two different PARTUUIDs now, like so:
/dev/nvme1n1p1: SEC_TYPE="msdos" UUID="931A-8E43" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="a46fd3a3-f428-4773-8c3e-c57f564bbad1"
/dev/nvme0n1p1: SEC_TYPE="msdos" UUID="931A-8E43" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="6216d8b1-cc59-4f25-be1f-12837532fbb1"
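You can script that comparison too. Here's a sketch using the two PARTUUIDs from the blkid output above as sample strings; on a live system, you'd fetch them with `blkid -s PARTUUID -o value <device>`:

```shell
#!/bin/sh
# Sample PARTUUIDs copied from the blkid output above. On a live system:
#   p1=$(blkid -s PARTUUID -o value /dev/nvme1n1p1)
#   p2=$(blkid -s PARTUUID -o value /dev/nvme0n1p1)
p1='a46fd3a3-f428-4773-8c3e-c57f564bbad1'
p2='6216d8b1-cc59-4f25-be1f-12837532fbb1'

if [ "$p1" = "$p2" ]; then
    echo "PARTUUID clash: randomize one of them with gdisk"
else
    echo "PARTUUIDs differ: OK"
fi
```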
MAMA MIA!! Okay, almost there…
Now, you’re going to tell GRUB to try again:
# grub-mkdevicemap -n
# grub-install /dev/nvme0n1
# grub-install /dev/nvme1n1
# update-grub
# blkid    [should still show 2 different PARTUUIDs!!]
Now, run the following command:
# efibootmgr -v
Ideally, you'd see TWO entries – one for each EFI partition on each of your drives. Most likely, though, you'll only see ONE boot entry, pointing at your 2nd (good) drive, like so:
BootCurrent: 0002
Timeout: 1 seconds
BootOrder: 0001,0002
Boot0001* UEFI: PXE IPv4 Intel(R) Ethernet Controller (3) I225-LM	PciRoot(0x0)/Pci(0x2,0x1)/Pci(0x0,0x0)/Pci(0x6,0x0)/Pci(0x0,0x0)/MAC(c87f54521d22,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0002* ubuntu	HD(1,GPT,a46fd3a3-f428-4773-8c3e-c57f564bbad1,0x1000,0x80000)/File(\EFI\ubuntu\grubx64.efi)..BO
Notice that the PARTUUID points to nvme1n1p1, but there’s no entry for our shiny new nvme0n1p1!
Let’s add one. Using the same path to the .efi file from the existing entry above, we do:
# efibootmgr -c -d /dev/nvme0n1 -p 1 -L "ubuntu2" -l '\EFI\ubuntu\grubx64.efi'
Okay, now run efibootmgr -v again, and you should see two entries, each pointing to a different PARTUUID that corresponds to /dev/nvme0n1p1 and the other that matches /dev/nvme1n1p1.
Now, run update-grub one more time:
# update-grub
You should see that it’s finding linux kernels to boot from. If not, you may need to find the latest linux kernel and reinstall it, then run update-grub again.
Reboot and pray
Now you can reboot. Note that it may take a few minutes for the server to boot, so don’t freak out and assume it failed again if you can’t SSH in right away.
At this point, if you get nothing after 5 minutes, you can Rescue Disk in again and try to repeat the above process to see what’s broken.
Most often, there is some issue with GRUB not finding linux kernels. With a bit of convincing, you should be able to salvage the install.
And next time, DO NOT FORGET to install the bootloader before rebooting the server!