How to replace a faulty hard disk in a software RAID 1 array on CentOS 7

RAID 1 mirrors data across disks, providing redundancy so that the data survives if one disk fails. In this example a CentOS 7 server has two identical hard disks attached, /dev/sda and /dev/sdb. You can see the RAID 1 configuration by checking /proc/mdstat:

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sdb2[2] sda2[0]
 19977216 blocks super 1.2 [2/2] [UU]
 bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sdb1[2] sda1[0]
 976832 blocks super 1.0 [2/2] [UU]
 bitmap: 0/1 pages [0KB], 65536KB chunk

Now, if one drive, say /dev/sdb, fails, you will see error messages on the console or in the /var/log/messages file:

Jan 15 08:35:30 localhost kernel: sd 0:0:1:0: [sdb] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jan 15 08:35:30 localhost kernel: sd 0:0:1:0: [sdb] CDB: Write(10) 2a 00 00 1d d8 10 00 00 01 00
Jan 15 08:35:30 localhost kernel: blk_update_request: I/O error, dev sdb, sector 1955856
Jan 15 08:35:30 localhost kernel: md: super_written gets error=-5, uptodate=0
Jan 15 08:35:30 localhost kernel: md/raid1:md126: Disk failure on sdb2, disabling device.#012md/raid1:md126: Operation continuing on 1 devices.

You can also check the RAID status with mdadm, which shows the array as degraded, with the failed disk removed:

[root@localhost ~]# mdadm --detail /dev/md126
/dev/md126:
 Version : 1.2
 Creation Time : Sun Jan 15 06:08:06 2017
 Raid Level : raid1
 Array Size : 19977216 (19.05 GiB 20.46 GB)
 Used Dev Size : 19977216 (19.05 GiB 20.46 GB)
 Raid Devices : 2
 Total Devices : 2
 Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Sun Jan 15 08:36:46 2017
 State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
 Spare Devices : 0

Name : localhost:pv00
 UUID : 0391d003:572d7ddc:6dcf78b3:7f2c737b
 Events : 179

Number Major Minor RaidDevice State
 0 8 2 0 active sync /dev/sda2
 2 0 0 2 removed

2 8 18 - faulty /dev/sdb2
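Before physically swapping the disk, you may also want to explicitly mark the failed disk's partitions as faulty and remove them from their arrays (in the output above, sdb2 was already kicked out of md126 automatically, but sdb1 may still be listed in md127). A minimal sketch, using the array and partition names from the output above:

```shell
# Mark the failed disk's partitions as faulty (if the kernel has not
# already done so) and remove them from their arrays before replacing
# the hardware. Requires root; device names are from this example.
mdadm --manage /dev/md126 --fail /dev/sdb2
mdadm --manage /dev/md126 --remove /dev/sdb2
mdadm --manage /dev/md127 --fail /dev/sdb1
mdadm --manage /dev/md127 --remove /dev/sdb1
```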

Since the secondary disk has failed, I will remove it and attach a new disk to the server. The system names the replacement drive /dev/sdb, and we can see the new disk in fdisk -l:

[root@localhost ~]# fdisk -l

Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sda: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000ebada

Device Boot Start End Blocks Id System
/dev/sda1 * 2048 1955839 976896 fd Linux raid autodetect
/dev/sda2 1955840 41943039 19993600 fd Linux raid autodetect

Our primary disk /dev/sda has two partitions, /dev/sda1 and /dev/sda2, each backing one of the two RAID devices, so we will create identical partitions on the replacement disk /dev/sdb.
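As an alternative to recreating the partitions by hand in fdisk, you can copy the MBR partition table from the healthy disk in one step with sfdisk. This is not shown in the transcripts below, so treat it as an optional shortcut, and double-check the device order first, since reversing it would overwrite the good disk:

```shell
# Dump the partition table of the healthy disk (/dev/sda) and write it
# to the replacement disk (/dev/sdb).
# WARNING: sfdisk writes to the device given to the SECOND command;
# swapping the two devices here would destroy the good disk.
sfdisk -d /dev/sda | sfdisk /dev/sdb
```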

Creating /dev/sdb1

[root@localhost ~]# fdisk /dev/sdb
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0x0addb5e3.

Command (m for help): n
Partition type:
 p primary (0 primary, 0 extended, 4 free)
 e extended
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-41943039, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-41943039, default 41943039): 1955839
Partition 1 of type Linux and of size 954 MiB is set

Command (m for help): t
Selected partition 1
Hex code (type L to list all codes):
Hex code (type L to list all codes): fd
Changed type of partition 'Linux' to 'Linux raid autodetect'

Command (m for help): p

Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0addb5e3

Device Boot Start End Blocks Id System
/dev/sdb1 2048 1955839 976896 fd Linux raid autodetect

Creating /dev/sdb2

Command (m for help): n
Partition type:
 p primary (1 primary, 0 extended, 3 free)
 e extended
Select (default p):
Using default response p
Partition number (2-4, default 2):
First sector (1955840-41943039, default 1955840):
Using default value 1955840
Last sector, +sectors or +size{K,M,G} (1955840-41943039, default 41943039):
Using default value 41943039
Partition 2 of type Linux and of size 19.1 GiB is set

Command (m for help): t
Partition number (1,2, default 2):
Hex code (type L to list all codes): fd
Changed type of partition 'Linux' to 'Linux raid autodetect'

Command (m for help): p

Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0addb5e3

Device Boot Start End Blocks Id System
/dev/sdb1 2048 1955839 976896 fd Linux raid autodetect
/dev/sdb2 1955840 41943039 19993600 fd Linux raid autodetect

Write the partition table to disk and exit fdisk:

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Run partprobe so the kernel re-reads the new partition table:

[root@localhost ~]# partprobe /dev/sdb

You will now need to add the new partitions back into the RAID configuration. Make sure each newly created partition matches the size of the corresponding partition on the primary disk. From the /proc/mdstat output above, /dev/sdb1 belongs to /dev/md127 and /dev/sdb2 belongs to /dev/md126:

[root@localhost ~]# mdadm --manage /dev/md127 -a /dev/sdb1
mdadm: added /dev/sdb1


[root@localhost ~]# mdadm --manage /dev/md126 -a /dev/sdb2
mdadm: added /dev/sdb2

Once you have added these partitions, the RAID rebuild process starts automatically:

[root@localhost ~]# mdadm --detail /dev/md126
/dev/md126:
 Version : 1.2
 Creation Time : Sun Jan 15 06:08:06 2017
 Raid Level : raid1
 Array Size : 19977216 (19.05 GiB 20.46 GB)
 Used Dev Size : 19977216 (19.05 GiB 20.46 GB)
 Raid Devices : 2
 Total Devices : 2
 Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Sun Jan 15 09:38:43 2017
 State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
 Spare Devices : 1

Rebuild Status : 21% complete

Name : localhost:pv00
 UUID : 0391d003:572d7ddc:6dcf78b3:7f2c737b
 Events : 408

Number Major Minor RaidDevice State
 0 8 2 0 active sync /dev/sda2
 2 8 18 1 spare rebuilding /dev/sdb2
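
You can also follow the rebuild progress live from /proc/mdstat rather than re-running mdadm --detail; a simple way, assuming the watch utility is installed (it is by default on CentOS 7):

```shell
# Re-display /proc/mdstat every two seconds so the rebuild percentage
# and estimated finish time update on screen; press Ctrl+C to exit.
watch -n 2 cat /proc/mdstat
```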

Just wait for the RAID rebuild process to finish; once the array is rebuilt, you are back on track.
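One additional step worth considering, not shown in the transcripts above: if these arrays include the /boot partition on a BIOS/MBR system (as is typical for this layout on CentOS 7), the bootloader lives in the MBR of each disk, not inside the RAID, so the replacement disk has no bootloader yet. Installing GRUB2 on it ensures the server can still boot if /dev/sda fails later:

```shell
# Install GRUB2 into the MBR of the replacement disk so the system
# remains bootable even if the remaining original disk fails.
# (BIOS/MBR systems; grub2-install is the CentOS 7 command name.)
grub2-install /dev/sdb
```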

 
