Tuesday, March 3, 2020

zfs and megacli

Recently I had to replace a disk in my server that had been sending many SMART errors to my email, a sign that the disk was failing. It was maybe not as critical as it seemed, because the disk is part of a mirror, but I still didn't like seeing it happen.

The error I got in my email is shown below:


This message was generated by the smartd daemon running on:

   host name:  hq-hv-06
   DNS domain: mydomain.com.my

The following warning/error was logged by the smartd daemon:

Device: /dev/bus/1 [megaraid_disk_03] [SAT], 1 Offline uncorrectable sectors

Device info:
WDC WD2002FAEX-007BA0, S/N:WD-WCAY00005116, WWN:5-0014ee-206439d3b, FW:05.01D05, 2.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Sun Jan 13 02:43:43 2019 +08
Another message will be sent in 24 hours if the problem persists.
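The email suggests smartctl for further investigation. For a disk behind a MegaRAID controller, smartctl needs the `-d megaraid,N` device type. The sketch below prints the raw value of SMART attribute 198 (Offline_Uncorrectable), which is the counter behind this warning. It assumes that N is the 3 from `megaraid_disk_03` and that `/dev/bus/1` is the same path smartd used; the function names are my own. Depending on your smartctl version, a SATA disk may need `-d sat+megaraid,N` instead.

```shell
#!/usr/bin/env bash
# Sketch only: device number and bus path below are assumptions taken from
# the smartd email ("/dev/bus/1 [megaraid_disk_03]").

# Read `smartctl -A` output on stdin and print the raw value of attribute 198.
offline_uncorrectable() {
    # Attribute rows have 10 columns:
    # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    # so RAW_VALUE is the 10th whitespace-separated field.
    awk '$1 == 198 { print $10 }'
}

# check_disk [DEVICE_NUMBER] [BUS_PATH]  (hypothetical helper)
check_disk() {
    local dev_num="${1:-3}" bus="${2:-/dev/bus/1}"
    smartctl -A -d "megaraid,${dev_num}" "$bus" | offline_uncorrectable
}
```

Running `check_disk 3 /dev/bus/1` prints the current raw count, so you can watch whether it keeps going up before deciding to swap the disk.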

To replace the disk, since I use a MegaRAID card, I had to use the megacli utility to remove the old disk and add the new one.

These commands should be run before removing the disk ([252:3] is enclosure 252, slot 3, and -a0 is adapter 0; adjust them for your disk):

  1. Offline the disk: megacli -PDOffline -Physdrv[252:3] -a0
  2. Mark the disk as missing: megacli -PDMarkMissing -Physdrv[252:3] -a0
  3. Prepare for removal: megacli -PDPrpRmv -Physdrv[252:3] -a0

Then you can remove the disk from the slot and insert the new disk. While the disk is out, zpool status shows the pool in a DEGRADED state.
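The three pre-removal steps can be wrapped in a small function so the enclosure:slot is typed only once. This is a sketch: `prepare_removal`, `run` and the `DRY_RUN` variable are my own names, not megacli features. With `DRY_RUN=1` it only prints the commands. Quoting `-PhysDrv[...]` also stops the shell from treating the brackets as a glob pattern.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical wrapper around the three megacli pre-removal steps.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# prepare_removal ENCLOSURE:SLOT [ADAPTER]
prepare_removal() {
    local drive="$1" adapter="${2:-0}"
    # Stop at the first failing step (assumes megacli exits non-zero on error).
    run megacli -PDOffline "-PhysDrv[$drive]" "-a$adapter" &&
    run megacli -PDMarkMissing "-PhysDrv[$drive]" "-a$adapter" &&
    run megacli -PDPrpRmv "-PhysDrv[$drive]" "-a$adapter"
}
```

For example, `DRY_RUN=1 prepare_removal 252:3` previews the three commands, and `prepare_removal 252:3` runs them.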


Let the server detect the new disk, then check the disk's info by running:

megacli -PDInfo -PhysDrv[252:3] -a0

Then run this command to create a virtual disk (a single-disk RAID 0) on it:

megacli -CfgLdAdd -r0[252:3] -a0
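Before creating the virtual disk, it can help to confirm that the controller sees the new disk as unconfigured. The sketch below reads the `Firmware state:` line from `-PDInfo` and only runs `-CfgLdAdd` when that state starts with `Unconfigured(good)`. The function names are hypothetical, and I am assuming the PDInfo output format of the MegaCli versions I have seen; a disk in any other state is left alone.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers; assumes PDInfo prints a line like
# "Firmware state: Unconfigured(good), Spun Up".

# Read `megacli -PDInfo` output on stdin and print the firmware state.
firmware_state() {
    sed -n 's/^[[:space:]]*Firmware state:[[:space:]]*//p'
}

# Succeed only if the state string starts with "Unconfigured(good)".
is_unconfigured_good() {
    case "$1" in
        "Unconfigured(good)"*) return 0 ;;
        *) return 1 ;;
    esac
}

# create_raid0 ENCLOSURE:SLOT [ADAPTER]
create_raid0() {
    local drive="$1" adapter="${2:-0}" state
    state=$(megacli -PDInfo "-PhysDrv[$drive]" "-a$adapter" | firmware_state)
    if is_unconfigured_good "$state"; then
        megacli -CfgLdAdd "-r0[$drive]" "-a$adapter"
    else
        echo "drive $drive is in state '$state'; not creating a virtual disk" >&2
        return 1
    fi
}
```

Usage would be `create_raid0 252:3`, matching the enclosure and slot used above.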



You can also check with the megaclisas-status command:

megaclisas-status

The output is as below: 

However, if you get this error when creating the virtual disk:

The current operation is not allowed because the controller has data in cache for offline or missing virtual drives

run this command to find which virtual drive the preserved cache belongs to:


megacli -GetPreservedCacheList -a0

The output is:

Adapter #0
Virtual Drive(Target ID 03): Missing.
Exit Code: 0x00
Then run this command to discard the cache:

megacli -DiscardPreservedCache -L"03" -a0

The output:

Adapter #0
Virtual Drive(Target ID 03): Preserved Cache Data Cleared.
Exit Code: 0x00
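The cache check and discard can be combined. The sketch below parses the `Target ID` values of *missing* virtual drives from `-GetPreservedCacheList` (the same output format shown above) and discards the preserved cache for each one. The function names are hypothetical. Be careful: discarding preserved cache throws away data the controller never wrote to disk, so only do it for a virtual drive you know is gone, like the missing one here.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers around the two megacli commands above.

# Read `megacli -GetPreservedCacheList` output on stdin and print the target
# IDs of missing virtual drives, e.g. "Virtual Drive(Target ID 03): Missing."
preserved_cache_targets() {
    sed -n 's/^Virtual Drive(Target ID \([0-9]*\)): Missing.*/\1/p'
}

# discard_preserved_cache [ADAPTER]
discard_preserved_cache() {
    local adapter="${1:-0}" id
    megacli -GetPreservedCacheList "-a$adapter" | preserved_cache_targets |
    while read -r id; do
        # Same as: megacli -DiscardPreservedCache -L"03" -a0
        megacli -DiscardPreservedCache "-L$id" "-a$adapter"
    done
}
```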
Then you can run the virtual disk creation command above again.
Then run megaclisas-status again to check whether the virtual disk was created.
If it was created successfully, run this command to replace the disk in ZFS:
zpool replace rpool /dev/sdd
(assuming the new disk shows up as /dev/sdd). If the replace succeeds, ZFS starts the resilvering process.
 

Check the status: zpool status rpool


When the resilvering process finishes, run zpool status again to confirm the pool is no longer DEGRADED.
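Instead of re-running zpool status by hand, the wait can be scripted. This sketch polls `zpool status` and keeps looping while the output contains `resilver in progress`; the function names and the 60-second default interval are my own choices. Newer OpenZFS releases (2.0 and later) also have `zpool wait -t resilver rpool`, which does the same without a loop.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for waiting on a ZFS resilver.

# Succeed while `zpool status` output on stdin reports a running resilver.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# wait_for_resilver POOL [INTERVAL_SECONDS]
wait_for_resilver() {
    local pool="$1" interval="${2:-60}"
    while zpool status "$pool" | resilver_in_progress; do
        sleep "$interval"
    done
    # Show the final status so you can confirm the pool state.
    zpool status "$pool"
}
```

For this server that would be `wait_for_resilver rpool`.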



That's all there is to it.
