NDS-4600 - SATA Drive Failures In Linux

One issue I've recently run into with a failed SATA drive in one of my NDS-4600 units is that Linux frequently tries to recover the drive by resetting the bus, which takes a few other disks in the same group down with it. The resulting I/O timeouts cause problems for the Ceph OSDs on those disks.

Note that only some types of disk failure cause this. I think the Linux kernel performs host bus resets only in some cases, and I suspect the errors on the other disks are caused by the failed disk itself.
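To check whether a reset really did take other disks down with it, one option is to line up the kernel log by timestamp. The Python sketch below is hypothetical and assumes dmesg-style lines with the default `[seconds]` prefix. The `I/O error, dev sdX, sector` pattern is what the block layer typically prints, but the reset pattern is a loose assumption, since the exact reset messages depend on the HBA driver. The function name `disks_hit_by_resets` and the 30-second window are illustrative choices only.

```python
import re

# Matches the "[  1234.567890] message" prefix that dmesg prints by default.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")
# Block-layer error lines contain "I/O error, dev sdX, sector N".
IO_ERR_RE = re.compile(r"I/O error, dev (\w+), sector")
# ASSUMPTION: any other line mentioning "reset" is treated as a reset event;
# tighten this for the messages your HBA driver actually logs.
RESET_RE = re.compile(r"reset", re.IGNORECASE)


def disks_hit_by_resets(lines, window=30.0):
    """Map each reset timestamp to the disks logging I/O errors within `window` seconds after it."""
    resets = []  # timestamps of reset events
    errors = []  # (timestamp, device) pairs
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a dmesg timestamp
        ts, msg = float(m.group(1)), m.group(2)
        err = IO_ERR_RE.search(msg)
        if err:
            errors.append((ts, err.group(1)))
        elif RESET_RE.search(msg):
            resets.append(ts)
    return {
        r: sorted({dev for ts, dev in errors if r <= ts <= r + window})
        for r in resets
    }
```

For example, save the output of `dmesg` to a file and pass the open file to `disks_hit_by_resets`. The failed disk itself will usually show up in the results too; the interesting part is which other devices appear alongside it.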

"error from slirp4netns while setting up port redirection: map[desc:bad request: add_hostfwd: slirp_add_hostfwd failed]"

I was getting this from podman on a CentOS 8 box: 

"error from slirp4netns while setting up port redirection: map[desc:bad request: add_hostfwd: slirp_add_hostfwd failed]"

I fixed it by killing off all podman and /usr/bin/conmon processes belonging to the user I had been running the commands as. Note: don't do this as root with killall unless you limit it to your own user; otherwise you will kill every user's podman processes.
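The same per-user restriction can be sketched in Python without relying on killall. This is a minimal sketch assuming Linux's /proc layout: it matches on the process name in /proc/&lt;pid&gt;/comm and only selects processes whose real UID matches yours. The helper names are hypothetical, not part of podman, and it defaults to a dry run that only returns the matching PIDs.

```python
import os
import signal

TARGETS = {"podman", "conmon"}


def iter_procs(proc_root="/proc"):
    """Yield (pid, real_uid, comm) for each readable process under /proc (Linux only)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
            uid = None
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("Uid:"):
                        uid = int(line.split()[1])  # first field is the real UID
                        break
        except OSError:
            continue  # process exited or is unreadable
        if uid is not None:
            yield int(entry), uid, comm


def select_targets(procs, uid, names=TARGETS):
    """Return sorted PIDs of processes owned by `uid` whose name is in `names`."""
    return sorted(pid for pid, puid, comm in procs if puid == uid and comm in names)


def kill_user_podman(dry_run=True):
    """Send SIGTERM to the current user's podman/conmon processes unless dry_run."""
    pids = select_targets(iter_procs(), os.getuid())
    if not dry_run:
        for pid in pids:
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # already gone
    return pids
```

Running `kill_user_podman()` first shows what would be killed; `kill_user_podman(dry_run=False)` actually sends the signals. Filtering on UID is the point here: even when run as root, it only touches processes owned by the calling user.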

The underlying cause may have been running out of file descriptors (FDs).
