NDS-4600 - SATA Drive Failures In Linux

Submitted by gpmidi on Tue, 12/01/2020 - 17:01

One issue I've recently run into with a failed SATA drive in one of my NDS-4600 units is that Linux frequently tries to recover the drive by resetting the bus. This takes out a few other disks in the group with it. The resulting IO timeouts cause problems for my Ceph OSDs using those disks. 

It should be noted that only some types of disk failures cause this. The Linux kernel only issues the host bus resets in some cases (I think), and I suspect the failed disk itself is what triggers the errors on the other disks.
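If you want to confirm that it's the bus resets taking out the neighboring disks, a quick and admittedly crude check is to grep the kernel log for reset and timeout messages around the time the OSDs started complaining (adjust the patterns to taste):

$ dmesg -T | grep -iE 'reset|timed out|i/o error'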

Sterling, VA Wires-X Node & Repeater Is Up: 449.375- 110.9 Hz

Submitted by gpmidi on Thu, 10/08/2020 - 01:02

The Sterling, VA, USA repeater and WIRES-X node is now up and operational. 

It is a full-duplex WIRES-X node, C4FM repeater, and FM repeater in grid square FM19ha, run by KG4TIH.

This Yaesu DR-2X repeater on 70cm is set up for both analog FM and C4FM. The node is set to auto-join the VA-Sterling (28558) room after 10 minutes of inactivity. While connected to this room or any other, all C4FM traffic heard by the repeater is forwarded to the WIRES-X room, and all signals received from the room are retransmitted over C4FM. The WIRES-X features on some Yaesu radios let the user control which room the node is connected to.

Scrubbing Vs Deep Scrubbing

Submitted by gpmidi on Wed, 06/03/2020 - 13:36

Ceph has two forms of scrubbing that it runs periodically: Scrub and Deep Scrub.

A Scrub is basically an fsck for replicated objects. It checks that every replica of each object exists and is at the latest version.

A Deep Scrub is a full checksum validation of all data. 
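Both run on Ceph's own schedule, but you can also kick them off by hand. The pg-level commands look something like this; 2.1f is just an example placement group ID, so substitute one from your own cluster:

$ ceph pg scrub 2.1f
$ ceph pg deep-scrub 2.1f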


"error from slirp4netns while setting up port redirection: map[desc:bad request: add_hostfwd: slirp_add_hostfwd failed]"

Submitted by gpmidi on Sun, 05/24/2020 - 13:30

I was getting this from podman on a CentOS 8 box: 

"error from slirp4netns while setting up port redirection: map[desc:bad request: add_hostfwd: slirp_add_hostfwd failed]"

It was fixed by killing off all podman and /usr/bin/conmon processes owned by the user I was running the commands as. Note: don't do that as root with killall unless you limit it to only your own user.
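Roughly, what I mean is something like the following; the -u flag keeps the kill scoped to your own user (assuming $USER is the account that was running podman):

$ pkill -u "$USER" podman
$ pkill -u "$USER" -f /usr/bin/conmon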

The underlying error may have been running out of file descriptors.

Ceph With Many OSDs

Submitted by gpmidi on Sun, 05/24/2020 - 13:26

While setting up my Ceph cluster on a set of Dell R710s, one of them with 60 disks attached, I found that I needed to raise fs.aio-max-nr to around 1,000,000. SELinux also needed to be disabled. Once that was done, the normal cephadm OSD install worked great, even with 60 disks.

$ cat /etc/sysctl.d/99-osd.conf

# For OSDs
fs.aio-max-nr=1000000
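To pick up the sysctl change without a reboot, and to get SELinux out of the way for the current boot, something like this should do it (a full SELinux disable means setting SELINUX=disabled in /etc/selinux/config and rebooting):

$ sudo sysctl --system      # reloads /etc/sysctl.d/*.conf, including 99-osd.conf
$ sudo setenforce 0         # permissive mode until the next reboot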
