Ceph For Media Storage (Big But Slow I/O)

Submitted by gpmidi on Wed, 06/03/2020 - 12:50

My Ceph cluster at home isn't designed for performance. It's not designed for maximum availability. It's designed for a low cost per TiB while still maintaining usability and decent disk-level redundancy. Here is some recent tuning to help with performance and corruption prevention.

  1. mds
    1. mds_cache_memory_limit: 4GiB => 8GiB
      1. Target maximum memory usage of MDS cache
      2. The default wasn't enough for the CephFS instance I have. Probably related to the number of objects or the frequency of FS metadata scans. 
  2. osd
    1. osd_deep_scrub_interval: 1w => 8w
      1. Deep scrub each PG (i.e., verify data checksums) at least this often
      2. Deep scrubs on 200TiB of data (about to grow by another 150TiB) aren't feasible once a week
    2. osd_op_queue: wpq => mclock_client
      1. Which operation priority queue algorithm to use
      2. Changed from wpq to mclock_client to improve per-OSD disk queuing by client and workload type
    3. osd_max_backfills: 1 => 4
      1. Maximum number of concurrent local and remote backfills or recoveries per OSD
      2. While most spinning-disk setups should keep this at one, raising it should help overcome network/CPU latency. I think. 
    4. osd_recovery_max_active: 0 => 4
      1. Number of simultaneous active recovery operations per OSD (overrides _ssd and _hdd if non-zero)
      2. Helps ensure the recovery operations go as fast as possible. This may impact client read performance during recovery, but I'd prefer that over a higher chance of corruption, data loss, or other problems. 
    5. osd_recovery_max_single_start: 1 => 4
      1. The maximum number of recovery operations per OSD that will be newly started when an OSD is recovering
      2. See 2.3.2
    6. osd_scrub_auto_repair: false => true
      1. Automatically repair damaged objects detected during scrub
      2. If something corrupt or out of sync is found, let's get that fixed asap. 
    7. osd_scrub_during_recovery: false => true
      1. Allow scrubbing when PGs on the OSD are undergoing recovery
      2. See 2.3.2
    8. osd_scrub_load_threshold: 0.5 => 4
      1. Allow scrubbing when system load divided by number of CPUs is below this value
      2. The load on my OSD servers is usually above 1 during normal operations. It only seems to go above 4 during heavy recovery and other heavy ops. So this seemed like a good middle ground. 
    9. osd_scrub_max_interval: 1w => 4w
      1. Scrub each PG no less often than this interval
      2. With 15 million objects / 220 million replicas as of 2020-06-03, every day seems like overkill. 
    10. osd_scrub_min_interval: 1d => 1w
      1. Scrub each PG no more often than this interval
      2. See 2.9.2
  3. global
    1. target_max_misplaced_ratio: 5% => 1%
      1. Max ratio of misplaced objects to target when throttling data rebalancing activity
      2. Given the large number of objects in my cluster, I figured this should be low so rebalancing is a higher priority. 
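
Taken together, the changes above can be applied as a batch of `ceph config set` commands. This is only a sketch: sizes and intervals are written out as raw bytes and seconds (8 GiB, 8w, 4w, 1w, and 1% as a ratio), which should match what the docs describe for these options.

```shell
# Sketch of applying the tunings above via the centralized config store.
# 8 GiB = 8589934592 B; 8w = 4838400 s; 4w = 2419200 s; 1w = 604800 s.
ceph config set mds mds_cache_memory_limit 8589934592
ceph config set osd osd_deep_scrub_interval 4838400
# Note: changing the op queue scheduler takes effect after an OSD restart
ceph config set osd osd_op_queue mclock_client
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 4
ceph config set osd osd_recovery_max_single_start 4
ceph config set osd osd_scrub_auto_repair true
ceph config set osd osd_scrub_during_recovery true
ceph config set osd osd_scrub_load_threshold 4
ceph config set osd osd_scrub_max_interval 2419200
ceph config set osd osd_scrub_min_interval 604800
# 1% expressed as a ratio
ceph config set global target_max_misplaced_ratio 0.01
```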

These can be fetched with `ceph config get <level 0> <level 1>` and set with `ceph config set <level 0> <level 1> <value>`, where "level 0" is the top-level item in the list above (mds, osd, or global) and "level 1" is the option name at the second level of indentation. 
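
For example, checking the deep-scrub interval before and after changing it (a sketch; `osd` is the level 0 and `osd_deep_scrub_interval` the level 1, and `osd.0` stands in for whichever daemon you want to inspect):

```shell
# Read the current value (in seconds) from the monitors' config database
ceph config get osd osd_deep_scrub_interval
# Set it to 8 weeks, expressed in seconds
ceph config set osd osd_deep_scrub_interval 4838400
# Verify what a running daemon actually sees
ceph config show osd.0 osd_deep_scrub_interval
```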

Details for the OSD configuration can be found at https://docs.ceph.com/docs/octopus/rados/configuration/osd-config-ref/
