author    John Wilkins <john.wilkins@inktank.com>    2013-02-25 15:27:09 -0800
committer John Wilkins <john.wilkins@inktank.com>    2013-02-25 15:27:09 -0800
commit    3ef6d5bde15923315599f335eee718e3fca1e7f7 (patch)
tree      84b5553eed62d572220c1a9d5781763103a6e90d
parent    2b7deeea2e9b5794d6e7df5efe75c6b6a9f7e1e7 (diff)
doc: Updated osd configuration reference.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
-rw-r--r--  doc/rados/configuration/osd-config-ref.rst  |  625
1 file changed, 457 insertions(+), 168 deletions(-)
diff --git a/doc/rados/configuration/osd-config-ref.rst b/doc/rados/configuration/osd-config-ref.rst
index 97759652202..81df56a1489 100644
--- a/doc/rados/configuration/osd-config-ref.rst
+++ b/doc/rados/configuration/osd-config-ref.rst
@@ -2,35 +2,65 @@
 OSD Config Reference
 ======================
 
+You can configure OSDs in the Ceph configuration file, but OSDs can run with
+default values and a very minimal configuration. A minimal OSD configuration
+sets ``osd journal size`` and ``osd host``, and uses default values for
+nearly everything else.
+
+
+General Settings
+================
+
+The following settings provide an OSD's ID, and determine paths to data and
+journals. Ceph deployment scripts typically generate the UUID automatically. We
+**DO NOT** recommend changing the default paths for data or journals, as doing
+so makes it more problematic to troubleshoot Ceph later.
+
+The journal size should be at least twice the product of the expected drive
+speed and ``filestore max sync interval``. However, the most common
+practice is to partition the journal drive (often an SSD), and mount it such
+that Ceph uses the entire partition for the journal.
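+
+For illustration, a minimal OSD configuration might look like the sketch
+below. The host name and drive speed are assumptions, not recommendations:
+assuming a drive that sustains roughly 100 MB/s and a ``filestore max sync
+interval`` of 5 seconds, the journal should be at least
+2 x 100 MB/s x 5 s = 1000 MB.
+
+.. code-block:: ini
+
+    [osd]
+        # Journal sizing (illustrative): 2 x 100 MB/s x 5 s = 1000 MB
+        osd journal size = 1000
+
+    [osd.0]
+        # Hypothetical host name for this OSD daemon
+        host = osd-node-1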
 
 
 ``osd uuid``
 
 :Description: The universally unique identifier (UUID) for the OSD.
 :Type: UUID
-:Default: None
+:Default: The UUID.
 
 
 ``osd data``
 
-:Description: The path to the OSDs data. You must create the directory. You should mount a data disk at this mount point. We do not recommend changing the default.
+:Description: The path to the OSD's data. You must create the directory when
+              deploying Ceph. You should mount a drive for OSD data at this
+              mount point. We do not recommend changing the default.
+
 :Type: String
 :Default: ``/var/lib/ceph/osd/$cluster-$id``
 
 
 ``osd journal``
 
-:Description: The path to the OSD's journal. This may be a path to a file or a block device (such as a partition of an SSD). If it is a file, you must create the directory to contain it.
+:Description: The path to the OSD's journal. This may be a path to a file or a
+              block device (such as a partition of an SSD). If it is a file,
+              you must create the directory to contain it. We recommend using a
+              drive separate from the ``osd data`` drive.
+
 :Type: String
 :Default: ``/var/lib/ceph/osd/$cluster-$id/journal``
 
 
 ``osd journal size``
 
-:Description: The size of the journal in megabytes. If this is 0, and the journal is a block device, the entire block device is used. Since v0.54, this is ignored if the journal is a block device, and the entire block device is used.
+:Description: The size of the journal in megabytes. If this is 0, and the
+              journal is a block device, the entire block device is used.
+              Since v0.54, this is ignored if the journal is a block device,
+              and the entire block device is used.
+
 :Type: 32-bit Integer
-:Default: ``1024``
-:Recommended: Begin with 1GB. Should be at least twice the product of the expected speed multiplied by ``filestore max sync interval``.
+:Default: ``5120``
+:Recommended: Begin with 1GB. Should be at least twice the product of the
+              expected speed and ``filestore max sync interval``.
 
 
 ``osd max write size``
@@ -47,179 +77,254 @@
 :Default: 500MB default. ``500*1024L*1024L``
 
 
-``osd stat refresh interval``
+``osd class dir``
 
-:Description: The status refresh interval in seconds.
-:Type: 64-bit Integer Unsigned
-:Default: ``.5``
+:Description: The class path for RADOS class plug-ins.
+:Type: String
+:Default: ``$libdir/rados-classes``
 
 
-``osd pg bits``
+Monitor OSD Interaction
+=======================
 
-:Description: Placement group bits per OSD.
+OSDs check each other's heartbeats and report to monitors periodically. Ceph can
+use default values in many cases. However, if your network has latency issues,
+you may need to adopt longer intervals. See `Configuring Monitor/OSD
+Interaction`_ for a detailed discussion of heartbeats.
+
+
+``osd heartbeat address``
+
+:Description: An OSD's network address for heartbeats.
+:Type: Address
+:Default: The host address.
+
+
+``osd heartbeat interval``
+
+:Description: How often an OSD pings its peers (in seconds).
 :Type: 32-bit Integer
-:Default: ``6``
+:Default: ``6``
 
 
-``osd pgp bits``
+``osd heartbeat grace``
 
-:Description: The number of bits per OSD for PGPs.
+:Description: The elapsed time without a heartbeat after which the cluster
+              considers the OSD ``down``.
+
 :Type: 32-bit Integer
-:Default: ``4``
+:Default: ``20``
 
 
-``osd pg layout``
+``osd mon heartbeat interval``
 
-:Description: Placement group layout.
+:Description: How often the OSD pings a monitor if it has no OSD peers.
 :Type: 32-bit Integer
-:Default: ``2``
+:Default: ``30``
 
 
-``osd pool default crush rule``
+``osd mon report interval max``
+
+:Description: The maximum time in seconds for an OSD to report to a monitor
+              before the monitor considers the OSD ``down``.
 
-:Description: The default CRUSH rule to use when creating a pool.
 :Type: 32-bit Integer
-:Default: ``0``
+:Default: ``120``
 
 
-``osd pool default size``
+``osd mon report interval min``
+
+:Description: The minimum number of seconds for an OSD to report to a monitor
+              to avoid the monitor considering the OSD ``down``.
 
-:Description: The default size of an OSD pool in gigabytes. The default value is the same as ``--size 2`` with ``mkpool``.
 :Type: 32-bit Integer
-:Default: ``2``
+:Default: ``5``
+:Valid Range: Should be less than ``osd mon report interval max``
 
 
-``osd pool default pg num``
+``osd mon ack timeout``
+
+:Description: The number of seconds to wait for a monitor to acknowledge a
+              request for statistics.
 
-:Description: The default number of placement groups for a pool. The default value is the same as ``pg_num`` with ``mkpool``.
 :Type: 32-bit Integer
-:Default: ``8``
+:Default: ``30``
 
 
-``osd pool default pgp num``
+``osd min down reporters``
 
-:Description: The default number of placement groups for placement for a pool. The default value is the same as ``pgp_num`` with ``mkpool``. PG and PGP should be equal (for now).
+:Description: The minimum number of OSDs required to report a ``down`` OSD.
 :Type: 32-bit Integer
-:Default: ``8``
+:Default: ``1``
 
 
-``osd map dedup``
+``osd min down reports``
 
-:Description: Enable removing duplicates in the OSD map.
-:Type: Boolean
-:Default: ``true``
+:Description: The minimum number of times an OSD must report that another
+              is ``down``.
+:Type: 32-bit Integer
+:Default: ``3``
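+
+For example, on a network with known latency issues you might lengthen the
+heartbeat settings. The values below are hypothetical, not recommendations:
+
+.. code-block:: ini
+
+    [osd]
+        # Illustrative values for a high-latency network; tune for your site.
+        osd heartbeat interval = 12
+        osd heartbeat grace = 40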
 
 
-``osd map cache size``
+Data Placement
+==============
 
-:Description: The size of the OSD map cache in megabytes.
+When you create pools and set the number of placement groups for the pool, Ceph
+uses default values when you don't specifically override the defaults. **We
+recommend** overriding some of the defaults. Specifically, we recommend setting
+a pool's replica size and overriding the default number of placement groups. You
+can specifically set these values when running `pool`_ commands. You can also
+override the defaults by adding new ones in the ``[osd]`` section of your Ceph
+configuration file.
+
+
+.. literalinclude:: pool-pg.conf
+   :language: ini
+
+
+``osd pg bits``
+
+:Description: Placement group bits per OSD.
 :Type: 32-bit Integer
-:Default: ``500``
+:Default: ``6``
 
 
-``osd map cache bl size``
+``osd pgp bits``
 
-:Description: The size of the in-memory OSD map cache in OSD daemons.
+:Description: The number of bits per OSD for PGPs.
 :Type: 32-bit Integer
-:Default: ``50``
+:Default: ``6``
 
 
-``osd map cache bl inc size``
+``osd crush chooseleaf type``
+
+:Description: The bucket type to use for ``chooseleaf`` in a CRUSH rule. Uses
+              ordinal rank rather than name.
 
-:Description: The size of the in-memory OSD map cache incrementals in OSD daemons.
 :Type: 32-bit Integer
-:Default: ``100``
+:Default: ``1``. Typically a host containing one or more OSDs.
 
 
-``osd map message max``
+``osd min rep``
 
-:Description: The maximum map entries allowed per MOSDMap message.
+:Description: The minimum number of replicas for a ruleset.
 :Type: 32-bit Integer
-:Default: ``100``
+:Default: ``1``
 
 
-``osd op threads``
+``osd max rep``
 
-:Description: The number of OSD operation threads. Set to ``0`` to disable it. Increasing the number may increase the request processing rate.
+:Description: The maximum number of replicas for a ruleset.
 :Type: 32-bit Integer
-:Default: ``2``
+:Default: ``10``
 
 
-``osd op thread timeout``
+``osd pool default crush rule``
 
-:Description: The OSD operation thread timeout in seconds.
+:Description: The default CRUSH ruleset to use when creating a pool.
 :Type: 32-bit Integer
-:Default: ``30``
+:Default: ``0``
 
 
-``osd disk threads``
+``osd pool default size``
+
+:Description: Sets the number of replicas for objects in the pool. The default
+              value is the same as
+              ``ceph osd pool set {pool-name} size {size}``.
 
-:Description: The number of disk threads, which are used to perform background disk intensive OSD operations such as scrubbing and snap trimming.
 :Type: 32-bit Integer
-:Default: ``1``
+:Default: ``2``
 
 
-``osd recovery threads``
+``osd pool default min size``
+
+:Description: Sets the minimum number of written replicas for objects in the
+              pool in order to acknowledge a write operation to the client.
+              If the minimum is not met, Ceph will not acknowledge the write to
+              the client. This setting ensures a minimum number of replicas when
+              operating in ``degraded`` mode.
 
-:Description: The number of threads for recovering data.
 :Type: 32-bit Integer
-:Default: ``1``
+:Default: ``0``, which means no particular minimum. If ``0``, the
+          minimum is ``size - (size / 2)`` (e.g., ``1`` when ``size`` is ``2``).
 
 
-``osd backfill scan min``
+``osd pool default pg num``
+
+:Description: The default number of placement groups for a pool. The default
+              value is the same as ``pg_num`` with ``mkpool``.
 
-:Description: The scan interval in seconds for backfill operations.
 :Type: 32-bit Integer
-:Default: ``64``
+:Default: ``8``
 
 
-``osd backfill scan max``
+``osd pool default pgp num``
+
+:Description: The default number of placement groups for placement for a pool.
+              The default value is the same as ``pgp_num`` with ``mkpool``.
+              PG and PGP should be equal (for now).
 
-:Description: The maximum scan interval in seconds for backfill operations.
 :Type: 32-bit Integer
-:Default: ``512``
+:Default: ``8``
 
 
-``osd backlog thread timeout``
+``osd pool default flags``
 
-:Description: The maximum time in seconds before timing out a backlog thread.
+:Description: The default flags for new pools.
 :Type: 32-bit Integer
-:Default: ``60*60*1``
+:Default: ``0``
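+
+As an illustration (the values are hypothetical, not recommendations),
+overriding the pool defaults in the ``[osd]`` section might look like this:
+
+.. code-block:: ini
+
+    [osd]
+        # Hypothetical overrides; size pg counts for your own cluster.
+        osd pool default size = 3
+        osd pool default min size = 2
+        osd pool default pg num = 100
+        osd pool default pgp num = 100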
 
 
-``osd max backfills``
+``osd max pgls``
 
-:Description: The maximum number of backfills allowed to or from a single OSD.
-:Type: 64-bit Unsigned Integer
-:Default: 10
+:Description: The maximum number of placement groups to list. A client
+              requesting a large number can tie up the OSD.
+:Type: Unsigned 64-bit Integer
+:Default: ``1024``
+:Note: Default should be fine.
 
 
-``osd backfill full ratio``
 
-:Description: Refuse to accept backfill requests when the OSD's full ratio is above this value.
-:Type: Float
-:Default: ``0.85``
+``osd min pg log entries``
 
+:Description: The minimum number of placement group logs to maintain
+              when trimming log files.
 
-``osd backfill retry interval``
+:Type: 32-bit Int Unsigned
+:Default: ``1000``
 
-:Description: The number of seconds to wait before retrying backfill requests.
-:Type: Double
-:Default: ``10.0``
 
-``osd recovery thread timeout``
+``osd default data pool replay window``
 
-:Description: The maximum time in seconds before timing out a recovery thread.
+:Description: The time (in seconds) for an OSD to wait for a client to replay
+              a request.
+
 :Type: 32-bit Integer
-:Default: ``30``
+:Default: ``45``
 
 
-``osd snap trim thread timeout``
+Scrubbing
+=========
+
+In addition to making multiple copies of objects, Ceph ensures data integrity by
+scrubbing placement groups. Ceph scrubbing is analogous to ``fsck`` on the
+object storage layer. For each placement group, Ceph generates a catalog of all
+objects and compares each primary object and its replicas to ensure that no
+objects are missing or mismatched. Light scrubbing (daily) checks the object
+size and attributes. Deep scrubbing (weekly) reads the data and uses checksums
+to ensure data integrity.
+
+Scrubbing is important for maintaining data integrity, but it can reduce
+performance. You can adjust the following settings to increase or decrease
+scrubbing operations.
 
-:Description: The maximum time in seconds before timing out a snap trim thread.
-:Type: 32-bit Integer
-:Default: ``60*60*1``
+
+``osd max scrubs``
+
+:Description: The maximum number of scrub operations for an OSD.
+:Type: 32-bit Int
+:Default: ``1``
 
 
 ``osd scrub thread timeout``
@@ -231,99 +336,278 @@
 
 ``osd scrub finalize thread timeout``
 
-:Description: The maximum time in seconds before timing out a scrub finalize thread.
+:Description: The maximum time in seconds before timing out a scrub finalize
+              thread.
+
 :Type: 32-bit Integer
-:Default: 60*10
+:Default: ``60*10``
 
 
-``osd remove thread timeout``
+``osd scrub load threshold``
+
+:Description: The maximum CPU load. Ceph will not scrub when the CPU load is
+              higher than this number. Default is 50%.
+
+:Type: Float
+:Default: ``0.5``
+
+
+``osd scrub min interval``
+
+:Description: The minimum interval in seconds for scrubbing the OSD when the
+              cluster load is low.
+
+:Type: Float
+:Default: 5 minutes. ``300``
+
+
+``osd scrub max interval``
+
+:Description: The maximum interval in seconds for scrubbing the OSD
+              irrespective of cluster load.
+
+:Type: Float
+:Default: Once per day. ``60*60*24``
+
+
+``osd deep scrub interval``
+
+:Description: The interval for "deep" scrubbing (fully reading all data).
+:Type: Float
+:Default: Once per week. ``60*60*24*7``
+
+
+``osd deep scrub stride``
+
+:Description: Read size when doing a deep scrub.
+:Type: 32-bit Int
+:Default: 512 KB. ``524288``
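+
+For example, to bias scrubbing toward idle periods, you might lower the load
+threshold and stretch the light-scrub window (hypothetical values):
+
+.. code-block:: ini
+
+    [osd]
+        # Illustrative: scrub only below 25% CPU load, and allow up to
+        # two days (60*60*48 seconds) between light scrubs.
+        osd scrub load threshold = 0.25
+        osd scrub max interval = 172800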
 
 
+Operations
+==========
+
+Operations settings allow you to configure the number of threads for servicing
+requests. If you set ``osd op threads`` to ``0``, it disables multi-threading.
+By default, Ceph uses two threads with a 30 second timeout and a 30 second
+complaint time if an operation doesn't complete within those time parameters.
+You can set operation priority weights between client operations and
+recovery operations to ensure optimal performance during recovery.
+
+
+``osd op threads``
+
+:Description: The number of threads to service OSD operations. Set to ``0`` to
+              disable it. Increasing the number may increase the request
+              processing rate.
-:Description: The maximum time in seconds before timing out a remove OSD thread.
 :Type: 32-bit Integer
-:Default: 60*60
+:Default: ``2``
 
 
-``osd command thread timeout``
+``osd client op priority``
+
+:Description: The priority set for client operations. It is relative to
+              ``osd recovery op priority``.
 
-:Description: The maximum time in seconds before timing out a command thread.
 :Type: 32-bit Integer
-:Default: ``10*60``
+:Default: ``63``
+:Valid Range: 1-63
 
 
-``osd heartbeat address``
+``osd recovery op priority``
 
-:Description: An OSD's network address for heartbeats.
-:Type: Address
-:Default: The host address.
+:Description: The priority set for recovery operations. It is relative to
+              ``osd client op priority``.
+
+:Type: 32-bit Integer
+:Default: ``10``
+:Valid Range: 1-63
 
 
-``osd heartbeat interval``
+``osd op thread timeout``
 
-:Description: How often an OSD pings its peers (in seconds).
+:Description: The OSD operation thread timeout in seconds.
 :Type: 32-bit Integer
-:Default: ``6``
+:Default: ``30``
 
 
-``osd heartbeat grace``
+``osd op complaint time``
+
+:Description: An operation becomes complaint worthy after the specified number
+              of seconds have elapsed.
+
+:Type: Float
+:Default: ``30``
+
+
+``osd disk threads``
+
+:Description: The number of disk threads, which are used to perform background
+              disk intensive OSD operations such as scrubbing and snap
+              trimming.
 
-:Description: The elapsed time when an OSD hasn't shown a heartbeat that the cluster considers it ``down``.
 :Type: 32-bit Integer
+:Default: ``1``
+
+
+``osd op history size``
+
+:Description: The maximum number of completed operations to track.
+:Type: 32-bit Unsigned Integer
 :Default: ``20``
 
 
-``osd mon heartbeat interval``
+``osd op history duration``
 
-:Description: How often the OSD pings a monitor if it has no OSD peers.
+:Description: The oldest completed operation to track.
+:Type: 32-bit Unsigned Integer
+:Default: ``600``
+
+
+``osd op log threshold``
+
+:Description: How many operation logs to display at once.
 :Type: 32-bit Integer
-:Default: ``30``
+:Default: ``5``
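+
+As a sketch (hypothetical values), weighting client I/O above recovery
+traffic might look like this:
+
+.. code-block:: ini
+
+    [osd]
+        # Illustrative weights: keep client ops at the top priority and
+        # push recovery ops below the default of 10.
+        osd client op priority = 63
+        osd recovery op priority = 3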
 
 
-``osd mon report interval max``
 
-:Description: The maximum time in seconds for an OSD to report to a monitor before the monitor considers the OSD ``down``.
+Backfilling
+===========
+
+When you add OSDs to a cluster or remove them, the CRUSH algorithm will want to
+rebalance the cluster by moving placement groups to or from OSDs to restore the
+balance. The process of migrating placement groups and the objects they contain
+can reduce the cluster's operational performance considerably. To maintain
+operational performance, Ceph performs this migration with 'backfilling', which
+allows Ceph to set backfill operations to a lower priority than requests to read
+or write data.
+
+
+``osd max backfills``
+
+:Description: The maximum number of backfills allowed to or from a single OSD.
+:Type: 64-bit Unsigned Integer
+:Default: ``10``
+
+
+``osd backfill scan min``
+
+:Description: The scan interval in seconds for backfill operations when cluster
+              load is low.
 :Type: 32-bit Integer
-:Default: ``120``
+:Default: ``64``
 
 
-``osd mon report interval min``
+``osd backfill scan max``
+
+:Description: The maximum scan interval in seconds for backfill operations
+              irrespective of cluster load.
 
-:Description: The minimum number of seconds for an OSD to report to a monitor to avoid the monitor considering the OSD ``down``.
 :Type: 32-bit Integer
-:Default: ``5``
-:Valid Range: Should be less than ``osd mon report interval max``
+:Default: ``512``
 
 
-``osd mon ack timeout``
+``osd backfill full ratio``
+
+:Description: Refuse to accept backfill requests when the OSD's full ratio is
+              above this value.
+:Type: Float
+:Default: ``0.85``
 
-:Description: The number of seconds to wait for a monitor to acknowledge a request for statistics.
 
+``osd backfill retry interval``
+
+:Description: The number of seconds to wait before retrying backfill requests.
+:Type: Double
+:Default: ``10.0``
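+
+For example, to reduce the impact of backfilling on client I/O, an operator
+might throttle concurrent backfills (hypothetical value):
+
+.. code-block:: ini
+
+    [osd]
+        # Illustrative throttle: one backfill per OSD at a time.
+        osd max backfills = 1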
 
 
+OSD Map
+=======
+
+OSD maps reflect the OSD daemons operating in the cluster. Over time, the
+number of map epochs increases. Ceph provides some settings to ensure that
+Ceph performs well as the OSD map grows larger.
+
+
+``osd map dedup``
+
+:Description: Enable removing duplicates in the OSD map.
+:Type: Boolean
+:Default: ``true``
+
+
+``osd map cache size``
+
+:Description: The size of the OSD map cache in megabytes.
 :Type: 32-bit Integer
-:Default: ``30``
+:Default: ``500``
 
 
-``osd min down reporters``
+``osd map cache bl size``
 
-:Description: The minimum number of OSDs required to report a ``down`` OSD.
+:Description: The size of the in-memory OSD map cache in OSD daemons.
 :Type: 32-bit Integer
-:Default: ``1``
+:Default: ``50``
 
 
-``osd min down reports``
+``osd map cache bl inc size``
+
+:Description: The size of the in-memory OSD map cache incrementals in
+              OSD daemons.
 
-:Description: The minimum number of times an OSD must report that another is ``down``.
 :Type: 32-bit Integer
-:Default: ``3``
+:Default: ``100``
+
+
+``osd map message max``
+
+:Description: The maximum map entries allowed per MOSDMap message.
+:Type: 32-bit Integer
+:Default: ``100``
+
+
+
+Recovery
+========
+
+When the cluster starts or when an OSD crashes and restarts, the OSD begins
+peering with other OSDs before writes can occur. See `Monitoring OSDs and PGs`_
+for details.
+
+If an OSD crashes and comes back online, usually it will be out of sync with
+other OSDs containing more recent versions of objects in the placement groups.
+When this happens, the OSD goes into recovery mode and seeks to get the latest
+copy of the data and bring its map back up to date. Depending upon how long the
+OSD was down, the OSD's objects and placement groups may be significantly out of
+date. Also, if a failure domain went down (e.g., a rack), more than one OSD may
+come back online at the same time. This can make the recovery process time
+consuming and resource intensive.
+
+To maintain operational performance, Ceph performs recovery with limitations on
+the number of recovery requests, threads, and object chunk sizes, which allows
+Ceph to perform well in a degraded state.
 
 
 ``osd recovery delay start``
 
-:Description: After peering completes, Ceph will delay for the specified number of seconds before starting to recover objects.
+:Description: After peering completes, Ceph will delay for the specified number
+              of seconds before starting to recover objects.
+
 :Type: Float
 :Default: ``15``
 
 
 ``osd recovery max active``
 
-:Description: The number of active recovery requests per OSD at one time. More accelerates recovery, but places an increased load on the cluster.
+:Description: The number of active recovery requests per OSD at one time. More
+              requests will accelerate recovery, but the requests place an
+              increased load on the cluster.
+
 :Type: 32-bit Integer
 :Default: ``5``
@@ -335,53 +619,53 @@
 :Default: ``1 << 20``
 
 
-``osd max scrubs``
+``osd recovery threads``
 
-:Description: The maximum number of scrub operations for an OSD.
-:Type: 32-bit Int
-:Default: ``1``
+:Description: The number of threads for recovering data.
+:Type: 32-bit Integer
+:Default: ``1``
 
 
-``osd scrub load threshold``
+``osd recovery thread timeout``
 
-:Description: The maximum CPU load. Ceph will not scrub when the CPU load is higher than this number. Default is 50%.
-:Type: Float
-:Default: ``0.5``
+:Description: The maximum time in seconds before timing out a recovery thread.
+:Type: 32-bit Integer
+:Default: ``30``
 
 
-``osd scrub min interval``
+``osd recover clone overlap``
 
-:Description: The maximum interval in seconds for scrubbing the OSD.
-:Type: Float
-:Default: 5 minutes. ``300``
+:Description: Preserves clone overlap during recovery. Should always be set
+              to ``true``.
+:Type: Boolean
+:Default: ``true``
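+
+As an illustration (hypothetical values), limiting concurrent recovery
+requests and delaying the start of recovery might look like this:
+
+.. code-block:: ini
+
+    [osd]
+        # Illustrative limits to keep client I/O responsive while degraded.
+        osd recovery delay start = 30
+        osd recovery max active = 2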
 
 
-``osd scrub max interval``
 
-:Description: The maximum interval in seconds for scrubbing the OSD.
-:Type: Float
-:Default: Once per day. ``60*60*24``
+Miscellaneous
+=============
 
 
-``osd deep scrub interval``
 
-:Description: The interval for "deep" scrubbing (fully reading all data).
-:Type: Float
-:Default: Once per week. ``60*60*24*7``
+``osd snap trim thread timeout``
 
+:Description: The maximum time in seconds before timing out a snap trim thread.
+:Type: 32-bit Integer
+:Default: ``60*60*1``
 
-``osd deep scrub stride``
 
-:Description: Read size when doing a deep scrub.
-:Type: 32-bit Int
-:Default: 512 KB. ``524288``
+``osd backlog thread timeout``
+
+:Description: The maximum time in seconds before timing out a backlog thread.
+:Type: 32-bit Integer
+:Default: ``60*60*1``
 
 
-``osd class dir``
+``osd default notify timeout``
 
-:Description: The class path for RADOS class plug-ins.
-:Type: String
-:Default: ``$libdir/rados-classes``
+:Description: The OSD default notification timeout (in seconds).
+:Type: 32-bit Integer Unsigned
+:Default: ``30``
 
 
 ``osd check for log corruption``
@@ -391,25 +675,18 @@
 :Default: ``false``
 
 
-``osd default notify timeout``
-
-:Description: The OSD default notification timeout (in seconds).
-:Type: 32-bit Integer Unsigned
-:Default: ``30``
-
-
-``osd min pg log entries``
+``osd remove thread timeout``
 
-:Description: The minimum number of placement group logs to maintain when trimming log files.
-:Type: 32-bit Int Unsigned
-:Default: 1000
+:Description: The maximum time in seconds before timing out a remove OSD thread.
+:Type: 32-bit Integer
+:Default: ``60*60``
 
 
-``osd op complaint time``
+``osd command thread timeout``
 
-:Description: An operation becomes complaint worthy after the specified number of seconds have elapsed.
-:Type: Float
-:Default: ``30``
+:Description: The maximum time in seconds before timing out a command thread.
+:Type: 32-bit Integer
+:Default: ``10*60``
 
 
 ``osd command max records``
@@ -432,3 +709,15 @@
 :Type: Boolean
 :Default: ``false``
 
+
+``osd preserve trimmed log``
+
+:Description: Preserves trimmed log files, but uses more disk space.
+:Type: Boolean
+:Default: ``false``
+
+
+
+.. _pool: ../../operations/pools
+.. _Configuring Monitor/OSD Interaction: ../mon-osd-interaction
+.. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg#peering
\ No newline at end of file