author     Sage Weil <sage@inktank.com>    2013-01-01 10:36:57 -0800
committer  Sage Weil <sage@inktank.com>    2013-01-01 10:36:57 -0800
commit     eb02eaede53c03579d015ca00a888a48dbab739a
tree       428ec34918b85f44d213d48a0b984e168205410b
parent     f1196c7e93af83405ea5082030a7899e256ded7a
parent     4aa6af76e1c838e3458929091ad2c6ad029b4f34
download   ceph-eb02eaede53c03579d015ca00a888a48dbab739a.tar.gz
Merge remote-tracking branch 'gh/wip-bobtail-docs'
-rw-r--r--  PendingReleaseNotes                        3
-rw-r--r--  doc/install/index.rst                      1
-rw-r--r--  doc/install/upgrading-ceph.rst           190
-rw-r--r--  doc/rados/operations/add-or-rm-osds.rst    4
-rw-r--r--  doc/rados/operations/crush-map.rst        29
-rw-r--r--  doc/release-notes.rst                    272
6 files changed, 480 insertions, 19 deletions
diff --git a/PendingReleaseNotes b/PendingReleaseNotes
index adaef8d0e67..e69de29bb2d 100644
--- a/PendingReleaseNotes
+++ b/PendingReleaseNotes
@@ -1,3 +0,0 @@
-The 'ceph osd create' command now rejects an argument that is not a
-UUID. The syntax has been 'ceph osd create [uuid]' since 0.47, but the
-older 'ceph osd create [id]' was not rejected until 0.55.
diff --git a/doc/install/index.rst b/doc/install/index.rst
index 7fc8e2821d0..c93537d39d1 100644
--- a/doc/install/index.rst
+++ b/doc/install/index.rst
@@ -37,6 +37,7 @@ may install development release and testing packages.
 
    Installing Debian/Ubuntu Packages <debian>
    Installing RPM Packages <rpm>
+   Upgrading Ceph <upgrading-ceph>
 
 .. raw:: html
 
diff --git a/doc/install/upgrading-ceph.rst b/doc/install/upgrading-ceph.rst
new file mode 100644
index 00000000000..10b0d06f1c7
--- /dev/null
+++ b/doc/install/upgrading-ceph.rst
@@ -0,0 +1,190 @@
+================
+ Upgrading Ceph
+================
+
+You can upgrade daemons in your Ceph cluster one-by-one while the cluster is
+online and in service! The upgrade process is relatively simple:
+
+#. Log in to a host and upgrade the Ceph package.
+#. Restart the daemon.
+#. Ensure your cluster is healthy.
+
+.. important:: Once you upgrade a daemon, you cannot downgrade it.
+
+Certain types of daemons depend upon others. For example, metadata servers and
+RADOS gateways depend upon Ceph monitors and OSDs. We recommend upgrading
+daemons in this order:
+
+#. Monitors (or OSDs)
+#. OSDs (or Monitors)
+#. Metadata Servers
+#. RADOS Gateway
+
+As a general rule, we recommend upgrading all the daemons of a specific type
+(e.g., all ``ceph-osd`` daemons, all ``ceph-mon`` daemons, etc.) to ensure that
+they are all on the same release. We also recommend that you upgrade all the
+daemons in your cluster before you try to exercise new functionality in a
+release.
+
+The following sections describe the upgrade process.
+
+.. important:: Each release of Ceph may have some additional steps. Refer to
+   release-specific sections for details BEFORE you begin upgrading daemons.
+
+Upgrading an OSD
+================
+
+To upgrade an OSD, perform the following steps:
+
+#. Upgrade the OSD package::
+
+      ssh {osd-host}
+      sudo apt-get update && sudo apt-get install ceph-osd
+
+#. Restart the OSD, where ``N`` is the OSD number::
+
+      service ceph restart osd.N
+
+#. Ensure the upgraded OSD has rejoined the cluster::
+
+      ceph osd stat
+
+Once you have successfully upgraded an OSD, you may upgrade another OSD until
+you have completed the upgrade cycle for all of your OSDs.
+
+
+Upgrading a Monitor
+===================
+
+To upgrade a monitor, perform the following steps:
+
+#. Upgrade the ceph package::
+
+      ssh {mon-host}
+      sudo apt-get update && sudo apt-get install ceph
+
+#. Restart the monitor::
+
+      service ceph restart mon.{name}
+
+#. Ensure the monitor has rejoined the quorum. ::
+
+      ceph mon stat
+
+Once you have successfully upgraded a monitor, you may upgrade another monitor
+until you have completed the upgrade cycle for all of your monitors.
+
+
+Upgrading a Metadata Server
+===========================
+
+To upgrade an MDS, perform the following steps:
+
+#. Upgrade the ceph package::
+
+      ssh {mds-host}
+      sudo apt-get update && sudo apt-get install ceph ceph-mds
+
+#. Restart the metadata server::
+
+      service ceph restart mds.{name}
+
+#. Ensure the metadata server is up and running::
+
+      ceph mds stat
+
+Once you have successfully upgraded a metadata server, you may upgrade another
+metadata server until you have completed the upgrade cycle for all of your
+metadata servers.
+
+Upgrading a Client
+==================
+
+Once you have upgraded the packages and restarted daemons on your Ceph
+cluster, we recommend upgrading ``ceph-common`` and client libraries
+(``librbd1`` and ``librados2``) on your client nodes too.
+
+#. Upgrade the package::
+
+      ssh {client-host}
+      sudo apt-get update && sudo apt-get install ceph-common librados2 librbd1 python-ceph
+
+#. Ensure that you have the latest version::
+
+      ceph --version
+
+
+Upgrading from Argonaut to Bobtail
+==================================
+
+When upgrading from Argonaut to Bobtail, you need to be aware of three things:
+
+#. Authentication now defaults to **ON**, but used to default to off.
+#. Monitors use a new internal on-wire protocol.
+#. RBD ``format 2`` images require upgrading all OSDs before you can use them.
+
+See the following sections for details.
+
+
+Authentication
+--------------
+
+The Ceph Bobtail release enables authentication by default. Bobtail also has
+finer-grained authentication configuration settings. In previous versions of
+Ceph (i.e., v0.55 and earlier), you could simply specify::
+
+   auth supported = [cephx | none]
+
+This option still works, but is deprecated. New releases support
+``cluster``, ``service`` and ``client`` authentication settings as
+follows::
+
+   auth cluster required = [cephx | none]   # default cephx
+   auth service required = [cephx | none]   # default cephx
+   auth client required = [cephx | none]    # default cephx,none
+
+.. important:: If your cluster does not currently have an ``auth
+   supported`` line that enables authentication, you must explicitly
+   turn it off in Bobtail using the settings below::
+
+      auth cluster required = none
+      auth service required = none
+
+   This will disable authentication on the cluster, but clients will still
+   have the default configuration, which lets them talk to a cluster that
+   enables authentication without requiring it themselves.
+
+.. important:: If your cluster already has an ``auth supported`` option defined in
+   the configuration file, no changes are necessary.
+
+See `Ceph Authentication - Backward Compatibility`_ for details.
+
+.. _Ceph Authentication: ../../rados/operations/authentication/
+.. _Ceph Authentication - Backward Compatibility: ../../rados/operations/authentication/#backward-compatibility
+
+Monitor On-wire Protocol
+------------------------
+
+We recommend upgrading all monitors to Bobtail. A mixture of Bobtail and
+Argonaut monitors will not be able to use the new on-wire protocol, as the
+protocol requires all monitors to be Bobtail or greater. Upgrading only a
+majority of the nodes (e.g., two out of three) may expose the cluster to a
+situation where a single additional failure may compromise availability
+(because the non-upgraded daemon cannot participate in the new protocol).
+We recommend not waiting for an extended period of time between ``ceph-mon``
+upgrades.
+
+
+RBD Images
+----------
+
+The Bobtail release supports ``format 2`` images! However, you should not create
+or use ``format 2`` RBD images until after all ``ceph-osd`` daemons have been
+upgraded. Note that ``format 1`` is still the default. You can use the new
+``ceph osd ls`` and ``ceph tell osd.N version`` commands to double-check your
+cluster. ``ceph osd ls`` will give a list of all OSD IDs that are part of the
+cluster, and you can use that to write a simple shell loop to display all the
+OSD version strings::
+
+   for i in $(ceph osd ls); do
+      ceph tell osd.${i} version
+   done
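Taken together, the per-daemon checks in the new upgrade guide amount to a short verification pass after each restart. A minimal sketch (assuming a host with an admin keyring; ``ceph health`` is the usual overall status check, and the other commands appear in the steps above)::

   ceph health     # overall cluster status
   ceph mon stat   # monitors still form a quorum
   ceph osd stat   # upgraded OSDs have rejoined the cluster
   ceph mds stat   # metadata servers are up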
diff --git a/doc/rados/operations/add-or-rm-osds.rst b/doc/rados/operations/add-or-rm-osds.rst
index f60ddc6970f..daa86226c55 100644
--- a/doc/rados/operations/add-or-rm-osds.rst
+++ b/doc/rados/operations/add-or-rm-osds.rst
@@ -189,7 +189,7 @@ hard disks than older hosts in the cluster.
    also decompile the CRUSH map edit the file, recompile it and set it.
    See `Add/Move an OSD`_ for details. ::
 
-	ceph osd crush set {id} {name} {weight} pool={pool-name} [{bucket-type}={bucket-name} ...]
+	ceph osd crush set {name} {weight} [{bucket-type}={bucket-name} ...]
 
 
 Starting the OSD
@@ -286,7 +286,7 @@ After you take an OSD out of the cluster, it may still be running. That is,
 the OSD may be ``up`` and ``out``. You must stop your OSD before you remove it
 from the configuration. ::
 
-	ssh {new-osd-host}
+	ssh {osd-host}
 	sudo /etc/init.d/ceph stop osd.{osd-num}
 
 Once you stop your OSD, it is ``down``.
diff --git a/doc/rados/operations/crush-map.rst b/doc/rados/operations/crush-map.rst
index c9fe8728dab..43e9a9bad3f 100644
--- a/doc/rados/operations/crush-map.rst
+++ b/doc/rados/operations/crush-map.rst
@@ -192,21 +192,22 @@ types.
 +------+-------------+----------------------------------------------------+
 | 5    | Data Center | A physical data center containing rooms.           |
 +------+-------------+----------------------------------------------------+
-| 6    | Pool        | A data storage pool for storing objects.           |
+| 6    | Root        | The root node in a tree.                           |
 +------+-------------+----------------------------------------------------+
 
 .. tip:: You can remove these types and create your own bucket types.
 
 Ceph's deployment tools generate a CRUSH map that contains a bucket for each
-host, and a pool named "default," which is useful for the default ``data``,
+host, and a root named "default," which is useful for the default ``data``,
 ``metadata`` and ``rbd`` pools. The remaining bucket types provide a means for
 storing information about the physical location of nodes/buckets, which makes
 cluster administration much easier when OSDs, hosts, or network hardware
 malfunction and the administrator needs access to physical hardware.
 
-.. tip: The term "bucket" used in the context of CRUSH means a Ceph pool, a
-   location, or a piece of physical hardware. It is a different concept from
-   the term "bucket" when used in the context of RADOS Gateway APIs.
+.. tip: The term "bucket" used in the context of CRUSH means a node in
+   the hierarchy, i.e. a location or a piece of physical hardware. It
+   is a different concept from the term "bucket" when used in the
+   context of RADOS Gateway APIs.
 
 A bucket has a type, a unique name (string), a unique ID expressed as a negative
 integer, a weight relative to the total capacity/capability of its item(s), the
@@ -225,7 +226,7 @@ relative weight of the item.
 	item [item-name] weight [weight]
 }
 
-The following example illustrates how you can use buckets to aggregate a pool and
+The following example illustrates how you can use buckets to aggregate
 physical locations like a datacenter, a room, a rack and a row. ::
 
 	host ceph-osd-server-1 {
@@ -293,7 +294,7 @@ physical locations like a datacenter, a room, a rack and a row. ::
 	item server-room-2 weight 30.00
 }
 
-pool data {
+root default {
 	id -10
 	alg straw
 	hash 0
@@ -387,12 +388,12 @@ A rule takes the following form::
 ``step emit``
 
-:Description: Outputs the current value and empties the stack. Typically used at the end of a rule, but may also be used to from different trees in the same rule.
+:Description: Outputs the current value and empties the stack. Typically used at the end of a rule, but may also be used to pick from different trees in the same rule.
 :Purpose: A component of the rule.
 :Prerequisite: Follows ``step choose``.
 :Example: ``step emit``
 
-.. important:: To activate one or more rules with a common ruleset number to a pool, set the ruleset number to the pool.
+.. important:: To activate one or more rules with a common ruleset number to a pool, set the ruleset number of the pool.
 
 Placing Different Pools on Different OSDS:
 ==========================================
 
@@ -537,7 +538,7 @@ Add/Move an OSD
 To add or move an OSD in the CRUSH map of a running cluster, execute the
 following::
 
-	ceph osd crush set {id} {name} {weight} pool={pool-name} [{bucket-type}={bucket-name} ...]
+	ceph osd crush set {name} {weight} [{bucket-type}={bucket-name} ...]
 
 
 Where:
@@ -565,12 +566,12 @@ Where:
 :Example: ``2.0``
 
 
-``pool``
+``root``
 
-:Description: By default, the CRUSH hierarchy contains the pool default as its root.
+:Description: The root of the tree in which the OSD resides.
 :Type: Key/value pair.
 :Required: Yes
-:Example: ``pool=default``
+:Example: ``root=default``
 
 
 ``bucket-type``
@@ -584,7 +585,7 @@ Where:
 The following example adds ``osd.0`` to the hierarchy, or moves the OSD from a
 previous location. ::
 
-	ceph osd crush set 0 osd.0 1.0 pool=data datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1
+	ceph osd crush set osd.0 1.0 root=default datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1
 
 
 Adjust an OSD's CRUSH Weight
diff --git a/doc/release-notes.rst b/doc/release-notes.rst
index 73bf5eccd47..b17cf0f9576 100644
--- a/doc/release-notes.rst
+++ b/doc/release-notes.rst
@@ -2,6 +2,278 @@
  Release Notes
 ===============
 
+v0.56 "bobtail"
+---------------
+
+Bobtail is the second stable release of Ceph, named in honor of the
+`Bobtail Squid`: http://en.wikipedia.org/wiki/Bobtail_squid.
+
+Key features since v0.48 "argonaut"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* Object Storage Daemon (OSD): improved threading, small-io performance, and performance during recovery
+* Object Storage Daemon (OSD): regular "deep" scrubbing of all stored data to detect latent disk errors
+* RADOS Block Device (RBD): support for copy-on-write clones of images.
+* RADOS Block Device (RBD): better client-side caching.
+* RADOS Block Device (RBD): advisory image locking
+* Rados Gateway (RGW): support for efficient usage logging/scraping (for billing purposes)
+* Rados Gateway (RGW): expanded S3 and Swift API coverage (e.g., POST, multi-object delete)
+* Rados Gateway (RGW): improved striping for large objects
+* Rados Gateway (RGW): OpenStack Keystone integration
+* RPM packages for Fedora, RHEL/CentOS, OpenSUSE, and SLES
+* mkcephfs: support for automatically formatting and mounting XFS and ext4 (in addition to btrfs)
+
+Upgrading
+~~~~~~~~~
+
+Please refer to the document `Upgrading from Argonaut to Bobtail`_ for details.
+
+.. _Upgrading from Argonaut to Bobtail: ../install/upgrading-ceph/#upgrading-from-argonaut-to-bobtail
+
+* Cephx authentication is now enabled by default (since v0.55).
+  Upgrading a cluster without adjusting the Ceph configuration will
+  likely prevent the system from starting up on its own. We recommend
+  first modifying the configuration to indicate that authentication is
+  disabled, and only then upgrading to the latest version::
+
+    auth client required = none
+    auth service required = none
+    auth cluster required = none
+
+* Ceph daemons can be upgraded one-by-one while the cluster is online
+  and in service.
+
+* The ``ceph-osd`` daemons must be upgraded and restarted *before* any
+  ``radosgw`` daemons are restarted, as they depend on some new
+  ceph-osd functionality. (The ``ceph-mon``, ``ceph-osd``, and
+  ``ceph-mds`` daemons can be upgraded and restarted in any order.)
+
+* Once each individual daemon has been upgraded and restarted, it
+  cannot be downgraded.
+
+* The cluster of ``ceph-mon`` daemons will migrate to a new internal
+  on-wire protocol once all daemons in the quorum have been upgraded.
+  Upgrading only a majority of the nodes (e.g., two out of three) may
+  expose the cluster to a situation where a single additional failure
+  may compromise availability (because the non-upgraded daemon cannot
+  participate in the new protocol). We recommend not waiting for an
+  extended period of time between ``ceph-mon`` upgrades.
+
+* The ops log and usage log for radosgw are now off by default. If
+  you need these logs (e.g., for billing purposes), you must enable
+  them explicitly. For logging of all operations to objects in the
+  ``.log`` pool (see ``radosgw-admin log ...``)::
+
+    rgw enable ops log = true
+
+  For usage logging of aggregated bandwidth usage (see ``radosgw-admin
+  usage ...``)::
+
+    rgw enable usage log = true
+
+* You should not create or use "format 2" RBD images until after all
+  ``ceph-osd`` daemons have been upgraded. Note that "format 1" is
+  still the default. You can use the new ``ceph osd ls`` and
+  ``ceph tell osd.N version`` commands to double-check your cluster.
+  ``ceph osd ls`` will give a list of all OSD IDs that are part of the
+  cluster, and you can use that to write a simple shell loop to display
+  all the OSD version strings::
+
+    for i in $(ceph osd ls); do
+      ceph tell osd.${i} version
+    done
+
+
+Compatibility changes
+~~~~~~~~~~~~~~~~~~~~~
+
+* The 'ceph osd create [<uuid>]' command now rejects an argument that
+  is not a UUID. (Previously it would take an optional integer OSD
+  id.) The correct syntax has been 'ceph osd create [<uuid>]' since
+  v0.47, but the older calling convention was being silently ignored.
+
+* The CRUSH map root nodes now have type ``root`` instead of type
+  ``pool``. This avoids confusion with RADOS pools, which are not
+  directly related. Any scripts or tools that use the ``ceph osd
+  crush ...`` commands may need to be adjusted accordingly.
+
+* The ``ceph osd pool create <poolname> <pgnum>`` command now requires
+  the ``pgnum`` argument. Previously this was optional, and would
+  default to 8, which was almost never a good number.
+
+* Degraded mode (when there are fewer than the desired number of
+  replicas) is now more configurable on a per-pool basis, with the
+  min_size parameter. By default, with min_size 0, this allows I/O to
+  objects with N - floor(N/2) replicas, where N is the total number of
+  expected copies. Argonaut behavior was equivalent to having min_size
+  = 1, so I/O would always be possible if any completely up to date
+  copy remained. min_size = 1 could result in lower overall
+  availability in certain cases, such as flapping network partitions
+  (see the example at the end of this section).
+
+* The sysvinit start/stop script now defaults to adjusting the max
+  open files ulimit to 16384. On most systems the default is 1024, so
+  this is an increase and won't break anything. If some system has a
+  higher initial value, however, this change will lower the limit.
+  The value can be adjusted explicitly by adding an entry to the
+  ``ceph.conf`` file in the appropriate section. For example::
+
+    [global]
+    max open files = 32768
+
+* 'rbd lock list' and 'rbd showmapped' no longer use tabs as
+  separators in their output.
+
+* There is a configurable limit on the number of PGs when creating a
+  new pool, to prevent a user from accidentally specifying a ridiculous
+  number for pg_num. It can be adjusted via the 'mon max pool pg num'
+  option on the monitor, and defaults to 65536 (the current max
+  supported by the Linux kernel client).
+
+* The osd capabilities associated with a rados user have changed
+  syntax since 0.48 argonaut. The new format is mostly backwards
+  compatible, but there are two backwards-incompatible changes:
+
+  * specifying a list of pools in one grant, i.e.
+    'allow r pool=foo,bar' is now done in separate grants, i.e.
+    'allow r pool=foo, allow r pool=bar'.
+
+  * restricting pool access by pool owner ('allow r uid=foo') is
+    removed. This feature was not very useful and was unused in
+    practice.
+
+  The new format is documented in the ceph-authtool man page (see also
+  the example at the end of this section).
+
+* 'rbd cp' and 'rbd rename' use rbd as the default destination pool,
+  regardless of what pool the source image is in. Previously they
+  would default to the same pool as the source image.
+
+* 'rbd export' no longer prints a message for each object written. It
+  just reports percent complete like other long-lasting operations.
+
+* 'ceph osd tree' now uses 4 decimal places for weight so output is
+  nicer for humans.
+
+* Several monitor operations are now idempotent:
+
+  * ceph osd pool create
+  * ceph osd pool delete
+  * ceph osd pool mksnap
+  * ceph osd rm
+  * ceph pg <pgid> revert
+
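Two of the compatibility items above lend themselves to short examples. A minimal sketch (the pool name ``data`` and the client name ``client.foo`` are only illustrations): the per-pool ``min_size`` can be adjusted with ``ceph osd pool set``, and the new capability syntax uses one grant per pool::

   # allow I/O while at least one up-to-date replica remains (argonaut-like behavior)
   ceph osd pool set data min_size 1

and, in a keyring file::

   [client.foo]
       key = {base64-key}
       caps mon = "allow r"
       caps osd = "allow r pool=foo, allow r pool=bar"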
+Notable changes
+~~~~~~~~~~~~~~~
+
+* auth: enable cephx by default
+* auth: expanded authentication settings for greater flexibility
+* auth: sign messages when using cephx
+* build fixes for Fedora 18, CentOS/RHEL 6
+* ceph: new 'osd ls' and 'osd tell <osd.N> version' commands
+* ceph-debugpack: misc improvements
+* ceph-disk-prepare: creates and labels GPT partitions
+* ceph-disk-prepare: support for external journals, default mount/mkfs options, etc.
+* ceph-fuse/libcephfs: many misc fixes, admin socket debugging
+* ceph-fuse: fix handling for .. in root directory
+* ceph-fuse: many fixes (including memory leaks, hangs)
+* ceph-fuse: mount helper (mount.fuse.ceph) for use with /etc/fstab
+* ceph.spec: misc packaging fixes
+* common: thread pool sizes can now be adjusted at runtime
+* config: $pid is now available as a metavariable
+* crush: default root of tree type is now 'root' instead of 'pool' (to avoid confusion wrt rados pools)
+* crush: fixed retry behavior with chooseleaf via tunable
+* crush: tunables documented; feature bit now present and enforced
+* libcephfs: java wrapper
+* librados: several bug fixes (rare races, locking errors)
+* librados: some locking fixes
+* librados: watch/notify fixes, misc memory leaks
+* librbd: a few fixes to 'discard' support
+* librbd: fine-grained striping feature
+* librbd: fixed memory leaks
+* librbd: fully functional and documented image cloning
+* librbd: image (advisory) locking
+* librbd: improved caching (of object non-existence)
+* librbd: 'flatten' command to sever clone parent relationship
+* librbd: 'protect'/'unprotect' commands to prevent clone parent from being deleted
+* librbd: clip requests past end-of-image.
+* librbd: fixes an issue with some windows guests running in qemu (remove floating point usage)
+* log: fix in-memory buffering behavior (to only write log messages on crash)
+* mds: fix ino release on abort session close, relative getattr path, mds shutdown, other misc items
+* mds: misc fixes
+* mkcephfs: fix for default keyring, osd data/journal locations
+* mkcephfs: support for formatting xfs, ext4 (as well as btrfs)
+* init: support for automatically mounting xfs and ext4 osd data directories
+* mon, radosgw, ceph-fuse: fixed memory leaks
+* mon: improved ENOSPC, fs error checking
+* mon: less-destructive ceph-mon --mkfs behavior
+* mon: misc fixes
+* mon: more informative info about stuck PGs in 'health detail'
+* mon: information about recovery and backfill in 'pg <pgid> query'
+* mon: new 'osd crush create-or-move ...' command
+* mon: new 'osd crush move ...' command lets you rearrange your CRUSH hierarchy
+* mon: optionally dump 'osd tree' in json
+* mon: configurable cap on maximum osd number (mon max osd)
+* mon: many bug fixes (various races causing ceph-mon crashes)
+* mon: new on-disk metadata to facilitate future mon changes (post-bobtail)
+* mon: election bug fixes
+* mon: throttle client messages (limit memory consumption)
+* mon: throttle osd flapping based on osd history (limits osdmap 'thrashing' on overloaded or unhappy clusters)
+* mon: 'report' command for dumping detailed cluster status (e.g., for use when reporting bugs)
+* mon: osdmap flags like noup, noin now cause a health warning
+* msgr: improved failure handling code
+* msgr: many bug fixes
+* osd, mon: honor new 'nobackfill' and 'norecover' osdmap flags
+* osd, mon: use feature bits to lock out clients lacking CRUSH tunables when they are in use
+* osd: backfill reservation framework (to avoid flooding new osds with backfill data)
+* osd: backfill target reservations (improve performance during recovery)
+* osd: better tracking of recent slow operations
+* osd: capability grammar improvements, bug fixes
+* osd: client vs recovery io prioritization
+* osd: crush performance improvements
+* osd: default journal size to 5 GB
+* osd: experimental support for PG "splitting" (pg_num adjustment for existing pools)
+* osd: fix memory leak on certain error paths
+* osd: fixed detection of EIO errors from fs on read
+* osd: major refactor of PG peering and threading
+* osd: many bug fixes
+* osd: more/better dump info about in-progress operations
+* osd: new caps structure (see compatibility notes)
+* osd: new 'deep scrub' will compare object content across replicas (once per week by default)
+* osd: new 'lock' rados class for generic object locking
+* osd: optional 'min' pg size
+* osd: recovery reservations
+* osd: scrub efficiency improvement
+* osd: several out of order reply bug fixes
+* osd: several rare peering cases fixed
+* osd: some performance improvements related to request queuing
+* osd: use entire device if journal is a block device
+* osd: use syncfs(2) when kernel supports it, even if glibc does not
+* osd: various fixes for out-of-order op replies
+* rados: ability to copy, rename pools
+* rados: bench command now cleans up after itself
+* rados: 'cppool' command to copy rados pools
+* rados: 'rm' now accepts a list of objects to be removed
+* radosgw: POST support
+* radosgw: REST API for managing usage stats
+* radosgw: fix bug in bucket stat updates
+* radosgw: fix copy-object vs attributes
+* radosgw: fix range header for large objects, ETag quoting, GMT dates, other compatibility fixes
+* radosgw: improved garbage collection framework
+* radosgw: many small fixes, cleanups
+* radosgw: openstack keystone integration
+* radosgw: stripe large (non-multipart) objects
+* radosgw: support for multi-object deletes
+* radosgw: support for swift manifest objects
+* radosgw: vanity bucket dns names
+* radosgw: various API compatibility fixes
+* rbd: import from stdin, export to stdout
+* rbd: new 'ls -l' option to view images with metadata
+* rbd: use generic id and keyring options for 'rbd map'
+* rbd: don't issue usage on errors
+* udev: fix symlink creation for rbd images containing partitions
+* upstart: job files for all daemon types (not enabled by default)
+* wireshark: ceph protocol dissector patch updated
+
+
 v0.54
 -----