Conversation
jklare
left a comment
some comments, thoughts and questions
> | ceph-ansible (before) | cephadm (after) |
> |:----------------------------------|:----------------------------------------|
> | `osism apply ceph-mons` | `ceph orch apply mon` |
> | `osism apply ceph-mgrs` | `ceph orch apply mgr` |
> | `osism apply ceph-osds` | `ceph orch apply osd` |
> | `osism apply ceph-rgws` | `ceph orch apply rgw` |
> | `osism apply ceph-mdss` | `ceph orch apply mds` |
> | Editing `configuration.yml` | `ceph config set <section> <key> <val>` |
> | `osism apply ceph-rolling_update` | `ceph orch upgrade start --image <img>` |
Are these 1:1 mappings, or are there any significant differences that we should mention here?
What does that mean? Are they or are they not 1:1 mappings?
> Ensure that the daemon is still running under the legacy systemd unit before attempting
> adoption. Check with:
>
> ```bash
> sudo systemctl status ceph-<type>@<id>
> ```
What is the expected output, and what does the wrong one look like?
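For reference, generic systemd output (host name and timestamps below are placeholders, not captured from a Ceph node): a legacy daemon that is still running reports `Active: active (running)`, while one that has already stopped reports `Active: inactive (dead)` and should not be adopted in this state.

```text
● ceph-mon@<id>.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; ...)
     Active: active (running) since ...
```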
- Add known limitations info box at the top of the guide
- Add TODOs for backup guidance, safety measures, and readiness checks
- Fix misleading wording about systemd (cephadm replaces ceph-ansible, not systemd)
- Clarify "manager node" as "OSISM manager node" throughout
- Rename RGW/MDS sections from "Adopting" to "Migrating" (they can't be adopted in-place)
- Move SSL setup before the RGW deploy command
- Add concrete instructions for RGW service ID, placement, and port
- Add concrete commands for reverting an adopted daemon

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
Add example command outputs for prepare-host, host registration, daemon adoption (mon, mgr, osd), and verification steps. Clarify that certain commands only need to run on one monitor node. Use existing operator SSH key by default. Move crash daemon migration before RGW/MDS. Store autoscale pool list in /home/dragon/ to survive reboots.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
- Replace manual placeholder variables with automated commands using osism/ceph/radosgw-admin CLI for RGW, MDS, mon, and mgr sections
- Fix RGW migration order: stop legacy daemons before deploying new ones to avoid port conflicts, migrate sequentially node by node
- Add info box explaining MDS standby coexistence during migration
- Split ceph orch ps calls to use --daemon-type filters with matching example outputs
- Fix systemd cleanup to explicitly list legacy units instead of using a wildcard that would also remove cephadm-managed units

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
Add detailed explanations for prepare-host checks, daemon adoption processes (MON/MGR/OSD), OSD managed state transition, small vs large environment adoption strategy, UCA package workaround context, proxy error example, and verification cross-references.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
> TODO: Add specific guidance on recommended backup strategies (e.g. cluster-wide backups
> to a separate cluster) and concrete safety measures to take before starting the migration.
>
> :::
>
> :::info Known limitations
>
> This guide is a work in progress. The following areas are **not yet covered or tested**:
>
> * **Multi-site RGW**: Only single-site RGW deployments have been tested. Multi-site migration instructions will be added in a future update.
> * **Backup and safety measures**: Specific guidance on recommended backup strategies and concrete pre-migration safety measures is still being prepared.
The TODO seems redundant considering that "Known limitations" talks again about the very same thing.
> * A running Ceph cluster deployed with ceph-ansible via OSISM.
> * All Ceph daemons are healthy (`ceph -s` reports `HEALTH_OK` or only expected warnings).
> * SSH access to all Ceph nodes from the node where cephadm will be run.
How does the user know at this point which node that is? We have not told them anything yet.
> ## Step 1: Verify cluster health
>
> Before starting the migration, ensure the cluster is in a healthy state. Run the following
> commands on the OSISM manager node .
> ```suggestion
> commands on the OSISM manager node.
> ```
> ## Prerequisites
>
> * A running Ceph cluster deployed with ceph-ansible via OSISM.
> * All Ceph daemons are healthy (`ceph -s` reports `HEALTH_OK` or only expected warnings).
Not sure if this should be in Prerequisites if we are going to check the same thing again in step 1.
> ### OSD nodes
>
> The UCA cephadm package for Reef (18.2.4) contains a
So the bug only affects Reef but we install for Quincy and Squid from the Ceph Git repo as well. Is that correct?
> In a typical single-site deployment, the default values are `default` for all three.
> The service ID for the `ceph orch apply rgw` command is composed as
> `<realm_name>.<zone_name>` (e.g. `default.default`).
In my testbed, the realm is empty. The code below for setting RGW_REALM handles this, but the description above may confuse the user.
> ```yaml
> service_type: osd
> service_id: default_drive_group
> placement:
>   host_pattern: '*'
> data_devices:
>   paths:
>     - /dev/sdb
>     - /dev/sdc
> ```
Yes, such a file can be built from ceph osd tree and ceph device ls, but it may not be obvious to users who have not done it before. I wonder if we can offer them some help (more information or a script).
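As a starting point for that help, a shell sketch that writes a minimal spec file. The host pattern and device paths are placeholder assumptions, not discovered from the cluster; they must be verified against `ceph osd tree`, `ceph device ls`, and `lsblk` before applying anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper that writes a minimal OSD spec file. HOST_PATTERN and
# DATA_DEVICES are placeholders -- verify them against ceph osd tree,
# ceph device ls, and lsblk before use.
set -euo pipefail

HOST_PATTERN='*'                    # assumption: uniform layout on all OSD hosts
DATA_DEVICES=(/dev/sdb /dev/sdc)    # assumption: replace with the real devices

{
  echo "service_type: osd"
  echo "service_id: default_drive_group"
  echo "placement:"
  echo "  host_pattern: '${HOST_PATTERN}'"
  echo "data_devices:"
  echo "  paths:"
  for dev in "${DATA_DEVICES[@]}"; do
    echo "    - ${dev}"
  done
} > osd_spec.yaml

cat osd_spec.yaml
```

A full solution would derive the device list per host instead of hard-coding it, but even this skeleton saves users from YAML layout mistakes.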
> Apply the spec:
>
> ```bash
> ceph orch apply -i osd_spec.yaml
> ```
This would need the same pattern used for the SSH keys to mount the file into the container, otherwise the file cannot be found. Something like:

```bash
cp osd_spec.yaml /opt/cephclient/data/
ceph orch apply -i /data/osd_spec.yaml
rm /opt/cephclient/data/osd_spec.yaml
```

But even with this, `ceph orch ls` showed my OSDs as "unmanaged" (plus a managed osd.default_drive_group with 0 members). Maybe we should advise users to leave the OSDs unmanaged for now.
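If the guide does recommend leaving them unmanaged, the cephadm service spec supports that directly via the `unmanaged` field. A sketch of the same spec with automatic OSD creation disabled (same placeholder devices as above):

```yaml
# Sketch: same drive-group spec, but cephadm will not act on it
# until unmanaged is removed or set back to false.
service_type: osd
service_id: default_drive_group
unmanaged: true
placement:
  host_pattern: '*'
data_devices:
  paths:
    - /dev/sdb
    - /dev/sdc
```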
> 1. Remove old systemd unit files that are no longer used. Do **not** use a wildcard
>    like `ceph-*.service` as this would also remove the cephadm-managed units
>    (e.g. `ceph-<FSID>@.service`). Remove only the legacy units:
We should probably mention on which nodes these commands are to be executed.
> | ceph-ansible (before) | cephadm (after) |
> |:----------------------------------|:----------------------------------------|
> | `osism apply ceph-mons` | `ceph orch apply mon` |
What happens if a user issues an old `osism apply ceph-*` command out of habit? Ideally, running `osism apply ceph-mons` should now tell the user something like "Please run `ceph orch apply mon`", but the command looks like it is running as before. Maybe we should tell users whether this is dangerous and whether they can safely abort the command once they realize their mistake.
|
For some reason, the Conversation page shows only about half of my review comments; the other half can be found in https://github.com/osism/osism.github.io/pull/957/changes (verified as anonymous user).
> ```bash
> CEPH_RELEASE=$(docker inspect $(docker ps --filter "name=ceph" --format "{{.Names}}" | head -1) --format '{{.Config.Image}}' | cut -d: -f2)
> curl --silent --remote-name --location https://raw.githubusercontent.com/ceph/ceph/${CEPH_RELEASE}/src/cephadm/cephadm.py
> chmod +x cephadm.py
> ```
Add missing v here: https://github.com/ceph/ceph/tags
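A sketch of the fix being suggested: Ceph's Git tags carry a leading `v` (e.g. `v18.2.4`), while the container image tag does not, so the prefix has to be added when building the raw.githubusercontent.com URL. The version value here is hard-coded for illustration; in the real snippet it comes from `docker inspect`.

```shell
# Illustration only: CEPH_RELEASE would normally be derived from the
# running container image, e.g. "18.2.4" (no leading "v").
CEPH_RELEASE="18.2.4"

# Ceph tags on GitHub are prefixed with "v", so add it in the URL.
URL="https://raw.githubusercontent.com/ceph/ceph/v${CEPH_RELEASE}/src/cephadm/cephadm.py"
echo "$URL"
```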
> ```bash
> ceph cephadm set-pub-key -i /data/id_rsa.operator.pub
> rm /opt/cephclient/data/id_rsa.operator*
> ```
Maybe add here how to generate id_rsa.operator.pub if only id_rsa.operator exists.
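For reference, `ssh-keygen -y` derives the public key from an existing private key. A self-contained sketch follows; it generates a throwaway key purely for demonstration, whereas in the guide the input would be the existing `id_rsa.operator` file.

```shell
# Generate a throwaway private key just for this demonstration.
ssh-keygen -t ed25519 -N '' -q -f ./demo_operator_key

# The actual technique: derive the public key from the private key.
# In the guide this would be:
#   ssh-keygen -y -f id_rsa.operator > id_rsa.operator.pub
ssh-keygen -y -f ./demo_operator_key > ./demo_operator_key.pub

cat ./demo_operator_key.pub
```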
> Or use a loop to register all Ceph nodes at once:
>
> ```bash
> for node in $(osism get hosts -l ceph | awk 'NR>3 && /\|/ {print $2}'); do
> ```
Check first if osism get hosts is usable (it's a pretty new command).
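The awk filter itself can be sanity-checked offline. The sketch below fakes the table layout that the loop assumes (`osism get hosts` printing a header, separator lines, and one host per row); if the command's actual output format differs, the `NR>3` offset and the field position need adjusting.

```shell
# Fake table mimicking the assumed `osism get hosts -l ceph` output format.
sample='+------------+
| Host       |
+------------+
| ceph-node1 |
| ceph-node2 |
+------------+'

# Same filter as in the loop: skip the first three lines, keep rows that
# contain "|", print the second whitespace-separated field (the hostname).
hosts=$(echo "$sample" | awk 'NR>3 && /\|/ {print $2}')
echo "$hosts"
```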

AI-assisted: Claude Code