
Add migration guide from ceph-ansible to cephadm#957

Open
berendt wants to merge 5 commits into main from cephadm

Conversation

@berendt (Member) commented Apr 1, 2026

AI-assisted: Claude Code

@github-actions (bot) commented Apr 1, 2026

MegaLinter analysis: Error

Descriptor Linter Files Fixed Errors Warnings Elapsed time
✅ ACTION actionlint 3 0 0 0.04s
✅ JSON jsonlint 4 0 0 0.09s
✅ JSON prettier 4 0 0 0.44s
✅ JSON v8r 4 0 0 6.4s
✅ MARKDOWN markdownlint 155 0 0 2.36s
✅ MARKDOWN markdown-table-formatter 155 0 0 0.43s
✅ REPOSITORY checkov yes no no 16.87s
✅ REPOSITORY git_diff yes no no 0.04s
✅ REPOSITORY secretlint yes no no 1.54s
✅ REPOSITORY trufflehog yes no no 3.86s
✅ SPELL codespell 163 0 0 0.49s
❌ SPELL lychee 163 1 0 48.03s
✅ YAML prettier 4 0 0 0.35s
✅ YAML v8r 4 0 0 5.18s
✅ YAML yamllint 4 0 0 0.47s

Detailed Issues

❌ SPELL / lychee - 1 error
[TIMEOUT] https://adam.younglogic.com/2022/03/generating-a-clouds-yaml-file | Timeout
📝 Summary
---------------------
🔍 Total..........788
✅ Successful.....735
⏳ Timeouts.........1
🔀 Redirected.......0
👻 Excluded........52
❓ Unknown..........0
🚫 Errors...........0

Errors in docs/guides/user-guide/openstack/openstackclient.md
[TIMEOUT] https://adam.younglogic.com/2022/03/generating-a-clouds-yaml-file | Timeout

See detailed reports in MegaLinter artifacts


@berendt berendt force-pushed the cephadm branch 3 times, most recently from 99f0866 to 5a08a58 Compare April 2, 2026 06:53
AI-assisted: Claude Code

Signed-off-by: Christian Berendt <berendt@osism.tech>
@jklare (Contributor) left a comment:


some comments, thoughts and questions

Comment threads on docs/guides/migration-guide/cephadm/index.md (6 threads, 1 outdated)
Comment on lines +369 to +377
| ceph-ansible (before) | cephadm (after) |
|:----------------------------------|:----------------------------------------|
| `osism apply ceph-mons` | `ceph orch apply mon` |
| `osism apply ceph-mgrs` | `ceph orch apply mgr` |
| `osism apply ceph-osds` | `ceph orch apply osd` |
| `osism apply ceph-rgws` | `ceph orch apply rgw` |
| `osism apply ceph-mdss` | `ceph orch apply mds` |
| Editing `configuration.yml` | `ceph config set <section> <key> <val>` |
| `osism apply ceph-rolling_update` | `ceph orch upgrade start --image <img>` |
Contributor:

Are these 1:1 mappings, or are there any significant differences that we should mention here?

Member Author:

Nothing to do here.

Contributor:

What does that mean? Are they or are they not 1:1 mappings?

Member Author:

1:1 mappings.

Comment on lines +386 to +391
Ensure that the daemon is still running under the legacy systemd unit before attempting
adoption. Check with:

```bash
sudo systemctl status ceph-<type>@<id>
```
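For reference, a healthy daemon reports `Active: active (running)` in its status output. A minimal sketch of extracting that line from saved status output (the sample below is illustrative; real output varies by distribution and release):

```shell
# Hedged sketch: pull the "Active:" line out of "systemctl status" output.
# The sample is inline so the parsing is reproducible; on a real node you
# would pipe "systemctl status ceph-<type>@<id>" instead.
sample='● ceph-mon@node1.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled)
   Active: active (running) since Tue 2026-04-07 09:12:01 UTC; 3 days ago'
echo "$sample" | grep -o 'Active: [a-z]* ([a-z]*)'
```

Anything other than `active (running)` (e.g. `inactive (dead)` or `failed`) means the legacy unit is not running and the daemon should not be adopted in place.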
Contributor:

What is the expected output, and what does a wrong one look like?

Comment threads on docs/guides/migration-guide/cephadm/index.md (2 threads, both outdated)
@jklare (Contributor) left a comment:

more comments

Comment threads on docs/guides/migration-guide/cephadm/index.md (9 threads, 4 outdated)
berendt added 3 commits April 2, 2026 16:07
- Add known limitations info box at the top of the guide
- Add TODOs for backup guidance, safety measures, and readiness checks
- Fix misleading wording about systemd (cephadm replaces ceph-ansible, not systemd)
- Clarify "manager node" as "OSISM manager node" throughout
- Rename RGW/MDS sections from "Adopting" to "Migrating" (they can't be adopted in-place)
- Move SSL setup before the RGW deploy command
- Add concrete instructions for RGW service ID, placement, and port
- Add concrete commands for reverting an adopted daemon

AI-assisted: Claude Code

Signed-off-by: Christian Berendt <berendt@osism.tech>
Add example command outputs for prepare-host, host registration, daemon
adoption (mon, mgr, osd), and verification steps. Clarify that certain
commands only need to run on one monitor node. Use existing operator SSH
key by default. Move crash daemon migration before RGW/MDS. Store
autoscale pool list in /home/dragon/ to survive reboots.

AI-assisted: Claude Code

Signed-off-by: Christian Berendt <berendt@osism.tech>
- Replace manual placeholder variables with automated commands using
  osism/ceph/radosgw-admin CLI for RGW, MDS, mon, and mgr sections
- Fix RGW migration order: stop legacy daemons before deploying new ones
  to avoid port conflicts, migrate sequentially node by node
- Add info box explaining MDS standby coexistence during migration
- Split ceph orch ps calls to use --daemon-type filters with matching
  example outputs
- Fix systemd cleanup to explicitly list legacy units instead of using
  a wildcard that would also remove cephadm-managed units

AI-assisted: Claude Code

Signed-off-by: Christian Berendt <berendt@osism.tech>
@berendt berendt requested a review from jklare April 8, 2026 10:01
Add detailed explanations for prepare-host checks, daemon adoption
processes (MON/MGR/OSD), OSD managed state transition, small vs large
environment adoption strategy, UCA package workaround context, proxy
error example, and verification cross-references.

AI-assisted: Claude Code

Signed-off-by: Christian Berendt <berendt@osism.tech>
Comment on lines +17 to +27
TODO: Add specific guidance on recommended backup strategies (e.g. cluster-wide backups
to a separate cluster) and concrete safety measures to take before starting the migration.

:::

:::info Known limitations

This guide is a work in progress. The following areas are **not yet covered or tested**:

* **Multi-site RGW**: Only single-site RGW deployments have been tested. Multi-site migration instructions will be added in a future update.
* **Backup and safety measures**: Specific guidance on recommended backup strategies and concrete pre-migration safety measures is still being prepared.
Contributor:

The TODO seems redundant considering that "Known limitations" talks again about the very same thing.


* A running Ceph cluster deployed with ceph-ansible via OSISM.
* All Ceph daemons are healthy (`ceph -s` reports `HEALTH_OK` or only expected warnings).
* SSH access to all Ceph nodes from the node where cephadm will be run.
Contributor:

How does the user know at this point which node that is? We have not told them anything yet.

## Step 1: Verify cluster health

Before starting the migration, ensure the cluster is in a healthy state. Run the following
commands on the OSISM manager node .
Contributor:

Suggested change
commands on the OSISM manager node .
commands on the OSISM manager node.

## Prerequisites

* A running Ceph cluster deployed with ceph-ansible via OSISM.
* All Ceph daemons are healthy (`ceph -s` reports `HEALTH_OK` or only expected warnings).
Contributor:

Not sure if this should be in Prerequisites if we are going to check the same thing again in step 1.


### OSD nodes

The UCA cephadm package for Reef (18.2.4) contains a
Contributor:

So the bug only affects Reef but we install for Quincy and Squid from the Ceph Git repo as well. Is that correct?

Comment on lines +660 to +662
In a typical single-site deployment, the default values are `default` for all three.
The service ID for the `ceph orch apply rgw` command is composed as
`<realm_name>.<zone_name>` (e.g. `default.default`).
Contributor:

In my testbed, the realm is empty. The code below for setting RGW_REALM handles this, but the description above may confuse the user.
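The composition plus empty-realm fallback being discussed can be sketched like this (variable names mirror the guide's `RGW_REALM` handling; the values are examples):

```shell
# Hedged sketch: compose the RGW service ID, falling back to "default"
# when the realm query returns an empty string (as on the testbed above).
RGW_REALM=""                       # example: empty result from a realm query
RGW_ZONE="default"
RGW_REALM="${RGW_REALM:-default}"  # empty realm -> "default"
SERVICE_ID="${RGW_REALM}.${RGW_ZONE}"
echo "$SERVICE_ID"                 # default.default
```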

Comment on lines +859 to +866
```yaml
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: '*'
data_devices:
  paths:
    - /dev/sdb
    - /dev/sdc
```
Contributor:

Yes, such a file can be built from `ceph osd tree` and `ceph device ls`, but it may not be obvious to users who have not done it before. I wonder if we can offer them some help (more information or a script).
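As a starting point, something like the following could turn `ceph device ls`-style output into per-host device paths (the column layout here is an assumption; verify against your release before relying on it):

```shell
# Hedged sketch: derive host/device pairs from "ceph device ls"-style output.
# The sample is inline so the parsing is reproducible; on a real cluster you
# would pipe "ceph device ls" instead.
sample='DEVICE   HOST:DEV   DAEMONS
ABC123   node1:sdb  osd.0
DEF456   node1:sdc  osd.1'
echo "$sample" | awk 'NR>1 {split($2, a, ":"); print a[1], "/dev/" a[2]}'
```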

Apply the spec:

```bash
ceph orch apply -i osd_spec.yaml
```
Contributor:

This would need the same pattern used for the ssh keys to mount the file into the container, otherwise the file cannot be found. Something like:

```bash
cp osd_spec.yaml /opt/cephclient/data/
ceph orch apply -i /data/osd_spec.yaml
rm /opt/cephclient/data/osd_spec.yaml
```

But even with this, `ceph orch ls` showed my OSDs as "unmanaged" (plus a managed `osd.default_drive_group` with 0 members). Maybe we should advise users to leave the OSDs unmanaged for now.

Comment on lines +966 to +968
1. Remove old systemd unit files that are no longer used. Do **not** use a wildcard
like `ceph-*.service` as this would also remove the cephadm-managed units
(e.g. `ceph-<FSID>@.service`). Remove only the legacy units:
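The danger of the wildcard can be illustrated with plain file globbing (the unit names below are examples):

```shell
# Hedged illustration: "ceph-*.service" matches the cephadm-managed FSID unit
# as well as the legacy one, so only explicit legacy names are safe to remove.
tmp=$(mktemp -d)
touch "$tmp/ceph-mon@node1.service" \
      "$tmp/ceph-4b6863b2-0000-0000-0000-000000000000@node1.service"
ls "$tmp"/ceph-*.service | wc -l      # 2: the wildcard hits both units
ls "$tmp"/ceph-mon@*.service | wc -l  # 1: the explicit name hits only the legacy unit
rm -r "$tmp"
```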
Contributor:

We should probably mention on which nodes these commands are to be executed.


| ceph-ansible (before) | cephadm (after) |
|:----------------------------------|:----------------------------------------|
| `osism apply ceph-mons` | `ceph orch apply mon` |
Contributor:

What happens if a user issues an old `osism apply ceph-*` command out of habit? Ideally, running `osism apply ceph-mons` should now tell the user something like "Please run `ceph orch apply mon`", but the command looks like it runs as before. Maybe we should tell users whether this is dangerous and whether they can safely abort the command once they realize their mistake.

@ideaship (Contributor) commented Apr 9, 2026

For some reason, the Conversation page shows only about half of my review comments, the other half can be found in https://github.com/osism/osism.github.io/pull/957/changes (verified as anonymous user).

```bash
CEPH_RELEASE=$(docker inspect $(docker ps --filter "name=ceph" --format "{{.Names}}" | head -1) --format '{{.Config.Image}}' | cut -d: -f2)
curl --silent --remote-name --location https://raw.githubusercontent.com/ceph/ceph/${CEPH_RELEASE}/src/cephadm/cephadm.py
chmod +x cephadm.py
```
Member Author:

Add the missing `v` prefix here; Ceph git tags are `v<release>`: https://github.com/ceph/ceph/tags
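A sketch of the fix being suggested: prefix the release with `v` when building the raw URL, since Ceph's git tags carry a leading `v` while container image tags usually do not (the release value below is an example):

```shell
# Hedged sketch of the suggested fix: insert the "v" tag prefix into the URL.
CEPH_RELEASE="18.2.4"   # example; normally derived from the running container image
URL="https://raw.githubusercontent.com/ceph/ceph/v${CEPH_RELEASE}/src/cephadm/cephadm.py"
echo "$URL"
```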

```bash
ceph cephadm set-pub-key -i /data/id_rsa.operator.pub
rm /opt/cephclient/data/id_rsa.operator*
```

Member Author:

Maybe add here how to generate id_rsa.operator.pub if only id_rsa.operator exists.
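One way to do that is `ssh-keygen -y`, which prints the public key for a given private key (demonstrated here on a throwaway key; the paths are examples):

```shell
# Hedged sketch: regenerate a missing .pub from the private key with
# "ssh-keygen -y", shown on a freshly generated throwaway key.
tmp=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$tmp/id_rsa.operator"
rm "$tmp/id_rsa.operator.pub"                         # simulate the missing .pub
ssh-keygen -y -f "$tmp/id_rsa.operator" > "$tmp/id_rsa.operator.pub"
head -c 11 "$tmp/id_rsa.operator.pub"; echo           # ssh-ed25519
rm -r "$tmp"
```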

Or use a loop to register all Ceph nodes at once:

```bash
for node in $(osism get hosts -l ceph | awk 'NR>3 && /\|/ {print $2}'); do
```
Member Author:

Check first if osism get hosts is usable (it's a pretty new command).

3 participants