
feat: add registry role for disconnected deployment #866

Draft
fabiendupont wants to merge 9 commits into seapath:main from fabiendupont:feat/add-registry-role

Conversation

@fabiendupont

@fabiendupont fabiendupont commented Feb 18, 2026

The current disconnected setup embeds container images at OS build time (e.g. via build_debian_iso), which works well for initial deployment. However, day-2 operations — upgrading Ceph, rolling out new container images, or adding services — require either repackaging the ISO or manually transferring images to each node. A local registry provides a persistent, updatable image source that's independent of the installation media, and aligns with Ceph's recommended approach for isolated environments.

This commit introduces a registry role that deploys docker.io/registry:v2 and allows importing images from the internet (pull) or from an exported tarball (load). The seapath_setup_disconnected.yaml playbook installs the registry on the Ansible control node as a singleton.
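A minimal sketch of what the two import modes could look like as Ansible tasks (registry_import_mode, registry_images, and registry_tarball_path are illustrative names, not the role's actual interface):

```yaml
# Hypothetical sketch of the role's two import modes; variable names are illustrative.
- name: Pull images from the internet (connected host)
  containers.podman.podman_image:
    name: "{{ item }}"
  loop: "{{ registry_images }}"
  when: registry_import_mode == "pull"

- name: Load images from an exported tarball (disconnected host)
  containers.podman.podman_load:
    input: "{{ registry_tarball_path }}"
  when: registry_import_mode == "load"
```

Either way, the imported images are then pushed into the local registry so they survive independently of the installation media.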

TLS is enabled by default: the registry auto-generates a self-signed CA and server certificate when no user-provided certs are given. The CA is distributed to all cluster nodes so they trust the registry over HTTPS. The registry listens on port 443 to avoid specifying the port in image names.
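The self-signed chain can be sketched with openssl along these lines (file names and subjects are illustrative; the role's actual generation logic may differ):

```shell
# Sketch: create a CA, then a server certificate signed by it.
# File names and subjects are illustrative, not the role's defaults.
openssl req -x509 -newkey rsa:4096 -nodes -days 3650 \
  -subj "/CN=SEAPATH Registry CA" \
  -keyout ca.key -out ca.crt

openssl req -newkey rsa:4096 -nodes \
  -subj "/CN=registry.local" \
  -keyout registry.key -out registry.csr

openssl x509 -req -in registry.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 825 -out registry.crt

# ca.crt is what gets distributed to the cluster nodes' trust store
```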

The *_physical_machine roles are updated to use that registry as a mirror, which doesn't require changing image names, for both Docker and Podman. They install the registry CA certificate in certs.d and set insecure = false when TLS is enabled.
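For reference, the mirror wiring in /etc/containers/registries.conf looks roughly like this (a sketch assuming TLS is enabled; registry.example.lan is a placeholder for the registry host):

```toml
# Sketch: docker.io resolved through the local mirror, TLS verified
[[registry]]
location = "docker.io"
insecure = false

[[registry.mirror]]
location = "registry.example.lan"
insecure = false
```

Because the mirror is transparent, playbooks and specs keep referring to docker.io/... and quay.io/... names unchanged.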

The cephadm role is updated to remove image management, which is now handled by the registry role, so cephadm is focused on Ceph cluster management.

Contributes to #442

@insatomcat
Member

insatomcat commented Feb 21, 2026

Thanks for the PR, this is an interesting and well-structured proposal 👍

A few points I’d like to clarify and discuss.


1️⃣ Fully disconnected is already possible in the current setup

In the current implementation, it is possible to be fully disconnected, provided that images are made available at OS installation time.

For example, with build_debian_iso on Debian:

  • When the ISO is built (with internet access), required container images are loaded into the ISO.
  • During installation (without internet), those images are deployed locally.
  • No external pull is required afterward at the OS level.

I assume a similar approach is feasible for:

  • Red Hat Enterprise Linux–like distributions (at ISO/image build stage),
  • or Yocto-based images (embedding container images at image generation time).

So strictly speaking, the setup is not inherently “internet-dependent” if the images are preloaded properly.


2️⃣ The real issue: cephadm’s pull behavior

The actual difficulty is not the base OS installation, but the behavior of cephadm.

Even if images are already present locally:

  • The bootstrap command allows skipping certain pulls.
  • However, later lifecycle events (deploying osd, mon, mgr, etc.) still trigger a podman pull check from the cephadm mgr.

see https://marc.info/?l=ceph-users&m=164399318917018

To be truly disconnected, we therefore need:

  • Either a local registry on each node (current setup),
  • Or a central registry (as proposed in the PR).

Before deciding on registry topology, I would really like to confirm something:

Is there absolutely no way to completely skip the podman pull check that cephadm performs when deploying components?

If such an option exists (or could exist), we could:

  • Preload all images at OS installation time (as done with build_debian_iso),
  • Avoid any registry entirely,
  • And remain fully disconnected without additional infrastructure.

Right now, the registry requirement seems to stem from cephadm enforcing the pull validation step.

If you have more information on whether this behavior is configurable or patchable, that would be very helpful.


3️⃣ Registry location: node-local vs controller-based

Regarding the architectural choice:

  • Current approach: registry on each node.
  • PR proposal: single registry on the Ansible controller.

Both are technically valid trade-offs:

  • Node-local registry → more autonomous nodes, no central dependency.
  • Controller-based registry → simpler, more resource-efficient, centralized management.

From my perspective, either:

  • The PR supports both models and lets the user choose,
  • Or we align on a community-level decision about the preferred architecture.

But I think we should make that decision explicitly rather than implicitly switching models.


Summary

  • Fully disconnected installs are already achievable if images are embedded at OS build time.
  • The real blocker is cephadm’s pull behavior.
  • If we could completely disable pull checks, we might not need a registry at all.
  • Otherwise, we need to consciously decide between distributed vs centralized registry architecture (or support both).

Looking forward to your feedback, especially regarding cephadm’s pull enforcement.

@fabiendupont
Author

Thanks for the detailed review and the questions.

On point 1 — Fully disconnected is already possible

You're right, and I should have been clearer about the motivation. The initial deployment is already covered by embedding images at OS build time (e.g. build_debian_iso). This PR offers an alternative approach and addresses day-2 operations: upgrading Ceph, rolling out new container images, or adding services currently requires either repackaging the ISO or manually transferring images to each node. A registry provides a persistent, updatable image source that's independent of the installation media.

I've updated the commit message and PR description to reflect this.

On point 2 — Cephadm's pull behavior

Good question. From what I could find, cephadm bootstrap does have a --skip-pull flag, but it only covers the bootstrap step itself — the mgr module may still attempt pulls during subsequent daemon operations. There's also mgr/cephadm/use_repo_digest (see ceph/ceph#50311) which can reduce pull attempts when images are already local.

That said, Ceph's own documentation for isolated environments points toward using a local registry as the supported path. A preload-only approach may work in practice, but registries are still predominant in the container space.

With this PR, we add an alternative and follow Ceph's documentation for disconnected environments.

On point 3 — Registry topology

Supporting both models makes sense. The registry role as written is already fairly decoupled — it deploys a registry wherever you point it. Making it work as either a centralized controller-based registry or a per-node local registry would mainly be a matter of inventory configuration and playbook targeting.

One argument for a centralized registry is that it doesn't become a noisy neighbor on cluster nodes, which already need to carve out resources for Ceph itself, Pacemaker, etc., reducing the resources available for vIEDs.

@fabiendupont fabiendupont force-pushed the feat/add-registry-role branch from 5799123 to 043937e on February 24, 2026 08:21
@insatomcat
Member

Thanks for the clarification and for updating the commit message — I agree that the day-2 operations aspect (Ceph upgrades, new images, additional services) is a valid motivation for introducing a registry.

That said, my concern is not only about the description, but about the scope and positioning of the PR.

With the current implementation, we are already able to support a fully disconnected deployment by embedding container images at OS build time (e.g. via build_debian_iso). The registry is therefore not a prerequisite for “disconnected deployment”, but rather an additional mechanism that improves operational flexibility for day-2.

In this PR, we are not just adding the option of running a registry on the Ansible control node — we are also:

  • Introducing a new seapath_setup_disconnected.yaml playbook
  • Introducing a dedicated seapath-cluster-disconnected.yaml inventory
  • Adding a full "SEAPATH Disconnected Deployment Guide"

This effectively reframes the disconnected model around the registry-based approach, whereas in reality:

  • Disconnected deployment is already possible without a persistent registry.
  • The registry on the nodes during installation is temporary.
  • A persistent registry on the controller is an optional architectural choice for day-2 convenience.

I think the PR would be clearer and more aligned with the existing design if it focused strictly on:

Adding the possibility to deploy a persistent registry on the Ansible controller, and letting the user choose whether to use:

  • preloaded images only (current model), or
  • a persistent local registry for day-2 operations.

The documentation could then explain:

  • The two approaches (embedded images vs persistent registry),
  • Their respective pros and cons,
  • The lifecycle implications (installation-time vs day-2),
  • How to enable either model via the inventory.

In other words, I believe this should be presented as an optional enhancement to the existing disconnected strategy, not as a new disconnected deployment model.

Let me know what you think.

@eroussy
Member

eroussy commented Mar 10, 2026

Hi @fabiendupont

I agree with insatomcat here, that this CEPH container deployment method should come along with the previously available embedded containers on SEAPATH.
This would imply

  • Creating a variable like setup_ansible_cephadm_registry to control whether we create that registry or not. It could default to false.
  • Removing your seapath-cluster-disconnected.yaml inventory. The variables must be documented in the associated roles, or in the classic seapath-setup-cluster.yaml inventory if they are necessary.
  • Removing the seapath_setup_disconnected.yaml playbook. The registry setup role should be applied directly in the seapath_setup_main.yaml playbook if the user chooses to set up this registry (setup_ansible_cephadm_registry).
  • Making all the registry tasks in the physical machine roles optional (setup_ansible_cephadm_registry).
  • Making the registry part of the cephadm role optional (setup_ansible_cephadm_registry).
  • Moving the DISCONNECTED_DEPLOYEMENT documentation into the cephadm role and the SEAPATH wiki.
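Under that proposal, the inventory toggle could look like this (variable name taken from the list above; a sketch only, not the final interface):

```yaml
all:
  vars:
    # Proposed opt-in registry setup, disabled by default
    setup_ansible_cephadm_registry: false
```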

What do you think?
Can you do that? Otherwise, I have some available time in the upcoming days to work on that.

@fabiendupont
Author

@insatomcat, @eroussy, sorry for the delay. I had to deal with other projects.
I am out-of-office the next 10 days, so if you have time to propose an alternative, feel free.

@eroussy
Member

eroussy commented Mar 17, 2026

After some consideration, I don't think it is SEAPATH's role to provide full management of a container registry on the Ansible machine.
Many scripts are provided (backup_registry, export_image, restore_registry ...) for registry management, but this should be the user's responsibility.

If the user wants a fully functional, manageable container registry, they should handle it themselves.
In SEAPATH, we should only provide a very basic registry to transmit the Ceph image to the hypervisors for users who don't already have a registry.
I will propose something in that direction, at least for the first implementation.

@eroussy eroussy force-pushed the feat/add-registry-role branch from 043937e to 8dd8d45 on March 19, 2026 14:46
@eroussy
Member

eroussy commented Mar 19, 2026

Here is my proposition for this subject.

After some discussions, we advise not to handle the registry part in SEAPATH at all. This is the user's concern, and we will explain to newcomers how to deploy one in the wiki.
I refactored the rest of the original code into roles and tested it on SEAPATH Yocto for now. I still need to do some tests before merging.

For the record, this is how I deployed the registry (insecure for now):

cqfd -b pull_ceph
podman pull docker.io/library/registry:2
sudo podman run --privileged -p 443:5000 -v ./files/registry_config.yml:/etc/docker/registry/config.yml:ro registry:2

podman tag quay.io/ceph/ceph:v20.2.0 <my-ip>:443/ceph/ceph:v20.2.0
podman push --tls-verify=false <my-ip>:443/ceph/ceph:v20.2.0

Then the registry config file I used:

version: 0.1
log:
  level: info
  fields:
    service: registry
storage:
  filesystem:
    rootdirectory: /var/lib/registry
  cache:
    blobdescriptor: inmemory
  delete:
    enabled: true
http:
  addr: 0.0.0.0:5000
  headers:
    X-Content-Type-Options: [nosniff]
    X-Frame-Options: [DENY]
    X-XSS-Protection: [1; mode=block]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3

Comment thread inventories/examples/seapath-cluster.yaml Outdated
eroussy and others added 9 commits March 30, 2026 15:29
Create a role to add a Podman registry mirror on the SEAPATH machine.
Add the ability to enable TLS certificates.

Add validate step and Molecule tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Fabien Dupont <fdupont@redhat.com>
Signed-off-by: Erwann Roussy <erwann.roussy@savoirfairelinux.com>
Add the capability to connect to an external registry to fetch the Ceph
image.
This is optional, but activated by default. If the associated variable is
activated, the images will be fetched from a localhost registry that
should pre-exist on the machine.

Also add a validate.yaml to check for variable definition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Fabien Dupont <fdupont@redhat.com>
Signed-off-by: Erwann Roussy <erwann.roussy@savoirfairelinux.com>
Integrate podman_registry_mirror into the main playbook.
It is only called if the registry URL is given; otherwise it falls back
to the local repository method.

Signed-off-by: Erwann Roussy <erwann.roussy@savoirfairelinux.com>
This is now the default method.
Cephadm deployment is also the default method now.

Signed-off-by: Erwann Roussy <erwann.roussy@savoirfairelinux.com>
Add a flavor to pull the ceph image and store it in files.

Signed-off-by: Erwann Roussy <erwann.roussy@savoirfairelinux.com>
Signed-off-by: Erwann Roussy <erwann.roussy@savoirfairelinux.com>
Correct a when condition that was also targeting ceph-ansible services.

Signed-off-by: Erwann Roussy <erwann.roussy@savoirfairelinux.com>
Rename the hostname and cluster_ip_addr variables to follow the role prefix
rule.
Link them to the global variables in the inventory as discussed in seapath#903

Also add a validation step for seapath_distro that needs to be set for
this role.

Signed-off-by: Erwann Roussy <erwann.roussy@savoirfairelinux.com>
Copilot AI review requested due to automatic review settings March 30, 2026 13:32
@eroussy eroussy force-pushed the feat/add-registry-role branch from 8dd8d45 to ef8bfc6 on March 30, 2026 13:32
@eroussy eroussy marked this pull request as draft March 30, 2026 13:32

Copilot AI left a comment


Pull request overview

This PR aims to improve disconnected/isolated deployments by introducing Podman registry mirroring configuration and adjusting Cephadm to work with registry-based image flows (including a localhost fallback), alongside inventory/playbook updates to wire these changes into the main setup flow.

Changes:

  • Add a new podman_registry_mirror role (with Molecule scenarios) to configure docker.io/quay.io mirroring and optional TLS CA installation.
  • Refactor cephadm role to validate required host vars, use cephadm_hostname/cephadm_ip_addr, and support a localhost registry mode via local_registry_mirroring.yml.
  • Add a Cephadm purge playbook and update example inventory / main setup playbook to enable the new flow.

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 9 comments.

Summary per file:

  • roles/podman_registry_mirror/tasks/validate.yml: Validates mirror inputs (URL + optional CA path).
  • roles/podman_registry_mirror/tasks/main.yml: Configures /etc/containers/registries.conf mirroring + CA installation under certs.d.
  • roles/podman_registry_mirror/defaults/main.yml: Adds default variables for the new role.
  • roles/podman_registry_mirror/meta/main.yml: Role metadata for Galaxy/Ansible.
  • roles/podman_registry_mirror/README.md: Documents usage and variables.
  • roles/podman_registry_mirror/molecule/**: Adds Molecule coverage for TLS and non-TLS scenarios.
  • roles/cephadm/templates/spec.yaml.j2: Switches OSD host targeting to cephadm_hostname.
  • roles/cephadm/tasks/validate.yml: Adds validation for required Cephadm host/global variables.
  • roles/cephadm/tasks/main.yml: Removes embedded image/registry management from the main flow; uses registry selection and localhost mirroring import.
  • roles/cephadm/tasks/local_registry_mirroring.yml: New helper tasks to run/push to a localhost registry for the Ceph image.
  • roles/cephadm/defaults/main.yml: Adds defaults for new Cephadm-related variables and the localhost registry toggle.
  • roles/cephadm/README.md: Updates documentation to reflect new required host vars and registry mode.
  • roles/ceph_expansion_lv/tasks/main.yml: Tightens service matching for cephadm OSD units.
  • playbooks/seapath_setup_podman_registry_mirror.yaml: New playbook to apply the Podman mirror role to hosts.
  • playbooks/seapath_setup_main.yaml: Wires the new mirror playbook into the main setup sequence and fixes naming.
  • playbooks/cluster_setup_cephadm.yaml: Sets cephadm_use_localhost_registry based on container_registry_url presence.
  • playbooks/purge_ceph_cephadm.yaml: New purge playbook for Cephadm-managed clusters.
  • inventories/examples/seapath-cluster.yaml: Updates example vars for cephadm + registry mirroring.
  • .gitignore: Ensures the new purge playbook isn't ignored.
  • .cqfdrc: Adds a helper flavor to pull and export a Ceph container image tarball.


Comment on lines +30 to +37
insecure = {{ 'false' if podman_registry_mirror_tls_enabled else 'true' }}
[[registry.mirror]]
location = "{{ podman_registry_mirror_url }}"
insecure = {{ 'false' if podman_registry_mirror_tls_enabled else 'true' }}

[[registry]]
location = "quay.io"
insecure = {{ 'false' if podman_registry_mirror_tls_enabled else 'true' }}

Copilot AI Mar 30, 2026


When TLS is disabled, this block sets insecure = true for the primary registries (docker.io / quay.io) as well as for the mirror. Marking upstream registries as insecure can cause Podman to allow plain HTTP / skip TLS verification when falling back to the upstream, which is a security regression. Keep upstream insecure as false and only toggle insecure on the mirror entry based on podman_registry_mirror_tls_enabled.

Suggested change

Before:
insecure = {{ 'false' if podman_registry_mirror_tls_enabled else 'true' }}
[[registry.mirror]]
location = "{{ podman_registry_mirror_url }}"
insecure = {{ 'false' if podman_registry_mirror_tls_enabled else 'true' }}
[[registry]]
location = "quay.io"
insecure = {{ 'false' if podman_registry_mirror_tls_enabled else 'true' }}

After:
insecure = false
[[registry.mirror]]
location = "{{ podman_registry_mirror_url }}"
insecure = {{ 'false' if podman_registry_mirror_tls_enabled else 'true' }}
[[registry]]
location = "quay.io"
insecure = false

Comment on lines +53 to +55
- name: Import seapath_setup_podman_registry_mirror playbook
import_playbook: seapath_setup_podman_registry_mirror.yaml


Copilot AI Mar 30, 2026


The PR description/title mention introducing a dedicated registry role (docker.io/registry:v2) with TLS on port 443 and CA distribution. In this change set, only a Podman mirror configuration role is added; there is no role/playbook that actually deploys a registry service. Either the registry deployment role is missing from the PR, or the PR description should be adjusted to match the delivered changes.

block: |
[[registry]]
insecure = true
location = "localhost"

Copilot AI Mar 30, 2026


The insecure registry entry is configured as location = "localhost", but the actual registry is used as localhost:5000/.... In registries.conf v2, the location must match the host:port to apply, otherwise Podman will still treat localhost:5000 as secure and push/pull can fail. Update the entry to use location = "localhost:5000" (or the appropriate port).

Suggested change

Before:
location = "localhost"

After:
location = "localhost:5000"

Comment thread roles/podman_registry_mirror/tasks/validate.yml
@@ -48,7 +48,6 @@ all:
ansible_remote_tmp: /tmp/.ansible/tmp
ansible_user: ansible
ip_addr: "{{ ansible_host }}"

Copilot AI Mar 30, 2026


This example inventory no longer defines the hostname hostvar (it used to default to inventory_hostname). However deploy_cephfs still expects hostvars[...]['hostname'] when building the monitor list, and cluster_setup_cephadm.yaml includes deploy_cephfs. Re-add hostname: "{{ inventory_hostname }}" here, or update the roles to stop depending on hostname.

Suggested change

Before:
ip_addr: "{{ ansible_host }}"

After:
ip_addr: "{{ ansible_host }}"
hostname: "{{ inventory_hostname }}"

node3:
vars:
# Registry for ceph container image on ansible machine
# You can remove thoses two variables if you are on Debian. It will fallback to ceph image embedded in SEAPATH Debian iso.

Copilot AI Mar 30, 2026


Typo in comment: "thoses" should be "those".

Suggested change

Before:
# You can remove thoses two variables if you are on Debian. It will fallback to ceph image embedded in SEAPATH Debian iso.

After:
# You can remove those two variables if you are on Debian. It will fallback to ceph image embedded in SEAPATH Debian iso.

Comment thread playbooks/purge_ceph_cephadm.yaml
Comment thread roles/cephadm/tasks/validate.yml
containers.podman.podman_image:
name: quay.io/ceph/ceph:v{{ cephadm_release }}
push: true
push_args: localhost:5000/ceph:v{{ cephadm_release }}

Copilot AI Mar 30, 2026


containers.podman.podman_image push is configured with push_args as a string destination. In current containers.podman this parameter expects a dict, with the destination provided via push_args.dest/destination. As-is, this push step is likely to fail and the localhost registry won't contain the Ceph image.

Suggested change

Before:
push_args: localhost:5000/ceph:v{{ cephadm_release }}

After:
push_args:
  dest: "localhost:5000/ceph:v{{ cephadm_release }}"

