
contribute debian #89

Merged: fenekku merged 10 commits into master from contribute_debian on Apr 2, 2026

@fenekku (Contributor) commented Mar 3, 2026

  • feat: contribute Debian Dockerfile
  • feat: add dependabot checking

This is to move the recent Docker discussions to more concrete territory. I've outlined some of the current limitations and areas where more thoughts are welcome. Maybe you got something else from those discussions, so please share/correct!

@mfenner (Contributor) left a comment

I would use trixie (debian 13) as the Debian version.

My Debian Dockerfile looks rather different for multiple reasons, mainly because it builds the finished image in one Dockerfile and is a multi-stage image. There is a long list of dev dependencies installed which are not needed in the final runtime image.
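A minimal sketch of the multi-stage pattern described above, assuming the Python dependencies are listed in a requirements.txt and that build-essential is the only build-time package (both are illustrative assumptions, not taken from either Dockerfile). The compilers exist only in the builder stage; the runtime stage receives just the virtualenv.

```dockerfile
# Illustrative sketch: requirements.txt and the package list are assumptions.
ARG PYTHON_VERSION=3.14

# Builder stage: compilers and headers needed to build wheels.
FROM python:${PYTHON_VERSION}-trixie AS builder
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential \
 && rm -rf /var/lib/apt/lists/*
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Runtime stage: slim base without compilers; only the virtualenv is copied in.
# Shared libraries that compiled extensions load at runtime must be added here.
FROM python:${PYTHON_VERSION}-slim-trixie AS runtime
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
```

Both stages use the official python images, which install the interpreter under /usr/local, so the virtualenv's interpreter links stay valid after the copy.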

Installing a single node version rather than nvm makes sense to me.

The Dockerfile for InvenioRDM Starter is at https://github.com/front-matter/invenio-rdm-starter/blob/main/Dockerfile

@Samk13 (Member) commented Mar 3, 2026

Thanks for this!
I have two broader alignment questions:

  1. Runtime vs. build tooling:
    Should the default foundational image we publish be strictly runtime-only (shipping just the virtualenv and built static artifacts), or should it also include build tooling like Node and uv for downstream customization?

In the Alpine variant, I moved all build tooling to a separate builder stage and copied only the artifacts into runtime. Do we want that to be the standard pattern across all OS images?

  2. Foundation image contract:
    Should we define a clear contract for what a "foundation image" guarantees and how it is meant to be extended?

If we support multiple base OS images (Alma, Debian, Alpine), is switching the FROM line intended to be sufficient, or do we need to explicitly define how downstream Dockerfiles are expected to integrate with each base image?

@Samk13 added this to v14 Mar 3, 2026

@mfenner (Contributor) commented Mar 3, 2026

@Samk13 good questions. I want multi-stage images because of image size and attack surface. One approach would be to provide the builder stage as the foundational image, and to build the custom runtime image based on that, as at least the invenio.cfg and site folder will be different.

The runtime image doesn't need dev dependencies, a package manager, pnpm or uv.
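One way to read the "builder stage as foundational image" idea, sketched under assumptions: inveniordm-base:debian is a hypothetical tag for such a published builder image, assumed to derive from the official python:3.14-trixie image and to ship a virtualenv at /opt/venv with InvenioRDM already installed; site/ is assumed to be an installable Python package; all paths are illustrative. Only the final stage becomes the runtime image, so the builder's tooling stays behind.

```dockerfile
# Hypothetical foundational builder image (name, tag and layout are assumptions).
FROM inveniordm-base:debian AS builder
WORKDIR /opt/invenio/src

# Instance-specific pieces that differ per installation.
COPY site ./site
COPY invenio.cfg /opt/invenio/var/instance/invenio.cfg

# Assumed: /opt/venv already holds the InvenioRDM dependencies.
RUN /opt/venv/bin/pip install --no-cache-dir ./site
# (Static assets built in this stage would be copied over the same way.)

# Runtime stage: a slim base without the builder's uv/pnpm/compiler tooling.
FROM python:3.14-slim-trixie AS runtime
COPY --from=builder /opt/venv /opt/venv
COPY --from=builder /opt/invenio/var/instance /opt/invenio/var/instance
ENV PATH="/opt/venv/bin:$PATH" \
    INVENIO_INSTANCE_PATH=/opt/invenio/var/instance
```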

@fenekku (Contributor, Author) commented Mar 4, 2026

If we pick one Python I would probably use 3.14.

and

I would use trixie (debian 13) as the Debian version.

Yes that makes sense. I will change it for those tomorrow 👍.

  1. Runtime vs. build tooling

My thinking: the foundational images should provide build tooling and let downstream do the last-mile "optimization" work. For instance, we install more packages, use those build tools to have the container build the assets and store them in a separate volume, and do some RUN --mount=type=ssh ... stuff and ssh-keyscan... Other institutions do other things. A foundational image that messes with the build tools or package artifacts (premature optimization steps) often breaks downstream.
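For readers unfamiliar with the RUN --mount=type=ssh pattern mentioned above, here is a minimal sketch. It assumes BuildKit and a build started with docker build --ssh default .; the repository example-org/private-module is a placeholder. The python:*-trixie images are built on buildpack-deps, which already includes git and the OpenSSH client.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.14-trixie

# Record GitHub's host key so the SSH clone below does not stop at a prompt.
RUN mkdir -p -m 0700 ~/.ssh \
 && ssh-keyscan github.com >> ~/.ssh/known_hosts

# The build host's SSH agent is forwarded for this single step only;
# no private key is written into any image layer.
# "example-org/private-module" is a placeholder repository.
RUN --mount=type=ssh \
    pip install --no-cache-dir "git+ssh://git@github.com/example-org/private-module.git"
```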

The cookiecutter Dockerfile for an InvenioRDM project could introduce a two-step build or other optimizations, since that Dockerfile will be in full control of the instance operator. I think that could be a good location for those because of locality of information and operator control. I see selecting the image as a step in the cookiecutter init, since the FROM and follow-up commands in the instance Dockerfile would depend on the OS. We talked about limiting those choices in a previous meeting, but since we should be removing at least the database choice, we still end up with limited decision-making by the installer.

  2. Foundation image contract:

Yes, exactly. This was brought up during the Docker session at the partner meeting as well and I tried to list these expectations in the README (but some alignment is needed for that to be clear). This way, whatever foundational image is chosen by an instance, the overall context is the same. Obviously there are also command differences across the different OSes, so the content change would be more than just the FROM line, but it should really just be "equivalences".
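To make the "equivalences" idea concrete, here is a sketch of the same downstream step written against two hypothetical foundational images (inveniordm-base:debian and inveniordm-base:almalinux are made-up tags; cairo is just an example system library). Only the FROM line and the package-manager command change between variants.

```dockerfile
# Variant for a Debian-based foundational image (hypothetical tag).
FROM inveniordm-base:debian
RUN apt-get update \
 && apt-get install -y --no-install-recommends libcairo2 \
 && rm -rf /var/lib/apt/lists/*
COPY invenio.cfg /opt/invenio/var/instance/invenio.cfg

# Equivalent for an AlmaLinux-based image (hypothetical tag): swap these lines in.
# FROM inveniordm-base:almalinux
# RUN dnf install -y cairo && dnf clean all
# COPY invenio.cfg /opt/invenio/var/instance/invenio.cfg
```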

@mfenner (Contributor) commented Mar 4, 2026

I am still not sure about the scope of this work:

  1. a well-documented Dockerfile following best practices that everyone can use and adapt, or
  2. a well-documented Dockerfile that generates a base image that every instance can then build on.

My practice over the last year, and my preference going forward, would be the former, with the Dockerfile in the CERN registry building an image that can be used out of the box (basically the demo server).

I think the major pain point is how to build, not the resources needed to build and/or store an image. The GitHub registry is working fine for me and integrates nicely with GitHub Actions. I build images both locally (macOS, but for the linux/amd64 platform) and with GitHub Actions without issues. I also see a security advantage if the base image is the official Debian Python image rather than an InvenioRDM base image.

@mfenner (Contributor) commented Mar 4, 2026

I would rephrase the README to not give the impression that this is primarily for Kubernetes and OpenShift.

@fenekku (Contributor, Author) commented Mar 4, 2026

  1. a well-documented Dockerfile following best practices that everyone can use and adapt, or
  2. a well-documented Dockerfile that generates a base image that every instance can then build on.

Within that lens, I guess I see it as docker-invenio's Dockerfile addressing 2. and cookiecutter-invenio-rdm addressing 1. (obviously best practices would be nice in both given that split structure).

From the workshop and talking to prospective adopters, the following has crystallized for me: early-stage adopters/prospectors want a running demo instance of InvenioRDM on a server accessible to their team. That's the target an initial documented install should strive for. An actual production-level instance and its tweaks would come after (and be wholly dependent on their context), which means they would need control over the last-mile Dockerfile + Docker Compose files. The Docker Compose file is that unit of runnable entity for me. Docker best practices do encourage splitting the different apps (at least celery and web server) like we do, so we are not too far off (and this repo currently covers InvenioILS too, so it can't be too specialized; not that we can't change that, but it would be a wider change).

@mfenner (Contributor) commented Mar 4, 2026

@fenekku thanks for the clarification. We can see where this leads us but for now the next steps are clear.

@fenekku (Contributor, Author) commented Mar 27, 2026

Thanks @mfenner @Samk13 for the comments. I've pushed some updates (small ones, but most of the work was testing all the downstream implications).

Notably, I've added uv, per the idea that it will be the recommended default for v14. Now, to be very clear: the container commands of invenio-cli do not currently support uv. execute_cli_command does not prefix container commands with uv, and/or the passed commands do not start with uv. To support it, we either need to make uv install the project/dependencies globally inside the container (like we used to do with pipenv, which I am fine with) or add a uv prefix in execute_cli_command with a little abstraction layer.
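A minimal sketch of the first option (letting uv install into the container's global interpreter so plain invenio commands work without a uv run prefix). It relies on uv's UV_PROJECT_ENVIRONMENT setting pointed at /usr/local, the prefix of the official python images, and assumes a pyproject.toml plus uv.lock in the build context; the ghcr.io/astral-sh/uv:0.11 tag mirrors the pin discussed later in this PR.

```dockerfile
FROM python:3.14-trixie

# Copy the uv binaries from the official uv image (tag pinned to a minor version).
COPY --from=ghcr.io/astral-sh/uv:0.11 /uv /uvx /bin/

WORKDIR /opt/invenio/src
COPY pyproject.toml uv.lock ./

# Make "uv sync" target the system interpreter prefix instead of a .venv,
# so installed console scripts land in /usr/local/bin on the default PATH.
ENV UV_PROJECT_ENVIRONMENT=/usr/local

# Install the locked dependencies only; the project itself is added later.
RUN uv sync --locked --no-dev --no-install-project
```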

So current status before merging this:

  • add pnpm if drop-in replacement can be done straightforwardly

Then (and not a blocker to merging this)

  • update invenio-cli and cookiecutter accordingly

I know Sam has inveniosoftware/cookiecutter-invenio-rdm#304. It's a bit intense 😅. Personally, I would first go for a PR in cookiecutter that lets the installer choose the base image (debian, almalinux...) and has the barebones Debian equivalent of what was there before. Another separate PR like Sam's can be done afterwards with more involved changes to the Dockerfile (we can have more conversation about that one then/there).

@Samk13 (Member) commented Mar 27, 2026

Thanks for the update @fenekku.
I would avoid making the installer choose between base images at this stage.

I think that adds a level of customization we do not want to expose yet, especially while the uv/pnpm workflow is still being aligned across the base image, cookiecutter, and invenio-cli.

My preference would be to converge on a single path first: merge the Debian image, use that as the baseline downstream, and stabilize the uv/pnpm setup there. Once that is working coherently end-to-end, we can revisit whether offering multiple base-image options is actually worth the added surface area.

IIRC, we agreed that pnpm is also the recommended default, so we can go with that here as well. I already have a PR for adopting pnpm in the CLI: inveniosoftware/invenio-cli#392

On my side, in the cookiecutter PR, I would rather simplify than widen scope: remove the Alpine-specific setup I introduced, import/adapt this Debian approach there, and continue from that single baseline for now.

@fenekku (Contributor, Author) commented Mar 30, 2026

Added pnpm to the Dockerfile + little fixes. I believe this concludes the changes for this PR (comment if otherwise). It should be good to be merged.

For the Debian-based cookiecutter choice and so on, I see these base images (e.g., Debian in this case) as a choice there in the cookiecutter. There would still be a default for those who don't know. But it would provide visibility on the supported base images and a complete developer experience (the Dockerfile there depends on the chosen base image). If we only provide one (e.g., almalinux), it makes the work here "hidden", or at least not as publicized to adopters as it should be. (But I see it as out of scope for this specific PR, so we can have that discussion elsewhere.)

@fenekku requested review from Samk13 and mfenner March 30, 2026 17:26

@mfenner (Contributor) left a comment

Looking good. Consider hard-coding PYTHON_VERSION to 3.14, as this will be fixed beyond v14. Consider Node 24 (LTS), as active support for 22 ended in December 2025. Consider avoiding the duplicate apt-get install step for nodejs npm, as discussed. Consider pinning the uv version.

Line 18 should say Debian trixie (not bookworm). ...

- [x] trixie in README
- [x] node 24
- [x] pinned uv to 0.11 as a compromise between automatically getting patches and stability
@fenekku (Contributor, Author) commented Mar 31, 2026

  • trixie
  • ❌ keeping the ARG PYTHON_VERSION is nice to be able to test multiple versions of python without editing the file again (and there is no real downside). Also not just for RDM for now.
  • node 24 (I have not tested 24 though, but I trust you did with no problem?)
  • duplicate apt-get install for nodejs npm: oh, what did I miss about this one? We need to install curl first, then run the setup_${NODE_VERSION}.x script, and only then install nodejs + npm. Since they are all part of the same layer, it should be fine too, no?
  • pinned uv to 0.11 (compromise to get patches without manual edits at least)
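The reasoning behind keeping ARG PYTHON_VERSION can be illustrated with a small sketch: an ARG declared before FROM supplies the default base-image tag, and another interpreter can be tried at build time without editing the file, e.g. docker build --build-arg PYTHON_VERSION=3.13 debian/.

```dockerfile
# Default Python for the image; override with --build-arg PYTHON_VERSION=...
ARG PYTHON_VERSION=3.14
FROM python:${PYTHON_VERSION}-trixie

# An ARG declared before FROM is not visible after it unless re-declared.
ARG PYTHON_VERSION
RUN echo "Building on Python ${PYTHON_VERSION}" && python --version
```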

@mfenner (Contributor) commented Mar 31, 2026

Minor point, but curl is included in python:3.14-trixie (https://hub.docker.com/layers/library/python/3.14-trixie/images/sha256-47df6219349e4a639f7c021ad13e63b2942429b4bc91df723d89ef9b229098b0), so we can shorten the apt-get install section. Other listed packages are probably also already included in the 602 installed packages, ca-certificates for example is.
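A sketch of what the shortened Node.js step could look like, assuming the NodeSource setup script and NODE_VERSION=24 as agreed above. Because curl and ca-certificates already ship with python:3.14-trixie, no preceding apt-get install is needed. NodeSource's nodejs package bundles npm, which is used here to install pnpm (one possible way to add pnpm, not necessarily what the merged Dockerfile does).

```dockerfile
ARG PYTHON_VERSION=3.14
FROM python:${PYTHON_VERSION}-trixie
ARG NODE_VERSION=24

# Make the RUN step fail if curl fails, not only if bash fails.
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# curl is already in the base image, so go straight to the NodeSource setup
# script, which configures the apt repository and refreshes the package index.
RUN curl -fsSL "https://deb.nodesource.com/setup_${NODE_VERSION}.x" | bash - \
 && apt-get install -y --no-install-recommends nodejs \
 && rm -rf /var/lib/apt/lists/*

# npm comes bundled with NodeSource's nodejs package.
RUN npm install --global pnpm
```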

@fenekku (Contributor, Author) commented Apr 1, 2026

Ok last changes made. Do a last check @Samk13 @mfenner and I am merging. We can always refine even more later (I may want to compare against slim at a later date).

@mfenner (Contributor) left a comment

LGTM

@Samk13 (Member) commented Apr 2, 2026

I tested this setup against the cookiecutter PR (inveniosoftware/cookiecutter-invenio-rdm#304), and the final image size is around 4.99 GB.

Using a multi-stage build approach (example: https://github.com/Samk13/cookiecutter-invenio-rdm/blob/d2e9e92c54396ad55002fd150dcb3aeeecf3c54d/{{cookiecutter.project_shortname}}/Dockerfile), I was able to reduce this to ~2.86 GB. There is likely still room for further optimization.

This reinforces the value of multi-stage builds discussed earlier, especially for keeping runtime images smaller.

Based on this, it may also be worth reconsidering whether maintaining an Alpine variant is necessary, though that decision probably belongs in a separate discussion in cookiecutter scope.

Co-authored-by: Sam Arbid <36583694+Samk13@users.noreply.github.com>
@fenekku (Contributor, Author) commented Apr 2, 2026

I will merge this, but I am putting the image through our full build "pipeline" today and testing it out more fully. So if anything else comes up, I will advise.

Thanks for the comments and contributions!

@fenekku (Contributor, Author) commented Apr 2, 2026

@Samk13


Ok good to know. For our instance, I build assets on a volume outside the image. The preliminary numbers I get (I still have to fully switch over with that final base image):

> docker images
my-debian-trixie-3.14:latest              ca99b0c64adc        1.4GB  # <-- this base image
prism-from-inveniordm-debian-local:latest 1bbe98bc96cc       2.57GB  # <-- my application image

@fenekku merged commit 0ff1dbc into master Apr 2, 2026
@github-project-automation (bot) moved this to To release 🤖 in v14 Apr 2, 2026

Labels: None yet

Projects: v14 (Status: To release 🤖)

3 participants