[RFC-0012] External Artifact API #5292

Merged: stefanprodan merged 7 commits into main from rfc-external-artifact on Sep 3, 2025

@stefanprodan
Member

@stefanprodan stefanprodan commented Apr 8, 2025

This RFC proposes the introduction of a new API called ExternalArtifact that would allow 3rd party controllers to act as a source of truth for the cluster desired state. In effect, the ExternalArtifact API acts as an extension of the existing source.toolkit.fluxcd.io APIs that enables Flux kustomize-controller and helm-controller to consume artifacts from external source types that are not natively supported by source-controller.

Preview: https://github.com/fluxcd/flux2/blob/rfc-external-artifact/rfcs/0012-external-artifact/README.md

@stefanprodan stefanprodan added the area/rfc Feature request proposals in the RFC format label Apr 8, 2025
@stefanprodan stefanprodan force-pushed the rfc-external-artifact branch 5 times, most recently from 16ba533 to 9751327 (April 8, 2025 11:49)

@jakobmoellerdev jakobmoellerdev left a comment

Hey there 👋 Thanks for sharing this in Draft State, cool to see such a development!
(FYI @Skarlso @fabianburth who also worked on the proposal)

This looks very similar to the original https://github.com/openfluxcd / #5058 proposal, with the main difference being that you use ExternalArtifact instead of an arbitrary SourceRef.

I am wondering how this would be different to having a CRD that pulls OCI artifacts and then referencing those...

Maybe it's even worth thinking about standardizing ExternalArtifacts completely on OCI? I think we talked about this in the Flux community call when the original openflux proposal was discussed.

The main drawback we mentioned back then still persists:
If we have an ExternalArtifact, Flux will always need to reference ExternalArtifact, and users will not be able to use their origin CRDs. In your example, release-controller would work on a GitHubRelease, but the Kustomization would still reference an ExternalArtifact.

At this point, why not think about having release-controller directly download OCIArtifacts, expose a registry endpoint, and then use the already available Flux types?

This is ultimately what caused us to not follow up on the design.

If there are cases when working with an OCIRegistry is not desired, or unfeasible (e.g. working with SFTP and converting that to OCI Artifacts may be unfeasible / undesirable), then I still see value in this CRD even with the indirection in place!

Looking forward to seeing where this goes! 🎉

@stefanprodan
Member Author

stefanprodan commented Apr 8, 2025

> If we have an ExternalArtifact, Flux will always need to reference ExternalArtifact, and users will not be able to use their origin CRDs.

@jakobmoellerdev an arbitrary sourceRef would imply opening up RBAC and the Flux CRD to unknown kinds, which would imply a v2 of Flux Kustomization and a v3 of HelmRelease. With ExternalArtifact the Flux RBAC does not change, as Flux controllers already have all the permissions needed for the source.toolkit.fluxcd.io APIs. Instead of dropping the allow list in the Flux CRDs, we will just add another owned Kind to it, which does not change the security stance of Flux.

> At this point, why not think about having release-controller directly download OCIArtifacts, expose a registry endpoint, and then use the already available Flux types?

Hosting an OCI registry inside the cluster and pushing content to it from a controller is far more challenging and involved than embedding a Go file server in a custom controller. Given that source-controller contains a storage implementation that can easily be copied into any other controller, the effort to expose artifacts to Flux is significantly lower. If running a registry inside the cluster is feasible for you, then using OCIRepository is the right way to go. This RFC offers an alternative solution which IMO is easier to implement for 3rd party controllers.
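
To make the "Go file server" option concrete, here is a minimal sketch using only the standard library. It is not the source-controller storage implementation: the file name `latest.tar.gz`, the helper names, and the assumption that the consumer is handed a URL plus a `sha256:` digest are all illustrative.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// digestOf returns the payload digest in "sha256:<hex>" form (assumed format).
func digestOf(data []byte) string {
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// fetchAndVerify downloads an artifact and rejects it if the digest differs.
func fetchAndVerify(url, wantDigest string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if got := digestOf(data); got != wantDigest {
		return nil, fmt.Errorf("digest mismatch: got %s, want %s", got, wantDigest)
	}
	return data, nil
}

// roundTrip plays both roles: a hypothetical 3rd party controller that writes
// an artifact to local storage and serves it over HTTP, and a consumer that
// downloads it and checks the advertised digest.
func roundTrip(payload []byte) ([]byte, error) {
	dir, err := os.MkdirTemp("", "artifacts")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "latest.tar.gz"), payload, 0o644); err != nil {
		return nil, err
	}
	// A test server stands in for the file server embedded in the controller.
	srv := httptest.NewServer(http.FileServer(http.Dir(dir)))
	defer srv.Close()
	return fetchAndVerify(srv.URL+"/latest.tar.gz", digestOf(payload))
}

func main() {
	data, err := roundTrip([]byte("example artifact bytes"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("fetched %d bytes, digest verified\n", len(data))
}
```

A real controller would serve from a persistent directory and publish the URL and digest in the object status; the sketch only illustrates how little code the serving side needs.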

@stealthybox
Member

stealthybox commented May 23, 2025

> an arbitrary sourceRef would imply opening up RBAC and the Flux CRD to unknown kinds, which would imply a v2 of Flux Kustomization and a v3 of HelmRelease. With ExternalArtifact the Flux RBAC does not change

Controller authors who want to extend Flux can add to a Flux controller's RBAC themselves with ClusterRole aggregation.
There may be other reasons why we want a shim API within our own namespace (maybe so Notification controller only has to watch a single Kind of list), but RBAC may not be a sufficient reason on its own.

There could be separate security concerns with the code of multiple controller authors all having write access to the same status objects of a shared shim API.
ex: if ControllerA for SourceTypeA writes to the status of ExternalArtifacts for SourceTypeB, a flux controller could be confused into reconciling an incorrect or malicious set of resources.
There may be ways to prevent that sort of issue with admission on UserInfo using new CEL features in Kubernetes, but we don't currently do anything like that, and it would have to be configured per-controller and likely distributed by the add-on author themselves.

@matheuscscp
Member

> There may be other reasons why we want a shim API within our own namespace (maybe so Notification controller only has to watch a single Kind of list), but RBAC may not be a sufficient reason on its own.

Not sure what you mean by "NC watching a single kind of list", but NC does not watch any resources 🤔 The other Flux controllers send the event payload to NC via HTTP and external webhooks cause the Receiver controller to annotate the Flux resources for requesting out-of-schedule reconciliations. But all of that can be kustomized to work with Flux-external types by just patching the CRDs and RBAC like we do for flux-operator, no code changes are required:

https://github.com/controlplaneio-fluxcd/flux-operator/blob/07a37e692ed2d7ee6b16cd3c8f8231a134a9fbdc/internal/builder/profiles.go#L65-L138

External source controller authors must document how to do this 👆 for their controllers/CRDs.

> There could be separate security concerns with the code of multiple controller authors all having write access to the same status objects of a shared shim API. ex: if ControllerA for SourceTypeA writes to the status of ExternalArtifacts for SourceTypeB, a flux controller could be confused into reconciling an incorrect or malicious set of resources.

Not sure I follow this example. I think the proposal here is that ControllerA for SourceTypeA will create an ExternalArtifact object with ownerRef/controllerRef set to an object of SourceTypeA, and only ControllerA will manage this ExternalArtifact object, no other controllers will. Each external controller will manage its own source types and ExternalArtifact object instances for its source types. If you install a malicious controller in the cluster that messes with it, I think that's on you, and that's the risk of opening up Flux for external source controllers regardless of particular details of the implementation we choose?
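
The ownership convention described above can be sketched in plain Go. The types below are simplified, hypothetical stand-ins for Kubernetes objects and owner references, not the Flux API, and `SFTPSource` is an invented kind used only for contrast with the `GitHubRelease` example from earlier in the thread.

```go
package main

import "fmt"

// OwnerRef is a simplified stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	Kind       string
	Name       string
	Controller bool // true for the single managing (controller) owner
}

// ExternalArtifact is a hypothetical, trimmed-down view of the object.
type ExternalArtifact struct {
	Name   string
	Owners []OwnerRef
}

// controllerOwner returns the owner marked as controller, if there is one.
func controllerOwner(a ExternalArtifact) (OwnerRef, bool) {
	for _, o := range a.Owners {
		if o.Controller {
			return o, true
		}
	}
	return OwnerRef{}, false
}

// mayManage reports whether a controller responsible for sourceKind should
// treat the artifact as its own: only when the controller owner is of that kind.
func mayManage(a ExternalArtifact, sourceKind string) bool {
	owner, ok := controllerOwner(a)
	return ok && owner.Kind == sourceKind
}

func main() {
	ea := ExternalArtifact{
		Name:   "release-v1",
		Owners: []OwnerRef{{Kind: "GitHubRelease", Name: "app", Controller: true}},
	}
	fmt.Println("GitHubRelease controller may manage:", mayManage(ea, "GitHubRelease"))
	fmt.Println("SFTPSource controller may manage:", mayManage(ea, "SFTPSource"))
}
```

Note that this is a convention a well-behaved controller follows, not enforcement: nothing here stops a misbehaving controller from writing to the status anyway, which is why the admission-policy idea raised earlier in the thread is a separate concern.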

@matheuscscp
Member

I think maybe a strong reason to choose an ExternalArtifact CRD is that we can easily watch it in kustomize-controller. I heard dynamic watches are possible for unknown types, but we are also indexing the managed Kustomizations by source type during manager setup, etc. Not sure how complex/doable/maintainable all of that would be for dynamically watching unknown types.
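
As a rough illustration of the indexing concern, the sketch below builds the kind of lookup that lets a controller answer "which Kustomizations depend on this source object?". It is a plain-Go stand-in with hypothetical types and names, not the kustomize-controller code or the controller-runtime indexer.

```go
package main

import "fmt"

// SourceRef identifies a source object; it is comparable, so it can be a map key.
type SourceRef struct {
	Kind      string
	Namespace string
	Name      string
}

// Kustomization is a hypothetical, trimmed-down view of the Flux object.
type Kustomization struct {
	Name   string
	Source SourceRef
}

// indexBySource groups Kustomization names by the source they reference, so a
// change event on one source can be mapped to the objects that must reconcile.
func indexBySource(ks []Kustomization) map[SourceRef][]string {
	idx := make(map[SourceRef][]string)
	for _, k := range ks {
		idx[k.Source] = append(idx[k.Source], k.Name)
	}
	return idx
}

func main() {
	ea := SourceRef{Kind: "ExternalArtifact", Namespace: "apps", Name: "release"}
	git := SourceRef{Kind: "GitRepository", Namespace: "apps", Name: "infra"}
	idx := indexBySource([]Kustomization{
		{Name: "frontend", Source: ea},
		{Name: "backend", Source: ea},
		{Name: "infra", Source: git},
	})
	fmt.Println("dependents of the ExternalArtifact:", idx[ea])
}
```

With a fixed set of kinds, the watches and indexes can be registered once at manager setup; with arbitrary kinds, the set of watched types would itself have to change at runtime, which is the complexity the comment is pointing at.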

@matheuscscp
Member

Actually, thinking again, I think I understand your concern. By accepting kinds that are external to Flux directly in .sourceRef, cluster admins get fine-grained control over which kinds are allowed in the Kustomization CRD, while with a single built-in kind ExternalArtifact this fine-grained control is not possible. I wonder, however, how much protection this really brings. For example, if a malicious controller starts patching the status of a CRD object whose kind is allowed in the Kustomization CRD, I suppose the honest controller will start fighting the malicious controller and reconcile the object, then the malicious controller will reconcile again, and so on. So is the protection we're providing here actually just this DDoS on the objects? 🤔

@stefanprodan stefanprodan force-pushed the rfc-external-artifact branch 4 times, most recently from 1eb7ac3 to b71b762 (July 15, 2025 14:41)
@phoban01

cc @mandelsoft 🙂

@stefanprodan stefanprodan force-pushed the rfc-external-artifact branch 4 times, most recently from be39788 to a9411d1 (July 15, 2025 17:21)
@stefanprodan
Member Author

stefanprodan commented Jul 15, 2025

@matheuscscp @stealthybox I've added a policy example for protecting a cluster against malicious tenants trying to hijack ExternalArtifacts.

@stefanprodan stefanprodan force-pushed the rfc-external-artifact branch from a9411d1 to d5ef363 (July 15, 2025 17:50)
@stefanprodan stefanprodan marked this pull request as ready for review July 15, 2025 18:23
@stefanprodan stefanprodan requested a review from a team July 15, 2025 18:23
Comment thread rfcs/0000-external-artifact/README.md
@stefanprodan stefanprodan force-pushed the rfc-external-artifact branch from d5ef363 to 95b38b1 (July 15, 2025 20:25)
Member

@matheuscscp matheuscscp left a comment

LGTM


@ebourgeois ebourgeois left a comment

LGTM and it seems like a key part to some work I am looking to do!

cc: @monadic , @bgrant0607

@bgrant0607

The source controller seems like the most needed extensible integration point in Flux. I'd even put other configuration format renders as lower priority, because other kinds of automation, notably CI tools, can perform rendering (c.f., the rendered manifest pattern). Being able to pull configuration from arbitrary repositories, cache the files persistently, and then apply from the unpacked files would open up new possibilities.

@matheuscscp
Member

> The source controller seems like the most needed extensible integration point in Flux. I'd even put other configuration format renders as lower priority, because other kinds of automation, notably CI tools, can perform rendering (c.f., the rendered manifest pattern). Being able to pull configuration from arbitrary repositories, cache the files persistently, and then apply from the unpacked files would open up new possibilities.

The ideal solution for artifacts produced in CI is OCIRepository. The use case of this RFC is for when using OCIRepository is impossible.

@ebourgeois

> > The source controller seems like the most needed extensible integration point in Flux. I'd even put other configuration format renders as lower priority, because other kinds of automation, notably CI tools, can perform rendering (c.f., the rendered manifest pattern). Being able to pull configuration from arbitrary repositories, cache the files persistently, and then apply from the unpacked files would open up new possibilities.
>
> The ideal solution for artifacts produced in CI is OCIRepository. The use case of this RFC is for when using OCIRepository is impossible.

That is correct, that is what we are thinking and needing.

@stefanprodan
Member Author

stefanprodan commented Jul 31, 2025

@bgrant0607 @ebourgeois are you using some kind of proprietary storage, so that you can't run flux push artifact in your system? I'm trying to understand why you would need a 3rd-party source instead of using a container registry.

@bgrant0607

@stefanprodan @matheuscscp

Correct, we have other storage. A more generic HTTPS client than the S3 implementation, one that supported more general authentication methods, might also work.

Another issue is that we're breaking up configuration artifacts into more granular pieces and haven't found an acceptable way to package and deploy them as OCI artifacts for Flux. It doesn't look like Flux supports applying multiple layers from a single image or watching a whole repo prefix, for instance.

S3 bucket is better in that regard because they can be pushed individually but reconciled as a group. S3 has other issues, though. For instance, I only just started to look at it, but I didn't see a way to sign artifacts and verify signatures.

We need to pull groups of files from our storage, persistently cache the files, and verify signatures before applying them.
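
The "verify signatures before applying" requirement can be sketched as an all-or-nothing check over a group of files. The example below uses Ed25519 from the Go standard library purely for illustration; the thread does not say which signing scheme is in use, and the `signedFile` type and helper names are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// signedFile pairs a file's content with a detached signature over it.
type signedFile struct {
	Name string
	Data []byte
	Sig  []byte
}

// newDemoKeys generates a throwaway key pair for the example.
func newDemoKeys() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// sign produces a signedFile; in practice this would happen on the publisher side.
func sign(priv ed25519.PrivateKey, name string, data []byte) signedFile {
	return signedFile{Name: name, Data: data, Sig: ed25519.Sign(priv, data)}
}

// verifyGroup returns an error if any file in the group fails verification,
// so that the group is only applied when every file checks out.
func verifyGroup(pub ed25519.PublicKey, files []signedFile) error {
	for _, f := range files {
		if !ed25519.Verify(pub, f.Data, f.Sig) {
			return fmt.Errorf("signature check failed for %s", f.Name)
		}
	}
	return nil
}

func main() {
	pub, priv := newDemoKeys()
	files := []signedFile{
		sign(priv, "a.yaml", []byte("a: 1")),
		sign(priv, "b.yaml", []byte("b: 2")),
	}
	if err := verifyGroup(pub, files); err != nil {
		fmt.Println("refusing to apply:", err)
		return
	}
	fmt.Println("all signatures valid, safe to apply", len(files), "files")
}
```

In an external source controller, a check like this would presumably run after the files are fetched and cached and before the artifact is exposed to Flux.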

@matheuscscp
Member

> Correct, we have other storage.

Where do you store the container images of your apps?

@matheuscscp
Member

> S3 bucket is better in that regard because they can be pushed individually but reconciled as a group.

Not sure I consider this an advantage

@bgrant0607

> Where do you store the container images of your apps?

This is not a point you need to debate.
https://itnext.io/advantages-of-storing-configuration-in-container-registries-rather-than-git-b4266dc0c79f

We have N files that are updated independently. We want O(1) configuration complexity in the deployment mechanism. Right now that has N^2 cost to update if we pack them into a single OCI image but update them independently. If we could add them as layers, this would not be an issue.
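
One reading of the N^2 figure, worked through as a small simulation: in a round where each of the N files changes once, repacking everything into a single image re-pushes all N files per change, while per-file (or per-layer) pushes move only the changed file. The function name and the cost model (counting files pushed) are assumptions made for illustration.

```go
package main

import "fmt"

// filesPushed counts how many files are uploaded during one round in which
// each of n files is updated once, independently of the others.
//   - perFile == false: every update repacks and pushes all n files (single image)
//   - perFile == true:  every update pushes only the file that changed
func filesPushed(n int, perFile bool) int {
	total := 0
	for update := 0; update < n; update++ {
		if perFile {
			total++
		} else {
			total += n
		}
	}
	return total
}

func main() {
	for _, n := range []int{10, 100} {
		fmt.Printf("n=%d single image: %d files pushed, per file: %d files pushed\n",
			n, filesPushed(n, false), filesPushed(n, true))
	}
}
```

Under this model the single-image approach grows quadratically with the number of files, while independent pushes grow linearly.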

We could cache them in some other storage and push a new image after they are all updated, but at that point we don't need the OCI repository. The history is redundant and a management burden. We're trying to remove dominos from the Rube Goldberg Machine.

I have been an open-source maintainer and understand the burden of accepting new features. Good extension APIs reduce maintainer burden by pushing whole categories of features into the ecosystem. Honestly with the toolkit approach of Flux v2 I was surprised to find that pluggable source controllers weren't supported already.

We could use an alternative to Flux, but since there was a proposal that appeared to address our needs, I wanted to support it and find out if it was going anywhere. I'm not attached to this specific proposal, however.

Member

@cappyzawa cappyzawa left a comment

LGTM

Member

@matheuscscp matheuscscp left a comment

Nice additions

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
@stefanprodan stefanprodan changed the title from "[RFC] External Artifact API" to "[RFC-0012] External Artifact API" on Sep 2, 2025
Comment thread rfcs/0012-external-artifact/README.md
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
Member

@souleb souleb left a comment

LGTM!

@stefanprodan stefanprodan merged commit 6125991 into main Sep 3, 2025
4 checks passed
@stefanprodan stefanprodan deleted the rfc-external-artifact branch September 3, 2025 15:40
9 participants