RFC: Full mTLS for Diego Container-to-Container Traffic #1437
Add draft RFC proposing implementation of full mutual TLS (mTLS) for container-to-container traffic in Diego. The proposal introduces a new HTTP-based listener on port 62443 alongside the existing TCP-based port (61443), providing a dual opt-in model for operators and app authors.

Key features:

- Phase 1: Server-side HTTP-based C2C mTLS with XFCC header forwarding
- Phase 2: Client-side egress proxy for automatic cert injection
- Phase 3: Full zero-trust app-to-app communication integration

Maintains backwards compatibility with existing deployments.
Use HTML `<br/>` tags instead of `\n` for line breaks in mermaid diagram labels to ensure proper rendering on GitHub and in markdown viewers.
> **BOSH Properties**:
>
> ```yaml
> containers.proxy.enable_egress_proxy:
> ```

Hey @rkoster, thanks a lot for raising this RFC! I was wondering whether the egress part could also be enabled independently of the C2C parts of this RFC. That is, if only `enable_egress_proxy` is enabled and an app developer sets the env var `CF_INSTANCE_MTLS_PROXY=http://127.0.0.1:61445` without the properties from steps 1 and 3, would the egress proxy add the instance certificate to the mTLS connection towards gorouter (or a TLS-terminating component in front of gorouter)?
In our setup, we do not enable C2C because `network.write` is not exposed to app developers. Yet there is demand for platform features that support authentication from AppA to AppB.
We might consider implementing a per-route feature in gorouter, as per rfc-0027, so that the client certificate data can be parsed on the platform side without additional logic in each app.
> - The receiving app gets an `X-Forwarded-Client-Cert` header with caller identity:
>
> ```
> X-Forwarded-Client-Cert: Hash=abc123;Subject="CN=instance-guid,OU=app:client-app-guid,OU=space:space-guid"
> ```

The haproxy-boshrelease and gorouter (not sure where exactly that logic lives) already include code to handle XFCC headers; we should consider how these should interact. Having all of them use the same header name but different formats seems like a bad idea (though that may already be the case for HAProxy and gorouter). My first thought was to add another header, set by Envoy, to indicate that this is app-to-app traffic, like `cf-app-to-app: 1`. This would then be used to select the right context in which to evaluate the provided metadata. For the values themselves, I feel that using the same format would be preferable; there's also quite some history to the format we currently use in HAProxy, as we had to go through incompatible changes when we learned that certificates allow all sorts of special characters that HTTP headers don't.
This is not a generic mTLS feature. The scope is limited to CF-generated identity certificates, so special characters should not be an issue. When traffic comes in via the static mTLS Envoy port, it must be from another app using an app instance identity cert.
Hi @rkoster, I really like this idea. I only wonder: if we implement this, how could a later iteration support bring-your-own certificates for C2C networking, where developers specify the CA, certificate, and key for both communicating parties? Have you given that any thought?
> When traffic comes in via the static mTLS Envoy port, it must be from another app using an app instance identity cert.

Right, but an app might receive outside mTLS traffic as well as app-to-app mTLS traffic. Depending on where the traffic comes from, different rules apply to what constitutes an authorized request, don't they?
Yes. When traffic comes from outside CF, it follows the existing route integrity path and gets the header set by gorouter. So different rules apply, based on different ports, for internal vs. external mTLS traffic.
@chombium, why would app developers want to bring their own certificates instead of relying on the certificates the platform already provides? I don't really understand the use case.
Also, please take a look at #1438, which was created based on the feedback on this PR.
Most apps have a single HTTP listener on port 8080 and don't know which Envoy port a request came in on, so they can't distinguish requests on that basis. I think @maxmoehl's suggestion of a second header makes sense.
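
To make the format question concrete, here is a minimal Go sketch of what a receiving app would have to do with the header value quoted above: pull the caller's app and space GUIDs out of the `Subject` and check them against an allowlist. It assumes a single XFCC element whose `Subject` contains no escaped commas, quotes, or semicolons, which holds for the CF-generated identity certs discussed here but not for arbitrary certificates. `CallerIdentity`, `parseXFCC`, `isAllowed`, and the `organization:` OU prefix are illustrative assumptions, not an existing API.

```go
package main

import (
	"fmt"
	"strings"
)

// CallerIdentity holds the claims carried in an instance identity cert Subject.
type CallerIdentity struct {
	InstanceGUID string // CN
	AppGUID      string // OU=app:<guid>
	SpaceGUID    string // OU=space:<guid>
	OrgGUID      string // OU=organization:<guid> (prefix is an assumption)
}

// parseXFCC parses a single-element X-Forwarded-Client-Cert value such as
// Hash=abc123;Subject="CN=instance-guid,OU=app:client-app-guid,OU=space:space-guid".
// It does not handle escaping, multiple elements, or arbitrary certificates.
func parseXFCC(header string) (CallerIdentity, error) {
	var id CallerIdentity
	subject := ""
	for _, pair := range strings.Split(header, ";") {
		key, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok {
			return id, fmt.Errorf("malformed XFCC pair %q", pair)
		}
		if strings.EqualFold(key, "Subject") {
			subject = strings.Trim(value, `"`)
		}
	}
	if subject == "" {
		return id, fmt.Errorf("XFCC value has no Subject")
	}
	// Split the Subject into attributes; safe only because GUID-based
	// values never contain commas.
	for _, attr := range strings.Split(subject, ",") {
		name, value, ok := strings.Cut(strings.TrimSpace(attr), "=")
		if !ok {
			continue
		}
		switch name {
		case "CN":
			id.InstanceGUID = value
		case "OU":
			kind, guid, _ := strings.Cut(value, ":")
			switch kind {
			case "app":
				id.AppGUID = guid
			case "space":
				id.SpaceGUID = guid
			case "organization":
				id.OrgGUID = guid
			}
		}
	}
	return id, nil
}

// isAllowed is a hypothetical authorization check against allowed app GUIDs.
func isAllowed(id CallerIdentity, allowedApps map[string]bool) bool {
	return id.AppGUID != "" && allowedApps[id.AppGUID]
}
```

Every app doing this itself is exactly the per-app logic the thread is worried about; a marker header like the `cf-app-to-app: 1` suggestion would at least tell the app when this parser applies.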
Based on the feedback on this PR, I have created another RFC focused on app-to-app mTLS using gorouter: #1438

> - **SNI Handling**: Envoy needs to extract the target hostname for a proper TLS handshake
> - **NO_PROXY**: Applications should configure `NO_PROXY` for traffic that should not go through the proxy
> - **Non-HTTP Traffic**: The HTTP CONNECT-based egress proxy only supports HTTP/HTTPS traffic. Support for TCP-based protocols could be addressed in a follow-up RFC.

I had a hard time wrapping my head around all the different permutations and did some research; if I got something wrong, please correct me. The following variations exist:

- `HTTP_PROXY=http://localhost:61445`
- `HTTP_PROXY=https://localhost:61445`
- `HTTPS_PROXY=http://localhost:61445`
- `HTTPS_PROXY=https://localhost:61445`

The HTTP / HTTPS in the `*_PROXY` variable name tells the client when to use which: `http://` requests use `HTTP_PROXY` and `https://` requests use `HTTPS_PROXY`. The second dimension is the scheme in the value of the env var, which is the protocol used to talk to the proxy. It only controls whether the client speaks plain TCP or puts TLS on top of the connection to the proxy.
Now, while these two env vars look the same, they trigger completely different behavior. When the client wants to talk `http://`, it sends the request to the proxy with just one minor adjustment: the target URI is sent in absolute-form, meaning it includes scheme and host instead of just the path. For `https://` this is not the case: the client issues a `CONNECT` request to the proxy to establish a TCP tunnel to the target and then performs the TLS handshake on top of that. This requires the client to handle all the (m)TLS, which is not what we want.
So to recap:

- We should only set `HTTP_PROXY`; https traffic will always require the client to take part in the TLS handshake.
- A client needs to deliberately speak http in this scenario, so that it selects the proxy and the proxy upgrades the connection to mTLS.
- The setup we want is not an HTTP CONNECT-based egress proxy; it's a ... HTTP-based egress proxy? The wording is all over the place.

By not setting `HTTPS_PROXY`, the risk of mangling some internet traffic because the user forgot to set `NO_PROXY` is also reduced, as https traffic will never pass through the proxy.
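
The distinction above can be checked with Go's standard `net/http` client. This sketch points the client at a throwaway recording proxy via `http.ProxyURL` and reports the request line the proxy saw: an `http://` target arrives as an absolute-form `GET`, while an `https://` target arrives as a `CONNECT` tunnel request, after which the client itself would do the TLS handshake. `probeProxy` is a hypothetical helper, and `app.apps.internal` is only an example host that is never resolved, because the client only dials the proxy. Other HTTP clients generally follow the same convention, but their details may differ.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sync"
)

// requestLine is what the proxy saw: the method and the raw request-target.
type requestLine struct {
	Method string
	Target string
}

// probeProxy sends GET target through a recording proxy that refuses every
// request, and returns the request line the proxy received.
func probeProxy(target string) (requestLine, error) {
	var mu sync.Mutex
	var seen requestLine
	proxy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		seen = requestLine{Method: r.Method, Target: r.RequestURI}
		mu.Unlock()
		http.Error(w, "recorded only", http.StatusForbidden)
	}))
	defer proxy.Close()

	proxyURL, err := url.Parse(proxy.URL)
	if err != nil {
		return requestLine{}, err
	}
	client := &http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)}}
	defer client.CloseIdleConnections()

	// For https:// targets the refused CONNECT surfaces as an error here,
	// which is expected; only the recorded request line matters.
	if resp, err := client.Get(target); err == nil {
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	mu.Lock()
	defer mu.Unlock()
	return seen, nil
}
```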
> - No HTTP connection manager
> - No XFCC header support
> - No mTLS client certificate validation
> - No HTTP/2 support

Can you expand on this? Wouldn't the app still receive HTTP/2 traffic via the TCP proxy?
> Both certificates include app identity in the certificate's `Subject.OrganizationalUnit` field (separate from the SAN). These claims are set by Cloud Controller and passed through the [BBS CertificateProperties](https://github.com/cloudfoundry/bbs/blob/main/models/certificate_properties.pb.go) model:
>
> - `app:<app-guid>`

App GUID is a bit awkward, since it would vary across environments and could change if the app is re-created (e.g. during a blue-green deploy). Ideally this would be something more durable.
> **New HTTP-based Listener**: Create a new Envoy listener with:
> - HTTP connection manager (not TCP proxy)
> - `DownstreamTlsContext` with `RequireClientCertificate: true`
> - Validation context trusting the instance identity CA

What certificate will the server present? The same one as for C2C? This should be stated in the RFC.
> #### Result
>
> When an operator enables `containers.proxy.enable_c2c_mtls_listener`:
> - Port 62443 becomes available on all containers

Currently, port 61443 only becomes available if the container is listening on the default port 8080; otherwise, the C2C port is not added: https://github.com/cloudfoundry/executor/blob/main/depot/containerstore/proxy_config_handler.go#L190 Will this also be a limitation for 62443?
> **Port Reservation**: Add validation to prevent applications from using port 62443, similar to the existing reservation for port 61443.

What will happen to applications that already use port 62443?
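
The reservation itself amounts to a small check. A minimal Go sketch, assuming the reserved set is just 61443 and 62443 and that validation runs wherever the existing 61443 check lives; `reservedPorts` and `validateAppPorts` are hypothetical names:

```go
package main

import "fmt"

// reservedPorts maps container ports owned by Envoy to their purpose.
var reservedPorts = map[uint32]string{
	61443: "existing TCP-based C2C listener",
	62443: "proposed HTTP-based C2C mTLS listener",
}

// validateAppPorts rejects an app's port list if it contains a reserved port,
// reporting the first conflict found.
func validateAppPorts(ports []uint32) error {
	for _, p := range ports {
		if purpose, reserved := reservedPorts[p]; reserved {
			return fmt.Errorf("port %d is reserved for the %s", p, purpose)
		}
	}
	return nil
}
```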
> This RFC proposes implementing full mutual TLS (mTLS) for container-to-container (C2C) traffic in Diego, enabling applications to both authenticate themselves and verify the identity of connecting applications.
>
> The approach introduces a **new HTTP-based listener on port 62443** that runs alongside the existing TCP-based C2C port (61443), providing a dual opt-in model for both operators and application authors. This enables zero-trust networking where all app-to-app traffic can be authenticated and authorization decisions can be made based on app/space/org identity.

TCP is not covered in this RFC. Should "all app-to-app traffic" be "HTTP app-to-app traffic"?
> ```
> CF_INSTANCE_MTLS_PROXY=http://127.0.0.1:61445
> ```
>
> **Enforcing Proxy Usage**: Operators can make the egress proxy mandatory for all applications by setting `HTTP_PROXY` and `HTTPS_PROXY` to `http://127.0.0.1:61445` using [running environment variable groups](https://docs.cloudfoundry.org/devguide/deploy-apps/environment-variable.html#evgroups). This ensures all HTTP/HTTPS traffic from applications is routed through the egress proxy, with instance identity certificates injected only for C2C mTLS backends.

What if an app sets its own `HTTP_PROXY`? Will it be overridden by the global setting?
> #### Considerations
>
> - **SNI Handling**: Envoy needs to extract the target hostname for a proper TLS handshake
> - **NO_PROXY**: Applications should configure `NO_PROXY` for traffic that should not go through the proxy

If an application sets `NO_PROXY` to `*.apps.internal` or `*`, would all requests then bypass the egress Envoy proxy, even if the operator configured it globally?
> ### Phase 2: Client-Side Egress Proxy (Port 61445)
>
> Enable Envoy to act as an HTTP proxy for outbound connections, automatically injecting the instance identity certificate.

I wonder whether this would prevent apps from using their own certificates to talk to external services, or to apps running in multi-platform deployments.
Summary
This RFC proposes implementing full mutual TLS (mTLS) for container-to-container (C2C) traffic in Diego, enabling applications to both authenticate themselves and verify the identity of connecting applications.
The approach introduces:
Key Points
`X-Forwarded-Client-Cert` header
Implementation Phases
cc @cloudfoundry/toc @cloudfoundry/wg-app-runtime-platform