
[WIP] tpmmgr: do not persist device cert in TPM NVRAM #5728

Open
shjala wants to merge 2 commits into lf-edge:master from shjala:fix.nvram.use


@shjala (Member) commented Apr 1, 2026

Description

Device certificates can exceed 500 bytes, which is significant relative to
TPM NVRAM capacity, particularly on fTPM implementations with limited NV
storage. Persisting the cert in NVRAM risks exhausting this resource and
causing failures.

Instead, reconstruct the device cert on demand from the TPM-persisted
device key: if the device key handle exists in the TPM, read its public
key and generate a self-signed cert from it. This removes TpmDeviceCertHdl
and the writeDeviceCert/readDeviceCert functions while preserving the
ability to reuse an existing key across reboots.

TpmDeviceCertHdl is intentionally not freed, to preserve backwards
compatibility in case of upgrade failures and rollback scenarios.

PR dependencies

List all dependencies of this PR (when applicable, otherwise remove this
section).

How to test and validate this PR

Please describe how the changes in this PR can be validated or verified. For
example:

  • If your PR fixes a bug, outline the steps to confirm the issue is resolved.
  • If your PR introduces a new feature, explain how to test and validate it.

This will be used

  1. to provide test scenarios for the QA team
  2. by a reviewer to validate the changes in this PR.

The first is especially important, so please make sure to provide as much
detail as possible.

If it's covered by an automated test, please mention it here.

Changelog notes

Text in this section will be used to generate the changelog entry for
release notes. The consumers of this are end users, not developers.
So, provide a clear and short description of what is changed in the PR from
the end user perspective. If it changes only tooling or some internal
implementation, put a note like "No user-facing changes" or "None".

PR Backports

For all current LTS branches, please state explicitly whether this PR should be
backported. This section is used by our scripts to track the backports,
so please do not omit it.

Here is the list of current LTS branches (it should always be up to date):

  • 16.0-stable
  • 14.5-stable
  • 13.4-stable

For example, if this PR fixes a bug in a feature that was introduced in 14.5,
you can write:

- 16.0-stable: To be backported.
- 14.5-stable: No, as the feature is not available there.
- 13.4-stable: No, as the feature is not available there.

Also, please add the label stable to PRs that should be backported into any
stable branch.

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

For backport PRs (remove it if it's not a backport):

  • I've added a reference link to the original PR
  • PR's title follows the template

And last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Please check the boxes above after submitting the PR in interactive mode.

shjala added 2 commits April 1, 2026 12:05

  • Commit 1: its message repeats the description above.
    Signed-off-by: Shahriyar Jalayeri <shahriyar@posteo.de>
  • Commit 2: just "make bump-eve-pillar".
    Signed-off-by: Shahriyar Jalayeri <shahriyar@posteo.de>
shjala changed the title from "tpmmgr: do not persist device cert in TPM NVRAM" to "[WIP] tpmmgr: do not persist device cert in TPM NVRAM" Apr 1, 2026
@shjala (Member, Author) commented Apr 1, 2026

@rene please test this and let me know.

@codecov (bot) commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 29.87%. Comparing base (2281599) to head (13cd5c5).
⚠️ Report is 550 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5728       +/-   ##
===========================================
+ Coverage   19.52%   29.87%   +10.34%     
===========================================
  Files          19       18        -1     
  Lines        3021     2417      -604     
===========================================
+ Hits          590      722      +132     
+ Misses       2310     1549      -761     
- Partials      121      146       +25     


@eriknordmark (Contributor) commented

@shjala I think moving in this direction makes sense, since it gives us the freedom to generate larger device certs (on TPM chips we've seen limits of around 576 bytes, which means we can't put more information in, e.g., the CN of the device cert).

However, for this to work, the controllers need to identify the device not by the device cert (internally using its hash for lookups) but by the public key in the cert (and, if a hash is used for internal lookups, by a hash of the public key).

Thus it makes sense to do a discrete test that regenerates the device cert from the keys in the TPM and makes sure this can be done without affecting any operation with the controllers.

@shjala (Member, Author) commented Apr 2, 2026

> However, for this to work the controllers need to identify the device not based on the device cert (and internally use the hash of that to do a lookup) but instead identify the device based on the public key in the cert (and if a hash is used for internal lookups, use a hash of the public key).

Good point! Because we use random serial numbers and the current date, this produces a different cert every time.

I'm not expecting this to generate the device cert at every boot (only on first boot or from the installer), because it checks for the cert's existence in /config before regenerating, but I guess we can't rely on this.

I think eve-api needs rework to make this happen; we specifically ask for senderCertHash.

@eriknordmark (Contributor) commented

> I'm not expecting this to generate the device cert at every boot (only on first boot or from the installer), because it checks for the cert's existence in /config before regenerating, but I guess we can't rely on this.

Initial install would work fine.
We need to make sure we don't break any of the various reuse workflows.

Today one can reuse a device in a few different ways:

  1. Offboard in the controller and onboard to a different enterprise (in the same controller).
  2. Complete reinstall of EVE-OS (which AFAIK is documented as assuming a TPM clear).
  3. Copy /config/soft-serial or /config/device.cert.pem from the device and use it to onboard to a different controller (e.g., by manually modifying /config/server using eve config mount, etc.).

AFAIK all of those will continue to work as they do today, but it makes sense to verify.

  4. Clear the /persist disk and/or do a complete reinstall without a TPM clear.

#4 does not actually work today, since the commercial controller objects when some of the additional certs (in /persist/certs) change, as that looks like a potential attack.

With this PR, #4 will fail in a different way, since the device certificate will be different (even though the same device key pair in the TPM is reused).

So the key is to verify that the documentation says #4 does not work and that folks must do a TPM clear in that case (i.e., turning it into case #2).

@eriknordmark (Contributor) commented

@shjala do we still need this, or is the fTPM getting more space? Close or mark as draft as appropriate.

@shjala (Member, Author) commented Apr 17, 2026

@eriknordmark you said this is a good idea in general, so I'm keeping it as WIP and will get back to it when time permits.

@rene (Contributor) commented Apr 22, 2026

@eriknordmark, @shjala, fTPM is fixed. But as @eriknordmark pointed out (and I agreed), this is a good direction to take... no rush, though.
