Serializing and deserializing raw JSON to the Crucible Agent's data file
creates the potential for a versioning problem that would be hidden from
the normal validation performed to make sure that our API surface
doesn't change.
Consider the following:
- a normal update occurs that changes the Crucible Agent's version
- that newer version's serialized data file is not compatible with the
older version
- some Bad Situation occurs and an operator decides to MUPdate back to
the previous version
In this scenario the older Crucible Agent will not be able to
deserialize what the newer Agent serialized, adding to the existing Bad
Situation. In a future where Nexus pays attention to Crucible health,
Agents that cannot start could leave Upstairs stuck, and that may
cascade to interrupt other operations while Nexus waits for repairs or
reconciliations.
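
To make the failure mode concrete, here is a minimal sketch using only the standard library. It is not the Agent's code: the real data file is JSON, and the enum names, the string encoding, and the `Cloning` variant are all assumptions made for illustration.

```rust
// Hypothetical state set understood by the OLDER Agent.
#[derive(Debug, PartialEq)]
enum OldState {
    Requested,
    Created,
    Destroyed,
}

// Hypothetical state set of the NEWER Agent: one extra variant.
#[derive(Debug, PartialEq)]
enum NewState {
    Requested,
    Created,
    Destroyed,
    Cloning, // assumed name for a state added after the update
}

// What the newer Agent would write to the data file (stand-in for JSON).
fn write_new(state: &NewState) -> &'static str {
    match state {
        NewState::Requested => "requested",
        NewState::Created => "created",
        NewState::Destroyed => "destroyed",
        NewState::Cloning => "cloning",
    }
}

// What the older Agent can read back. Anything it does not know is an error,
// which in the scenario above means the Agent cannot start.
fn read_old(raw: &str) -> Result<OldState, String> {
    match raw {
        "requested" => Ok(OldState::Requested),
        "created" => Ok(OldState::Created),
        "destroyed" => Ok(OldState::Destroyed),
        other => Err(format!("unknown state {:?}", other)),
    }
}
```

With these definitions, `read_old(write_new(&NewState::Created))` succeeds, while `read_old(write_new(&NewState::Cloning))` returns an error. Nothing in an API-surface check would flag this, because the mismatch lives in the file format rather than in the API.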
This commit separates the in-memory types used in the DataFile struct
from the types committed to disk (and adds a comment that warns against
changing said type), and further distinguishes between Region states and
RunningSnapshot states.
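
The separation described above might look roughly like the following sketch. The type names (`OnDiskState`, `RegionState`, `RunningSnapshotState`) and the variant lists are assumptions for illustration, not the definitions in this commit. The idea is that the on-disk type stays frozen, and explicit conversions are the only place where the two representations meet.

```rust
// On-disk representation (hypothetical). Changing this type changes the
// file format, so it should not be edited casually.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OnDiskState {
    Requested,
    Created,
    Destroyed,
}

// In-memory state for regions (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum RegionState {
    Requested,
    Created,
    Destroyed,
}

// In-memory state for running snapshots, kept as a distinct type so the two
// cannot be mixed up by accident (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum RunningSnapshotState {
    Requested,
    Created,
    Destroyed,
}

impl From<OnDiskState> for RegionState {
    fn from(s: OnDiskState) -> Self {
        match s {
            OnDiskState::Requested => RegionState::Requested,
            OnDiskState::Created => RegionState::Created,
            OnDiskState::Destroyed => RegionState::Destroyed,
        }
    }
}

impl From<RegionState> for OnDiskState {
    fn from(s: RegionState) -> Self {
        match s {
            RegionState::Requested => OnDiskState::Requested,
            RegionState::Created => OnDiskState::Created,
            RegionState::Destroyed => OnDiskState::Destroyed,
        }
    }
}

impl From<OnDiskState> for RunningSnapshotState {
    fn from(s: OnDiskState) -> Self {
        match s {
            OnDiskState::Requested => RunningSnapshotState::Requested,
            OnDiskState::Created => RunningSnapshotState::Created,
            OnDiskState::Destroyed => RunningSnapshotState::Destroyed,
        }
    }
}

impl From<RunningSnapshotState> for OnDiskState {
    fn from(s: RunningSnapshotState) -> Self {
        match s {
            RunningSnapshotState::Requested => OnDiskState::Requested,
            RunningSnapshotState::Created => OnDiskState::Created,
            RunningSnapshotState::Destroyed => OnDiskState::Destroyed,
        }
    }
}
```

Because each `match` is exhaustive, adding a variant to an in-memory type without deciding how it maps to `OnDiskState` is a compile error, which surfaces the compatibility question at build time instead of after a rollback.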
This work is a prerequisite to getting the Agent to work on multiple
read-only region clones concurrently, as that work adds a new State.
Internal discussion of the Bad Situation scenario described earlier in
this message led to the idea of using separate in-memory types.
Additionally, fix a bug where the `key_pem` field stored on disk would
be output in the error message that would result from requesting a
region that matches an existing one except for that field. Luckily no
production system uses Crucible's support for X509 yet and this endpoint
is not exposed to users.
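
One common way to keep a secret field out of error messages is a hand-written `Debug` implementation that redacts it. The sketch below illustrates that general technique and is not necessarily how this commit fixes the bug. The `RegionRequest` struct, its `id` and `block_size` fields, and the `mismatch_error` helper are hypothetical; only the `key_pem` field name comes from the description above.

```rust
use std::fmt;

// Hypothetical request type; only `key_pem` is named in the commit message.
struct RegionRequest {
    id: String,
    block_size: u64,
    key_pem: Option<String>,
}

// Manual Debug impl so the key material never reaches a formatted string.
impl fmt::Debug for RegionRequest {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("RegionRequest")
            .field("id", &self.id)
            .field("block_size", &self.block_size)
            // Show whether a key is present, but never its contents.
            .field("key_pem", &self.key_pem.as_ref().map(|_| "<redacted>"))
            .finish()
    }
}

// Hypothetical helper that builds the "does not match" error text.
fn mismatch_error(existing: &RegionRequest, requested: &RegionRequest) -> String {
    format!(
        "requested region {:?} does not match existing region {:?}",
        requested, existing
    )
}
```

With this in place, two requests that differ only in `key_pem` still produce a mismatch error, but the message shows `<redacted>` in place of either key.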