feat(kvp): add store trait and Hyper-V KVP storage crate #288
peytonr18 wants to merge 10 commits into Azure:main
Conversation
Pull request overview
Adds a new standalone workspace crate (libazureinit-kvp) that defines a storage abstraction (KvpStore) and a production Hyper-V pool-file implementation (HyperVKvpStore), along with updated technical reference documentation.
Changes:
- Introduces the `libazureinit-kvp` crate with `KvpStore` and `KvpLimits` (Hyper-V/Azure presets).
- Implements the Hyper-V binary pool-file backend with flock-based concurrency and stale-file truncation support.
- Rewrites `doc/libazurekvp.md` as a technical reference for the new crate/API.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| Cargo.toml | Adds libazureinit-kvp to the workspace members. |
| libazureinit-kvp/Cargo.toml | Defines the new crate and its dependencies (fs2, sysinfo). |
| libazureinit-kvp/src/lib.rs | Defines KvpStore, KvpLimits, and exports HyperVKvpStore plus size constants. |
| libazureinit-kvp/src/hyperv.rs | Implements Hyper-V pool-file encoding/decoding and the KvpStore backend, plus unit tests. |
| doc/libazurekvp.md | Updates documentation to describe the new crate's record format, semantics, truncation behavior, and limits. |
…ation; add PoolMode to handle size limits
…ics and JSON dump

Overhaul the KvpStore trait into a pure customer-facing interface with no default implementations or backend_* methods. All logic (validation, file I/O, locking) now lives in the KvpPoolStore implementation.

Key changes:
- Replace append-based write with upsert (insert-or-update) so each key has exactly one record in the file. Eliminates entries_raw and last-write-wins deduplication.
- Move validate_key/validate_value from trait defaults to private KvpPoolStore methods.
- Decouple read validation from PoolMode: reads accept keys up to the full wire-format max (512 bytes) regardless of mode; only writes are restricted by PoolMode.
- Add len(), is_empty(), and dump() (JSON via serde_json) to the trait.
- Add 4 multi-threaded concurrency tests covering concurrent upserts to different keys, same key, mixed readers/writers, and key cap boundary.
… locks

Use libc fcntl(F_OFD_SETLKW/F_OFD_SETLK) for shared/exclusive/unlock operations so pool file locking interoperates with Linux KVP consumers that use fcntl record locks. Also replace fs2 with libc and keep existing pool behavior/tests intact.
| use crate::{KvpError, KvpStore};
|
| /// Default KVP pool path when none is provided.
| pub const DEFAULT_KVP_POOL_PATH: &str = "/var/lib/hyperv/.kvp_pool_1";
Is there any chance for multiple KVP pools where filename would need to be distinct? (ex. kvp_pool_2)
Technically other filenames exist (.kvp_pool_2 does sometimes exist), but the only one Hyper-V looks at is .kvp_pool_1. I gave the option to accept a different pool path in case the user wants to skip Hyper-V and do their own thing, but otherwise the only filename we would want to target for Hyper-V ingestion is pool_1.
Let me know if that answers your question!
| Err(e) => return Err(e.into()),
| };
| let boot = Self::boot_time()?;
| Ok(metadata.mtime() < boot)
Is this function intended to return true when metadata timestamp is less than time since boot in order to consider the pool stale?
Yes!
The boot timestamp is compared to metadata.mtime: if mtime >= boot_time, the file is treated as current. If not, it's stale!
| Err(ref e) if e.kind() == ErrorKind::NotFound => return Ok(false),
| Err(e) => return Err(e.into()),
| };
| Ok(metadata.mtime() < boot_time)
What's the functional difference between this function and the above function if both return on the evaluation of metadata time vs boot time? Are both needed?
boot_is_stale(&self) is the production path: it computes the boot time internally via Self::boot_time() and then compares it to the metadata.
pool_is_stale_at_boot(&self, boot_time: i64) is a test-only helper that takes the boot time as an argument so the tests can inject a fixed value and avoid depending on a real system clock/uptime.
So they perform the same comparison; one just exists for the testing env. But if this is confusing, I could extract the shared logic into a helper function!
Makes sense to me -- I noticed the separate var for boot time in the latter, but just needed clarification on intent. Maybe a comment/docstring would be helpful but I wouldn't ask for functional changes here.
libazureinit-kvp/src/kvp_pool.rs
Outdated
| }
|
| map.insert(key.to_string(), value.to_string());
| rewrite_file(&mut file, &map)?;
If the map contains all records in this case, the whole file would be rewritten. Is this necessary/cause any unneeded latency?
Great question! Chris and I talked about this at length offline, and I ran some small tests to evaluate the other approaches (memory mapping, indexed in-place inserts or key updates, etc.), but based on those tests I thought this was the best approach when weighing speed and safety.
The good news is that because there is a hard cap of 1024 records, the rewrite cost is bounded. The idea was to start with this and, if we start seeing latency issues, move to a more complex approach with memory mapping or another alternative.
Great, as long as it was discussed/tested -- It's not a whole lot to ever write given the record cap plus k/v size limits, but since this is involved in provisioning, I wanted to be sure we had considered even minor latency factors.
bpryan99
left a comment
My comments were addressed, LGTM! Great work.
Introduce explicit insert (overwrite existing key) and append (always add record) APIs, refactor pool operations to iterator-based in-place updates/deletes, and document record-count vs unique-key behavior with expanded concurrency/integrity test coverage.
| /// Append a new record at the end of the file, zero-padded to the
| /// fixed-width wire format.
| fn append(&mut self, key: &str, value: &str) -> io::Result<()> {
|     self.file.seek(io::SeekFrom::End(0))?;
we could do an assertion here to ensure we are properly aligned
|
| let mut iter = self.iter_mut()?;
| iter.append(key, value)?;
| iter.flush()?;
the flush story needs to be worked out. If kvpd will read from this file while we hold the lock, then every write should probably be followed by a flush; if not, then maybe we only flush once before we drop the lock.
| @@ -0,0 +1,74 @@
| # PR: Add `libazureinit-kvp` with unified KVP pool store
file seems out of date - is it going to be deleted?
| use crate::{KvpError, KvpStore};
|
| /// Default KVP pool path when none is provided.
| pub const DEFAULT_KVP_POOL_PATH: &str = "/var/lib/hyperv/.kvp_pool_1";
let's do an enum for these, see https://github.com/Azure/azure-init/pull/290/changes#diff-0e11ae6109310d637614a88f49107f5f77f3f8adb457d64e32d83cf37b6875e0R64
| /// Policy mode controlling key/value size limits for writes.
| #[derive(Clone, Copy, Debug, PartialEq, Eq)]
| pub enum PoolMode {
|     /// Conservative limits for Azure compatibility
the limits are for Linux kernel compatibility
| pub enum PoolMode {
|     /// Conservative limits for Azure compatibility
|     /// (key <= 254 bytes, value <= 1022 bytes).
|     Restricted,
we might be able to figure out how to detect the kernel's limits and set behavior for Safe accordingly
|     Restricted,
|     /// Full Hyper-V wire-format limits
|     /// (key <= 512 bytes, value <= 2048 bytes).
|     Full,
| fn append(&self, key: &str, value: &str) -> Result<(), KvpError>;
|
| /// Read the value for a key. Returns `Ok(None)` when absent.
| fn read(&self, key: &str) -> Result<Option<String>, KvpError>;
this should document behavior if multiple matches for "key" are found. (last one wins)
| fn read(&self, key: &str) -> Result<Option<String>, KvpError>;
|
| /// Remove a key. Returns `true` if the key was present.
| fn delete(&self, key: &str) -> Result<bool, KvpError>;
similarly document behavior if multiple matches are found (all are deleted)
| fn clear(&self) -> Result<(), KvpError>;
|
| /// Return all key-value pairs.
| fn entries(&self) -> Result<HashMap<String, String>, KvpError>;
can we get an interface that provides a list of key-values, so we can validate when multiple keys are present?
it'd be a useful interface to test with.
e.g.

```text
append("key", "val1")
append("key", "val2")
append("key", "val3")
assert entries() == {"key": "val3"}
assert dump() == [("key", "val1"), ("key", "val2"), ("key", "val3")]
```
| }
|
| fn is_stale(&self) -> Result<bool, KvpError> {
|     self.pool_is_stale()
i think pool_is_stale() can be moved into here. no need for the jump and "pool" is implied
| }
|
| fn is_empty(&self) -> Result<bool, KvpError> {
|     match std::fs::metadata(&self.path) {
would it make sense to check len() == 0?
| Ok(metadata.mtime() < boot_time)
| }
|
| fn truncate_pool(&self) -> Result<(), KvpError> {
"_pool" is implied, maybe just remove it
actually maybe just move all this into clear
What this PR introduces

- New crate libazureinit-kvp, with sources libazureinit-kvp/src/lib.rs and libazureinit-kvp/src/kvp_pool.rs.
- Dependencies: libc, serde, serde_json, sysinfo.

KVP design

- KvpStore (pure interface; no default method implementations).
- KvpPoolStore (file-backed KVP pool).
- KvpError.

KvpStore API:
- max_key_size() -> usize
- max_value_size() -> usize
- insert(key, value) -> Result<(), KvpError>
- append(key, value) -> Result<(), KvpError>
- read(key) -> Result<Option<String>, KvpError>
- delete(key) -> Result<bool, KvpError>
- clear() -> Result<(), KvpError>
- entries() -> Result<HashMap<String, String>, KvpError>
- len() -> Result<usize, KvpError>
- is_empty() -> Result<bool, KvpError>
- is_stale() -> Result<bool, KvpError>
- dump(path) -> Result<(), KvpError>

On-disk format and limits

PoolMode controls write limits:
- Restricted: key <= 254 bytes, value <= 1022 bytes
- Full: key <= 512 bytes, value <= 2048 bytes

When path == None, the default pool path is /var/lib/hyperv/.kvp_pool_1.

Locking change (flock -> fcntl)

- Replaces fs2 locks with libc::fcntl locks in libazureinit-kvp.
- Shared lock: fcntl(F_OFD_SETLKW, F_RDLCK)
- Exclusive lock: fcntl(F_OFD_SETLKW, F_WRLCK)
- Unlock: fcntl(F_OFD_SETLK, F_UNLCK)
- flock and fcntl locks are separate lock domains and do not coordinate with each other.
- Interoperates with consumers that take fcntl-style locks (for example hv_kvp_daemon / cloud-init workflows).

File I/O behavior

- Standard file APIs (OpenOptions, read_exact, write_all, set_len, seek, flush).
- open_for_read_write_create() explicitly sets .truncate(false) to avoid implicit truncation on open.
- Documented flows: insert, append, delete, read/entries.

Stale-data behavior

- Constructor: KvpPoolStore::new(path, mode, truncate_on_stale).
- When truncate_on_stale == true and the file mtime predates the current boot time, the pool is truncated.

Error model

KvpError variants:
- EmptyKey
- KeyContainsNull
- KeyTooLarge { max, actual }
- ValueTooLarge { max, actual }
- MaxUniqueKeysExceeded { max }
- Io(io::Error)

Test coverage

43 tests cover:
- append semantics (no cap, duplicate-key first-wins on read, size validation),
- delete, clear, len, is_empty, stale checks,