
feat(kvp): add store trait and Hyper-V KVP storage crate #288

Open
peytonr18 wants to merge 10 commits into Azure:main from
peytonr18:probertson/kvp-store-trait

Conversation


@peytonr18 peytonr18 commented Mar 9, 2026

What this PR introduces

  • Adds workspace crate libazureinit-kvp.
  • Adds libazureinit-kvp/src/lib.rs and libazureinit-kvp/src/kvp_pool.rs.
  • Adds crate dependencies: libc, serde, serde_json, sysinfo.

KVP design

  • Public trait: KvpStore (pure interface; no default method implementations).
  • Concrete implementation: KvpPoolStore (file-backed KVP pool).
  • Error type: KvpError.

KvpStore API:

  • max_key_size() -> usize
  • max_value_size() -> usize
  • insert(key, value) -> Result<(), KvpError>
  • append(key, value) -> Result<(), KvpError>
  • read(key) -> Result<Option<String>, KvpError>
  • delete(key) -> Result<bool, KvpError>
  • clear() -> Result<(), KvpError>
  • entries() -> Result<HashMap<String, String>, KvpError>
  • len() -> Result<usize, KvpError>
  • is_empty() -> Result<bool, KvpError>
  • is_stale() -> Result<bool, KvpError>
  • dump(path) -> Result<(), KvpError>
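The API above can be sketched as a Rust trait. This is a minimal illustration pairing a subset of the listed methods with a toy in-memory backend; the real `KvpPoolStore` is file-backed, the real trait covers all the methods listed, and (per the quoted diff later in this thread) its methods take `&self` rather than `&mut self`:

```rust
use std::collections::HashMap;

// Stand-in error type; the real KvpError variants are listed under
// "Error model" below.
#[derive(Debug)]
pub enum KvpError {
    KeyTooLarge { max: usize, actual: usize },
}

// Subset of the KvpStore surface described above (receivers simplified
// to &mut self for this in-memory sketch).
pub trait KvpStore {
    fn max_key_size(&self) -> usize;
    fn insert(&mut self, key: &str, value: &str) -> Result<(), KvpError>;
    fn read(&self, key: &str) -> Result<Option<String>, KvpError>;
    fn delete(&mut self, key: &str) -> Result<bool, KvpError>;
    fn len(&self) -> Result<usize, KvpError>;
}

// Toy in-memory store for illustration only.
pub struct MemStore {
    map: HashMap<String, String>,
}

impl KvpStore for MemStore {
    fn max_key_size(&self) -> usize {
        512
    }
    fn insert(&mut self, key: &str, value: &str) -> Result<(), KvpError> {
        if key.len() > self.max_key_size() {
            return Err(KvpError::KeyTooLarge {
                max: self.max_key_size(),
                actual: key.len(),
            });
        }
        self.map.insert(key.to_string(), value.to_string());
        Ok(())
    }
    fn read(&self, key: &str) -> Result<Option<String>, KvpError> {
        Ok(self.map.get(key).cloned())
    }
    fn delete(&mut self, key: &str) -> Result<bool, KvpError> {
        Ok(self.map.remove(key).is_some())
    }
    fn len(&self) -> Result<usize, KvpError> {
        Ok(self.map.len())
    }
}
```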

On-disk format and limits

  • Hyper-V wire format: fixed-size records.
    • key field: 512 bytes
    • value field: 2048 bytes
    • record size: 2560 bytes
  • Maximum unique keys: 1024.
  • PoolMode controls write limits:
    • Restricted: key <= 254 bytes, value <= 1022 bytes
    • Full: key <= 512 bytes, value <= 2048 bytes
  • Reads accept keys up to 512 bytes in both modes.
  • Default path when path == None: /var/lib/hyperv/.kvp_pool_1.
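The fixed-size record layout above (512-byte key field plus 2048-byte value field, zero-padded to 2560 bytes) can be sketched with a round-trip encoder/decoder. This is an illustration of the stated sizes, not the crate's actual encoding code:

```rust
// Sizes taken from the wire-format description above.
const KEY_SIZE: usize = 512;
const VALUE_SIZE: usize = 2048;
const RECORD_SIZE: usize = KEY_SIZE + VALUE_SIZE; // 2560 bytes

// Zero-pad key and value into one fixed-size record.
fn encode_record(key: &str, value: &str) -> Vec<u8> {
    assert!(key.len() <= KEY_SIZE && value.len() <= VALUE_SIZE);
    let mut rec = vec![0u8; RECORD_SIZE];
    rec[..key.len()].copy_from_slice(key.as_bytes());
    rec[KEY_SIZE..KEY_SIZE + value.len()].copy_from_slice(value.as_bytes());
    rec
}

// Fields are NUL-padded; take bytes up to the first zero.
fn decode_field(buf: &[u8]) -> String {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    String::from_utf8_lossy(&buf[..end]).into_owned()
}

fn decode_record(rec: &[u8]) -> (String, String) {
    (
        decode_field(&rec[..KEY_SIZE]),
        decode_field(&rec[KEY_SIZE..RECORD_SIZE]),
    )
}
```

With fixed-size records, record N always starts at byte offset N * 2560, which is what makes the in-place overwrite and swap-with-last delete described below cheap.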

Locking change (flock -> fcntl)

  • Replaces fs2 locks with libc::fcntl locks in libazureinit-kvp.
  • New lock behavior:
    • shared read lock: fcntl(F_OFD_SETLKW, F_RDLCK)
    • exclusive write lock: fcntl(F_OFD_SETLKW, F_WRLCK)
    • unlock: fcntl(F_OFD_SETLK, F_UNLCK)
  • Why: Linux flock and fcntl locks are separate lock domains and do not coordinate with each other.
  • Result: the KVP store now coordinates correctly with Linux components using fcntl-style locks (for example hv_kvp_daemon/cloud-init workflows).

File I/O behavior

  • Uses regular file I/O (OpenOptions, read_exact, write_all, set_len, seek, flush).
  • open_for_read_write_create() explicitly sets .truncate(false) to avoid implicit truncation on open.
  • insert flow:
    1. open file read+write+create,
    2. take exclusive lock,
    3. iterate records, collecting unique keys into a set: if key found, overwrite value in-place; else check unique-key cap and append new record at EOF,
    4. flush,
    5. unlock (on iterator drop).
  • append flow:
    1. open file read+write+create,
    2. take exclusive lock,
    3. seek to EOF and write a new record (no duplicate check, no unique-key cap),
    4. flush,
    5. unlock (on iterator drop).
  • delete flow:
    1. open file read+write,
    2. take exclusive lock,
    3. iterate records: if key found, swap with last record and truncate,
    4. flush,
    5. unlock (on iterator drop).
  • read/entries flow:
    1. open file read-only,
    2. take shared lock,
    3. read/decode all records,
    4. unlock (on iterator drop).
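The insert flow above can be sketched as a scan over fixed-size records, with locking elided (the real crate takes the exclusive fcntl lock before scanning). Function and path names here are illustrative:

```rust
use std::fs::OpenOptions;
use std::io::{Read, Seek, SeekFrom, Write};

const KEY_SIZE: usize = 512;
const RECORD_SIZE: usize = 2560;

// Overwrite the value in place if the key exists; otherwise append a
// new zero-padded record at EOF. (Locking and the unique-key cap check
// are omitted from this sketch.)
fn insert(path: &str, key: &str, value: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(false) // avoid implicit truncation on open
        .open(path)?;
    let mut rec = [0u8; RECORD_SIZE];
    let mut offset = 0u64;
    // Step 3: iterate records looking for the key.
    while file.read_exact(&mut rec).is_ok() {
        let klen = rec
            .iter()
            .position(|&b| b == 0)
            .unwrap_or(KEY_SIZE)
            .min(KEY_SIZE);
        if &rec[..klen] == key.as_bytes() {
            // Key found: overwrite only the value field in place.
            file.seek(SeekFrom::Start(offset + KEY_SIZE as u64))?;
            let mut val = vec![0u8; RECORD_SIZE - KEY_SIZE];
            val[..value.len()].copy_from_slice(value.as_bytes());
            file.write_all(&val)?;
            return file.flush();
        }
        offset += RECORD_SIZE as u64;
    }
    // Not found: append a new record at EOF.
    file.seek(SeekFrom::End(0))?;
    let mut new_rec = vec![0u8; RECORD_SIZE];
    new_rec[..key.len()].copy_from_slice(key.as_bytes());
    new_rec[KEY_SIZE..KEY_SIZE + value.len()].copy_from_slice(value.as_bytes());
    file.write_all(&new_rec)?;
    file.flush()
}
```

The in-place overwrite keeps the file size constant when a key already exists, which matches the file-size invariants exercised by the tests.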

Stale-data behavior

  • Constructor: KvpPoolStore::new(path, mode, truncate_on_stale).
  • If truncate_on_stale == true and file mtime predates current boot time, the pool is truncated.
  • Stale check is explicit and caller-controlled via constructor flag.
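The mtime-versus-boot-time comparison can be sketched in std-only Rust. This version derives boot time from Linux's `/proc/uptime` for self-containedness; the crate itself reportedly uses the sysinfo dependency, and the function names here are illustrative:

```rust
use std::fs;
use std::io::ErrorKind;
use std::time::{SystemTime, UNIX_EPOCH};

// Boot time in seconds since the Unix epoch: now minus system uptime.
fn boot_time_secs() -> std::io::Result<u64> {
    let uptime = fs::read_to_string("/proc/uptime")?;
    let secs: f64 = uptime.split_whitespace().next().unwrap().parse().unwrap();
    let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
    Ok(now - secs as u64)
}

// Stale when the pool file's mtime predates the current boot.
// A missing file is not considered stale.
fn pool_is_stale(path: &str) -> std::io::Result<bool> {
    let meta = match fs::metadata(path) {
        Ok(m) => m,
        Err(ref e) if e.kind() == ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    };
    let mtime = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_secs();
    Ok(mtime < boot_time_secs()?)
}
```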

Error model

KvpError variants:

  • EmptyKey
  • KeyContainsNull
  • KeyTooLarge { max, actual }
  • ValueTooLarge { max, actual }
  • MaxUniqueKeysExceeded { max }
  • Io(io::Error)
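The variant list above maps naturally onto a Rust enum. The variant names follow the PR description; the `Display` messages and `From` impl are illustrative:

```rust
use std::fmt;
use std::io;

// Error model as listed above.
#[derive(Debug)]
pub enum KvpError {
    EmptyKey,
    KeyContainsNull,
    KeyTooLarge { max: usize, actual: usize },
    ValueTooLarge { max: usize, actual: usize },
    MaxUniqueKeysExceeded { max: usize },
    Io(io::Error),
}

impl fmt::Display for KvpError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KvpError::EmptyKey => write!(f, "key is empty"),
            KvpError::KeyContainsNull => write!(f, "key contains a NUL byte"),
            KvpError::KeyTooLarge { max, actual } => {
                write!(f, "key is {actual} bytes (max {max})")
            }
            KvpError::ValueTooLarge { max, actual } => {
                write!(f, "value is {actual} bytes (max {max})")
            }
            KvpError::MaxUniqueKeysExceeded { max } => {
                write!(f, "unique-key cap of {max} exceeded")
            }
            KvpError::Io(e) => write!(f, "I/O error: {e}"),
        }
    }
}

// Lets file operations use `?` to convert io::Error automatically.
impl From<io::Error> for KvpError {
    fn from(e: io::Error) -> Self {
        KvpError::Io(e)
    }
}
```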

Test coverage

43 tests cover:

  • key validation rules (empty, null, size boundaries in both modes),
  • path/mode behavior,
  • overwrite semantics, unique-key cap (insert, delete-then-insert, overwrite-at-limit),
  • append semantics (no cap, duplicate-key first-wins on read, size validation),
  • delete, clear, len, is_empty, stale checks,
  • JSON dump behavior,
  • iterator correctness (yield order, empty file, malformed file, mid-iteration drop releases lock),
  • in-place overwrite (adjacent records untouched, multiple overwrites in one pass),
  • remove/swap-and-continue, remove-last, remove-only-record,
  • file-size invariants (unchanged on overwrite, grows on new key, preserves other records),
  • file integrity after mixed operations,
  • concurrent reader/writer, writer/writer, append, and mixed insert+append scenarios.

@peytonr18 peytonr18 force-pushed the probertson/kvp-store-trait branch from b134a98 to f435f70 on March 9, 2026 19:19
@cadejacobson cadejacobson self-requested a review March 9, 2026 20:42

Copilot AI left a comment


Pull request overview

Adds a new standalone workspace crate (libazureinit-kvp) that defines a storage abstraction (KvpStore) and a production Hyper-V pool-file implementation (HyperVKvpStore), along with updated technical reference documentation.

Changes:

  • Introduces libazureinit-kvp crate with KvpStore and KvpLimits (Hyper-V/Azure presets).
  • Implements Hyper-V binary pool-file backend with flock-based concurrency and stale-file truncation support.
  • Rewrites doc/libazurekvp.md as a technical reference for the new crate/API.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
Cargo.toml Adds libazureinit-kvp to the workspace members.
libazureinit-kvp/Cargo.toml Defines the new crate and its dependencies (fs2, sysinfo).
libazureinit-kvp/src/lib.rs Defines KvpStore, KvpLimits, and exports HyperVKvpStore plus size constants.
libazureinit-kvp/src/hyperv.rs Implements Hyper-V pool-file encoding/decoding and the KvpStore backend, plus unit tests.
doc/libazurekvp.md Updates documentation to describe the new crate’s record format, semantics, truncation behavior, and limits.


peytonr18 and others added 3 commits March 19, 2026 11:56
…ics and JSON dump

Overhaul the KvpStore trait into a pure customer-facing interface with
no default implementations or backend_* methods. All logic (validation,
file I/O, locking) now lives in the KvpPoolStore implementation.

Key changes:
- Replace append-based write with upsert (insert-or-update) so each key
  has exactly one record in the file. Eliminates entries_raw and
  last-write-wins deduplication.
- Move validate_key/validate_value from trait defaults to private
  KvpPoolStore methods.
- Decouple read validation from PoolMode: reads accept keys up to the
  full wire-format max (512 bytes) regardless of mode; only writes are
  restricted by PoolMode.
- Add len(), is_empty(), and dump() (JSON via serde_json) to the trait.
- Add 4 multi-threaded concurrency tests covering concurrent upserts to
  different keys, same key, mixed readers/writers, and key cap boundary.
… locks

Use libc fcntl(F_OFD_SETLKW/F_OFD_SETLK) for shared/exclusive/unlock operations so pool file locking interoperates with Linux KVP consumers that use fcntl record locks. Also replace fs2 with libc and keep existing pool behavior/tests intact.
@peytonr18 peytonr18 requested a review from cadejacobson March 27, 2026 19:25
use crate::{KvpError, KvpStore};

/// Default KVP pool path when none is provided.
pub const DEFAULT_KVP_POOL_PATH: &str = "/var/lib/hyperv/.kvp_pool_1";

Is there any chance for multiple KVP pools where filename would need to be distinct? (ex. kvp_pool_2)

Contributor Author

Technically other filenames exist (.kvp_pool_2 does sometimes exist), but the only one Hyper-V looks at is .kvp_pool_1. I gave the option to accept a different pool path in case the user wants to skip Hyper-V and do their own thing, but otherwise the only filename we would want to target for Hyper-V ingestion is pool_1.

Let me know if that answers your question!!


Makes complete sense, thanks!

Err(e) => return Err(e.into()),
};
let boot = Self::boot_time()?;
Ok(metadata.mtime() < boot)

Is this function intended to return true when metadata timestamp is less than time since boot in order to consider the pool stale?

Contributor Author

Yes!
The boot timestamp is compared to the metadata.mtime: if mtime >= boot_time, then the file is treated as current. If not, it's stale!

Err(ref e) if e.kind() == ErrorKind::NotFound => return Ok(false),
Err(e) => return Err(e.into()),
};
Ok(metadata.mtime() < boot_time)

What's the functional difference between this function and the above function if both return on the evaluation of metadata time vs boot time? Are both needed?

Contributor Author

boot_is_stale(&self) is the production path that computes the boot time internally via Self::boot_time() and then compares it to the metadata.

pool_is_stale_at_boot(&self, boot_time: i64) is a test-only helper that takes the boot time as an argument so that tests can inject a fixed value and avoid depending on a real system clock/uptime.

So they functionally do the same comparison; one is just for the testing env. But if this is confusing, I could extract the shared logic into a helper function!!


Makes sense to me -- I noticed the separate var for boot time in the latter, but just needed clarification on intent. Maybe a comment/docstring would be helpful but I wouldn't ask for functional changes here.

}

map.insert(key.to_string(), value.to_string());
rewrite_file(&mut file, &map)?;

If the map contains all records in this case, the whole file would be rewritten. Is this necessary/cause any unneeded latency?

Contributor Author

@peytonr18 peytonr18 Mar 30, 2026

Great question!!! Chris and I talked about this at length offline, and I ran some small tests to evaluate other approaches (memory mapping, indexed in-place inserts or key updates, etc.), but based on those tests, I thought this was the best approach when weighing speed and safety.

The good news is that because there is a hard cap of 1024 records, the rewrite cost is bounded. The idea was to start with this, and if we start seeing latency issues, we can move to a more complex approach such as memory mapping.


Great, as long as it was discussed/tested -- It's not a whole lot to ever write given the record cap plus k/v size limits, but since this is involved in provisioning, I wanted to be sure we had considered even minor latency factors.


@bpryan99 bpryan99 left a comment


My comments were addressed, LGTM! Great work.

Introduce explicit insert (overwrite existing key) and append (always add record) APIs, refactor pool operations to iterator-based in-place updates/deletes, and document record-count vs unique-key behavior with expanded concurrency/integrity test coverage.
/// Append a new record at the end of the file, zero-padded to the
/// fixed-width wire format.
fn append(&mut self, key: &str, value: &str) -> io::Result<()> {
self.file.seek(io::SeekFrom::End(0))?;
Contributor

we could do an assertion here to ensure we are properly aligned


let mut iter = self.iter_mut()?;
iter.append(key, value)?;
iter.flush()?;
Contributor

the flush story needs to be worked out. if kvpd will read from this file while we hold the lock, then every write should probably be followed up by flush. if not, then maybe we only flush before we drop the lock

@@ -0,0 +1,74 @@
# PR: Add `libazureinit-kvp` with unified KVP pool store
Contributor

file seems out of date - is it going to be deleted?

use crate::{KvpError, KvpStore};

/// Default KVP pool path when none is provided.
pub const DEFAULT_KVP_POOL_PATH: &str = "/var/lib/hyperv/.kvp_pool_1";

/// Policy mode controlling key/value size limits for writes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PoolMode {
/// Conservative limits for Azure compatibility
Contributor

the limits are for linux kernel compatibility

pub enum PoolMode {
/// Conservative limits for Azure compatibility
/// (key <= 254 bytes, value <= 1022 bytes).
Restricted,
Contributor

let's call this "Safe"

Contributor

we might be able to figure out how to detect the kernel's limits and set behavior for Safe accordingly

Restricted,
/// Full Hyper-V wire-format limits
/// (key <= 512 bytes, value <= 2048 bytes).
Full,
Contributor

let's call this "Unsafe"

fn append(&self, key: &str, value: &str) -> Result<(), KvpError>;

/// Read the value for a key. Returns `Ok(None)` when absent.
fn read(&self, key: &str) -> Result<Option<String>, KvpError>;
Contributor

this should document behavior if multiple matches for "key" are found. (last one wins)

fn read(&self, key: &str) -> Result<Option<String>, KvpError>;

/// Remove a key. Returns `true` if the key was present.
fn delete(&self, key: &str) -> Result<bool, KvpError>;
Contributor

similarly document behavior if multiple matches are found (all are deleted)

fn clear(&self) -> Result<(), KvpError>;

/// Return all key-value pairs.
fn entries(&self) -> Result<HashMap<String, String>, KvpError>;
Contributor

can we get an interface that provides a list of key-values, so we can validate when multiple keys are present?

it'd be a useful interface to test with.

e.g.

append("key", "val1")
append("key", "val2")
append("key", "val3")
assert entries() == {"key": "val3"}
assert dump() == [("key", "val1"), ("key", "val2"), ("key", "val3")]

}

fn is_stale(&self) -> Result<bool, KvpError> {
self.pool_is_stale()
Contributor

i think pool_is_stale() can be moved into here. no need for the jump and "pool" is implied

}

fn is_empty(&self) -> Result<bool, KvpError> {
match std::fs::metadata(&self.path) {
Contributor

@cjp256 cjp256 Apr 3, 2026

would it make sense to check len() == 0?

Ok(metadata.mtime() < boot_time)
}

fn truncate_pool(&self) -> Result<(), KvpError> {
Contributor

"_pool" is implied, maybe just remove it

Contributor

actually maybe just move all this into clear

