Conversation
force-pushed 394ad06 to f71c3fc
force-pushed 17a35fc to 3ff46de
force-pushed b8c0e1c to bc9d6c7
force-pushed 3ff46de to f30ae93
force-pushed f30ae93 to ae4123c
force-pushed f7af22b to 9bcd4be
force-pushed 837d9b5 to fb32de4
Need more review before merging.

I'm doing it right now.
    /// every blk file with it (cycling the key by absolute byte position). This struct reads
    /// the key once on construction and applies it transparently in all read operations.
    /// When `xor.dat` is absent or all-zero the reads are pass-through.
    pub struct BlkFileStore {
Is it worth aligning with lib.rs for consistency? It's pub(crate) there and pub here.
Isn't that what we already do?
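For context on the pass-through behavior discussed above: Bitcoin Core's blk-file obfuscation XORs every byte with an 8-byte key from `xor.dat`, cycled by absolute byte position. A minimal sketch of what such a read-side transform might look like (the name `apply_xor` mirrors the quoted struct, but this body is an assumption, not the crate's actual code):

```rust
/// XOR `buf` in place with `key`, where `offset` is the absolute byte
/// position of `buf[0]` within the blk file. An all-zero key makes this
/// a no-op, matching the pass-through case when `xor.dat` is absent.
fn apply_xor(key: &[u8; 8], buf: &mut [u8], offset: u64) {
    if key.iter().all(|&b| b == 0) {
        return; // xor.dat absent or all-zero: reads are pass-through
    }
    for (i, byte) in buf.iter_mut().enumerate() {
        // Cycle the key by absolute file position, not buffer position.
        *byte ^= key[(offset as usize + i) % key.len()];
    }
}
```

Because XOR is its own inverse, applying the same transform twice restores the original bytes, which is why a single helper can serve both obfuscation and de-obfuscation.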
    let obfuscation_key = db
        .get(OBFUSCATE_KEY_KEY)
        .map(|b| b.to_vec())
        .map(|b| b[1..].to_vec())
What if `b` is empty? Would it be better to add guards?
If the result is empty there is not much we can do. Panicking here seems appropriate because it suggests LevelDB is serving corrupt keys (?)
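A sketch of what that guard could look like, decoupled from the database: `obfuscation_key` and the `Option<Vec<u8>>` input stand in for the LevelDB read, and the one-byte length prefix stripped by `b[1..]` is taken from the quoted code. This is illustrative only:

```rust
/// Hypothetical guard around the obfuscation-key lookup. `raw` stands in
/// for the LevelDB read result; the real code slices off a one-byte
/// length prefix before using the key.
fn obfuscation_key(raw: Option<Vec<u8>>) -> Vec<u8> {
    match raw {
        // No record at all: no obfuscation, use an empty pass-through key.
        None => Vec::new(),
        // An empty or prefix-only value means the record is corrupt;
        // panicking loudly beats silently mis-decoding every blk file.
        Some(b) if b.len() < 2 => panic!("corrupt obfuscation key: {} bytes", b.len()),
        Some(b) => b[1..].to_vec(),
    }
}
```

The difference from the quoted chain is that an empty value now panics with a descriptive message instead of an out-of-bounds slice panic on `b[1..]`.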
    let spec = DenseBuildSpec {
        blocks_dir,
        range: start_height..end_height + 1,
Still new to the codebase, so just asking: should this dense range start at start_height without adjusting the block height accessors too? Once the storage is offset from genesis, block_of_tx() returns global heights, but tx_range_for_block() still indexes block_tx_index as if the first stored block were height 0, so tx_range_for_block(block_of_tx(...)) will panic for tip windows.
Yes, your intuition is correct.
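The mismatch being described can be sketched as follows: if dense storage starts at `start_height`, accessors that take global heights must subtract that offset before indexing the per-block arrays. All names here (`DenseIndex`, `local_slot`, field layout) are illustrative, not the crate's actual API:

```rust
/// Illustrative dense index storing blocks from `start_height` upward.
struct DenseIndex {
    start_height: u64,
    /// block_tx_index[i] = index of the first tx in the i-th *stored* block.
    block_tx_index: Vec<usize>,
    tx_count: usize,
}

impl DenseIndex {
    /// Convert a global chain height into a local storage slot. Without
    /// this subtraction, tip-window heights overflow the stored arrays,
    /// which is the panic described in the comment above.
    fn local_slot(&self, height: u64) -> usize {
        (height - self.start_height) as usize
    }

    fn tx_range_for_block(&self, height: u64) -> std::ops::Range<usize> {
        let i = self.local_slot(height);
        let start = self.block_tx_index[i];
        // The last stored block's range ends at the total tx count.
        let end = self.block_tx_index.get(i + 1).copied().unwrap_or(self.tx_count);
        start..end
    }
}
```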
force-pushed fb32de4 to bd29ca9
force-pushed 13f195e to 67bb4c2
0xZaddyy
left a comment
cACK. Great work overall! Just a few concerns to address.
    /// Read an entire blk file into memory, XOR-decrypting from offset 0.
    pub fn read_file(&self, file_no: u32) -> io::Result<Vec<u8>> {
        let mut bytes = std::fs::read(self.blk_path(file_no))?;
        self.apply_xor(&mut bytes, 0);
        Ok(bytes)
    }
I'm a bit concerned about memory usage here: since this reads the entire blk file into memory, it could become problematic for large files. It's worth adding a comment about this or considering a streaming approach.
Yeah, this is a good point. FWIW we have a ticket to address things like this: #11

I don't think we need to over-optimize here since blk files are 128MB, and since we need to build an index for the entire block we need to read it either way. The streaming approach is not a bad idea, but it's tricky because you don't want to stop halfway through a tx, for instance. I'll need to think on this a bit more.
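One way around the "halfway through a tx" concern is to stream at record granularity rather than byte granularity: each record in a blk*.dat file is framed as 4-byte network magic followed by a 4-byte little-endian length, so a reader can always stop exactly on a block boundary. A simplified sketch under those assumptions (`next_block` is not part of this crate, error handling is minimal, and XOR decryption is omitted):

```rust
use std::io::{self, Read};

/// Bitcoin mainnet network magic; other networks use different values.
const MAGIC: [u8; 4] = [0xF9, 0xBE, 0xB4, 0xD9];

/// Read the next length-prefixed block record from `r`, or `None` at
/// clean end-of-file. Because we always consume a whole record, no read
/// ever ends in the middle of a transaction.
fn next_block<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 8];
    match r.read_exact(&mut header) {
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        other => other?,
    }
    if header[..4] != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }
    let size = u32::from_le_bytes(header[4..8].try_into().unwrap()) as usize;
    let mut block = vec![0u8; size];
    r.read_exact(&mut block)?;
    Ok(Some(block))
}
```

Peak memory then tracks the largest single block rather than the 128MB file, at the cost of more read calls and trickier interaction with the offset-based XOR cycling.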
| "Indexing {} blocks from tip (datadir: {})", | ||
| depth + 1, | ||
| datadir.display() |
| "Indexing {} blocks from tip (datadir: {})", | |
| depth + 1, | |
| datadir.display() | |
| "Indexing last {} blocks (depth: {}) from {}", | |
| depth + 1, | |
| depth, | |
| datadir.display() |
    eprintln!(" --blocks-dir <path> Directory containing blk*.dat files");
    eprintln!(" --range START..END Block range to index (default: 0..10)");
    eprintln!(" --datadir <path> Bitcoin Core data directory (e.g. ~/.bitcoin/)");
    eprintln!(" --depth N Number of blocks before tip to index (default: 10)");
Suggested change:
-    eprintln!(" --depth N Number of blocks before tip to index (default: 10)");
+    eprintln!(" --depth N Index from N blocks before tip to current (default: 10)");
    ///
    /// Each hint is `(file_no, height_first, height_last)`, sorted by `file_no`.
    pub fn with_file_hints(mut self, hints: Vec<(u32, u32, u32)>) -> Self {
        self.file_hints = hints;
What do you think about adding some validation here, making sure it's sorted?
In terms of height we can't make any assurances, but for file number, yes. Perhaps this should be a sorted data structure, e.g. a heap?
Since this is expected to be sorted by file_no, I think we can use a data structure that guarantees ordering, like BTreeMap, if we also want lookup by file_no?
Suggested change:
-    self.file_hints = hints;
+    pub fn with_file_hints(mut self, hints: BTreeMap<u32, (u32, u32)>) -> Self {
+        self.file_hints = hints;
Yeah, that makes sense. If it's OK with you I would like to make this a follow-up.
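A sketch of what that BTreeMap-backed follow-up could look like; the struct and method names here are assumptions mirroring the quoted builder, not the crate's real types:

```rust
use std::collections::BTreeMap;

/// Illustrative builder keeping file hints keyed by file_no, so iteration
/// is always sorted by file number and per-file lookup is O(log n) --
/// the invariant the Vec<(u32, u32, u32)> version merely documented.
#[derive(Default)]
struct SpecBuilder {
    /// file_no -> (height_first, height_last)
    file_hints: BTreeMap<u32, (u32, u32)>,
}

impl SpecBuilder {
    fn with_file_hints(mut self, hints: BTreeMap<u32, (u32, u32)>) -> Self {
        self.file_hints = hints;
        self
    }

    /// Heights covered by a given blk file, if hinted.
    fn heights_for(&self, file_no: u32) -> Option<(u32, u32)> {
        self.file_hints.get(&file_no).copied()
    }
}
```

With a BTreeMap the "sorted by file_no" requirement becomes a property of the container instead of a caller obligation, so no validation pass is needed.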
@0xZaddyy thank you for your review. I will get to the suggestions and the CI fix later today or tomorrow.
    pub fn build_indices(
        blocks_dir: impl Into<PathBuf>,
        range: std::ops::Range<u64>,
        file_hints: Vec<(u32, u32, u32)>,
Random u32s are confusing; typedefing these would be great. A future PR would be fine.
After this goes in we can create a ticket for it.
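For the ticket, the change could be as small as a pair of type aliases; the names `FileNo`, `Height`, and `FileHint` are suggestions, not existing identifiers in the crate:

```rust
/// Hypothetical aliases giving the bare u32s names at zero runtime cost.
type FileNo = u32;
type Height = u32;

/// The hint tuple from the quoted signature, now self-describing:
/// (file number, first height in file, last height in file).
type FileHint = (FileNo, Height, Height);

/// Example consumer: number of blocks a hinted file covers.
fn span(hint: FileHint) -> u32 {
    let (_file, first, last) = hint;
    last - first + 1
}
```

Aliases keep call sites source-compatible; newtypes (`struct FileNo(u32);`) would go further and make mixing up the three fields a compile error, at the cost of a larger diff.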
And getting block info from some block to some depth by iterating the chain backwards
Add multi-file parser support and sync_from_tip entry point that uses BlockIndex to find the tip, walks back depth blocks, and indexes forward into dense storage.
67bb4c2 to
66f8d20
Compare
    }

    /// Read an entire blk file into memory, XOR-decrypting from offset 0.
    pub fn read_file(&self, file_no: u32) -> io::Result<Vec<u8>> {
Tried running the example. Everything works smoothly on regtest, but on signet I get an error; results below:
signet:

    tx-indexer on tip-to-depth is 📦 v0.1.0 via 🦀 v1.92.0-nightly took 14s
    ❯ time cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 10
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.05s
    Running `target/debug/examples/indexer --datadir /home/shehu/.bitcoin/signet --depth 10`
    Indexing 11 blocks from tip (datadir: /home/shehu/.bitcoin/signet)
    Error: failed to build indices: parse: unexpected eof at offset 3138913497 (len 50331648)
    cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 10  3.99s user 0.15s system 100% cpu 4.140 total

    tx-indexer on tip-to-depth is 📦 v0.1.0 via 🦀 v1.92.0-nightly took 4s
    ❯ time cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 10
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.05s
    Running `target/debug/examples/indexer --datadir /home/shehu/.bitcoin/signet --depth 10`
    Indexing 11 blocks from tip (datadir: /home/shehu/.bitcoin/signet)
    Error: failed to build indices: parse: unexpected eof at offset 3138913497 (len 50331648)
    cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 10  3.98s user 0.14s system 100% cpu 4.125 total
regtest:

    tx-indexer on tip-to-depth is 📦 v0.1.0 via 🦀 v1.92.0-nightly took 6s
    ❯ time cargo run --example indexer -- --datadir ~/.bitcoin/regtest --depth 100
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.06s
    Running `target/debug/examples/indexer --datadir /home/shehu/.bitcoin/regtest --depth 100`
    Indexing 101 blocks from tip (datadir: /home/shehu/.bitcoin/regtest)
    Indexed 112 transactions in 101 blocks (478.39ms)
    --- RBF signaling analysis (2.88ms) ---
    Transactions signaling RBF: 11/112 (9.8%)
    cargo run --example indexer -- --datadir ~/.bitcoin/regtest --depth 100  0.51s user 0.09s system 96% cpu 0.618 total
Seems this happens on mainnet too: #44
I fixed the mainnet syncing by supporting XOR de/obfuscation. Is your datadir XOR'd? It's on by default.
We should just pass the input datadir, where we want the data to go, and the depth. Everything else can be abstracted away in the internal methods.
If requested depth exceeds chain length return a specific error instead of key not found.
Ensure we can parse multiple blk files.
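The commit about depth exceeding the chain length could come down to one small guard at the top of the sync path. A sketch under assumed names (`resolve_start` and the `String` error are illustrative, not the crate's API):

```rust
/// Hypothetical helper: turn (tip_height, depth) into the first height
/// to index, rejecting depths that reach past genesis with a descriptive
/// error instead of letting a LevelDB lookup fail with "key not found".
fn resolve_start(tip_height: u64, depth: u64) -> Result<u64, String> {
    if depth > tip_height {
        return Err(format!(
            "requested depth {depth} exceeds chain height {tip_height}"
        ));
    }
    Ok(tip_height - depth)
}
```

Validating the depth before walking the chain keeps the error local to the bad argument rather than surfacing deep inside the block-index lookup.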
force-pushed 66f8d20 to f4bfe57
Going to merge and ticket the follow-ups.
This PR just focuses on index building from the tip to some desired depth, by first identifying the best block, working backwards to the depth, and parsing from there. Everything in the dense ID system works as-is, treating the nth block as if it were genesis; i.e., everything is still in the ith order, just relative to the depth at which we sync.