Skip to content

Tip to depth#39

Merged
arminsabouri merged 7 commits intomasterfrom
tip-to-depth
Apr 17, 2026
Merged

Tip to depth#39
arminsabouri merged 7 commits intomasterfrom
tip-to-depth

Conversation

@arminsabouri
Copy link
Copy Markdown
Collaborator

@arminsabouri arminsabouri commented Apr 1, 2026

This Pr just focuses on index building from the tip to some desired depth by first identifying the best block and working backwards to the depth and parsing from there. Everything in the dense id system work as is, treating the nth block at if it was genesis. i.e everything is stil the ith order just relative to the depth at which we sync at.

@arminsabouri arminsabouri force-pushed the block-indexer branch 2 times, most recently from b8c0e1c to bc9d6c7 Compare April 3, 2026 14:17
Base automatically changed from block-indexer to master April 3, 2026 14:53
@arminsabouri arminsabouri marked this pull request as ready for review April 3, 2026 15:09
@arminsabouri arminsabouri force-pushed the tip-to-depth branch 3 times, most recently from f7af22b to 9bcd4be Compare April 3, 2026 20:37
@arminsabouri arminsabouri force-pushed the tip-to-depth branch 3 times, most recently from 837d9b5 to fb32de4 Compare April 8, 2026 01:38
Comment thread src/crates/block-index/src/lib.rs
@arminsabouri
Copy link
Copy Markdown
Collaborator Author

Need more review before merging.

@bc1cindy
Copy link
Copy Markdown
Contributor

bc1cindy commented Apr 8, 2026

Need more review before merging.

im doing it rn

Copy link
Copy Markdown
Contributor

@bc1cindy bc1cindy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks great!

Comment thread src/crates/primitives/src/blk_file.rs Outdated
/// every blk file with it (cycling the key by absolute byte position). This struct reads
/// the key once on construction and applies it transparently in all read operations.
/// When `xor.dat` is absent or all-zero the reads are pass-through.
pub struct BlkFileStore {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth to align with lib.rs for consistency? pub(crate) there and pub here

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt that what we already do?

let obfuscation_key = db
.get(OBFUSCATE_KEY_KEY)
.map(|b| b.to_vec())
.map(|b| b[1..].to_vec())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if b is empty? would It be better to add guards?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the result is empty there is not much we can do. Panic here seems appropriate bc it suggests level DB is serving corrupt keys (?)

Comment thread examples/indexer.rs Outdated
Comment thread src/crates/primitives/src/unified/mod.rs Outdated

let spec = DenseBuildSpec {
blocks_dir,
range: start_height..end_height + 1,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still new to codebase so just asking

Should this dense range start at start_height without adjusting the block height accessors too? Once the storage is offset from genesis block_of_tx() returns global heights but tx_range_for_block() still indexes block_tx_index as if the first stored block were height 0 so tx_range_for_block(block_of_tx(...)) will panic for tip windows.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes your intuition is correct

@arminsabouri arminsabouri requested a review from bc1cindy April 15, 2026 18:41
@arminsabouri arminsabouri force-pushed the tip-to-depth branch 2 times, most recently from 13f195e to 67bb4c2 Compare April 15, 2026 18:56
Copy link
Copy Markdown
Contributor

@0xZaddyy 0xZaddyy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cACK, Great work overall! Just a few concerns to address.

Comment on lines +39 to +44
/// Read an entire blk file into memory, XOR-decrypting from offset 0.
pub fn read_file(&self, file_no: u32) -> io::Result<Vec<u8>> {
let mut bytes = std::fs::read(self.blk_path(file_no))?;
self.apply_xor(&mut bytes, 0);
Ok(bytes)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m a bit concerned about memory usage here, since this reads the entire blk file into memory; it could become problematic for large files. It's worth adding a comment about this or a streaming approach.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a good point. FWIW we have a ticket to address things like this #11
I dont think we need to over optimize here since blk files are 128MB. And since we need to build an index for the entire block we need to read it eitherway. The streaming approach is not a bad idea but its tricky bc you dont want to read half way througha tx for instance. I'll need to think on this a bit more.

Comment thread examples/indexer.rs Outdated
Comment on lines +48 to +50
"Indexing {} blocks from tip (datadir: {})",
depth + 1,
datadir.display()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Indexing {} blocks from tip (datadir: {})",
depth + 1,
datadir.display()
"Indexing last {} blocks (depth: {}) from {}",
depth + 1,
depth,
datadir.display()

Comment thread examples/indexer.rs Outdated
eprintln!(" --blocks-dir <path> Directory containing blk*.dat files");
eprintln!(" --range START..END Block range to index (default: 0..10)");
eprintln!(" --datadir <path> Bitcoin Core data directory (e.g. ~/.bitcoin/)");
eprintln!(" --depth N Number of blocks before tip to index (default: 10)");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
eprintln!(" --depth N Number of blocks before tip to index (default: 10)");
eprintln!(" --depth N Index from N blocks before tip to current (default: 10)");

///
/// Each hint is `(file_no, height_first, height_last)`, sorted by `file_no`.
pub fn with_file_hints(mut self, hints: Vec<(u32, u32, u32)>) -> Self {
self.file_hints = hints;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about a sort of validation here?
making sure it's sorted

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of height we can't make any assurances. But file number yes. Perhaps this should be a sorted datastructure e.g heap?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is expected to be sorted by file_no, i think we can use a data structure that guarantees ordering like BTreeMap if we also want lookup by file_no?

Suggested change
self.file_hints = hints;
pub fn with_file_hints(mut self, hints: BTreeMap<u32, (u32, u32)>) -> Self {
self.file_hints = hints;

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that makes sense. If its ok with you I would like to make this a follow up

Comment thread .gitignore
@arminsabouri
Copy link
Copy Markdown
Collaborator Author

@0xZaddyy thank you for you review. I will get to the suggestions and the CI fix later today or tomorrow

pub fn build_indices(
blocks_dir: impl Into<PathBuf>,
range: std::ops::Range<u64>,
file_hints: Vec<(u32, u32, u32)>,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random u32's are confusing. Typedefing this would be great. future Pr would be fine

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after this goes in we can create a ticket for it.

/// every blk file with it (cycling the key by absolute byte position). This struct reads
/// the key once on construction and applies it transparently in all read operations.
/// When `xor.dat` is absent or all-zero the reads are pass-through.
pub struct BlkFileStore {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt that what we already do?

Comment thread src/crates/primitives/src/blk_file.rs Outdated
Comment thread .gitignore
///
/// Each hint is `(file_no, height_first, height_last)`, sorted by `file_no`.
pub fn with_file_hints(mut self, hints: Vec<(u32, u32, u32)>) -> Self {
self.file_hints = hints;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of height we can't make any assurances. But file number yes. Perhaps this should be a sorted datastructure e.g heap?

Comment on lines +39 to +44
/// Read an entire blk file into memory, XOR-decrypting from offset 0.
pub fn read_file(&self, file_no: u32) -> io::Result<Vec<u8>> {
let mut bytes = std::fs::read(self.blk_path(file_no))?;
self.apply_xor(&mut bytes, 0);
Ok(bytes)
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a good point. FWIW we have a ticket to address things like this #11
I dont think we need to over optimize here since blk files are 128MB. And since we need to build an index for the entire block we need to read it eitherway. The streaming approach is not a bad idea but its tricky bc you dont want to read half way througha tx for instance. I'll need to think on this a bit more.

And getting block info from some block to some depth
by iterating the chain backwards
Add multi-file parser support and sync_from_tip entry point that uses BlockIndex to find the tip, walks back depth blocks, and indexes forward into dense storage.
}

/// Read an entire blk file into memory, XOR-decrypting from offset 0.
pub fn read_file(&self, file_no: u32) -> io::Result<Vec<u8>> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried running the example, everything works smothly on regtest but on signet i get an error results below :
signet

tx-indexer on  tip-to-depth is 📦 v0.1.0 via 🦀 v1.92.0-nightly took 14s 
❯ time cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 10
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/examples/indexer --datadir /home/shehu/.bitcoin/signet --depth 10`
Indexing 11 blocks from tip (datadir: /home/shehu/.bitcoin/signet)
Error: failed to build indices: parse: unexpected eof at offset 3138913497 (len 50331648)
cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 10  3.99s user 0.15s system 100% cpu 4.140 total

tx-indexer on  tip-to-depth is 📦 v0.1.0 via 🦀 v1.92.0-nightly took 4s 
❯ time cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 10
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/examples/indexer --datadir /home/shehu/.bitcoin/signet --depth 10`
Indexing 11 blocks from tip (datadir: /home/shehu/.bitcoin/signet)
Error: failed to build indices: parse: unexpected eof at offset 3138913497 (len 50331648)
cargo run --example indexer -- --datadir ~/.bitcoin/signet --depth 10  3.98s user 0.14s system 100% cpu 4.125 total

regtest

tx-indexer on  tip-to-depth is 📦 v0.1.0 via 🦀 v1.92.0-nightly took 6s 
❯ time cargo run --example indexer -- --datadir ~/.bitcoin/regtest --depth 100           
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/examples/indexer --datadir /home/shehu/.bitcoin/regtest --depth 100`
Indexing 101 blocks from tip (datadir: /home/shehu/.bitcoin/regtest)
Indexed 112 transactions in 101 blocks (478.39ms)

--- RBF signaling analysis (2.88ms) ---
Transactions signaling RBF: 11/112 (9.8%)
cargo run --example indexer -- --datadir ~/.bitcoin/regtest --depth 100  0.51s user 0.09s system 96% cpu 0.618 total

seems this happens on mainnet too: #44

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the mainnet syncing by supporting xor de/obfuscation. Are you xor'ing datadir? Its on by default

We should passing input datadir, where we want the data to go
and depth. Everything else can be abstract away in the internal methods
If requested depth exceeds chain length return a specific error
instead of key not found.
Ensure we can parse multiple blk files.
@arminsabouri
Copy link
Copy Markdown
Collaborator Author

Going to merge and ticket the follow ups

@arminsabouri arminsabouri merged commit 44a831b into master Apr 17, 2026
3 checks passed
@arminsabouri arminsabouri deleted the tip-to-depth branch April 17, 2026 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants