
Parallel simulation refactoring #1738

Open

bmalezieux wants to merge 6 commits into sbi-dev:main from bmalezieux:parallel-simulation-refactoring

Conversation


bmalezieux commented Jan 22, 2026

This PR refactors the simulation utilities to improve parallel execution efficiency and modularity.

  • New parallelize_simulator decorator/function: introduces a tool that wraps any simulator to enable parallel execution using joblib.

  • New simulate_from_thetas utility: added a lightweight wrapper around parallelize_simulator that directly executes simulations for a given set of parameters (thetas).

  • Refactored simulate_for_sbi (backward compatibility): updated the internal implementation of simulate_for_sbi to leverage the new simulate_from_thetas backend.
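The intended usage can be sketched as follows (the function signatures here are assumptions based on the description above, and a serial map stands in for the joblib-backed execution, so the sketch is self-contained):

```python
import numpy as np

def parallelize_simulator(simulator, num_workers=1):
    """Wrap a per-sample simulator so it maps over a batch of thetas.

    The real implementation would dispatch batches via joblib.Parallel;
    here we map serially for illustration.
    """
    def batched(thetas):
        return np.stack([simulator(theta) for theta in thetas])
    return batched

def simulate_from_thetas(simulator, thetas, num_workers=1):
    """Run `simulator` for each row of `thetas` and stack the outputs."""
    return parallelize_simulator(simulator, num_workers=num_workers)(thetas)

thetas = np.arange(6, dtype=float).reshape(3, 2)
xs = simulate_from_thetas(lambda t: t.sum(keepdims=True), thetas)
print(xs.shape)  # (3, 1)
```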

Contributor

janfb left a comment


Awesome, looks very good already.

I was wondering whether we would also need to refactor process_simulator accordingly? E.g., we could likely drop the internal torch wrapping and accept numpy simulators (because we convert to numpy again anyway). I would have to look into it in detail, but I think we can simplify the function a bit.

bmalezieux requested a review from janfb January 26, 2026 16:21
bmalezieux marked this pull request as ready for review January 26, 2026 16:21
Contributor

tomMoral left a comment


A few comments.

A question: is it possible to use a return type that is str? It is unclear to me where this would go in the current sbi.

seed: Optional[int] = None,
show_progress_bar: bool = True,
) -> Tuple[Tensor, Tensor]:
) -> Tuple[Tensor, Tensor | List[str] | str]:
Contributor


Shouldn't it be:

Suggested change
) -> Tuple[Tensor, Tensor | List[str] | str]:
) -> Tuple[Theta, X]:

Contributor


Yes, using the already-defined Theta and X type aliases here would be more consistent and readable, since they already capture all the possible return types.

UserWarning,
stacklevel=2,
)
batches = [theta for theta in thetas]
Contributor


Why? In this context, we could use the batch parameter of joblib (useful if initialization of the simulator is slow)

Contributor


Good point. Using joblib's batch parameter would be more efficient here, especially when simulator initialization is expensive. The current loop adds unnecessary overhead.
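A minimal sketch of delegating the chunking to joblib (the `batch_size` and `n_jobs` parameters are joblib's; the simulator and thetas here are illustrative stand-ins): joblib groups consecutive tasks into batches dispatched to each worker, which amortises slow simulator initialisation without pre-chunking thetas by hand.

```python
from joblib import Parallel, delayed

def simulator(theta):
    # Toy per-sample simulator used only for illustration.
    return theta * 2

thetas = list(range(10))

# One delayed call per theta; joblib handles the batching internally.
results = Parallel(n_jobs=1, batch_size=5)(
    delayed(simulator)(theta) for theta in thetas
)
print(results)  # [0, 2, 4, ..., 18]
```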

)
batches = [theta for theta in thetas]
else:
batches = [theta for theta in thetas]
Contributor


Unclear why not keep batches = thetas? The current solution duplicates the thetas in memory, which can be a large overhead for a large number of simulations (probably for a later version; right now everything needs to fit in memory).

Contributor


Agreed. The list comprehension duplicates theta in memory unnecessarily. batches = thetas would avoid that overhead entirely, which becomes significant for large simulation counts.

num_batches = (
num_simulations + simulation_batch_size - 1
) // simulation_batch_size
batches = [
Contributor


I would use an iterator if possible, once again to avoid filling memory and duplicating. In particular, if this is a tensor, use split?

Contributor


Using torch.split(thetas, simulation_batch_size) would return views rather than copying data, keeping peak memory constant regardless of dataset size. This is especially important for large-scale workflows.
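A quick sketch of the view-based batching (variable names follow the surrounding diff but are assumptions here): torch.split returns narrowed views into the original tensor rather than copies, so peak memory stays constant no matter how many batches are formed.

```python
import torch

thetas = torch.arange(10.0).reshape(10, 1)
simulation_batch_size = 4

# Returns a tuple of views; the last batch holds the remainder.
batches = torch.split(thetas, simulation_batch_size)
print([tuple(b.shape) for b in batches])  # [(4, 1), (4, 1), (2, 1)]

# The first batch shares storage with `thetas` instead of duplicating it.
assert batches[0].data_ptr() == thetas.data_ptr()
```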


# Run in parallel
# Generate seeds
batch_seeds = np.random.randint(low=0, high=1_000_000, size=(len(batches),))
Contributor


We need to be able to seed this part, no?

Contributor


Agreed, the seed for generating batch_seeds should itself be derived from the user-provided seed argument to ensure full reproducibility. Currently if a user passes seed=42 the batch seed generation is still non-deterministic.
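One way this could look (the helper name and the choice of numpy's Generator API are assumptions for illustration): derive the per-batch seeds from the user-provided seed, so the same seed always yields the same batch seeds, while seed=None keeps today's non-deterministic behaviour.

```python
import numpy as np

def make_batch_seeds(seed, num_batches):
    # Seeded iff the user supplied a seed; None falls back to OS entropy.
    rng = np.random.default_rng(seed)
    return rng.integers(low=0, high=1_000_000, size=num_batches)

a = make_batch_seeds(42, num_batches=4)
b = make_batch_seeds(42, num_batches=4)
assert (a == b).all()  # same user seed -> identical batch seeds
```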

Comment on lines +79 to +81
context = parallel_config(n_jobs=num_workers)

with context:
Contributor


Suggested change
context = parallel_config(n_jobs=num_workers)
with context:
with parallel_config(n_jobs=num_workers):


# 1. No arguments
@parallelize_simulator
def decorated_sim(theta):
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use simple_simulator_torch which is the same?

Contributor


Good catch. Reusing simple_simulator_torch here would reduce duplication in the test suite and make it clearer that the decorator test is testing the wrapping behaviour rather than a specific simulator implementation.

4 participants