Conversation
janfb
left a comment
Awesome, looks very good already.
I was wondering whether we would also need to refactor `process_simulator` accordingly? E.g., we could likely drop the internal torch wrapping and accept numpy simulators (because we wrap to numpy again anyway). I would have to look into it in detail, but I think we can simplify the function a bit.
tomMoral
left a comment
A few comments.
A question: is it possible to use a return type which is `str`? It is unclear to me where this would go in the current sbi.
```diff
     seed: Optional[int] = None,
     show_progress_bar: bool = True,
-) -> Tuple[Tensor, Tensor]:
+) -> Tuple[Tensor, Tensor | List[str] | str]:
```
Shouldn't it be:
```diff
-) -> Tuple[Tensor, Tensor | List[str] | str]:
+) -> Tuple[Theta, X]:
```
Yes, using the already-defined Theta and X type aliases here would be more consistent and readable, since they already capture all the possible return types.
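For context, a minimal sketch of what such aliases might look like (the names `Theta` and `X` come from the review above, but these stand-in definitions and the stub simulator are hypothetical; the actual aliases live in sbi's typing module and may differ):

```python
from typing import List, Tuple, Union

from torch import Tensor

# Hypothetical stand-ins for sbi's Theta and X type aliases referenced above.
Theta = Tensor
X = Union[Tensor, List[str], str]

def simulate_stub(thetas: Theta) -> Tuple[Theta, X]:
    # Trivial identity "simulator", used only to illustrate the annotation.
    return thetas, thetas
```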
```python
        UserWarning,
        stacklevel=2,
    )
    batches = [theta for theta in thetas]
```
Why? In this context, we could use the `batch_size` parameter of joblib (useful if initialization of the simulator is slow).
Good point. Using joblib's batch parameter would be more efficient here, especially when simulator initialization is expensive. The current loop adds unnecessary overhead.
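A minimal, self-contained sketch of the joblib batching idea (the `simulator` function and the parameter values here are made up for illustration):

```python
import numpy as np
from joblib import Parallel, delayed

def simulator(theta):
    # Stand-in simulator; imagine per-worker setup cost that batching amortizes.
    return theta * 2.0

thetas = np.arange(10.0)

# joblib's `batch_size` argument dispatches several tasks to a worker at once,
# amortizing dispatch (and simulator initialization) overhead without
# building an explicit list of batches by hand.
results = Parallel(n_jobs=2, batch_size=4)(
    delayed(simulator)(theta) for theta in thetas
)
```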
```python
    )
    batches = [theta for theta in thetas]
else:
    batches = [theta for theta in thetas]
```
Unclear why not keep `batches = thetas`? The current solution duplicates the thetas in memory, which can be a large overhead for a large number of simulations (probably one for a later version; right now everything needs to fit in memory).
Agreed. The list comprehension duplicates theta in memory unnecessarily. batches = thetas would avoid that overhead entirely, which becomes significant for large simulation counts.
```python
num_batches = (
    num_simulations + simulation_batch_size - 1
) // simulation_batch_size
batches = [
```
I would use an iterator if possible, once again to avoid filling memory and duplicating data. In particular, if this is a tensor, use `split`?
Using torch.split(thetas, simulation_batch_size) would return views rather than copying data, keeping peak memory constant regardless of dataset size. This is especially important for large-scale workflows.
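A small sketch of the point about views (the shapes here are arbitrary; `torch.split` is standard PyTorch):

```python
import torch

thetas = torch.randn(10, 3)
simulation_batch_size = 4

# torch.split returns views into the original tensor's storage, so no data
# is copied: peak memory stays constant regardless of the number of batches.
batches = torch.split(thetas, simulation_batch_size)

# 10 rows split into chunks of 4 -> sizes 4, 4, 2
batch_shapes = [tuple(b.shape) for b in batches]
```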
```python
# Run in parallel
# Generate seeds
batch_seeds = np.random.randint(low=0, high=1_000_000, size=(len(batches),))
```
We need to be able to seed this part, no?
Agreed, the seed for generating batch_seeds should itself be derived from the user-provided seed argument to ensure full reproducibility. Currently if a user passes seed=42 the batch seed generation is still non-deterministic.
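One way to wire this up, sketched with a hypothetical helper (`make_batch_seeds` is not part of the PR; it only illustrates deriving the batch seeds from the user-provided `seed`):

```python
import numpy as np

def make_batch_seeds(num_batches, seed=None):
    # Hypothetical helper: derive per-batch seeds from the user-provided
    # `seed` so the whole simulation pipeline is reproducible.
    # With seed=None this falls back to non-deterministic seeding.
    rng = np.random.default_rng(seed)
    return rng.integers(low=0, high=1_000_000, size=(num_batches,))

seeds_a = make_batch_seeds(5, seed=42)
seeds_b = make_batch_seeds(5, seed=42)
```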
```python
context = parallel_config(n_jobs=num_workers)

with context:
```
```diff
-context = parallel_config(n_jobs=num_workers)
-with context:
+with parallel_config(n_jobs=num_workers):
```
```python
# 1. No arguments
@parallelize_simulator
def decorated_sim(theta):
```
Why not use `simple_simulator_torch`, which is the same?
Good catch. Reusing simple_simulator_torch here would reduce duplication in the test suite and make it clearer that the decorator test is testing the wrapping behaviour rather than a specific simulator implementation.
This PR refactors the simulation utilities to improve parallel execution efficiency and modularity.

- **New `parallelize_simulator` decorator/function:** introduces a tool that wraps any simulator to enable parallel execution using `joblib`.
- **New `simulate_from_thetas` utility:** a lightweight wrapper around `parallelize_simulator` that directly executes simulations for a given set of parameters (`thetas`).
- **Refactored `simulate_for_sbi` (backward compatible):** the internal implementation of `simulate_for_sbi` now leverages the new `simulate_from_thetas` backend.
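Based on the description above, the decorator usage might look roughly like this. This is a hedged sketch only: the real `parallelize_simulator` signature and behavior in the PR may differ, and the `num_workers` keyword and `np.stack` aggregation here are assumptions.

```python
from functools import wraps

import numpy as np
from joblib import Parallel, delayed

def parallelize_simulator(simulator=None, *, num_workers=1):
    # Sketch of a decorator that wraps a per-sample simulator and runs it
    # over a batch of parameters in parallel with joblib. Supports both
    # bare `@parallelize_simulator` and `@parallelize_simulator(num_workers=2)`.
    def decorate(sim):
        @wraps(sim)
        def wrapper(thetas):
            results = Parallel(n_jobs=num_workers)(
                delayed(sim)(theta) for theta in thetas
            )
            return np.stack(results)
        return wrapper

    return decorate if simulator is None else decorate(simulator)

@parallelize_simulator
def decorated_sim(theta):
    # Toy per-sample simulator for illustration.
    return theta + 1.0
```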