Skip to content

FEAT: Improve interactions of user pools#1009

Open
ColmTalbot wants to merge 22 commits into
bilby-dev:mainfrom
ColmTalbot:user-pools
Open

FEAT: Improve interactions of user pools#1009
ColmTalbot wants to merge 22 commits into
bilby-dev:mainfrom
ColmTalbot:user-pools

Conversation

@ColmTalbot
Copy link
Copy Markdown
Collaborator

I noticed that it was difficult to pass a user-specified pool through run_sampler, and it isn't used at all in the post processing. This PR:

  • unifies how to create a pool for Bilby with a context (a la with multiprocessing.pool()...)
  • explicitly passes the user pool around including for post processing
  • moves logic out of Sampler._setup_pool and Sampler._close_pool
  • support MPI pools using schwimmbad. This change will effectively make parallel_bilby moot.

@ColmTalbot ColmTalbot marked this pull request as draft October 30, 2025 14:22
@ColmTalbot ColmTalbot marked this pull request as ready for review November 3, 2025 16:03
@ColmTalbot ColmTalbot added >100 lines refactoring sampling Issues about sampling algorithms, their efficiency and robustness labels Nov 4, 2025
Copy link
Copy Markdown
Collaborator

@mj-will mj-will left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, I look forward to trying it! I've added some initial comments but might need to have a second look.

Comment thread bilby/core/sampler/__init__.py
parameters=priors.sample(),
) as _pool:
start_time = datetime.datetime.now()
sampler.pool = _pool
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe to do? Depending on how the sampler uses the pool, are there settings where the pool the sampler has stored is not updated?

For example, if the sampler has constructed a likelihood using the pool.map from the initial input pool will this break?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to work through what this would look like.

If the initial input pool is not None, all of the pool objects reference here should be the same, so I don't think that will make a difference.

One potential issue is that if a specific sampler implementation handles the pool internally by itself this will create two pools and then in the best case, we have a bunch of extra processes we don't need. I think this is what nessai does, so maybe we should game through that specific case.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this, would it make sense to have an update_pool method or similar? That could have some additional logic to prevent multiple pools etc.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example of what I had in mind: https://github.com/mj-will/aspire-bilby/blob/main/src/aspire_bilby/plugin.py#L199

self.pool is used in to create the likelihood function for aspire.

Looking at this now, since it happens in run_sampler perhaps this isn't actually an issue in this specific case. What do you think?

Comment thread bilby/core/sampler/ptemcee.py
npool=None,
pool=None,
parameters=None,
):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would benefit from a doc-string with example usage.

Comment thread bilby/core/utils/parallel.py
Comment thread bilby/core/result.py Outdated
Comment thread bilby/gw/conversion.py Outdated
Comment thread bilby/gw/conversion.py Outdated
Copy link
Copy Markdown
Collaborator Author

@ColmTalbot ColmTalbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember exactly what the behaviour for some of these questions is, so I'll go back and check, and ideally write docstrings about them.

Comment thread bilby/core/sampler/ptemcee.py
Comment thread bilby/core/sampler/__init__.py
parameters=priors.sample(),
) as _pool:
start_time = datetime.datetime.now()
sampler.pool = _pool
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to work through what this would look like.

If the initial input pool is not None, all of the pool objects reference here should be the same, so I don't think that will make a difference.

One potential issue is that if a specific sampler implementation handles the pool internally by itself this will create two pools and then in the best case, we have a bunch of extra processes we don't need. I think this is what nessai does, so maybe we should game through that specific case.

@ColmTalbot ColmTalbot requested a review from mj-will January 26, 2026 16:30
@ColmTalbot ColmTalbot added this to the 3.0.0 milestone Jan 29, 2026
@mj-will mj-will requested a review from a team April 30, 2026 14:43
Copy link
Copy Markdown
Collaborator

@mj-will mj-will left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to submit my comments as a review, but see them above.

@mj-will mj-will requested a review from a team April 30, 2026 14:48

_pool = multiprocessing.Pool(
processes=npool,
initializer=_initialize_global_variables,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to allow users to specify their own (additional) initializer.

For nessai, there's another variable that needs to be global for the pool to work and that can't be intialized unless using the fork start method.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that we would also need to be able to allow users to pass through arbitrary additional arguments. At that stage, is it easier to just describe how to implement an independent version?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true.

Here's what I had to do for nessai:

It works fine but since it duplicates so much of the code, it needs to be kept up-to-date with the main code base.

@mj-will mj-will requested a review from a team April 30, 2026 16:03
@mj-will mj-will added the to discuss To be discussed on an upcoming call label Apr 30, 2026
Copy link
Copy Markdown
Collaborator

@adivijaykumar adivijaykumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for schwimmbad/MPI support?

@mj-will
Copy link
Copy Markdown
Collaborator

mj-will commented May 14, 2026

We should documentation for user pools and note that the pool will only work correctly in the initializer has been called.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactoring sampling Issues about sampling algorithms, their efficiency and robustness to discuss To be discussed on an upcoming call >100 lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants