FEAT: Improve interactions of user pools#1009
Conversation
mj-will
left a comment
There was a problem hiding this comment.
Looks great, I look forward to trying it! I've added some initial comments but might need to have a second look.
| parameters=priors.sample(), | ||
| ) as _pool: | ||
| start_time = datetime.datetime.now() | ||
| sampler.pool = _pool |
There was a problem hiding this comment.
Is this safe to do? Depending on how the sampler uses the pool, are there settings where the pool the sampler has stored is not updated?
For example, if the sampler has constructed a likelihood using the pool.map from the initial input pool will this break?
There was a problem hiding this comment.
I'm trying to work through what this would look like.
If the initial input pool is not None, all of the pool objects reference here should be the same, so I don't think that will make a difference.
One potential issue is that if a specific sampler implementation handles the pool internally by itself this will create two pools and then in the best case, we have a bunch of extra processes we don't need. I think this is what nessai does, so maybe we should game through that specific case.
There was a problem hiding this comment.
Coming back to this, would it make sense to have an update_pool method or similar? That could have some additional logic to prevent multiple pools etc.
There was a problem hiding this comment.
Here's an example of what I had in mind: https://github.com/mj-will/aspire-bilby/blob/main/src/aspire_bilby/plugin.py#L199
self.pool is used in to create the likelihood function for aspire.
Looking at this now, since it happens in run_sampler perhaps this isn't actually an issue in this specific case. What do you think?
| npool=None, | ||
| pool=None, | ||
| parameters=None, | ||
| ): |
There was a problem hiding this comment.
I think this would benefit from a doc-string with example usage.
ColmTalbot
left a comment
There was a problem hiding this comment.
I don't remember exactly what the behaviour for some of these questions is, so I'll go back and check, and ideally write docstrings about them.
| parameters=priors.sample(), | ||
| ) as _pool: | ||
| start_time = datetime.datetime.now() | ||
| sampler.pool = _pool |
There was a problem hiding this comment.
I'm trying to work through what this would look like.
If the initial input pool is not None, all of the pool objects reference here should be the same, so I don't think that will make a difference.
One potential issue is that if a specific sampler implementation handles the pool internally by itself this will create two pools and then in the best case, we have a bunch of extra processes we don't need. I think this is what nessai does, so maybe we should game through that specific case.
mj-will
left a comment
There was a problem hiding this comment.
Forgot to submit my comments as a review, but see them above.
|
|
||
| _pool = multiprocessing.Pool( | ||
| processes=npool, | ||
| initializer=_initialize_global_variables, |
There was a problem hiding this comment.
I think it would be useful to allow users to specify their own (additional) initializer.
For nessai, there's another variable that needs to be global for the pool to work and that can't be intialized unless using the fork start method.
There was a problem hiding this comment.
Does this mean that we would also need to be able to allow users to pass through arbitrary additional arguments. At that stage, is it easier to just describe how to implement an independent version?
There was a problem hiding this comment.
Yes, that's true.
Here's what I had to do for nessai:
- https://github.com/bilby-dev/nessai-bilby/blob/main/src/nessai_bilby/plugin.py#L29
- https://github.com/bilby-dev/nessai-bilby/blob/main/src/nessai_bilby/plugin.py#L362
It works fine but since it duplicates so much of the code, it needs to be kept up-to-date with the main code base.
adivijaykumar
left a comment
There was a problem hiding this comment.
Can we add a test for schwimmbad/MPI support?
|
We should documentation for user pools and note that the pool will only work correctly in the initializer has been called. |
I noticed that it was difficult to pass a user-specified pool through run_sampler, and it isn't used at all in the post processing. This PR:
with multiprocessing.pool()...)Sampler._setup_poolandSampler._close_poolparallel_bilbymoot.