Document how to use a component reading to create a new source #131
Conversation
…more efficient source
Interesting! Could give a big speedup in some cases. However, I'd like to suggest a modification. Currently the probability of sampling neutrons from a point `(birth_time, wavelength)` is proportional to the detector histogram `p` at that point. If we instead have a source distribution such that the probability density is proportional to the original source masked by where `p` is non-zero, i.e. `q ∝ original_source * (p > eps)`, we keep the shape of the original source inside the sampled region. Intuitively, with the new source we only increase the probability of sampling in the region where we actually need to sample, but in that region, the ratio of the probabilities of sampling two points does not change. After applying the gaussian kernel to `p`, I'm imagining it would look something like this:

```python
import scipp as sc
import tof


def source_from_reading(
    reading,
    original_source,
    neutrons: int,
    time_smooth_kernel: sc.Variable | None = None,
    wavelength_smooth_kernel: sc.Variable | None = None,
):
    from scipp.scipy.ndimage import gaussian_filter

    # Histogram the reading on the bin edges of the original source, so that
    # the two distributions are defined on the same binning.
    p = reading.data.hist(
        wavelength=original_source.coords['wavelength'],
        birth_time=original_source.coords['birth_time'],
    )
    # Default smoothing kernels: twice the bin width (assumes uniform bins).
    time_binning = original_source.coords['birth_time'][1] - original_source.coords['birth_time'][0]
    wavelength_binning = original_source.coords['wavelength'][1] - original_source.coords['wavelength'][0]
    if time_smooth_kernel is None:
        time_smooth_kernel = 2 * time_binning
    if wavelength_smooth_kernel is None:
        wavelength_smooth_kernel = 2 * wavelength_binning
    # Smooth the histogram so regions that were missed only because of
    # limited statistics are not masked out.
    p = gaussian_filter(
        p,
        sigma={
            'birth_time': time_smooth_kernel,
            'wavelength': wavelength_smooth_kernel,
        },
    )
    # Mask the original source where the (smoothed) reading is essentially
    # zero, and renormalize; inside the kept region the relative
    # probabilities of the original source are unchanged.
    mask = (p > sc.scalar(1e-8, unit=p.unit)).to(dtype='float64')
    q = original_source * mask / (original_source * mask).sum()
    return tof.Source.from_distribution(neutrons=neutrons, p=q)
```
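For context, here is a rough usage sketch. The beamline setup and the `source.data` / `results['detector']` accessors are assumptions for illustration, not taken from the thread:

```python
import scipp as sc
import tof

# First pass: run with a modest number of neutrons to learn which part of
# the source actually reaches the detector (hypothetical beamline).
source = tof.Source(facility='ess', neutrons=200_000)
detector = tof.Detector(distance=sc.scalar(30.0, unit='m'), name='detector')
model = tof.Model(source=source, choppers=[], detectors=[detector])
results = model.run()

# Second pass: build a more efficient source from the detector reading
# (assuming the original distribution is available as `source.data`).
new_source = source_from_reading(
    reading=results['detector'],
    original_source=source.data,
    neutrons=5_000_000,
)
```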
Yes, that's it. Because then the distribution at the detector will be unchanged, assuming we sampled a reasonable number of points to figure out which region of the source makes it to the detector.
I don't understand why we want the shape of the new source to be independent of the original source. If we want the new source to be finer than the original source, we can first up-sample the original source (for example using scipy.ndimage.zoom or scipp.rebin) to the desired shape, and then do the same thing. Making the new source coarser than the original source does not seem useful, but I might be missing something.
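As a rough sketch of that up-sampling step (the `upsample` helper is hypothetical, and assumes uniformly spaced bin-edge coordinates):

```python
import scipp as sc
from scipy.ndimage import zoom


def upsample(da: sc.DataArray, factor: int = 2) -> sc.DataArray:
    # Interpolate the values onto a grid `factor` times finer in every
    # dimension, then rebuild matching (uniform) bin-edge coordinates.
    values = zoom(da.values, zoom=factor, order=1)
    coords = {
        dim: sc.linspace(
            dim,
            da.coords[dim][0].value,
            da.coords[dim][-1].value,
            num=values.shape[i] + 1,
            unit=da.coords[dim].unit,
        )
        for i, dim in enumerate(da.dims)
    }
    return sc.DataArray(
        data=sc.array(dims=da.dims, values=values, unit=da.unit),
        coords=coords,
    )
```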
I meant that to do an operation such as `original_source * (p > 1e-8)`, the two distributions need to be defined on the same binning. In my current version I had missed that; in the code you proposed, you suggested using the bin edges from the original source for histogramming, which ensures consistency.
@jokasimr I had an alternative idea for masking the original source distribution. I am wondering if, instead of creating a new source, this couldn't be something we always make available by default.

Can we make something where we try with a fraction of the neutrons requested by the user, and every time a neutron gets blocked by a component, we lower the probability distribution in the square (lambda, birth_time) where that neutron originated from by a small factor (maybe something related to how many neutrons were requested, to avoid having that value as yet another free parameter)? The squares that are producing many blocked neutrons would gradually see their probability fall to zero, but we don't want to mask a square entirely if only one neutron from there was blocked.

We could use chunks of neutrons of size [30%, 25%, 20%, 15%, 10%] so that we learn from a large chunk to begin with, and then keep refining as we go along. But I expect that the last 3 chunks would contain very few blocked neutrons? (According to Peter, this is called 'adaptive importance sampling'; see the sketch after this comment.)

Finally, this could be an even bigger advantage for pulse-skipping instruments. Instead of making a probability distribution for just one pulse, we concatenate it over the number of pulses, and sample from that. The gradual masking would mean that the probability inside every other pulse would drop drastically and uniformly as all neutrons there get blocked, so we don't waste compute by sampling neutrons from those blocked pulses.

One caveat is that sampling this way means that we will no longer necessarily have exactly the same number of neutrons inside each pulse (I haven't found a good way to enforce that, at least). It would be close but not exact, as it is now. If we cannot fix this, we would have to return the results as a DataGroup with one entry per pulse, instead of a 2D DataArray with a pulse dimension.
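A minimal numpy sketch of that adaptive loop, just to make the idea concrete. Everything here is hypothetical: `run_chunk(p, n)` stands in for sampling `n` neutrons from the 2D probability map `p` and returning the per-bin counts of blocked neutrons, and the decay heuristic is untested:

```python
import numpy as np


def adaptive_probability(p, run_chunk, neutrons,
                         fractions=(0.30, 0.25, 0.20, 0.15, 0.10)):
    # Tie the per-blocked-neutron decay to the requested number of neutrons,
    # so it is not yet another free parameter (heuristic, untested).
    alpha = 10.0 / np.sqrt(neutrons)
    p = p / p.sum()
    for frac in fractions:
        blocked = run_chunk(p, int(frac * neutrons))
        # Bins that keep producing blocked neutrons decay towards zero,
        # while a bin with a single blocked neutron is only nudged slightly.
        p = p * (1.0 - alpha) ** blocked
        p = p / p.sum()
    return p
```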
We could just group/bin the events per pulse and return the binned data, no? Then the number of events doesn't have to be the same for each pulse.
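A small sketch of what that could look like with scipp's binned data (the event table here is made up):

```python
import numpy as np
import scipp as sc

# Events carrying a 'pulse' index; grouping by it gives a binned DataArray
# with a 'pulse' dimension, so pulses can hold different numbers of events.
events = sc.DataArray(
    data=sc.ones(dims=['event'], shape=[5], unit='counts'),
    coords={
        'pulse': sc.array(dims=['event'], values=[0, 0, 1, 1, 1]),
        'toa': sc.array(dims=['event'], values=np.random.rand(5), unit='ms'),
    },
)
per_pulse = events.group('pulse')
```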
How about we specify the target of the optimization? Like, we only sample the events that survive at a certain point. Something like … Then we won't need a general sampling optimization like importance sampling.
That's an option for sure. But then users would have to know a bit about binned data to work with it, which may be more of a learning curve.
Yes, optimizing up to a component is definitely something we want to have, like in the case of T-rex 👍
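For illustration, the kind of interface this might imply; this is entirely hypothetical, not an existing `tof` API:

```python
# Hypothetical: only keep sampling regions whose neutrons survive up to a
# named component; readings requested upstream of it remain fully flooded.
results = model.run(neutrons=1_000_000, optimize_for='detector')
```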
@bingli621 also brought up that it might be easy to make mistakes using this optimization, for example if the user optimizes for the detector and then looks at the reading from a monitor placed earlier along the beam path, which would no longer be properly flooded with neutrons. Is there some way we can avoid that? For example, if optimizing for one detector, should the readings from the other components be removed?
The adaptive approach sounds really nice, but it is a bit complicated. Running with 30% of all neutrons in the first test will limit the speedup we can achieve to ~3x. Something like: … But that would of course be yet a bit more complicated.
I don't see a good way to avoid this. Removing detectors would be possible, but you can just as easily request a Chopper reading and that would also be 'wrong' (or not properly flooded with neutrons). I think we basically need to make it clear what the optimization does, and assume that the users are intelligent people who understand that if you optimize for the detector, things might not be accurate for a monitor placed earlier along the beam path.
30% was just a guess; I have not tested it. I suggested it to Claude, who thought it was a good idea, but then I asked what about using a small chunk first (e.g. 5%) and it thought that also sounded like a good idea 🙄 (if we added the notion of 'confidence' when masking parts of the original source). So it all remains to be tested. It could be that instead of decreasing chunks we should go with increasing chunks, from 10% up to 30%?

We add a section to the docs where we show how to use a component reading to create a more efficient source.
We histogram the neutrons at the detector in wavelength and birth_time, smooth the histogram, and use it as a new distribution to sample from.
This can speed up calculations.
We should consider making the final function into a `classmethod` of the `Source`: `Source.from_reading(...)`.
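A sketch of what that could look like, simply wrapping the `source_from_reading` function proposed above (the signature here is an assumption):

```python
import scipp as sc


class Source:
    # ... existing Source implementation ...

    @classmethod
    def from_reading(
        cls,
        reading,
        original_source,
        neutrons: int,
        time_smooth_kernel: sc.Variable | None = None,
        wavelength_smooth_kernel: sc.Variable | None = None,
    ):
        # Delegates to the function from the discussion above.
        return source_from_reading(
            reading=reading,
            original_source=original_source,
            neutrons=neutrons,
            time_smooth_kernel=time_smooth_kernel,
            wavelength_smooth_kernel=wavelength_smooth_kernel,
        )
```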