Skip to content

Train-time data-augmentation for parameterised learning #68

@GilesStrong

Description

@GilesStrong

Overview

Parameterised learning is useful in HEP, for example in cases where a classifier should learn multiple signal hypotheses (e.g. a heavy Higgs of several possible masses) see Baldi et al., 2016.

In this example the signal would have a parameterised input equal to the true resonant mass, and the background would be randomly assigned resonant masses. Once trained, the entire dataset can be set to a particular resonant mass in order to perform inference for a given hypothesis. This last part is already possible with the ParametrisedPrediction class.

Data augmentation for parameterised learning

Currently the random assignment of parameterised-feature values for background (in the example above) is performed once when preparing the data for training. It could well be possible that it is useful to perform this random assignment during training, which may provide some of the benefits of train-time data augmentation.

Implementation

To avoid conflicts with HEPAugFoldYielder, and due to the fact that this only wants to be performed during training, this secondary form of augmentation should probably implemented as a callback. It also needs to account for the possibility that multiple parameterisation features may be used, and that only a subset of the data may need to be changed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestlow priorityNot urgent and won't degrade with time

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions