OneStopFitting is a software package for Gaussian Process (GP) background estimation. Estimation pipelines are constructed from a YAML-based configuration and can be run either locally or distributed over HTCondor. The package processes one- and two-dimensional histograms to produce background estimates and uncertainties for use in the Combine statistical framework. It includes a large number of prebuilt kernels, models, means, and transformations, along with a substantial suite of diagnostics, including plots, metrics, and summary reports.
There is also a system for post-processing Combine results: generating limit and significance plots, goodness-of-fit (GOF) tests, likelihood scans, and more.
The codebase is organized into modules under the fitting/ directory:
- core/: Core data structures and transforms.
- combine/: Interface to the Combine framework.
- data/: Core data manipulation.
- diagnostics/: Plotting and other tools for examining results.
- distributed/: Parameter sweeps and HTCondor support.
- inference/: Models, kernels, means, optimization, etc. The meat and potatoes.
- steps/: Pipeline execution steps.
The project uses uv for dependency management and supports Apptainer/Singularity containers.
To install:
```bash
uv venv
uv sync
```
The setup.sh script initializes the environment on clusters with CVMFS access. Python 3.11+ is required.
The entry point is python -m fitting, which provides the subcommands described below.
The run command executes the fitting pipeline.
Arguments:
- --config, -c: Path to the YAML configuration file.
- --background, -b: Path to the background histogram (.pklz4).
- --signal, -s: Path to the signal histogram (.pklz4).
- --output, -o: Output directory.
- --injection-rate: Signal injection rate multiplier.
- --start-from: Pipeline phase to resume from (LOAD, FIT, COMBINE).
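The --start-from flag implies that the pipeline runs its phases in a fixed order. A minimal sketch of that resume logic, assuming the order is LOAD, FIT, COMBINE as listed and that resuming simply skips the earlier phases (the Phase enum and phases_to_run helper are hypothetical, not part of fitting):

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical model of the phases accepted by --start-from."""
    LOAD = 1
    FIT = 2
    COMBINE = 3


def phases_to_run(start_from: str = "LOAD") -> list[Phase]:
    """Return the phases executed when resuming from `start_from`.

    Assumes phases always run in the order LOAD, FIT, COMBINE and that
    resuming skips every phase before the requested one.
    """
    start = Phase[start_from.upper()]  # raises KeyError for unknown phase names
    return [p for p in Phase if p.value >= start.value]


if __name__ == "__main__":
    print([p.name for p in phases_to_run("FIT")])
```

Under these assumptions, resuming from FIT reuses the loaded inputs and reruns only the fit and the Combine step.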
```bash
python -m fitting run \
  --config resources/smoothing_configs/Signal312/comp.yaml \
  --background subsetexported/2018/Signal312/qcd_inclusive_2018/comp_mStop_vs_mChiRatio.pklz4 \
  --injection-rate 0.1 \
  --output output/2018/qcd_inclusive/comp
```
The smooth command generates sampled backgrounds from the latent distribution.
Arguments:
- --state, -s: Path to the state.pklz4 file.
- --output-dir, -o: Output directory for generated frames.
- --name, -n: Prefix for naming extractions.
- --num-samples: Number of posterior draws.
- --include-smooth: Also output the GPR mean.
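Each of the --num-samples posterior draws can be pictured as a sample from a multivariate normal over histogram bins. The stdlib-only sketch below assumes the latent distribution is summarized by a per-bin mean vector and covariance matrix, and draws samples as mean + L z, where L is the Cholesky factor of the covariance and z is standard normal. The helper names are hypothetical and this is not the package's implementation.

```python
import math
import random


def cholesky(cov: list[list[float]], jitter: float = 1e-9) -> list[list[float]]:
    """Lower-triangular L with L @ L.T == cov + jitter * I.

    Raises ValueError (from math.sqrt) if the matrix is not positive definite.
    """
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] + jitter - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def draw_samples(mean, cov, num_samples, seed=0):
    """Draw `num_samples` vectors from N(mean, cov) as mean + L z, z ~ N(0, I)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    n = len(mean)
    samples = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Only the lower triangle of L is nonzero, so sum k up to i.
        samples.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return samples


if __name__ == "__main__":
    mean = [120.0, 95.0, 70.0]  # hypothetical per-bin posterior mean
    cov = [[4.0, 2.0, 0.5], [2.0, 4.0, 2.0], [0.5, 2.0, 4.0]]  # correlated bins
    for draw in draw_samples(mean, cov, num_samples=10):
        print([round(v, 2) for v in draw])
```

Sampling through the covariance (rather than bin by bin) keeps the correlations between neighbouring bins in every generated background.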
```bash
python -m fitting smooth \
  --state output/2018/qcd_inclusive/comp/state.pklz4 \
  --name qcd_smoothed_category \
  --output-dir smoothed_outputs/ \
  --num-samples 10
```
The aggregate-plot command aggregates metrics from summary.json files and generates summary plots.
Arguments:
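- <summaries>: Positional glob pattern(s) for summary.json files.
- --metric, -m: Dotted path to the metric within summary.json (e.g., metrics.blinded_chi2_per_bin).
- --output, -o: Output directory for the plots.
- --formats, -f: Image format(s) (e.g., png, pdf); repeat the flag for multiple formats.
- --smooth-sigma: Sigma of the Gaussian filter applied to the outputs.
- --cmap: Matplotlib colormap.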
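The --metric value used in the aggregate-plot example (metrics.blinded_chi2_per_bin) reads like a dotted path into nested JSON. A sketch of how such a lookup over a recursive glob might work, assuming summary.json stores its metrics as ordinary nested JSON objects (lookup and collect_metric are hypothetical helpers):

```python
import glob
import json
from functools import reduce


def lookup(summary: dict, dotted_path: str):
    """Follow a dotted key path such as 'metrics.blinded_chi2_per_bin'."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), summary)


def collect_metric(pattern: str, dotted_path: str) -> dict[str, object]:
    """Map each summary.json matching `pattern` to the requested metric value.

    recursive=True lets '**' match any number of nested directories.
    """
    results = {}
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path) as f:
            results[path] = lookup(json.load(f), dotted_path)
    return results


if __name__ == "__main__":
    values = collect_metric(
        "output/2018/qcd_inclusive/comp/**/summary.json",
        "metrics.blinded_chi2_per_bin",
    )
    for path, value in values.items():
        print(path, value)
```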
```bash
python -m fitting aggregate-plot \
  "output/2018/qcd_inclusive/comp/**/summary.json" \
  --metric metrics.blinded_chi2_per_bin \
  --output diagnostic_plots/ \
  --formats png \
  --formats pdf \
  --smooth-sigma 1.5
```
The report command generates PDF reports from summary.json files.
Arguments:
- --input, -i: Input paths or glob strings for summary.json files.
- --output, -o: Output directory.
- --single-document: Combine all outputs into one document.
- --latex-engine: LaTeX engine (pdflatex, xelatex).
```bash
python -m fitting report \
  --input "output/2018/qcd_inclusive/comp/**/summary.json" \
  --output report_output/ \
  --single-document \
  --latex-engine pdflatex
```
The makecondor and makebatch commands generate submission files for cluster execution.
Arguments (makecondor):
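- --signal: Glob pattern for signal templates.
- --background: Glob pattern for background templates.
- --years: Data-taking year(s).
- --pipelines: Pipeline(s) to run (e.g., smoothing).
- --subdir-format: Format string for output subdirectories (e.g., "{era.name}/{dataset_name}").
- --output: Output directory for the submission files.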
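The {era.name} placeholder in --subdir-format matches Python's str.format field syntax, which supports attribute access inside replacement fields. A sketch of the expansion, assuming the format string is filled with an era object and a dataset name (the Era dataclass and render_subdir are hypothetical stand-ins, not the package's types):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Era:
    """Hypothetical stand-in for the object referenced as {era.name}."""
    name: str


def render_subdir(fmt: str, era: Era, dataset_name: str) -> str:
    """Expand a --subdir-format string with str.format.

    '{era.name}' performs attribute lookup on `era`; unused keywords are ignored.
    """
    return fmt.format(era=era, dataset_name=dataset_name)


if __name__ == "__main__":
    print(render_subdir("{era.name}/{dataset_name}", Era("2018"), "qcd_inclusive_2018"))
```

Under these assumptions, the example format places a dataset named qcd_inclusive_2018 from the 2018 era in 2018/qcd_inclusive_2018.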
```bash
python -m fitting makecondor \
  --signal "subsetexported/2018/Signal312/**/signal_*.pklz4" \
  --background "subsetexported/2018/Signal312/**/qcd_inclusive*.pklz4" \
  --years 2018 \
  --pipelines smoothing \
  --subdir-format "{era.name}/{dataset_name}" \
  --output condor_submit_files
```
Arguments (makebatch):
- Accepts the same arguments as makecondor, plus the following parameter-sweep options.
- --rates: Injection rates to sweep (comma-separated).
- --rebin: Rebin factors to sweep (comma-separated).
- --window-spread: Window spread values to sweep.
- --config-base: Base configuration template.
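Sweep options such as --rates and --rebin take comma-separated values. Assuming makebatch creates one job per combination (a full Cartesian product, which is an assumption rather than documented behaviour), the grid for the example values can be enumerated as follows; the helper names are hypothetical:

```python
from itertools import product


def parse_list(value: str, cast=float) -> list:
    """Split a comma-separated CLI value such as "0.0,0.1,0.5"."""
    return [cast(v) for v in value.split(",") if v.strip()]


def sweep_points(rates: str, rebin: str) -> list[dict]:
    """Enumerate every (injection rate, rebin factor) combination.

    itertools.product varies the last argument fastest, so rebin changes first.
    """
    return [
        {"injection_rate": r, "rebin": b}
        for r, b in product(parse_list(rates), parse_list(rebin, int))
    ]


if __name__ == "__main__":
    points = sweep_points("0.0,0.1,0.5", "1,2")
    print(len(points))  # 6
    for p in points:
        print(p)
```

Under that assumption, the makebatch example's --rates "0.0,0.1,0.5" and --rebin "1,2" expand to six jobs.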
```bash
python -m fitting makebatch \
  --signal "subsetexported/2018/Signal312/**/signal_*.pklz4" \
  --background "subsetexported/2018/Signal312/**/qcd_inclusive*.pklz4" \
  --years 2018 \
  --pipelines smoothing \
  --config-base resources/smoothing_configs/Signal312/comp.yaml \
  --rates "0.0,0.1,0.5" \
  --rebin "1,2" \
  --output batch_submit_files
```
The harvest command extracts Combine results and updates summary.json files.
Arguments:
- <summaries>: Paths to summary.json files.
```bash
python -m fitting harvest output/2018/qcd_inclusive/comp/**/summary.json
```
Models and inference methods are implemented in fitting.inference.
Available models:
- ExactGPConfig: Exact GP inference.
- ExactSparseGPConfig: Sparse matrix approximations for large datasets.
- VariationalGPConfig: Stochastic variational inference with learnable inducing points.
- MultiFidelityGPConfig: Experimental multi-fidelity Gaussian process based on QCD simulation.
- QCDPriorGPConfig: Bayesian workflow using hyperpriors from MC.
Kernels include options from GPJax and custom extensions.
- Standard: RBF, Matern12/Matern32/Matern52, RationalQuadratic, Polynomial, Periodic, Linear, White.
- Composites: SumKernelConfig, ProductKernelConfig, ScaledKernelConfig.
- Neural Network Kernel: Dense neural network applied before a base kernel. Comes in warping and absolute versions.
- MCEnsembleKernel: Covariance matrix derived from systematic variations.
- MultiFidelityResidualKernel: Used in the multi-fidelity model.
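The composite kernels combine base kernels by addition, multiplication, and scaling, and the neural-network kernel feeds inputs through a learned map before a base kernel. A pure-Python sketch of these constructions for scalar inputs, with a fixed function standing in for the network; the function names are hypothetical and are neither GPJax's API nor the package's config classes:

```python
import math
from typing import Callable

Kernel = Callable[[float, float], float]


def rbf(lengthscale: float = 1.0) -> Kernel:
    """Squared-exponential kernel exp(-r^2 / (2 l^2))."""
    return lambda x, y: math.exp(-((x - y) ** 2) / (2 * lengthscale**2))


def matern52(lengthscale: float = 1.0) -> Kernel:
    """Matern-5/2 kernel (1 + s + s^2 / 3) exp(-s), with s = sqrt(5) r / l."""
    def k(x: float, y: float) -> float:
        s = math.sqrt(5) * abs(x - y) / lengthscale
        return (1 + s + s**2 / 3) * math.exp(-s)
    return k


def sum_kernel(k1: Kernel, k2: Kernel) -> Kernel:
    """k(x, y) = k1(x, y) + k2(x, y)."""
    return lambda x, y: k1(x, y) + k2(x, y)


def product_kernel(k1: Kernel, k2: Kernel) -> Kernel:
    """k(x, y) = k1(x, y) * k2(x, y)."""
    return lambda x, y: k1(x, y) * k2(x, y)


def scaled_kernel(k: Kernel, variance: float) -> Kernel:
    """k(x, y) = variance * base(x, y)."""
    return lambda x, y: variance * k(x, y)


def warped_kernel(base: Kernel, warp: Callable[[float], float]) -> Kernel:
    """Apply a feature map (here a fixed stand-in for a network) before the base kernel."""
    return lambda x, y: base(warp(x), warp(y))


if __name__ == "__main__":
    kernel = scaled_kernel(sum_kernel(rbf(2.0), matern52(0.5)), variance=3.0)
    print(kernel(0.0, 0.0))  # 6.0
    print(warped_kernel(rbf(1.0), math.tanh)(0.0, 1.0))
```

At zero separation both base kernels equal 1, so the scaled sum evaluates to 3.0 * (1.0 + 1.0) = 6.0.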
Available mean functions:
- ZeroMeanConfig, ConstantMeanConfig: Standard means.
- PolynomialBackgroundMeanConfig, ParametricBackgroundMeanConfig, SignalTemplateMeanConfig: Parametric backgrounds.
- DoubleSidedCrystalBallMeanConfig, GaussianBumpMeanConfig: Resonance structures.
- QCDMCMeanConfig: Mean derived from MC.
- LookupTableMeanConfig, InterpolatedMeanConfig: Pre-specified means; can be useful as part of more complex pipelines.
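For reference, a common parameterization of the double-sided Crystal Ball shape used for resonance means such as DoubleSidedCrystalBallMeanConfig: a Gaussian core with power-law tails, matched in value and slope at t = -alpha_l and t = alpha_r. The parameter names and the absence of an amplitude term are assumptions; the package's own parameterization may differ.

```python
import math


def double_sided_crystal_ball(x, mu, sigma, alpha_l, n_l, alpha_r, n_r):
    """Unnormalized double-sided Crystal Ball shape.

    Gaussian core for -alpha_l <= t <= alpha_r, with t = (x - mu) / sigma,
    and power-law tails on each side, continuous in value and first
    derivative at both joins.
    """
    t = (x - mu) / sigma
    if t < -alpha_l:
        a = (n_l / alpha_l) ** n_l * math.exp(-0.5 * alpha_l**2)
        b = n_l / alpha_l - alpha_l
        return a * (b - t) ** (-n_l)
    if t > alpha_r:
        a = (n_r / alpha_r) ** n_r * math.exp(-0.5 * alpha_r**2)
        b = n_r / alpha_r - alpha_r
        return a * (b + t) ** (-n_r)
    return math.exp(-0.5 * t**2)


if __name__ == "__main__":
    # Hypothetical peak at 500 with width 25 and asymmetric tails.
    for x in (400.0, 460.0, 500.0, 540.0, 600.0):
        print(x, double_sided_crystal_ball(x, 500.0, 25.0, 1.5, 3.0, 2.0, 5.0))
```

The tail coefficients a and b are fixed by the continuity conditions, so only the core width and the tail thresholds and powers are free parameters.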
Available inference methods:
- OPTIMIZATION: MLE or MAP estimates using Optax optimizers (Adam, AdamW, SGD).
- TWO_STAGE: Iterative procedure for kernel and mean-function optimization. It first fits the mean to learn the large-scale structure, then uses GPR to learn the residuals.
- SAMPLING: Bayesian inference using NumPyro.
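The TWO_STAGE idea can be illustrated in miniature: fit a parametric mean first, then fit a GP to what the mean leaves behind and add the two. This sketch assumes 1D inputs, a least-squares polynomial mean, an RBF kernel with fixed hyperparameters, and Gaussian noise. The package's TWO_STAGE mode also optimizes the kernel and mean and works on histograms, which is omitted here, and all names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Stage 1: least-squares polynomial mean via the normal equations."""
    X = [[x**p for p in range(degree + 1)] for x in xs]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(degree + 1)]
           for i in range(degree + 1)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(degree + 1)]
    coeffs = solve(XtX, Xty)
    return lambda x: sum(c * x**p for p, c in enumerate(coeffs))


def rbf(x, y, lengthscale=1.0, variance=1.0):
    return variance * math.exp(-((x - y) ** 2) / (2 * lengthscale**2))


def two_stage_fit(xs, ys, degree=1, lengthscale=1.0, variance=1.0, noise=1e-2):
    """Stage 2: GP regression on the residuals of the stage-1 mean.

    Returns x -> mean(x) + k(x, X) (K + noise * I)^-1 residuals.
    Hyperparameters are fixed here rather than optimized.
    """
    mean = fit_polynomial(xs, ys, degree)
    residuals = [y - mean(x) for x, y in zip(xs, ys)]
    K = [[rbf(a, b, lengthscale, variance) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    weights = solve(K, residuals)
    return lambda x: mean(x) + sum(
        w * rbf(x, xi, lengthscale, variance) for w, xi in zip(weights, xs)
    )


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 6.0, 7.0, 9.0]  # straight line plus a small bump at x = 2
    predict = two_stage_fit(xs, ys)
    print([round(predict(x), 3) for x in xs])
```

Splitting the fit this way lets the cheap parametric mean absorb the smooth trend, so the GP only has to model the smaller, more local residual structure.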