CMGD-Tree is a small sandbox for streamed histogram-tree boosting with:
- GPU training
- CPU or GPU prediction
- family-specific losses and target statistics
- MGD and NGD examples
- toy data providers for quick experiments
The main command is:
```shell
python fit_single_tree_hist_demo.py
```

Latest writeup PDF:
If you just want to see something work:
Run the default normal example:
```shell
python fit_single_tree_hist_demo.py
```

Run the same example and print the fitted trees:

```shell
python fit_single_tree_hist_demo.py --print-trees
```

Run the same example and generate plots:

```shell
python fit_single_tree_hist_demo.py --plot
```

Run the same example with both plots and printed trees:

```shell
python fit_single_tree_hist_demo.py --plot --print-trees
```

Write the output to a log file:

```shell
python fit_single_tree_hist_demo.py --plot > ~/logs/cmgtree-demo.log 2>&1
```

The currently implemented families are:
- `normal_identity`
- `poisson`
- `poisson_ngd`
- `gamma`
- `negative_binomial`
- `heteroskedastic_normal`
- `heteroskedastic_normal_ngd`
Useful example commands:
Normal identity example. This is the default multiclass-like mean fit.

```shell
python fit_single_tree_hist_demo.py \
  --plot \
  --print-trees \
  --modify family normal_identity
```

Poisson MGD example. This learns positive mean count predictions.

```shell
python fit_single_tree_hist_demo.py \
  --plot \
  --print-trees \
  --modify family poisson
```

Poisson NGD example. This uses the same Poisson toy, but with family-side Fisher preconditioning.

```shell
python fit_single_tree_hist_demo.py \
  --plot \
  --print-trees \
  --modify family poisson_ngd
```

Gamma MGD example. This is a positive continuous target with fixed shape.

```shell
python fit_single_tree_hist_demo.py \
  --plot \
  --print-trees \
  --modify family gamma
```

Negative binomial MGD example. This is an overdispersed count example with fixed dispersion.

```shell
python fit_single_tree_hist_demo.py \
  --plot \
  --print-trees \
  --modify family negative_binomial
```

Heteroskedastic normal MGD example. This predicts first and second moments and derives the variance from them.

```shell
python fit_single_tree_hist_demo.py \
  --plot \
  --print-trees \
  --modify family heteroskedastic_normal
```

Heteroskedastic normal NGD example. This uses the same toy, but with an NGD-style family update.

```shell
python fit_single_tree_hist_demo.py \
  --plot \
  --print-trees \
  --modify family heteroskedastic_normal_ngd
```

The code is now organized in three user-facing directories:
- `families/`: Put the statistical model here. A family defines target statistics, model state, updates, and monitoring loss.
- `data_providers/`: Put the data source here. Today these are toy generators; later this is also the right place for a real streamed data loader.
- `examples/`: Put example-specific defaults here. An example ties together a family choice and the default tree, dataset, and training settings you want to start from.
So the intended pattern is:
- new probabilistic model: add a file in `families/`
- new toy generator or real loader: add a file in `data_providers/`
- new runnable configuration: add a file in `examples/`
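For orientation, a new family module could follow a shape like the sketch below. The class name, method names, and the use of NumPy here are illustrative assumptions, not the repository's actual interface; read an existing file in `families/` for the real contract.

```python
import numpy as np

class NormalIdentityFamily:
    """Hypothetical sketch of a family: target stats, state, update, loss."""

    def target_stats(self, y):
        # T(y): for the plain normal/identity case the target stat is y itself.
        return np.asarray(y, dtype=np.float64)

    def init_state(self, n_rows, n_outputs):
        # Model state: one additive score vector per row.
        return np.zeros((n_rows, n_outputs))

    def pseudo_response(self, state, y):
        # MGD residual R = T(y) - eta(x); here eta is the state directly.
        return self.target_stats(y) - state

    def update(self, state, tree_output, learning_rate):
        # Additive boosting update with shrinkage.
        return state + learning_rate * tree_output

    def monitoring_loss(self, state, y):
        # Mean squared error as the monitoring loss for the normal family.
        return float(np.mean((self.target_stats(y) - state) ** 2))
```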
Top-level flags:
- `--modify key value ...`: Override any config entry from the tree, dataset, or training groups.
- `--profile`: Print timing and memory summaries for training and evaluation.
- `--plot`: Write plots to `./plots/<plot_training_id>/`.
- `--print-trees`: Print every fitted tree after the run.
- `--full-output`: Compatibility alias for `--plot --print-trees`.
Example:
```shell
python fit_single_tree_hist_demo.py \
  --plot \
  --print-trees \
  --modify family gamma max_depth 4 max_leaves 16 n_features 8
```

The script has three config groups:
```python
TREE_CONFIG = {
    "max_bin": 64,
    "cut_sample_rows": 200000,
    "grow_policy": "depthwise",
    "max_depth": 2,
    "max_leaves": 4,
    "min_samples_leaf": 512,
    "min_split_loss": 1e-3,
    "reg_lambda": 0.0,
    "family": "normal_identity",
    "class_weights": None,
}

DATASET_CONFIG = {
    "n_features": 32,
    "n_classes": 4,
    "batch_size": 65536,
    "n_batches": 12,
    "seed": 0,
    "feature_offset_scale": 2.5,
    "feature_noise": 1.0,
}

TRAINING_CONFIG = {
    "plot_training_id": "single_tree_demo",
    "plot_bins": 80,
    "plot_mode": "all",
    "threads_per_block": 128,
    "training_backend": "auto",
    "cpu_threads": 0,
    "predict_method": "cpu",
    "cpu_predictor": "numba_parallel",
    "n_boost_rounds": 2,
    "learning_rate": 1.0,
    "fresh_inference_batch_size": None,
    "fresh_inference_n_batches": None,
}
```
`TREE_CONFIG` keys:

- `max_bin`: Number of histogram bins per feature. Larger values make split search finer but cost more memory and compute.
- `cut_sample_rows`: Maximum number of streamed rows used to estimate the feature cuts. This does not cap training rows; it only affects how the bin boundaries are chosen.
- `grow_policy`: Tree growth strategy. `depthwise` expands level by level; `lossguide` expands the currently best leaves first.
- `max_depth`: Maximum tree depth.
- `max_leaves`: Maximum number of leaves. This can be more restrictive than `max_depth`.
- `min_samples_leaf`: Minimum effective sample count required in a leaf. Prevents very small leaves.
- `min_split_loss`: Minimum gain required to keep a split. Larger values make the tree more conservative.
- `reg_lambda`: L2-style regularization term used in split scoring and leaf scoring.
- `family`: Which probabilistic example family to use.
- `class_weights`: Optional per-target weighting vector. Mainly useful for multi-output normal-style fits.
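As a rough illustration of how `max_bin` and `cut_sample_rows` interact, histogram boosting implementations typically compute per-feature quantile cut points from a row sample and then bin every training row against those fixed cuts. This is a generic sketch, not the repository's actual cut-finding code.

```python
import numpy as np

def find_cuts(X_sample, max_bin):
    """Per-feature quantile cut points from a row sample (max_bin - 1 cuts each)."""
    qs = np.linspace(0, 1, max_bin + 1)[1:-1]          # interior quantiles
    return [np.quantile(X_sample[:, j], qs) for j in range(X_sample.shape[1])]

def bin_rows(X, cuts):
    """Map raw feature values to integer bin ids in [0, max_bin)."""
    binned = np.empty(X.shape, dtype=np.int32)
    for j, c in enumerate(cuts):
        binned[:, j] = np.searchsorted(c, X[:, j])
    return binned

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 4))
cuts = find_cuts(X[:20_000], max_bin=64)   # cut_sample_rows-style subsample
binned = bin_rows(X, cuts)                 # all rows are binned, not just the sample
```

The key point is that the subsample only influences where the cuts fall; the histogram statistics are still accumulated over every streamed row.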
`DATASET_CONFIG` keys:

- `n_features`: Input feature dimension.
- `n_classes`: Output target-stat dimension. For some examples this is literally the number of outputs; for heteroskedastic normal it is fixed to `2`, representing `[y, y^2]`.
- `batch_size`: Number of streamed events per batch.
- `n_batches`: Number of streamed batches. The total number of training events is `batch_size * n_batches`.
- `seed`: Random seed for the toy provider.
- `feature_offset_scale`: Provider-side offset used to make the toy data less degenerate.
- `feature_noise`: Provider-side feature scale.
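A streamed toy provider in the spirit of these keys can be sketched as a generator. The function name and the exact way `feature_offset_scale` and `feature_noise` enter the data are illustrative assumptions, not the actual `data_providers/` code.

```python
import numpy as np

def toy_batches(n_features=32, n_classes=4, batch_size=65536, n_batches=12,
                seed=0, feature_offset_scale=2.5, feature_noise=1.0):
    """Yield (X, y) batches; total events = batch_size * n_batches."""
    rng = np.random.default_rng(seed)
    # Per-class feature offsets make the toy data non-degenerate.
    offsets = feature_offset_scale * rng.normal(size=(n_classes, n_features))
    for _ in range(n_batches):
        y = rng.integers(0, n_classes, size=batch_size)
        X = offsets[y] + feature_noise * rng.normal(size=(batch_size, n_features))
        yield X, y

total = sum(len(y) for _, y in toy_batches(batch_size=1024, n_batches=4))
# total events = 1024 * 4 = 4096
```

A real streamed loader would keep the same generator interface but read batches from disk or network instead of sampling them.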
`TRAINING_CONFIG` keys:

- `plot_training_id`: Output directory name under `./plots/`.
- `plot_bins`: Number of bins used in diagnostic plots.
- `plot_mode`: Provider-specific plotting mode selector. In most cases, leave this at `all`.
- `threads_per_block`: CUDA kernel launch block size for the GPU trainer.
- `training_backend`: `auto`, `gpu`, or `cpu`. `auto` uses the GPU when available and otherwise falls back to CPU.
- `cpu_threads`: Number of CPU threads to use for CPU prediction and CPU-side update work. `0` resolves to the default thread policy.
- `predict_method`: `cpu` or `gpu`. This controls prediction and cache-update prediction, not the tree-fitting backend.
- `cpu_predictor`: CPU prediction implementation. Choices: `index`, `leaf_mask`, `numba`, `numba_parallel`.
- `n_boost_rounds`: Number of boosting iterations.
- `learning_rate`: Shrinkage factor applied to each fitted tree.
- `fresh_inference_batch_size`: Optional batch size for the separate fresh-inference benchmark path in profiling mode.
- `fresh_inference_n_batches`: Optional number of batches for the separate fresh-inference benchmark path in profiling mode.
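For intuition, `--modify key value ...` behaves like a flat override across the three config dicts. The sketch below, with a hypothetical helper name `apply_modify` and `ast.literal_eval`-based value parsing, shows one plausible implementation, not the script's actual one.

```python
import ast

TREE_CONFIG = {"max_depth": 2, "family": "normal_identity"}
DATASET_CONFIG = {"n_features": 32}
TRAINING_CONFIG = {"learning_rate": 1.0}

def apply_modify(pairs, *groups):
    """Apply alternating key/value tokens to whichever config group owns each key."""
    for key, raw in zip(pairs[::2], pairs[1::2]):
        try:
            value = ast.literal_eval(raw)   # "4" -> 4, "0.2" -> 0.2
        except (ValueError, SyntaxError):
            value = raw                     # bare strings like "gamma" stay strings
        for group in groups:
            if key in group:
                group[key] = value
                break
        else:
            raise KeyError(f"unknown config key: {key}")

apply_modify(["family", "gamma", "max_depth", "4"],
             TREE_CONFIG, DATASET_CONFIG, TRAINING_CONFIG)
```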
Some families come with example-owned defaults. These are applied only when you do not override them explicitly.
Current example defaults:
- `heteroskedastic_normal`: `n_features=2`, `n_classes=2`, `n_batches=24`, `max_depth=2`, `max_leaves=4`, `n_boost_rounds=50`, `learning_rate=0.2`
- `gamma`: `n_features=4`, `n_classes=4`, `max_depth=3`, `max_leaves=8`, `n_boost_rounds=50`, `learning_rate=0.2`
- `negative_binomial`: `n_features=4`, `n_classes=4`, `max_depth=3`, `max_leaves=8`, `n_boost_rounds=50`, `learning_rate=0.2`
Prediction is an important runtime choice.
predict_method=gpu
- keeps prediction on the GPU
- usually best when you are already on the GPU and the batches are large
- useful when cache updates should stay on-device during GPU training
predict_method=cpu
- uses the CPU predictors in single_tree.py
- often convenient for small runs and inspection
- useful when you want to compare CPU inference implementations
CPU predictor choices:
- `index`: Simple index-set traversal.
- `leaf_mask`: Leaf-mask style NumPy predictor.
- `numba`: Compiled single-core CPU predictor.
- `numba_parallel`: Compiled multi-core CPU predictor.
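To make the `index` style concrete: an index-set traversal visits each node once and partitions the active row indices by that node's split. This is a generic sketch over an assumed flat-array tree layout, not the predictor code in `single_tree.py`.

```python
import numpy as np

def predict_index(X, left, right, feature, threshold, leaf_value):
    """Index-set traversal: route row-index sets down a flat-array tree.

    Nodes are parallel arrays; left[i] == -1 marks node i as a leaf.
    """
    out = np.empty(len(X))
    stack = [(0, np.arange(len(X)))]
    while stack:
        node, idx = stack.pop()
        if left[node] == -1:                       # leaf: write its value
            out[idx] = leaf_value[node]
            continue
        go_left = X[idx, feature[node]] <= threshold[node]
        stack.append((left[node], idx[go_left]))
        stack.append((right[node], idx[~go_left]))
    return out

# A depth-1 stump: node 0 splits on feature 0 at 0.0; nodes 1 and 2 are leaves.
left = np.array([1, -1, -1]); right = np.array([2, -1, -1])
feature = np.array([0, 0, 0]); threshold = np.array([0.0, 0.0, 0.0])
leaf_value = np.array([0.0, -1.0, 1.0])
X = np.array([[-0.5], [0.5]])
pred = predict_index(X, left, right, feature, threshold, leaf_value)
# pred -> [-1.0, 1.0]
```

The compiled `numba` variants typically replace this vectorized partitioning with a per-row loop over nodes, which is what makes them fast on many small batches.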
Example:
Run GPU training with GPU prediction:
```shell
python fit_single_tree_hist_demo.py \
  --modify training_backend gpu predict_method gpu
```

Run GPU training with fast multi-core CPU prediction:

```shell
python fit_single_tree_hist_demo.py \
  --modify training_backend gpu predict_method cpu cpu_predictor numba_parallel cpu_threads 8
```

Run everything on CPU:

```shell
python fit_single_tree_hist_demo.py \
  --modify training_backend cpu predict_method cpu cpu_predictor numba_parallel cpu_threads 8
```

Plotting is off by default.
When --plot is enabled, plots are written under:
`./plots/<plot_training_id>/`

The actual plots depend on the provider:

- `normal_identity`: feature-density plots and feature-target mean overlays
- `poisson`: feature-density plots and feature-target mean overlays
- `gamma`: feature-density plots and feature-target mean overlays
- `negative_binomial`: feature-density plots and feature-target mean overlays
- `heteroskedastic_normal`: observed/predicted mean and observed/predicted variance overlays
For smaller exploratory runs:
```shell
python fit_single_tree_hist_demo.py \
  --plot \
  --modify n_features 4 n_classes 4 plot_training_id small_demo
```

The code is organized around additive boosting of tree outputs.
In the MGD examples, the family supplies target statistics T(y), a model state, and a dual prediction.
At round t, the tree is fit to the residual-style pseudo-response
R_t(x, y) = T(y) - eta_t^*(x)
and the model is updated additively with a learning rate:
state_{t+1}(x) = state_t(x) + alpha * f_t(x)
The histogram tree learner itself is generic: it just fits the supplied vector pseudo-response.
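The round structure above can be written out as a toy loop. Here the histogram learner is replaced by a stand-in `fit_tree` that returns a depth-0 "tree" (a single global mean), since all the boosting loop needs from the learner is a fit to the supplied vector pseudo-response.

```python
import numpy as np

def fit_tree(X, residual):
    """Stand-in for the histogram tree: a depth-0 'tree' predicting the mean."""
    mean = residual.mean(axis=0)
    return lambda X_new: np.broadcast_to(mean, (len(X_new),) + mean.shape)

def boost(X, T_y, n_rounds=20, alpha=0.5):
    """MGD-style additive boosting: fit each tree to R_t = T(y) - state_t."""
    state = np.zeros_like(T_y)
    for _ in range(n_rounds):
        residual = T_y - state          # R_t(x, y) = T(y) - eta_t(x)
        f_t = fit_tree(X, residual)
        state = state + alpha * f_t(X)  # state_{t+1} = state_t + alpha * f_t
    return state

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
T_y = rng.normal(loc=2.0, size=(256, 1))
state = boost(X, T_y)
# With a mean-only tree, the state converges to the overall mean of T(y).
```

A real tree replaces the global mean with per-leaf means, so the converged state varies with x instead of being constant.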
In NGD, the trainer still fits a tree to a supplied pseudo-response, but the family changes what that target is.
Instead of the plain residual, the family can supply a Fisher-preconditioned target
U_t(x, y) = G(state_t(x))^{-1} (T(y) - eta_t^*(x))
where G is the Fisher information in the chosen coordinate system.
So algorithmically:
- MGD is the identity / unpreconditioned case
- NGD is the family-preconditioned case
This is why both styles can share the same tree trainer.
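As a concrete instance of the preconditioned target, take a Poisson family in natural (log-mean) coordinates: with state theta = log(mu), the dual prediction is eta* = mu and the Fisher information is G = mu, so the NGD target divides the plain residual by mu. This parameterization is an assumption for illustration; the repository's `poisson_ngd` family may use different coordinates.

```python
import numpy as np

def poisson_targets(y, state):
    """MGD vs NGD tree targets for a Poisson family.

    Assumed parameterization: state = theta = log(mu), dual prediction
    eta* = mu = exp(theta), Fisher information G(theta) = mu.
    """
    mu = np.exp(state)            # dual prediction eta*(x)
    residual = y - mu             # MGD target:  T(y) - eta*(x)
    ngd_target = residual / mu    # NGD target:  G^{-1} (T(y) - eta*(x))
    return residual, ngd_target

y = np.array([0.0, 2.0, 5.0])
state = np.log(np.array([1.0, 2.0, 2.0]))   # current mu = [1, 2, 2]
r, u = poisson_targets(y, state)
# r ~ [-1, 0, 3]; u ~ [-1, 0, 1.5]
```

Note that the two targets agree at mu = 1 and differ elsewhere, which is exactly the preconditioning effect: large-mean regions get smaller, better-scaled steps.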
If you want a timing and memory summary:
```shell
python fit_single_tree_hist_demo.py --profile
```

This reports the built-in training and evaluation profiling information.
Benchmark utilities live under benchmarks/. See benchmarks/README.md for how to run them.