To start the program, download the repository, navigate to its parent folder, CNN/, and enter the following in a terminal (or cmd on Windows):
python3 watchmal.py #flags and arguments
There is an extensive list of flags that can be used to tune the training engine, detailed below. Every flag has a valid default, so none of the flags need to be specified to run the program. However, the default data path (.) is probably not valid for any particular setup, so in practice you will usually need to set -pat.
-h
Prints the help dialogue for all flags to the terminal window. There is no config option for this flag.
-m #name #constructor
Specifies the architecture to train on. Make sure the selected architecture exists in models/; a list of available architectures is printed on the terminal for convenience. The config option for this flag is model.
-pms #space-delimited list of named arguments
Specifies a list of arguments to pass to the CNN constructor. Make sure the arguments are valid for the selected constructor; a list of the arguments taken by each constructor is printed on the terminal for convenience. The config option for this flag is params.
-dev #cpu/gpu
Sets whether the engine offloads work to the CPU or the GPU. If gpu is selected, you must also specify a list of GPUs with -gpu. The config option for this flag is device.
-gpu #space-delimited list of GPUs (ints)
Gives the engine a list of GPUs to train on. If no GPUs are given, the training engine falls back to running on the CPU. The config option for this flag is gpu_list.
-do #train #test #val
Instructs the engine to run the training, testing, and validation tasks. The engine can run any subset of these tasks and runs all of them by default. The config option for this flag is tasks.
-wst #integer
Instructs the engine to dump, during validation, a list of ROOT file paths and indices identifying the n worst-identified events in the input dataset. The list is written to a plain-text file in the save_path directory. By default this is set to 0. The config option for this flag is worst.
-bst #integer
Instructs the engine to dump, during validation, a list of ROOT file paths and indices identifying the n best-identified events in the input dataset. The list is written to the same directory as the -wst output. The config option for this flag is best.
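The -wst and -bst outputs can be pictured with a short Python sketch. It assumes, as a hypothesis not stated above, that an event's identification score is the softmax probability the network assigns to its true label. The functions rank_events and dump_events and their arguments are hypothetical illustrations, not the engine's actual code.

```python
from typing import List, Sequence, Tuple


def rank_events(
    true_labels: Sequence[int],
    softmax: Sequence[Sequence[float]],
    root_files: Sequence[str],
    event_indices: Sequence[int],
    n: int,
    worst: bool = True,
) -> List[Tuple[str, int, float]]:
    """Return (ROOT file, event index, score) for the n worst or best events.

    Assumption (hypothetical): score = softmax probability of the true label,
    so a lower score means a worse-identified event.
    """
    scores = [probs[label] for probs, label in zip(softmax, true_labels)]
    # Ascending order puts the worst events first; descending puts the best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=not worst)
    return [(root_files[i], event_indices[i], scores[i]) for i in order[:n]]


def dump_events(path: str, events: List[Tuple[str, int, float]]) -> None:
    """Write one 'ROOT file path, event index' pair per line of a plain-text file."""
    with open(path, "w") as f:
        for root_file, index, _score in events:
            f.write(f"{root_file} {index}\n")


labels = [0, 1, 2, 1]
probs = [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8], [0.2, 0.7, 0.1]]
files = ["a.root", "a.root", "b.root", "b.root"]
idx = [0, 1, 0, 1]
print(rank_events(labels, probs, files, idx, n=2))  # [('a.root', 1, 0.3), ('b.root', 1, 0.7)]
```

With n = 0 (the -wst default) the ranking is empty, so nothing would be dumped.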
See the wiki page on ROOT file conversion for the conversion pathway from ROOT to .npz to HDF5.
-pat #path
Specifies the path to the labeled dataset that the engine will train, test, and validate on. HDF5 is currently the only supported data format. The config option for this flag is path.
-roo #ROOT file list (ROOTS.txt)
Specifies the location of the text file containing the absolute paths to the original ROOT files used to generate the dataset in use. This flag is usually unnecessary, since ROOTS.txt is placed in the same directory as the dataset by default. The config option for this flag is root.
-sub #integer
Specifies the size of a subset of the dataset located at path to use, which can be useful for faster training runs. By default, all of the data is used. The config option for this flag is subset.
-shf #True/False
Specifies whether or not to shuffle the contents of the input dataset. By default this is set to True. The config option for this flag is shuffle.
-vas #float between 0 and 1
Specifies the fraction of the dataset to use for validation. By default this is set to 0.1. The config option for this flag is val_split.
-tes #float between 0 and 1
Specifies the fraction of the dataset to use for testing. By default this is set to 0.1. The config option for this flag is test_split.
- There is no option to specify the fraction of the dataset used for training. The training fraction is the remainder of the dataset outside the validation and test splits, i.e. train_split = 1 - val_split - test_split. With the defaults, 80% of the data is used for training.
-epo #float
Specifies the number of epochs to train for. This number does not have to be a whole number. By default this is set to 1.0. The config option for this flag is epochs.
-tnb #integer
Specifies the batch size during training. By default this is set to 20. The config option for this flag is batch_size_train.
-vlb #integer
Specifies the batch size during validation. By default this is set to 1000. The config option for this flag is batch_size_val.
-tsb #integer
Specifies the batch size during testing. By default this is set to 1000. The config option for this flag is batch_size_test.
- Note: no batch size should ever exceed the dataset size.
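The interaction between the split fractions, -sub, -shf, -epo, and the training batch size can be sketched in Python. This is a minimal sketch under stated assumptions, not the engine's implementation: it assumes validation events are taken first and test events next from the (optionally shuffled) index list, that -sub keeps the first subset events after shuffling, and that a fractional epoch count is rounded up to a whole number of training batches. The function names and the seed parameter are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple


def split_indices(
    n_events: int,
    val_split: float = 0.1,
    test_split: float = 0.1,
    subset: Optional[int] = None,
    shuffle: bool = True,
    seed: int = 0,  # hypothetical: fixes the shuffle for reproducibility
) -> Tuple[List[int], List[int], List[int]]:
    """Partition event indices into (train, val, test).

    train_split is implicit: 1 - val_split - test_split.
    """
    if not (0 <= val_split <= 1 and 0 <= test_split <= 1 and val_split + test_split <= 1):
        raise ValueError("val_split and test_split must be in [0, 1] and sum to at most 1")
    indices = list(range(n_events))
    if shuffle:
        random.Random(seed).shuffle(indices)
    if subset is not None:
        indices = indices[:subset]  # assumption: subset taken after shuffling
    n = len(indices)
    n_val = int(n * val_split)
    n_test = int(n * test_split)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]  # everything left over is used for training
    return train, val, test


def training_iterations(n_train: int, epochs: float = 1.0, batch_size_train: int = 20) -> int:
    """Number of training batches covering `epochs` passes (may be fractional)."""
    if batch_size_train > n_train:
        raise ValueError("batch size should not exceed the number of training events")
    return math.ceil(epochs * n_train / batch_size_train)


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 800 100 100
print(training_iterations(len(train), epochs=0.5))  # 20
```

For example, with 1000 events and the default splits, 800 events are used for training, and -epo 0.5 with the default -tnb 20 corresponds to 20 training iterations under these assumptions.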
-sap #path
Specifies the directory in which to save the training engine's output data. This directory is located inside USER/ and has the default name save_path. The config option for this flag is save_path.
-dsc #description
Specifies a subdirectory under save_path in which to save data from a particular run. By default this is set to data_description. The config option for this flag is data_description.
-ret #state file
Specifies the path to a state file from which to restore the weights of the neural net. By default no state is loaded. The config option for this flag is restore_state.
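Taking USER/ to be a folder relative to where the program is run (an assumption), the location of a run's output and the handling of -ret could be sketched as follows. Both function names are hypothetical, and the check that the state file exists is an illustrative assumption rather than documented behaviour.

```python
from pathlib import Path
from typing import Optional


def run_output_dir(save_path: str = "save_path",
                   data_description: str = "data_description",
                   user_root: str = "USER") -> Path:
    """Compose USER/<save_path>/<data_description> (assumed layout)."""
    return Path(user_root) / save_path / data_description


def resolve_restore_state(restore_state: Optional[str]) -> Optional[Path]:
    """Return the state file to restore weights from, or None if -ret is not given."""
    if restore_state is None:
        return None  # default: no state is loaded
    state = Path(restore_state)
    if not state.is_file():
        raise FileNotFoundError(f"state file not found: {state}")
    return state


print(run_output_dir("runs", "first_test").as_posix())  # USER/runs/first_test
```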
-l #config file
Specifies a config file to load settings from. By default no config file is loaded and settings are taken from the specified flags. If this flag is given together with other flags that conflict with settings in the config file, the flags given on the command line override the corresponding settings in the config file. The config option for this flag is load.
-s #config file
Specifies the name of a config file to save settings to; an existing file of that name is overwritten. By default no config file is saved. The config option for this flag is cfg.
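The precedence rule for -l (built-in defaults, then the config file, then command-line flags) can be illustrated with Python's standard argparse and configparser modules. This sketch covers only four of the options above, and the OPTIONS table and resolve_settings function are hypothetical; it is not necessarily how watchmal.py implements the rule.

```python
import argparse
import configparser
from typing import Dict, List, Optional

# Hypothetical subset of the options: (flag, config option, type, default).
OPTIONS = [
    ("-epo", "epochs", float, 1.0),
    ("-tnb", "batch_size_train", int, 20),
    ("-vas", "val_split", float, 0.1),
    ("-tes", "test_split", float, 0.1),
]


def resolve_settings(argv: List[str], config_text: Optional[str] = None) -> Dict[str, object]:
    parser = argparse.ArgumentParser()
    for flag, dest, typ, _default in OPTIONS:
        # default=None lets us tell "flag not given" apart from "flag given".
        parser.add_argument(flag, dest=dest, type=typ, default=None)
    args = vars(parser.parse_args(argv))

    # 1. Start from the built-in defaults.
    settings: Dict[str, object] = {dest: default for _f, dest, _t, default in OPTIONS}
    # 2. Apply the config file, converting its string values to the right types.
    if config_text is not None:
        cfg = configparser.ConfigParser()
        cfg.read_string(config_text)
        types = {dest: typ for _f, dest, typ, _d in OPTIONS}
        for key, value in cfg["config"].items():
            if key in types:
                settings[key] = types[key](value)
    # 3. Flags given on the command line override the config file.
    for key, value in args.items():
        if value is not None:
            settings[key] = value
    return settings


config = "[config]\nepochs = 3.0\nbatch_size_train = 64\n"
print(resolve_settings(["-epo", "0.5"], config))
```

Giving each flag a default of None is what lets the sketch distinguish an unset flag from one explicitly set on the command line.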
Note that you can also write a config file by hand and load it with the -l flag, as an alternative to using command-line flags. The syntax for the config file is:
[config]
option1 = string1
option2 = string2
...
By default the config file extension is .ini.
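Since every value in the file is a string, a program reading it has to convert values to the right types itself. The round trip can be sketched with Python's configparser; the helper names are hypothetical, and the option names and values in the example are only illustrative choices from the flag list above.

```python
import configparser
import io
from typing import Dict


def settings_to_ini(settings: Dict[str, object]) -> str:
    """Render settings as a single [config] section, in the syntax shown above."""
    # interpolation=None keeps '%' characters (e.g. in paths) literal.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["config"] = {key: str(value) for key, value in settings.items()}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def ini_to_settings(text: str) -> Dict[str, str]:
    """Parse the [config] section back; every value is returned as a string."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    return dict(cfg["config"])


text = settings_to_ini({"epochs": 0.5, "batch_size_train": 20, "shuffle": True})
print(text)
settings = ini_to_settings(text)
epochs = float(settings["epochs"])  # explicit conversion: configparser returns str
```

For booleans such as shuffle, beware that bool("False") is True in Python; ConfigParser.getboolean parses values like "True" and "False" correctly.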