We use Cucumber specifications to describe and execute modelling scenarios and to systematically produce the corresponding causal graphs, which can be used to test causal relationships.
This repository is currently in an experimental phase.
- Clone this repository: `git clone https://github.com/CITCOM-project/causcumber.git`
- Change to the folder containing the repository: `cd causcumber`
- Create a virtual environment, e.g.:
  - In `./causcumber`, run `python3 -m venv causcumber_venv`
  - To activate the virtual environment, run `source causcumber_venv/bin/activate`
- Install GraphViz
- Install `causcumber` using the command `pip install -e .`
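The installation steps above can be sanity-checked with a short script. This is a minimal sketch using only the Python standard library, not part of causcumber: the helper name `check_environment` is hypothetical, GraphViz is assumed to be detectable through its `dot` executable on `PATH`, an active virtual environment is detected by comparing `sys.prefix` with `sys.base_prefix`, and the package's import name is assumed to be `causcumber`.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict:
    """Report whether the setup steps appear to have been completed.

    Hypothetical helper; not part of causcumber.
    """
    return {
        # A venv created with `python3 -m venv` sets sys.prefix to the venv
        # directory, while sys.base_prefix still points at the base install.
        "venv_active": sys.prefix != sys.base_prefix,
        # GraphViz provides the `dot` executable used to render .dot files.
        "graphviz_dot_on_path": shutil.which("dot") is not None,
        # After `pip install -e .` the package should be importable.
        # Assumption: the import name is `causcumber`.
        "causcumber_importable": importlib.util.find_spec("causcumber") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

Run it from inside the activated virtual environment; any `MISSING` entry points at the corresponding installation step.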
Due to the experimental nature of this work, contributions are currently limited to the core CITCOM team. Once key architectural decisions are finalised, we will open the project to the broader community. The current process for making changes to the code in this repository (e.g. adding new features or fixing bugs) is:
- Install as above
- Make a branch and check it out
- Make your changes
- Make a pull request against the `main` branch and request a review from a member of the CITCOM team
- On an approving review, merge your changes into `main`
The `scenarios` directory contains example scenarios implemented using the Covasim model. Each scenario should have its own sub-directory containing the simulation and a Cucumber specification. Within each scenario sub-directory, create three directories:
- `dags/`: this directory should contain any causal graphs as `.dot` files. This is also where causcumber will place the causal graphs it generates.
- `features/`: this directory should contain all of the elements for behave, including `.feature` files, an `environment.py` file, and a `steps/` directory containing Python scripts that implement the step definitions for each `.feature` file.
- `observational_data/`: this optional directory should contain any observational data that you wish to use instead of running the model.
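The directory layout above can be created with a few lines of `pathlib`. This is a minimal sketch, not a causcumber command: the helper name `scaffold_scenario` and its `with_observational_data` flag are hypothetical, and `environment.py` is created empty (existing files are left untouched).

```python
from pathlib import Path


def scaffold_scenario(scenarios_dir, name, with_observational_data=False):
    """Create the per-scenario directory layout and return its root.

    Hypothetical helper; not part of causcumber.
    """
    root = Path(scenarios_dir) / name
    # dags/: causal graphs as .dot files (hand-written or generated).
    (root / "dags").mkdir(parents=True, exist_ok=True)
    # features/steps/: Python step definitions for each .feature file.
    (root / "features" / "steps").mkdir(parents=True, exist_ok=True)
    # features/environment.py: behave's hook file; created empty if absent,
    # and an existing file keeps its contents.
    (root / "features" / "environment.py").touch(exist_ok=True)
    # observational_data/: optional, only needed when reusing existing data.
    if with_observational_data:
        (root / "observational_data").mkdir(exist_ok=True)
    return root
```

For example, `scaffold_scenario("scenarios", "compare_interventions", with_observational_data=True)` would lay out a scenario directory under `scenarios/`; `.feature` files and step modules still have to be written by hand.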
The workflow for each scenario is as follows:

- Create a `.feature` file specifying the desired causal properties as scenarios in the Gherkin language.
- Specify a `Background` that lists the inputs and outputs of interest.
- Transform each scenario into a causal question.
- Infer a fully-connected causal DAG from the `Background` and prune it manually.
- Run the system to get data for each causal question or, alternatively, select previous execution data for the same purpose.
- Write step definitions (also known as hooks) that connect the Cucumber scenarios to the data, use DoWhy to calculate causal estimates for each scenario, and check that these match the behaviour specified in the `Then` clauses.
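The DAG step above can be sketched with the standard library. This is an illustration, not causcumber's implementation: it assumes "fully connected" means an edge from every `Background` input to every output, represents manual pruning as removing a hand-picked set of edges, and serialises the result in GraphViz DOT format for the `dags/` directory. The function names are hypothetical.

```python
from itertools import product


def fully_connected_dag(inputs, outputs):
    """Return an edge set with an edge from every input to every output.

    Assumption: "fully connected" is read as every input -> every output;
    causcumber's own inference may differ.
    """
    return {(i, o) for i, o in product(inputs, outputs)}


def prune(edges, to_remove):
    """Drop edges judged (manually) to have no direct causal effect."""
    return set(edges) - set(to_remove)


def to_dot(edges, name="causal_dag"):
    """Serialise edges as a GraphViz digraph suitable for dags/*.dot."""
    lines = [f"digraph {name} {{"]
    for src, dst in sorted(edges):
        # Quote node names so arbitrary variable names are valid DOT IDs.
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```

For example, `to_dot(prune(fully_connected_dag(["pop_size", "pop_infected"], ["cum_infections", "cum_deaths"]), [("pop_size", "cum_deaths")]))` yields a digraph with three edges. The variable names and the removed edge are illustrative only, not a modelling claim.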
We work with CSV files produced by Covasim simulations. These have 164 columns, the headings of which are as follows:
- `t` (time step)
- `date`
- Cumulative (`cum_`) and new (`new_`) values of: `infections`, `reinfections`, `infectious`, `symptomatic`, `severe`, `critical`, `recoveries`, `deaths`, `tests`, `diagnoses`, `known_deaths`, `quarantined`, `vaccinations`, `vaccinated`
- `n_susceptible`, `n_exposed`, `n_infectious`, `n_symptomatic`, `n_severe`, `n_critical`, `n_recovered`, `n_dead`, `n_diagnosed`, `n_known_dead`, `n_quarantined`, `n_vaccinated`, `n_alive`, `n_naive`, `n_preinfectious`, `n_removed`
- `prevalence`, `incidence`, `r_eff`, `doubling_time`, `test_yield`, `rel_test_yield`, `frac_vaccinated`, `pop_nabs`, `pop_protection`, `pop_symp_protection`
Each row in the CSV represents a single time step (day) in the model. The outputs are stored in `compare_interventions/results`, which Git ignores during development. We will make our results publicly available via ORDA when it is appropriate to do so.
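Because each row is one day, a results file maps naturally onto one series per column. The following is a minimal standard-library sketch for loading such a file; the helper name `load_covasim_csv` is hypothetical, numeric cells are parsed as floats, and non-numeric cells such as `date` are kept as strings. The inline sample is toy data for demonstration, not Covasim output.

```python
import csv
import io


def load_covasim_csv(source):
    """Read a Covasim results CSV into {column: [value per day]}.

    `source` is any text file object. Hypothetical helper; not part of
    causcumber.
    """
    columns = {}
    for row in csv.DictReader(source):
        for key, raw in row.items():
            try:
                value = float(raw)
            except (TypeError, ValueError):
                # Keep non-numeric cells (e.g. `date`) or missing cells as-is.
                value = raw
            columns.setdefault(key, []).append(value)
    return columns


if __name__ == "__main__":
    # Toy two-day sample using column names from the list above.
    sample = "t,date,cum_deaths,new_deaths\n0,2020-03-01,0,0\n1,2020-03-02,1,1\n"
    data = load_covasim_csv(io.StringIO(sample))
    # The last element of a cum_ column is its value on the final simulated day.
    print(data["cum_deaths"][-1])  # prints 1.0
```

For a real results file, open it with `open(path, newline="")` and pass the file object; `path` would point at a CSV under `compare_interventions/results`.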