diff --git a/plan.md b/plan.md new file mode 100644 index 0000000..fbb062c --- /dev/null +++ b/plan.md @@ -0,0 +1,268 @@ +# Development plan + +Based on the meeting of March 5th + the discussion on GitHub ([link](https://github.com/EpiForeSITE/epiworldPythonStreamlit/discussions/2)), here is a list of tasks/things that need to be built for the project. + +## Example of yaml doc + +From Olivia's comments: + +```yaml +model: + metadata: + title: "Measles Outbreak Cost Calculator" + description: "Estimates the economic cost of a measles outbreak across three outbreak-size scenarios." + authors: + - name: "Jane Doe" + email: "jane@example.org" + - name: "John Smith" + introduction: | + ## Background + Measles is a highly contagious viral disease ... + + ## Assumptions + All wage figures are in 2024 USD and are taken + from the Bureau of Labor Statistics. + + parameters: + cost_hosp: + type: integer + label: "Cost of measles hospitalization" + description: "Average direct medical cost per hospitalised measles case (USD)." + default: 31168 + min: 0 + max: 500000 + unit: "USD" + references: + - "Ortega-Sanchez et al. (2014). Vaccine, 32(34)." + + prop_hosp: + type: number + label: "Proportion of cases hospitalised" + description: "Fraction of confirmed cases requiring hospital admission." + default: 0.20 + min: 0 + max: 1 + unit: "proportion" + references: + - "CDC Measles surveillance data 2019." + + equations: + eq_hosp: + label: "Hospitalisation cost" + unit: "USD" + output: "integer" + compute: "n_cases * prop_hosp * cost_hosp" + + table: + scenarios: + - id: "s_22" + label: "22 Cases" + vars: + n_cases: 22 + - id: "s_100" + label: "100 Cases" + vars: + n_cases: 100 + - id: "s_803" + label: "803 Cases" + vars: + n_cases: 803 + + rows: + - label: "Hospitalisation cost" + value: "eq_hosp" + - label: "Lost productivity" + value: "eq_lost_prod" + - label: "Contact tracing cost" + value: "eq_tracing" + - label: "TOTAL" + value: "eq_total" + emphasis: "strong" + + figures: + - title: "My figure" + alt-text: "Some text" + py-code: | + import matplotlib as mp + mp.plot(...) +``` + +## YAML document structure + +The YAML document must define the following top-level sections: + +- **Metadata:** Title, description, author(s), optional introduction text, and a Markdown report template with placeholders (e.g., `{{ table:table1 }}`, `{{ figure:fig1 }}`). An `assumptions` field (Markdown string) may also be included. +- **Parameters:** Named numeric parameters with type, label, description, default, min/max bounds, unit, and optional references. +- **Equations:** Named expressions (as safe Python arithmetic strings) with label, unit, and output type. +- **Tables:** Scenario columns (each defining a set of variable overrides) and rows (each pointing to an equation result). +- **Figures:** A list of figures, each with a title, alt-text, and a small Python snippet to generate the plot. +- **Current Parameters** *(optional):* A snapshot of parameter values representing the saved state of the model. + +Example report template: + +```md +# Title of the report + +## Sub title + +some text, some number {{ equation:value1 }} + +{{ table:table1 }} + +Some more text + +{{ table:table2 }} + +And a pretty figure + +{{ figure:fig1 }} +``` + +## Task dependency diagram + +The diagram below shows how the infrastructure tasks and function-level tasks relate to each other. Infrastructure tasks (top block) must be finalized before implementation work can proceed; solid arrows indicate data/output dependencies and dashed arrows indicate that a function triggers another in the normal app flow. + +```mermaid +flowchart TD + subgraph INFRA["🏗️ Infrastructure — finalize first"] + direction LR + BP["Branch protection rules"] + AG["AGENTS.md"] + DC["Devcontainer"] + GH_CI["GitHub Actions: CI testing (uv)"] + GH_AG["GitHub Actions: agent environment"] + DC --> GH_CI + DC --> GH_AG + end + + subgraph FUNC["💻 Function implementation"] + STLITE["Setup stlite framework"] + VY["validate_yaml()"] + BM["build_menu()"] + WP["watch_parameters()"] + RM["run_model()"] + GR["generate_report()"] + PDF["save_as_pdf()"] + SMS["store_model_state()"] + SCM["save_current_model()"] + + STLITE --> VY + VY --> BM + BM -.->|"user values"| WP + VY -->|"model_dict"| WP + WP --> RM + RM --> GR + GR --> PDF + WP --> SMS + WP --> SCM + end + + INFRA ==> FUNC +``` + +## Function summary + +| Function | Input → Output | +|---|---| +| `validate_yaml(yaml_content: str)` | raw YAML string → validated `dict` | +| `build_menu(model_dict: dict)` | model dict → Streamlit widgets (side-effects) | +| `watch_parameters(model_dict, current_values)` | model dict + user values → validated dict + warnings | +| `run_model(model_dict, parameters)` | model dict + params → `list[dict]` (per-scenario results) | +| `generate_report(model_dict, results)` | model dict + results → HTML string | +| `save_as_pdf(html_content: str)` | HTML string → PDF bytes | +| `store_model_state(model_dict, parameters)` | model + params → persisted state (side-effects) | +| `save_current_model(model_dict, current_parameters)` | model + params → YAML string on disk | + +## Tasks + +### `validate_yaml(yaml_content: str) -> dict` + +- **Input:** Raw YAML document as a string (e.g., file contents read from disk or uploaded by the user). +- **Output:** A validated Python dictionary representing the model, or raises a descriptive error on failure. +- **Steps:** + 1. Parse the YAML string into a dictionary and validate its structure against the expected schema (required keys, value types, etc.). See the [SO reference on YAML validation in Python](https://stackoverflow.com/questions/3262569/validating-a-yaml-document-in-python/22231372#22231372) for schema-based approaches; if a full schema validator is too heavy, at minimum walk the parsed dictionary and check each required field manually. + 2. Ensure that any embedded Python code (e.g., figure snippets) is not malicious. Use the [CPython AST module](https://docs.python.org/3/library/ast.html) (`ast.walk`) to inspect the parse tree, whitelist allowed node types/names, and reject anything outside that set. + 3. Check equations for recursive references and determine a safe execution order (topological sort). + 4. Validate units for consistency (e.g., ensure monetary values are not mixed with proportions without explicit conversion). + +### `build_menu(model_dict: dict) -> None` + +- **Input:** Validated model dictionary (output of `validate_yaml`). +- **Output:** No return value; renders the Streamlit sidebar/parameter panel with appropriate input widgets for each parameter. +- **Steps:** + - Build input widgets from the `parameters` section. + - Populate values using the `default` fields, unless `current_parameters` are present in the model dictionary (in which case those values take precedence). + - Trigger guardrail warnings when parameter values approach `safe_min` / `safe_max` boundaries. + +### `watch_parameters(model_dict: dict, current_values: dict) -> dict` + +- **Input:** Validated model dictionary and a dictionary of current parameter values entered by the user. +- **Output:** A dictionary of validated parameter values, with warning messages attached for any values that fall outside `safe_min` / `safe_max` bounds. +- **Steps:** + - For each parameter, check that its current value lies within `[safe_min, safe_max]`. + - Return the validated values along with any triggered warnings. + +### `run_model(model_dict: dict, parameters: dict) -> list[dict]` + +- **Input:** Validated model dictionary and a validated parameter dictionary (output of `watch_parameters`). +- **Output:** A list of dictionaries, one per scenario column, where each dictionary maps equation/row names to computed numeric values. +- **Steps:** + - Validate scenario ranges and column definitions. + - For each scenario: + 1. Merge scenario-specific variable overrides into the base parameters. + 2. Evaluate equations in topologically-sorted order to produce row values. + - Return the list of per-scenario result dictionaries. + +### `generate_report(model_dict: dict, results: list[dict]) -> str` + +- **Input:** Validated model dictionary and the list of scenario results (output of `run_model`). +- **Output:** An HTML string representing the full rendered report. +- **Steps:** + - Process the Markdown report template, replacing `{{ equation:* }}`, `{{ table:* }}`, and `{{ figure:* }}` placeholders with computed values, formatted tables, and rendered figures respectively. + - **Figures:** Consider using [Streamlit's built-in charting functions](https://docs.streamlit.io/develop/api-reference/charts) in preference to raw `matplotlib` calls, to simplify dependencies and keep the interface consistent with the Streamlit app. + - Render the final document as an HTML string. + +### `save_as_pdf(html_content: str) -> bytes` + +- **Input:** HTML string (output of `generate_report`). +- **Output:** PDF file as bytes, ready to be offered as a download. +- **Notes:** + - Since the app targets WASM (via `stlite`), dependencies that require native binaries (e.g., most headless-browser or Chromium-based libraries) are not available. + - Potential approaches to evaluate: + - Call a REST API service for HTML-to-PDF conversion. + - Emit an intermediate TeX document and use a TeX-to-PDF pipeline where available. + - Use browser-native APIs (e.g., `window.print()` / the `print` CSS media query) to trigger a client-side PDF save — this has been seen in production Streamlit apps and may be the most WASM-friendly option. + - ReportLab and similar direct-to-PDF Python libraries may require paid licenses or native extensions; evaluate licensing before adopting. + +### `store_model_state(model_dict: dict, parameters: dict) -> None` + +- **Input:** Validated model dictionary and the current parameter values. +- **Output:** No return value; persists the model state so the user can navigate back to it. +- **Notes:** + - `localStorage` (or `sessionStorage`) via Streamlit's JavaScript component API is the most likely mechanism in a WASM context, since there is no server-side filesystem. Investigate whether Streamlit exposes an API to this effect or whether a custom component is needed. + - The UI could surface this as a small history panel or list of previously visited states. + +### `save_current_model(model_dict: dict, current_parameters: dict) -> str` + +- **Input:** Validated model dictionary and the current parameter values. +- **Output:** A YAML string (the original model with the `current_parameters` section populated) saved to disk or offered as a file download. +- **Steps:** + - Merge the current parameter values into the model dictionary under `current_parameters`. + - Serialise back to a YAML string. + - Write to disk or trigger a browser download. + +### Setup `stlite` framework + +- Configure the project to run entirely in the browser using [`stlite`](https://stlite.net/). +- Verify that all dependencies (pure-Python or available as Pyodide wheels) are compatible with the WASM runtime. + +## Other tasks (non-function) + +These are project-level tasks that need to be addressed but do not map directly to a single function: + +- **Branch protection rules:** Update the repository's branch protection rules to require all changes to be submitted via pull requests (no direct pushes to the main branch). +- **`AGENTS.md` file:** Draft a simple `AGENTS.md` file that describes the autonomous agents involved in the project, their roles, and the conventions they should follow. +- **GitHub Actions workflow — agent environment:** Create a GitHub Actions workflow that sets up the environment required by the agent (tools, credentials, runtime dependencies). +- **GitHub Actions workflow — CI testing:** Create a GitHub Actions workflow that installs project dependencies using [`uv`](https://github.com/astral-sh/uv) and runs the test suite on every push/PR. +- **Devcontainer environment:** Create a `.devcontainer` configuration (e.g., `devcontainer.json` + Dockerfile or feature list) so contributors can open the project in a fully configured, reproducible development container. +