r4subtrace is the traceability engine in the R4SUB ecosystem. It quantifies and explains end-to-end traceability between clinical submission artifacts -- primarily ADaM outputs <-> derivations <-> SDTM sources <-> specs <-> code -- and converts trace evidence into standardized R4SUB Evidence Table rows (from r4subcore).
It focuses on answering one question:
Can we prove where each analysis variable/value came from, and can a reviewer follow it?
In real submissions, issues are rarely "a single failed rule." Many are trace failures:
- Missing or ambiguous derivation documentation
- ADaM variable not linkable to SDTM sources
- Mismatch between spec and what code produces
- Inconsistent naming across specs, define.xml, and datasets
- Reviewer cannot reproduce or validate lineage
r4subtrace formalizes traceability as evidence plus measurable indicators. Each ADaM variable is assigned one of four trace levels:
- L0 -- None: no linkage available
- L1 -- Spec-only: ADaM spec defines derivation but no code mapping
- L2 -- Spec + source mapping: ADaM var mapped to SDTM vars/domains
- L3 -- Spec + code mapping: mapping exists with high confidence or derivation text
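The levels above can be sketched as a small classification rule. This is only an illustration in base R, not the package's implementation: the helper `assign_trace_level()`, the `has_spec` flag, and the 0.8 confidence threshold are all assumptions made for the example.

```r
# Hypothetical helper (not part of r4subtrace): assign a trace level 0-3
# to each ADaM variable. The 0.8 threshold and `has_spec` flag are assumptions.
assign_trace_level <- function(has_spec, sdtm_var, derivation_text, confidence,
                               conf_threshold = 0.8) {
  mapped     <- !is.na(sdtm_var) & nzchar(sdtm_var)
  documented <- !is.na(derivation_text) & nzchar(derivation_text)
  confident  <- !is.na(confidence) & confidence >= conf_threshold

  level <- rep(0L, length(has_spec))                         # L0: no linkage
  level[has_spec] <- 1L                                      # L1: spec only
  level[has_spec & mapped] <- 2L                             # L2: SDTM source mapped
  level[has_spec & mapped & (documented | confident)] <- 3L  # L3: mapped and well supported
  level
}

# Made-up variables for illustration only
vars <- data.frame(
  adam_var        = c("AGE", "TRTSDT", "AVAL", "XFLAG"),
  has_spec        = c(TRUE, TRUE, TRUE, FALSE),
  sdtm_var        = c("AGE", "EXSTDTC", NA, NA),
  derivation_text = c(NA, "First EXSTDTC per subject", "Derived; see spec", NA),
  confidence      = c(0.5, NA, NA, NA),
  stringsAsFactors = FALSE
)

vars$trace_level <- assign_trace_level(
  vars$has_spec, vars$sdtm_var, vars$derivation_text, vars$confidence
)
vars[, c("adam_var", "trace_level")]
```

Under these assumptions AGE lands at L2 (mapped, but low confidence and no derivation text), TRTSDT at L3, AVAL at L1, and XFLAG at L0.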
pak::pak(c("R4SUB/r4subcore", "R4SUB/r4subtrace"))
library(r4subcore)
library(r4subtrace)
ctx <- r4sub_run_context(study_id = "ABC123", environment = "DEV")
adam_meta <- read.csv("adam_metadata.csv") # columns: dataset, variable, label, type
sdtm_meta <- read.csv("sdtm_metadata.csv") # same structure
map <- read.csv("trace_map.csv")
# recommended columns:
# adam_dataset, adam_var, sdtm_domain, sdtm_var, derivation_text (optional), confidence (optional)
tm <- build_trace_model(
  adam_meta = adam_meta,
  sdtm_meta = sdtm_meta,
  mapping = map
)
ev <- trace_model_to_evidence(tm, ctx = ctx, source_name = "r4subtrace", source_version = "0.1.0")
validate_evidence(ev)
evidence_summary(ev)
ind <- trace_indicator_scores(ev)
ind

The trace model is a list with:
- nodes: tidy table of assets (dataset/variable/spec/program)
- edges: tidy table of relationships + confidence
- diagnostics: issues found (orphans, ambiguities, conflicts)
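To make the nodes/edges/diagnostics shape concrete, here is a minimal base R sketch that builds such a list from a mapping table. It is not `build_trace_model()` itself: the function `sketch_trace_model()`, the `id`/`kind`/`issue` columns, and the sample rows are assumptions for illustration. It covers only orphans and ambiguous mappings and expects the mapping to carry a `confidence` column.

```r
# Hypothetical sketch (not the package's build_trace_model()):
# derive nodes, edges, and simple diagnostics from a mapping table.
sketch_trace_model <- function(adam_meta, mapping) {
  adam_id <- paste(adam_meta$dataset, adam_meta$variable, sep = ".")
  from    <- paste(mapping$adam_dataset, mapping$adam_var, sep = ".")
  to      <- paste(mapping$sdtm_domain, mapping$sdtm_var, sep = ".")

  # One node per distinct ADaM variable and per distinct SDTM variable
  nodes <- rbind(
    data.frame(id = unique(c(adam_id, from)), kind = "adam", stringsAsFactors = FALSE),
    data.frame(id = unique(to), kind = "sdtm", stringsAsFactors = FALSE)
  )

  # One edge per mapping row, carrying its confidence
  edges <- data.frame(from = from, to = to, confidence = mapping$confidence,
                      stringsAsFactors = FALSE)

  # Orphans: ADaM variables with no mapping row.
  # Ambiguous: ADaM variables mapped to more than one SDTM source.
  n_sources <- table(from)
  orphans   <- setdiff(adam_id, from)
  ambiguous <- names(n_sources)[n_sources > 1]
  diagnostics <- data.frame(
    id    = c(orphans, ambiguous),
    issue = c(rep("orphan", length(orphans)), rep("ambiguous", length(ambiguous))),
    stringsAsFactors = FALSE
  )

  list(nodes = nodes, edges = edges, diagnostics = diagnostics)
}

# Made-up metadata and mapping rows for illustration only
adam_meta_demo <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADSL"),
  variable = c("AGE", "TRTSDT", "XFLAG"),
  stringsAsFactors = FALSE
)
map_demo <- data.frame(
  adam_dataset = c("ADSL", "ADSL", "ADSL"),
  adam_var     = c("AGE", "TRTSDT", "TRTSDT"),
  sdtm_domain  = c("DM", "EX", "DM"),
  sdtm_var     = c("AGE", "EXSTDTC", "RFSTDTC"),
  confidence   = c(1.0, 0.9, 0.6),
  stringsAsFactors = FALSE
)

tm_sketch <- sketch_trace_model(adam_meta_demo, map_demo)
tm_sketch$diagnostics
```

In this toy input, ADSL.XFLAG is flagged as an orphan and ADSL.TRTSDT as ambiguous, since it maps to two SDTM sources.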
Evidence rows are emitted for:
- each ADaM variable trace level
- each orphan/ambiguity/conflict
- aggregate coverage metrics
- TRACE_VAR_COVERAGE_L2PLUS: proportion of ADaM variables with L2+ trace
- TRACE_VAR_COVERAGE_L3PLUS: proportion with L3+ trace
- TRACE_ORPHAN_VAR_COUNT: orphan ADaM vars with no SDTM mapping
- TRACE_AMBIGUOUS_MAPPING_COUNT: vars mapped to multiple SDTM sources
- TRACE_MEAN_TRACE_LEVEL: mean trace level across all ADaM variables
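The arithmetic behind these indicators can be illustrated on a toy input. The sketch below assumes two made-up vectors, one of per-variable trace levels and one counting SDTM sources per ADaM variable. It is not how `trace_indicator_scores()` is implemented, only a worked example of the definitions above.

```r
# Hypothetical inputs: trace level and number of mapped SDTM sources per ADaM variable
trace_level <- c(AGE = 2, TRTSDT = 3, AVAL = 1, XFLAG = 0)
n_sources   <- c(AGE = 1, TRTSDT = 2, AVAL = 0, XFLAG = 0)

indicators <- c(
  TRACE_VAR_COVERAGE_L2PLUS     = mean(trace_level >= 2),  # share of variables at L2 or above
  TRACE_VAR_COVERAGE_L3PLUS     = mean(trace_level >= 3),  # share of variables at L3
  TRACE_ORPHAN_VAR_COUNT        = sum(n_sources == 0),     # variables with no SDTM mapping
  TRACE_AMBIGUOUS_MAPPING_COUNT = sum(n_sources > 1),      # variables with several SDTM sources
  TRACE_MEAN_TRACE_LEVEL        = mean(trace_level)        # average level across variables
)
indicators
```

For these four variables the coverage values are 0.5 and 0.25, with two orphans, one ambiguous mapping, and a mean trace level of 1.5.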
- Graph-first: traceability is a graph problem
- Evidence-first: all conclusions are backed by explicit evidence rows
- Tool-agnostic: can ingest mapping from any source format
- Reviewer-centric: emphasize explainability, not just metrics
License: MIT