anymed2 and diab_drug2 have self-referential metadata — breaks rec_with_table() #13

@DougManuel

Priority: P0 (CRAN-blocking)

Problem

anymed2 and diab_drug2 in variables.csv have self-referential variableStart values:

anymed2,...,"cycle6::ANYMED2, [anymed2]"
diab_drug2,...,"cycle6::DIAB_DRUG2, [diab_drug2]"

The [anymed2] notation tells rec_with_table() to look for anymed2 as a starting variable in the source data — but anymed2 doesn't exist in CHMS source data. It's a workflow artifact (numeric copy of anymed). Same for diab_drug2.
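
A check like the following could flag this pattern across the whole metadata file. It is a minimal sketch: `find_self_references()` is a hypothetical helper, the `variable` column name is assumed from the usual recodeflow layout, and the third row of the toy data is made up for contrast.

```r
# Hypothetical helper: return the variables whose variableStart contains
# a bracketed reference to the variable itself, e.g. "[anymed2]" for anymed2.
find_self_references <- function(variables) {
  token <- paste0("[", variables$variable, "]")
  is_self <- vapply(
    seq_along(token),
    # fixed = TRUE matches the brackets literally instead of as a regex class
    function(i) grepl(token[i], variables$variableStart[i], fixed = TRUE),
    logical(1)
  )
  variables$variable[is_self]
}

# Toy stand-in for variables.csv; only the two columns the check needs.
variables <- data.frame(
  variable = c("anymed2", "diab_drug2", "example_var"),
  variableStart = c(
    "cycle6::ANYMED2, [anymed2]",
    "cycle6::DIAB_DRUG2, [diab_drug2]",
    "[other_var]"  # made-up row that is not self-referential
  ),
  stringsAsFactors = FALSE
)

self_refs <- find_self_references(variables)
print(self_refs)
```

The match is case-sensitive, so `cycle6::ANYMED2` on its own would not be flagged; only the bracketed lowercase reference is.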

Impact

  • rec_with_table() will fail when trying to derive these variables
  • MockData can't generate them (the 12 remaining discrepancies from PR #7, "Mock data review", are all anymed2/diab_drug2)
  • Downstream variables that depend on these (highbp14090, highbp14090_adj, control14090, control14090_adj, diabx) can't be derived automatically
  • Users must follow a manual multi-step workflow documented in the vignette

Current manual workflow

# Step 1: derive anymed in medication data
cycle1_meds_recoded <- rec_with_table(cycle1_meds, ..., "anymed", ...)

# Step 2: MANUAL — create numeric copy
cycle1_meds_recoded$anymed2 <- as.numeric(as.character(cycle1_meds_recoded$anymed))

# Step 3: MANUAL — merge into main cycle data
cycle1 <- merge(cycle1, dplyr::select(cycle1_meds_recoded, clinicid, anymed2), by = "clinicid")

# Step 4: now can derive hypertension variables
cycle1_final <- rec_with_table(cycle1, ..., "highbp14090", ...)
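
To make steps 2 and 3 concrete, here is a self-contained sketch on toy data. `add_numeric_copy()` is a hypothetical helper, not a package function, and the data frames are invented stand-ins for the real cycle and medication tables.

```r
# Hypothetical helper: create the numeric "<var>2" copy in the medication
# data (step 2) and merge it into the main cycle data by ID (step 3).
add_numeric_copy <- function(main, meds, var, id = "clinicid") {
  new_name <- paste0(var, "2")
  # Convert via character so the factor labels are used, not the level codes
  meds[[new_name]] <- as.numeric(as.character(meds[[var]]))
  merge(main, meds[, c(id, new_name)], by = id)
}

# Toy data; values are illustrative only.
cycle1 <- data.frame(clinicid = 1:3, age = c(40, 55, 62))
cycle1_meds_recoded <- data.frame(
  clinicid = 1:3,
  anymed = factor(c("0", "1", "0"))
)

cycle1 <- add_numeric_copy(cycle1, cycle1_meds_recoded, "anymed")
print(cycle1)
```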

Options to fix

A. Mark as derived from anymed/diab_drug
Change variableStart to DerivedVar::[anymed] and DerivedVar::[diab_drug]. This documents the dependency correctly but requires recodeflow to support chained derivations (anymed itself is derived).

B. Make derivation functions return numeric
Change cycles1to2_any_antiHTN_meds() to return numeric instead of factor. Then anymed2 is no longer needed. Update determine_hypertension() to accept the output directly.

C. Handle conversion in consuming functions
Make determine_hypertension() and determine_inclusive_diabetes() convert factor to numeric internally, eliminating the need for separate anymed2/diab_drug2 variables entirely.
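
A rough sketch of what the internal conversion in Option C might look like. `as_numeric_flag()` is a hypothetical helper; how it would be wired into determine_hypertension() and determine_inclusive_diabetes() is not shown and would depend on their actual signatures.

```r
# Hypothetical helper: accept either a factor or a numeric medication flag
# and return a numeric vector.
as_numeric_flag <- function(x) {
  if (is.factor(x)) {
    # as.numeric() on a factor returns the internal level codes,
    # so go through the labels first.
    return(as.numeric(as.character(x)))
  }
  as.numeric(x)
}

# Factor whose level order differs from its numeric meaning
anymed <- factor(c("0", "1", "1"), levels = c("1", "0"))

codes <- as.numeric(anymed)       # level codes, not the 0/1 values
values <- as_numeric_flag(anymed) # the 0/1 values carried by the labels
```

With a helper like this called at the top of each consuming function, callers could pass anymed or diab_drug directly, which is the point of Option C.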

Option C is the simplest and removes the most complexity. Option B is also clean. Option A preserves the current architecture but adds metadata complexity.

Files involved

  • inst/extdata/variables.csv — self-referential variableStart
  • inst/extdata/variable-details.csv — recTo: copy (but nothing to copy from)
  • R/blood-pressure.R — determine_hypertension() expects ANYMED2
  • R/diabetes.R — determine_inclusive_diabetes() expects diab_drug2
  • vignettes/recoding_medications.qmd — documents manual workflow

Labels: bug (Something isn't working)
