Skip to content

adjusted BMI for master files#146

Closed
caitlink12 wants to merge 24 commits into5yr-age-categoriesfrom
adj-bmi
Closed

adjusted BMI for master files#146
caitlink12 wants to merge 24 commits into5yr-age-categoriesfrom
adj-bmi

Conversation

@caitlink12
Copy link
Copy Markdown

@caitlink12 caitlink12 commented Oct 8, 2025

added HTWDBMI_der (derived BMI) calculated with HTWDHTM (continuous height) HWTDWTK (continuous weight) for master files 2001-2018 into variable_details
added HWTDCOR_der (corrected BMI) calculated using HWTDHTM (continuous height) and HWTDWTK (continuous weight) for master files 2001-2018 into variable_details
added HTWDBMI_der and HWTDCOR_der along with necessary information in variables.csv

included 2001 into derived adjusted BMI (grouped)
created adjusted derived BMI variable with true height and weight values for ICES surveys
2001_m has non-grouped height and weight variables to use for HWTDCOR so do not need to include it in HWTGCOR
added in HWTDBMI_der, HWTDBMI_der_cat4, and HWTDCOR_der for master files to use continuous height and weight values
@yulric yulric changed the base branch from feature/v3.0.0-validation-infrastructure to 5yr-age-categories October 15, 2025 18:50
Copy link
Copy Markdown
Contributor

@yulric yulric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caitlink12 I'm not sure what happened but it looks like the changes for the new BMI variables in the variables details sheet are not there (?). Do you mind re-adding them.

adjusting to track changes properly
updated HWTDBMI, HWTDBMI_der, HWTDCOR_der variables which use continuous height and weight measures
@caitlink12
Copy link
Copy Markdown
Author

@yulric the variables were still in the sheet but the changes must not have been tracked after I had to make additional adjustments to another variables after the initial push. I've re-added them in now so they should appear as a tracked change now.

Copy link
Copy Markdown
Contributor

@yulric yulric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still can't see the changes in the variable details sheet on Github but that's ok, I looked at the rows within the variable details sheet. The only issue I see is that the database suffixes are inconsistent between the two sheets.

corrected _i suffixes to _m for HWTDBMI, HWTDBMI_der, HWTDCOR_der, HWTDHTM, HWTDWTK
@caitlink12
Copy link
Copy Markdown
Author

@yulric I've adjusted the suffixes to _m for HWTDBMI, HWTDBMI_der, HWTDCOR_der, HWTDHTM and HWTDWTK.

Comment thread inst/extdata/variable_details.csv Outdated
Comment thread inst/extdata/variable_details.csv Outdated
Comment thread inst/extdata/variable_details.csv Outdated
Comment thread inst/extdata/variable_details.csv Outdated
Comment thread inst/extdata/variable_details.csv Outdated
Comment thread inst/extdata/variables.csv Outdated
Worksheet fixes (HWTDHTM rows only):
- Fix HTWDVHTM typo → HWTDVHTM (4 rows in variable_details, 1 in variables)
- Fix cchs_2003_m → cchs2003_m, cchs_2005_m → cchs2005_m underscore typos
- Fix missing cchs prefix: 2015_2016_m → cchs2015_2016_m

New R/bmi-master.R:
- Self-contained Master-file BMI functions (calculate_bmi_D, adjust_bmi_D,
  categorize_bmi_D) that don't depend on assign_tagged_na
- Referenced by variable_details.csv as Func:: entries for HWTDBMI_der,
  HWTDCOR_der, HWTDBMI_der_cat4
@DougManuel
Copy link
Copy Markdown
Contributor

Code review

Reviewed 10 BMI variables (HWTGHTM, HWTGWTK, HWTGBMI, HWTGBMI_der, HWTGBMI_der_cat4, HWTGBMI_cat4, HWTGCOR_der, HWTDBMI_der, HWTDBMI_der_cat4, HWTDCOR_der) for PUMF and Master across 2001–2018.

Issues found and fixed (68ee15d)

  1. HTWDVHTM typo in HWTDHTM variableStart (L3, worksheet naming)
    4 rows in variable_details.csv and 1 row in variables.csv had transposed HTWDVHTM instead of HWTDVHTM. Confirmed via MCP cchs-metadata: HWTDVHTM exists, HTWDVHTM does not. Fixed.

  2. Underscore typos in database names (L3, databaseStart consistency)
    cchs_2003_m, cchs_2005_m (with extra underscore) in HWTDHTM variableStart. Fixedcchs2003_m, cchs2005_m.

  3. Missing cchs prefix (L3, databaseStart consistency)
    2015_2016_m missing the cchs prefix in HWTDHTM variableStart. Fixedcchs2015_2016_m.

  4. Missing Master BMI DV functions (L4/L6, derived variable specification)
    variable_details.csv references Func::calculate_bmi_D, Func::adjust_bmi_D, Func::categorize_bmi_D but no R file defined them. Created R/bmi-master.R with self-contained implementations using dplyr::case_when and haven::tagged_na. These do not depend on assign_tagged_na (which is undefined on this branch). Fixed.

Issues noted but not fixed

  1. HWTGBMI_cat4 semantic change (L2, confidence 75)
    variableStart changed from DerivedVar::[HWTGBMI_der] (cchsflow-derived BMI) to DerivedVar::[HWTGBMI] (StatCan's pre-calculated grouped BMI). This changes the input source — intentional? If so, no action needed.

  2. PUMF BMI functions broken (pre-existing, confidence 100)
    bmi_funcalculate_bmicalculate_bmi_coreassign_tagged_na — the last function is undefined on 5yr-age-categories. All PUMF BMI derived variables (HWTGBMI_der, HWTGCOR_der, HWTGBMI_der_cat4) fail at runtime. Pre-existing on target branch; not introduced by this PR.

  3. PUMF L6 integration blocked (pre-existing)
    rec_with_table() runs but produces no BMI output columns due to issue 6. PUMF height/weight pass-through variables (HWTGHTM, HWTGWTK) work correctly. Master DV functions (issue 4 fix) tested with synthetic data — all produce correct results.

L6 integration summary

PUMF pass-through variables (HWTGHTM, HWTGWTK) recoded successfully across all 9 cycles. PUMF derived variables blocked by pre-existing assign_tagged_na issue. Master DV functions validated with synthetic data (BMI calculation, WHO 4-category classification, Connor Gorber bias correction all correct).

Add the 6-file v3 infrastructure stack for missing data handling:
- file-sourcing.R (L1): robust file sourcing with here()
- worksheet-loaders.R (L2B): worksheet metadata loading
- worksheet-getters.R (L3): get_variable_details(), get_variables()
- missing-pattern-cache.R (L4): session-level missing pattern cache
- missing-data-functions.R (L5): any_missing(), get_priority_missing(), assign_missing()
- clean-variables.R (L6): clean_variables(vars=, output_format=)

These files provide the shared infrastructure for v3 derive functions
across all variable domains (BMI, smoking, etc.).
Resolve namespace collision with new clean_variables() from
R/clean-variables.R (v3 infrastructure). The old API using
continuous_vars/categorical_vars params is renamed to
clean_variables_legacy(). Callers in smoking.R and alcohol.R
will be updated when v3-smoking merges.
Also fix is_value_in_range() and apply_else_rule() in clean-variables.R
to handle the copy_mapping format (recStart/recStart_values/recEnd) from
map_recStart_to_recEnd(), which was mismatched with the expected min/max
and action keys.
- Add deprecated aliases: calculate_bmi_D, adjust_bmi_D, categorize_bmi_D
  forwarding to _master equivalents
- Update variable_details.csv Func:: references from _D to _master
- Add 6 alias delegation tests (76 total passing)
Replace separate PUMF (bmi.R) and Master (bmi-master.R) implementations
with unified source-agnostic functions using semantic parameter names
(height_m, weight_kg). Worksheet routing maps PUMF grouped variables
(HWTGHTM/HWTGWTK) and Master continuous variables (HWTDHTM/HWTDWTK)
to the same function parameters.

- Delete bmi-master.R; consolidate into bmi.R
- Rename parameters from CCHS variable names to semantic names
- Update deprecated aliases (_D suffix) to point to unified functions
- Add @param ... to all three BMI functions for R CMD check compliance
- Update worksheets for unified function routing
- Regenerate man/ pages
@DougManuel
Copy link
Copy Markdown
Contributor

BMI unification: source-agnostic functions

Replaced the separate PUMF (bmi.R) and Master (bmi-master.R) implementations with unified source-agnostic functions.

What changed

  • Deleted bmi-master.R — all three BMI functions (calculate_bmi, adjust_bmi, categorize_bmi) now live in bmi.R and work with both PUMF and Master data
  • Semantic parameter namesheight_m/weight_kg instead of CCHS variable names (HWTGHTM/HWTGWTK). The worksheet routes the appropriate source variables to these parameters.
  • Deprecated aliases updated_D suffix aliases now point to the unified functions (not the deleted _master variants)
  • @param ... added — fixes R CMD check warnings caused by deprecated aliases using function(...) with @rdname
  • Worksheets updated — both PUMF and Master databases route to the same functions via variables.csv/variable_details.csv

Design rationale

Same pattern as calculate_pack_years() — the function doesn't need to know whether it's receiving PUMF grouped midpoints or Master continuous measurements. The worksheet handles the source variable mapping:

Parameter PUMF source Master source
height_m HWTGHTM (grouped) HWTDHTM (continuous)
weight_kg HWTGWTK (grouped) HWTDWTK (continuous)

Verification

  • 58/58 unit tests pass
  • R CMD check: 0 new errors/warnings/notes (all pre-existing)

DougManuel and others added 2 commits March 30, 2026 20:12
Run fix-worksheets.R to remove unnecessary quotes per project convention.
… NA::a rows for all derived HWT variables, and eliminated redundant/incorrect HWTGBMI_cat4 variable
@rafdoodle
Copy link
Copy Markdown
Collaborator

Changes manually merged to v3 via commit 3ce9a9f. Will close this PR now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants