Skip to content

256 get cox pairwise df strata in formulas#258

Open
jszczypinski wants to merge 25 commits into
mainfrom
256_get_cox_pairwise_df_strata_in_formulas
Open

256 get cox pairwise df strata in formulas#258
jszczypinski wants to merge 25 commits into
mainfrom
256_get_cox_pairwise_df_strata_in_formulas

Conversation

@jszczypinski
Copy link
Copy Markdown
Contributor

@jszczypinski jszczypinski commented May 27, 2026

Currently get_cox_pairwise_df do not accept formulas with strata() term, which is quite common.
Also formulas in R should not use namespace
https://stat.ethz.ch/pipermail/r-devel/2025-April/083989.html

Closes #256


Pre-review Checklist (if item does not apply, mark is as complete)

  • All GitHub Action workflows pass with a ✅
  • PR branch has pulled the most recent updates from master branch: usethis::pr_merge_main()
  • If a bug was fixed, a unit test was added.
  • Code coverage is suitable for any new functions/features (generally, 100% coverage for new code): devtools::test_coverage()
  • Request a reviewer

Reviewer Checklist (if item does not apply, mark is as complete)

  • If a bug was fixed, a unit test was added.
  • Run pkgdown::build_site(). Check the R console for errors, and review the rendered website.
  • Code coverage is suitable for any new functions/features: devtools::test_coverage()

When the branch is ready to be merged:

  • Update NEWS.md with the changes from this pull request under the heading "# cards (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end update (see NEWS.md for examples).
  • All GitHub Action workflows pass with a ✅
  • Approve Pull Request
  • Merge the PR. Please use "Squash and merge" or "Rebase and merge".

Comment thread R/get_cox_pairwise_df.R Outdated
Signed-off-by: Jan Szczypiński <jan.szczypinski@gmail.com>
Comment thread R/get_cox_pairwise_df.R Outdated
Signed-off-by: Jan Szczypiński <jan.szczypinski@gmail.com>
Comment thread tests/testthat/test-get_cox_pairwise_df.R Outdated
Signed-off-by: Jan Szczypiński <jan.szczypinski@gmail.com>
@jszczypinski jszczypinski self-assigned this May 27, 2026
@jszczypinski jszczypinski marked this pull request as ready for review May 27, 2026 09:33
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Unit Tests Summary

  1 files  240 suites   3m 13s ⏱️
240 tests 240 ✅ 0 💤 0 ❌
712 runs  712 ✅ 0 💤 0 ❌

Results for commit 3049777.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Unit Test Performance Difference

Test Suite $Status$ Time on main $±Time$ $±Tests$ $±Skipped$ $±Failures$ $±Errors$
check_formula_for_namespace 👶 $+0.00$ $+2$ $0$ $0$ $0$
modify_header_rm_md 💔 $0.09$ $+1.20$ $0$ $0$ $0$ $0$
Additional test case details
Test Suite $Status$ Time on main $±Time$ Test Case
check_formula_for_namespace 👶 $+0.00$ .check_formula_for_namespace_aborts_when_a_namespace_is_present
check_formula_for_namespace 👶 $+0.00$ .check_formula_for_namespace_correctly_parses_long_formulas
check_formula_for_namespace 👶 $+0.16$ .check_formula_for_namespace_passes_valid_formulas
get_cox_pairwise_df 👶 $+0.00$ get_cox_pairwise_df_handles_robust_TRUE_correctly_via_...
get_cox_pairwise_df 👶 $+0.00$ get_cox_pairwise_df_respects_conf.int_passed_via_...
get_cox_pairwise_df 👶 $+0.01$ get_cox_pairwise_df_works_for_formula_with_complex_strata_
get_cox_pairwise_df 👶 $+0.00$ get_cox_pairwise_df_works_for_formula_with_covariates_via_likelihood_ratio
modify_header_rm_md 💔 $0.09$ $+1.20$ strip_md_bold_works_with_gtsummary_table
modify_zero_recode 💚 $1.56$ $-1.43$ modify_zero_recode_works

Results for commit a94398a

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

badge

Code Coverage Summary

Filename                                 Stmts    Miss  Cover    Missing
-------------------------------------  -------  ------  -------  ------------------------------------------------------------------------------------------------
R/add_blank_rows.R                          63       0  100.00%
R/add_difference_row.R                     101       0  100.00%
R/add_forest_utils.R                        97      10  89.69%   76-79, 94-100
R/add_forest.R                             139       0  100.00%
R/add_hierarchical_count_row.R              33       0  100.00%
R/adjust_stat_columns_wrap.R                29       1  96.55%   53
R/annotate_gg_km.R                         135       0  100.00%
R/annotate_gg_pkc.R                         92       0  100.00%
R/annotate_gg.R                             81       0  100.00%
R/ard_tabulate_abnormal_by_baseline.R       65       0  100.00%
R/crane-package.R                            2       2  0.00%    26-27
R/deprecated.R                              21      21  0.00%    15-51
R/df_add_poolings.R                         41       0  100.00%
R/get_cox_pairwise_df.R                    149      42  71.81%   140-147, 172-178, 301-306, 347-354, 363-384
R/gg_km_utils.R                             35      14  60.00%   18-35
R/gg_km.R                                  143      37  74.13%   55-58, 75, 102, 176-181, 184-187, 197-199, 204-205, 239-241, 248-251, 255, 266-270, 283, 285-287
R/gg_lineplot.R                             94       0  100.00%
R/gg_mmrm_lineplot.R                       102       1  99.02%   106
R/gg_pkc_lineplot.R                         98       0  100.00%
R/gg_utils.R                               221       0  100.00%
R/label_roche.R                             72       0  100.00%
R/modify_header_rm_md.R                     18       2  88.89%   35-36
R/modify_zero_recode.R                      20       1  95.00%   64
R/reverse_difference_ci.R                   33       0  100.00%
R/tbl_baseline_chg.R                       188       0  100.00%
R/tbl_coxph.R                               90       1  98.89%   219
R/tbl_hierarchical_incidence_rate.R        209       0  100.00%
R/tbl_hierarchical_rate_and_count.R        339      13  96.17%   343, 425, 446-456
R/tbl_hierarchical_rate_by_grade.R         317       3  99.05%   169-171
R/tbl_listing.R                             35       0  100.00%
R/tbl_mmrm.R                               254       1  99.61%   393
R/tbl_null_report.R                          9       0  100.00%
R/tbl_rmpt.R                               157      12  92.36%   297-302, 314-319
R/tbl_roche_subgroups.R                    155       0  100.00%
R/tbl_roche_summary.R                       64       0  100.00%
R/tbl_shift.R                              116       0  100.00%
R/tbl_survfit_quantiles.R                  132       1  99.24%   295
R/tbl_survfit_times.R                       92       0  100.00%
R/tbl_with_pools.R                          64       0  100.00%
R/theme_gtsummary_roche.R                   84       1  98.81%   61
R/utils.R                                   42       0  100.00%
TOTAL                                     4231     163  96.15%

Diff against main

Filename                   Stmts    Miss  Cover
-----------------------  -------  ------  --------
R/get_cox_pairwise_df.R      +53     +36  -21.94%
R/utils.R                     +6       0  +100.00%
TOTAL                        +59     +36  -0.81%

Results for commit: 3049777

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

Comment thread R/get_cox_pairwise_df.R Outdated
Comment thread R/annotate_gg_km.R Outdated
Comment thread R/get_cox_pairwise_df.R
#' \itemize{
#' \item `HR`: The Hazard Ratio formatted to two decimal places.
#' \item `95% CI`: The 95\% confidence interval as `"(lower, upper)"`.
#' \item `95% CI`: The 95% Wald confidence interval as `"(lower, upper)"`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change the method if you want? If not we need to be upfront about it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, coxph() calculates the variance using the standard Fisher information matrix, and the resulting CIs extracted via summary() or confint() are Wald confidence intervals.

If you want to change the underlying variance method to calculate Robust (Sandwich) Confidence Intervals, you need to use the robust = TRUE argument inside your model call.

So we can add an argument robust and additionally specify the confidence limit in another argument?

However these will be two additional arguments in the function call. What is your take on this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use ... to pass these additional arguments consistently to each method?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the code to do that.

Comment thread R/get_cox_pairwise_df.R Outdated
Comment on lines +130 to +137
formula_str <- paste(deparse(model_formula), collapse = " ")
if (grepl("\\b[a-zA-Z][a-zA-Z0-9.]*::", formula_str)) {
cli::cli_abort(
"{.arg model_formula} must be specified without namespace.",
call = get_cli_abort_call()
)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the same from AI in another PR (avot01). I wonder if we should make an utility for it

Copy link
Copy Markdown
Contributor Author

@jszczypinski jszczypinski May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will put it import-standalone-checks.R

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called check_formula_for_namespace()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not the place for this. see header:

# Standalone file: do not edit by hand
# Source: https://github.com/insightsengineering/standalone/blob/HEAD/R/standalone-checks.R
# Generated by: usethis::use_standalone("insightsengineering/standalone", "checks")
# ----------------------------------------------------------------------

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I have put it in utils.R

Comment thread R/import-standalone-checks.R Outdated
#' Check if formula contains namespace
#' @keywords internal
#' @noRd
check_formula_for_namespace <- function(formula) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a test for this directly (to add into utils.R)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for it

@@ -40,18 +38,20 @@ test_that(
expect_no_error(
suppressWarnings(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would aim at removing all suppressWarnings (I may also have added some tbh)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all from get_cox_pairwise_df()

Comment thread R/get_cox_pairwise_df.R
Comment on lines +267 to +283
if (test == "log-rank") {
# --- 1. Standard Log-Rank via survival (Matches SAS & rtables) ---
sdiff <- survival::survdiff(formula, data = data)

# Calculate the asymptotic p-value using the chi-square statistic
# Degrees of freedom is (number of groups - 1)
p_value <- stats::pchisq(sdiff$chisq, df = length(sdiff$n) - 1, lower.tail = FALSE)
} else if (test == "likelihood-ratio") {
# --- 2. Likelihood-Ratio Test via survival ---
# Fit the full Cox model
fit_full <- survival::coxph(formula, data = data, ties = ties)

# Safely create the reduced formula by dropping the arm variable
reduced_formula <- stats::update(formula, paste(". ~ . -", arm))

# Fit the reduced model explicitly to avoid drop1 environment scope errors
fit_reduced <- survival::coxph(reduced_formula, data = data, ties = ties)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reverted to survival for computing log-rank p value, since coin is using permutation test for this. This resulted in discrepancy between crane and tern results.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to talk about the LRT engine change (survregcoxph)

Just flagging this because I think it deserves a note in the PR description / NEWS.

The old LRT used survreg(dist = "exponential") to match SAS's PROC LIFETEST LR test. The new code uses nested coxph() anova (Cox partial-likelihood LRT, equivalent to SAS PROC PHREG). The switch was necessary because survreg() errors on strata() terms.

However, we could preserve both: use the original survreg exponential LRT when the formula has no strata() terms (keeping backward compatibility and SAS PROC LIFETEST alignment), and fall back to the Cox LRT only when strata() is present. This way existing non-stratified results stay unchanged, and stratified formulas get a valid test. Both approaches have SAS equivalents — they just come from different procedures.

Either way, worth a note in NEWS since test = "likelihood-ratio" behavior is changing for at least the stratified case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make get_cox_pairwise_df accept formulas with strata() and remove namespace from formulas

3 participants