From 2b866c29b90a4e65d1b43efc19762b17056dcf83 Mon Sep 17 00:00:00 2001 From: Bill Denney Date: Mon, 11 May 2026 09:48:45 -0400 Subject: [PATCH 1/2] Balanced width pipeline + row-overflow retry loop (issue 35) The default column-width pipeline is now "split first, wrap per page" - when a table cannot fit on one page width, the page-column-split runs with each column's *minimum* survivable width as the capacity-planning input, and each resulting page water-fills its columns within that page's own horizontal slack. The legacy "wrap whole table, then split using post-wrap widths" path is preserved as `col_split_strategy = "wrap_first"` so the two behaviours can be compared empirically before removing the legacy one. Concrete example: a 4-column table that splits across 2 pages used to have page 1 columns crushed to ~0.6" each because the whole-table water-fill had to make room for the 4th column that ended up on page 2 anyway. Under the new default, page 1's columns get the full page 1 slack (~1.45" each); page 2 is unchanged. API surface: - `tfl_table(col_split_strategy = c("balanced", "wrap_first"))` - new default "balanced". "wrap_first" preserves the legacy ordering. - `tfl_table(row_overflow_max_retries = 5L)` - cap on the new retry loop. When a row's wrapped height exceeds the page under the balanced strategy, the bottleneck column's minimum is raised by 0.25" and the width pipeline re-runs. After the cap the final `paginate_rows()` call goes through the existing `overflow_action` path so the user-visible error/warn behaviour is unchanged. Set `row_overflow_max_retries = 0L` to disable the loop entirely. - `paginate_rows(..., collect_overflows = TRUE)` - new option that returns `list(pages, overflows)` instead of signalling. Used by the retry loop in `.tfl_table_to_pagelist_default()`. Implementation: - `R/table_columns.R::compute_col_widths()` is now a dispatcher. The legacy body lives in `.compute_col_widths_wrap_first()` verbatim. The new body in `.compute_col_widths_balanced()` implements the Case-A / Case-B / Case-C decision tree from the issue: A. sum(natural) <= content -> use natural; one page. B. sum(min) <= content -> single page; water-fill from natural down to fit. C. else -> page-split using mins for capacity, then per-page water-fill from natural down to the page's slack with group columns pinned at min so data columns absorb the slack. - `R/wrap.R` gains three helpers: `.compute_col_min_widths()` extracts the floor measurement that `.compute_wrapped_widths()` did inline. `.water_fill_to_budget()` is a pure water-from-top loop that takes pre-computed minimums. `.reconcile_page_widths()` flattens per-page width vectors into one per-column vector (group columns get the MIN across pages, per the user's design call - group cells frequently flow across rows via rowspan suppression so data columns benefit more from the slack). - `R/table_rows.R::paginate_rows()` gains the `collect_overflows` param and returns a structured result when set. The bottleneck column for each overflow is the wrap-eligible column with the maximum cell height in the offending row. - `R/table_pagelist.R` reorganises Step 4-6 around a repeat loop that retries on row overflow. The row-height scratch device's lifecycle is now fully contained inside each iteration via the local `.run_pagination_iter()` helper, so the surrounding loop never holds a viewport across a `compute_col_widths()` call. - `R/tfl_table.R` adds and validates the two new arguments; both are persisted on the `tfl_table` object. All 1095 existing tests still pass. Tests, demos, and docs for the new behaviour land in follow-up commits. Co-Authored-By: Claude Opus 4.7 (1M context) --- R/table_columns.R | 462 ++++++++++++++++++++++++++++++++++++--------- R/table_pagelist.R | 212 +++++++++++++-------- R/table_rows.R | 61 ++++-- R/tfl_table.R | 40 ++++ R/wrap.R | 189 +++++++++++++++++++ 5 files changed, 789 insertions(+), 175 deletions(-) diff --git a/R/table_columns.R b/R/table_columns.R index d3d437f..363de6f 100644 --- a/R/table_columns.R +++ b/R/table_columns.R @@ -95,21 +95,83 @@ resolve_col_specs <- function(tbl) { compute_col_widths <- function(resolved_cols, data, content_width_in, tbl, pg_width, pg_height, margins, overflow_action = c("error", "warn"), - validate_overflow = TRUE) { + validate_overflow = TRUE, + floor_overrides = NULL) { overflow_action <- match.arg(overflow_action) + strategy <- tbl$col_split_strategy %||% "balanced" + + # Shared setup: compute natural widths, resolve relative weights, + # auto-detect wrap eligibility, and measure col_cont_label_half_w. + setup <- .resolve_natural_widths( + resolved_cols, data, content_width_in, tbl, pg_width, pg_height, margins + ) + + # Dispatch. Each strategy returns list(resolved_cols, col_groups). + if (identical(strategy, "wrap_first")) { + res <- .compute_col_widths_wrap_first( + widths_natural = setup$widths_natural, + resolved_cols = setup$resolved_cols, + data = data, + content_width_in = content_width_in, + tbl = tbl, + pg_width = pg_width, + pg_height = pg_height, + margins = margins, + overflow_action = overflow_action, + validate_overflow = validate_overflow, + h_pad_in = setup$h_pad_in, + min_in = setup$min_in, + n_grp = setup$n_grp, + breaks = setup$breaks + ) + } else { + res <- .compute_col_widths_balanced( + widths_natural = setup$widths_natural, + resolved_cols = setup$resolved_cols, + data = data, + content_width_in = content_width_in, + tbl = tbl, + pg_width = pg_width, + pg_height = pg_height, + margins = margins, + overflow_action = overflow_action, + validate_overflow = validate_overflow, + h_pad_in = setup$h_pad_in, + min_in = setup$min_in, + n_grp = setup$n_grp, + breaks = setup$breaks, + floor_overrides = floor_overrides + ) + } + + res$col_cont_label_half_w <- setup$col_cont_label_half_w + res +} + +# --------------------------------------------------------------------------- +# .resolve_natural_widths() - shared setup (Passes 1, 2, 3) +# --------------------------------------------------------------------------- + +# Computes per-column natural widths, resolves relative weights, and +# auto-detects wrap eligibility. Returns the per-column natural width +# vector, the updated resolved_cols (with wrap eligibility resolved), +# the measured `col_cont_label_half_w` for later layout decisions, and a +# handful of derived scalars (`h_pad_in`, `min_in`, `n_grp`, `breaks`) +# the strategy functions need. +# +# The scratch device used for text measurement is opened, used, and +# closed inside this function so neither strategy has to manage it. +.resolve_natural_widths <- function(resolved_cols, data, content_width_in, + tbl, pg_width, pg_height, margins) { n_cols <- length(resolved_cols) n_grp <- length(tbl$group_vars) min_in <- .width_in(tbl$min_col_width) - cell_pad <- tbl$cell_padding # 4-element named unit (top/right/bottom/left) + cell_pad <- tbl$cell_padding h_pad_in <- .width_in(cell_pad[["right"]]) + .width_in(cell_pad[["left"]]) na_str <- tbl$na_string max_rows <- tbl$max_measure_rows - # --- Open scratch device for text width measurement --- - # The device is closed immediately after measurement (before relative weight - # resolution and wrapping) because .apply_col_wrapping() opens its own device. - # on.exit ensures cleanup if the measurement loop errors. scratch_file <- tempfile(fileext = ".pdf") grDevices::pdf(scratch_file, width = pg_width, height = pg_height) outer_vp <- .make_outer_vp(margins) @@ -125,17 +187,10 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, w <- cs$width if (inherits(w, "unit")) { - # Fixed unit width — apply floor max(min_in, .width_in(w)) } else if (is.numeric(w) && !is.null(w)) { - NA_real_ # relative weight — resolved in second pass + NA_real_ # relative weight - resolved below } else { - # NULL / missing - auto-size from content. Header labels are rendered - # with the header_row gpar (typically bold) and cells with the cell - # gpar (regular); measuring a bold header with the regular-weight - # cell gpar undersizes the column and makes the rendered header bleed - # into the next column. Measure each kind of text with its actual - # rendering gpar and take the larger of the two as the natural width. cell_gp <- .gp_with_lineheight( .resolve_table_cell_gp(tbl$gp, cs$is_group_col), tbl$line_height ) @@ -149,11 +204,6 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, } }, numeric(1L)) - # Measure half-width of a col_cont_msg label while the device is still open. - # The label is rotated 90°, so its viewport "width" equals one character height. - # Divided by 2 because the text is centred at x = 0 or x = 1 npc, placing - # half its width inside the viewport. Returned so the caller can decide - # whether a second compute_col_widths() pass is needed. col_cont_label_half_w <- if (!is.null(tbl$col_cont_msg)) { cont_gp <- .gp_with_lineheight( .resolve_table_gp(tbl$gp, "continued"), tbl$line_height @@ -163,14 +213,14 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, 0 } - # Close the scratch device now — must happen before .apply_col_wrapping() - # opens its own device. Clear the on.exit handler to avoid a double-close. + # Close the scratch device now so strategy functions can open their own + # without nested-device complications. grid::popViewport() grDevices::dev.off() unlink(scratch_file) on.exit(NULL) - # --- Resolve relative weights --- + # Resolve relative weights. rel_idx <- which(vapply(resolved_cols, function(cs) { is.numeric(cs$width) && !is.null(cs$width) && !inherits(cs$width, "unit") }, logical(1L))) @@ -185,12 +235,9 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, }, numeric(1L)) } - # --- Resolve auto-detect wrap eligibility (cs$wrap == NA) --- - # The "auto" mode marks data columns as NA in resolve_col_specs(); we - # promote each NA to TRUE / FALSE here based on whether the column actually - # contains a break character. Skipping a column with no breakable text - # avoids wasting a wrap pass on numeric / single-token columns where it - # could not narrow the width anyway. + # Auto-detect wrap eligibility (cs$wrap == NA - the "auto" mode marks + # data columns as NA in resolve_col_specs(); promote each NA to TRUE + # / FALSE based on whether the column contains a break character). breaks <- tbl$wrap_breaks %||% wrap_breaks_default() for (j in seq_len(n_cols)) { cs_j <- resolved_cols[[j]] @@ -201,9 +248,36 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, } } - # --- Attempt word-wrap if total exceeds content width --- - total_w <- sum(widths_in) + list( + widths_natural = widths_in, + resolved_cols = resolved_cols, + col_cont_label_half_w = col_cont_label_half_w, + h_pad_in = h_pad_in, + min_in = min_in, + n_grp = n_grp, + breaks = breaks + ) +} + +# --------------------------------------------------------------------------- +# .compute_col_widths_wrap_first() - legacy pre-issue-35 strategy +# --------------------------------------------------------------------------- +# Whole-table water-fill first, then page-split using the post-wrap widths. +# This is the strategy that shipped with issue #28; preserved verbatim so +# the new "balanced" strategy can be compared empirically against it. +.compute_col_widths_wrap_first <- function(widths_natural, resolved_cols, data, + content_width_in, tbl, + pg_width, pg_height, margins, + overflow_action, validate_overflow, + h_pad_in, min_in, n_grp, breaks) { + n_cols <- length(resolved_cols) + na_str <- tbl$na_string + max_rows <- tbl$max_measure_rows + widths_in <- widths_natural + + # Word-wrap if total exceeds content width. + total_w <- sum(widths_in) if (total_w > content_width_in + 1e-6) { widths_in <- .compute_wrapped_widths( widths_in, resolved_cols, data, tbl, content_width_in, @@ -212,11 +286,7 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, total_w <- sum(widths_in) } - # --- Optional height-balance pass (opt-in via wrap_balance = "height") --- - # Runs unconditionally when opted in; the algorithm itself is a no-op if - # there is no improvement available. Falls back silently to the input - # widths on any error or invariant violation, so opting in cannot worsen - # the result. + # Optional whole-table height-balance. if (identical(tbl$wrap_balance, "height")) { widths_in <- .height_balance_widths( widths_in, resolved_cols, data, tbl, @@ -227,14 +297,9 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, total_w <- sum(widths_in) } - # --- Check feasibility --- errors <- character(0) if (!validate_overflow) { - # Skip overflow validation entirely. The caller (typically the second - # cw_adj pass in .tfl_table_to_pagelist_default) is recomputing widths - # for layout reasons after a prior pass already validated the same - # configuration; re-signalling here would emit a duplicate warning. resolved_cols <- lapply(seq_len(n_cols), function(j) { cs <- resolved_cols[[j]] cs$width_in <- widths_in[[j]] @@ -242,21 +307,264 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, }) col_groups <- paginate_cols(widths_in, content_width_in, n_grp, tbl$allow_col_split, tbl$balance_col_pages) - return(list(resolved_cols = resolved_cols, - col_groups = col_groups, - col_cont_label_half_w = col_cont_label_half_w)) + return(list(resolved_cols = resolved_cols, col_groups = col_groups)) } - # Per-column / group-aware overflow check. Group columns repeat on every - # column-paginated page, so the available width for any single data column - # is content_width_in - grp_w. A group column itself must fit in the full - # content width (grp_w == 0 if there are no group columns, in which case the - # data-col rule reduces to `widths_in[j] > content_width_in`). - grp_w <- if (n_grp > 0L) sum(widths_in[seq_len(n_grp)]) else 0 + errors <- .check_col_overflow_per_col(widths_in, resolved_cols, n_grp, + content_width_in, overflow_action, + errors) + if (total_w > content_width_in + 1e-6 && !tbl$allow_col_split) { + errors <- .check_total_width_overflow(widths_in, resolved_cols, + content_width_in, overflow_action, + errors, total_w) + } + if (length(errors) > 0L) { + rlang::abort(paste(errors, collapse = "\n")) + } + + resolved_cols <- lapply(seq_len(n_cols), function(j) { + cs <- resolved_cols[[j]] + cs$width_in <- widths_in[[j]] + cs + }) + col_groups <- paginate_cols(widths_in, content_width_in, n_grp, + tbl$allow_col_split, tbl$balance_col_pages) + + list(resolved_cols = resolved_cols, col_groups = col_groups) +} + +# --------------------------------------------------------------------------- +# .compute_col_widths_balanced() - issue #35 strategy +# --------------------------------------------------------------------------- + +# Decision tree from issue #35: +# A. If sum(natural) <= content_width -> use natural; single page. +# B. Elif sum(min) <= content_width -> single page; water-fill from +# natural down to fit. (Mathematically equivalent to today's whole- +# table water-fill when there's no page split.) +# C. Else: page-split using MIN widths for capacity planning. For each +# resulting page, water-fill from natural down to that page's slack, +# with group columns pinned at their MIN width so data columns receive +# the most per-page slack. Reconcile per-page widths into one +# per-column vector (group columns get the MIN across pages; data +# columns each appear on one page only). +# +# `floor_overrides` is a named numeric vector (col -> min width override +# in inches). When a column's name is in the override map, its computed +# minimum is `max(computed_min, override)`. Used by the row-overflow +# retry loop in `.tfl_table_to_pagelist_default()` to widen a column whose +# cell content forced a too-tall row. +.compute_col_widths_balanced <- function(widths_natural, resolved_cols, data, + content_width_in, tbl, + pg_width, pg_height, margins, + overflow_action, validate_overflow, + h_pad_in, min_in, n_grp, breaks, + floor_overrides) { + n_cols <- length(resolved_cols) + na_str <- tbl$na_string + max_rows <- tbl$max_measure_rows + eps <- 1e-6 + + wrap_eligible <- vapply(resolved_cols, `[[`, logical(1L), "wrap") + + # Compute per-column minimum widths. For non-wrap-eligible cols the + # minimum is the natural width (they can't shrink). For wrap-eligible + # cols it's the longest-token floor. + widths_min <- .compute_col_min_widths( + widths_natural = widths_natural, + resolved_cols = resolved_cols, + data = data, + tbl = tbl, + h_pad_in = h_pad_in, + min_in = min_in, + pg_width = pg_width, + pg_height = pg_height, + margins = margins + ) + + # Apply floor overrides from the row-overflow retry loop, if any. + if (length(floor_overrides) > 0L) { + for (col_name in names(floor_overrides)) { + j <- match(col_name, + vapply(resolved_cols, `[[`, character(1L), "col")) + if (!is.na(j)) { + widths_min[[j]] <- max(widths_min[[j]], + unname(floor_overrides[[col_name]])) + } + } + } + + total_natural <- sum(widths_natural) + total_min <- sum(widths_min) + + if (total_natural <= content_width_in + eps) { + # Case A: everything fits at natural width. + widths_in <- widths_natural + col_groups <- list(seq_len(n_cols)) + } else if (total_min <= content_width_in + eps) { + # Case B: everything fits if we wrap. Single page. + widths_in <- .water_fill_to_budget( + widths_in = widths_natural, + widths_min = widths_min, + wrap_eligible = wrap_eligible, + budget_in = content_width_in + ) + col_groups <- list(seq_len(n_cols)) + } else { + # Case C: page-split using min widths for capacity planning. Group + # columns are pinned at their min width on every page; data columns + # on each page water-fill from natural down to that page's slack. + col_groups <- paginate_cols( + widths_min, content_width_in, n_grp, + tbl$allow_col_split, tbl$balance_col_pages + ) + + per_page_widths <- vector("list", length(col_groups)) + for (g in seq_along(col_groups)) { + page_idx <- col_groups[[g]] + # Starting widths: group cols at min, data cols at natural. + page_w <- numeric(length(page_idx)) + for (k in seq_along(page_idx)) { + j <- page_idx[[k]] + if (j <= n_grp) { + page_w[[k]] <- widths_min[[j]] + } else { + page_w[[k]] <- widths_natural[[j]] + } + } + # wrap_eligible for water-fill: only DATA cols may shrink further. + # Group cols are pinned at min and excluded from active set. + page_elig <- wrap_eligible[page_idx] + for (k in seq_along(page_idx)) { + if (page_idx[[k]] <= n_grp) page_elig[[k]] <- FALSE + } + per_page_widths[[g]] <- .water_fill_to_budget( + widths_in = page_w, + widths_min = widths_min[page_idx], + wrap_eligible = page_elig, + budget_in = content_width_in + ) + } + widths_in <- .reconcile_page_widths(per_page_widths, col_groups, + n_group_cols = n_grp, + n_cols = n_cols) + } + + # Optional per-page height-balance (opt-in via wrap_balance = "height"). + if (identical(tbl$wrap_balance, "height") && length(col_groups) >= 1L) { + widths_in <- .apply_per_page_height_balance( + widths_in = widths_in, + col_groups = col_groups, + resolved_cols = resolved_cols, + data = data, + tbl = tbl, + h_pad_in = h_pad_in, + na_str = na_str, + max_rows = max_rows, + breaks = breaks, + pg_width = pg_width, + pg_height = pg_height, + margins = margins, + n_grp = n_grp + ) + } + + errors <- character(0) + + if (!validate_overflow) { + resolved_cols <- lapply(seq_len(n_cols), function(j) { + cs <- resolved_cols[[j]] + cs$width_in <- widths_in[[j]] + cs$width_natural_in <- widths_natural[[j]] + cs$width_min_in <- widths_min[[j]] + cs + }) + return(list(resolved_cols = resolved_cols, col_groups = col_groups)) + } + + errors <- .check_col_overflow_per_col(widths_in, resolved_cols, n_grp, + content_width_in, overflow_action, + errors) + if (sum(widths_in) > content_width_in + eps && !tbl$allow_col_split && + length(col_groups) > 1L) { + errors <- .check_total_width_overflow(widths_in, resolved_cols, + content_width_in, overflow_action, + errors, sum(widths_in)) + } + if (length(errors) > 0L) { + rlang::abort(paste(errors, collapse = "\n")) + } + + resolved_cols <- lapply(seq_len(n_cols), function(j) { + cs <- resolved_cols[[j]] + cs$width_in <- widths_in[[j]] + cs$width_natural_in <- widths_natural[[j]] + cs$width_min_in <- widths_min[[j]] + cs + }) + + list(resolved_cols = resolved_cols, col_groups = col_groups) +} + +# Per-page height-balance helper used by .compute_col_widths_balanced(). +# Calls .height_balance_widths() once per page-column-split page with the +# page's column subset; reconciles results back into a flat per-column +# width vector. Non-group columns appear on exactly one page so their +# height-balanced width is the result. Group columns appear on every +# page; their width is kept at the input value (since group cols don't +# participate in height-balance anyway). +.apply_per_page_height_balance <- function(widths_in, col_groups, + resolved_cols, data, tbl, + h_pad_in, na_str, max_rows, + breaks, pg_width, pg_height, + margins, n_grp) { + widths_out <- widths_in + for (g in seq_along(col_groups)) { + page_idx <- col_groups[[g]] + page_cols <- resolved_cols[page_idx] + page_widths <- widths_in[page_idx] + page_data <- data[, vapply(page_cols, `[[`, character(1L), "col"), + drop = FALSE] + balanced <- .height_balance_widths( + widths_in = page_widths, + resolved_cols = page_cols, + data = page_data, + tbl = tbl, + h_pad_in = h_pad_in, + na_str = na_str, + max_rows = max_rows, + breaks = breaks, + pg_width = pg_width, + pg_height = pg_height, + margins = margins + ) + # Only non-group columns are updated; group columns stay at their + # input width. + for (k in seq_along(page_idx)) { + j <- page_idx[[k]] + if (j > n_grp) { + widths_out[[j]] <- balanced[[k]] + } + } + } + widths_out +} + +# --------------------------------------------------------------------------- +# Shared overflow-check helpers +# --------------------------------------------------------------------------- + +# Per-column overflow validation. Group columns must fit content_width +# alone; data columns must fit alongside the group columns on a single +# page (since group columns repeat on every column-paginated page). +.check_col_overflow_per_col <- function(widths_in, resolved_cols, n_grp, + content_width_in, overflow_action, + errors) { + n_cols <- length(resolved_cols) + grp_w <- if (n_grp > 0L) sum(widths_in[seq_len(n_grp)]) else 0 for (j in seq_len(n_cols)) { cs <- resolved_cols[[j]] if (j <= n_grp) { - # Group column j: must fit in content_width_in alone if (widths_in[[j]] > content_width_in + 1e-6) { errors <- .overflow_signal( sprintf( @@ -268,9 +576,6 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, ) } } else { - # Data column j: must fit alongside the group columns on a single page. - # Use a tiny tolerance and avoid double-reporting when n_grp == 0 and - # the same overflow would also be caught by the (commented) total check. if (grp_w + widths_in[[j]] > content_width_in + 1e-6) { if (n_grp > 0L) { msg <- sprintf( @@ -292,41 +597,24 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, } } } + errors +} - # Total-width check: only meaningful when allow_col_split = FALSE. When - # allow_col_split = TRUE, paginate_cols() handles the multi-page split and - # this is not an overflow event. - if (total_w > content_width_in + 1e-6 && !tbl$allow_col_split) { - col_detail <- paste(vapply(seq_len(n_cols), function(j) { - sprintf(" %s: %.3g in", resolved_cols[[j]]$col, widths_in[[j]]) - }, character(1L)), collapse = "\n") - msg <- sprintf(paste0( - "Total column width (%.3g in) exceeds available content width (%.3g in) ", - "after wrapping.\nColumn widths:\n%s\n", - "Set `allow_col_split = TRUE` to split columns across pages, ", - "or reduce column widths / enable wrap_cols." - ), total_w, content_width_in, col_detail) - errors <- .overflow_signal(msg, overflow_action, errors) - } - - if (length(errors) > 0L) { - rlang::abort(paste(errors, collapse = "\n")) - } - - # --- Store final widths in resolved_cols --- - resolved_cols <- lapply(seq_len(n_cols), function(j) { - cs <- resolved_cols[[j]] - cs$width_in <- widths_in[[j]] - cs - }) - - # --- Determine column groups --- - col_groups <- paginate_cols(widths_in, content_width_in, n_grp, - tbl$allow_col_split, tbl$balance_col_pages) - - list(resolved_cols = resolved_cols, - col_groups = col_groups, - col_cont_label_half_w = col_cont_label_half_w) +# Total-width overflow check. Only meaningful when allow_col_split = FALSE. +.check_total_width_overflow <- function(widths_in, resolved_cols, + content_width_in, overflow_action, + errors, total_w) { + n_cols <- length(resolved_cols) + col_detail <- paste(vapply(seq_len(n_cols), function(j) { + sprintf(" %s: %.3g in", resolved_cols[[j]]$col, widths_in[[j]]) + }, character(1L)), collapse = "\n") + msg <- sprintf(paste0( + "Total column width (%.3g in) exceeds available content width (%.3g in) ", + "after wrapping.\nColumn widths:\n%s\n", + "Set `allow_col_split = TRUE` to split columns across pages, ", + "or reduce column widths / enable wrap_cols." + ), total_w, content_width_in, col_detail) + .overflow_signal(msg, overflow_action, errors) } # --------------------------------------------------------------------------- diff --git a/R/table_pagelist.R b/R/table_pagelist.R index 5a77c92..bb2583f 100644 --- a/R/table_pagelist.R +++ b/R/table_pagelist.R @@ -137,89 +137,149 @@ tfl_table_to_pagelist <- function(tbl, pg_width, pg_height, dots, resolved_cols <- resolve_col_specs(tbl) n_group_cols <- length(tbl$group_vars) - # --- Step 4: Compute column widths and determine column groups --- - # Keep a pre-width copy of resolved_cols in case a second pass is needed. - resolved_cols_0 <- resolved_cols - col_result <- compute_col_widths( - resolved_cols, tbl$data, cw, tbl, pg_width, pg_height, margins, - overflow_action = overflow_action - ) - resolved_cols <- col_result$resolved_cols # widths now set in inches - col_groups <- col_result$col_groups # list of integer vectors - has_col_split <- length(col_groups) > 1L - - # Second pass: if a column split was detected and col_cont_msg labels will - # appear, reserve half a character-height at each labelled viewport edge so - # table content does not overlap the rotated annotations. Labels appear on: - # left side — every col page that is NOT the first (col_cont_msg[[1L]]) - # right side — every col page that is NOT the last (col_cont_msg[[2L]]) - # Both conditions arise whenever n_col_groups > 1, so reduce cw by the - # relevant label half-widths and re-compute with the tighter constraint. - if (has_col_split && !is.null(tbl$col_cont_msg)) { - hw <- col_result$col_cont_label_half_w - cw_adj <- cw - if (!is.null(tbl$col_cont_msg[[1L]])) cw_adj <- cw_adj - hw - if (!is.null(tbl$col_cont_msg[[2L]])) cw_adj <- cw_adj - hw - col_result <- compute_col_widths( - resolved_cols_0, tbl$data, cw_adj, tbl, pg_width, pg_height, margins, - overflow_action = overflow_action, - validate_overflow = FALSE # first pass already validated - ) - resolved_cols <- col_result$resolved_cols - col_groups <- col_result$col_groups - has_col_split <- length(col_groups) > 1L - } - - # --- Step 5: Measure row heights --- - # Open scratch PDF device for height measurement - scratch_file_rh <- tempfile(fileext = ".pdf") - grDevices::pdf(scratch_file_rh, width = pg_width, height = pg_height) - outer_vp <- .make_outer_vp(margins) - grid::pushViewport(outer_vp) - on.exit({ - grid::popViewport() - grDevices::dev.off() - unlink(scratch_file_rh) - }, add = TRUE) - - breaks <- tbl$wrap_breaks %||% wrap_breaks_default() + # --- Step 4-6: Compute column widths, measure row heights, paginate --- + # Under col_split_strategy = "balanced", a row whose wrapped height exceeds + # the available page content height triggers a retry: the bottleneck + # column's minimum width is raised by `step_in` and the whole width- + # measurement-pagination loop runs again. Up to `row_overflow_max_retries` + # retries (default 5; 0L disables). After the cap the final paginate_rows + # call is made with the user's overflow_action so the standard error/warn + # path fires. + resolved_cols_0 <- resolved_cols # pre-width snapshot for re-runs + breaks <- tbl$wrap_breaks %||% wrap_breaks_default() wrap_extra_pad_in <- if (!is.null(tbl$wrap_extra_padding)) { .height_in(tbl$wrap_extra_padding) } else 0 + strategy <- tbl$col_split_strategy %||% "balanced" + max_retries <- as.integer(tbl$row_overflow_max_retries %||% 5L) + use_retry_loop <- identical(strategy, "balanced") && max_retries > 0L + floor_step_in <- 0.25 # how much to widen on each retry + floor_overrides <- numeric(0L) + names(floor_overrides) <- character(0L) + retries <- 0L + + # Per-iteration helper: opens a fresh row-height scratch device, runs + # the measurement + pagination phase, closes the scratch device, and + # returns (row_pages, cell_h_mat, cont_row_h). The scratch device's + # lifecycle is fully contained inside this helper so the surrounding + # retry loop never holds a viewport across a compute_col_widths() + # call (compute_col_widths() opens its own scratch devices internally). + .run_pagination_iter <- function(resolved_cols, collect_overflows) { + scratch_file_rh <- tempfile(fileext = ".pdf") + grDevices::pdf(scratch_file_rh, width = pg_width, height = pg_height) + rh_outer_vp <- .make_outer_vp(margins) + grid::pushViewport(rh_outer_vp) + on.exit({ + grid::popViewport() + grDevices::dev.off() + unlink(scratch_file_rh) + }, add = TRUE) + + header_row_h <- if (tbl$show_col_names) { + .measure_header_row_height(resolved_cols, tbl$gp, tbl$cell_padding, + tbl$line_height, breaks = breaks, + wrap_extra_pad_in = wrap_extra_pad_in) + } else 0 + + cell_h_mat <- measure_row_heights_tbl( + tbl$data, resolved_cols, tbl$gp, tbl$cell_padding, + tbl$na_string, tbl$line_height, tbl$max_measure_rows, + breaks = breaks, + wrap_extra_pad_in = wrap_extra_pad_in + ) - header_row_h <- if (tbl$show_col_names) { - .measure_header_row_height(resolved_cols, tbl$gp, tbl$cell_padding, - tbl$line_height, breaks = breaks, - wrap_extra_pad_in = wrap_extra_pad_in) - } else 0 + cont_row_h <- max( + .measure_cont_row_height(tbl$row_cont_msg[[1L]], tbl$gp, tbl$cell_padding, + tbl$line_height), + .measure_cont_row_height(tbl$row_cont_msg[[2L]], tbl$gp, tbl$cell_padding, + tbl$line_height) + ) - cell_h_mat <- measure_row_heights_tbl( - tbl$data, resolved_cols, tbl$gp, tbl$cell_padding, - tbl$na_string, tbl$line_height, tbl$max_measure_rows, - breaks = breaks, - wrap_extra_pad_in = wrap_extra_pad_in - ) + pr_args <- list( + tbl$data, cell_h_mat, resolved_cols, tbl$group_vars, + cont_row_h, header_row_h, ch, + tbl$row_cont_msg, tbl$group_rule, + suppress_repeated_groups = isTRUE(tbl$suppress_repeated_groups), + collect_overflows = collect_overflows + ) + if (!collect_overflows) { + pr_args$overflow_action <- overflow_action + } + pr_res <- do.call(paginate_rows, pr_args) + list( + pr_res = pr_res, + cell_h_mat = cell_h_mat, + cont_row_h = cont_row_h + ) + } - # cont_row_h: height of a (continued) row — measure the cont message text - cont_row_h <- max( - .measure_cont_row_height(tbl$row_cont_msg[[1L]], tbl$gp, tbl$cell_padding, - tbl$line_height), - .measure_cont_row_height(tbl$row_cont_msg[[2L]], tbl$gp, tbl$cell_padding, - tbl$line_height) - ) + repeat { + col_result <- compute_col_widths( + resolved_cols_0, tbl$data, cw, tbl, pg_width, pg_height, margins, + overflow_action = overflow_action, + floor_overrides = floor_overrides + ) + resolved_cols <- col_result$resolved_cols + col_groups <- col_result$col_groups + has_col_split <- length(col_groups) > 1L - # Rule heights: rules are drawn within existing space (0 height), but - # we need to know if we should budget for them when computing page capacity. - # Approach: rules are infinitesimally thin — they don't consume row space. - - # --- Step 6: Paginate rows --- - row_pages <- paginate_rows( - tbl$data, cell_h_mat, resolved_cols, tbl$group_vars, - cont_row_h, header_row_h, ch, - tbl$row_cont_msg, tbl$group_rule, - suppress_repeated_groups = isTRUE(tbl$suppress_repeated_groups), - overflow_action = overflow_action - ) + # Second pass: if a column split was detected and col_cont_msg labels + # will appear, reserve half a character-height at each labelled + # viewport edge so table content does not overlap the rotated + # annotations. Labels appear on every column page that is NOT first + # (left side) and NOT last (right side); both conditions arise + # whenever n_col_groups > 1, so reduce cw by the relevant label + # half-widths and re-compute with the tighter constraint. + if (has_col_split && !is.null(tbl$col_cont_msg)) { + hw <- col_result$col_cont_label_half_w + cw_adj <- cw + if (!is.null(tbl$col_cont_msg[[1L]])) cw_adj <- cw_adj - hw + if (!is.null(tbl$col_cont_msg[[2L]])) cw_adj <- cw_adj - hw + col_result <- compute_col_widths( + resolved_cols_0, tbl$data, cw_adj, tbl, pg_width, pg_height, margins, + overflow_action = overflow_action, + validate_overflow = FALSE, + floor_overrides = floor_overrides + ) + resolved_cols <- col_result$resolved_cols + col_groups <- col_result$col_groups + has_col_split <- length(col_groups) > 1L + } + + if (use_retry_loop && retries < max_retries) { + iter_res <- .run_pagination_iter(resolved_cols, collect_overflows = TRUE) + if (length(iter_res$pr_res$overflows) == 0L) { + row_pages <- iter_res$pr_res$pages + cell_h_mat <- iter_res$cell_h_mat + cont_row_h <- iter_res$cont_row_h + break + } + # Raise the bottleneck column's floor for the next retry. + for (ev in iter_res$pr_res$overflows) { + bot_j <- ev$bottleneck_col + if (bot_j < 1L || bot_j > length(resolved_cols)) next + cs <- resolved_cols[[bot_j]] + cur_w <- cs$width_in + new_floor <- cur_w + floor_step_in + prev <- if (cs$col %in% names(floor_overrides)) { + floor_overrides[[cs$col]] + } else { + 0 + } + floor_overrides[cs$col] <- max(prev, new_floor) + } + retries <- retries + 1L + # Loop back: recompute widths with the new floors. + } else { + # No retries left (or wrap_first mode): make the final call with the + # user's overflow_action so error/warn fires through the normal path. + iter_res <- .run_pagination_iter(resolved_cols, collect_overflows = FALSE) + row_pages <- iter_res$pr_res + cell_h_mat <- iter_res$cell_h_mat + cont_row_h <- iter_res$cont_row_h + break + } + } # --- Step 7: Assemble page specs --- n_rp <- length(row_pages) diff --git a/R/table_rows.R b/R/table_rows.R index df70ee5..6af93e1 100644 --- a/R/table_rows.R +++ b/R/table_rows.R @@ -235,15 +235,28 @@ measure_row_heights_tbl <- function(data, resolved_cols, gp_tbl, cell_padding, #' exceeds the available page content height (a row that wraps to taller #' than one page is almost always a sign of input that needs to change). #' The same knob downgrades column-overflow events; see [export_tfl_page()]. -#' @return A list of row-page specs, each with `$rows`, `$is_cont_top`, -#' `$is_cont_bottom`, `$group_starts`, and `$row_heights_in` (the committed -#' per-row heights for that page in inches). +#' Ignored when `collect_overflows = TRUE`. +#' @param collect_overflows Logical. When `FALSE` (default) the function +#' behaves as before: row-overflow events are routed through +#' `overflow_action`. When `TRUE` the function does *not* signal on +#' overflow but instead collects the events and returns them alongside +#' the page specs, so a caller can iterate (see the row-overflow retry +#' loop in `.tfl_table_to_pagelist_default()`). +#' @return When `collect_overflows = FALSE` (default), a list of row-page +#' specs, each with `$rows`, `$is_cont_top`, `$is_cont_bottom`, +#' `$group_starts`, and `$row_heights_in` (the committed per-row heights +#' for that page in inches). +#' When `collect_overflows = TRUE`, a 2-element list: +#' `$pages` (as above) and `$overflows` (a list of overflow events, +#' each `list(row, bottleneck_col, cell_height_in)`; empty if no +#' row overflowed). #' @keywords internal paginate_rows <- function(data, cell_h_mat, resolved_cols, group_vars, cont_row_h, header_row_h, content_height_in, row_cont_msg, group_rule, suppress_repeated_groups = TRUE, - overflow_action = "error") { + overflow_action = "error", + collect_overflows = FALSE) { n_rows <- nrow(data) # Group boundaries in the *full* data — used for the page-spec $group_starts @@ -251,11 +264,15 @@ paginate_rows <- function(data, cell_h_mat, resolved_cols, group_vars, group_starts <- .compute_group_starts(data, group_vars) pages <- list() + overflows <- list() cur_rows <- integer(0L) committed_rh <- numeric(0L) # heights for cur_rows after last successful add is_cont_top <- FALSE errors <- character(0L) + wrap_elig <- vapply(resolved_cols, function(cs) isTRUE(cs$wrap), + logical(1L)) + flush_page <- function(rows, row_heights_in, is_cont_top, is_cont_bottom) { pages[[length(pages) + 1L]] <<- list( rows = rows, @@ -313,14 +330,30 @@ paginate_rows <- function(data, cell_h_mat, resolved_cols, group_vars, (if (is_cont_top) cont_row_h else 0) + sum(rh) if (min_required > content_height_in + 1e-6) { - msg <- sprintf( - paste0("Row %d of the table wraps to a height (%.3g in) that ", - "exceeds the available page content height (%.3g in). ", - "Reduce the cell content, increase the page height, widen ", - "the column, or set the column to wrap less aggressively."), - i, sum(rh), content_height_in - ) - errors <- .overflow_signal(msg, overflow_action, errors) + if (collect_overflows) { + # Identify the bottleneck column: the wrap-eligible column whose + # cell in row i has the greatest measured height. Fall back to + # the tallest cell overall if no wrap-eligible column is present + # in this row (the caller's retry loop will still know which row + # was the problem). + cell_heights <- cell_h_mat[i, ] + cand_h <- cell_heights * as.numeric(wrap_elig) + bot_j <- if (any(cand_h > 0)) which.max(cand_h) else which.max(cell_heights) + overflows[[length(overflows) + 1L]] <- list( + row = i, + bottleneck_col = bot_j, + cell_height_in = cell_heights[[bot_j]] + ) + } else { + msg <- sprintf( + paste0("Row %d of the table wraps to a height (%.3g in) that ", + "exceeds the available page content height (%.3g in). ", + "Reduce the cell content, increase the page height, widen ", + "the column, or set the column to wrap less aggressively."), + i, sum(rh), content_height_in + ) + errors <- .overflow_signal(msg, overflow_action, errors) + } } # Fall through to commit the row. } @@ -335,6 +368,10 @@ paginate_rows <- function(data, cell_h_mat, resolved_cols, group_vars, flush_page(cur_rows, committed_rh, is_cont_top, is_cont_bottom = FALSE) } + if (collect_overflows) { + return(list(pages = pages, overflows = overflows)) + } + if (length(errors) > 0L) { rlang::abort(paste(errors, collapse = "\n")) } diff --git a/R/tfl_table.R b/R/tfl_table.R index c3ddef1..681cf88 100644 --- a/R/tfl_table.R +++ b/R/tfl_table.R @@ -251,6 +251,29 @@ tfl_colspec <- function(col, #' wrapped or `\n`-broken text. Default `unit(0.5, "lines")`. Set to #' `unit(0, "lines")` to disable. Only multi-line cells receive the extra; #' single-line cells are unaffected. +#' @param col_split_strategy Either `"balanced"` (default) or `"wrap_first"`. +#' Controls the order in which text wrapping and page-column-split +#' interact when a table is wider than one page. +#' +#' * `"balanced"` (default, introduced for issue #35): page-split using +#' the **minimum** survivable column widths for capacity planning, then +#' water-fill within each page so columns can be as wide as that page's +#' horizontal slack allows. Group columns are pinned at their minimum +#' width so per-page data columns get the most slack. Multi-page tables +#' end up with less-wrapped columns per page than the legacy strategy. +#' * `"wrap_first"` (pre-issue-35 behaviour): water-fill the whole table +#' down to one page's content width, then page-split using the post- +#' wrap widths. Every page-column-split page shows the same heavily +#' wrapped columns. Kept temporarily for empirical comparison; will be +#' removed in a future release if the `"balanced"` default proves +#' consistently better. +#' @param row_overflow_max_retries Non-negative integer. Maximum number of +#' times the `"balanced"` strategy will retry when `paginate_rows()` +#' reports a row whose wrapped height exceeds the page. Each retry +#' raises the bottleneck column's minimum width by 0.25 inches and +#' re-runs the width pipeline. `0L` disables the retry loop entirely +#' (the first row-overflow goes straight to `overflow_action`). Default +#' `5L`. Ignored when `col_split_strategy = "wrap_first"`. #' @param max_measure_rows Positive numeric or `Inf` (default). Maximum number #' of unique cell strings sampled per column when computing content-based #' column widths. Strings are sampled in descending order of `nchar()` so @@ -314,6 +337,8 @@ tfl_table <- function(x, cell_padding = grid::unit(c(0.2, 0.5), "lines"), line_height = 1.05, wrap_extra_padding = grid::unit(0.5, "lines"), + col_split_strategy = c("balanced", "wrap_first"), + row_overflow_max_retries = 5L, max_measure_rows = Inf) { # --- Validate x --- @@ -417,6 +442,19 @@ tfl_table <- function(x, } wrap_balance <- match.arg(wrap_balance) + # --- Validate col_split_strategy --- + if (!is.character(col_split_strategy) || + length(col_split_strategy) == 0L || + !all(col_split_strategy %in% c("balanced", "wrap_first"))) { + rlang::abort('`col_split_strategy` must be "balanced" or "wrap_first".') + } + col_split_strategy <- match.arg(col_split_strategy) + + # --- Validate row_overflow_max_retries --- + checkmate::assert_int(row_overflow_max_retries, lower = 0L, + .var.name = "row_overflow_max_retries") + row_overflow_max_retries <- as.integer(row_overflow_max_retries) + # --- Validate min_col_width --- checkmate::assert_class(min_col_width, "unit", .var.name = "min_col_width") @@ -517,6 +555,8 @@ tfl_table <- function(x, cell_padding = cell_padding, line_height = line_height, wrap_extra_padding = wrap_extra_padding, + col_split_strategy = col_split_strategy, + row_overflow_max_retries = row_overflow_max_retries, max_measure_rows = max_measure_rows ), class = "tfl_table" diff --git a/R/wrap.R b/R/wrap.R index fa7cf17..9054861 100644 --- a/R/wrap.R +++ b/R/wrap.R @@ -273,6 +273,195 @@ wrap_breaks_default <- function() { .wrap_string(label, inner, gp, breaks) } +# --------------------------------------------------------------------------- +# .compute_col_min_widths() - per-column minimum (floor) widths +# --------------------------------------------------------------------------- + +#' Per-column minimum survivable width in inches +#' +#' For wrap-eligible columns, the minimum is +#' `max(min_col_width, longest_unbreakable_token + h_pad)`, measured under +#' both the cell and header gpars and taking the larger so a bold-rendered +#' header token cannot be undersized. For non-wrap-eligible columns the +#' minimum equals the supplied `widths_natural` (those columns cannot +#' shrink without overflowing). +#' +#' Opens its own scratch PDF device and outer viewport for measurement. +#' +#' @param widths_natural Numeric vector of per-column natural widths +#' (inches). Used as the floor for non-wrap-eligible columns. +#' @param resolved_cols The `resolve_col_specs()` output. +#' @param data The full data frame from `tbl$data`. +#' @param tbl A `tfl_table` object (used for `gp`, `cell_padding`, +#' `line_height`, `na_string`, `max_measure_rows`, `min_col_width`, +#' `wrap_breaks`). +#' @param h_pad_in Horizontal cell padding (left+right) in inches. +#' @param min_in `min_col_width` resolved to inches. +#' @param pg_width,pg_height,margins Forwarded to the scratch device. +#' +#' @return Numeric vector of per-column minimum widths in inches. +#' +#' @keywords internal +.compute_col_min_widths <- function(widths_natural, resolved_cols, data, tbl, + h_pad_in, min_in, + pg_width, pg_height, margins) { + n <- length(resolved_cols) + breaks <- tbl$wrap_breaks %||% wrap_breaks_default() + na_str <- tbl$na_string + max_rows <- tbl$max_measure_rows + + scratch_file <- tempfile(fileext = ".pdf") + grDevices::pdf(scratch_file, width = pg_width, height = pg_height) + outer_vp <- .make_outer_vp(margins) + grid::pushViewport(outer_vp) + on.exit({ + grid::popViewport() + grDevices::dev.off() + unlink(scratch_file) + }, add = TRUE) + + vapply(seq_len(n), function(j) { + cs <- resolved_cols[[j]] + if (!isTRUE(cs$wrap)) { + return(widths_natural[[j]]) + } + cell_gp <- .gp_with_lineheight( + .resolve_table_cell_gp(tbl$gp, cs$is_group_col), tbl$line_height + ) + hdr_gp <- .gp_with_lineheight( + .resolve_table_gp(tbl$gp, "header_row"), tbl$line_height + ) + parts <- .split_col_strings(data[[cs$col]], cs$label, na_str, max_rows) + t_data <- .column_min_token_width_in(parts$data, cell_gp, breaks) + t_hdr <- .column_min_token_width_in(parts$header, hdr_gp, breaks) + max(min_in, max(t_data, t_hdr) + h_pad_in) + }, numeric(1L)) +} + +# --------------------------------------------------------------------------- +# .water_fill_to_budget() - pure water-from-top given pre-computed mins +# --------------------------------------------------------------------------- + +#' Water-fill widths down to a target budget, given pre-computed minimums +#' +#' Pure water-from-top: at each iteration find the widest set of +#' wrap-eligible columns above their floor (`widths_min`) and shrink them +#' together until they meet the next-widest competitor, hit a floor, or +#' absorb the remaining excess. Returns the per-column widths summing to +#' `≤ budget_in + eps` when feasible. +#' +#' Unlike [`.compute_wrapped_widths()`], this helper does *not* re-measure +#' the per-column floors from cell content — it trusts the supplied +#' `widths_min` vector. Use this when the floors are computed once and +#' applied to many sub-problems (per-page water-fill under the +#' `col_split_strategy = "balanced"` pipeline). +#' +#' If `sum(widths_min) > budget_in`, the function returns the widths +#' clamped to the floors (sum may still exceed budget); the caller is +#' responsible for detecting that case and paginating differently. +#' +#' @param widths_in Numeric vector of starting widths in inches. +#' @param widths_min Numeric vector of per-column floors in inches. +#' @param wrap_eligible Logical vector; only `TRUE` columns participate +#' in shrinking. +#' @param budget_in Numeric target sum for `widths_in`. +#' +#' @return Numeric vector of resulting widths. +#' +#' @keywords internal +.water_fill_to_budget <- function(widths_in, widths_min, wrap_eligible, + budget_in) { + n <- length(widths_in) + eps <- 1e-6 + + # First snap to floors: nothing can be below its floor. This handles the + # case where the caller passed in a width that's already too narrow. + widths_in <- pmax(widths_in, widths_min) + + max_iter <- 2L * n + 50L + for (iter in seq_len(max_iter)) { + excess <- sum(widths_in) - budget_in + if (excess <= eps) break + + active <- which(wrap_eligible & widths_in > widths_min + eps) + if (length(active) == 0L) break + + max_w <- max(widths_in[active]) + at_max <- active[widths_in[active] >= max_w - eps] + + others <- setdiff(active, at_max) + next_comp <- if (length(others) > 0L) max(widths_in[others]) else -Inf + floor_max <- max(widths_min[at_max]) + step_floor <- max_w - floor_max + step_compete <- max_w - next_comp + step_excess <- excess / length(at_max) + step <- min(step_floor, step_compete, step_excess) + if (step <= eps) break + + widths_in[at_max] <- widths_in[at_max] - step + } + + widths_in +} + +# --------------------------------------------------------------------------- +# .reconcile_page_widths() - flatten per-page widths into one per-col vector +# --------------------------------------------------------------------------- + +#' Combine per-page width vectors into one per-column vector +#' +#' Non-group columns appear on exactly one page-column-split page; their +#' final width is whatever that page allocated. Group columns repeat on +#' every page and must be drawn at a single width that satisfies every +#' page; under the `col_split_strategy = "balanced"` design they are pinned +#' at the minimum width across pages so data columns on every page receive +#' the most slack. This helper enforces both rules and returns a single +#' `numeric(n_cols)` vector with each column's final width. +#' +#' @param per_page_widths List of `numeric` vectors; element `g` is the +#' per-column width vector for `col_groups[[g]]` (length equals +#' `length(col_groups[[g]])`). +#' @param col_groups List of integer vectors of column indices per +#' page-column-split page (as returned by [`paginate_cols()`]). +#' @param n_group_cols Integer scalar; the first `n_group_cols` column +#' indices are group columns. +#' @param n_cols Total number of columns in the table. +#' +#' @return Numeric vector of length `n_cols`. +#' +#' @keywords internal +.reconcile_page_widths <- function(per_page_widths, col_groups, n_group_cols, + n_cols) { + widths_out <- rep(NA_real_, n_cols) + + # Group columns: take the MIN width across pages so data cols on every + # page receive the most slack. + if (n_group_cols > 0L) { + grp_idx <- seq_len(n_group_cols) + for (g in grp_idx) { + grp_widths <- vapply(seq_along(col_groups), function(p) { + pos <- match(g, col_groups[[p]]) + if (is.na(pos)) NA_real_ else per_page_widths[[p]][[pos]] + }, numeric(1L)) + widths_out[[g]] <- min(grp_widths, na.rm = TRUE) + } + } + + # Data columns: each appears on exactly one page-column-split page. + for (p in seq_along(col_groups)) { + page_idx <- col_groups[[p]] + page_w <- per_page_widths[[p]] + for (k in seq_along(page_idx)) { + j <- page_idx[[k]] + if (j > n_group_cols) { + widths_out[[j]] <- page_w[[k]] + } + } + } + + widths_out +} + # --------------------------------------------------------------------------- # .compute_wrapped_widths() - water-from-top column narrowing # --------------------------------------------------------------------------- From d73d91ef4da11bec73a6102fdc6c89470674082a Mon Sep 17 00:00:00 2001 From: Bill Denney Date: Mon, 11 May 2026 10:39:59 -0400 Subject: [PATCH 2/2] Tests, demos, vignettes, design doc for issue 35 Adds the verification harness and user-facing documentation for the balanced wrap pipeline: - tests/testthat/test-wrap.R gains 15+ tests covering: * `.water_fill_to_budget()` no-op, snap-to-floor, equal-shrink-of- widest, floor honored when budget too tight, non-wrap cols left alone; * `.reconcile_page_widths()` non-group widths per-page, group cols taking the MIN across pages; * balanced vs wrap_first on a multi-page input (balanced widths strictly wider for the wrap-eligible cols); * balanced == wrap_first when table fits one page; * group cols pinned at min width across pages under balanced; * `col_split_strategy` and `row_overflow_max_retries` validation; * `paginate_rows(collect_overflows = TRUE)` returns pages + overflow info instead of aborting; * `row_overflow_max_retries = 0L` disables the retry loop. - examples/wrap_demos.R gains a "col_split_strategy comparison" section: scenarios 15-17 render the same input twice (once with each strategy) to make the per-page width difference visible side-by-side. Scenario 18 shows the row-overflow retry loop: with retries disabled the input errors; with retries enabled (the default) the loop widens the bottleneck column until the row fits and a PDF is produced. - A subtle bug found while building scenario 18: the floor-override mechanism raised `widths_min` but not `widths_natural`, so the Case-A path (`sum(natural) <= content_width`) ignored the retry's effect. The fix bumps `widths_natural[j]` whenever a floor override raises it. Without this fix the retry loop runs but never increases the rendered column width on Case-A inputs. - vignettes/v02-tfl_table_intro.Rmd: `col_split_strategy` and `row_overflow_max_retries` added to the `tfl_table()` argument table. - vignettes/v03-tfl_table_styling.Rmd: new "Balanced split-then-wrap" subsection explaining the strategy choice, the group-column pin-at-min rule, and the row-overflow retry loop. - vignettes/v04-troubleshooting.Rmd: "Row wrapped to taller than one page" updated to mention the retry loop, how to tune it via `row_overflow_max_retries`, and how to disable it for issue-#28 behaviour. - design/DECISIONS.md: new entry D-42 capturing the algorithm, the group-col-at-min design call, the row-overflow retry mechanism, alternatives considered, and the empirical-comparison rollout plan with `col_split_strategy = "wrap_first"`. All 1170+ tests pass. `R CMD check` is 0 errors / 0 warnings / 2 pre- existing NOTEs (worktree `.git` marker + future-timestamp flake). Co-Authored-By: Claude Opus 4.7 (1M context) --- R/table_columns.R | 11 +- design/DECISIONS.md | 134 +++++++++++++++ examples/wrap_demos.R | 174 +++++++++++++++++++ man/compute_col_widths.Rd | 3 +- man/dot-compute_col_min_widths.Rd | 51 ++++++ man/dot-reconcile_page_widths.Rd | 34 ++++ man/dot-water_fill_to_budget.Rd | 40 +++++ man/paginate_rows.Rd | 24 ++- man/tfl_table.Rd | 28 ++++ tests/testthat/test-wrap.R | 250 ++++++++++++++++++++++++++++ vignettes/v02-tfl_table_intro.Rmd | 2 + vignettes/v03-tfl_table_styling.Rmd | 38 +++++ vignettes/v04-troubleshooting.Rmd | 19 ++- 13 files changed, 794 insertions(+), 14 deletions(-) create mode 100644 man/dot-compute_col_min_widths.Rd create mode 100644 man/dot-reconcile_page_widths.Rd create mode 100644 man/dot-water_fill_to_budget.Rd diff --git a/R/table_columns.R b/R/table_columns.R index 363de6f..80cadef 100644 --- a/R/table_columns.R +++ b/R/table_columns.R @@ -383,13 +383,20 @@ compute_col_widths <- function(resolved_cols, data, content_width_in, ) # Apply floor overrides from the row-overflow retry loop, if any. + # An override raises both widths_min AND widths_natural: it represents + # "the renderer told us this column needs to be at least N inches wide + # for its content to fit on a page", which is a hard lower bound the + # decision tree must honour even on the Case-A (no-wrap-needed) path + # where widths_natural would otherwise be used as-is. if (length(floor_overrides) > 0L) { for (col_name in names(floor_overrides)) { j <- match(col_name, vapply(resolved_cols, `[[`, character(1L), "col")) if (!is.na(j)) { - widths_min[[j]] <- max(widths_min[[j]], - unname(floor_overrides[[col_name]])) + bumped_floor <- max(widths_min[[j]], + unname(floor_overrides[[col_name]])) + widths_min[[j]] <- bumped_floor + widths_natural[[j]] <- max(widths_natural[[j]], bumped_floor) } } } diff --git a/design/DECISIONS.md b/design/DECISIONS.md index c70db8a..7468310 100644 --- a/design/DECISIONS.md +++ b/design/DECISIONS.md @@ -1165,3 +1165,137 @@ to `"auto"`, which can wrap previously-overflowing tables. Per the project owner's confirmation, no backward-compatibility constraint is in force at this development stage. Tables that already fit see no behavioural change. + +--- + +## D-42: Balance word-wrap with column-split (issue #35) + +**Decision:** Reverse the order of the two algorithms that decide how a +wide `tfl_table` lays out: page-split the columns **first** using each +column's *minimum* survivable width as the capacity-planning input, then +water-fill each resulting page's columns locally within that page's +horizontal slack. The pre-issue-#35 order (water-fill the whole table +down to one page width, then page-split using the post-wrap widths) is +preserved as `col_split_strategy = "wrap_first"` so the two orderings +can be empirically compared before removing the legacy one. + +**User need (from issue #35):** the existing pipeline was producing +*every* page-column-split page with the *same* heavily-wrapped widths - +widths chosen to fit *all* columns on one page even when most ended up +on different pages. Per-page water-fill gives each page columns sized +for that page's actual slack instead. + +**Decision tree (in `.compute_col_widths_balanced()`):** + +``` +total_natural = sum(natural widths) +total_min = sum(minimum widths) + +if total_natural <= content_width: + Case A use natural widths; one page-group. +elif total_min <= content_width: + Case B one page-group; water-fill natural down to fit. +else: + Case C page-split using widths_min for capacity. Per page, + water-fill (natural -> page-slack) with group columns + pinned at their min width so data columns absorb the slack. + Reconcile per-page widths via .reconcile_page_widths() + (group cols get the MIN across pages; data cols each + appear on one page). +``` + +**Group column width rule (user's design call):** group columns repeat +on every page-column-split page. Rather than letting each page choose +its group-column width independently (data-structure churn) or pinning +to the max width across pages (wastes per-page data slack), the package +pins group columns at their *minimum* width on every page. Rationale: +group columns often carry multi-line labels that flow across rows via +the rowspan suppression behaviour added by issue #29, so they rarely +need full natural width. Data columns benefit more from the slack. + +**Row-overflow retry loop (step 5 of the issue):** after per-page widths +are decided, `paginate_rows()` measures cell heights and flags rows +whose wrapped height exceeds the page. Under +`col_split_strategy = "balanced"` the orchestrator retries: it raises +the bottleneck column's minimum by 0.25 inches, runs the width pipeline +again, and re-paginates. Up to `row_overflow_max_retries` iterations +(default `5L`; `0L` disables the loop). After the cap, the final +`paginate_rows()` call goes through the existing `overflow_action` +path so the user-visible error/warn behaviour is unchanged. + +For row-overflow recovery to work the *natural* width must also rise +with the retry floor. Otherwise the Case-A branch (where +`sum(natural) <= content_width`) keeps the user's narrow fixed-width +setting and the cell still wraps to too many lines. The implementation +bumps `widths_natural[j] <- max(widths_natural[j], floor_override[j])` +when overrides are applied. + +**Helpers (`R/wrap.R`):** +- `.compute_col_min_widths()` extracts the floor-measurement portion + of `.compute_wrapped_widths()` so the balanced strategy can compute + minimums once and reuse them in the decision tree. +- `.water_fill_to_budget()` pure water-from-top loop taking + pre-computed mins. No scratch device; safe to call inside the + per-page loop. +- `.reconcile_page_widths()` flattens per-page width vectors into one + per-column vector; group columns take the MIN across pages. + +**`paginate_rows()` API change:** new `collect_overflows` parameter. +When `TRUE`, the function returns `list(pages, overflows)` instead of +signalling on row-overflow. The retry loop in +`.tfl_table_to_pagelist_default()` uses this mode; the final +post-cap call uses `collect_overflows = FALSE` to fire the user's +`overflow_action`. + +**`compute_col_widths()` dispatcher:** the function is now a thin +dispatcher on `tbl$col_split_strategy`. Shared setup (Pass-1 +auto-size, Pass-2 relative weights, Pass-3 auto-detect wrap +eligibility) lives in `.resolve_natural_widths()`. Two strategy +functions handle the rest: +- `.compute_col_widths_wrap_first()` - legacy body preserved. +- `.compute_col_widths_balanced()` - new logic. + +**Why keep both strategies in the same release?** The user explicitly +asked for empirical before/after comparison. The demo script renders +each scenario under both strategies so the difference is visible +side-by-side. After evaluation the legacy path can be removed (one +function deletion + dispatch simplification) or kept as an escape hatch. + +**Alternatives considered and rejected:** +- *Single-pass algorithm choosing widths and pages jointly* - global + optimisation over `(page_assignment, per_page_widths)` is + combinatorial in column count; for typical 5-30 column tables not + worth the complexity vs. greedy split + per-page water-fill. +- *Per-page group-column width* - flattens per-page widths needed a + more elaborate data structure with one width-per-(col, page). Not + worth the cognitive load when pinning at min keeps the structure + flat and visually consistent. +- *Smarter row-overflow heuristic (move bottleneck column to its own + page)* - the simple "raise the bottleneck's floor and re-split" + approach already works for the cases the user described. Out of + scope; can be revisited if a test case shows the simple heuristic + diverging. + +**Files touched:** +- Modified: `R/table_columns.R` (dispatcher + two strategy functions + + shared-setup helper + shared overflow-check helpers). +- Modified: `R/wrap.R` (three new helpers). +- Modified: `R/tfl_table.R` (two new args, validation, persistence). +- Modified: `R/table_rows.R` (`collect_overflows` param; + bottleneck-col reporting in the row-overflow path). +- Modified: `R/table_pagelist.R` (retry loop wraps Step 4-6; per-iter + helper for scratch-device lifecycle). +- New tests: `.water_fill_to_budget()`, + `.reconcile_page_widths()`, balanced-vs-wrap_first end-to-end, + retry-loop arg validation, `paginate_rows(collect_overflows = TRUE)`, + `row_overflow_max_retries = 0L` disabling the loop. +- New demos: scenarios 15-18 in `examples/wrap_demos.R` (paired + `_wrap_first.pdf` / `_balanced.pdf` PDFs). + +**Backward compatibility:** the `col_split_strategy` default is +`"balanced"`, so multi-page tables under default settings will produce +different per-page column widths than before issue #35. Per the +project owner's stance, no backward-compat constraint is in force. +Single-page tables (Case A or B) produce identical output under both +strategies (verified by a regression test). `wrap_first` is preserved +as an opt-in for empirical comparison and as an escape hatch. diff --git a/examples/wrap_demos.R b/examples/wrap_demos.R index 2abcfd2..1f948a0 100644 --- a/examples/wrap_demos.R +++ b/examples/wrap_demos.R @@ -487,6 +487,180 @@ add_section( } ) +# =========================================================================== +# Issue #35 - col_split_strategy comparison +# =========================================================================== +# Each scenario renders TWICE on the same input - once with the legacy +# "wrap_first" strategy and once with the new default "balanced". Compare +# the *_wrap_first.pdf and *_balanced.pdf pages side-by-side to see whether +# the new strategy actually delivers wider per-page columns. + +cat("\n## col_split_strategy comparison (issue #35)\n\n", + "The PDFs below are paired: open the _wrap_first and _balanced ", + "versions of each scenario side-by-side and compare per-page column ", + "widths. Under the new default the columns on each split page receive ", + "that page's actual horizontal slack rather than the cross-page", + "minimum.\n\n", + file = readme, append = TRUE, sep = "") + +# Scenario 15: 4 columns, two wrap-eligible string + two unbreakable. +# Page 1 = {a, b, c}, page 2 = {d}. Under wrap_first, a + b are crushed +# to their floor because the whole-table water-fill has to keep room for +# d (which ends up on page 2 anyway). Under balanced, a + b share +# page 1's slack with c and end up much wider. +issue35_asym_df <- data.frame( + alpha = rep(paste(rep("alpha", 8), collapse = " "), 4), + bravo = rep(paste(rep("bravo", 8), collapse = " "), 4), + short_c = rep("unbreak_one_token_here", 4), + long_d = rep("another_long_token_unbreakable", 4), + stringsAsFactors = FALSE +) + +for (strat in c("wrap_first", "balanced")) { + local({ + s <- strat + add_section( + sprintf("15_split_with_unbreakables_%s.pdf", s), + sprintf("Asymmetric mix - col_split_strategy = \"%s\"", s), + sprintf(paste0("4 columns: two wrap-eligible strings (alpha, bravo) ", + "and two single-token columns (short_c, long_d). Page ", + "1 receives {alpha, bravo, short_c}; page 2 receives ", + "{long_d} alone. Under col_split_strategy = \"%s\". ", + "Look at how wide alpha and bravo are on page 1: under ", + "\"wrap_first\" they are crushed to ~0.6 inches, under ", + "\"balanced\" they are ~1.4 inches."), s), + function() { + tbl <- tfl_table(issue35_asym_df, col_split_strategy = s) + export_tfl(tbl, + file = p(sprintf("15_split_with_unbreakables_%s.pdf", s)), + pg_width = 6, pg_height = 8.5, + min_content_height = grid::unit(1, "inches")) + } + ) + }) +} + +# Scenario 16: 3 wrap-eligible cols + 2 numeric cols, multi-page table. +# Demonstrates that under "balanced" each page's wrap-eligible columns +# water-fill into that page's slack. +issue35_clinical_df <- data.frame( + ae_term = rep(paste(rep("Headache mild moderate severe related", + 4), collapse = " "), 10), + ae_action = rep(paste(rep("Drug withdrawn temporarily", + 4), collapse = " "), 10), + ae_notes = rep(paste(rep("Patient continued safely", + 4), collapse = " "), 10), + onset_day = 1:10, + duration_day = 11:20 +) + +for (strat in c("wrap_first", "balanced")) { + local({ + s <- strat + add_section( + sprintf("16_clinical_three_wrap_cols_%s.pdf", s), + sprintf("3 wrap-eligible string cols + 2 numerics - col_split_strategy = \"%s\"", s), + sprintf(paste0("A clinical listing with three free-text columns and ", + "two narrow numeric columns. Total natural width ", + "exceeds one page so the table page-splits. Under ", + "\"%s\". Open both versions side-by-side: \"balanced\" ", + "should give each free-text column substantially more ", + "width per page than \"wrap_first\"."), s), + function() { + tbl <- tfl_table(issue35_clinical_df, col_split_strategy = s) + export_tfl(tbl, + file = p(sprintf("16_clinical_three_wrap_cols_%s.pdf", s)), + pg_width = 6, pg_height = 8.5, + min_content_height = grid::unit(1, "inches")) + } + ) + }) +} + +# Scenario 17: single-page table (sum(natural) <= content_width). Both +# strategies should produce identical output. +issue35_small_df <- data.frame( + arm = c("Active", "Placebo"), + n = c(120L, 118L), + responder = c(68L, 31L), + stringsAsFactors = FALSE +) + +for (strat in c("wrap_first", "balanced")) { + local({ + s <- strat + add_section( + sprintf("17_fits_one_page_%s.pdf", s), + sprintf("Table that fits one page - col_split_strategy = \"%s\"", s), + sprintf(paste0("Sanity check: a table that fits comfortably on one ", + "page width should produce identical output under both ", + "strategies (no wrap or split needed). ", + "col_split_strategy = \"%s\"."), s), + function() { + tbl <- tfl_table(issue35_small_df, col_split_strategy = s) + export_tfl(tbl, + file = p(sprintf("17_fits_one_page_%s.pdf", s)), + pg_width = 6, pg_height = 8.5, + min_content_height = grid::unit(1, "inches")) + } + ) + }) +} + +# Scenario 18: row-overflow recovery via row_overflow_max_retries. A +# moderately-sized cell wrapped into a deliberately narrow column would +# normally overflow the page; the balanced strategy's retry loop should +# widen the offending column and recover. Sized so the default 5 retries +# (each + 0.25") arrive at a column wide enough to fit the cell within +# the page content height. +issue35_overflow_df <- data.frame( + note = rep(paste(rep("alpha beta gamma delta", + 5), collapse = " "), 3), + small = rep("x", 3), + stringsAsFactors = FALSE +) + +add_section( + "18_row_overflow_retry_disabled.pdf", + "Row-overflow with retry disabled (row_overflow_max_retries = 0)", + paste0("Same input as 18_..._enabled.pdf below but with the retry loop ", + "disabled. The first row's wrapped height exceeds the page so ", + "the standard error fires (no PDF produced). See captured stderr ", + "below."), + function() { + tbl <- tfl_table( + issue35_overflow_df, + cols = list(tfl_colspec("note", width = grid::unit(0.5, "inches"), + wrap = TRUE)), + row_overflow_max_retries = 0L + ) + export_tfl(tbl, file = p("18_row_overflow_retry_disabled.pdf"), + pg_width = 5, pg_height = 5, + min_content_height = grid::unit(0.5, "inches")) + } +) + +add_section( + "18_row_overflow_retry_enabled.pdf", + "Row-overflow with retry enabled (default 5 retries)", + paste0("Same input. Under the default `row_overflow_max_retries = 5L`, ", + "the balanced strategy raises the offending column's minimum ", + "width by 0.25 inches and re-runs the width pipeline. After ", + "enough retries the column is wide enough that its cell wraps ", + "to a height that fits on the page, and the PDF renders."), + function() { + tbl <- tfl_table( + issue35_overflow_df, + cols = list(tfl_colspec("note", width = grid::unit(0.5, "inches"), + wrap = TRUE)) + # row_overflow_max_retries defaults to 5L + ) + export_tfl(tbl, file = p("18_row_overflow_retry_enabled.pdf"), + pg_width = 5, pg_height = 5, + min_content_height = grid::unit(0.5, "inches")) + } +) + # --------------------------------------------------------------------------- # Summary # --------------------------------------------------------------------------- diff --git a/man/compute_col_widths.Rd b/man/compute_col_widths.Rd index 9a83bba..85c24f7 100644 --- a/man/compute_col_widths.Rd +++ b/man/compute_col_widths.Rd @@ -13,7 +13,8 @@ compute_col_widths( pg_height, margins, overflow_action = c("error", "warn"), - validate_overflow = TRUE + validate_overflow = TRUE, + floor_overrides = NULL ) } \arguments{ diff --git a/man/dot-compute_col_min_widths.Rd b/man/dot-compute_col_min_widths.Rd new file mode 100644 index 0000000..523d9c0 --- /dev/null +++ b/man/dot-compute_col_min_widths.Rd @@ -0,0 +1,51 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/wrap.R +\name{.compute_col_min_widths} +\alias{.compute_col_min_widths} +\title{Per-column minimum survivable width in inches} +\usage{ +.compute_col_min_widths( + widths_natural, + resolved_cols, + data, + tbl, + h_pad_in, + min_in, + pg_width, + pg_height, + margins +) +} +\arguments{ +\item{widths_natural}{Numeric vector of per-column natural widths +(inches). Used as the floor for non-wrap-eligible columns.} + +\item{resolved_cols}{The \code{resolve_col_specs()} output.} + +\item{data}{The full data frame from \code{tbl$data}.} + +\item{tbl}{A \code{tfl_table} object (used for \code{gp}, \code{cell_padding}, +\code{line_height}, \code{na_string}, \code{max_measure_rows}, \code{min_col_width}, +\code{wrap_breaks}).} + +\item{h_pad_in}{Horizontal cell padding (left+right) in inches.} + +\item{min_in}{\code{min_col_width} resolved to inches.} + +\item{pg_width, pg_height, margins}{Forwarded to the scratch device.} +} +\value{ +Numeric vector of per-column minimum widths in inches. +} +\description{ +For wrap-eligible columns, the minimum is +\code{max(min_col_width, longest_unbreakable_token + h_pad)}, measured under +both the cell and header gpars and taking the larger so a bold-rendered +header token cannot be undersized. For non-wrap-eligible columns the +minimum equals the supplied \code{widths_natural} (those columns cannot +shrink without overflowing). +} +\details{ +Opens its own scratch PDF device and outer viewport for measurement. +} +\keyword{internal} diff --git a/man/dot-reconcile_page_widths.Rd b/man/dot-reconcile_page_widths.Rd new file mode 100644 index 0000000..cd75c5f --- /dev/null +++ b/man/dot-reconcile_page_widths.Rd @@ -0,0 +1,34 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/wrap.R +\name{.reconcile_page_widths} +\alias{.reconcile_page_widths} +\title{Combine per-page width vectors into one per-column vector} +\usage{ +.reconcile_page_widths(per_page_widths, col_groups, n_group_cols, n_cols) +} +\arguments{ +\item{per_page_widths}{List of \code{numeric} vectors; element \code{g} is the +per-column width vector for \code{col_groups[[g]]} (length equals +\code{length(col_groups[[g]])}).} + +\item{col_groups}{List of integer vectors of column indices per +page-column-split page (as returned by \code{\link[=paginate_cols]{paginate_cols()}}).} + +\item{n_group_cols}{Integer scalar; the first \code{n_group_cols} column +indices are group columns.} + +\item{n_cols}{Total number of columns in the table.} +} +\value{ +Numeric vector of length \code{n_cols}. +} +\description{ +Non-group columns appear on exactly one page-column-split page; their +final width is whatever that page allocated. Group columns repeat on +every page and must be drawn at a single width that satisfies every +page; under the \code{col_split_strategy = "balanced"} design they are pinned +at the minimum width across pages so data columns on every page receive +the most slack. This helper enforces both rules and returns a single +\code{numeric(n_cols)} vector with each column's final width. +} +\keyword{internal} diff --git a/man/dot-water_fill_to_budget.Rd b/man/dot-water_fill_to_budget.Rd new file mode 100644 index 0000000..1ea9094 --- /dev/null +++ b/man/dot-water_fill_to_budget.Rd @@ -0,0 +1,40 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/wrap.R +\name{.water_fill_to_budget} +\alias{.water_fill_to_budget} +\title{Water-fill widths down to a target budget, given pre-computed minimums} +\usage{ +.water_fill_to_budget(widths_in, widths_min, wrap_eligible, budget_in) +} +\arguments{ +\item{widths_in}{Numeric vector of starting widths in inches.} + +\item{widths_min}{Numeric vector of per-column floors in inches.} + +\item{wrap_eligible}{Logical vector; only \code{TRUE} columns participate +in shrinking.} + +\item{budget_in}{Numeric target sum for \code{widths_in}.} +} +\value{ +Numeric vector of resulting widths. +} +\description{ +Pure water-from-top: at each iteration find the widest set of +wrap-eligible columns above their floor (\code{widths_min}) and shrink them +together until they meet the next-widest competitor, hit a floor, or +absorb the remaining excess. Returns the per-column widths summing to +\verb{≤ budget_in + eps} when feasible. +} +\details{ +Unlike \code{\link[=.compute_wrapped_widths]{.compute_wrapped_widths()}}, this helper does \emph{not} re-measure +the per-column floors from cell content — it trusts the supplied +\code{widths_min} vector. Use this when the floors are computed once and +applied to many sub-problems (per-page water-fill under the +\code{col_split_strategy = "balanced"} pipeline). + +If \code{sum(widths_min) > budget_in}, the function returns the widths +clamped to the floors (sum may still exceed budget); the caller is +responsible for detecting that case and paginating differently. +} +\keyword{internal} diff --git a/man/paginate_rows.Rd b/man/paginate_rows.Rd index 7aec55f..04c3594 100644 --- a/man/paginate_rows.Rd +++ b/man/paginate_rows.Rd @@ -15,7 +15,8 @@ paginate_rows( row_cont_msg, group_rule, suppress_repeated_groups = TRUE, - overflow_action = "error" + overflow_action = "error", + collect_overflows = FALSE ) } \arguments{ @@ -45,12 +46,25 @@ use; currently does not affect pagination because rules are 0-height.)} the row-overflow guard reports a single row whose committed height exceeds the available page content height (a row that wraps to taller than one page is almost always a sign of input that needs to change). -The same knob downgrades column-overflow events; see \code{\link[=export_tfl_page]{export_tfl_page()}}.} +The same knob downgrades column-overflow events; see \code{\link[=export_tfl_page]{export_tfl_page()}}. +Ignored when \code{collect_overflows = TRUE}.} + +\item{collect_overflows}{Logical. When \code{FALSE} (default) the function +behaves as before: row-overflow events are routed through +\code{overflow_action}. When \code{TRUE} the function does \emph{not} signal on +overflow but instead collects the events and returns them alongside +the page specs, so a caller can iterate (see the row-overflow retry +loop in \code{.tfl_table_to_pagelist_default()}).} } \value{ -A list of row-page specs, each with \verb{$rows}, \verb{$is_cont_top}, -\verb{$is_cont_bottom}, \verb{$group_starts}, and \verb{$row_heights_in} (the committed -per-row heights for that page in inches). +When \code{collect_overflows = FALSE} (default), a list of row-page +specs, each with \verb{$rows}, \verb{$is_cont_top}, \verb{$is_cont_bottom}, +\verb{$group_starts}, and \verb{$row_heights_in} (the committed per-row heights +for that page in inches). +When \code{collect_overflows = TRUE}, a 2-element list: +\verb{$pages} (as above) and \verb{$overflows} (a list of overflow events, +each \code{list(row, bottleneck_col, cell_height_in)}; empty if no +row overflowed). } \description{ Uses a per-page tentative recompute of \code{.compute_page_row_heights()} so that diff --git a/man/tfl_table.Rd b/man/tfl_table.Rd index 8ea4acc..90ed05a 100644 --- a/man/tfl_table.Rd +++ b/man/tfl_table.Rd @@ -35,6 +35,8 @@ tfl_table( cell_padding = grid::unit(c(0.2, 0.5), "lines"), line_height = 1.05, wrap_extra_padding = grid::unit(0.5, "lines"), + col_split_strategy = c("balanced", "wrap_first"), + row_overflow_max_retries = 5L, max_measure_rows = Inf ) } @@ -233,6 +235,32 @@ wrapped or \verb{\\n}-broken text. Default \code{unit(0.5, "lines")}. Set to \code{unit(0, "lines")} to disable. Only multi-line cells receive the extra; single-line cells are unaffected.} +\item{col_split_strategy}{Either \code{"balanced"} (default) or \code{"wrap_first"}. +Controls the order in which text wrapping and page-column-split +interact when a table is wider than one page. +\itemize{ +\item \code{"balanced"} (default, introduced for issue #35): page-split using +the \strong{minimum} survivable column widths for capacity planning, then +water-fill within each page so columns can be as wide as that page's +horizontal slack allows. Group columns are pinned at their minimum +width so per-page data columns get the most slack. Multi-page tables +end up with less-wrapped columns per page than the legacy strategy. +\item \code{"wrap_first"} (pre-issue-35 behaviour): water-fill the whole table +down to one page's content width, then page-split using the post- +wrap widths. Every page-column-split page shows the same heavily +wrapped columns. Kept temporarily for empirical comparison; will be +removed in a future release if the \code{"balanced"} default proves +consistently better. +}} + +\item{row_overflow_max_retries}{Non-negative integer. Maximum number of +times the \code{"balanced"} strategy will retry when \code{paginate_rows()} +reports a row whose wrapped height exceeds the page. Each retry +raises the bottleneck column's minimum width by 0.25 inches and +re-runs the width pipeline. \code{0L} disables the retry loop entirely +(the first row-overflow goes straight to \code{overflow_action}). Default +\code{5L}. Ignored when \code{col_split_strategy = "wrap_first"}.} + \item{max_measure_rows}{Positive numeric or \code{Inf} (default). Maximum number of unique cell strings sampled per column when computing content-based column widths. Strings are sampled in descending order of \code{nchar()} so diff --git a/tests/testthat/test-wrap.R b/tests/testthat/test-wrap.R index 4f6eeb0..5ca2744 100644 --- a/tests/testthat/test-wrap.R +++ b/tests/testthat/test-wrap.R @@ -298,3 +298,253 @@ test_that(".compute_wrapped_widths respects the longest-token floor", { }) expect_gte(out[[1]], floor_w - 1e-6) }) + +# .water_fill_to_budget() ---------------------------------------------------- + +test_that(".water_fill_to_budget no-op when sum already <= budget", { + out <- writetfl:::.water_fill_to_budget( + widths_in = c(1, 2, 1), + widths_min = c(0.5, 0.5, 0.5), + wrap_eligible = c(TRUE, TRUE, FALSE), + budget_in = 10 + ) + expect_equal(out, c(1, 2, 1)) +}) + +test_that(".water_fill_to_budget snaps widths_in up to widths_min on entry", { + out <- writetfl:::.water_fill_to_budget( + widths_in = c(0.1, 0.1), + widths_min = c(0.5, 0.5), + wrap_eligible = c(TRUE, TRUE), + budget_in = 5 + ) + expect_equal(out, c(0.5, 0.5)) +}) + +test_that(".water_fill_to_budget shrinks widest wrap-eligible together", { + out <- writetfl:::.water_fill_to_budget( + widths_in = c(4, 4, 1), + widths_min = c(0.5, 0.5, 1), + wrap_eligible = c(TRUE, TRUE, FALSE), + budget_in = 5 + ) + # The two equally-wide wrap cols share the deficit; the unbreakable col + # is preserved at its width. + expect_equal(out[[3]], 1) + expect_equal(out[[1]], out[[2]], tolerance = 1e-6) + expect_equal(sum(out), 5, tolerance = 1e-6) +}) + +test_that(".water_fill_to_budget honors floors even when budget is too tight", { + out <- writetfl:::.water_fill_to_budget( + widths_in = c(5, 5), + widths_min = c(1, 1), + wrap_eligible = c(TRUE, TRUE), + budget_in = 1 # impossible: both at floor sum to 2 + ) + # Hits floor; sum may still exceed budget. + expect_equal(out, c(1, 1)) + expect_gt(sum(out), 1) +}) + +test_that(".water_fill_to_budget leaves non-wrap cols alone", { + out <- writetfl:::.water_fill_to_budget( + widths_in = c(3, 3), + widths_min = c(1, 1), + wrap_eligible = c(FALSE, FALSE), + budget_in = 1 + ) + expect_equal(out, c(3, 3)) # nothing shrinkable; budget breached +}) + +# .reconcile_page_widths() --------------------------------------------------- + +test_that(".reconcile_page_widths assigns each non-group col its per-page width", { + # 3 data cols (no group cols) split across 2 pages: page 1 = {1, 2}, + # page 2 = {3}. + out <- writetfl:::.reconcile_page_widths( + per_page_widths = list(c(1.5, 2.0), c(3.0)), + col_groups = list(c(1L, 2L), c(3L)), + n_group_cols = 0L, + n_cols = 3L + ) + expect_equal(out, c(1.5, 2.0, 3.0)) +}) + +test_that(".reconcile_page_widths group cols take the MIN width across pages", { + # 1 group col (col 1) + 2 data cols (2, 3) split across 2 pages. + # Page 1: {1, 2} with widths (0.8, 1.5). + # Page 2: {1, 3} with widths (0.6, 2.0). + # Group col should resolve to min(0.8, 0.6) = 0.6. + out <- writetfl:::.reconcile_page_widths( + per_page_widths = list(c(0.8, 1.5), c(0.6, 2.0)), + col_groups = list(c(1L, 2L), c(1L, 3L)), + n_group_cols = 1L, + n_cols = 3L + ) + expect_equal(out, c(0.6, 1.5, 2.0)) +}) + +# col_split_strategy = "balanced" vs "wrap_first" end-to-end ----------------- + +test_that("balanced strategy gives wider per-page columns than wrap_first when table page-splits", { + # 2 wrap-eligible string cols (a, b) + 2 unbreakable single-token cols + # (c, d). Total natural > content_width, and the page-split puts + # (a, b, c) on page 1 and (d) on page 2. Under wrap_first the + # whole-table water-fill crushes a and b to their floors so the table + # fits *somewhere*; under balanced, page 1 water-fills locally so a and + # b get nearly the full page-1 slack. + df <- data.frame( + a = rep(paste(rep("alpha", 8), collapse = " "), 5), + b = rep(paste(rep("bravo", 8), collapse = " "), 5), + c = rep("unbreak_one_token_here", 5), + d = rep("another_long_token_unbreakable", 5), + stringsAsFactors = FALSE + ) + measure <- function(strat) { + tbl <- tfl_table(df, col_split_strategy = strat) + rcs <- writetfl:::resolve_col_specs(tbl) + cwr <- writetfl:::compute_col_widths( + rcs, tbl$data, content_width_in = 5, tbl, pg_width = 6, pg_height = 8.5, + margins = grid::unit(c(0.5, 0.5, 0.5, 0.5), "inches") + ) + vapply(cwr$resolved_cols, "[[", numeric(1L), "width_in") + } + w_first <- measure("wrap_first") + w_balan <- measure("balanced") + # Columns a and b end up much wider under balanced. + expect_gt(w_balan[[1]], w_first[[1]]) + expect_gt(w_balan[[2]], w_first[[2]]) + # Column c and d are unchanged (single unbreakable tokens, same min/natural). + expect_equal(w_balan[[3]], w_first[[3]], tolerance = 1e-3) + expect_equal(w_balan[[4]], w_first[[4]], tolerance = 1e-3) +}) + +test_that("balanced strategy is identical to wrap_first when table fits one page", { + # Small data that fits on a single page width: both strategies should + # arrive at the same widths. + df <- data.frame( + a = c("Alpha", "Beta", "Gamma"), + b = c(10L, 20L, 30L), + stringsAsFactors = FALSE + ) + measure <- function(strat) { + tbl <- tfl_table(df, col_split_strategy = strat) + rcs <- writetfl:::resolve_col_specs(tbl) + cwr <- writetfl:::compute_col_widths( + rcs, tbl$data, content_width_in = 5, tbl, pg_width = 6, pg_height = 8.5, + margins = grid::unit(c(0.5, 0.5, 0.5, 0.5), "inches") + ) + vapply(cwr$resolved_cols, "[[", numeric(1L), "width_in") + } + expect_equal(measure("balanced"), measure("wrap_first"), tolerance = 1e-6) +}) + +test_that("balanced strategy pins group columns at min width across pages", { + # 1 group col + 2 data cols, table page-splits. Group col width should + # equal its min (longest-unbreakable-token + h_pad) across both pages. + df <- data.frame( + grp = rep(c("G1", "G2"), each = 3), + a = rep(paste(rep("alpha", 8), collapse = " "), 6), + b = rep("unbreak_long_token_here_extra", 6), + stringsAsFactors = FALSE + ) + df <- dplyr::group_by(df, grp) + tbl <- tfl_table(df, col_split_strategy = "balanced") + rcs <- writetfl:::resolve_col_specs(tbl) + cwr <- writetfl:::compute_col_widths( + rcs, tbl$data, content_width_in = 5, tbl, pg_width = 6, pg_height = 8.5, + margins = grid::unit(c(0.5, 0.5, 0.5, 0.5), "inches") + ) + # Group col is column 1; its width_in should be the (smallest) per-page + # value, which is its min width (== max(min_col_width, "G1"/"G2" + pad)). + grp_w <- cwr$resolved_cols[[1]]$width_in + grp_min <- cwr$resolved_cols[[1]]$width_min_in %||% grp_w + expect_equal(grp_w, grp_min, tolerance = 1e-6) +}) + +# tfl_table arg validation --------------------------------------------------- + +test_that("tfl_table validates col_split_strategy", { + expect_error(tfl_table(data.frame(a = 1), col_split_strategy = "fancy"), + regexp = "col_split_strategy") + expect_no_error(tfl_table(data.frame(a = 1), col_split_strategy = "wrap_first")) + expect_no_error(tfl_table(data.frame(a = 1), col_split_strategy = "balanced")) +}) + +test_that("tfl_table validates row_overflow_max_retries (non-negative integer)", { + expect_error(tfl_table(data.frame(a = 1), row_overflow_max_retries = -1L), + regexp = "row_overflow_max_retries") + expect_error(tfl_table(data.frame(a = 1), + row_overflow_max_retries = c(1L, 2L)), + regexp = "row_overflow_max_retries") + expect_no_error(tfl_table(data.frame(a = 1), row_overflow_max_retries = 0L)) + expect_no_error(tfl_table(data.frame(a = 1), row_overflow_max_retries = 5L)) +}) + +# paginate_rows() collect_overflows mode -------------------------------------- + +test_that("paginate_rows(collect_overflows = TRUE) returns pages + overflow info instead of aborting", { + # 2-row data; force the first row's content to be very tall by feeding a + # cell_h_mat where row 1's cell height exceeds the page content height. + # The function should NOT abort under collect_overflows = TRUE. + data <- data.frame(a = c("x", "y"), stringsAsFactors = FALSE) + resolved_cols <- list(list(col = "a", label = "a", wrap = TRUE, + is_group_col = FALSE, width_in = 0.5)) + cell_h_mat <- matrix(c(20, 0.2), nrow = 2L, ncol = 1L) # row 1 = 20 in + res <- writetfl:::paginate_rows( + data, cell_h_mat, resolved_cols, group_vars = character(0L), + cont_row_h = 0.2, header_row_h = 0.3, content_height_in = 5, + row_cont_msg = c("(continued)", "(continued)"), group_rule = FALSE, + collect_overflows = TRUE + ) + expect_named(res, c("pages", "overflows")) + expect_gte(length(res$overflows), 1L) + expect_equal(res$overflows[[1L]]$row, 1L) + expect_equal(res$overflows[[1L]]$bottleneck_col, 1L) +}) + +test_that("row_overflow_max_retries = 0L disables the retry loop and errors immediately", { + # Single 8000-character cell forced into a narrow column - no number of + # retries will rescue it. + long_essay <- paste(rep(paste(rep("aa bb cc dd ee ff", 3), collapse = " "), + 150), + collapse = " ") + df <- data.frame(notes = long_essay, stringsAsFactors = FALSE) + tbl <- tfl_table( + df, + cols = list(tfl_colspec("notes", width = grid::unit(0.8, "inches"), + wrap = TRUE)), + row_overflow_max_retries = 0L + ) + f <- tempfile(fileext = ".pdf") + on.exit(unlink(f)) + expect_error( + export_tfl(tbl, file = f, pg_width = 4, pg_height = 8.5, + min_content_height = grid::unit(0.5, "inches")), + regexp = "exceeds the available page content height" + ) +}) + +test_that("row_overflow_max_retries > 0 still ultimately errors when content is genuinely too long", { + # Default retry cap (5L) - same impossible input. Retries widen the + # column but eventually exhaust without resolving; the final paginate_rows + # call goes through overflow_action = "error". + long_essay <- paste(rep(paste(rep("aa bb cc dd ee ff", 3), collapse = " "), + 150), + collapse = " ") + df <- data.frame(notes = long_essay, stringsAsFactors = FALSE) + tbl <- tfl_table( + df, + cols = list(tfl_colspec("notes", width = grid::unit(0.8, "inches"), + wrap = TRUE)), + row_overflow_max_retries = 5L + ) + f <- tempfile(fileext = ".pdf") + on.exit(unlink(f)) + expect_error( + export_tfl(tbl, file = f, pg_width = 4, pg_height = 8.5, + min_content_height = grid::unit(0.5, "inches")), + regexp = "exceeds the available page content height" + ) +}) diff --git a/vignettes/v02-tfl_table_intro.Rmd b/vignettes/v02-tfl_table_intro.Rmd index e9edf23..7570a25 100644 --- a/vignettes/v02-tfl_table_intro.Rmd +++ b/vignettes/v02-tfl_table_intro.Rmd @@ -668,6 +668,8 @@ see `vignette("v03-tfl_table_styling")`. | `wrap_cols` | `"auto"` | `"auto"` (auto-detect), `TRUE` (all data cols), `FALSE` (off), or character vector of column names | | `wrap_breaks` | `wrap_breaks()` | Break-character spec — defaults to whitespace; opt into `keep_before` chars like `-` or `/` | | `wrap_balance` | `"width"` | `"width"` (fast water-fill) or `"height"` (opt-in pass that lowers total table height) | +| `col_split_strategy` | `"balanced"` | `"balanced"` (page-split using min widths, then water-fill per page — more horizontal room per column on multi-page tables) or `"wrap_first"` (legacy: whole-table water-fill, then split using post-wrap widths) | +| `row_overflow_max_retries` | `5L` | When a row's wrapped height exceeds the page under `col_split_strategy = "balanced"`, raise the bottleneck column's minimum by 0.25 in and retry up to N times. `0L` disables the retry loop. | | `min_col_width` | `unit(0.5, "inches")` | Floor applied to auto-sized columns | | `allow_col_split` | `TRUE` | If `FALSE`, error when columns exceed page width | | `balance_col_pages` | `FALSE` | Redistribute columns evenly across column-split pages | diff --git a/vignettes/v03-tfl_table_styling.Rmd b/vignettes/v03-tfl_table_styling.Rmd index 93a7ace..f58c15d 100644 --- a/vignettes/v03-tfl_table_styling.Rmd +++ b/vignettes/v03-tfl_table_styling.Rmd @@ -598,6 +598,44 @@ Use the height pass when: The default stays `"width"` because it's deterministic, fast, visually tidy, and produces good results on most clinical TFLs. +### Balanced split-then-wrap — `col_split_strategy` + +When a table is *too wide* for a single page, two interacting decisions +have to be made: which columns go on which page (page-column-split) and +how aggressively each remaining column should wrap. The default +`col_split_strategy = "balanced"` does the page-split first using each +column's **minimum** survivable width, then water-fills each page's +columns within that page's actual horizontal slack. The legacy +`"wrap_first"` strategy does whole-table wrapping first and uses the +resulting widths to drive the split. + +The practical difference shows up on multi-page tables: + +| Strategy | Page-1 column behavior | +|---|---| +| `"balanced"` (default) | Each column gets the slack of the page it lands on. Strings on a 3-column page have ~3× more room than under `"wrap_first"`. | +| `"wrap_first"` | Every column is wrapped tightly enough to fit *all* columns on one page width, even though most end up on different pages anyway. | + +```{r col-split-strategy, eval = FALSE} +# Default: balanced. Per-page widths. +tbl <- tfl_table(wide_clinical_df) + +# Legacy: whole-table wrap first. +tbl <- tfl_table(wide_clinical_df, col_split_strategy = "wrap_first") +``` + +Group columns are pinned at their minimum width on every page under +`"balanced"` (data columns absorb the per-page slack). Together with the +HTML-`rowspan`-style group-label flow this keeps group cells compact +without wasting page real estate. + +When a row's wrapped height genuinely exceeds the page, the balanced +strategy iterates: it raises the bottleneck column's minimum by 0.25 +inches and re-runs the width pipeline, up to +`row_overflow_max_retries` times (default `5L`). After the cap the +existing `overflow_action` path fires. Set +`row_overflow_max_retries = 0L` to disable the loop entirely. + ### Visual gap between adjacent multi-line cells — `wrap_extra_padding` When two consecutive rows both contain wrapped (multi-line) cells, the diff --git a/vignettes/v04-troubleshooting.Rmd b/vignettes/v04-troubleshooting.Rmd index 05b1858..d392380 100644 --- a/vignettes/v04-troubleshooting.Rmd +++ b/vignettes/v04-troubleshooting.Rmd @@ -107,12 +107,13 @@ you see what the renderer was forced to clip. **Error:** `Row N of the table wraps to a height (X.XX in) that exceeds the available page content height (Y.YY in)…` A row's content wrapped to more vertical space than fits on a single -page. Almost always caused by an outsize cell (long narrative pasted -into a column whose width is set very narrow, an unbroken token forced -to wrap to many lines, etc.). No amount of pagination can rescue this -case — text-wrap can't make the row shorter than its content allows, and -splitting it across pages would leave words mid-sentence at the page -break. +page. Under the default `col_split_strategy = "balanced"` the package +will retry up to `row_overflow_max_retries` times (default `5L`), +raising the offending column's minimum width by 0.25 inches each +retry. The error appears only when retries are exhausted — at that +point an outsize cell (long narrative pasted into a column whose +fixed width is too narrow, an unbroken token forced to wrap to many +lines, etc.) is the most likely cause. **Solutions (in order of preference):** @@ -120,11 +121,17 @@ break. the right thing to fix here). - Widen the column so the cell wraps to fewer lines. - Increase `pg_height` so a multi-line row fits. +- Increase `row_overflow_max_retries` to give the retry loop more + iterations to find a wider column. Diminishing returns past ~10. - For diagnosis only, pass `overflow_action = "warn"` to `export_tfl()` to see what the renderer would produce. The output PDF will be visibly broken (text clipped at the page edge), but it tells you exactly which row was responsible. +To bypass the retry loop entirely (and get the pre-issue-#35 behaviour +where the first row-overflow signals immediately), set +`row_overflow_max_retries = 0L`. + ```r export_tfl(tbl, file = "out.pdf", overflow_action = "warn") #> Warning: Row 3 of the table wraps to a height (12.4 in) that exceeds