
Add Reporting Information to print/summary.epi_df #691

Merged
brookslogan merged 16 commits into dev from jmr/print.epi_df
Apr 20, 2026

@JavierMtzRdz JavierMtzRdz commented Feb 28, 2026

Checklist

Please:

  • Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
  • Make sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • Styling and documentation checks. Make a PR comment with:
    • /style to check the style and fix any issues.
    • /document to check the package documentation and fix any issues.
    • /preview-docs to preview the docs.
    • See Actions GitHub tab to track progress of these commands.
  • See DEVELOPMENT.md for more information on the development
    process.

Add Reporting Information to print/summary.epi_df

The print.epi_df method now provides a signal-level latency report that calculates reporting lags relative to the as_of metadata. Signals with notable latencies are marked with an alert flag, and the output specifically identifies any lagging keys. For objects with many signals, the output is now truncated with a summary of the remaining variables to preserve readability.
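To make the lag calculation concrete, here is a minimal base R sketch of a per-signal latency report. It is an illustration under stated assumptions, not the code in this PR: `signal_latency_sketch` is a hypothetical helper, the data frame is a plain `data.frame` rather than an `epi_df`, and the 7-day threshold is taken from the example output below.

```r
# Sketch only: per-signal latency relative to an as_of date, in base R.
# `signal_latency_sketch` is a hypothetical name, not part of epiprocess.
signal_latency_sketch <- function(df, as_of, signal_cols) {
  do.call(rbind, lapply(signal_cols, function(sig) {
    observed <- !is.na(df[[sig]])
    if (!any(observed)) {
      # No non-NA observation at all: latency is undefined for this signal.
      return(data.frame(signal = sig, max_time = as.Date(NA), lag_days = NA_real_))
    }
    max_time <- max(df$time_value[observed])
    data.frame(
      signal = sig,
      max_time = max_time,
      lag_days = as.numeric(difftime(as_of, max_time, units = "days"))
    )
  }))
}

start_date <- as.Date("2024-01-01")
df <- data.frame(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = rep(start_date + 0:5, 2),
  value = 1:12
)

latency <- signal_latency_sketch(df, as_of = as.Date("2024-01-15"), signal_cols = "value")
# Assumed alert rule for this sketch: flag lags longer than 7 days.
latency$flagged <- latency$lag_days > 7
print(latency)
```

With the data above, the latest observation is 2024-01-06 and `as_of` is 2024-01-15, so the lag is 9 days and the signal is flagged.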

The summary.epi_df method has been expanded to include a regularity analysis, as mentioned in #688. This feature identifies whether the minimum and maximum time values are even or uneven across all epikeys. The summary also diagnoses implicit and explicit gaps.
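The regularity checks can be sketched in base R roughly as follows. This is a simplified illustration, not the PR's implementation: `key_regularity_sketch` is a hypothetical helper, it assumes daily data keyed only by `geo_value` with at most one row per key and date, and it treats any NA in the signal as an explicit gap.

```r
# Sketch only: per-key time range plus implicit/explicit gap detection.
# `key_regularity_sketch` is a hypothetical name, not part of epiprocess.
key_regularity_sketch <- function(df, signal = "value") {
  per_key <- lapply(split(df, df$geo_value), function(d) {
    # Every date we would expect between this key's first and last row.
    expected <- seq(min(d$time_value), max(d$time_value), by = "day")
    data.frame(
      geo_value = d$geo_value[1],
      min_t = min(d$time_value),
      max_t = max(d$time_value),
      implicit_gap = length(expected) > nrow(d), # rows missing entirely
      explicit_gap = anyNA(d[[signal]])          # rows present, value is NA
    )
  })
  do.call(rbind, per_key)
}

start_date <- as.Date("2024-01-01")
df <- data.frame(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = rep(start_date + 0:5, 2),
  value = 1:12
)
df <- df[-c(3, 7), ] # drop 'ca' on 2024-01-03 and the first 'hi' row

smry <- key_regularity_sketch(df)
min_even <- length(unique(smry$min_t)) <= 1
max_even <- length(unique(smry$max_t)) <= 1
print(smry)
```

Here 'ca' has an implicit gap, 'hi' starts a day later, and both keys end on the same date, which mirrors the "Uneven coverage and gaps" example further down.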

I included the time analysis in summary.epi_df because print.epi_df was becoming too long. Both analyses are provided as helper functions in case we want to move them around, though it may be more efficient to combine them if they end up in the same method.

  • musing: Using cli to print those summaries could improve their appearance.

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

Examples:

# Demonstrating Improved print.epi_df and summary.epi_df reporting
pkgload::load_all(".")
#> ℹ Loading epiprocess
#> Loading required package: epidatasets
#> 
#> Registered S3 method overwritten by 'tsibble':
#>   method               from 
#>   as_tibble.grouped_df dplyr
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.5.2
#> 
#> Attaching package: 'dplyr'
#> 
#> The following object is masked from 'package:epiprocess':
#> 
#>     between
#> 
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> 
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Setup basic parameters
start_date <- as.Date("2024-01-01")
as_of_date <- as.Date("2024-01-15")

# Standard clean data
(case1 <- tibble(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = c(start_date + 0:5, start_date + 0:5),
  value = 1:12
) %>% as_epi_df(as_of = as_of_date))
#> An `epi_df` object, 12 x 3 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-01-15
#> Latency info:
#> * value: lag 9 days (max time 2024-01-06) (!)
#> (!): notable latency (lag > 7 days, lagging keys, or deviates from mode by > 7 days)
#> 
#> # A tibble: 12 × 3
#>    geo_value time_value value
#>  * <chr>     <date>     <int>
#>  1 ca        2024-01-01     1
#>  2 ca        2024-01-02     2
#>  3 ca        2024-01-03     3
#>  4 ca        2024-01-04     4
#>  5 ca        2024-01-05     5
#>  6 ca        2024-01-06     6
#>  7 hi        2024-01-01     7
#>  8 hi        2024-01-02     8
#>  9 hi        2024-01-03     9
#> 10 hi        2024-01-04    10
#> 11 hi        2024-01-05    11
#> 12 hi        2024-01-06    12

summary(case1)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 2024-01-15
#> ----------
#> * min time value              = 2024-01-01 (even across epikeys)
#> * max time value              = 2024-01-06 (even across epikeys)
#> * time gaps                   = none detected
#> * average rows per time value = 2


# Uneven coverage and gaps
(edf_uneven <- tibble(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = c(start_date + 0:5, start_date + 0:5),
  value = 1:12
))
#> # A tibble: 12 × 3
#>    geo_value time_value value
#>    <chr>     <date>     <int>
#>  1 ca        2024-01-01     1
#>  2 ca        2024-01-02     2
#>  3 ca        2024-01-03     3
#>  4 ca        2024-01-04     4
#>  5 ca        2024-01-05     5
#>  6 ca        2024-01-06     6
#>  7 hi        2024-01-01     7
#>  8 hi        2024-01-02     8
#>  9 hi        2024-01-03     9
#> 10 hi        2024-01-04    10
#> 11 hi        2024-01-05    11
#> 12 hi        2024-01-06    12
edf_uneven <- edf_uneven[-7, ] # 'hi' starts at day 2 (uneven min)
edf_uneven <- edf_uneven[-3, ] # 'ca' missing day 3 (implicit gap)

(case2 <- as_epi_df(edf_uneven, as_of = as_of_date))
#> An `epi_df` object, 10 x 3 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-01-15
#> Latency info:
#> * value: lag 9 days (max time 2024-01-06) (!)
#> (!): notable latency (lag > 7 days, lagging keys, or deviates from mode by > 7 days)
#> 
#> # A tibble: 10 × 3
#>    geo_value time_value value
#>  * <chr>     <date>     <int>
#>  1 ca        2024-01-01     1
#>  2 ca        2024-01-02     2
#>  3 ca        2024-01-04     4
#>  4 ca        2024-01-05     5
#>  5 ca        2024-01-06     6
#>  6 hi        2024-01-02     8
#>  7 hi        2024-01-03     9
#>  8 hi        2024-01-04    10
#>  9 hi        2024-01-05    11
#> 10 hi        2024-01-06    12
summary(case2)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 2024-01-15
#> ----------
#> * min time value              = 2024-01-01 (uneven across epikeys)
#> * max time value              = 2024-01-06 (even across epikeys)
#> * time gaps                   = implicit (in 1/2 epikeys)
#> * average rows per time value = 1


# Explicit gaps and lagging
edf_lags <- tibble(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = c(start_date + 0:5, start_date + 0:5),
  value = 1:12
)
edf_lags$value[12] <- NA # 'hi' ends at day 5 (lagging key vs ca)
edf_lags$value[2] <- NA # Explicit NA row for 'ca' at day 2

(case3 <- as_epi_df(edf_lags, as_of = start_date + 7) )
#> An `epi_df` object, 12 x 3 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-01-08
#> Latency info:
#> * value: lag 2 days (max time 2024-01-06); lagging keys: hi (!)
#> (!): notable latency (lag > 7 days, lagging keys, or deviates from mode by > 7 days)
#> 
#> # A tibble: 12 × 3
#>    geo_value time_value value
#>  * <chr>     <date>     <int>
#>  1 ca        2024-01-01     1
#>  2 ca        2024-01-02    NA
#>  3 ca        2024-01-03     3
#>  4 ca        2024-01-04     4
#>  5 ca        2024-01-05     5
#>  6 ca        2024-01-06     6
#>  7 hi        2024-01-01     7
#>  8 hi        2024-01-02     8
#>  9 hi        2024-01-03     9
#> 10 hi        2024-01-04    10
#> 11 hi        2024-01-05    11
#> 12 hi        2024-01-06    NA
summary(case3)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 2024-01-08
#> ----------
#> * min time value              = 2024-01-01 (even across epikeys)
#> * max time value              = 2024-01-06 (uneven across epikeys)
#> * time gaps                   = explicit (in 2/2 epikeys)
#> * average rows per time value = 2


# Many signals
df_many <- tibble(geo_value = "ca", time_value = start_date + 0:5)
for (i in 1:10) {
  df_many[[paste0("sig", i)]] <- 1:6
}
df_many$sig5[5:6] <- NA

(case4 <- as_epi_df(df_many, as_of = start_date + 7))
#> An `epi_df` object, 6 x 12 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-01-08
#> Latency info:
#> * sig1: lag 2 days (max time 2024-01-06)
#> * sig2: lag 2 days (max time 2024-01-06)
#> * sig3: lag 2 days (max time 2024-01-06)
#> * ... and 7 other signals
#> 
#> # A tibble: 6 × 12
#>   geo_value time_value  sig1  sig2  sig3  sig4  sig5  sig6  sig7  sig8  sig9
#> * <chr>     <date>     <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 ca        2024-01-01     1     1     1     1     1     1     1     1     1
#> 2 ca        2024-01-02     2     2     2     2     2     2     2     2     2
#> 3 ca        2024-01-03     3     3     3     3     3     3     3     3     3
#> 4 ca        2024-01-04     4     4     4     4     4     4     4     4     4
#> 5 ca        2024-01-05     5     5     5     5    NA     5     5     5     5
#> 6 ca        2024-01-06     6     6     6     6    NA     6     6     6     6
#> # ℹ 1 more variable: sig10 <int>
summary(case4)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 2024-01-08
#> ----------
#> * min time value              = 2024-01-01 (even across epikeys)
#> * max time value              = 2024-01-06 (even across epikeys)
#> * time gaps                   = none detected
#> * average rows per time value = 1

Created on 2026-02-27 with reprex v2.1.1

@JavierMtzRdz
Contributor Author

/style

@JavierMtzRdz
Contributor Author

/document

@JavierMtzRdz
Contributor Author

/preview-docs

@JavierMtzRdz JavierMtzRdz marked this pull request as ready for review March 2, 2026 15:00

@brookslogan brookslogan left a comment

Thanks for all the test cases! This is looking generally good. I just want to try to make this a bit quicker to understand.

Comment thread .github/workflows/pr-commands.yaml
Comment thread R/methods-epi_df.R Outdated
Comment thread R/methods-epi_df.R Outdated
max_even <- length(unique(smry$max_t[!is.na(smry$max_t)])) <= 1

min_desc <- if (min_even) "even across epikeys" else "uneven across epikeys"
max_desc <- if (max_even) "even across epikeys" else "uneven across epikeys"

issue: "even" didn't first parse as an adjective for me
issue: I don't think we have user-facing definitions of "epikey", and it isn't standard in the community
issue: this could be misleading if we have differing time ranges for different signals.

suggestion: either
(a) change this to be about epikey x signal combos, and the messages to "same for every time series", "but some of the time series start later", "but some of the time series end earlier"
(b) re-use some of the by-signal latency information added to print.epi_df
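A rough sketch of what option (a) could look like, assuming one min/max time value per key combination x signal time series. `describe_time_range_sketch` is a hypothetical helper written for illustration only; the message strings are the ones proposed above.

```r
# Sketch only: word the time-range notes per time series (key x signal).
# `describe_time_range_sketch` is a hypothetical name, not part of epiprocess.
describe_time_range_sketch <- function(min_t, max_t) {
  # Ignore time series with no observations (NA range).
  min_t <- min_t[!is.na(min_t)]
  max_t <- max_t[!is.na(max_t)]
  same_min <- length(unique(min_t)) <= 1
  same_max <- length(unique(max_t)) <= 1
  c(
    min = if (same_min) "same for every time series" else "but some of the time series start later",
    max = if (same_max) "same for every time series" else "but some of the time series end earlier"
  )
}

notes <- describe_time_range_sketch(
  min_t = as.Date(c("2024-01-01", "2024-01-02")),
  max_t = as.Date(c("2024-01-06", "2024-01-06"))
)
print(notes)
```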

Contributor Author

(a) Done! It is worth noting that I used epikey in accordance with revision_analysis, which prints this term. Should we remove it from there as well?
(b) Done! Also, the by-signal latency information was moved here.

@brookslogan brookslogan Apr 20, 2026

(a) Yes, we probably should. (File an Issue?)

Comment thread R/methods-epi_df.R Outdated
Comment thread R/methods-epi_df.R Outdated
Comment thread R/methods-epi_df.R Outdated
@JavierMtzRdz JavierMtzRdz left a comment

I have addressed the previous issues. Since we provide latency summaries in both summary and print, I moved all the key combination x signal calculations into epi_ts_range. To handle empty time series and missing signal columns, it returns results for every key combination x signal row and adds a column indicating whether that time series is empty.

As a result, the latency output that remains in print is handled by print_latency_info.

Because the by-signal latency moved to summary, summary_time_latency reuses the time range, latency, and gap information from the epi_ts_range output. For clarity, the messages for each section are built in that section's own function.
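For anyone reading along without the diff open, the shape of the key combination x signal table can be sketched in base R as below. This is only an approximation of the idea: `ts_range_sketch` and its column names (`min_t`, `max_t`, `is_empty`) are hypothetical, and the real epi_ts_range may differ in its arguments and output.

```r
# Sketch only: one row per key combination x signal, with an emptiness flag.
# `ts_range_sketch` is a hypothetical name, not the PR's epi_ts_range.
ts_range_sketch <- function(df, key_cols, signal_cols) {
  out <- list()
  for (sig in signal_cols) {
    for (d in split(df, df[key_cols], drop = TRUE)) {
      # Time values at which this signal has a non-NA observation.
      observed <- d$time_value[!is.na(d[[sig]])]
      is_empty <- length(observed) == 0
      row <- d[1, key_cols, drop = FALSE]
      row$signal <- sig
      row$min_t <- if (is_empty) as.Date(NA) else min(observed)
      row$max_t <- if (is_empty) as.Date(NA) else max(observed)
      row$is_empty <- is_empty
      out[[length(out) + 1]] <- row
    }
  }
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

start_date <- as.Date("2024-01-01")
df <- data.frame(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = rep(start_date + 0:5, 2),
  value = 1:12,
  value2 = NA # an all-NA signal, as in the "all NA signal" example below
)

rng <- ts_range_sketch(df, key_cols = "geo_value", signal_cols = c("value", "value2"))
print(rng)
```

Keeping the empty series as rows with `is_empty = TRUE`, rather than dropping them, lets downstream printing report "all NA" per signal without re-deriving which signals exist.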

Finally, I reused time_delta_to_n_steps to calculate latency. However, with time_type = "week", as_of does not necessarily fall on the same day of the week as the time values, so the result can be a non-integer number of steps. To handle this, I added a require_integer parameter to time_delta_to_n_steps.
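The weekly case can be illustrated with a small standalone sketch. `lag_in_steps_sketch` is a hypothetical helper, not the package's time_delta_to_n_steps, and the behavior of its `require_integer` argument (error when TRUE, allow fractions when FALSE) is an assumption made for this example.

```r
# Sketch only: convert a date difference into a number of weekly steps.
# `lag_in_steps_sketch` is a hypothetical name; its semantics are assumed.
lag_in_steps_sketch <- function(as_of, max_time, step_days = 7, require_integer = TRUE) {
  n_steps <- as.numeric(difftime(as_of, max_time, units = "days")) / step_days
  if (require_integer && n_steps != round(n_steps)) {
    stop("lag is not a whole number of steps: ", n_steps)
  }
  n_steps
}

max_week <- as.Date("2024-01-07")

# as_of is exactly two weeks after the latest time value.
whole <- lag_in_steps_sketch(as.Date("2024-01-21"), max_week)

# as_of falls mid-week: 10 days is not a whole number of weeks.
partial <- lag_in_steps_sketch(as.Date("2024-01-17"), max_week, require_integer = FALSE)

# The strict version errors on the same input.
strict_fails <- inherits(
  try(lag_in_steps_sketch(as.Date("2024-01-17"), max_week), silent = TRUE),
  "try-error"
)
```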

Below is a detailed list of examples.

# Demonstrating Improved print.epi_df and summary.epi_df reporting
pkgload::load_all(".")
#> ℹ Loading epiprocess
#> Loading required package: epidatasets
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.5.2
#> 
#> Attaching package: 'dplyr'
#> 
#> The following object is masked from 'package:epiprocess':
#> 
#>     between
#> 
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> 
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)
#> Warning: package 'tibble' was built under R version 4.5.2

# Setup basic parameters
start_date <- as.Date("2024-01-01")
as_of_date <- as.Date("2024-01-15")

# Standard clean data ---
## No signal
(case0 <- tibble(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = rep(start_date + 0:5, 2),
) %>% as_epi_df(as_of = as_of_date))
#> An `epi_df` object, 12 x 2 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-01-15
#> Latency (lag from as_of to latest observation by time series):
#> * No time series detected
#> # A tibble: 12 × 2
#>    geo_value time_value
#>  * <chr>     <date>    
#>  1 ca        2024-01-01
#>  2 ca        2024-01-02
#>  3 ca        2024-01-03
#>  4 ca        2024-01-04
#>  5 ca        2024-01-05
#>  6 ca        2024-01-06
#>  7 hi        2024-01-01
#>  8 hi        2024-01-02
#>  9 hi        2024-01-03
#> 10 hi        2024-01-04
#> 11 hi        2024-01-05
#> 12 hi        2024-01-06

summary(case0)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 2024-01-15
#> ----------
#> Time range:
#> * min time value              = 2024-01-01
#> * max time value              = 2024-01-06
#> Gaps:
#> * time gaps                   = none detected
#> * average rows per time value = 2.00
#> Latency (lag from as_of to latest observation by time series):
#> * No time series detected

## Standard
(case1 <- tibble(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = rep(start_date + 0:5, 2),
  value = 1:12
) %>% as_epi_df(as_of = as_of_date))
#> An `epi_df` object, 12 x 3 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-01-15
#> Latency (lag from as_of to latest observation by time series):
#> * lag  = 9 days
#> 
#> # A tibble: 12 × 3
#>    geo_value time_value value
#>  * <chr>     <date>     <int>
#>  1 ca        2024-01-01     1
#>  2 ca        2024-01-02     2
#>  3 ca        2024-01-03     3
#>  4 ca        2024-01-04     4
#>  5 ca        2024-01-05     5
#>  6 ca        2024-01-06     6
#>  7 hi        2024-01-01     7
#>  8 hi        2024-01-02     8
#>  9 hi        2024-01-03     9
#> 10 hi        2024-01-04    10
#> 11 hi        2024-01-05    11
#> 12 hi        2024-01-06    12

summary(case1)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 2024-01-15
#> ----------
#> Time range:
#> * min time value              = 2024-01-01 (same for every time series)
#> * max time value              = 2024-01-06 (same for every time series)
#> Gaps:
#> * time gaps                   = none detected
#> * average rows per time value = 2.00
#> Latency (lag from as_of to latest observation by time series):
#> * value: lag 9 days (max time 2024-01-06) (!)
#> (!): notable latency (lag > 7 days)

## all NA signal
(case1.5 <- tibble(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = rep(start_date + 0:5, 2),
  value = 1:12,
  value2 = NA
) %>% as_epi_df(as_of = as_of_date))
#> An `epi_df` object, 12 x 4 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-01-15
#> Latency (lag from as_of to latest observation by time series):
#> * lag  = 9 days
#> * Empty time series detected
#> # A tibble: 12 × 4
#>    geo_value time_value value value2
#>  * <chr>     <date>     <int> <lgl> 
#>  1 ca        2024-01-01     1 NA    
#>  2 ca        2024-01-02     2 NA    
#>  3 ca        2024-01-03     3 NA    
#>  4 ca        2024-01-04     4 NA    
#>  5 ca        2024-01-05     5 NA    
#>  6 ca        2024-01-06     6 NA    
#>  7 hi        2024-01-01     7 NA    
#>  8 hi        2024-01-02     8 NA    
#>  9 hi        2024-01-03     9 NA    
#> 10 hi        2024-01-04    10 NA    
#> 11 hi        2024-01-05    11 NA    
#> 12 hi        2024-01-06    12 NA

summary(case1.5)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 2024-01-15
#> ----------
#> Time range:
#> * min time value              = 2024-01-01 (same for every time series)
#> * max time value              = 2024-01-06 (same for every time series)
#> Gaps:
#> * time gaps                   = none detected
#> * average rows per time value = 2.00
#> Latency (lag from as_of to latest observation by time series):
#> * value: lag 9 days (max time 2024-01-06) (!)
#> * value2: all NA
#> (!): notable latency (lag > 7 days)

# Integer time indices ---
(case2 <- tibble(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = rep(100:105, 2),
  value = 1:12
) %>% as_epi_df(as_of = 110))
#> An `epi_df` object, 12 x 3 with metadata:
#> * geo_type  = state
#> * time_type = integer
#> * as_of     = 110
#> Latency (lag from as_of to latest observation by time series):
#> * lag  = 5
#> 
#> # A tibble: 12 × 3
#>    geo_value time_value value
#>  * <chr>          <int> <int>
#>  1 ca               100     1
#>  2 ca               101     2
#>  3 ca               102     3
#>  4 ca               103     4
#>  5 ca               104     5
#>  6 ca               105     6
#>  7 hi               100     7
#>  8 hi               101     8
#>  9 hi               102     9
#> 10 hi               103    10
#> 11 hi               104    11
#> 12 hi               105    12

summary(case2)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 110
#> ----------
#> Time range:
#> * min time value              = 100 (same for every time series)
#> * max time value              = 105 (same for every time series)
#> Gaps:
#> * time gaps                   = none detected
#> * average rows per time value = 2.00
#> Latency (lag from as_of to latest observation by time series):
#> * value: lag 5 (max time 105) (!)
#> (!): notable latency (lag > 2 )

# Other Keys  ---
(case3 <- tibble(
  geo_value = rep(c("ca", "hi"), each = 4),
  age_group = rep(rep(c("0-17", "18+"), each = 2), 2),
  time_value = rep(c(start_date, start_date + 1), 4),
  value = 1:8
) %>% as_epi_df(as_of = as_of_date, other_keys = "age_group"))
#> An `epi_df` object, 8 x 4 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * other_keys = age_group
#> * as_of     = 2024-01-15
#> Latency (lag from as_of to latest observation by time series):
#> * lag  = 13 days
#> 
#> # A tibble: 8 × 4
#>   geo_value age_group time_value value
#> * <chr>     <chr>     <date>     <int>
#> 1 ca        0-17      2024-01-01     1
#> 2 ca        0-17      2024-01-02     2
#> 3 ca        18+       2024-01-01     3
#> 4 ca        18+       2024-01-02     4
#> 5 hi        0-17      2024-01-01     5
#> 6 hi        0-17      2024-01-02     6
#> 7 hi        18+       2024-01-01     7
#> 8 hi        18+       2024-01-02     8

summary(case3)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * other_keys = age_group
#> * as_of     = 2024-01-15
#> ----------
#> Time range:
#> * min time value              = 2024-01-01 (same for every time series)
#> * max time value              = 2024-01-02 (same for every time series)
#> Gaps:
#> * time gaps                   = none detected
#> * average rows per time value = 4.00
#> Latency (lag from as_of to latest observation by time series):
#> * value: lag 13 days (max time 2024-01-02) (!)
#> (!): notable latency (lag > 7 days)

# Late start and implicit gaps ---
edf_base <- tibble(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = rep(start_date + 0:5, 2),
  value = 1:12
)

edf_uneven <- edf_base[-c(3, 7), ]
# Row 7 removed: 'hi' starts late (2024-01-02)
# Row 3 removed: 'ca' is missing 2024-01-03 (implicit gap)

(case4 <- as_epi_df(edf_uneven, as_of = as_of_date))
#> An `epi_df` object, 10 x 3 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-01-15
#> Latency (lag from as_of to latest observation by time series):
#> * lag  = 9 days
#> 
#> # A tibble: 10 × 3
#>    geo_value time_value value
#>  * <chr>     <date>     <int>
#>  1 ca        2024-01-01     1
#>  2 ca        2024-01-02     2
#>  3 ca        2024-01-04     4
#>  4 ca        2024-01-05     5
#>  5 ca        2024-01-06     6
#>  6 hi        2024-01-02     8
#>  7 hi        2024-01-03     9
#>  8 hi        2024-01-04    10
#>  9 hi        2024-01-05    11
#> 10 hi        2024-01-06    12
summary(case4)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 2024-01-15
#> ----------
#> Time range:
#> * min time value              = 2024-01-01 (but some time series start later)
#> * max time value              = 2024-01-06 (same for every time series)
#> Gaps:
#> * implicit (missing rows in 1/2 key combinations, affecting 1 signal)
#> * average rows per time value = 1.67
#> Latency (lag from as_of to latest observation by time series):
#> * value: lag 9 days (max time 2024-01-06) (!)
#> (!): notable latency (lag > 7 days)

# Explicit NAs ---
edf_lags <- tibble(
  geo_value = rep(c("ca", "hi"), each = 6),
  time_value = rep(start_date + 0:5, 2),
  value = 1:12
)
edf_lags$value[2] <- NA  # 'ca' gap at Jan 2
edf_lags$value[12] <- NA # 'hi' missing Jan 6 (lag)

(case5 <- as_epi_df(edf_lags, as_of = start_date + 7))
#> An `epi_df` object, 12 x 3 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-01-08
#> Latency (lag from as_of to latest observation by time series):
#> * lag  = 2–3 days
#> 
#> # A tibble: 12 × 3
#>    geo_value time_value value
#>  * <chr>     <date>     <int>
#>  1 ca        2024-01-01     1
#>  2 ca        2024-01-02    NA
#>  3 ca        2024-01-03     3
#>  4 ca        2024-01-04     4
#>  5 ca        2024-01-05     5
#>  6 ca        2024-01-06     6
#>  7 hi        2024-01-01     7
#>  8 hi        2024-01-02     8
#>  9 hi        2024-01-03     9
#> 10 hi        2024-01-04    10
#> 11 hi        2024-01-05    11
#> 12 hi        2024-01-06    NA
summary(case5)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 2024-01-08
#> ----------
#> Time range:
#> * min time value              = 2024-01-01 (same for every time series)
#> * max time value              = 2024-01-06 (but some time series end earlier)
#> Gaps:
#> * explicit (non-lag NAs in 1/2 key combinations, affecting 1 signal)
#> * average rows per time value = 2.00
#> Latency (lag from as_of to latest observation by time series):
#> * value: lag 2–3 days (max time 2024-01-06); lagging keys: hi (!)
#> (!): notable latency (lagging keys)

# Many signals and multivariate lags ---
df_many <- tibble(geo_value = "ca", time_value = start_date + 0:9)
for (i in 1:10) {
  df_many[[paste0("sig", i)]] <- 1:10
}
# sig5 lags by 2 days, sig7 has an internal hole
df_many$sig5[9:10] <- NA
df_many$sig7[4:6] <- NA

(case6 <- as_epi_df(df_many, as_of = start_date + 12))
#> An `epi_df` object, 10 x 12 with metadata:
#> * geo_type  = state
#> * time_type = day
#> * as_of     = 2024-01-13
#> Latency (lag from as_of to latest observation by time series):
#> * lag across all time series = 3–5 days
#> 
#> # A tibble: 10 × 12
#>    geo_value time_value  sig1  sig2  sig3  sig4  sig5  sig6  sig7  sig8  sig9
#>  * <chr>     <date>     <int> <int> <int> <int> <int> <int> <int> <int> <int>
#>  1 ca        2024-01-01     1     1     1     1     1     1     1     1     1
#>  2 ca        2024-01-02     2     2     2     2     2     2     2     2     2
#>  3 ca        2024-01-03     3     3     3     3     3     3     3     3     3
#>  4 ca        2024-01-04     4     4     4     4     4     4    NA     4     4
#>  5 ca        2024-01-05     5     5     5     5     5     5    NA     5     5
#>  6 ca        2024-01-06     6     6     6     6     6     6    NA     6     6
#>  7 ca        2024-01-07     7     7     7     7     7     7     7     7     7
#>  8 ca        2024-01-08     8     8     8     8     8     8     8     8     8
#>  9 ca        2024-01-09     9     9     9     9    NA     9     9     9     9
#> 10 ca        2024-01-10    10    10    10    10    NA    10    10    10    10
#> # ℹ 1 more variable: sig10 <int>
summary(case6)
#> An `epi_df` x, with metadata:
#> * geo_type  = state
#> * as_of     = 2024-01-13
#> ----------
#> Time range:
#> * min time value              = 2024-01-01 (same for every time series)
#> * max time value              = 2024-01-10 (but some time series end earlier)
#> Gaps:
#> * explicit (non-lag NAs in 1/1 key combinations, affecting 1 signal)
#> * average rows per time value = 1.00
#> Latency (lag from as_of to latest observation by time series):
#> * sig1: lag 3 days (max time 2024-01-10)
#> * sig2: lag 3 days (max time 2024-01-10)
#> * sig3: lag 3 days (max time 2024-01-10)
#> * sig4: lag 3 days (max time 2024-01-10)
#> * sig5: lag 5 days (max time 2024-01-08)
#> * sig6: lag 3 days (max time 2024-01-10)
#> * sig7: lag 3 days (max time 2024-01-10)
#> * sig8: lag 3 days (max time 2024-01-10)
#> * ... and 2 other signals

Created on 2026-04-02 with reprex v2.1.1

Comment thread R/methods-epi_df.R Outdated
Comment thread R/methods-epi_df.R Outdated
@JavierMtzRdz JavierMtzRdz requested a review from brookslogan April 2, 2026 21:16
@brookslogan brookslogan merged commit 3263371 into dev Apr 20, 2026
3 checks passed
@brookslogan brookslogan deleted the jmr/print.epi_df branch April 20, 2026 07:46
Development

Successfully merging this pull request may close these issues.

In print.epi_df: add notes about even/uneven min and max time_value by epikey, whether there are gaps, implicit or explicit

2 participants