feat: add yeo-johnson step by dshemetov · Pull Request #179 · cmu-delphi/exploration-tooling

dshemetov · 2025-03-11T20:09:21Z

The lambda parameter is fit with a 0th-order (derivative-free) optimization that I ripped from recipes::yeo_johnson.
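For readers who have not seen the fitting step, here is a minimal sketch of what a derivative-free lambda fit can look like. It assumes the textbook Yeo-Johnson definition and a normal profile log-likelihood, searched with `stats::optimize` over the `limits` interval. The names `yj_transform`, `yj_loglik`, and `fit_lambda` are made up for illustration and are not the functions in this PR or in recipes.

```r
# Yeo-Johnson transform for a single fixed lambda (standard definition).
# Assumes x has no NAs; eps guards the lambda == 0 and lambda == 2 special cases.
yj_transform <- function(x, lambda, eps = 1e-6) {
  out <- numeric(length(x))
  pos <- x >= 0
  if (abs(lambda) < eps) {
    out[pos] <- log1p(x[pos])
  } else {
    out[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  }
  if (abs(lambda - 2) < eps) {
    out[!pos] <- -log1p(-x[!pos])
  } else {
    out[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

# Profile log-likelihood of lambda, assuming the transformed data are normal.
yj_loglik <- function(lambda, x) {
  n <- length(x)
  xt <- yj_transform(x, lambda)
  sigma2 <- mean((xt - mean(xt))^2) # maximum-likelihood variance estimate
  -0.5 * n * log(sigma2) + (lambda - 1) * sum(sign(x) * log1p(abs(x)))
}

# One-dimensional search that only evaluates the objective (no gradients).
fit_lambda <- function(x, limits = c(-5, 5)) {
  x <- x[!is.na(x)]
  optimize(yj_loglik, interval = limits, x = x, maximum = TRUE)$maximum
}

set.seed(1)
x <- rexp(500) # right-skewed toy data, not the PR's dataset
lambda_hat <- fit_lambda(x)
```

`optimize` combines golden-section search with parabolic interpolation, so it needs only function evaluations, which is what makes the fit "0th order".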

TODO

layer skeleton in place
write inverse yj transform
tests
slather.layer_epi_YeoJohnson edge cases (see the TODO comment in the function... I'm not clear on what to expect from the terms interface in a layer)
get feedback
PR into epipredict: "feat: add step_/layer_ epi_YeoJohnson" (epipredict#451)
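As a reference point for the inverse-transform item in the list above, here is a hedged sketch of the algebraic inverse under the standard Yeo-Johnson definition. `yj_forward` and `yj_inverse` are hypothetical names for illustration, not the functions added in this PR, and the sketch assumes inputs without NAs.

```r
# Forward Yeo-Johnson transform (standard definition), repeated here so the
# example is self-contained.
yj_forward <- function(x, lambda, eps = 1e-6) {
  out <- numeric(length(x))
  pos <- x >= 0
  if (abs(lambda) < eps) {
    out[pos] <- log1p(x[pos])
  } else {
    out[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  }
  if (abs(lambda - 2) < eps) {
    out[!pos] <- -log1p(-x[!pos])
  } else {
    out[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

# Inverse transform. The forward map preserves sign, so the branch is chosen
# by the sign of the transformed value z.
yj_inverse <- function(z, lambda, eps = 1e-6) {
  out <- numeric(length(z))
  pos <- z >= 0
  if (abs(lambda) < eps) {
    out[pos] <- expm1(z[pos])
  } else {
    # Yields NaN when lambda * z + 1 < 0, i.e. z is outside the forward range.
    out[pos] <- (lambda * z[pos] + 1)^(1 / lambda) - 1
  }
  if (abs(lambda - 2) < eps) {
    out[!pos] <- -expm1(-z[!pos])
  } else {
    out[!pos] <- 1 - (1 - (2 - lambda) * z[!pos])^(1 / (2 - lambda))
  }
  out
}

# Round trip on a mix of negative, zero, and positive values.
x <- c(-3940, -2, 0, 1.5, 100)
round_trip_ok <- vapply(
  c(-1, 0, 0.5, 1, 2, 3),
  function(l) isTRUE(all.equal(yj_inverse(yj_forward(x, l), l), x)),
  logical(1)
)
```

One edge case worth keeping in mind for a layer: a prediction can fall outside the range the forward transform can produce (for example, a large positive value with a negative lambda), in which case this inverse returns NaN rather than a number.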

dajmcdon

I made it partway. It's looking good so far!

dajmcdon · 2025-03-19T14:36:34Z

+  ...,
+  role = NA,
+  trained = FALSE,
+  lambdas = NULL,


If NULL is the only possible value here, then it should be removed as an arg to this function. (You can set it to NULL in the output and keep it in the constructor function below.)

dajmcdon · 2025-03-19T14:39:15Z

+  limits = c(-5, 5),
+  num_unique = 5,
+  na_rm = TRUE,
+  epi_keys_checked = NULL,


What are these used for? Probably the same as with lambdas: if the user can't set them, they shouldn't be exposed.

Is this intended to allow for other epikeys besides geo_value? Grouping across them? If so, the name and default should match other step_epi_*() functions.

They're exposed in the recipes step this is based on. Not sure why that step exposes λ if you're not supposed to set it: https://recipes.tidymodels.org/reference/step_YeoJohnson.html

(I think David's comment was meant for the lambdas = NULL comment above?

But yea,

lambdas = NULL, limits = c(-5, 5), num_unique = 5, na_rm = TRUE,

are copied from the original step. I don't seriously expect users to want to overwrite with their own lambdas though, so happy to remove that, if we don't care to keep the interface the same.)

epi_keys_checked ... I just lazily copied from step_adjust_ahead for an easy way to get access to recipes$template in prep and bake. I don't really anticipate grouping by anything other than geo_value + other_keys, so I'll remove it and find the epikeys another way.

dsweber2 · 2025-03-20T21:57:10Z

+filtered_data %>%
+  mutate(cases = log(cases)) %>%
+  ggplot(aes(time_value, cases)) +
+  geom_line(color = "blue") +
+  geom_line(data = out1 %>% mutate(cases = log(cases)),
+            aes(time_value, cases), color = "green") +
+  geom_line(data = out2 %>% mutate(cases = log(cases)),
+            aes(time_value, cases), color = "red") +
+  facet_wrap(~geo_value, scales = "free_y") +
+  theme_minimal() +
+  labs(title = "Yeo-Johnson transformation", x = "Time", y = "Log Cases")


Suggested change

-filtered_data %>%
-  mutate(cases = log(cases)) %>%
-  ggplot(aes(time_value, cases)) +
-  geom_line(color = "blue") +
-  geom_line(data = out1 %>% mutate(cases = log(cases)),
-            aes(time_value, cases), color = "green") +
-  geom_line(data = out2 %>% mutate(cases = log(cases)),
-            aes(time_value, cases), color = "red") +
-  facet_wrap(~geo_value, scales = "free_y") +
-  theme_minimal() +
-  labs(title = "Yeo-Johnson transformation", x = "Time", y = "Log Cases")
+all_together <- rbind(
+  filtered_data %>%
+    mutate(name = "raw"),
+  out1 %>% mutate(name = "yeo-johnson"),
+  out2 %>% mutate(name = "quarter-root")
+)
+all_together %>%
+  ggplot(aes(time_value, cases, color = name)) +
+  geom_line() +
+  facet_grid(~geo_value, scales = "free_y") +
+  theme_minimal() +
+  labs(title = "Yeo-Johnson transformation", x = "Time", y = "Log Cases") +
+  scale_y_log10()

This will generate an actual legend and makes toggling log-scale easier.

As for the result of the transform, the difference in scale between NY and CA is a bit confusing tbh. Not sure why NY is scaled so much more aggressively; is it because of the literal actual zero? For practical purposes we'd probably want to smooth this dataset anyway.

also the blip where it goes literally negative in CA in early July is a bit concerning

looked into it, it's an actual negative value in the raw signal

> filtered_data %>% filter(geo_value == "ca", time_value == "2021-06-29")
An `epi_df` object, 1 x 3 with metadata:
* geo_type  = state
* time_type = day
* as_of     = 2024-03-20
# A tibble: 1 × 3
  geo_value time_value cases
  <chr>     <date>     <dbl>
1 ca        2021-06-29 -3940
> out1 %>% filter(geo_value == "ca", time_value == "2021-06-29")
An `epi_df` object, 1 x 3 with metadata:
* geo_type  = state
* time_type = day
* as_of     = 2024-03-20
# A tibble: 1 × 3
  geo_value time_value   cases
  <chr>     <date>       <dbl>
1 ca        2021-06-29 -11320.
> out2 %>% filter(geo_value == "ca", time_value == "2021-06-29")
An `epi_df` object, 1 x 3 with metadata:
* geo_type  = state
* time_type = day
* as_of     = 2024-03-20
# A tibble: 1 × 3
  geo_value time_value cases
  <chr>     <date>     <dbl>
1 ca        2021-06-29   NaN

Yea it's a huge data anomaly. I wonder if we should fix this test dataset. On the one hand, it's educational about real-world data; on the other hand, what the hell.

Yeah, I take it as a demo that it will do roughly the right thing in the presence of both positive and negative values; I just wanted to check that it was legit and not an artifact of an implementation problem.

The difference in scaling for NY & CA is a bit confusing. I forget exactly what the optimization routine is minimizing in its choice of lambda; iirc it's literally the variance? Probably worth noting in the description of the step
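For reference, under the standard Yeo-Johnson formulation the usual objective is not the raw variance but a normal profile log-likelihood, which combines the log of the transformed variance with a Jacobian term. Whether the routine in this PR uses exactly this objective is an assumption that should be checked against the code. With $y_1, \dots, y_n$ the observed values:

```latex
\[
\psi(\lambda, y) =
\begin{cases}
\dfrac{(y + 1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\ y \geq 0 \\[1ex]
\log(y + 1), & \lambda = 0,\ y \geq 0 \\[1ex]
-\dfrac{(1 - y)^{2 - \lambda} - 1}{2 - \lambda}, & \lambda \neq 2,\ y < 0 \\[1ex]
-\log(1 - y), & \lambda = 2,\ y < 0
\end{cases}
\]

\[
\ell(\lambda) = -\frac{n}{2} \log \hat{\sigma}^2(\lambda)
  + (\lambda - 1) \sum_{i=1}^{n} \operatorname{sgn}(y_i) \log\bigl(|y_i| + 1\bigr),
\qquad
\hat{\sigma}^2(\lambda) = \frac{1}{n} \sum_{i=1}^{n}
  \bigl(\psi(\lambda, y_i) - \bar{\psi}(\lambda)\bigr)^2
\]
```

Maximizing $\ell(\lambda)$ therefore trades off a small transformed variance against the Jacobian term, which stops the fit from simply shrinking everything toward a constant. If lambda is fit separately per geo_value, as the grouping discussion in this thread suggests, then NY and CA can land on different lambdas and hence on quite different output scales.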

* step and layer work with a single outcome and layer_yj(.pred) * need to work on multiple outcomes case

Co-authored-by: Daniel McDonald <dajmcdon@gmail.com>

Co-authored-by: David Weber <david.weber2@pm.me>

dshemetov · 2025-03-31T19:47:13Z

Replaced with cmu-delphi/epipredict#451

dsweber2 assigned dshemetov Mar 11, 2025

dshemetov force-pushed the step branch from 24972d2 to bf19c91 Compare March 14, 2025 18:32

dshemetov marked this pull request as ready for review March 14, 2025 20:58

dshemetov requested a review from dsweber2 March 15, 2025 00:56

dshemetov force-pushed the step branch 2 times, most recently from 864474b to c7575a7 Compare March 17, 2025 22:44

dshemetov changed the title ~~wip: add yeo-johnson~~ feat: add yeo-johnson Mar 17, 2025

dshemetov changed the title ~~feat: add yeo-johnson~~ feat: add yeo-johnson step Mar 17, 2025

dshemetov force-pushed the step branch from c7575a7 to ce3b19d Compare March 17, 2025 22:49

dshemetov requested a review from dajmcdon March 18, 2025 00:24

dshemetov force-pushed the step branch from 4213b67 to 959ef4b Compare March 18, 2025 01:07

dajmcdon reviewed Mar 19, 2025


dsweber2 reviewed Mar 20, 2025


Comment thread tests/testthat/test-yeo-johnson.R Outdated

dsweber2 reviewed Mar 20, 2025


Comment thread tests/testthat/test-yeo-johnson.R Outdated

dshemetov force-pushed the step branch from 542dab1 to 48f0100 Compare March 21, 2025 00:52

dshemetov and others added 14 commits March 20, 2025 19:22

feat: add yeo-johnson

67c82a9

* step and layer work with a single outcome and layer_yj(.pred) * need to work on multiple outcomes case

Update R/new_epipredict_steps/step_yeo_johnson.R

0319acc

Co-authored-by: Daniel McDonald <dajmcdon@gmail.com>

Update R/new_epipredict_steps/step_yeo_johnson.R

8d029c2

Co-authored-by: Daniel McDonald <dajmcdon@gmail.com>

Update R/new_epipredict_steps/step_yeo_johnson.R

cb6b431

Co-authored-by: Daniel McDonald <dajmcdon@gmail.com>

fix: temp columns lambda_ -> .lambda_

415d99a

fix: remove epi_keys_checked

ca2df2f

Update R/new_epipredict_steps/step_yeo_johnson.R

17fac6a

Co-authored-by: David Weber <david.weber2@pm.me>

Update test-yeo-johnson.Rmd

a14c932

Co-authored-by: David Weber <david.weber2@pm.me>

Update tests/testthat/test-yeo-johnson.R

66a15ef

Co-authored-by: David Weber <david.weber2@pm.me>

Update R/new_epipredict_steps/layer_yeo_johnson.R

80c5cd0

Co-authored-by: David Weber <david.weber2@pm.me>

Update tests/testthat/test-yeo-johnson.R

cd87b0b

Co-authored-by: David Weber <david.weber2@pm.me>

Update tests/testthat/test-yeo-johnson.R

88cb475

Co-authored-by: David Weber <david.weber2@pm.me>

merge

8f200f3

test: inverse transform with multiple outcomes works

b565598

dshemetov force-pushed the step branch from 345e1ed to b565598 Compare March 21, 2025 02:22

dshemetov added 2 commits March 20, 2025 19:41

ci+test: update r version, remove snap

d579ef8

fix: tests

28c8471

dsweber2 reviewed Mar 21, 2025


Comment thread R/new_epipredict_steps/step_yeo_johnson.R

dsweber2 reviewed Mar 21, 2025


Comment thread R/new_epipredict_steps/step_yeo_johnson.R Outdated

doc+fix+test: fix other_keys tests, terms handling, docs pass

812d00c

dsweber2 reviewed Mar 21, 2025


Comment thread R/new_epipredict_steps/layer_yeo_johnson.R Outdated

dshemetov and others added 3 commits March 21, 2025 13:48

Update R/new_epipredict_steps/layer_yeo_johnson.R

e8eef18

Co-authored-by: David Weber <david.weber2@pm.me>

fix: delete unused function

9e57488

doc: fix @examples

5d5f1a6

dshemetov requested a review from dajmcdon March 24, 2025 20:52

dshemetov closed this Mar 31, 2025

dshemetov deleted the step branch March 31, 2025 19:47
