Conversation
|
Hi @eschibli, First of all, thanks for opening this PR! For the linting, it will make your life much easier if you follow these instructions, or you can also run it manually. |
Codecov Report ✅ All modified and coverable lines are covered by tests. Additional details and impacted files:

@@ Coverage Diff @@
## master #2555 +/- ##
==========================================
- Coverage 95.69% 95.63% -0.07%
==========================================
Files 156 156
Lines 16926 16940 +14
==========================================
+ Hits 16197 16200 +3
- Misses 729 740 +11
|
Thanks @madtoinou. I was not able to get Gradle running on my machine and didn't realize ruff was that easy to set up, so sorry for spamming your test pipeline. I don't believe the failing mac build is a result of my changes, so it should be good for review now. |
|
Hi @eschibli, thanks for the PR. Yes, the failing mac tests are unrelated to your PR, we're working on it :). |
|
Understood, Dennis. |
madtoinou
left a comment
It looks great, thank you for this nice PR @eschibli! Some minor comments about the order of the operations/projections to make the flow more intuitive.
Could you also extend the TSMixer notebook to include a section where the difference in performance with `project_first_layer=True/False` and future covariates can be visualized?
    # In the original paper this was not implemented for future covariates,
    # but rather than ignoring them or raising an error we remap them to the input time dimension.
    # Suboptimal but may be useful in some cases.
    elif self.future_cov_dim:
To make it a bit more intuitive, I would move this code below, inside the `if self.future_cov_dim` and change the condition to `if not self.project_first_layer` in order to group the operations on each kind of features:

- "target": project to the output time dimension in the first layer if `project_first_layer=True`, otherwise we stay in the input time dimension
- "target": do the `feature_mixing_hist` (not changed)
- "fut_cov": project the future covariates to the input time dimension if `project_first_layer=False` (the logic you added)
- concatenate the future covariates to the target features (not changed)
- static covariates (not changed)
- "target": projection to the output time dimension if it did not occur earlier
- "target": application of `fc_out`, critical for probabilistic forecasts
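The grouping sketched in the list above can be illustrated with a minimal NumPy shape-flow example (not darts code; `project_first_layer` and the zero-weight projection matrices are placeholders, and the mixing layers are omitted):

```python
import numpy as np

B, L, T, H = 4, 12, 6, 8        # batch, input chunk, output chunk, hidden size
project_first_layer = False      # hypothetical flag from this PR

x = np.zeros((B, L, H))          # "target" features, input time dimension
fut = np.zeros((B, T, 3))        # future covariates, output time dimension

# "target": project to the output time dimension first if requested
if project_first_layer:
    x = np.einsum("blh,lt->bth", x, np.zeros((L, T)))

# "fut_cov": otherwise project the future covariates to the input time dim
if not project_first_layer:
    fut = np.einsum("btc,tl->blc", fut, np.zeros((T, L)))

# concatenate future covariates onto the target features (time dims now agree)
x = np.concatenate([x, fut], axis=-1)

# "target": project to the output time dimension if it did not occur earlier
if not project_first_layer:
    x = np.einsum("blh,lt->bth", x, np.zeros((L, T)))

print(x.shape)  # both branches end at (4, 6, 11)
```

Either branch ends in the output time dimension with the concatenated features, which is what makes the final static-covariate mixing and `fc_out` steps identical in both modes.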
This implementation is out-of-date with the current version of the PR, and I think the current version makes more sense.
I set it up this way originally because the TSMixer performance test was failing if I didn't generate the layers in exactly the same order. In this PR I have relaxed the tolerance as it wasn't stable.
|
@madtoinou are you happy with these changes? |
|
@madtoinou @dennisbader sorry to nag but could you please clarify if you are unhappy with this addition or if more changes are needed to approve this? |
|
@madtoinou could you give an update on this one? :) |
|
Hi @eschibli, Sorry for the delay, I started reviewing but it took me longer than expected. These changes are welcome, I have the impression that they will bring considerable performance gains, but I want to make sure that the current behavior is still accessible and reproducible. |
|
Sorry to nag but is there anything I could do to get this merged? It should reproduce current behavior when |
|
Hi @eschibli, and really sorry that this got a bit lost. We'll review it by next Friday (most likely on Friday itself). |
dennisbader
left a comment
Thanks a lot for the PR and the updates, @eschibli, it looks really good 🚀
We're almost there :)
I added a couple of minor suggestions and one that we should discuss a bit more (the one where `encoder_to_decoder` is now applied at a different step compared to the original implementation).
Sorry again for the delay on this one 🙏
    "normalize_before": normalize_before,
}

# Projects from the input time dimension to the output time dimension
`self.fc_hist` is not used anymore (I guess replaced by the new `encoder_to_decoder`?). We should remove one of the two.
num_encoder_blocks
    The number of mixer blocks in the encoder.
num_decoder_blocks
    The number of mixer blocks in the decoder.

These are not available, and the description of `project_after_n_blocks` is missing.

Suggested change:
-num_encoder_blocks
-    The number of mixer blocks in the encoder.
-num_decoder_blocks
-    The number of mixer blocks in the decoder.
+num_blocks
+    The number of mixer blocks in the model. The number includes the first block and all subsequent blocks.
# Raise exception for nonsensical number of encoder and decoder blocks
if (
    num_encoder_blocks < 0
    or num_decoder_blocks < 0
    or (num_encoder_blocks + num_decoder_blocks != self.num_blocks)
):
    raise_log(
        ValueError(
            f"Invalid number of encoder and decoder blocks. "
            f"project_after_n_blocks must be between 0 and {self.num_blocks} inclusive."
        ),
    )
We should move this sanity check to `__init__`, simply check that `0 <= project_after_n_blocks <= num_blocks`, and rephrase the message to:
f"`project_after_n_blocks` must be between 0 and {num_blocks} inclusive."
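A minimal sketch of that check as it could look at construction time (a hypothetical standalone function; darts' actual `raise_log` wrapper and the model's `__init__` signature are omitted):

```python
def check_project_after_n_blocks(project_after_n_blocks: int, num_blocks: int) -> None:
    # validate once in __init__ rather than on every forward pass
    if not (0 <= project_after_n_blocks <= num_blocks):
        raise ValueError(
            f"`project_after_n_blocks` must be between 0 and {num_blocks} inclusive."
        )

check_project_after_n_blocks(2, 4)      # valid: no exception
try:
    check_project_after_n_blocks(5, 4)  # invalid: raises ValueError
except ValueError as err:
    print(err)
```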
# (B, C, S) -> (B, 1, C * S)
x_static_hist = x_static.reshape(x_static.shape[0], 1, -1)
# repeat to match lookback time dim: (B, 1, C * S) -> (B, L, C * S)
x_static_hist = x_static_hist.repeat(1, self.input_chunk_length, 1)

# (B, C, S) -> (B, 1, C * S)
x_static_future = x_static.reshape(x_static.shape[0], 1, -1)
# repeat to match horizon time dim: (B, 1, C * S) -> (B, T, C * S)
x_static_future = x_static_future.repeat(1, self.output_chunk_length, 1)
This can be simplified. Also, it would be nice to only create `x_static_hist` / `x_static_future` if they are really required (e.g. future only if the encoder is not used).
Any type of operation that can be avoided has a positive impact on the model throughput :)
Suggested change:
-# (B, C, S) -> (B, 1, C * S)
-x_static_hist = x_static.reshape(x_static.shape[0], 1, -1)
-# repeat to match lookback time dim: (B, 1, C * S) -> (B, L, C * S)
-x_static_hist = x_static_hist.repeat(1, self.input_chunk_length, 1)
-# (B, C, S) -> (B, 1, C * S)
-x_static_future = x_static.reshape(x_static.shape[0], 1, -1)
-# repeat to match horizon time dim: (B, 1, C * S) -> (B, T, C * S)
-x_static_future = x_static_future.repeat(1, self.output_chunk_length, 1)
+# (B, C, S) -> (B, 1, C * S)
+x_static = x_static.reshape(x_static.shape[0], 1, -1)
+# repeat to match lookback time dim: (B, 1, C * S) -> (B, L, C * S)
+x_static_hist = x_static.repeat(1, self.input_chunk_length, 1)
+# repeat to match horizon time dim: (B, 1, C * S) -> (B, T, C * S)
+x_static_future = x_static.repeat(1, self.output_chunk_length, 1)
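The effect of reshaping once and only materializing what is needed can be illustrated with NumPy (illustrative sizes; `use_encoder` is a hypothetical stand-in for "the encoder path actually needs the history copy"):

```python
import numpy as np

B, C, S = 2, 3, 4      # batch, static covariate dim, components
L, T = 12, 6           # input / output chunk lengths
use_encoder = True     # hypothetical: skip x_static_hist when the encoder is unused

x_static = np.zeros((B, C, S))
# reshape only once: (B, C, S) -> (B, 1, C * S)
x_static = x_static.reshape(B, 1, -1)

# materialize each repeated copy only when it is actually required
x_static_hist = np.repeat(x_static, L, axis=1) if use_encoder else None
x_static_future = np.repeat(x_static, T, axis=1)

print(x_static_hist.shape, x_static_future.shape)  # (2, 12, 12) (2, 6, 12)
```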
# Project time dimension (B, L, H_S) -> (B, T, H_S)
x = x.transpose(1, 2)
x = self.encoder_to_decoder(x)  # Linear map
Hmm.. The `encoder_to_decoder` (previously called `fc_hist`) is now applied after `feature_mixing_hist`.
This means `feature_mixing_hist` is only applied on the input chunk (L), instead of on the output chunk (T). From Figure 4 in the paper, the temporal projection for historic features is also applied before the `feature_mixing_hist`.
Meaning, users will probably not be able to reproduce results from earlier Darts versions.
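For reference, the transpose-then-linear time projection itself does the following (NumPy sketch; a random matrix stands in for the `encoder_to_decoder` weights). Whether this runs before or after `feature_mixing_hist` determines whether the mixing sees L or T time steps, which is the reproducibility concern above:

```python
import numpy as np

B, L, T, H = 2, 12, 6, 8
x = np.random.rand(B, L, H)          # hidden features over the input chunk
W = np.random.rand(L, T)             # stand-in for the encoder_to_decoder weights

# (B, L, H) -> (B, H, L): move time to the last axis
# (B, H, L) @ (L, T) -> (B, H, T): linear map over the time axis
# (B, H, T) -> (B, T, H): move time back
x_proj = np.swapaxes(np.swapaxes(x, 1, 2) @ W, 1, 2)
print(x_proj.shape)  # (2, 6, 8)
```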
for mixing_layer in self.conditional_mixer:
    # conditional mixer layers with static covariates (B, T, 2 * H_S), (B, T, C * S) -> (B, T, H_S)
    x = mixing_layer(x, x_static=x_static)
# Apply decoder mixer layers
Let's add the shape transformation.
x_static = x_static.reshape(x_static.shape[0], 1, -1)
# repeat to match horizon (B, 1, C * S) -> (B, T, C * S)
x_static = x_static.repeat(1, self.output_chunk_length, 1)
# Apply encoder mixer layers
Can you add the shape transformation (e.g. (B, ...) -> (B, ...))?
Implements #2510
Summary
Adds the option to project to the output temporal space at the end of TS-Mixer, rather than at the beginning. This is how most of the results in the original google-research paper were achieved (i.e., the architecture in Fig. 1 of the paper). It may allow higher performance in cases where past covariates are important, by enabling a more direct series of residual connections along the input time dimension.
I allowed support for future covariates by instead projecting them into the lookback temporal space, but this probably won't perform well in cases where they are more important than the historical targets and past covariates.
Other Information
The original paper and source code do not clarify whether the final temporal projection should go before or after the final feature projection, as they hardcoded `hidden_size` to `output_dim` and therefore did not need a final feature projection. I erred on the side of putting the temporal projection first, as otherwise the common `output_dim == 1` could lead to unexpected, catastrophic compression before the temporal projection step.
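The compression concern can be made concrete with a small NumPy sketch (illustrative sizes; random matrices stand in for the learned projections):

```python
import numpy as np

B, L, T, H = 2, 12, 6, 8
output_dim = 1                        # common single-target case
x = np.random.rand(B, L, H)

# feature projection first: the temporal projection would only ever see
# a single feature per time step -- most information is already gone
feature_first = x @ np.random.rand(H, output_dim)               # (2, 12, 1)

# temporal projection first: all H features survive until the final step
temporal_first = np.swapaxes(np.swapaxes(x, 1, 2) @ np.random.rand(L, T), 1, 2)
print(feature_first.shape, temporal_first.shape)  # (2, 12, 1) (2, 6, 8)
```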