Multiple forward per backward #81

Currently, our model runs one forward pass and reuses its intermediate activations for one backward pass. However, a backward pass costs over 3x as much as a forward pass, so running more forward passes per backward pass could speed up training at a comparatively small extra cost per step.

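
A back-of-the-envelope sketch of why the ratio matters. It assumes, as above, that a backward pass costs 3x a forward pass, and it ignores memory and data-loading costs. The `step_cost` helper is hypothetical:

```python
# Minimal cost model, measured in units of one forward pass.
# ASSUMPTION: a backward pass costs 3x a forward pass (the estimate above).
BACKWARD_COST = 3.0


def step_cost(forward_passes: int, backward_passes: int = 1,
              backward_cost: float = BACKWARD_COST) -> float:
    """Cost of one optimizer step, in forward-pass units."""
    return forward_passes + backward_passes * backward_cost


baseline = step_cost(forward_passes=1)       # 1 + 3 = 4.0
extra_forward = step_cost(forward_passes=2)  # 2 + 3 = 5.0
print(f"baseline: {baseline}, two forwards: {extra_forward}, "
      f"overhead: {extra_forward / baseline - 1:.0%}")
# prints "baseline: 4.0, two forwards: 5.0, overhead: 25%"
```

Under this assumption, one extra forward pass per step adds about 25% to the step cost, while a second full forward+backward step would double it. If the backward pass is more than 3x as expensive, the relative overhead is smaller still.
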
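One such approach is [MESA](https://arxiv.org/abs/2205.14083), which adds a KL-divergence term between the outputs of the model and of an exponential moving average (EMA) of its weights, `KL(model(x), ema_model(x))`; the EMA model only needs a forward pass. Another is [RHO-Loss](https://proceedings.mlr.press/v162/mindermann22a.html), which runs a forward pass over a large candidate batch and backpropagates only through the samples with the highest reducible loss, `(loss(model(x)) - loss(oracle(x))).topk()`, where `oracle` is a model trained on held-out data. Both methods report gains over standard training; RHO-Loss, for example, reports reaching a target accuracy in up to 18x fewer training steps.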
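
A minimal, framework-free sketch of a MESA-style objective for a single example. Everything here is an assumption for illustration: the helper names, the `kl_weight` and `decay` values, and the KL argument order, which simply follows the `KL(model(x), ema_model(x))` shorthand (the paper may use the other direction and a softmax temperature). In a real training loop the EMA logits would be detached, e.g. with `jax.lax.stop_gradient` or `torch.Tensor.detach`, so the EMA forward pass never needs a backward pass.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two categorical distributions (0 * log 0 treated as 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ema_update(ema_params: List[float], params: Sequence[float],
               decay: float = 0.999) -> List[float]:
    """Hypothetical EMA step, applied after each optimizer step, outside the loss."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_params, params)]


def mesa_style_loss(task_loss: float, model_logits: Sequence[float],
                    ema_logits: Sequence[float], kl_weight: float = 1.0) -> float:
    """Task loss plus KL(model(x) || ema_model(x)), following the shorthand above.

    ema_logits come from an extra forward pass of the EMA model and are
    treated as a constant target here.
    """
    p = softmax(model_logits)
    q = softmax(ema_logits)
    return task_loss + kl_weight * kl_divergence(p, q)
```

When the model and its EMA agree, the extra term is zero; as they drift apart, the KL term penalizes the current model for moving away from its averaged trajectory.
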
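
A minimal sketch of the RHO-Loss selection step on per-sample losses. It assumes the "oracle" is a fixed model trained on held-out data whose per-sample losses are available (the paper calls these irreducible losses). The function names and toy numbers are hypothetical:

```python
from typing import List, Sequence


def reducible_losses(model_losses: Sequence[float],
                     irreducible_losses: Sequence[float]) -> List[float]:
    """Per-sample reducible loss: training-model loss minus oracle loss."""
    return [m - o for m, o in zip(model_losses, irreducible_losses)]


def rho_select(model_losses: Sequence[float],
               irreducible_losses: Sequence[float], k: int) -> List[int]:
    """Indices of the k samples with the highest reducible loss (the topk step)."""
    scores = reducible_losses(model_losses, irreducible_losses)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Forward pass only (no backward) over a candidate batch of 4 samples...
model_losses = [2.0, 0.5, 3.0, 1.0]
# ...oracle losses; the oracle is fixed, so these can be computed ahead of time...
irreducible = [1.8, 0.1, 0.5, 0.9]
# ...then backpropagate only through the selected subset.
selected = rho_select(model_losses, irreducible, k=2)
print(selected)  # [2, 1]
```

Only the selected samples need a backward pass, so the forward-only scoring of the larger candidate batch is where the extra forward compute goes.
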
