Skip to content

Turn off the ironing out loop by default#1170

Open
tsmbland wants to merge 6 commits intomainfrom
ironing_out_off
Open

Turn off the ironing out loop by default#1170
tsmbland wants to merge 6 commits intomainfrom
ironing_out_off

Conversation

@tsmbland
Copy link
Collaborator

@tsmbland tsmbland commented Mar 4, 2026

Description

Turns off the ironing out loop by default (max_ironing_out_iterations = 1), as requested by @ahawkes, but maintains a regression test for a patched version of the simple model with the loop turned on (iterations = 10, the previous default). To do this I had to modify the patching API to allow patching the example model's model.toml (this will also be useful for #1105)

Turns out for the "simple" example this doesn't have any impact, but for others it does

Fixes #1165

Type of change

  • Bug fix (non-breaking change to fix an issue)
  • New feature (non-breaking change to add functionality)
  • Refactoring (non-breaking, non-functional change to improve maintainability)
  • Optimization (non-breaking change to speed up the code)
  • Breaking change (whatever its nature)
  • Documentation (improve or add documentation)

Key checklist

  • All tests pass: $ cargo test
  • The documentation builds and looks OK: $ cargo doc
  • Update release notes for the latest release if this PR adds a new feature or fixes a bug
    present in the previous release

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@codecov
Copy link

codecov bot commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 93.75000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.35%. Comparing base (04b86f7) to head (55ef9ae).

Files with missing lines Patch % Lines
src/cli/example.rs 85.71% 0 Missing and 1 partial ⚠️
src/example/patches.rs 96.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1170      +/-   ##
==========================================
+ Coverage   87.30%   87.35%   +0.04%     
==========================================
  Files          57       57              
  Lines        7706     7719      +13     
  Branches     7706     7719      +13     
==========================================
+ Hits         6728     6743      +15     
+ Misses        673      671       -2     
  Partials      305      305              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tsmbland tsmbland marked this pull request as ready for review March 4, 2026 14:13
Copilot AI review requested due to automatic review settings March 4, 2026 14:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes the default behavior of the model “ironing out” loop by reducing max_ironing_out_iterations from 10 to 1, and updates expected regression outputs accordingly. It also extends the patched-example mechanism so a patched example can optionally include a model.toml patch (intended to preserve coverage of the loop-enabled configuration via a patched “simple” model).

Changes:

  • Set the default max_ironing_out_iterations to 1 (and update the input schema default accordingly).
  • Extend the patched-example registry to include an optional TOML patch, adding a simple_ironing_out patched example that sets iterations back to 10.
  • Refresh a large set of regression “golden” CSV outputs to match the new default behavior; add new golden outputs for simple_ironing_out.

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/data/two_regions/commodity_prices.csv Updated expected commodity price outputs under the new default iteration count.
tests/data/two_regions/assets.csv Updated expected asset outputs under the new default iteration count.
tests/data/two_outputs/commodity_prices.csv Updated expected commodity price outputs under the new default iteration count.
tests/data/two_outputs/assets.csv Updated expected asset outputs under the new default iteration count.
tests/data/simple_npv/assets.csv Updated expected asset outputs for the NPV patched scenario under the new default.
tests/data/simple_ironing_out/commodity_prices.csv New golden output for the patched “simple” model with ironing-out enabled (iterations=10).
tests/data/simple_ironing_out/commodity_flows.csv New golden output for the patched “simple” model with ironing-out enabled (iterations=10).
tests/data/simple_ironing_out/assets.csv New golden output for the patched “simple” model with ironing-out enabled (iterations=10).
tests/data/simple/debug_solver.csv Updated debug solver expectations reflecting fewer iterations by default.
tests/data/simple/debug_dispatch_assets.csv Updated debug dispatch expectations reflecting fewer iterations by default.
tests/data/simple/debug_commodity_balance_duals.csv Updated debug dual expectations reflecting fewer iterations by default.
tests/data/simple/debug_appraisal_results.csv Updated appraisal debug expectations reflecting fewer iterations by default.
tests/data/muse1_default/commodity_prices.csv Updated expected commodity price outputs under the new default iteration count.
tests/data/muse1_default/commodity_flows.csv Updated expected commodity flow outputs under the new default iteration count.
tests/data/muse1_default/assets.csv Updated expected asset outputs under the new default iteration count.
tests/data/circularity/commodity_prices.csv Updated expected commodity price outputs under the new default iteration count.
tests/data/circularity/assets.csv Updated expected asset outputs under the new default iteration count.
src/model/parameters.rs Changes the default max_ironing_out_iterations from 10 to 1.
src/example/patches.rs Extends patch registry to include optional TOML patch; adds simple_ironing_out patch entry.
src/cli/example.rs Updates CLI patched-example extraction to apply file patches + optional TOML patch.
schemas/input/model.yaml Updates schema default for max_ironing_out_iterations to 1.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

fn extract_example(name: &str, patch: bool, dest: &Path) -> Result<()> {
if patch {
let patches = get_patches(name)?;
let (file_patches, toml_patch) = get_patches(name)?;
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_patches(name)? returns a reference to a (Vec<FilePatch>, Option<String>), but this code destructures it as if it were an owned tuple. This won’t compile (type mismatch &(…) vs (…)). Either bind the returned reference and access its fields, or change get_patches to return an owned tuple (e.g., cloned) so destructuring works.

Suggested change
let (file_patches, toml_patch) = get_patches(name)?;
let (file_patches, toml_patch) = get_patches(name)?.clone();

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it does compile...

@tsmbland
Copy link
Collaborator Author

tsmbland commented Mar 4, 2026

@alexdewar Now getting slightly different results on macos which is concerning. I seem to remember something like this has happened before...?

@tsmbland tsmbland requested review from Aurashk and alexdewar March 4, 2026 14:25
@alexdewar
Copy link
Collaborator

@alexdewar Now getting slightly different results on macos which is concerning. I seem to remember something like this has happened before...?

These differences are pretty large too:

thread 'two_regions' (11401) panicked at tests/regression.rs:78:5:
The following errors occurred:
  * commodity_flows.csv: line 751:
    + "2045,8,gas,all-year.night,-0.86000992"
    - "2045,8,gas,all-year.night,-1.39307392"
  * commodity_flows.csv: line 752:
    + "2045,8,electricity,all-year.night,0.514976"
    - "2045,8,electricity,all-year.night,0.834176"
  * commodity_flows.csv: line 753:
    + "2045,8,CO2f,all-year.night,47.20784992"
    - "2045,8,CO2f,all-year.night,76.46891392"
  * commodity_flows.csv: line 955:
    + "2045,25,gas,all-year.night,-0.5330640000000001"
    - "2045,25,gas,all-year.night,-0.0"
  * commodity_flows.csv: line 956:
    + "2045,25,electricity,all-year.night,0.3192000000000001"
    - "2045,25,electricity,all-year.night,0.0"
  * commodity_flows.csv: line 957:
    + "2045,25,CO2f,all-year.night,29.261064000000008"
    - "2045,25,CO2f,all-year.night,0.0"

Not sure what's going on here. If you want to compare the macOS results with e.g. the Windows ones, we upload them as test artifacts you can see on the summary page: https://github.com/EnergySystemsModellingLab/MUSE2/actions/runs/22673536177?pr=1170

@tsmbland
Copy link
Collaborator Author

tsmbland commented Mar 5, 2026

The difference concerns electricity supply in the night timeslice in 2045 by two assets (8 and 25):

  • on mac: 0.32 electricity supply by asset 25, 0.51 by asset 8 (= 0.83 total)
  • on windows/ubuntu: 0 electricity supply by asset 25, 0.83 by asset 8 (= 0.83 total)

These are both gasCCGT assets in R2, and although the commission year is different, the parameters for these assets are identical. Therefore, as far as the objective is concerned, the two solutions are equally good. It's probably the case for a lot of optimisations that there's no single optimal solution, rather a space of equally good solutions, but 99.9% of the time highs seems to pick the same solution every time, regardless of OS (I think this always ends up being a "corner" solution, i.e. where one or more parameters in the space is at its limit. The two solutions above are different corner solutions where supply by asset 25 is at its upper and lower limit respectively).

I don't know the factors that cause it to choose one optimum over another, but it seems that tiny floating-point differences between platforms can have an impact (perhaps tilting the energy landscape slightly so that one solution becomes ever-so-slightly favourable over another, or affecting pivot decisions in the optimisation path so it lands on a different vertex).

The best solution I can think of is to modify the objective so that we can guarantee a single optimum solution. One idea would be L2 regularisation, adding a small epsilon-scaled term to the objective to minimise the sum of squares of the variables. Generally this favours smaller values over large "extreme" values, which for our purposes means spreading activity over multiple assets rather than allocating all activity to a single asset - probably not a bad thing to aim for in any case. (In this particular example, the mac solution would be the single unique optimum).

In practice I'm not sure how easy this would be. highs-sys has Highs_passHessian which I believe makes this possible, but this isn't exposed in the highs crate we're using. I believe it's also incompatible with integer variables, which we are using in some situations, but I think we could circumvent this with a two-step approach. This is all based on a conversation with AI so take with a pinch of salt.

Apart from that I don't really have any other ideas...

@tsmbland
Copy link
Collaborator Author

tsmbland commented Mar 5, 2026

Annoying that this PR is getting blocked by something completely unrelated! For the time being, do you think it would be ok to turn off this particular regression test until we can get this fixed? What else can we do...?

@tsmbland
Copy link
Collaborator Author

tsmbland commented Mar 5, 2026

Or revert this particular example model back to using the ironing out loop and update it later?

@alexdewar
Copy link
Collaborator

The difference concerns electricity supply in the night timeslice in 2045 by two assets (8 and 25):

  • on mac: 0.32 electricity supply by asset 25, 0.51 by asset 8 (= 0.83 total)
  • on windows/ubuntu: 0 electricity supply by asset 25, 0.83 by asset 8 (= 0.83 total)

These are both gasCCGT assets in R2, and although the commission year is different, the parameters for these assets are identical. Therefore, as far as the objective is concerned, the two solutions are equally good. It's probably the case for a lot of optimisations that there's no single optimal solution, rather a space of equally good solutions, but 99.9% of the time highs seems to pick the same solution every time, regardless of OS (I think this always ends up being a "corner" solution, i.e. where one or more parameters in the space is at its limit. The two solutions above are different corner solutions where supply by asset 25 is at its upper and lower limit respectively).

I don't know the factors that cause it to choose one optimum over another, but it seems that tiny floating-point differences between platforms can have an impact (perhaps tilting the energy landscape slightly so that one solution becomes ever-so-slightly favourable over another, or affecting pivot decisions in the optimisation path so it lands on a different vertex).

This sounds plausible.

The best solution I can think of is to modify the objective so that we can guarantee a single optimum solution. One idea would be L2 regularisation, adding a small epsilon-scaled term to the objective to minimise the sum of squares of the variables. Generally this favours smaller values over large "extreme" values, which for our purposes means spreading activity over multiple assets rather than allocating all activity to a single asset - probably not a bad thing to aim for in any case. (In this particular example, the mac solution would be the single unique optimum).

In practice I'm not sure how easy this would be. highs-sys has Highs_passHessian which I believe makes this possible, but this isn't exposed in the highs crate we're using. I believe it's also incompatible with integer variables, which we are using in some situations, but I think we could circumvent this with a two-step approach. This is all based on a conversation with AI so take with a pinch of salt.

Apart from that I don't really have any other ideas...

This is an interesting idea, but probably quite a lot of work and, as you say, out of scope for this PR anyway.

As an aside, you can actually call the functions in the lower-level highs-sys crate by getting the pointer from your Model struct and passing that in (it'll have to be in an unsafe block), but if we know we want this functionality it would be cleaner to open a PR for the highs crate adding it (I did that to get the objective value thing added).

Annoying that this PR is getting blocked by something completely unrelated! For the time being, do you think it would be ok to turn off this particular regression test until we can get this fixed? What else can we do...?

How about we just skip this test on non-x86 platforms for now?

This should work:

diff --git a/tests/regression.rs b/tests/regression.rs
index 41c933b9..4307bc3e 100644
--- a/tests/regression.rs
+++ b/tests/regression.rs
@@ -26,6 +26,9 @@ define_regression_test!(circularity);
 // Patched examples
 define_regression_test_with_patches!(simple_divisible);
 define_regression_test_with_patches!(simple_npv);
+
+// We get different results on ARM Macs, for reasons that aren't clear
+#[cfg(target_arch = "x86_64")]
 define_regression_test_with_patches!(simple_ironing_out);
 
 // ------  END: regression tests  ------

It would be better if we opened an issue and linked to it in the comment. It would be nice if the examples gave similar results on all the platforms we care about, but it probably shouldn't be a high priority to fix this. But let's keep track of it anyway!

@tsmbland
Copy link
Collaborator Author

tsmbland commented Mar 5, 2026

I'm going to hold this until #1173. That will be another major change to all of the example models, and there's every chance that that will magically fix the issue here. (Or, it could just as likely introduce more issues just like this...)

In any case, this does reveal an underlying issue, so I've opened #1174

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Turn off ironing out loop by default

3 participants