
[ENH] OpenmMLBenchmarkRunner class #1676

Draft
EmanAbdelhaleem wants to merge 1 commit into openml:main from EmanAbdelhaleem:benchmark

Conversation

@EmanAbdelhaleem
Contributor

This is a draft PR for creating an OpenmMLBenchmarkRunner class that handles running benchmarks in parallel instead of using for loops.
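The parallel-execution idea could be sketched roughly like this with the standard library. Note this is only an illustration of the approach, not the PR's actual class; the `run_task` callable and the result/error bookkeeping are placeholder assumptions:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_benchmark(tasks, run_task, max_workers=4):
    """Run `run_task` on every task in parallel, collecting results and errors.

    A failure in one task is recorded but does not abort the other tasks,
    unlike a plain for loop that would stop at the first exception.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_task, t): t for t in tasks}
        for fut in as_completed(futures):
            task = futures[fut]
            try:
                results[task] = fut.result()
            except Exception as exc:  # isolate the failure, keep going
                errors[task] = exc
    return results, errors
```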

@codecov-commenter

Codecov Report

❌ Patch coverage is 0% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.15%. Comparing base (ede1497) to head (c413501).

Files with missing lines                | Patch % | Lines
openml/benchmarks/benchmark_runner.py   | 0.00%   | 55 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1676      +/-   ##
==========================================
- Coverage   52.80%   52.15%   -0.66%     
==========================================
  Files          37       38       +1     
  Lines        4363     4418      +55     
==========================================
  Hits         2304     2304              
- Misses       2059     2114      +55     

☔ View full report in Codecov by Sentry.

@EmanAbdelhaleem
Contributor Author

EmanAbdelhaleem commented Feb 23, 2026

@jgyasu @fkiraly I need a review here.
I also have 2 questions regarding the pause and resume methods:

  1. Should the pause and resume methods be called when the user presses a keyboard key, like P for pause and R for resume? Or do you have a different workflow in mind?
  2. If the user pauses while a task is in progress, should the task finish before pausing, or should it be interrupted and discarded?

@jgyasu
Contributor

jgyasu commented Feb 23, 2026

I haven't reviewed the code yet but I will answer your question for now,

Should the pause and resume methods be called when the user presses a keyboard key, like P for pause and R for resume? Or do you have a different workflow in mind?

The goal is to make the benchmark runner robust to crashes or estimator failures. For example, if the 4th estimator fails during execution, the entire benchmark run currently breaks, and previously computed results may not be safely stored (correct me if I am wrong). Instead, the runner should persist results locally and incrementally after each successful run.

This would allow us to:

  • Continuously save results as they are produced.
  • Avoid losing progress if a failure occurs.
  • Restart the benchmark later and automatically skip already completed runs.
  • Execute only the remaining combinations.

So, this is more about checkpointing and resumability than about manual pause/resume. Of course, you can then also let the user pause the experiment with a keyboard interrupt, and in that case checkpointing and resumability would still work. @fkiraly do you want to add anything?
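A rough sketch of the checkpoint-and-skip behaviour described above, assuming string run keys and a hypothetical `run_one` callable (neither is the PR's actual API): results are persisted to a local JSON file after each success, and a restarted session skips keys already present in the file.

```python
import json
from pathlib import Path

def run_with_checkpointing(combinations, run_one, checkpoint="results.json"):
    """Run each combination once, persisting results after every success."""
    path = Path(checkpoint)
    # Load results from a previous (possibly interrupted) session.
    done = json.loads(path.read_text()) if path.exists() else {}
    for key in combinations:  # keys must be JSON-serializable strings
        if key in done:       # completed in an earlier session: skip it
            continue
        try:
            done[key] = run_one(key)
        except Exception:
            continue          # one failure must not lose earlier progress
        path.write_text(json.dumps(done))  # checkpoint after each success
    return done
```

On restart, only the remaining (or previously failed) combinations execute, which is exactly the "execute only the remaining combinations" point above.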

@EmanAbdelhaleem
Contributor Author

this is more about checkpointing

As far as I know, we currently deal with classical models, right? So wouldn't checkpointing be overkill?

Also, what exactly do you mean by storing the results? Is it saving the output that is printed to the terminal, or something else?
As far as I know, the current benchmarking behaviour is as follows:

  1. Select a benchmarking Suite
  2. Iterate over the tasks
  3. Run model on task
  4. Store the results by publishing the runs (if you want)

So, storing is done by publishing, and you can publish each run as soon as it finishes; you don't need to wait until the whole experiment is done. The problem with looping is that if one run fails, the whole benchmark experiment crashes. I solved this by using threading instead of looping.

So, I am not sure what exactly you mean by saving results as they are produced. You can already do this by publishing the run, right?

@jgyasu
Contributor

jgyasu commented Feb 23, 2026

As far as I know, we currently deal with classical models, right? So wouldn't checkpointing be overkill?

What does checkpointing have to do with classical models? I did not get you.

So, I am not sure what exactly you mean by saving results as they are produced. You can already do this by publishing the run, right?

What if there is an error in publishing? What happens when the internet connection breaks?

Also, what exactly do you mean by storing the results? Is it saving the output that is printed to the terminal, or something else?

Storing the results of the benchmark runs on local disk, maybe in JSON or CSV format (design question). The results should then be loadable in-memory as dataframes.
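As a sketch of that design option, results could be appended to a local CSV as they are produced and read back afterwards. The field names below are illustrative assumptions only; a real design would pin down the schema:

```python
import csv
from pathlib import Path

FIELDS = ("task_id", "estimator", "score")  # illustrative schema

def append_result(path, row):
    """Append one benchmark result to a CSV file, writing the header once."""
    file = Path(path)
    is_new = not file.exists()
    with file.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)

def load_results(path):
    """Read all stored results back as a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

The same file can then be loaded in-memory as a dataframe with `pandas.read_csv(path)`.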

@EmanAbdelhaleem
Contributor Author

EmanAbdelhaleem commented Feb 23, 2026

What does checkpointing have to do with classical models? I did not get you.

As far as I know, checkpointing is about saving the model's state. This is necessary with deep learning, where training usually takes time, but with classical models a single run usually doesn't take long, so I think checkpointing might be overkill. Let me know what you think.

What if there is an error in publishing? What happens when the internet connection breaks?

I think it would simply not get published then. You can still save the results of a run locally, even with the model pickled, using run.to_filesystem(path). However, it's not saved as a dataframe; it's serialized so it can be uploaded later via OpenMLRun.from_filesystem(path), which instantiates an OpenMLRun object from the saved files.

I think what I can automate here is the path passed to these methods. What do you think?
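One possible way to automate that path, purely as an illustration: derive a deterministic directory per (task, flow) pair and pass it to run.to_filesystem / OpenMLRun.from_filesystem. The `checkpoint_dir` helper and its naming scheme are assumptions for this sketch, not the PR's behaviour:

```python
from pathlib import Path

def checkpoint_dir(root, task_id, flow_name):
    """Derive a deterministic per-run directory, e.g. root/task_31/RandomForest.

    Because the path is a pure function of (task_id, flow_name), a resumed
    session recomputes the same location and can find the serialized run.
    """
    d = Path(root) / f"task_{task_id}" / flow_name
    d.mkdir(parents=True, exist_ok=True)
    return d
```

A run would then be saved with `run.to_filesystem(str(checkpoint_dir(root, task.task_id, flow_name)))` and restored from the same computed path, without the user managing paths by hand.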

