Stop polishing when it matters, not when it is scheduled
Research-use software. Predictions should be interpreted together with assembly-quality metrics and domain expertise.
Version: 1.0.2
- Installation
- Build Dataset
- Usage
- Overview
- Problem Formulation
- Key Features
- Performance Highlights
- Model Details
- Tools & Dependencies
- Citation
- Acknowledgments
- Contact
- License
- Limitations
- Future Improvements
- References
Oxford Nanopore sequencing enables high-contiguity bacterial genome assemblies, but iterative polishing pipelines are commonly executed with a fixed number of rounds. In practice, this can lead to unnecessary computation after assembly quality has already stabilized.
ESDP is a machine learning-based decision-support framework designed to recommend whether polishing can stop early or should continue. Rather than predicting exact polishing rounds directly for deployment, ESDP combines statistical prediction with explicit operational safeguards to produce conservative and transparent stopping recommendations.
The framework was developed and evaluated on 805 polishing records derived from 41 bacterial samples spanning 9 genera, 5 polishing rounds, and 4 coverage groups (10X, 20X, 40X, and FULL). To avoid information leakage, evaluation was performed with sample-level separation, using 32 training samples and 9 held-out test samples.
ESDP supports both command-line and service-based use. The software includes a REST API implemented with FastAPI, containerized deployment, reproducible model artifacts, and dedicated verification scripts for decision logic, API behavior, benchmarking, and sensitivity analysis.
In the current benchmark, Random Forest was selected as the final model for downstream deployment and benchmarking based on its overall balance across exact classification, ordinal agreement, and error minimization on the held-out test set.
Fixed-round polishing workflows are simple to run, but they often continue processing after assembly quality has already stabilized. This can lead to unnecessary computational cost, limited visibility into convergence behavior, and little practical support for adaptive stopping decisions.
ESDP addresses this problem by using machine learning to recommend a conservative polishing strategy based on early-round assembly-quality signals. Instead of relying only on a raw classifier output, the framework combines statistical prediction with explicit decision rules to support transparent and reproducible early-stop recommendations.
During development, several challenges had to be addressed to make adaptive polishing decisions reliable and deployable.
A direct five-class formulation based on exact polishing rounds proved difficult to model robustly and was less suitable for deployment. To make the task more stable and operationally meaningful, ESDP reformulates the problem into three stopping categories:
- Early: stop after 1–2 rounds
- Medium: stop after 3–4 rounds
- Late: continue to round 5
This representation improves decision stability while preserving the ordinal structure of the polishing process.
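The three-class mapping can be sketched as a small helper. This function and its name are illustrative only, not part of the ESDP codebase; the round-to-class thresholds are the ones listed above.

```python
def stopping_class(optimal_round: int) -> int:
    """Map an optimal polishing round (1-5) to an ESDP-style stopping class.

    Hypothetical helper for illustration; thresholds follow the
    Early (1-2) / Medium (3-4) / Late (5) scheme described above.
    """
    if optimal_round <= 2:
        return 1  # Early: stop after 1-2 rounds
    if optimal_round <= 4:
        return 2  # Medium: stop after 3-4 rounds
    return 3      # Late: continue to round 5
```

Collapsing adjacent rounds this way trades round-level resolution for label stability, which is what makes the ordinal three-class task easier to learn than the original five-class one.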
Single-round metrics alone do not fully describe whether polishing is still improving meaningfully or has already plateaued. To address this, ESDP uses engineered features that summarize both assembly quality and convergence behavior, including:
- round-to-round delta features
- first-round normalized metrics
- ratio-based and cumulative descriptors
- plateau indicators
- domain-specific summary features
These features help the model distinguish between early convergence, transitional cases, and assemblies that still benefit from continued polishing.
Because multiple records can be derived from the same biological sample across rounds and coverage groups, row-level splitting would produce information leakage. ESDP therefore uses sample-level grouping during training and evaluation so that held-out predictions reflect genuinely unseen isolates rather than repeated observations from the same sample.
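Sample-level grouping of this kind is commonly implemented with scikit-learn's group-aware splitters. The sketch below uses synthetic data (the array shapes and group sizes are illustrative) to show that no sample contributes records to both partitions:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in: rows are polishing records, groups are sample IDs.
rng = np.random.default_rng(0)
n_samples, records_per_sample = 8, 5
X = rng.normal(size=(n_samples * records_per_sample, 3))
y = rng.integers(1, 4, size=n_samples * records_per_sample)
groups = np.repeat(np.arange(n_samples), records_per_sample)

# Split at the sample level, not the row level.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))
```

Because the split is performed on `groups`, every record from a given isolate lands entirely in either the training or the test partition, which is the leakage guarantee described above.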
A key design principle of ESDP is that model prediction and deployment-time decision support are treated separately. The final recommendation layer applies explicit safeguards to make outputs more conservative and interpretable, including:
- first-round quality overrides
- confidence-aware escalation to more conservative recommendations
- optional forced-conservative operation
This design makes ESDP more suitable for practical workflow integration than a classifier alone.
Instead of predicting exact polishing rounds directly for deployment, ESDP predicts stopping categories that are more stable and operationally meaningful:
| Class | Strategy | Recommended rounds | Description |
|---|---|---|---|
| 1 | Early | 1–2 | Assembly quality stabilizes early |
| 2 | Medium | 3–4 | Transitional regime with moderate additional benefit |
| 3 | Late | 5 | Continued polishing is recommended |
This formulation preserves the ordinal structure of the polishing process while reducing unnecessary sensitivity to adjacent round-level differences. The held-out test results also showed that most residual errors occurred between adjacent classes rather than as extreme misclassifications.
ESDP uses a structured feature space built from per-round polishing and assembly-quality measurements. The feature set includes:
- round-to-round delta features
- ratio-based descriptors
- cumulative improvement variables
- first-round normalized metrics
- plateau indicators
- domain-specific summary features
These features were designed to capture not only absolute assembly quality, but also convergence behavior and diminishing returns across polishing rounds.
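Several of these feature families can be derived directly from a per-round quality trajectory. The sketch below uses an invented QV trajectory and an arbitrary plateau threshold of 0.2; the variable names are assumptions for illustration, since the README only describes feature families, not exact definitions:

```python
import numpy as np

# Illustrative per-sample trajectory of a quality metric (e.g. QV) over 5 rounds.
qv = np.array([38.0, 41.5, 42.3, 42.4, 42.4])

deltas = np.diff(qv)                   # round-to-round delta features
r1_normalized = qv / qv[0]             # first-round normalized metrics
cumulative_gain = qv - qv[0]           # cumulative improvement variables
plateau_flags = np.abs(deltas) < 0.2   # plateau indicator: negligible change
```

On this trajectory the deltas shrink from 3.5 to 0.0, so the plateau flags switch on after round 3, which is exactly the diminishing-returns signal the model is meant to pick up.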
A key design element of ESDP is the separation between statistical prediction and operational decision logic. Model outputs are processed by a dedicated decision module that applies explicit safeguards to support conservative and transparent recommendations. These include:
- first-round quality overrides
- confidence-aware escalation to more conservative recommendations
- optional forced-conservative operation
Returned outputs include the predicted class, recommended number of polishing rounds, class probabilities, confidence score, decision rationale, warning flags, and metadata on applied overrides.
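The decision layer described above can be sketched as follows. The confidence threshold, the class-to-rounds mapping, and the response fields are illustrative placeholders, not ESDP's actual defaults:

```python
def decide(pred_class: int, probs: dict, min_confidence: float = 0.6,
           force_conservative: bool = False) -> dict:
    """Hypothetical sketch of a confidence-aware decision layer."""
    confidence = probs[pred_class]
    final = pred_class
    overrides = []
    if force_conservative or confidence < min_confidence:
        # Escalate toward the more conservative (later-stopping) class.
        final = min(pred_class + 1, 3)
        overrides.append("forced_conservative" if force_conservative
                         else "low_confidence_escalation")
    rounds = {1: 2, 2: 4, 3: 5}[final]  # illustrative class-to-rounds mapping
    return {
        "predicted_class": pred_class,
        "final_class": final,
        "recommended_rounds": rounds,
        "confidence": confidence,
        "applied_overrides": overrides,
    }
```

Separating this logic from the classifier means the safeguards can be audited and tuned without retraining the model, which is the design rationale stated above.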
ESDP supports both command-line execution and service-based inference. The software includes:
- a REST API implemented with FastAPI
- containerized deployment
- bundled model artifacts and feature metadata
- verification scripts for decision logic, API behavior, and pipeline integration
- reproducible benchmarking and sensitivity-analysis workflows
This design allows the framework to function both as software and as a reproducible decision system for adaptive genome polishing.
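A service request might look like the sketch below. The field names, the `/predict` path, and the payload shape are assumptions for illustration only; the actual request schema is defined by the bundled API code:

```python
import json

# Hypothetical request payload for the REST API; field names are illustrative.
payload = {
    "sample_id": "isolate_01",
    "coverage_group": "20X",
    "round_metrics": [
        {"round": 1, "qv": 38.0, "busco_complete": 97.1},
        {"round": 2, "qv": 41.5, "busco_complete": 98.4},
    ],
}
body = json.dumps(payload)
# e.g. POST this body to the running service:
#   curl -X POST http://localhost:8000/predict \
#        -H "Content-Type: application/json" -d "$body"
```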
The evaluation workflow generates standardized artifacts for inspection and reuse, including:
- trained models
- preprocessing objects
- selected feature lists
- confusion matrices
- feature-importance visualizations
- comparative performance summaries
These outputs support reproducibility, model inspection, and downstream benchmarking.
- 41 bacterial samples
- 805 polishing records
- 9 bacterial genera
- 5 polishing rounds
- 4 coverage groups: 10X, 20X, 40X, and FULL
- Sample-level split: 32 training samples / 9 held-out test samples
The current benchmark dataset was generated from Oxford Nanopore bacterial read sets assembled with Flye and processed through an iterative polishing workflow under a fixed experimental configuration.
On the held-out sample-level test set, the selected Random Forest model achieved:
| Metric | Value |
|---|---|
| Accuracy | 0.629 |
| Balanced Accuracy | 0.592 |
| Macro-F1 | 0.568 |
| MAE | 0.482 |
| Accuracy ±1 class | 0.888 |
| QWK | 0.561 |
Class-wise performance was strongest for the early and late stopping categories, while the intermediate class remained the most difficult to resolve. For Random Forest, recall was 0.822 for class 1, 0.286 for class 2, and 0.667 for class 3.
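The metrics in the table above are all standard and reproducible with scikit-learn. The sketch below computes them on a tiny synthetic label set (the values here are illustrative, not the ESDP benchmark results):

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             mean_absolute_error)

# Tiny synthetic example of the reported evaluation metrics.
y_true = np.array([1, 1, 2, 2, 3, 3])
y_pred = np.array([1, 2, 2, 3, 3, 3])

bal_acc = balanced_accuracy_score(y_true, y_pred)         # mean per-class recall
mae = mean_absolute_error(y_true, y_pred)                 # ordinal error in classes
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")  # QWK
acc_within_1 = np.mean(np.abs(y_true - y_pred) <= 1)      # Accuracy +/-1 class
```

QWK penalizes distant misclassifications quadratically, so it complements "Accuracy ±1 class" in capturing the ordinal structure of the stopping task.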
Across 34 valid test trajectories derived from 9 held-out samples, ESDP achieved:
- 0.60 CPU-hours saved per trajectory on average (95% CI: 0.44–0.76)
- 44.71% mean CPU reduction (95% CI: 32.94–56.50)
- 33/34 trajectories with zero QV loss (97.1%)
- 34/34 trajectories within the predefined acceptable loss range (100.0%)
- 200.17% mean efficiency gain relative to fixed five-round polishing
These results indicate that ESDP can provide practical computational savings while preserving assembly quality in the evaluated benchmark setting.
During development, ESDP compared multiple supervised learning strategies, including XGBoost, Random Forest, ordinal regression, and ensemble configurations. On the held-out sample-level test set, Random Forest achieved the best overall balance across exact classification, ordinal agreement, and error minimization, and was therefore selected as the final model for downstream benchmarking and deployment.
A key design principle of ESDP is the separation between statistical prediction and deployment-time decision support. Model outputs are processed through a dedicated decision layer that applies explicit safeguards to support conservative and interpretable recommendations. These safeguards include:
- first-round quality overrides
- confidence-aware escalation toward more conservative recommendations
- optional forced-conservative operation
Returned outputs include the predicted stopping class, recommended number of polishing rounds, class probabilities, confidence score, rationale, warning flags, and metadata on applied overrides.
ESDP uses engineered features designed to capture both absolute assembly quality and polishing dynamics. These include:
- base assembly and polishing metrics
- round-to-round delta features
- first-round normalized metrics
- ratio-based descriptors
- cumulative improvement variables
- plateau indicators
- domain-specific summary features
To reduce training-serving skew, preprocessing is embedded within the serialized model pipeline. Infinite values are converted to missing values, missing features are preserved until inference-time preprocessing, and imputation and scaling are applied consistently during both training and deployment.
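Embedding preprocessing in the serialized pipeline is idiomatic in scikit-learn. The sketch below shows the pattern on synthetic data; the hyperparameters and imputation strategy are placeholders, not ESDP's tuned configuration:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Synthetic feature matrix containing both an infinity and a missing value.
X = np.array([[1.0, np.inf], [2.0, 3.0], [np.nan, 4.0], [3.0, 5.0]])
y = np.array([1, 1, 2, 2])

X = np.where(np.isinf(X), np.nan, X)  # infinities become missing values

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing features
    ("scale", StandardScaler()),                   # consistent scaling
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pipe.fit(X, y)  # preprocessing is serialized together with the model
preds = pipe.predict(X)
```

Because the imputer and scaler are fitted inside the pipeline, serving applies exactly the statistics learned at training time, which is what prevents training-serving skew.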
The current benchmark dataset contains 805 polishing records derived from 41 bacterial samples, spanning 9 genera, 5 polishing rounds, and 4 coverage groups (10X, 20X, 40X, and FULL). Evaluation was performed with sample-level separation to prevent leakage between coverage-derived observations from the same isolate, resulting in 32 training samples and 9 held-out test samples.
On the held-out test set, the final Random Forest model achieved:
| Metric | Value |
|---|---|
| Accuracy | 0.629 |
| Balanced Accuracy | 0.592 |
| Macro-F1 | 0.568 |
| MAE | 0.482 |
| Accuracy ±1 class | 0.888 |
| QWK | 0.561 |
Most residual errors occurred between adjacent stopping categories rather than as extreme misclassifications. Class-wise recall for the final Random Forest model was 0.822 for class 1, 0.286 for class 2, and 0.667 for class 3.
To assess practical value, ESDP was compared against three baseline decision strategies on the held-out test set:
- Always Late (fixed five-round polishing)
- QV-threshold rule
- R1-only Random Forest
| Strategy | Balanced Accuracy | Macro-F1 | MAE | QWK |
|---|---|---|---|---|
| ESDP | 0.592 | 0.568 | 0.482 | 0.561 |
| Always Late | 0.333 | 0.231 | 0.735 | 0.000 |
| QV Threshold | 0.333 | 0.231 | 0.735 | 0.000 |
| R1-only RF | 0.566 | 0.571 | 0.465 | 0.538 |
These comparisons show that ESDP substantially outperforms the naive fixed and threshold-based strategies, while the competitive R1-only model indicates that first-round signals already carry substantial predictive information. The full ESDP framework nonetheless retained the best balanced accuracy and quadratic weighted kappa.
ESDP builds on the following core tools and libraries:
- Flye - long-read assembler for bacterial genome reconstruction
- Racon - consensus polishing for long-read assemblies
- Medaka - neural network-based polishing for Oxford Nanopore data
- QUAST - assembly quality assessment
- BUSCO - genome completeness assessment
- scikit-learn - machine learning framework
- XGBoost - gradient boosting models evaluated during development
- imbalanced-learn - resampling utilities used during model development
- FastAPI - REST API layer
- Docker - containerized deployment
If you use ESDP in your research, please cite the archived Zenodo release:
@software{lucas_esdp_2026,
author = {Jimmy Lucas and Roger de Pedro},
title = {ESDP: Early Stop Decision Polishing},
year = {2026},
publisher = {Zenodo},
version = {v1.0.2},
doi = {10.5281/zenodo.18910597},
url = {https://doi.org/10.5281/zenodo.18910597}
}

We thank the bioinformatics community for the development and maintenance of the open-source software that supports this project, including Flye, Racon, Medaka, QUAST, BUSCO, scikit-learn, XGBoost, FastAPI, and Docker. We also acknowledge the NCBI Sequence Read Archive for providing public access to the sequencing datasets used to construct the benchmark dataset.
- GitHub: @jimmlucas
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Questions: GitHub Discussions Q&A
- GitHub: @AMRmicrobiology
ESDP is released under the MIT License. See the LICENSE file for details.
Current limitations of ESDP include:
- The benchmark dataset comprises 805 polishing records derived from 41 bacterial samples, so the effective biological diversity represented during development remains limited.
- The current evaluation was performed under a specific Oxford Nanopore + Flye polishing configuration, and performance may differ across other assemblers, polishers, sequencing conditions, or species distributions.
- The intermediate stopping category remains the most difficult to resolve, indicating that transitional polishing states are less separable than clear early or late stopping scenarios.
- ESDP requires the availability of the expected assembly-quality metrics and feature inputs used by the trained pipeline.
Planned directions for future development include:
- Broader external validation across additional bacterial genera, coverage regimes, and polishing workflows.
- Recalibration of confidence thresholds for different operational settings.
- Incorporation of richer early-round features to improve transitional-state detection.
- Exploration of alternative decision architectures for more robust medium-class recommendations.
- Prospective evaluation in routine deployment settings.
- Flye: Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019;37(5):540-546.
- Racon: Vaser R, Sović I, Nagarajan N, Šikić M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017;27(5):737-746.
- Medaka: Oxford Nanopore Technologies. Medaka. Available from: https://github.com/nanoporetech/medaka
- QUAST: Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072-1075.
- BUSCO: Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210-3212.
- XGBoost: Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 785-794.
- scikit-learn: Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825-2830.
- SMOTE: Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321-357.
- Ordinal classification: Frank E, Hall M. A simple approach to ordinal classification. In: European Conference on Machine Learning; 2001. p. 145-156.