Stop polishing when it matters, not when it is scheduled
Research-use software. Predictions should be interpreted together with assembly-quality metrics and domain expertise.
Version: 1.0.2
- Installation
- Build Dataset
- Usage
- Overview
- Problem Formulation
- Key Features
- Performance Highlights
- Model Details
- Tools & Dependencies
- Citation
- Acknowledgments
- Contact
- License
- Limitations
- Future Improvements
- References
Oxford Nanopore sequencing enables high-contiguity bacterial genome assemblies, but iterative polishing pipelines are commonly executed with a fixed number of rounds. In practice, this can lead to unnecessary computation after assembly quality has already stabilized.
ESDP is a machine learning-based decision-support framework designed to recommend whether polishing can stop early or should continue. Rather than predicting exact polishing rounds directly for deployment, ESDP combines statistical prediction with explicit operational safeguards to produce conservative and transparent stopping recommendations.
The framework was developed and evaluated on 805 polishing records derived from 41 bacterial samples spanning 9 genera, 5 polishing rounds, and 4 coverage groups (10X, 20X, 40X, and FULL). To avoid information leakage, evaluation was performed with sample-level separation, using 32 training samples and 9 held-out test samples.
ESDP supports both command-line and service-based use. The software includes a REST API implemented with FastAPI, containerized deployment, reproducible model artifacts, and dedicated verification scripts for decision logic, API behavior, benchmarking, and sensitivity analysis.
In the current benchmark, Random Forest was selected as the final model for downstream deployment and benchmarking based on its overall balance across exact classification, ordinal agreement, and error minimization on the held-out test set.
Fixed-round polishing workflows are simple to run, but they often continue processing after assembly quality has already stabilized. This can lead to unnecessary computational cost, limited visibility into convergence behavior, and little practical support for adaptive stopping decisions.
ESDP addresses this problem by using machine learning to recommend a conservative polishing strategy based on early-round assembly-quality signals. Instead of relying only on a raw classifier output, the framework combines statistical prediction with explicit decision rules to support transparent and reproducible early-stop recommendations.
During development, several challenges had to be addressed to make adaptive polishing decisions reliable and deployable.
A direct five-class formulation based on exact polishing rounds proved difficult to model robustly and was less suitable for deployment. To make the task more stable and operationally meaningful, ESDP reformulates the problem into three stopping categories:
- Early: stop after 1–2 rounds
- Medium: stop after 3–4 rounds
- Late: continue to round 5
This representation improves decision stability while preserving the ordinal structure of the polishing process.
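The three-class mapping can be sketched as a small helper. This function and its name are illustrative only, not part of the ESDP codebase; the round-to-class thresholds are the ones listed above.

```python
def stopping_class(optimal_round: int) -> int:
    """Map an optimal polishing round (1-5) to an ESDP-style stopping class.

    Hypothetical helper for illustration; thresholds follow the
    Early (1-2) / Medium (3-4) / Late (5) scheme described above.
    """
    if optimal_round <= 2:
        return 1  # Early: stop after 1-2 rounds
    if optimal_round <= 4:
        return 2  # Medium: stop after 3-4 rounds
    return 3      # Late: continue to round 5
```

Collapsing adjacent rounds this way trades round-level resolution for label stability, which is what makes the ordinal three-class task easier to learn than the original five-class one.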
Single-round metrics alone do not fully describe whether polishing is still improving meaningfully or has already plateaued. To address this, ESDP uses engineered features that summarize both assembly quality and convergence behavior, including:
- round-to-round delta features
- first-round normalized metrics
- ratio-based and cumulative descriptors
- plateau indicators
- domain-specific summary features
These features help the model distinguish between early convergence, transitional cases, and assemblies that still benefit from continued polishing.
Because multiple records can be derived from the same biological sample across rounds and coverage groups, row-level splitting would produce information leakage. ESDP therefore uses sample-level grouping during training and evaluation so that held-out predictions reflect genuinely unseen isolates rather than repeated observations from the same sample.
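Sample-level grouping of this kind is commonly implemented with scikit-learn's group-aware splitters. The sketch below uses synthetic data (the array shapes and group sizes are illustrative) to show that no sample contributes records to both partitions:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in: rows are polishing records, groups are sample IDs.
rng = np.random.default_rng(0)
n_samples, records_per_sample = 8, 5
X = rng.normal(size=(n_samples * records_per_sample, 3))
y = rng.integers(1, 4, size=n_samples * records_per_sample)
groups = np.repeat(np.arange(n_samples), records_per_sample)

# Split at the sample level, not the row level.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))
```

Because the split is performed on `groups`, every record from a given isolate lands entirely in either the training or the test partition, which is the leakage guarantee described above.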
A key design principle of ESDP is that model prediction and deployment-time decision support are treated separately. The final recommendation layer applies explicit safeguards to make outputs more conservative and interpretable, including:
- first-round quality overrides
- confidence-aware escalation to more conservative recommendations
- optional forced-conservative operation
This design makes ESDP more suitable for practical workflow integration than a classifier alone.
Instead of predicting exact polishing rounds directly for deployment, ESDP predicts stopping categories that are more stable and operationally meaningful:
| Class | Strategy | Recommended rounds | Description |
|---|---|---|---|
| 1 | Early | 1–2 | Assembly quality stabilizes early |
| 2 | Medium | 3–4 | Transitional regime with moderate additional benefit |
| 3 | Late | 5 | Continued polishing is recommended |
This formulation preserves the ordinal structure of the polishing process while reducing unnecessary sensitivity to adjacent round-level differences. The held-out test results also showed that most residual errors occurred between adjacent classes rather than as extreme misclassifications.
ESDP uses a structured feature space built from per-round polishing and assembly-quality measurements. The feature set includes:
- round-to-round delta features
- ratio-based descriptors
- cumulative improvement variables
- first-round normalized metrics
- plateau indicators
- domain-specific summary features
These features were designed to capture not only absolute assembly quality, but also convergence behavior and diminishing returns across polishing rounds.
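Several of these feature families can be derived directly from a per-round quality trajectory. The sketch below uses an invented QV trajectory and an arbitrary plateau threshold of 0.2; the variable names are assumptions for illustration, since the README only describes feature families, not exact definitions:

```python
import numpy as np

# Illustrative per-sample trajectory of a quality metric (e.g. QV) over 5 rounds.
qv = np.array([38.0, 41.5, 42.3, 42.4, 42.4])

deltas = np.diff(qv)                   # round-to-round delta features
r1_normalized = qv / qv[0]             # first-round normalized metrics
cumulative_gain = qv - qv[0]           # cumulative improvement variables
plateau_flags = np.abs(deltas) < 0.2   # plateau indicator: negligible change
```

On this trajectory the deltas shrink from 3.5 to 0.0, so the plateau flags switch on after round 3, which is exactly the diminishing-returns signal the model is meant to pick up.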
A key design element of ESDP is the separation between statistical prediction and operational decision logic. Model outputs are processed by a dedicated decision module that applies explicit safeguards to support conservative and transparent recommendations. These include:
- first-round quality overrides
- confidence-aware escalation to more conservative recommendations
- optional forced-conservative operation
Returned outputs include the predicted class, recommended number of polishing rounds, class probabilities, confidence score, decision rationale, warning flags, and metadata on applied overrides.
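The decision layer described above can be sketched as follows. The confidence threshold, the class-to-rounds mapping, and the response fields are illustrative placeholders, not ESDP's actual defaults:

```python
def decide(pred_class: int, probs: dict, min_confidence: float = 0.6,
           force_conservative: bool = False) -> dict:
    """Hypothetical sketch of a confidence-aware decision layer."""
    confidence = probs[pred_class]
    final = pred_class
    overrides = []
    if force_conservative or confidence < min_confidence:
        # Escalate toward the more conservative (later-stopping) class.
        final = min(pred_class + 1, 3)
        overrides.append("forced_conservative" if force_conservative
                         else "low_confidence_escalation")
    rounds = {1: 2, 2: 4, 3: 5}[final]  # illustrative class-to-rounds mapping
    return {
        "predicted_class": pred_class,
        "final_class": final,
        "recommended_rounds": rounds,
        "confidence": confidence,
        "applied_overrides": overrides,
    }
```

Separating this logic from the classifier means the safeguards can be audited and tuned without retraining the model, which is the design rationale stated above.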
ESDP supports both command-line execution and service-based inference. The software includes:
- a REST API implemented with FastAPI
- containerized deployment
- bundled model artifacts and feature metadata
- verification scripts for decision logic, API behavior, and pipeline integration
- reproducible benchmarking and sensitivity-analysis workflows
This design allows the framework to function both as software and as a reproducible decision system for adaptive genome polishing.
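A service request might look like the sketch below. The field names, the `/predict` path, and the payload shape are assumptions for illustration only; the actual request schema is defined by the bundled API code:

```python
import json

# Hypothetical request payload for the REST API; field names are illustrative.
payload = {
    "sample_id": "isolate_01",
    "coverage_group": "20X",
    "round_metrics": [
        {"round": 1, "qv": 38.0, "busco_complete": 97.1},
        {"round": 2, "qv": 41.5, "busco_complete": 98.4},
    ],
}
body = json.dumps(payload)
# e.g. POST this body to the running service:
#   curl -X POST http://localhost:8000/predict \
#        -H "Content-Type: application/json" -d "$body"
```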
The evaluation workflow generates standardized artifacts for inspection and reuse, including:
- trained models
- preprocessing objects
- selected feature lists
- confusion matrices
- feature-importance visualizations
- comparative performance summaries
These outputs support reproducibility, model inspection, and downstream benchmarking.
- 41 bacterial samples
- 805 polishing records
- 9 bacterial genera
- 5 polishing rounds
- 4 coverage groups: 10X, 20X, 40X, and FULL
- Sample-level split: 32 training samples / 9 held-out test samples
The current benchmark dataset was generated from Oxford Nanopore bacterial read sets assembled with Flye and processed through an iterative polishing workflow under a fixed experimental configuration.
On the held-out sample-level test set, the selected Random Forest model achieved:
| Metric | Value |
|---|---|
| Accuracy | 0.629 |
| Balanced Accuracy | 0.592 |
| Macro-F1 | 0.568 |
| MAE | 0.482 |
| Accuracy ±1 class | 0.888 |
| QWK | 0.561 |
Class-wise performance was strongest for the early and late stopping categories, while the intermediate class remained the most difficult to resolve. For Random Forest, recall was 0.822 for class 1, 0.286 for class 2, and 0.667 for class 3.
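The metrics in the table above are all standard and reproducible with scikit-learn. The sketch below computes them on a tiny synthetic label set (the values here are illustrative, not the ESDP benchmark results):

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             mean_absolute_error)

# Tiny synthetic example of the reported evaluation metrics.
y_true = np.array([1, 1, 2, 2, 3, 3])
y_pred = np.array([1, 2, 2, 3, 3, 3])

bal_acc = balanced_accuracy_score(y_true, y_pred)         # mean per-class recall
mae = mean_absolute_error(y_true, y_pred)                 # ordinal error in classes
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")  # QWK
acc_within_1 = np.mean(np.abs(y_true - y_pred) <= 1)      # Accuracy +/-1 class
```

QWK penalizes distant misclassifications quadratically, so it complements "Accuracy ±1 class" in capturing the ordinal structure of the stopping task.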
Across 34 valid test trajectories derived from 9 held-out samples, ESDP achieved:
- 0.60 CPU-hours saved per trajectory on average (95% CI: 0.44–0.76)
- 44.71% mean CPU reduction (95% CI: 32.94–56.50)
- 33/34 trajectories with zero QV loss (97.1%)
- 34/34 trajectories within the predefined acceptable loss range (100.0%)
- 200.17% mean efficiency gain relative to fixed five-round polishing
These results indicate that ESDP can provide practical computational savings while preserving assembly quality in the evaluated benchmark setting.
During development, ESDP compared multiple supervised learning strategies, including XGBoost, Random Forest, ordinal regression, and ensemble configurations. On the held-out sample-level test set, Random Forest achieved the best overall balance across exact classification, ordinal agreement, and error minimization, and was therefore selected as the final model for downstream benchmarking and deployment.
A key design principle of ESDP is the separation between statistical prediction and deployment-time decision support. Model outputs are processed through a dedicated decision layer that applies explicit safeguards to support conservative and interpretable recommendations. These safeguards include:
- first-round quality overrides
- confidence-aware escalation toward more conservative recommendations
- optional forced-conservative operation
Returned outputs include the predicted stopping class, recommended number of polishing rounds, class probabilities, confidence score, rationale, warning flags, and metadata on applied overrides.
ESDP uses engineered features designed to capture both absolute assembly quality and polishing dynamics. These include:
- base assembly and polishing metrics
- round-to-round delta features
- first-round normalized metrics
- ratio-based descriptors
- cumulative improvement variables
- plateau indicators
- domain-specific summary features
To reduce training-serving skew, preprocessing is embedded within the serialized model pipeline. Infinite values are converted to missing values, missing features are preserved until inference-time preprocessing, and imputation and scaling are applied consistently during both training and deployment.
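Embedding preprocessing in the serialized pipeline is idiomatic in scikit-learn. The sketch below shows the pattern on synthetic data; the hyperparameters and imputation strategy are placeholders, not ESDP's tuned configuration:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Synthetic feature matrix containing both an infinity and a missing value.
X = np.array([[1.0, np.inf], [2.0, 3.0], [np.nan, 4.0], [3.0, 5.0]])
y = np.array([1, 1, 2, 2])

X = np.where(np.isinf(X), np.nan, X)  # infinities become missing values

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing features
    ("scale", StandardScaler()),                   # consistent scaling
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pipe.fit(X, y)  # preprocessing is serialized together with the model
preds = pipe.predict(X)
```

Because the imputer and scaler are fitted inside the pipeline, serving applies exactly the statistics learned at training time, which is what prevents training-serving skew.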
The current benchmark dataset contains 805 polishing records derived from 41 bacterial samples, spanning 9 genera, 5 polishing rounds, and 4 coverage groups (10X, 20X, 40X, and FULL). Evaluation was performed with sample-level separation to prevent leakage between coverage-derived observations from the same isolate, resulting in 32 training samples and 9 held-out test samples.
On the held-out test set, the final Random Forest model achieved:
| Metric | Value |
|---|---|
| Accuracy | 0.629 |
| Balanced Accuracy | 0.592 |
| Macro-F1 | 0.568 |
| MAE | 0.482 |
| Accuracy ±1 class | 0.888 |
| QWK | 0.561 |
Most residual errors occurred between adjacent stopping categories rather than as extreme misclassifications. Class-wise recall for the final Random Forest model was 0.822 for class 1, 0.286 for class 2, and 0.667 for class 3.
To assess practical value, ESDP was compared against three baseline decision strategies on the held-out test set:
- Always Late (fixed five-round polishing)
- QV-threshold rule
- R1-only Random Forest
| Strategy | Balanced Accuracy | Macro-F1 | MAE | QWK |
|---|---|---|---|---|
| ESDP | 0.592 | 0.568 | 0.482 | 0.561 |
| Always Late | 0.333 | 0.231 | 0.735 | 0.000 |
| QV Threshold | 0.333 | 0.231 | 0.735 | 0.000 |
| R1-only RF | 0.566 | 0.571 | 0.465 | 0.538 |
These comparisons show that ESDP substantially outperforms the naive fixed and threshold-based strategies, while the competitive R1-only model indicates that first-round signals already carry substantial predictive information. The full ESDP framework nonetheless retained the best balanced accuracy and quadratic weighted kappa.
ESDP builds on the following core tools and libraries:
- Flye - long-read assembler for bacterial genome reconstruction
- Racon - consensus polishing for long-read assemblies
- Medaka - neural network-based polishing for Oxford Nanopore data
- QUAST - assembly quality assessment
- BUSCO - genome completeness assessment
- scikit-learn - machine learning framework
- XGBoost - gradient boosting models evaluated during development
- imbalanced-learn - resampling utilities used during model development
- FastAPI - REST API layer
- Docker - containerized deployment
If you use ESDP in your research, please cite the archived Zenodo release:
@software{lucas_esdp_2026,
author = {Jimmy Lucas and Roger de Pedro},
title = {ESDP: Early Stop Decision Polishing},
year = {2026},
publisher = {Zenodo},
version = {v1.0.2},
doi = {10.5281/zenodo.18910597},
url = {https://doi.org/10.5281/zenodo.18910597}
}

We thank the bioinformatics community for the development and maintenance of the open-source software that supports this project, including Flye, Racon, Medaka, QUAST, BUSCO, scikit-learn, XGBoost, FastAPI, and Docker. We also acknowledge the NCBI Sequence Read Archive for providing public access to the sequencing datasets used to construct the benchmark dataset.
- GitHub: @jimmlucas
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Questions: GitHub Discussions Q&A
- GitHub: @AMRmicrobiology
ESDP is released under the MIT License. See the LICENSE file for details.
Current limitations of ESDP include:
- The benchmark dataset comprises 805 polishing records derived from 41 bacterial samples, so the effective biological diversity represented during development remains limited.
- The current evaluation was performed under a specific Oxford Nanopore + Flye polishing configuration, and performance may differ across other assemblers, polishers, sequencing conditions, or species distributions.
- The intermediate stopping category remains the most difficult to resolve, indicating that transitional polishing states are less separable than clear early or late stopping scenarios.
- ESDP requires the availability of the expected assembly-quality metrics and feature inputs used by the trained pipeline.
Planned directions for future development include:
- Broader external validation across additional bacterial genera, coverage regimes, and polishing workflows.
- Recalibration of confidence thresholds for different operational settings.
- Incorporation of richer early-round features to improve transitional-state detection.
- Exploration of alternative decision architectures for more robust medium-class recommendations.
- Prospective evaluation in routine deployment settings.
- Flye: Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019;37(5):540-546.
- Racon: Vaser R, Sović I, Nagarajan N, Šikić M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017;27(5):737-746.
- Medaka: Oxford Nanopore Technologies. Medaka. Available from: https://github.com/nanoporetech/medaka
- QUAST: Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072-1075.
- BUSCO: Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210-3212.
- XGBoost: Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 785-794.
- scikit-learn: Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825-2830.
- SMOTE: Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321-357.
- Ordinal classification: Frank E, Hall M. A simple approach to ordinal classification. In: European Conference on Machine Learning; 2001. p. 145-156.