All notable changes to the DataBUS project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
2.0.0 - 2026-03-05
- Universal YAML template (data/template_example.yml).
- Example CSV data file (data/data_example.csv) demonstrating the full column set.
- Comprehensive test suite with coverage reporting via Codecov.
- CI pipeline with Ruff linting, pytest + coverage, and Codecov upload (.github/workflows/ci.yml).
- MkDocs documentation site with auto-generated API reference via mkdocstrings.
- Tutorials rewritten to reflect the actual two-pass workflow (databus_example.py).
- OpenSSF Best Practices badge tracking.
- Major refactor of the validation/upload architecture (BU-334, BU-349): each validator now also handles insertion when a populated databus dict is supplied, eliminating the separate neotomaUploader module and reducing code duplication.
- Refactored pull_params into smaller, testable helper functions in utils.py, removing the dependency on pandas.
- Contact handling consolidated: all contact types (PI, collector, processor, analyst) now go through valid_contact, with chronology modeler assignment handled within valid_chronologies. This significantly reduces repeated code.
- Data upload now tracks inserted IDs so that data uncertainties can be linked correctly.
- Chronology handling improved to properly manage calendar years, default chronologies, and sample age linkage.
- Geopolitical unit insertion updated to handle entities like Scotland under the UK.
- Improved logging with logging_dict and per-file .valid.log output.
- Adopted Ruff as the sole linter and formatter, replacing previous tooling.
- Switched to uv for dependency management and script execution.
- Chron controls now handle calendar years properly.
- U-Th series insertion works correctly when the number of geochron indices differs from sample indices.
- Fixed dataset–publication and dataset–database linking during upload.
- Fixed collector insertion for NODE community datasets.
- Fixed variable validation to handle null values without comparing null against null.
- Numerous typos across chroncontrols.py, sample.py, Chronology.py, and others.
1.0.0 - 2025-11-27
- Support for speleothem datasets (SISAL community): U-Th series, external speleothem data, speleothem reference inserts, and entity samples.
- ExternalSpeleothem class and corresponding valid_external_speleothem validator.
- UThSeries class with independent insertion of U-series analytical data.
- Lead-210 (210Pb) community support with lead model classes and geochronology workflows.
- Ostracode surface sample support.
- Script for batch speleothem reference inserts after initial upload.
- hash_file and check_file helpers for file integrity verification before upload.
- safe_step wrapper for error-safe validation with automatic logging and rollback.
- CITATION.cff for academic citation.
- code_of_conduct.md.
- Expanded contact name parsing to handle initials and periods in given names.
- Improved handling of diverse data groups across communities.
- Geochronology data handling for SISAL-specific dating methods.
- Entity cover insertion errors in the database layer.
- Various fixes for community-specific edge cases (NODE, 210Pb, SISAL).
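
The 1.0.0 entries mention hash_file and check_file helpers for verifying file integrity before upload, but this changelog does not show their signatures or behaviour. As a rough illustration of the general idea only, the sketch below uses hypothetical names (sha256_of_file, file_is_unchanged) and assumes a SHA-256 digest; it is not the DataBUS implementation.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file (hypothetical helper).

    The file is read in chunks so large data files are not loaded
    into memory all at once.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_is_unchanged(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with one recorded earlier
    (hypothetical helper; the digest algorithm is an assumption)."""
    return sha256_of_file(path) == recorded_digest
```

Under these assumptions, a digest would be recorded when a file passes validation and compared again just before upload, so that a file edited between the two passes is caught.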
- Initial release of DataBUS.
- Core data classes: Site, CollectionUnit, AnalysisUnit, Sample, Dataset, Datum, Variable, Chronology, ChronControl, Geochron, GeochronControl, Contact, Geog, Hiatus, Response.
- Validation framework with neotomaValidator module.
- Helper utilities: template_to_dict, read_csv, pull_params, pull_required.
- CLI argument parsing via parse_arguments.
- Basic pollen dataset upload workflow.
[]: https://github.com/NeotomaDB/DataBUS/releases/tag/v0.0.1