feat(units): add an expression parser #173
Open
HaoZeke wants to merge 8 commits into metatensor:main from
Port lumol's Rust expression parser to C++, enabling compound unit expressions like kJ/mol/A^2 and (eV*u)^(1/2) with automatic dimensional validation. Uses SI as internal reference frame to handle non-coherent base units correctly. Adds 2-arg unit_conversion_factor(from, to) alongside the existing 3-arg form (now deprecated). Updates internal C++ callers in model.cpp and system.cpp to the new API.
Register unit_conversion_factor_v2 TorchScript op and add Python dispatcher that routes 2-arg calls to the new parser and 3-arg calls (with deprecation warning) to the legacy wrapper. Update TorchScript callers in model.py to use v2 directly.
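The 2-arg/3-arg routing can be sketched in Python. This is a minimal, hypothetical illustration, not the PR's dispatcher. The names _conversion_factor_v2 and _legacy_conversion_factor stand in for the registered TorchScript ops, and a toy energy-only table replaces the real expression parser.

```python
import warnings

# Toy stand-in for the parser: SI-free ratios to eV, energy units only.
_FACTORS_TO_EV = {"ev": 1.0, "mev": 1e-3}


def _conversion_factor_v2(from_unit: str, to_unit: str) -> float:
    # Hypothetical stand-in for the new 2-arg op.
    return _FACTORS_TO_EV[from_unit.lower()] / _FACTORS_TO_EV[to_unit.lower()]


def _legacy_conversion_factor(quantity: str, from_unit: str, to_unit: str) -> float:
    # Hypothetical stand-in for the legacy 3-arg wrapper, which needed the
    # quantity to pick a per-quantity lookup table.
    if quantity != "energy":
        raise ValueError(f"unsupported quantity '{quantity}'")
    return _conversion_factor_v2(from_unit, to_unit)


def unit_conversion_factor(*args: str) -> float:
    """Route 2-arg calls to the new path and 3-arg calls to the legacy one."""
    if len(args) == 2:
        return _conversion_factor_v2(*args)
    if len(args) == 3:
        warnings.warn(
            "the 3-argument form of unit_conversion_factor is deprecated, "
            "use unit_conversion_factor(from_unit, to_unit) instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return _legacy_conversion_factor(*args)
    raise TypeError(f"expected 2 or 3 arguments, got {len(args)}")
```

Dispatching on the argument count keeps existing 3-arg callers working while steering them toward the new form through the warning.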
C++ tests (15 cases, 29 assertions): simple conversions, compound expressions, fractional powers, case insensitivity, dimension mismatch errors, unknown tokens, backward compat with 3-arg API. Python tests (12 functions): 2-arg API, 3-arg deprecation, ASE cross-validation, compound expressions, error handling, empty string identity, valid unit validation.
Replace per-quantity unit tables in misc.rst with a flat token table grouped by SI dimension, compound expression examples, and the new 2-arg API. Add changelog entries for the parser and deprecation.
- models.cpp: error message changed from "unknown unit 'X' for Y" to
"unknown unit token 'X'" after replacing per-quantity lookup with
expression parser
- models.cpp: use valid unit ("eV") in JSON serialization test instead
of "something" which the parser rejects
- model.cpp: sort quantity names in warning for deterministic output
(unordered_map iteration order is not guaranteed)
- cxx/misc.rst: disambiguate doxygen reference for overloaded
unit_conversion_factor (2-arg and 3-arg)
Branch updated from 979302f to 92bf24c, then from 92bf24c to 0fe0ef7.
GardevoirX (Contributor) reviewed on Mar 4, 2026:
Thanks a lot! I think it would be better if we can use the new functionality to check if the
Address PR metatensor#173 review feedback from GardevoirX:
- Add s, second, ms, us, ns, ps with full-word aliases to time tokens
- Add tests verifying ModelOutput rejects mismatched quantity/unit dims
- Add tests for standalone micro sign (U+00B5) -> Dalton resolution
- Update docs token table and doxygen with new time unit coverage
- Fix stray dash in RST list-table Dimensionless row
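The quantity/unit check exercised by the new ModelOutput tests amounts to comparing dimension vectors. A minimal sketch under stated assumptions: QUANTITY_DIMS, UNIT_DIMS and check_output_unit are hypothetical names, only single tokens are handled (no compound expressions), and the error text is illustrative rather than the library's.

```python
from typing import Dict, Tuple

Dims = Tuple[int, int, int, int, int]  # exponents of [L, T, M, Q, Theta]

# Hypothetical expected dimensions for a few quantities.
QUANTITY_DIMS: Dict[str, Dims] = {
    "energy": (2, -2, 1, 0, 0),
    "length": (1, 0, 0, 0, 0),
    "time": (0, 1, 0, 0, 0),
}

# Hypothetical single-token dimensions, including full-word time aliases.
UNIT_DIMS: Dict[str, Dims] = {
    "ev": (2, -2, 1, 0, 0),
    "angstrom": (1, 0, 0, 0, 0),
    "s": (0, 1, 0, 0, 0),
    "second": (0, 1, 0, 0, 0),
    "ms": (0, 1, 0, 0, 0),
    "fs": (0, 1, 0, 0, 0),
}


def check_output_unit(quantity: str, unit: str) -> None:
    """Raise if the unit's dimension vector differs from the quantity's."""
    expected = QUANTITY_DIMS[quantity]
    actual = UNIT_DIMS.get(unit.lower())  # case-insensitive token lookup
    if actual is None:
        raise ValueError(f"unknown unit token '{unit}'")
    if actual != expected:
        raise ValueError(
            f"unit '{unit}' has dimension {actual}, "
            f"incompatible with quantity '{quantity}'"
        )
```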
Closes #154.
Replaced the per-quantity lookup tables with a Shunting-Yard expression parser
that works on arbitrary compound unit strings in the spirit of lumol.
Each token resolves to an SI conversion factor and a 5-element dimension vector
[L, T, M, Q, Theta]. The parser composes these through multiplication, division,
and exponentiation. The conversion factor between two expressions is the ratio of
their SI factors, computed after verifying that their dimension vectors are equal.
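A minimal Python sketch of this composition, assuming each token is an (SI factor, dimension vector) pair with exponents stored as fractions so that ^(1/2) stays exact. The Unit class, the make helper and the constants below are illustrative, not the C++ types in the PR; mol is treated as a dimensionless count carrying the Avogadro factor, following the token table description.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Tuple

Dims = Tuple[Fraction, ...]  # exponents of [L, T, M, Q, Theta]


@dataclass(frozen=True)
class Unit:
    factor: float  # value of one of this unit, expressed in SI
    dims: Dims

    def __mul__(self, other: "Unit") -> "Unit":
        # Factors multiply, dimension exponents add.
        return Unit(self.factor * other.factor,
                    tuple(a + b for a, b in zip(self.dims, other.dims)))

    def __truediv__(self, other: "Unit") -> "Unit":
        # Factors divide, dimension exponents subtract.
        return Unit(self.factor / other.factor,
                    tuple(a - b for a, b in zip(self.dims, other.dims)))

    def __pow__(self, exponent: Fraction) -> "Unit":
        # Factor is raised to the power, exponents are scaled by it.
        return Unit(self.factor ** float(exponent),
                    tuple(d * exponent for d in self.dims))


def make(factor: float, *dims: int) -> Unit:
    return Unit(factor, tuple(Fraction(d) for d in dims))


def conversion_factor(src: Unit, dst: Unit) -> float:
    """Ratio of SI factors, only after the dimension vectors match."""
    if src.dims != dst.dims:
        raise ValueError(f"dimension mismatch: {src.dims} vs {dst.dims}")
    return src.factor / dst.factor


# Illustrative token values: (SI factor, L, T, M, Q, Theta).
KJ = make(1e3, 2, -2, 1, 0, 0)
MOL = make(6.02214076e23, 0, 0, 0, 0, 0)
EV = make(1.602176634e-19, 2, -2, 1, 0, 0)
ANGSTROM = make(1e-10, 1, 0, 0, 0, 0)
```

With these definitions, 1 kJ/mol comes out at roughly 0.0104 eV, and comparing eV with angstrom raises instead of returning a meaningless ratio.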
API changes
Expression syntax
Operators: * (multiply), / (divide), ^ (power), () (grouping). Whitespace is ignored, matching is case-insensitive, and numeric literals are allowed in exponents.
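The Shunting-Yard step that handles these operators can be sketched in Python. This version only reorders tokens into postfix (RPN) order; whether the C++ port separates parsing and evaluation this way is an assumption, and unit names and numeric exponents are simply passed through as operands. ^ is treated as right-associative and as binding tighter than * and /.

```python
import re
from typing import List

# Binary operators of the unit grammar: precedence and associativity.
PRECEDENCE = {"*": 1, "/": 1, "^": 2}
RIGHT_ASSOCIATIVE = {"^"}
OPERAND = r"[A-Za-z_]+|\d+(?:\.\d+)?"  # unit names or numeric literals


def tokenize(expr: str) -> List[str]:
    """Split into operands, operators and parentheses; whitespace is skipped."""
    return re.findall(OPERAND + r"|[*/^()]|\S", expr)


def to_rpn(expr: str) -> List[str]:
    """Shunting-Yard: convert an infix unit expression to postfix order."""
    output: List[str] = []
    stack: List[str] = []
    for tok in tokenize(expr):
        if tok in PRECEDENCE:
            # Pop operators that bind tighter, or equally tight when the
            # incoming operator is left-associative.
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[tok]
                    and tok not in RIGHT_ASSOCIATIVE)
            ):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()  # drop the matching '('
        elif re.fullmatch(OPERAND, tok):
            output.append(tok)  # operands go straight to the output
        else:
            raise ValueError(f"unexpected character '{tok}'")
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        output.append(op)
    return output
```

Evaluating the postfix list with a stack of (factor, dimensions) pairs then yields the composed unit; a fractional exponent such as ^(1/2) arrives as the operand sequence 1 2 / followed by ^.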
Fractional exponents are written with parenthesized division, e.g. ^(1/2).
Token table
A single flat unordered_map with 30+ entries covering length (angstrom, bohr, nm, m, cm, mm, um), energy (eV, meV, hartree, ry, joule, kcal, kJ), time (fs, ps), mass (u, kg, g, electronmass), charge (e, coulomb), dimensionless (mol), and derived (hbar).
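A flat table with lowercase keys turns case-insensitive lookup into a single dictionary access. The excerpt below is a hypothetical Python stand-in for the C++ unordered_map: entries use exact SI values where those exist, kcal is assumed to be the thermochemical kilocalorie (4184 J), and the error text mirrors the "unknown unit token 'X'" message quoted in the commit notes.

```python
from typing import Dict, Tuple

# (SI factor, [L, T, M, Q, Theta])
Entry = Tuple[float, Tuple[int, int, int, int, int]]

# Hypothetical excerpt; keys are lowercase so lookups ignore case.
TOKEN_TABLE: Dict[str, Entry] = {
    "angstrom": (1e-10, (1, 0, 0, 0, 0)),
    "nm": (1e-9, (1, 0, 0, 0, 0)),
    "m": (1.0, (1, 0, 0, 0, 0)),
    "ev": (1.602176634e-19, (2, -2, 1, 0, 0)),
    "kcal": (4184.0, (2, -2, 1, 0, 0)),
    "kj": (1e3, (2, -2, 1, 0, 0)),
    "fs": (1e-15, (0, 1, 0, 0, 0)),
    "ps": (1e-12, (0, 1, 0, 0, 0)),
    "kg": (1.0, (0, 0, 1, 0, 0)),
    "g": (1e-3, (0, 0, 1, 0, 0)),
    "e": (1.602176634e-19, (0, 0, 0, 1, 0)),
    "coulomb": (1.0, (0, 0, 0, 1, 0)),
    "mol": (6.02214076e23, (0, 0, 0, 0, 0)),  # dimensionless count
}


def lookup(token: str) -> Entry:
    """Case-insensitive token lookup with a descriptive error."""
    try:
        return TOKEN_TABLE[token.lower()]
    except KeyError:
        raise ValueError(f"unknown unit token '{token}'") from None
```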
Notes
kelvin is NOT in the token table because conversions between offset-based temperature scales (Celsius, Fahrenheit) are non-multiplicative.
DIM_TEMPERATURE exists as dimension [0,0,0,0,1] for potential future use, but no tokens currently carry it (maybe once we do an API break; can revisit during mini-metatomic).
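To see why a factor-only table cannot cover offset scales, here is a small Python check: under a purely multiplicative conversion, value_out / value_in would be the same at every temperature, but for Celsius to Fahrenheit it is not. The helper names are illustrative only.

```python
def celsius_to_fahrenheit(t_c: float) -> float:
    """Affine map: a scale factor plus an offset."""
    return t_c * 9.0 / 5.0 + 32.0


def celsius_to_kelvin(t_c: float) -> float:
    """Also affine: same step size as kelvin, but shifted zero."""
    return t_c + 273.15


# A multiplicative conversion would give the same ratio at every point;
# the offset makes these two ratios differ, so no single factor exists.
ratios = [celsius_to_fahrenheit(t) / t for t in (1.0, 100.0)]

# Temperature differences do convert by a plain factor, since the
# offset cancels out.
delta_f = celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(20.0)
```

Temperature differences, by contrast, convert with a plain factor (9/5 for Celsius to Fahrenheit) because the offset cancels.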