Note: The Pylint badge is a static indicator. For actual Pylint scores, see the automated Pylint reports in PR comments generated by our CI checks.
The Mapping Suite SDK (MSSDK) is a software development kit designed to standardize and simplify the handling of packages that contain transformation rules and related artefacts for mapping data from XML to RDF using the RDF Mapping Language (RML).
A mapping package is a standardized collection of files and directories that contains all the necessary components for transforming data from one format to another, specifically from XML to RDF using RDF Mapping Language (RML).
A mapping package consists of the following core components:
- Metadata - Essential identifying information about the package including:
  - Identifier
  - Title
  - Issue date
  - Description
  - Mapping version
  - Ontology version
  - Type
  - Eligibility constraints
  - Signature (hash digest for integrity verification)
- Conceptual Mapping Asset - Excel spreadsheets that define high-level mapping concepts and relationships between source data and target ontologies.
- Technical Mapping Suite - A collection of implementation-specific mapping files:
  - RML Mapping files - Define transformations from heterogeneous data structures to RDF
- Vocabulary Mapping Suite - Files that define specific value transformations and mappings between source and target data values (JSON, CSV, XML).
- Test Data Suites - Collections of test data files used for validation and verification of mapping processes.
- SPARQL Test Suites - Collections of SPARQL query files used for testing and validation of the transformed data.
- SHACL Test Suites - Collections of SHACL (Shapes Constraint Language) files used for RDF data validation.
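The Signature field above is described as a hash digest for integrity verification. As a rough illustration of that idea only, the sketch below computes a deterministic SHA-256 digest over every file in a folder. The helper name `folder_digest` and the choice of SHA-256 are assumptions; this README does not specify the algorithm the SDK actually uses.

```python
import hashlib
from pathlib import Path


def folder_digest(root: Path) -> str:
    """Return a SHA-256 hex digest over the relative paths and contents of all files under root.

    Hypothetical helper for illustration; the SDK's real signature may be computed differently.
    """
    digest = hashlib.sha256()
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by relative POSIX path so the result does not depend on listing order.
    for path in sorted(files, key=lambda p: p.relative_to(root).as_posix()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between file name and content
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()
```

Changing any file's name or content changes the digest, which is what makes such a value usable for integrity checks. A real implementation would presumably also need to exclude the field that stores the digest itself.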
mapping-package/
├── metadata.json                      # Package metadata
├── transformation/                    # Transformation assets
│   ├── conceptual_mappings.xlsx       # Excel file with conceptual mappings
│   ├── mappings/                      # Technical mapping suite
│   │   ├── mapping1.rml.ttl           # RML mapping files
│   │   ├── mapping2.rml.ttl
│   │   └── mapping3.rml.ttl
│   └── resources/                     # Vocabulary mapping suite
│       ├── codelist1.json             # Value mapping files in various formats
│       └── codelist2.csv
├── validation/                        # Validation assets
│   ├── shacl/                         # SHACL test suites
│   │   └── shacl_suite1/              # Domain-specific SHACL shapes
│   │       └── shape1.ttl             # SHACL shape files
│   └── sparql/                        # SPARQL test suites
│       └── sparql_suite1/             # Category-specific SPARQL queries
│           ├── query1.rq              # SPARQL query files
│           └── query2.rq
└── test_data/                         # Test data suites
    ├── test_data_suite1/              # Test case directory
    │   └── input.xml                  # Input test data
    └── test_data_suite2/              # Another test case directory
        └── input.xml                  # Input test data
The diagram above is the full layout. Not all packages include every part; structure varies by version.
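Because not every package contains every part, it can be handy to see which parts of the layout a given folder actually has. The sketch below uses only the standard library and the relative paths from the diagram; the names `EXPECTED_PARTS` and `present_parts` are hypothetical and not part of the SDK.

```python
from pathlib import Path

# Relative paths taken from the layout diagram. Whether each one is
# required depends on the package version.
EXPECTED_PARTS = (
    "metadata.json",
    "transformation/conceptual_mappings.xlsx",
    "transformation/mappings",
    "transformation/resources",
    "validation/shacl",
    "validation/sparql",
    "test_data",
)


def present_parts(package_root: Path) -> dict[str, bool]:
    """Map each expected relative path to whether it exists under package_root."""
    return {part: (package_root / part).exists() for part in EXPECTED_PARTS}
```

Calling `present_parts(Path("/path/to/package"))` returns a dictionary that can be printed or compared against whatever a given package version is expected to contain.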
Version variations:
- v3L (lightweight) - Only what’s needed for transformation: metadata, technical mappings (RML), and vocabulary resources. It has no conceptual mapping, test_data, validation (SHACL/SPARQL), or output.
- v1 - metadata.json with XSD version constraints (min_xsd_version); no @context.
- v2 - metadata.json with eForms SDK constraints (eforms_sdk_versions); no @context.
- v3 (full) - Metadata as JSON-LD (@context, project_identifier) and must include transformation/conceptual_mappings.xlsx plus the full transformation/validation/test_data layout.
This structure supports consistent loading, validation, and conversion across versions.
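The version markers listed above suggest a simple heuristic for telling versions apart. The following is a sketch of that idea, not the SDK's own auto-detection logic. It assumes the marker keys sit at the top level of metadata.json and that a v3L package uses the same JSON-LD metadata as v3 but lacks the conceptual mapping file; both are assumptions, and `guess_package_version` is a hypothetical name.

```python
def guess_package_version(metadata: dict, has_conceptual_mapping: bool) -> str:
    """Guess a package version from metadata keys (illustrative heuristic only).

    metadata: parsed content of metadata.json.
    has_conceptual_mapping: whether transformation/conceptual_mappings.xlsx exists.
    """
    if "min_xsd_version" in metadata:
        return "v1"  # XSD version constraints
    if "eforms_sdk_versions" in metadata:
        return "v2"  # eForms SDK constraints
    if "@context" in metadata:
        # Assumption: v3 and v3L share JSON-LD metadata and differ in the
        # presence of the conceptual mapping spreadsheet.
        return "v3" if has_conceptual_mapping else "v3L"
    raise ValueError("No known version marker found in metadata")
```

For real packages, prefer the SDK's version-agnostic loader, which performs its own detection.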
Requires Python 3.10–3.12. Python 3.13+ is not yet supported because the SDK currently pins pandas 2.1.4, which some of our users still need.
Install the SDK using pip:
pip install mapping-suite-sdk
or using poetry:
poetry add mapping-suite-sdk
Supported package versions: v1, v2, v3, and v3L (v3 lightweight). Besides the CLI, you can load, convert, and validate packages programmatically using load_mapping_package (no CLI required): it auto-detects version, optionally converts to v3 or v3L, and can validate and/or persist to MongoDB. Version-specific loaders are also available.
Version-agnostic (recommended): use the version-agnostic loader to load from folder with auto-detection; optionally convert to v3 or v3L and validate:
from pathlib import Path
from mapping_suite_sdk.tools.services.load_mapping_package import load_mapping_package

# Auto-detect version, convert to v3 (full) or v3L (lightweight), optionally validate
package = load_mapping_package(
    Path("/path/to/mapping/package"),
    include_test_data=True,   # True → v3, False → v3L
    validate_package=False,   # set True to validate before/after conversion
)
Version-specific: load from folder, archive, or GitHub (example for v2):
from pathlib import Path
from mapping_suite_sdk.mapping_package_v2.services.load_mapping_package_v2 import (
    load_mapping_package_v2_from_folder,
    load_mapping_package_v2_from_archive,
    load_mapping_packages_v2_from_github,
)

package = load_mapping_package_v2_from_folder(mapping_package_folder_path=Path("/path/to/package"))
# Or: load_mapping_package_v2_from_archive(...), load_mapping_packages_v2_from_github(...)
The same pattern exists for v1, v3, and v3L under mapping_suite_sdk.mapping_package_v1, mapping_package_v3, etc.
Version-specific serialisers write a package to a folder (e.g. serialise_mapping_package_v2_to_folder). Conversion (see below) also serialises in-place.
Conversion matrix: The SDK only supports converting to v3 or v3L. You can convert from v1, v2, or v3 into v3 or v3L as below. There is no conversion to v1 or v2 (e.g. v1→v2 is not implemented), and no conversion from v3 or v3L back to v1 or v2.
| From → To | v1 | v2 | v3 | v3L |
|---|---|---|---|---|
| v1 | — | ✗ | ✓ | ✓ |
| v2 | ✗ | — | ✓ | ✓ |
| v3 | ✗ | ✗ | — | ✓ |
| v3L | ✗ | ✗ | ✗ | — |
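The matrix above can be captured directly in a small lookup structure, for example to check a requested conversion before invoking anything. This is a sketch based solely on the table; `SUPPORTED_CONVERSIONS` and `can_convert` are hypothetical names, not SDK APIs.

```python
# Source version -> set of target versions marked ✓ in the conversion matrix.
SUPPORTED_CONVERSIONS = {
    "v1": {"v3", "v3L"},
    "v2": {"v3", "v3L"},
    "v3": {"v3L"},
    "v3L": set(),
}


def can_convert(source: str, target: str) -> bool:
    """Return True if the matrix marks source -> target as supported."""
    if source not in SUPPORTED_CONVERSIONS:
        raise ValueError(f"Unknown source version: {source}")
    return target in SUPPORTED_CONVERSIONS[source]
```

For example, `can_convert("v2", "v3L")` is true, while `can_convert("v3L", "v3")` and `can_convert("v1", "v2")` are false.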
Update (convert) a single package (in-place):
# v1 or v2 → v3 (full package)
mssdk convert --to-version v3 --from-version v2 from-package /path/to/package
mssdk convert --to-version v3 --from-version v1 from-package /path/to/package
# v1, v2, or v3 → v3L (lightweight)
mssdk convert --to-version v3L --from-version v3 from-package /path/to/package
mssdk convert --to-version v3L --from-version v2 from-package /path/to/package
Update all packages in a folder (in-place):
mssdk convert --to-version v3 --from-version v2 from-folder /path/to/mappings/folder
mssdk convert --to-version v3L --from-version v3 from-folder /path/to/mappings/folder
Packages already in the target version are skipped. Use --verbose for detailed logs.
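When conversions are driven from a script, the commands shown above can be assembled as argument lists and passed to `subprocess`. The sketch below only reuses the flags and subcommands that appear in the examples; the function names are hypothetical, and `run_convert` assumes the `mssdk` executable is on the PATH.

```python
import subprocess


def build_convert_command(
    from_version: str,
    to_version: str,
    path: str,
    whole_folder: bool = False,
) -> list[str]:
    """Build the argument list for an `mssdk convert` call as shown in the examples."""
    if to_version not in ("v3", "v3L"):
        # The SDK only converts to v3 or v3L.
        raise ValueError(f"Unsupported target version: {to_version}")
    source = "from-folder" if whole_folder else "from-package"
    return [
        "mssdk", "convert",
        "--to-version", to_version,
        "--from-version", from_version,
        source, str(path),
    ]


def run_convert(command: list[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.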
Validate packages (structure, hash, etc.) from folder, archive, or GitHub:
# Validate all packages under a folder (optionally fix invalid hashes)
mssdk validate from-folder /path/to/mappings/folder
mssdk validate from-folder --update-hash /path/to/mappings/folder
# Validate a single package from a ZIP
mssdk validate from-archive path/to/package.zip
mssdk validate from-archive --include-test-data path/to/package.zip
# Validate packages from a GitHub repo
mssdk validate from-github https://github.com/org/repo mappings/*
mssdk validate from-github --branch main https://github.com/org/repo mappings/*
Validate options: --include-test-data, --include-output, --update-hash (from-folder), --branch (from-github), --verbose.
Extract packages from ZIP archives or GitHub using ArchiveExtractor and GitHubExtractor:
from pathlib import Path
from mapping_suite_sdk.core.adapters.extractor import ArchiveExtractor, GitHubExtractor

# Archive: extract to path or temporary directory
extractor = ArchiveExtractor()
output_path = extractor.extract(Path("package.zip"), Path("output_directory"))
with extractor.extract_temporary(Path("package.zip")) as temp_path:
    pass  # cleanup automatic

# GitHub: extract packages matching a pattern
extractor = GitHubExtractor()
with extractor.extract_temporary(
    repository_url="https://github.com/org/repo",
    packages_path_pattern="mappings/package*",
    branch_or_tag_name="v1.0.0"
) as package_paths:
    for path in package_paths:
        print(f"Found package at: {path}")
Use MongoDBRepository with the model class for the version you use (e.g. MappingPackageV2, MappingPackageV3):
from pathlib import Path
from pymongo import MongoClient
from mapping_suite_sdk.core.adapters.repository import MongoDBRepository
from mapping_suite_sdk.mapping_package_v2.models.mapping_package_v2 import MappingPackageV2
from mapping_suite_sdk.mapping_package_v2.services.load_mapping_package_v2 import (
    load_mapping_package_v2_from_folder,
    load_mapping_package_v2_from_mongo_db,
)

mongo_client = MongoClient("mongodb://localhost:27017/")
repository = MongoDBRepository(
    model_class=MappingPackageV2,
    mongo_client=mongo_client,
    database_name="mapping_suites",
    collection_name="packages",
)
package = load_mapping_package_v2_from_folder(mapping_package_folder_path=Path("/path/to/package"))
repository.create(package)
retrieved_package = load_mapping_package_v2_from_mongo_db(
    mapping_package_id=package.id,
    mapping_package_repository=repository,
)
packages = repository.read_many({"metadata.version": "1.0.0"})
Tracing is supported via the SDK config and tracer helpers. Import from the modules that define them (e.g. mapping_suite_sdk.config for mssdk_config; see docs for set_mssdk_tracing, get_mssdk_tracing, add_span_processor_to_mssdk_tracer_provider). Enable tracing and add span processors (e.g. console or OTLP) as needed; see the full documentation for examples.
Contributions to the Mapping Suite SDK are welcome! Use the fork and pull request workflow.
git clone https://github.com/meaningfy-ws/mapping-suite-sdk.git
cd mapping-suite-sdk
make install
make test-unit # runs generate-models then unit tests (LinkML → Python)
Python models are generated from LinkML schemas in resources/schema/. After schema changes, run make generate-models before tests.
- LinkML 1.9.5 onwards introduces breaking changes in our data
- Click 8.2 onwards introduces breaking changes in our CLI
- Pandas 2.1.4 and OpenTelemetry 1.29.0 are required due to a downstream consumer which relies on Airflow 2.10.x
- Issues: Report bugs and feature requests on our GitHub Issues
- Email: Contact the team at hi@meaningfy.ws
- Website: Visit our website at meaningfy.ws