mapping-suite-sdk

pylint

Note: The Pylint badge is a static indicator. For actual Pylint scores, see the automated Pylint reports in PR comments generated by our CI checks.

The Mapping Suite SDK (MSSDK) is a software development kit designed to standardize and simplify the handling of packages that contain transformation rules and related artefacts for mapping data from XML to RDF using the RDF Mapping Language (RML).

Mapping package anatomy

A mapping package is a standardized collection of files and directories that contains all the necessary components for transforming data from one format to another, specifically from XML to RDF using RDF Mapping Language (RML).

Structure Overview

A mapping package consists of the following core components:

  1. Metadata - Essential identifying information about the package including:

    • Identifier
    • Title
    • Issue date
    • Description
    • Mapping version
    • Ontology version
    • Type
    • Eligibility constraints
    • Signature (hash digest for integrity verification)
  2. Conceptual Mapping Asset - Excel spreadsheets that define high-level mapping concepts and relationships between source data and target ontologies.

  3. Technical Mapping Suite - A collection of implementation-specific mapping files:

    • RML Mapping files - Define transformations from heterogeneous data structures to RDF
  4. Vocabulary Mapping Suite - Files that define specific value transformations and mappings between source and target data values (JSON, CSV, XML).

  5. Test Data Suites - Collections of test data files used for validation and verification of mapping processes.

  6. SPARQL Test Suites - Collections of SPARQL query files used for testing and validation of the transformed data.

  7. SHACL Test Suites - Collections of SHACL (Shapes Constraint Language) files used for RDF data validation.

Package Structure Diagram

mapping-package/
├── metadata.json                  # Package metadata
├── transformation/                # Transformation assets
│   ├── conceptual_mappings.xlsx   # Excel file with conceptual mappings
│   ├── mappings/                  # Technical mapping suite
│   │   ├── mapping1.rml.ttl       # RML mapping files
│   │   ├── mapping2.rml.ttl
│   │   └── mapping3.rml.ttl
│   └── resources/                 # Vocabulary mapping suite
│       ├── codelist1.json         # Value mapping files in various formats
│       └── codelist2.csv
├── validation/                    # Validation assets
│   ├── shacl/                     # SHACL test suites
│   │   └── shacl_suite1/          # Domain-specific SHACL shapes
│   │       └── shape1.ttl         # SHACL shape files
│   └── sparql/                    # SPARQL test suites
│       └── sparql_suite1/         # Category-specific SPARQL queries
│           ├── query1.rq          # SPARQL query files
│           └── query2.rq
└── test_data/                     # Test data suites
    ├── test_data_suite1/          # Test case directory
    │   └── input.xml              # Input test data
    └── test_data_suite2/          # Another test case directory
        └── input.xml              # Input test data

The diagram above is the full layout. Not all packages include every part; structure varies by version.
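
To see which parts of the full layout a given folder actually contains, a few lines of standard-library Python are enough. This is only a sketch: the relative paths are copied from the diagram, while `FULL_LAYOUT` and `missing_components` are hypothetical names that are not part of the SDK.

```python
from pathlib import Path

# Relative paths copied from the diagram above.
# FULL_LAYOUT and missing_components are illustrative names, not SDK API.
FULL_LAYOUT = (
    "metadata.json",
    "transformation/conceptual_mappings.xlsx",
    "transformation/mappings",
    "transformation/resources",
    "validation/shacl",
    "validation/sparql",
    "test_data",
)


def missing_components(package_dir: Path) -> list[str]:
    """Return the entries of FULL_LAYOUT that do not exist under package_dir."""
    return [rel for rel in FULL_LAYOUT if not (package_dir / rel).exists()]
```

An empty result means the folder has every part of the full layout; a lightweight package would be expected to report the conceptual mapping, validation, and test_data entries as missing.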

Version variations:

  • v3L (lightweight) - Only what’s needed for transformation: metadata, technical mappings (RML), and vocabulary resources. It has no conceptual mapping, test_data, validation (SHACL/SPARQL), or output.
  • v1 - metadata.json with XSD version constraints (min_xsd_version); no @context.
  • v2 - metadata.json with eForms SDK constraints (eforms_sdk_versions); no @context.
  • v3 (full) - Metadata is JSON-LD (@context, project_identifier); the package must include transformation/conceptual_mappings.xlsx plus the full transformation/validation/test_data layout.

This structure supports consistent loading, validation, and conversion across versions.
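
The markers listed above are enough for a rough version guess. The sketch below is a heuristic built on assumptions, not the SDK's own auto-detection: it assumes that v3L metadata also carries @context, that v3 and v3L can be told apart by the presence of conceptual_mappings.xlsx, and that min_xsd_version and eforms_sdk_versions may sit anywhere inside metadata.json. The function names are hypothetical.

```python
import json
from pathlib import Path


def _contains_key(node, key: str) -> bool:
    """Return True if key appears anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        return key in node or any(_contains_key(value, key) for value in node.values())
    if isinstance(node, list):
        return any(_contains_key(item, key) for item in node)
    return False


def guess_package_version(package_dir: Path) -> str:
    """Guess 'v1', 'v2', 'v3' or 'v3L' from metadata.json and the folder layout.

    Heuristic only; the SDK's real detection logic may differ.
    """
    metadata = json.loads((package_dir / "metadata.json").read_text(encoding="utf-8"))
    if "@context" in metadata:
        # Assumption: JSON-LD metadata without a conceptual mapping means v3L.
        conceptual = package_dir / "transformation" / "conceptual_mappings.xlsx"
        return "v3" if conceptual.exists() else "v3L"
    if _contains_key(metadata, "eforms_sdk_versions"):
        return "v2"
    if _contains_key(metadata, "min_xsd_version"):
        return "v1"
    raise ValueError(f"Cannot determine package version for {package_dir}")
```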

Quick Start

Requires Python 3.10–3.12. Python 3.13+ is not yet supported because the SDK requires pandas 2.1.4, which some of our users currently need.

Install the SDK using pip:

pip install mapping-suite-sdk

or using poetry:

poetry add mapping-suite-sdk

Supported package versions: v1, v2, v3, and v3L (v3 lightweight). Besides the CLI, you can load, convert, and validate packages programmatically with load_mapping_package: it auto-detects the version, optionally converts to v3 or v3L, and can validate and/or persist to MongoDB. Version-specific loaders are also available.

Loading a Mapping Package

Version-agnostic (recommended): load_mapping_package loads a package from a folder and auto-detects its version; it can optionally convert to v3 or v3L and validate:

from pathlib import Path
from mapping_suite_sdk.tools.services.load_mapping_package import load_mapping_package

# Auto-detect version, convert to v3 (full) or v3L (lightweight), optionally validate
package = load_mapping_package(
    Path("/path/to/mapping/package"),
    include_test_data=True,   # True → v3, False → v3L
    validate_package=False,   # set True to validate before/after conversion
)

Version-specific: load from folder, archive, or GitHub (example for v2):

from pathlib import Path
from mapping_suite_sdk.mapping_package_v2.services.load_mapping_package_v2 import (
    load_mapping_package_v2_from_folder,
    load_mapping_package_v2_from_archive,
    load_mapping_packages_v2_from_github,
)

package = load_mapping_package_v2_from_folder(mapping_package_folder_path=Path("/path/to/package"))
# Or: load_mapping_package_v2_from_archive(...), load_mapping_packages_v2_from_github(...)

The same pattern exists for v1, v3, and v3L under the corresponding modules (mapping_suite_sdk.mapping_package_v1, mapping_suite_sdk.mapping_package_v3, etc.).

Serializing a Mapping Package

Version-specific serialisers write a package to a folder (e.g. serialise_mapping_package_v2_to_folder). Conversion (see the CLI section below) also writes the converted package back in place.

CLI: Update (convert) and validate packages

Conversion matrix: The SDK only supports converting to v3 or v3L. You can convert from v1, v2, or v3 into v3 or v3L as below. There is no conversion to v1 or v2 (e.g. v1→v2 is not implemented), and no conversion from v3 or v3L back to v1 or v2.

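From → To   v1    v2    v3    v3L
v1          -     no    yes   yes
v2          no    -     yes   yes
v3          no    no    -     yes
v3L         no    no    no    -

yes = supported, no = not supported, - = same version (skipped).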

Update (convert) a single package (in-place):

# v1 or v2 → v3 (full package)
mssdk convert --to-version v3 --from-version v2 from-package /path/to/package
mssdk convert --to-version v3 --from-version v1 from-package /path/to/package

# v1, v2, or v3 → v3L (lightweight)
mssdk convert --to-version v3L --from-version v3 from-package /path/to/package
mssdk convert --to-version v3L --from-version v2 from-package /path/to/package

Update all packages in a folder (in-place):

mssdk convert --to-version v3 --from-version v2 from-folder /path/to/mappings/folder
mssdk convert --to-version v3L --from-version v3 from-folder /path/to/mappings/folder

Packages already in the target version are skipped. Use --verbose for detailed logs.
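
When conversions are scripted, it can help to reject unsupported version pairs before the CLI is invoked. The following is a minimal sketch, assuming only the flags and subcommands shown in the examples above; `SUPPORTED_CONVERSIONS` and `build_convert_command` are hypothetical helpers, not part of the SDK.

```python
# Supported (from, to) pairs, as described in the conversion notes above.
# These names are illustrative and do not come from the SDK.
SUPPORTED_CONVERSIONS = {
    ("v1", "v3"), ("v1", "v3L"),
    ("v2", "v3"), ("v2", "v3L"),
    ("v3", "v3L"),
}


def build_convert_command(
    from_version: str,
    to_version: str,
    path: str,
    whole_folder: bool = False,
) -> list[str]:
    """Build the argument list for an `mssdk convert` call.

    Raises ValueError for version pairs that the conversion notes rule out.
    """
    if (from_version, to_version) not in SUPPORTED_CONVERSIONS:
        raise ValueError(f"Unsupported conversion: {from_version} -> {to_version}")
    source = "from-folder" if whole_folder else "from-package"
    return [
        "mssdk", "convert",
        "--to-version", to_version,
        "--from-version", from_version,
        source, path,
    ]
```

The returned list can be passed to subprocess.run(command, check=True) so that a failed conversion raises an exception instead of passing silently.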

Validate packages (structure, hash, etc.) from folder, archive, or GitHub:

# Validate all packages under a folder (optionally fix invalid hashes)
mssdk validate from-folder /path/to/mappings/folder
mssdk validate from-folder --update-hash /path/to/mappings/folder

# Validate a single package from a ZIP
mssdk validate from-archive path/to/package.zip
mssdk validate from-archive --include-test-data path/to/package.zip

# Validate packages from a GitHub repo
mssdk validate from-github https://github.com/org/repo mappings/*
mssdk validate from-github --branch main https://github.com/org/repo mappings/*

Validate options: --include-test-data, --include-output, --update-hash (from-folder), --branch (from-github), --verbose.
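
The hash check compares the signature stored in the metadata with a digest computed over the package. This document does not specify the algorithm or which files are covered, so the sketch below is purely illustrative: it assumes SHA-256 over relative paths and file contents in sorted order, and it skips metadata.json because the signature itself is stored there. `package_digest` is a hypothetical name, and the SDK's real scheme may differ.

```python
import hashlib
from pathlib import Path


def package_digest(package_dir: Path, exclude: tuple[str, ...] = ("metadata.json",)) -> str:
    """Compute a deterministic SHA-256 digest over the files of a package folder.

    Assumed scheme for illustration only, not the SDK's actual signature algorithm.
    """
    relative_paths = sorted(
        path.relative_to(package_dir).as_posix()
        for path in package_dir.rglob("*")
        if path.is_file()
    )
    digest = hashlib.sha256()
    for rel in relative_paths:
        if rel in exclude:
            continue
        # Hash the path as well as the content so that renames change the digest.
        digest.update(rel.encode("utf-8"))
        digest.update((package_dir / rel).read_bytes())
    return digest.hexdigest()
```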

Extractors

Extract packages from ZIP archives or GitHub using ArchiveExtractor and GitHubExtractor:

from pathlib import Path
from mapping_suite_sdk.core.adapters.extractor import ArchiveExtractor, GitHubExtractor

# Archive: extract to path or temporary directory
extractor = ArchiveExtractor()
output_path = extractor.extract(Path("package.zip"), Path("output_directory"))
with extractor.extract_temporary(Path("package.zip")) as temp_path:
    pass  # cleanup automatic

# GitHub: extract packages matching a pattern
extractor = GitHubExtractor()
with extractor.extract_temporary(
    repository_url="https://github.com/org/repo",
    packages_path_pattern="mappings/package*",
    branch_or_tag_name="v1.0.0"
) as package_paths:
    for path in package_paths:
        print(f"Found package at: {path}")
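
For readers unfamiliar with the temporary-extraction pattern used above, here is a standard-library sketch of the same idea. It is not the SDK's implementation: `extract_zip_temporarily` is a hypothetical function that only illustrates how a context manager can hand out an extracted folder and remove it automatically on exit.

```python
import tempfile
import zipfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def extract_zip_temporarily(archive_path: Path) -> Iterator[Path]:
    """Extract a ZIP archive into a temporary directory and yield that directory.

    The directory and everything in it are deleted when the with-block exits.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(tmp_dir)
        yield Path(tmp_dir)
```

Anything that must outlive the with-block has to be copied elsewhere before the block ends.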

MongoDB Support

Use MongoDBRepository with the model class for the version you use (e.g. MappingPackageV2, MappingPackageV3):

from pathlib import Path
from pymongo import MongoClient
from mapping_suite_sdk.core.adapters.repository import MongoDBRepository
from mapping_suite_sdk.mapping_package_v2.models.mapping_package_v2 import MappingPackageV2
from mapping_suite_sdk.mapping_package_v2.services.load_mapping_package_v2 import (
    load_mapping_package_v2_from_folder,
    load_mapping_package_v2_from_mongo_db,
)

mongo_client = MongoClient("mongodb://localhost:27017/")
repository = MongoDBRepository(
    model_class=MappingPackageV2,
    mongo_client=mongo_client,
    database_name="mapping_suites",
    collection_name="packages",
)

package = load_mapping_package_v2_from_folder(mapping_package_folder_path=Path("/path/to/package"))
repository.create(package)

retrieved_package = load_mapping_package_v2_from_mongo_db(
    mapping_package_id=package.id,
    mapping_package_repository=repository,
)
packages = repository.read_many({"metadata.version": "1.0.0"})

OpenTelemetry Tracing

Tracing is supported through the SDK config and tracer helpers. mssdk_config is defined in mapping_suite_sdk.config; for set_mssdk_tracing, get_mssdk_tracing, and add_span_processor_to_mssdk_tracer_provider, see the docs for the modules that define them. Enable tracing and add span processors (e.g. console or OTLP) as needed; the full documentation has examples.

Contributing

Contributions to the Mapping Suite SDK are welcome! Please use the fork-and-pull-request workflow.

Development Setup

git clone https://github.com/meaningfy-ws/mapping-suite-sdk.git
cd mapping-suite-sdk
make install
make test-unit   # runs generate-models then unit tests (LinkML → Python)

Python models are generated from LinkML schemas in resources/schema/. After schema changes, run make generate-models before tests.

Dependency Restrictions

  • LinkML 1.9.5 onwards introduces breaking changes in our data
  • Click 8.2 onwards introduces breaking changes in our CLI
  • Pandas 2.1.4 and OpenTelemetry 1.29.0 are required due to a downstream consumer which relies on Airflow 2.10.x
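
These restrictions can be checked against the current environment with a short script. The sketch below is an assumption-laden illustration: the bounds are read off the list above, the distribution names (in particular opentelemetry-api for OpenTelemetry) are guesses, and `CONSTRAINTS`, `parse_version`, `is_allowed`, and `check_environment` are hypothetical names, not part of the SDK.

```python
import re
from importlib import metadata

# Rules read off the list above. Distribution names are assumptions,
# especially "opentelemetry-api" for the OpenTelemetry requirement.
CONSTRAINTS = {
    "linkml": ("below", (1, 9, 5)),
    "click": ("below", (8, 2)),
    "pandas": ("exactly", (2, 1, 4)),
    "opentelemetry-api": ("exactly", (1, 29, 0)),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Parse the leading numeric components of a version string into a tuple."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_allowed(name: str, installed: str) -> bool:
    """Return True if the installed version satisfies the rule for name."""
    rule, bound = CONSTRAINTS[name]
    version = parse_version(installed)
    return version < bound if rule == "below" else version == bound


def check_environment() -> dict[str, str]:
    """Map each installed distribution that breaks its rule to its version."""
    problems = {}
    for name in CONSTRAINTS:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # Not installed, so there is nothing to check.
        if not is_allowed(name, installed):
            problems[name] = installed
    return problems
```

Calling check_environment() in a fresh virtual environment returns an empty dictionary when every installed dependency is within bounds.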

Get in Touch
