| SPDX-FileCopyrightText | 2026-present Arthit Suriyawongkul |
|---|---|
| SPDX-FileType | DOCUMENTATION |
| SPDX-License-Identifier | CC0-1.0 |
This work implements a complete, production-ready prototype of an SBOM (Software Bill of Materials) generator for Python projects. It supports the Hatchling, Poetry, and setuptools build backends and produces SPDX 3.0-compliant SBOMs in JSON-LD format.
- SPDX 3.0 data models (`spdx-python-model`)
  - Fully migrated to the official `spdx-python-model` library
  - Proper JSON-LD serialization and validation
  - Deterministic UUIDv5 SPDX document IDs (`compute_doc_uuid`) keyed on project name, version, normalized dependencies, and SHA-256 Merkle root of wheel files
  - Per-element sequential IDs (`generate_spdx_id`) reproducible across builds
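
The deterministic document ID described in this item can be illustrated with a minimal sketch. It assumes the key joins the lower-cased project name, the version, the sorted dependency strings, and a Merkle root over per-file SHA-256 digests. The namespace UUID, the helper names `merkle_root` and `sketch_doc_uuid`, and the exact normalization are hypothetical and may differ from Pitloom's `compute_doc_uuid`.

```python
import hashlib
import uuid

# Hypothetical fixed namespace for this sketch; it only needs to stay constant.
SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/sbom-sketch")


def merkle_root(leaf_digests: list[bytes]) -> str:
    """Fold sorted SHA-256 leaf digests pairwise into one hex-encoded root."""
    if not leaf_digests:
        return hashlib.sha256(b"").hexdigest()
    level = sorted(leaf_digests)  # sorting makes the root independent of file order
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def sketch_doc_uuid(
    name: str, version: str, dependencies: list[str], file_contents: list[bytes]
) -> uuid.UUID:
    """Derive a UUIDv5 from project identity, dependencies, and file contents."""
    normalized = sorted(dep.strip().lower() for dep in dependencies)
    leaves = [hashlib.sha256(data).digest() for data in file_contents]
    key = "\n".join([name.lower(), version, ",".join(normalized), merkle_root(leaves)])
    return uuid.uuid5(SKETCH_NAMESPACE, key)


first = sketch_doc_uuid("demo", "1.0", ["NumPy>=1.26", "requests"], [b"a", b"b"])
second = sketch_doc_uuid("demo", "1.0", ["requests", "numpy>=1.26 "], [b"b", b"a"])
print(first == second)  # same inputs in a different order give the same ID
```

Because UUIDv5 is a pure function of namespace and name, rebuilding an unchanged project yields the same ID, while any change to the version, dependencies, or file bytes yields a different one.
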
- Metadata extraction (`src/pitloom/extract/`)
  - `pyproject.py` -- reads `pyproject.toml`; supports PEP 621 `[project]`, Poetry `[tool.poetry]` (fallback when `[project]` is absent), and merging of both when both sections are present (`[project]` wins field-by-field)
  - `poetry.py` -- extracts metadata from `[tool.poetry]` and `[tool.poetry.dependencies]`; converts Poetry version specifiers (`^`, `~`, bare versions) to PEP 440; `[tool.poetry.group.*]` dev/deploy dependency groups are intentionally excluded from the SBOM
  - `setuptools.py` -- reads `setup.cfg` and `setup.py` for setuptools projects; `detect_build_backend()` auto-selects the right extractor; `merge_metadata()` fills gaps across sources (setup.cfg > setup.py)
  - Extracts project metadata (name, version, description, authors, URLs)
  - Handles dynamic versions from `__about__.py`
  - Parses dependency specifications with version constraints
  - Returns `(ProjectMetadata, PitloomConfig)` tuple
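
The Poetry-to-PEP 440 conversion mentioned for `poetry.py` can be sketched as follows. This is an illustrative reimplementation of Poetry's documented caret and tilde rules, not Pitloom's code: the function name `poetry_to_pep440` is hypothetical, and the sketch only handles plain numeric release segments (no pre-release tags or compound constraints).

```python
def _bump(parts: list[int], index: int) -> str:
    """Increment parts[index] and zero every component after it."""
    upper = parts[:index] + [parts[index] + 1] + [0] * (len(parts) - index - 1)
    return ".".join(str(p) for p in upper)


def poetry_to_pep440(spec: str) -> str:
    """Convert a single Poetry constraint to a PEP 440 specifier (sketch)."""
    spec = spec.strip()
    if spec.startswith("^"):
        base = spec[1:].strip()
        parts = [int(p) for p in base.split(".")]
        # Caret: the leftmost non-zero component may not change.
        index = next((i for i, p in enumerate(parts) if p), len(parts) - 1)
        return f">={base},<{_bump(parts, index)}"
    if spec.startswith("~") and not spec.startswith("~="):
        base = spec[1:].strip()
        parts = [int(p) for p in base.split(".")]
        # Tilde: bump the minor component if present, otherwise the major one.
        index = min(1, len(parts) - 1)
        return f">={base},<{_bump(parts, index)}"
    if spec[:1].isdigit():
        # A bare version is an exact pin in Poetry.
        return f"=={spec}"
    return spec  # already PEP 440 style, e.g. ">=1.0,<2.0"


for example in ["^1.2.3", "^0.2.3", "~1.2.3", "1.26.4"]:
    print(example, "->", poetry_to_pep440(example))
```
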
- SPDX 3 exporter (`src/pitloom/export/spdx3_json.py`)
  - JSON-LD output using official bindings and SHACLObjectSet
  - Clean API for building SPDX documents and adding elements
  - Graceful component ingestion via `spdx3.JSONLDDeserializer`
- SBOM generator (`src/pitloom/assemble/`)
  - `generate_sbom()` orchestrates the full pipeline
  - Builds `DocumentModel` from extracted metadata
  - Passes `DocumentModel` to `build()` assembler in `assemble/spdx3/`
  - Merges pre-generated SBOM fragments
  - Generates copyright information from metadata
- Hatchling build hook (`src/pitloom/plugins/hatch.py`)
  - `PitloomBuildHook` registered via pluggy entry point (`[project.entry-points."hatch"]`)
  - Generates SBOM in `initialize()`, stages to a `TemporaryDirectory`
  - Appends staged path to `build_data["sbom_files"]` -- Hatchling 1.28.0+ places it at `.dist-info/sboms/<filename>` (PEP 770) natively
  - `finalize()` cleans up the staging directory
  - Config: `sbom-basename`, `creator-name`, `creator-email`, `fragments`, `enabled`
- Command-line interface (`src/pitloom/__main__.py`)
  - User-friendly argparse-based CLI
  - Default output filename derived from project metadata (`{name}-{version}.spdx3.json`) or `[tool.pitloom] sbom-basename` when set
  - Creator information options
  - Clear error messages
- Metadata provenance tracking (`src/pitloom/extract/pyproject.py`, `src/pitloom/loom.py`)
  - Tracks source of each metadata field
  - Records extraction method (static, dynamic, or inferred)
  - Supports dynamic introspection via `loom.py` inspection
  - Uses SPDX 3 comment attribute
  - See docs/design/metadata-provenance.md
- ML tracking SDK (`src/pitloom/loom.py`)
  - Dual-syntax ContextDecorator (`@loom.run` and `with loom.run`)
  - Emits SPDX 3 SBOM fragments automatically during ML executions
  - Seamlessly ingested into project SBOMs using `[tool.pitloom.fragments]` config
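
The dual-syntax tracker pattern named in the ML tracking SDK item can be sketched with the standard library's `contextlib.ContextDecorator`. The class `SketchRun` and its in-memory `records` list are hypothetical stand-ins; the real `loom.run` emits SPDX 3 fragments rather than dictionaries, and its signature may differ.

```python
import time
from contextlib import ContextDecorator


class SketchRun(ContextDecorator):
    """Hypothetical tracker usable as a decorator or as a context manager."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: list[dict] = []
        self._start = 0.0

    def __enter__(self) -> "SketchRun":
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Record one entry per run; a real tracker would write a fragment file here.
        self.records.append(
            {
                "run": self.name,
                "seconds": time.perf_counter() - self._start,
                "failed": exc_type is not None,
            }
        )
        return False  # never swallow exceptions raised by the tracked code


train_tracker = SketchRun("train")


@train_tracker  # decorator syntax: every call to train() is one tracked run
def train() -> str:
    return "model"


train()

with SketchRun("evaluate") as eval_tracker:  # context-manager syntax
    pass

print(len(train_tracker.records), len(eval_tracker.records))
```

Inheriting from `ContextDecorator` means the `__enter__`/`__exit__` pair is written once and both call styles share it, so the two syntaxes cannot drift apart.
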
- Model & provenance tests
  - SPDX ID generation
  - CreationMetadata serialization and provenance tracking
  - `spdx-python-model` validation
- Metadata extraction tests
  - Basic metadata extraction and generic fragment paths
  - Error handling for missing files
  - Dynamic and build-time version extraction via `importlib.metadata`
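
The dynamic version handling covered by these tests can be sketched as a two-step lookup. The order shown here (static read of `__about__.py` first, installed metadata via `importlib.metadata` second) is an assumption, as are the helper names `read_about_version` and `resolve_version`; the sketch only recognizes a plain `__version__ = "..."` assignment.

```python
import ast
from importlib import metadata
from pathlib import Path
from typing import Optional


def read_about_version(about_file: Path) -> Optional[str]:
    """Read a string-literal __version__ without importing the module."""
    if not about_file.is_file():
        return None
    tree = ast.parse(about_file.read_text(encoding="utf-8"))
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        value = node.value
        if (
            "__version__" in names
            and isinstance(value, ast.Constant)
            and isinstance(value.value, str)
        ):
            return value.value
    return None


def resolve_version(dist_name: str, about_file: Path) -> Optional[str]:
    """Prefer the source tree; fall back to installed distribution metadata."""
    version = read_about_version(about_file)
    if version is not None:
        return version
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Parsing with `ast` avoids executing project code during SBOM generation, which matters when the project's dependencies are not installed in the environment running the generator.
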
- Generator integration tests
  - End-to-end SBOM generation
  - Generic fragment merging via Deserialization
- SDK tracker tests
  - `test_loom.py` verifies both Decorator and Context Manager tracking
  - Asserts caller-inspection relative path generation
- Linting: pylint 10.00/10, flake8 clean, ruff clean
- Type checking: mypy -- no issues across all source files
- Type hints: Comprehensive type annotations throughout
- Documentation: Inline docstrings for all public APIs
- README.md: Complete usage guide with examples
- docs/implementation/demo.md: Prototype capabilities and validation
- docs/implementation/demo-provenance.md: Provenance tracking demo
- docs/design/format-neutral-representation.md: Multi-format support plan
- docs/design/metadata-provenance.md: Provenance tracking specification
- docs/design/metadata-sources.md: Metadata sources research and integration plan
- docs/implementation/setuptools-support.md: Setuptools extractor design and limitations
- Inline documentation: Comprehensive docstrings
Successfully generated SPDX 3 SBOM for the reference repository:
$ loom /tmp/sentimentdemo -o sbom.spdx3.json
Generating SBOM for project in: /tmp/sentimentdemo
SBOM written to: sbom.spdx3.json
- Total Elements: 13
- CreationInfo: 1 (with timestamp and creator)
- Person: 1 (creator information)
- SpdxDocument: 1 (root document)
- software_Sbom: 1 (SBOM declaration)
- software_Package: 5 (main package + 4 dependencies)
- Relationship: 4 (dependsOn relationships)
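
A tally like the one above can be reproduced from a generated file with a few lines of Python. The sketch assumes the SPDX 3 JSON-LD layout in which elements sit in a top-level `@graph` array and carry a `type` key; the helper name `count_element_types` and the tiny in-memory sample are illustrative only and are not taken from the sentimentdemo SBOM.

```python
from collections import Counter


def count_element_types(document: dict) -> Counter:
    """Count elements in a JSON-LD document's @graph by their 'type' value."""
    return Counter(
        element.get("type", "<untyped>") for element in document.get("@graph", [])
    )


# Made-up minimal sample; a real file would be loaded with json.load().
sample = {
    "@context": "https://spdx.org/rdf/3.0.1/spdx-context.jsonld",
    "@graph": [
        {"type": "CreationInfo", "@id": "_:creationinfo"},
        {"type": "software_Package", "name": "app"},
        {"type": "software_Package", "name": "lib"},
        {"type": "Relationship", "relationshipType": "dependsOn"},
    ],
}

counts = count_element_types(sample)
for type_name, number in sorted(counts.items()):
    print(f"{type_name}: {number}")
print("Total Elements:", sum(counts.values()))
```
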
Main package:
- Name: sentimentdemo
- Version: 0.0.2 (dynamically extracted)
- Download: https://github.com/bact/sentimentdemo
- Description: Full description preserved
Dependencies (all captured correctly):
- fasttext: 0.9.3
- newmm-tokenizer: 0.2.2
- numpy: 1.26.4
- th-simple-preprocessor: 0.10.1
This tree is the canonical reference; README.md and design docs point here.
pitloom/
├── docs/
│ ├── design/
│ │ ├── architecture-overview.md
│ │ ├── format-neutral-representation.md
│ │ ├── hatchling-build-hook.md
│ │ ├── metadata-provenance.md
│ │ ├── metadata-sources.md
│ │ ├── mlflow-extractor.md
│ │ ├── model-metadata-extraction.md
│ │ ├── protobom-evaluation.md
│ │ ├── roadmap.md # Canonical roadmap
│ │ ├── sbom-enrichment.md
│ │ └── sbom-fragments.md
│ ├── implementation/
│ │ ├── demo.md
│ │ ├── demo-provenance.md
│ │ ├── setuptools-support.md # Setuptools extractor design and limitations
│ │ └── summary.md # this file; canonical project structure
│ ├── mascot.png
│ └── resources.md
├── src/
│ └── pitloom/
│ ├── assemble/ # Layers 2+3 -- build DocumentModel + map to spec
│ │ ├── spdx3/ # SPDX 3 specific (future: spdx23, cyclonedx)
│ │ │ ├── ai.py # AI model element assembly
│ │ │ ├── dataset.py # Dataset element assembly
│ │ │ ├── deps.py # Dependency element assembly
│ │ │ ├── document.py # build(DocumentModel) -> Spdx3JsonExporter
│ │ │ ├── fragments.py # Fragment merging
│ │ │ └── __init__.py
│ │ └── __init__.py # generate_sbom() orchestrator + backend routing
│ ├── core/ # Format-neutral data models (no SBOM lib deps)
│ │ ├── ai_metadata.py # AiModelMetadata, ModelFormat
│ │ ├── config.py # PitloomConfig ([tool.pitloom] settings)
│ │ ├── creation.py # CreationMetadata (creator / timestamp)
│ │ ├── dataset_metadata.py # DatasetMetadata
│ │ ├── document.py # DocumentModel (assembled, pre-serialization)
│ │ ├── models.py # Deterministic UUIDs, Merkle root, SPDX ID generation
│ │ └── project.py # ProjectMetadata, ProjectFile
│ ├── export/ # Layer 4 -- serialise to physical format
│ │ └── spdx3_json.py # SPDX 3 JSON-LD serialiser
│ ├── extract/ # Layer 1 -- read from sources
│ │ ├── ai_model.py # AI model dispatcher + format detection
│ │ ├── _croissant.py # Croissant metadata parser
│ │ ├── _croissant_keys.py # Croissant JSON-LD key constants
│ │ ├── _extract_utils.py # Shared extraction utilities
│ │ ├── _fasttext.py # fastText (.ftz, .bin)
│ │ ├── _gguf.py # GGUF (.gguf)
│ │ ├── _hdf5.py # HDF5 / Keras v1–v2 (.h5, .hdf5)
│ │ ├── _keras.py # Keras v3 (.keras)
│ │ ├── _numpy.py # NumPy (.npy, .npz)
│ │ ├── _onnx.py # ONNX (.onnx)
│ │ ├── _pytorch.py # PyTorch classic (.pt, .pth)
│ │ ├── _pytorch_pt2.py # PyTorch PT2 / ExecuTorch (.pt2)
│ │ ├── _safetensors.py # Safetensors (.safetensors)
│ │ ├── dataset.py # Dataset metadata extraction (Croissant)
│ │ ├── poetry.py # [tool.poetry] extractor; Poetry -> PEP 440 conversion
│ │ ├── pyproject.py # pyproject.toml extractor ([project] + [tool.poetry] merge)
│ │ ├── scanner.py # Heuristic scanner for AI model files
│ │ └── setuptools.py # setup.cfg + setup.py extractor; backend detection; merge
│ ├── plugins/ # Build-system integrations
│ │ └── hatch.py # Hatchling BuildHookInterface (PEP 770)
│ ├── __about__.py # Package version (__version__)
│ ├── __init__.py
│ ├── __main__.py # CLI entry point (loom / python -m pitloom)
│ ├── loom.py # ML tracking SDK (Run context manager / decorator)
│ └── py.typed # PEP 561 marker
├── tests/
│ ├── fixtures/
│ │ ├── croissant/ # Croissant dataset metadata fixtures
│ │ ├── fasttext/ # fastText model fixtures
│ │ ├── fragments/ # Pre-generated SPDX 3 fragment fixtures
│ │ ├── gguf/ # GGUF model fixtures
│ │ ├── hdf5/ # HDF5 / Keras model fixtures
│ │ ├── keras/ # Keras v3 model fixtures
│ │ ├── numpy/ # NumPy array fixtures
│ │ ├── onnx/ # ONNX model fixtures
│ │ ├── pytorch/ # PyTorch classic model fixtures
│ │ ├── pytorch_pt2/ # PyTorch PT2 / ExecuTorch fixtures
│ │ ├── safetensors/ # Safetensors model fixtures
│ │ ├── sampleproject-hatchling/ # Minimal Hatchling wheel-build fixture
│ │ ├── sampleproject-poetry/ # Real-world Poetry fixture (mistral-inference)
│ │ ├── sampleproject-setuptools/ # Minimal setuptools metadata fixture
│ │ ├── sentimentdemo-handcrafted.spdx3.json
│ │ └── README.md
│ ├── conftest.py
│ ├── test_dataset_metadata.py
│ ├── test_extract_ai_model.py
│ ├── test_extract_croissant.py
│ ├── test_extract_fasttext.py
│ ├── test_extract_gguf.py
│ ├── test_extract_hdf5.py
│ ├── test_extract_keras.py
│ ├── test_extract_numpy.py
│ ├── test_extract_onnx.py
│ ├── test_extract_pytorch.py
│ ├── test_extract_pytorch_pt2.py
│ ├── test_extract_safetensors.py
│ ├── test_fragments.py
│ ├── test_generator.py
│ ├── test_hatch_hook.py
│ ├── test_jcs.py
│ ├── test_loom.py
│ ├── test_main_cli.py
│ ├── test_metadata.py
│ ├── test_models.py
│ ├── test_provenance.py
│ ├── test_poetry.py
│ ├── test_setuptools.py
│ ├── test_spdx3_compliance.py
│ ├── test_spdx3_dataset.py
│ └── test_wheel_integration.py
├── AGENTS.md
├── CHANGELOG.md
├── CITATION.cff
├── LICENSE
├── README.md
├── codemeta.json
└── pyproject.toml # Project config and Hatchling build settings
- Easy to add new extractors (PDM, Flit, etc.)
- Easy to add new assemblers/exporters (CycloneDX, AIDOC, etc.) consuming the same `DocumentModel` -- no changes to extractors needed
- Clean separation of concerns: extractors -> `DocumentModel` -> serializers
- src-layout for proper package structure
- Type hints with Python 3.10+ compatibility
- Comprehensive error handling
- Runtime dependencies kept minimal and declared in `pyproject.toml`
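
The extractor -> model -> serializer separation listed above can be illustrated with a small sketch. Every name here (`SketchDocument`, `extract_from_pep621`, `to_plain_json`, `to_text`, `run_pipeline`) is hypothetical and far simpler than Pitloom's `DocumentModel` and exporters; the point is only that serializers depend on the shared model and never on a particular extractor.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchDocument:
    """Hypothetical format-neutral model shared by all pipeline stages."""

    name: str
    version: str
    dependencies: list[str] = field(default_factory=list)


Extractor = Callable[[dict], SketchDocument]
Serializer = Callable[[SketchDocument], str]


def extract_from_pep621(pyproject: dict) -> SketchDocument:
    """Build the model from an already-parsed PEP 621 [project] table."""
    project = pyproject["project"]
    return SketchDocument(
        name=project["name"],
        version=project["version"],
        dependencies=list(project.get("dependencies", [])),
    )


def to_plain_json(doc: SketchDocument) -> str:
    """One output format: plain JSON with stable key order."""
    return json.dumps(
        {"name": doc.name, "version": doc.version, "dependencies": doc.dependencies},
        sort_keys=True,
    )


def to_text(doc: SketchDocument) -> str:
    """A second output format added without touching any extractor."""
    return f"{doc.name} {doc.version} ({len(doc.dependencies)} dependencies)"


def run_pipeline(source: dict, extractor: Extractor, serializer: Serializer) -> str:
    return serializer(extractor(source))


parsed = {"project": {"name": "demo", "version": "1.0", "dependencies": ["numpy"]}}
print(run_pipeline(parsed, extract_from_pep621, to_plain_json))
print(run_pipeline(parsed, extract_from_pep621, to_text))
```
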
| Feature | Reference SBOM | Pitloom Generated | Status |
|---|---|---|---|
| SPDX 3.0 Structure | ✅ | ✅ | ✅ Complete |
| Package Metadata | ✅ | ✅ | ✅ Complete |
| Dependencies | ✅ | ✅ | ✅ Complete |
| Relationships | ✅ | ✅ | ✅ Complete |
| File-level Details | ✅ | 🔄 Roadmap | |
| AI/Dataset Profiles | ✅ | ✅ | ✅ Complete |
| License Expressions | ✅ | 🔄 Roadmap | |
Legend:
- ✅ Complete: Fully implemented
- ⚠️ Basic: Core functionality present, enhancements planned
- 🔄 Roadmap: Planned for future releases
See docs/design/roadmap.md for the canonical, up-to-date roadmap.