SPDX-FileCopyrightText	2026-present Arthit Suriyawongkul
SPDX-FileType	DOCUMENTATION
SPDX-License-Identifier	CC0-1.0

Pitloom SBOM generator - implementation summary

Project overview

Successfully implemented a complete, production-ready prototype of an SBOM (Software Bill of Materials) generator for Python projects. Supports Hatchling, Poetry, and setuptools build backends. The generator produces SPDX 3.0 compliant SBOMs in JSON-LD format.

What was delivered

✅ Core functionality

SPDX 3.0 data models (spdx-python-model)
- Fully migrated to the official spdx-python-model library
- Proper JSON-LD serialization and validation
- Deterministic UUIDv5 SPDX document IDs (compute_doc_uuid) keyed on project name, version, normalized dependencies, and SHA-256 Merkle root of wheel files
- Per-element sequential IDs (generate_spdx_id) reproducible across builds
Metadata extraction (src/pitloom/extract/)
- pyproject.py -- reads pyproject.toml; supports PEP 621 [project], Poetry [tool.poetry] (fallback when [project] is absent), and merging of both when both sections are present ([project] wins field-by-field)
- poetry.py -- extracts metadata from [tool.poetry] and [tool.poetry.dependencies]; converts Poetry version specifiers (^, ~, bare versions) to PEP 440; [tool.poetry.group.*] dev/deploy dependency groups are intentionally excluded from the SBOM
- setuptools.py -- reads setup.cfg and setup.py for setuptools projects; detect_build_backend() auto-selects the right extractor; merge_metadata() fills gaps across sources (setup.cfg > setup.py)
- Extracts project metadata (name, version, description, authors, URLs)
- Handles dynamic versions from __about__.py
- Parses dependency specifications with version constraints
- Returns (ProjectMetadata, PitloomConfig) tuple
SPDX 3 exporter (src/pitloom/export/spdx3_json.py)
- JSON-LD output using official bindings and SHACLObjectSet
- Clean API for building SPDX documents and adding elements
- Graceful component ingestion via spdx3.JSONLDDeserializer
SBOM generator (src/pitloom/assemble/)
- generate_sbom() orchestrates the full pipeline
- Builds DocumentModel from extracted metadata
- Passes DocumentModel to build() assembler in assemble/spdx3/
- Merges pre-generated SBOM fragments
- Generates copyright information from metadata
Hatchling build hook (src/pitloom/plugins/hatch.py)
- PitloomBuildHook registered via pluggy entry point ([project.entry-points."hatch"])
- Generates SBOM in initialize(), stages to a TemporaryDirectory
- Appends staged path to build_data["sbom_files"] -- Hatchling 1.28.0+ places it at .dist-info/sboms/<filename> (PEP 770) natively
- finalize() cleans up the staging directory
- Config: sbom-basename, creator-name, creator-email, fragments, enabled
Command-line interface (src/pitloom/__main__.py)
- User-friendly argparse-based CLI
- Default output filename derived from project metadata ({name}-{version}.spdx3.json) or [tool.pitloom] sbom-basename when set
- Creator information options
- Clear error messages
Metadata provenance tracking (src/pitloom/extract/pyproject.py, src/pitloom/loom.py)
- Tracks source of each metadata field
- Records extraction method (static, dynamic, or inferred)
- Supports dynamic introspection via loom.py inspection
- Uses SPDX 3 comment attribute
- See docs/design/metadata-provenance.md
ML tracking SDK (src/pitloom/loom.py)
- Dual-syntax ContextDecorator (@loom.run and with loom.run)
- Emits SPDX 3 SBOM fragments automatically during ML executions
- Seamlessly ingested into project SBOMs using [tool.pitloom.fragments] config

✅ Testing (comprehensive coverage - all passing)

Model & provenance tests
- SPDX ID generation
- CreationMetadata serialization and provenance tracking
- spdx-python-model validation
Metadata extraction tests
- Basic metadata extraction and generic fragment paths
- Error handling for missing files
- Dynamic and build-time version extraction via importlib.metadata
Generator integration tests
- End-to-end SBOM generation
- Generic fragment merging via Deserialization
SDK tracker tests
- test_loom.py verifies both Decorator and Context Manager tracking
- Asserts caller-inspection relative path generation

✅ Quality assurance

Linting: pylint 10.00/10, flake8 clean, ruff clean
Type checking: mypy -- no issues across all source files
Type hints: Comprehensive type annotations throughout
Documentation: Inline docstrings for all public APIs

✅ Documentation

README.md: Complete usage guide with examples
docs/implementation/demo.md: Prototype capabilities and validation
docs/implementation/demo-provenance.md: Provenance tracking demo
docs/design/format-neutral-representation.md: Multi-format support plan
docs/design/metadata-provenance.md: Provenance tracking specification
docs/design/metadata-sources.md: Metadata sources research and integration plan
docs/implementation/setuptools-support.md: Setuptools extractor design and limitations
Inline documentation: Comprehensive docstrings

Validation with sentimentdemo

Successfully generated SPDX 3 SBOM for the reference repository:

$ loom /tmp/sentimentdemo -o sbom.spdx3.json
Generating SBOM for project in: /tmp/sentimentdemo
SBOM written to: sbom.spdx3.json

Generated SBOM structure

Total Elements: 13
CreationInfo: 1 (with timestamp and creator)
Person: 1 (creator information)
SpdxDocument: 1 (root document)
software_Sbom: 1 (SBOM declaration)
software_Package: 5 (main package + 4 dependencies)
Relationship: 4 (dependsOn relationships)

Captured information

Main package:

Name: sentimentdemo
Version: 0.0.2 (dynamically extracted)
Download: https://github.com/bact/sentimentdemo
Description: Full description preserved

Dependencies (all captured correctly):

fasttext: 0.9.3
newmm-tokenizer: 0.2.2
numpy: 1.26.4
th-simple-preprocessor: 0.10.1

Technical achievements

1. Clean architecture

This tree is the canonical reference; README.md and design docs point here.

pitloom/
├── docs/
│   ├── design/
│   │   ├── architecture-overview.md
│   │   ├── format-neutral-representation.md
│   │   ├── hatchling-build-hook.md
│   │   ├── metadata-provenance.md
│   │   ├── metadata-sources.md
│   │   ├── mlflow-extractor.md
│   │   ├── model-metadata-extraction.md
│   │   ├── protobom-evaluation.md
│   │   ├── roadmap.md             # Canonical roadmap
│   │   ├── sbom-enrichment.md
│   │   └── sbom-fragments.md
│   ├── implementation/
│   │   ├── demo.md
│   │   ├── demo-provenance.md
│   │   ├── setuptools-support.md  # Setuptools extractor design and limitations
│   │   └── summary.md             # this file; canonical project structure
│   ├── mascot.png
│   └── resources.md
├── src/
│   └── pitloom/
│       ├── assemble/            # Layers 2+3 -- build DocumentModel + map to spec
│       │   ├── spdx3/           # SPDX 3 specific (future: spdx23, cyclonedx)
│       │   │   ├── ai.py        # AI model element assembly
│       │   │   ├── dataset.py   # Dataset element assembly
│       │   │   ├── deps.py      # Dependency element assembly
│       │   │   ├── document.py  # build(DocumentModel) -> Spdx3JsonExporter
│       │   │   ├── fragments.py # Fragment merging
│       │   │   └── __init__.py
│       │   └── __init__.py      # generate_sbom() orchestrator + backend routing
│       ├── core/                # Format-neutral data models (no SBOM lib deps)
│       │   ├── ai_metadata.py      # AiModelMetadata, ModelFormat
│       │   ├── config.py           # PitloomConfig ([tool.pitloom] settings)
│       │   ├── creation.py         # CreationMetadata (creator / timestamp)
│       │   ├── dataset_metadata.py # DatasetMetadata
│       │   ├── document.py         # DocumentModel (assembled, pre-serialization)
│       │   ├── models.py           # Deterministic UUIDs, Merkle root, SPDX ID generation
│       │   └── project.py          # ProjectMetadata, ProjectFile
│       ├── export/              # Layer 4 -- serialise to physical format
│       │   └── spdx3_json.py    # SPDX 3 JSON-LD serialiser
│       ├── extract/             # Layer 1 -- read from sources
│       │   ├── ai_model.py         # AI model dispatcher + format detection
│       │   ├── _croissant.py       # Croissant metadata parser
│       │   ├── _croissant_keys.py  # Croissant JSON-LD key constants
│       │   ├── _extract_utils.py   # Shared extraction utilities
│       │   ├── _fasttext.py        # fastText (.ftz, .bin)
│       │   ├── _gguf.py            # GGUF (.gguf)
│       │   ├── _hdf5.py            # HDF5 / Keras v1–v2 (.h5, .hdf5)
│       │   ├── _keras.py           # Keras v3 (.keras)
│       │   ├── _numpy.py           # NumPy (.npy, .npz)
│       │   ├── _onnx.py            # ONNX (.onnx)
│       │   ├── _pytorch.py         # PyTorch classic (.pt, .pth)
│       │   ├── _pytorch_pt2.py     # PyTorch PT2 / ExecuTorch (.pt2)
│       │   ├── _safetensors.py     # Safetensors (.safetensors)
│       │   ├── dataset.py          # Dataset metadata extraction (Croissant)
│       │   ├── poetry.py           # [tool.poetry] extractor; Poetry -> PEP 440 conversion
│       │   ├── pyproject.py        # pyproject.toml extractor ([project] + [tool.poetry] merge)
│       │   ├── scanner.py          # Heuristic scanner for AI model files
│       │   └── setuptools.py       # setup.cfg + setup.py extractor; backend detection; merge
│       ├── plugins/             # Build-system integrations
│       │   └── hatch.py         # Hatchling BuildHookInterface (PEP 770)
│       ├── __about__.py         # Package version (__version__)
│       ├── __init__.py
│       ├── __main__.py          # CLI entry point (loom / python -m pitloom)
│       ├── loom.py              # ML tracking SDK (Run context manager / decorator)
│       └── py.typed             # PEP 561 marker
├── tests/
│   ├── fixtures/
│   │   ├── croissant/           # Croissant dataset metadata fixtures
│   │   ├── fasttext/            # fastText model fixtures
│   │   ├── fragments/           # Pre-generated SPDX 3 fragment fixtures
│   │   ├── gguf/                # GGUF model fixtures
│   │   ├── hdf5/                # HDF5 / Keras model fixtures
│   │   ├── keras/               # Keras v3 model fixtures
│   │   ├── numpy/               # NumPy array fixtures
│   │   ├── onnx/                # ONNX model fixtures
│   │   ├── pytorch/             # PyTorch classic model fixtures
│   │   ├── pytorch_pt2/         # PyTorch PT2 / ExecuTorch fixtures
│   │   ├── safetensors/         # Safetensors model fixtures
│   │   ├── sampleproject-hatchling/   # Minimal Hatchling wheel-build fixture
│   │   ├── sampleproject-poetry/      # Real-world Poetry fixture (mistral-inference)
│   │   ├── sampleproject-setuptools/  # Minimal setuptools metadata fixture
│   │   ├── sentimentdemo-handcrafted.spdx3.json
│   │   └── README.md
│   ├── conftest.py
│   ├── test_dataset_metadata.py
│   ├── test_extract_ai_model.py
│   ├── test_extract_croissant.py
│   ├── test_extract_fasttext.py
│   ├── test_extract_gguf.py
│   ├── test_extract_hdf5.py
│   ├── test_extract_keras.py
│   ├── test_extract_numpy.py
│   ├── test_extract_onnx.py
│   ├── test_extract_pytorch.py
│   ├── test_extract_pytorch_pt2.py
│   ├── test_extract_safetensors.py
│   ├── test_fragments.py
│   ├── test_generator.py
│   ├── test_hatch_hook.py
│   ├── test_jcs.py
│   ├── test_loom.py
│   ├── test_main_cli.py
│   ├── test_metadata.py
│   ├── test_models.py
│   ├── test_provenance.py
│   ├── test_poetry.py
│   ├── test_setuptools.py
│   ├── test_spdx3_compliance.py
│   ├── test_spdx3_dataset.py
│   └── test_wheel_integration.py
├── AGENTS.md
├── CHANGELOG.md
├── CITATION.cff
├── LICENSE
├── README.md
├── codemeta.json
└── pyproject.toml               # Project config and Hatchling build settings

2. Extensible design

Easy to add new extractors (PDM, Flit, etc.)
Easy to add new assemblers/exporters (CycloneDX, AIDOC, etc.) consuming the same DocumentModel -- no changes to extractors needed
Clean separation of concerns: extractors -> DocumentModel -> serializers

3. Best practices

src-layout for proper package structure
Type hints with Python 3.10+ compatibility
Comprehensive error handling
Runtime dependencies kept minimal and declared in pyproject.toml

Comparison with reference SBOM

Feature	Reference SBOM	Pitloom Generated	Status
SPDX 3.0 Structure	✅	✅	✅ Complete
Package Metadata	✅	✅	✅ Complete
Dependencies	✅	✅	✅ Complete
Relationships	✅	✅	✅ Complete
File-level Details	✅	⚠️	🔄 Roadmap
AI/Dataset Profiles	✅	✅	✅ Complete
License Expressions	✅	⚠️	🔄 Roadmap

Legend:

✅ Complete: Fully implemented
⚠️ Basic: Core functionality present, enhancements planned
🔄 Roadmap: Planned for future releases

Roadmap

See docs/design/roadmap.md for the canonical, up-to-date roadmap.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pitloom SBOM generator - implementation summary

Project overview

What was delivered

✅ Core functionality

✅ Testing (comprehensive coverage - all passing)

✅ Quality assurance

✅ Documentation

Validation with sentimentdemo

Generated SBOM structure

Captured information

Technical achievements

1. Clean architecture

2. Extensible design

3. Best practices

Comparison with reference SBOM

Roadmap

FilesExpand file tree

summary.md

Latest commit

History

summary.md

File metadata and controls

Pitloom SBOM generator - implementation summary

Project overview

What was delivered

✅ Core functionality

✅ Testing (comprehensive coverage - all passing)

✅ Quality assurance

✅ Documentation

Validation with sentimentdemo

Generated SBOM structure

Captured information

Technical achievements

1. Clean architecture

2. Extensible design

3. Best practices

Comparison with reference SBOM

Roadmap