
biomapper2

This is a package for mapping biomedical entities to the KRAKEN knowledge graph, whether starting from text names or vocabulary/ontology IDs (local IDs or CURIEs).

It supports both single-entity lookups and dataset-level batch processing, and performs:

  1. entity linking (text name → CURIE)
  2. ID normalization (messy local ID → CURIE)
  3. entity resolution (CURIE → canonical CURIE, by leveraging the CURIE equivalencies in the KRAKEN knowledge graph)

All CURIEs are represented in Biolink-standard format.
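To illustrate the ID-normalization step (step 2) with a toy sketch — the prefix map and cleaning rules below are illustrative assumptions, not biomapper2's actual internals:

```python
# Toy sketch of ID normalization (messy local ID -> Biolink-style CURIE).
# The prefix map and cleaning rules here are illustrative, not biomapper2's own.

def normalize_local_id(vocab: str, local_id: str) -> str:
    prefixes = {"kegg": "KEGG.COMPOUND", "pubchem": "PUBCHEM.COMPOUND"}
    return f"{prefixes[vocab]}:{local_id.strip().upper()}"

print(normalize_local_id("kegg", " c00487 "))  # KEGG.COMPOUND:C00487
```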

⚠️ Note: This package is in active development. Feedback and issues welcome!

Quick access via PyPI

If you just want to map entities against the hosted production API without running your own server, install the lightweight client package:

pip install ddharmon

See ddharmon on PyPI for usage. It wraps the same REST API documented below, pointed at the production Kestrel instance.

Setup

Install uv (if not already installed)

macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

For other platforms, see uv installation docs.

Clone and install

git clone https://github.com/Phenome-Health/biomapper2.git
cd biomapper2
uv sync --dev

This will create a virtual environment and install all dependencies.

Then create a .env file from the template:

cp .env.example .env

Edit .env to fill in your KESTREL_API_KEY. The file also contains KESTREL_API_URL and optional API authentication settings — see the comments in .env.example for details.
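For reference, a minimal .env might look like this (all values are placeholders; see .env.example for the full set of options):

```
# Knowledge graph endpoint and key (placeholder values)
KESTREL_API_URL=https://kestrel.example.org/api
KESTREL_API_KEY=your-kestrel-key-here

# Optional: require API-key auth on the biomapper2 REST API
BIOMAPPER_API_KEY=local-dev-key
```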

Then run the test suite (uv run pytest) to confirm everything is working.

Usage

Map a single entity to knowledge graph

from biomapper2.mapper import Mapper

mapper = Mapper()

item = {
    'name': 'carnitine',
    'kegg': ['C00487'],
    'pubchem': '10917'
}

mapped_item = mapper.map_entity_to_kg(
    item=item,
    name_field='name',
    provided_id_fields=['kegg', 'pubchem'],
    entity_type='metabolite'
)

Map a dataset to knowledge graph

from biomapper2.mapper import Mapper

mapper = Mapper()

mapper.map_dataset_to_kg(
    dataset='data/examples/olink_protein_metadata.tsv',
    entity_type='protein',
    name_column='Assay',
    provided_id_columns=['UniProt'],
    array_delimiters=['_']
)
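The array_delimiters argument handles cells that pack several IDs into one string (e.g. 'P12345_P67890' in the UniProt column). A minimal sketch of the idea — the splitting logic here is illustrative, not biomapper2's implementation:

```python
import re

def split_id_cell(cell: str, delimiters: list[str]) -> list[str]:
    """Split a multi-valued ID cell on any of the given delimiters."""
    pattern = "|".join(re.escape(d) for d in delimiters)
    return [part for part in re.split(pattern, cell) if part]

print(split_id_cell("P12345_P67890", ["_"]))  # ['P12345', 'P67890']
```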

See examples/ for complete working examples.

REST API

biomapper2 includes a FastAPI server that exposes the mapping pipeline over HTTP.

Run the server

# Local development (with hot reload)
uv run uvicorn biomapper2.api.main:app --reload --port 8001

# Or via Docker
docker compose --profile prod up -d

Once the server is running, interactive API docs are available at http://localhost:8001/docs (Swagger UI) and http://localhost:8001/redoc.

Endpoints

Method  Endpoint                    Description
GET     /api/v1/health              Health check (no auth required)
GET     /api/v1/entity-types        List supported entity types
GET     /api/v1/annotators          List available annotators
GET     /api/v1/vocabularies        List supported vocabularies
POST    /api/v1/map/entity          Map a single entity
POST    /api/v1/map/batch           Map multiple entities (max 1000)
POST    /api/v1/map/dataset         Map an uploaded TSV/CSV file
POST    /api/v1/map/dataset/stream  Stream mapping results as NDJSON

Authentication

Set BIOMAPPER_API_KEY or BIOMAPPER2_API_KEYS (comma-separated) in your .env file to require API key authentication via the X-API-Key header. If no keys are configured, the API runs in open-access mode.
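A minimal client sketch against the /api/v1/map/entity endpoint, using only the standard library. The request-body fields mirror the Mapper arguments shown above, but the exact schema is an assumption here — check the server's interactive docs for the authoritative request/response models:

```python
import json
import urllib.request

API_URL = "http://localhost:8001"  # local dev server from the uvicorn command above
API_KEY = "local-dev-key"          # only needed if key auth is configured

# Assumed payload shape, mirroring Mapper.map_entity_to_kg's arguments.
payload = {
    "item": {"name": "carnitine", "kegg": ["C00487"], "pubchem": "10917"},
    "name_field": "name",
    "provided_id_fields": ["kegg", "pubchem"],
    "entity_type": "metabolite",
}
req = urllib.request.Request(
    f"{API_URL}/api/v1/map/entity",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
)
# Sending the request requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```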

Docker

Quick start

cp .env.example .env    # then edit .env with your KESTREL_API_KEY

docker compose --profile prod up -d
curl http://localhost:8001/api/v1/health

Development

# Start with live code reload (mounts src/ and tests/)
docker compose --profile dev up

# Run tests inside the container
docker compose --profile dev run --rm biomapper2-dev uv run pytest -m "not integration" -v

# Run quality checks
docker compose --profile dev run --rm biomapper2-dev ./scripts/check.sh

Building manually

docker build --target prod -t biomapper2 .        # Production image
docker build --target dev -t biomapper2:dev .     # Development image

Visualize KG mapping performance across datasets

from biomapper2.visualizer import Visualizer

viz = Visualizer()

# collect metrics from JSONs named {dataset}_{entity}_MAPPED_a_summary_stats.json
stats_df = viz.aggregate_stats(
    stats_dir='data/examples/synthetic_stats/'
)

viz.render_heatmap(
    df=stats_df,
    output_path='docs/assets/comparison_viz'  # produces pdf and png by default; configurable via Visualizer()
)

Run examples

uv run python examples/basic_entity_kg_mapping.py
uv run python examples/basic_dataset_kg_mapping.py

Run tests

uv run pytest          # Run all tests
uv run pytest -v       # Run with verbose output
uv run pytest -vs      # Run with verbose output and logging/prints displayed

Note: Tests run automatically on every commit via GitHub Actions (CI/CD).

Development

Quick Start

Run all code quality checks before committing:

./scripts/check.sh     # Run ruff, black, pyright, and pytests
./scripts/fix.sh       # Auto-fix formatting and linting issues

For detailed contribution guidelines, code style standards, and workflow practices, see docs/CONTRIBUTING.md.

Project structure

src/biomapper2/
├── mapper.py                   # Main Mapper class - entry point for entity/dataset mapping
├── models.py                   # Pydantic Entity model for type-safe pipeline processing
├── config.py                   # Configuration (KG API endpoint, logging, etc.)
├── api/                        # FastAPI REST API
│   ├── main.py                 # Application setup, middleware, lifespan
│   ├── auth.py                 # API key authentication
│   ├── models/                 # Request/response Pydantic models
│   └── routes/                 # Endpoint implementations (mapping, discovery)
├── core/
│   ├── annotation_engine.py    # Orchestrates annotation of entities with ontology local IDs
│   ├── annotators/             # Individual annotator implementations (Kestrel text search, etc.)
│   │   ├── base.py             # Base annotator interface
│   │   └── kestrel_text.py     # Kestrel text search annotator
│   ├── normalizer/             # ID normalization package
│   │   ├── normalizer.py       # Main Normalizer class
│   │   ├── validators.py       # ID validation functions for different vocabularies
│   │   ├── cleaners.py         # ID cleaning/standardization functions
│   │   └── vocab_config.py     # Biolink prefix mappings and validator configurations
│   ├── linker.py               # Links curies to knowledge graph nodes
│   └── resolver.py             # Resolves one-to-many entity→KG matches
├── utils.py                    # Utility functions
└── visualizer.py               # Visualize KG performance across datasets

Dockerfile                      # Multi-stage build (builder → dev → prod)
compose.yaml                    # Docker Compose with prod and dev profiles
examples/                       # Working code examples
tests/                          # Pytest test suite
data/                           # Example and groundtruth datasets
scripts/                        # Development scripts (check.sh, fix.sh)

Configuration

Environment variables (set in .env):

  • KESTREL_API_URL - Knowledge graph API endpoint (defaults to production)
  • KESTREL_API_KEY - API key for the Kestrel API

Additional settings in src/biomapper2/config.py:

  • BIOLINK_VERSION_DEFAULT - Default Biolink model version
  • LOG_LEVEL - Logging verbosity (DEBUG, INFO, WARNING, ERROR, CRITICAL)
