The OpenEV Data API is engineered using Hexagonal Architecture (Ports and Adapters), implemented as a Rust Cargo workspace. This architectural choice strictly enforces the Dependency Rule, ensuring that the business logic (Core) remains independent of frameworks, databases, and external interfaces.
The system serves two primary objectives:
1. Data Processing Pipeline (ETL): Transform the layered JSON dataset into multiple ready-to-consume formats:
   - Complete canonical JSON (all vehicles expanded and validated)
   - SQLite database (embedded, ready-to-query)
   - PostgreSQL schema + data (production-ready)
   - CSV export (analysis and spreadsheet integration)
   - XML export (legacy system compatibility)
2. API Server: Provide a high-performance REST API for querying vehicle data, deployable in multiple environments (containers, serverless, edge).
The system prioritizes correctness (via strict typing), reproducibility (deterministic builds), and portability (multiple output formats and deployment targets).
Build-Time (CI/CD Pipeline):
- Dataset compilation happens during the release process
- Validation and merge logic executed once
- Multiple output artifacts generated and attached to releases
- Ensures data quality before distribution
Runtime (API Server):
- Serves pre-compiled, validated data
- No runtime validation overhead
- Optimized for read performance
- Stateless and horizontally scalable
The workspace is divided into three distinct crates, each representing a specific architectural layer:
- `crates/ev-core` (The Core / Inner Hexagon):
  - Role: Pure Domain Library
  - Responsibility: Defines domain types (`Vehicle`, `Battery`, `Charging`, etc.), validation rules, and business logic. Contains no I/O, HTTP, or database code. Shared dependency for all other crates.
  - Principle: "Make invalid states unrepresentable"
  - Key Features:
    - Rust structs mirroring the JSON schema
    - Serde serialization/deserialization
    - Type-safe enums for classifications
    - Validation logic for data integrity
    - Schema generation support
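To give a flavor of the "make invalid states unrepresentable" principle, here is a minimal, std-only sketch of how such a domain type might look. The field and type names are illustrative, not the crate's actual schema, and the serde derives the real crate would carry are omitted:

```rust
// Illustrative domain types in the spirit of ev-core: a type-safe
// enum for classification and a constructor that refuses to build an
// invalid value. Names are hypothetical; the real crate mirrors the
// JSON schema and adds #[derive(Serialize, Deserialize)].

#[derive(Debug, PartialEq)]
enum VehicleType {
    Suv,
    Sedan,
    Hatchback,
}

#[derive(Debug)]
struct Battery {
    pack_capacity_kwh_gross: Option<f64>,
    pack_capacity_kwh_net: Option<f64>,
}

impl Battery {
    // Refuse to construct a battery with no capacity at all:
    // the invalid state is unrepresentable past this point.
    fn new(gross: Option<f64>, net: Option<f64>) -> Result<Self, String> {
        if gross.is_none() && net.is_none() {
            return Err("battery needs a gross or net capacity".into());
        }
        Ok(Battery { pack_capacity_kwh_gross: gross, pack_capacity_kwh_net: net })
    }
}
```

Downstream code that holds a `Battery` never needs to re-check the capacity invariant, because no value without one can exist.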
- `crates/ev-etl` (Data Processing Pipeline):
  - Role: CLI Tool for Batch Processing
  - Responsibility: Executes during CI/CD to transform source data into distributable artifacts
  - Key Features:
    - Reads layered JSON files from the dataset repository
    - Implements deep merge strategy (base.json → year → variant)
    - Validates against JSON schema and domain types
    - Generates multiple output formats:
      - `vehicles.json` - Complete canonical dataset
      - `vehicles.db` - SQLite database with full schema
      - `vehicles.sql` - PostgreSQL DDL + INSERT statements
      - `vehicles.csv` - Flattened tabular export
      - `vehicles.xml` - XML representation
    - Provides validation reports and statistics
- `crates/ev-server` (API Server):
  - Role: HTTP REST API Server
  - Responsibility: Exposes vehicle data through HTTP endpoints
  - Key Features:
    - RESTful API design
    - Query by make, model, year, variant
    - Search and filter capabilities
    - OpenAPI/Swagger documentation
    - Multiple deployment targets:
      - Standalone binary (Linux/Windows/macOS)
      - Docker container
      - Kubernetes deployment
      - Serverless functions (future)
- Dependency Inversion: Dependencies point inward; the outer Adapter layers depend on the Core's abstractions, and the Core depends on nothing.
- Single Source of Truth: The dataset repository is the canonical source; all artifacts are derived.
- Zero-Cost Abstractions: Prefer Rust generics and traits over runtime polymorphism, so abstraction costs are resolved at compile time.
- Type-Driven Development: Leverage Rust's type system to prevent invalid states.
- Deterministic Builds: Same input always produces identical output.
- Format Agnostic Core: Core domain logic is independent of serialization formats.
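The last two principles can be sketched together: the Core exposes plain domain data, each output format implements a trait, and generic functions monomorphize the dispatch at compile time. The names here (`OutputFormat`, the `Vehicle` fields) are hypothetical, not the workspace's real API:

```rust
// Hypothetical illustration of a format-agnostic core: the domain
// type knows nothing about serialization; each format implements a
// trait, and generics resolve dispatch at compile time (no vtable).

struct Vehicle {
    make: String,
    model: String,
    year: u16,
}

trait OutputFormat {
    fn render(&self, v: &Vehicle) -> String;
}

struct CsvFormat;
impl OutputFormat for CsvFormat {
    fn render(&self, v: &Vehicle) -> String {
        format!("{},{},{}", v.make, v.model, v.year)
    }
}

struct XmlFormat;
impl OutputFormat for XmlFormat {
    fn render(&self, v: &Vehicle) -> String {
        format!("<vehicle make=\"{}\" model=\"{}\" year=\"{}\"/>", v.make, v.model, v.year)
    }
}

// Generic over the format: monomorphized per call site.
fn export<F: OutputFormat>(fmt: &F, vehicles: &[Vehicle]) -> String {
    vehicles.iter().map(|v| fmt.render(v)).collect::<Vec<_>>().join("\n")
}
```

Adding a new output format is then a new `impl` rather than a change to the Core.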
This structure represents the physical layout of the Monorepo/Workspace.
open-ev-data-api/
├── .github/
│ ├── workflows/
│ │ ├── ci.yml # Quality Gates (Clippy, Fmt, Test)
│ │ ├── etl-artifacts.yml # ETL Pipeline: Build and Attach Artifacts
│ │ └── release.yml # Semantic Release + Docker Build
│ └── CODEOWNERS # Governance
├── .cargo/
│ └── config.toml # Global build flags
├── crates/ # [WORKSPACE MEMBERS]
│ │
│ ├── ev-core/ # [LAYER: DOMAIN]
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs # Library entry point
│ │ ├── domain/ # Domain Entities
│ │ │ ├── mod.rs
│ │ │ ├── vehicle.rs # Main aggregate root
│ │ │ ├── battery.rs # Battery specifications
│ │ │ ├── charging.rs # Charging capabilities
│ │ │ ├── powertrain.rs # Motor and drivetrain
│ │ │ ├── range.rs # Range and efficiency
│ │ │ └── types.rs # Common types and enums
│ │ └── validation/ # Validation rules
│ │ └── mod.rs
│ │
│ ├── ev-etl/ # [LAYER: DATA PROCESSING]
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── main.rs # CLI entry point
│ │ ├── ingest/ # Data ingestion
│ │ │ ├── mod.rs
│ │ │ ├── reader.rs # File system reader
│ │ │ └── parser.rs # JSON parsing
│ │ ├── merge/ # Deep merge logic
│ │ │ ├── mod.rs
│ │ │ └── strategy.rs # Merge precedence rules
│ │ ├── validate/ # Validation pipeline
│ │ │ └── mod.rs
│ │ └── output/ # Output generators
│ │ ├── mod.rs
│ │ ├── json.rs # Canonical JSON output
│ │ ├── sqlite.rs # SQLite database generator
│ │ ├── postgresql.rs # PostgreSQL schema + data
│ │ ├── csv.rs # CSV export
│ │ └── xml.rs # XML export
│ │
│ └── ev-server/ # [LAYER: API SERVER]
│ ├── Cargo.toml
│ └── src/
│ ├── main.rs # Binary entry point
│ ├── config.rs # Configuration management
│ ├── api/ # HTTP Handlers
│ │ ├── mod.rs
│ │ ├── routes.rs # Route definitions
│ │ ├── vehicles.rs # Vehicle endpoints
│ │ ├── search.rs # Search endpoints
│ │ └── health.rs # Health check
│ ├── db/ # Database layer
│ │ ├── mod.rs
│ │ ├── sqlite.rs # SQLite connection
│ │ └── postgresql.rs # PostgreSQL connection (optional)
│ └── models/ # API response models
│ └── mod.rs
│
├── tests/ # [INTEGRATION TESTS]
│ ├── etl_pipeline_test.rs # ETL processing tests
│ ├── merge_logic_test.rs # Merge strategy tests
│ ├── output_format_test.rs # Output validation tests
│ └── api_integration_test.rs # API endpoint tests
├── fixtures/ # [TEST DATA]
│ └── sample_vehicles/ # Sample vehicle data for testing
├── schemas/ # [DATABASE SCHEMAS]
│ ├── sqlite/
│ │ └── schema.sql # SQLite schema definition
│ └── postgresql/
│ └── schema.sql # PostgreSQL schema definition
├── docker/
│ ├── Dockerfile # API server container
│ └── docker-compose.yml # Local development setup
├── Cargo.toml # Workspace Root Config
├── .releaserc.json # Semantic Release Config
├── README.md # Project Overview
└── docs/
├── ARCHITECTURE.md # This Document
└── RUST_GUIDELINES.md # Rust Development Standards
This diagram visualizes the dependencies and data flow boundaries. All dependencies point inward toward ev-core.
graph TD
subgraph "Data Sources"
Dataset[Dataset Repository<br/>Layered JSON Files]
end
subgraph "Build-Time (CI/CD)"
ETL[ev-etl CLI]
subgraph "Output Artifacts"
JSON[vehicles.json]
SQLite[vehicles.db]
PostgreSQL[vehicles.sql]
CSV[vehicles.csv]
XML[vehicles.xml]
end
end
subgraph "Runtime (Production)"
Server[ev-server<br/>REST API]
DB[(Database<br/>SQLite/PostgreSQL)]
Client[HTTP Clients]
end
subgraph "Core Domain"
Core[ev-core<br/>Domain Types]
end
Dataset -->|Read| ETL
Core -->|Used by| ETL
Core -->|Used by| Server
ETL -->|Generate| JSON
ETL -->|Generate| SQLite
ETL -->|Generate| PostgreSQL
ETL -->|Generate| CSV
ETL -->|Generate| XML
SQLite -.->|Deploy| DB
PostgreSQL -.->|Deploy| DB
Server -->|Query| DB
Client -->|HTTP Request| Server
Server -->|JSON Response| Client
- ev-core is the foundation - no dependencies
- ev-etl depends on ev-core for types and validation
- ev-server depends on ev-core for types and serialization
- Output artifacts are standalone - no runtime dependencies on Rust code
This sequence diagram illustrates both the Build-Time Pipeline (artifact generation) and the Runtime Path (API consumption).
sequenceDiagram
autonumber
participant Contributor
participant DatasetRepo as Dataset Repository
participant CI as GitHub Actions
participant ETL as ev-etl CLI
participant Artifacts as Release Artifacts
participant Deploy as Deployment
participant API as ev-server
participant Client as API Consumer
note over Contributor, Artifacts: BUILD-TIME PIPELINE
Contributor->>DatasetRepo: Commit JSON changes
DatasetRepo->>CI: Trigger workflow
CI->>ETL: Run cargo build --release -p ev-etl
ETL->>ETL: Load layered JSON files
ETL->>ETL: Deep merge (base → year → variant)
ETL->>ETL: Validate against schema
ETL->>Artifacts: Generate vehicles.json
ETL->>Artifacts: Generate vehicles.db (SQLite)
ETL->>Artifacts: Generate vehicles.sql (PostgreSQL)
ETL->>Artifacts: Generate vehicles.csv
ETL->>Artifacts: Generate vehicles.xml
CI->>DatasetRepo: Attach artifacts to release
note over Deploy, Client: RUNTIME PATH
Deploy->>API: Deploy ev-server + vehicles.db
API->>API: Load database into memory
Client->>API: GET /api/v1/vehicles/list?make=tesla
API->>API: Query local database
API-->>Client: Return JSON response
Client->>API: GET /api/v1/vehicles/code/tesla:model_3:2024:model_3
API->>API: Query by unique code
API-->>Client: Return vehicle details
- Trigger: Dataset repository receives commits
- Compilation: ETL reads and merges layered JSON
- Validation: Schema validation + business rules
- Generation: Multiple output formats created
- Distribution: Artifacts attached to GitHub releases
- Initialization: Server loads pre-built database
- Request Handling: REST endpoints process queries
- Query Execution: Fast lookups in local database
- Response: JSON serialization and delivery
The project uses the Rust 2024 Edition. The technology stack for the ecosystem is summarized below.

- Language: Rust
- Edition: `2024` (latest stable edition)
- Toolchain: `stable` (version 1.85+)
- MSRV: 1.92.0 (Minimum Supported Rust Version)
| Crate | Version | Usage |
|---|---|---|
| serde | 1.0 | Serialization/deserialization framework |
| serde_json | 1.0 | JSON parsing and generation |
| anyhow | 1.0 | Application-level error handling (ETL, Server) |
| thiserror | 2.0 | Library-level error definitions (ev-core) |
| Crate | Version | Usage |
|---|---|---|
| walkdir | 2.5+ | Recursive directory traversal |
| jsonschema | 0.25+ | JSON schema validation |
| rusqlite | 0.34+ | SQLite database generation |
| postgres | 0.19+ | PostgreSQL SQL generation |
| csv | 1.3+ | CSV serialization |
| quick-xml | 0.37+ | XML serialization |
| rayon | 1.10+ | Parallel processing of vehicle files |
| Crate | Version | Usage |
|---|---|---|
| axum | 0.8+ | Web framework for REST API |
| tokio | 1.42+ | Async runtime |
| tower | 0.5+ | Middleware and service abstractions |
| tower-http | 0.6+ | HTTP middleware (CORS, compression, tracing) |
| rusqlite | 0.34+ | SQLite query layer |
| sqlx | 0.8+ | PostgreSQL async query layer (optional) |
| utoipa | 5.3+ | OpenAPI documentation generation |
| tracing | 0.1+ | Structured logging |
| tracing-subscriber | 0.3+ | Log collection and formatting |
| Crate | Version | Usage |
|---|---|---|
| criterion | 0.5+ | Benchmarking |
| proptest | 1.6+ | Property-based testing |
| tempfile | 3.14+ | Temporary file creation for tests |
| mockall | 0.13+ | Mocking framework |
- Container Runtime: Docker 27.0+
- Container Orchestration: Kubernetes 1.30+ (optional)
- CI/CD: GitHub Actions
- Release Automation: semantic-release
- Database (Production):
- SQLite 3.45+ (embedded mode)
- PostgreSQL 16+ (server mode)
- OpenAPI Specification: 3.1.0
- REST API Versioning: URL-based (`/api/v1/`)
- Response Format: JSON (RFC 8259)
- Date/Time Format: ISO 8601
- Character Encoding: UTF-8
- Recursively scan the dataset repository's `src/` directory
- Identify all manufacturer directories (first level)
- Identify all model directories (second level)
- Collect all JSON files: `base.json`, year directories, and variant files

- Base Files: `src/<make>/<model>/base.json`
- Year Base Files: `src/<make>/<model>/<year>/<vehicle_slug>.json`
- Variant Files: `src/<make>/<model>/<year>/<vehicle_slug>_<variant_slug>.json`
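Under these path conventions, classifying a discovered file reduces to inspecting its relative path components. A std-only sketch follows; note that because slugs themselves contain underscores (e.g. `model_3`), telling a year base apart from a variant needs the vehicle slug as context. The names are illustrative, not the real `ingest` module:

```rust
use std::path::Path;

// Classify a path relative to the dataset's src/ root according to
// the layout above. Illustrative; the real reader may differ.
#[derive(Debug, PartialEq)]
enum SourceFile {
    ModelBase, // src/<make>/<model>/base.json
    YearBase,  // src/<make>/<model>/<year>/<vehicle_slug>.json
    Variant,   // src/<make>/<model>/<year>/<vehicle_slug>_<variant_slug>.json
}

fn classify(rel: &Path, vehicle_slug: &str) -> Option<SourceFile> {
    let parts: Vec<&str> = rel.iter().filter_map(|c| c.to_str()).collect();
    match parts.as_slice() {
        [_make, _model, "base.json"] => Some(SourceFile::ModelBase),
        [_make, _model, year, file]
            if year.chars().all(|c| c.is_ascii_digit()) && file.ends_with(".json") =>
        {
            let stem = file.trim_end_matches(".json");
            match stem.strip_prefix(vehicle_slug) {
                Some("") => Some(SourceFile::YearBase),
                Some(rest) if rest.starts_with('_') => Some(SourceFile::Variant),
                _ => None,
            }
        }
        _ => None,
    }
}
```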
The ETL implements a deterministic deep merge with the following precedence (lowest to highest):

1. Model Base (`base.json`) - Shared attributes across all years
2. Year Base (`<vehicle_slug>.json`) - Specific year configuration
3. Variant (`<vehicle_slug>_<variant_slug>.json`) - Delta from year base
- Objects: Deep merge by key (recursive)
- Scalars (string, number, boolean): Replace (higher precedence wins)
- Arrays: Complete replacement (no concatenation)
- Null values: Not allowed (use explicit empty states instead)
- Unknown keys: Validation failure
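These merge semantics can be sketched over a minimal JSON-like value type (std-only here; the real pipeline presumably operates on `serde_json::Value` instead):

```rust
use std::collections::BTreeMap;

// Minimal JSON-like value to illustrate the merge rules above.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Num(f64),
    Bool(bool),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

// Merge `overlay` (higher precedence) into `base`: objects merge
// recursively by key; scalars and arrays are replaced wholesale.
fn deep_merge(base: Value, overlay: Value) -> Value {
    match (base, overlay) {
        (Value::Object(mut b), Value::Object(o)) => {
            for (k, v) in o {
                let merged = match b.remove(&k) {
                    Some(existing) => deep_merge(existing, v),
                    None => v,
                };
                b.insert(k, merged);
            }
            Value::Object(b)
        }
        // Scalars and arrays: the higher-precedence side wins outright.
        (_, overlay) => overlay,
    }
}
```

Applying `deep_merge(deep_merge(model_base, year_base), variant)` then yields the canonical vehicle for that variant.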
- Each year base file produces one canonical vehicle
- Each variant file produces one additional canonical vehicle
- Example: `model_3.json` + `model_3_long_range.json` = 2 canonical vehicles
Each canonical vehicle must pass:
- JSON Schema Validation: Against `schema.json` from the dataset repository
- Required Fields Check: All mandatory fields present
- Type Validation: Correct data types for all fields
- Business Rules:
- At least one battery capacity (gross or net)
- At least one charge port
- At least one rated range entry
- At least one source
- Valid slug patterns (lowercase, alphanumeric + underscore)
- Valid ISO codes (country, currency)
- Referential Integrity: Variant files reference valid base vehicles
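A sketch of how these business rules might be expressed, evaluating every rule and collecting all failures rather than stopping at the first (the struct fields here are illustrative, not the real schema):

```rust
// Illustrative business-rule checks in the spirit of the validation
// pipeline: no fail-fast, all errors collected. Hypothetical fields.
struct CanonicalVehicle {
    battery_kwh_gross: Option<f64>,
    battery_kwh_net: Option<f64>,
    charge_ports: Vec<String>,
    range_entries: Vec<f64>,
    sources: Vec<String>,
    make_slug: String,
}

// Slug pattern: lowercase, alphanumeric plus underscore.
fn valid_slug(s: &str) -> bool {
    !s.is_empty()
        && s.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
}

fn business_errors(v: &CanonicalVehicle) -> Vec<String> {
    let mut errs = Vec::new();
    if v.battery_kwh_gross.is_none() && v.battery_kwh_net.is_none() {
        errs.push("missing battery capacity (gross or net)".to_string());
    }
    if v.charge_ports.is_empty() {
        errs.push("at least one charge port required".to_string());
    }
    if v.range_entries.is_empty() {
        errs.push("at least one rated range entry required".to_string());
    }
    if v.sources.is_empty() {
        errs.push("at least one source required".to_string());
    }
    if !valid_slug(&v.make_slug) {
        errs.push(format!("invalid slug: {}", v.make_slug));
    }
    errs
}
```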
Format: Single JSON file with array of all canonical vehicles
Structure:
{
"schema_version": "1.0.0",
"generated_at": "2025-12-25T12:00:00Z",
"vehicle_count": 1234,
"vehicles": [
{ /* canonical vehicle 1 */ },
{ /* canonical vehicle 2 */ },
...
],
"metadata": {
"etl_version": "1.0.0",
"dataset_commit": "abc123def",
"processing_time_ms": 5432
}
}

Use Cases:
- Direct JSON consumption by applications
- Import into other systems
- Data analysis and exploration
Schema Design: Normalized relational structure
Tables:
- `vehicles` - Core vehicle information
- `battery_specs` - Battery specifications
- `charging_specs` - Charging capabilities
- `charge_ports` - Physical charge ports
- `motors` - Electric motor details
- `range_ratings` - Range by test cycle
- `sources` - Data sources
- `variants` - Variant metadata
Indexes:
- Primary keys on all tables
- Composite index on (make_slug, model_slug, year, trim_slug)
- Index on make_slug, model_slug separately
- Full-text search index on model names
Use Cases:
- Embedded applications
- Desktop applications
- Quick queries without server setup
- API server data source (embedded mode)
Contents:
- Complete DDL (CREATE TABLE, CREATE INDEX)
- INSERT statements for all data
- Views for common queries
- Functions for search operations
Features:
- JSONB columns for complex nested data
- GiST indexes for JSONB queries
- Full-text search with tsvector
- Materialized views for aggregations
Use Cases:
- Production API deployments
- Advanced analytics
- Multi-tenant scenarios
- High-concurrency environments
Format: Flattened denormalized structure
Columns:
- Vehicle identification (make, model, year, trim, variant)
- Key specifications (battery, range, charging)
- Performance metrics
- Pricing information
- Source URLs (concatenated)
Handling Complex Fields:
- Arrays: Pipe-separated values (
value1|value2|value3) - Objects: Dot notation (
battery.pack_capacity_kwh_net) - Nested structures: Flattened to top level
Use Cases:
- Spreadsheet analysis (Excel, Google Sheets)
- Data science workflows (pandas, R)
- Business intelligence tools
- Legacy system integration
Format: Hierarchical XML structure
Root Element: <vehicles>
Vehicle Structure:
<vehicle id="oed:tesla:model_3:2024:base">
<make slug="tesla">Tesla</make>
<model slug="model_3">Model 3</model>
<year>2024</year>
<battery>
<pack_capacity_kwh_net>60.0</pack_capacity_kwh_net>
<!-- ... -->
</battery>
<!-- ... -->
</vehicle>

Features:
- XML Schema (XSD) generation
- Namespace support
- XSLT transformation support
Use Cases:
- Enterprise system integration
- SOAP-based services
- Government/regulatory systems
- Legacy XML pipelines
- Collect all errors (don't fail fast)
- Categorize by severity: ERROR, WARNING, INFO
- Generate detailed error report with file paths and line numbers
- Total vehicles processed
- Variants generated
- Files scanned
- Validation failures
- Processing time per stage
- `0`: Success - all vehicles valid
- `1`: Validation failures - some vehicles invalid
- `2`: Schema errors - malformed JSON
- `3`: File system errors - cannot read files
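A std-only sketch of mapping pipeline outcomes to these exit codes (the real `main.rs` may wire this through `std::process::ExitCode` differently):

```rust
// Map ETL outcomes to the documented exit codes. Illustrative only.
enum EtlOutcome {
    Success,
    ValidationFailures,
    SchemaErrors,
    FileSystemErrors,
}

fn exit_code(outcome: &EtlOutcome) -> u8 {
    match outcome {
        EtlOutcome::Success => 0,
        EtlOutcome::ValidationFailures => 1,
        EtlOutcome::SchemaErrors => 2,
        EtlOutcome::FileSystemErrors => 3,
    }
}
```

A CI step can then branch on the code, e.g. treat `1` as a soft failure that still uploads the validation report.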
Pattern: Three-layer architecture
- Presentation Layer: HTTP handlers and routing
- Service Layer: Business logic and queries
- Data Layer: Database abstraction
Health check endpoint
Response:
{
"status": "healthy",
"version": "1.0.0",
"database": "connected",
"vehicle_count": 1234
}

List all vehicles with pagination and filtering
Query Parameters:
- `make`: Filter by manufacturer (slug)
- `model`: Filter by model (slug)
- `year`: Filter by year
- `vehicle_type`: Filter by type (suv, sedan, etc.)
- `min_range_km`: Minimum range
- `max_range_km`: Maximum range
- `page`: Page number (default: 1)
- `per_page`: Items per page (default: 20, max: 100)
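Pagination defaults and limits like these are typically normalized before any query runs. A std-only sketch of that step (the real handler presumably deserializes the raw parameters with axum's `Query` extractor first):

```rust
// Normalize raw pagination parameters to the documented defaults and
// limits: page >= 1, per_page in 1..=100 with a default of 20.
// Illustrative; the real server's handling may differ.
fn normalize_pagination(page: Option<u32>, per_page: Option<u32>) -> (u32, u32) {
    let page = page.unwrap_or(1).max(1);
    let per_page = per_page.unwrap_or(20).clamp(1, 100);
    (page, per_page)
}

// Offset for a SQL LIMIT/OFFSET query over the normalized values.
fn offset(page: u32, per_page: u32) -> u32 {
    (page - 1) * per_page
}
```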
Response:
{
"vehicles": [ /* vehicle summaries */ ],
"pagination": {
"page": 1,
"per_page": 20,
"total": 1234,
"total_pages": 62
}
}

Get specific vehicle by unique code
Path Parameters:
- `code`: Vehicle unique code (format: `make:model:year:filename`)
Response: Full canonical vehicle object
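Parsing that code is a simple four-way split on `:`. A std-only sketch (the struct and function names are hypothetical):

```rust
// Parse a vehicle code of the form make:model:year:filename.
// Illustrative only; the real server may parse differently.
#[derive(Debug, PartialEq)]
struct VehicleCode {
    make: String,
    model: String,
    year: u16,
    filename: String,
}

fn parse_code(code: &str) -> Option<VehicleCode> {
    let mut parts = code.splitn(4, ':');
    let make = parts.next()?.to_string();
    let model = parts.next()?.to_string();
    let year: u16 = parts.next()?.parse().ok()?;
    let filename = parts.next()?.to_string();
    if make.is_empty() || model.is_empty() || filename.is_empty() {
        return None;
    }
    Some(VehicleCode { make, model, year, filename })
}
```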
Full-text search across vehicles
Query Parameters:
- `q`: Search query (minimum 2 characters)
- `page`, `per_page`: Pagination
Response: Ranked search results with pagination
List all manufacturers with model information
Response:
{
"makes": [
{ "slug": "tesla", "name": "Tesla", "vehicle_count": 42, "models": ["Model 3", "Model S", "Model X", "Model Y"] },
{ "slug": "byd", "name": "BYD", "vehicle_count": 38, "models": ["Dolphin", "Seal", "Atto 3"] }
]
}

Configuration via environment variables:
- `DATABASE_URL`: SQLite or PostgreSQL connection string
- `PORT`: Server port (default: 3000)
- `HOST`: Bind address (default: 0.0.0.0)
- `LOG_LEVEL`: Logging level (debug, info, warn, error)
- `CORS_ORIGINS`: Allowed CORS origins
- `MAX_PAGE_SIZE`: Maximum items per page
- `ENABLE_COMPRESSION`: Enable gzip compression
- `ENABLE_OPENAPI`: Enable OpenAPI endpoint
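A std-only sketch of loading a few of these settings with their documented defaults. Taking the lookup as a closure keeps the function testable without mutating the process environment; the `vehicles.db` fallback for `DATABASE_URL` is an assumption of this sketch, not a documented default:

```rust
// Load a subset of the documented settings with their defaults.
// Illustrative; the real config.rs may use a settings crate instead.
struct Config {
    database_url: String,
    port: u16,
    host: String,
}

// In main, this would be called as: load_config(|k| std::env::var(k).ok())
fn load_config(get: impl Fn(&str) -> Option<String>) -> Config {
    Config {
        // Assumed fallback; the doc does not specify a DATABASE_URL default.
        database_url: get("DATABASE_URL").unwrap_or_else(|| "vehicles.db".to_string()),
        port: get("PORT").and_then(|p| p.parse().ok()).unwrap_or(3000),
        host: get("HOST").unwrap_or_else(|| "0.0.0.0".to_string()),
    }
}
```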
Targets:
- Cold start: < 100ms (with SQLite embedded)
- Response time (p50): < 10ms
- Response time (p99): < 50ms
- Throughput: > 10,000 req/s (single instance, cached)
- Memory footprint: < 100MB (with in-memory SQLite)
Triggered by: Push to main or release creation
Workflow: ETL Artifacts Generation
name: Generate Data Artifacts
on:
push:
branches: [main]
release:
types: [created]
jobs:
build-etl:
- Checkout API repository
- Install Rust toolchain
- Build ev-etl in release mode
generate-artifacts:
- Run ev-etl against dataset source
- Generate all output formats
- Validate all artifacts
- Calculate checksums
publish-artifacts:
- Upload to GitHub Release
- Tag with semantic version
- Generate release notes

Artifacts Produced:

- `vehicles.json` + `vehicles.json.sha256`
- `vehicles.db` + `vehicles.db.sha256`
- `vehicles.sql` + `vehicles.sql.sha256`
- `vehicles.csv` + `vehicles.csv.sha256`
- `vehicles.xml` + `vehicles.xml.sha256`
- `validation-report.txt`
- `statistics.json`
Triggered by: Push, PR, or release
Workflow 1: Continuous Integration
name: CI
on: [push, pull_request]
jobs:
test:
- cargo fmt --check
- cargo clippy -- -D warnings
- cargo test --all-features
- cargo test --doc
build:
- cargo build --release -p ev-etl
- cargo build --release -p ev-server

Workflow 2: Release
name: Release
on:
push:
branches: [main]
jobs:
semantic-release:
- Run semantic-release
- Generate changelog
- Create GitHub release
build-binaries:
- Cross-compile for multiple platforms
- Upload binaries to release
build-docker:
- Build Docker image
- Push to container registry
- Tag with semantic version

Release Artifacts:

- Binaries: `ev-etl-linux-x64`, `ev-etl-windows-x64`, `ev-etl-macos-arm64`
- Binaries: `ev-server-linux-x64`, `ev-server-windows-x64`, `ev-server-macos-arm64`
- Docker image: `ghcr.io/open-ev-data/ev-server:latest`
- Docker image: `ghcr.io/open-ev-data/ev-server:v1.2.3`
Use Case: Single-server deployments, edge locations, development
Setup:
# Download artifacts
wget https://github.com/.../vehicles.db
wget https://github.com/.../ev-server-linux-x64
# Run server
DATABASE_URL=vehicles.db ./ev-server-linux-x64

Characteristics:
- Zero external dependencies
- < 100MB total footprint
- Single binary deployment
- Fast startup time
Use Case: Cloud deployments, Kubernetes, scalability
Docker Compose:
version: '3.8'
services:
api:
image: ghcr.io/open-ev-data/ev-server:latest
environment:
- DATABASE_URL=vehicles.db
ports:
- "3000:3000"
volumes:
      - ./vehicles.db:/app/vehicles.db:ro

Kubernetes Deployment:
- Deployment with multiple replicas
- ConfigMap for database file
- HorizontalPodAutoscaler
- Ingress for external access
Use Case: High-concurrency production, multi-tenant
Setup:
# Import schema
psql -d openev -f vehicles.sql
# Run server
DATABASE_URL=postgresql://user:pass@localhost/openev ./ev-server

Characteristics:
- Advanced query capabilities
- Connection pooling
- Read replicas support
- Full ACID compliance
Important: For detailed Rust coding standards, best practices, and implementation guidelines, see RUST_GUIDELINES.md.
# Clone API repository
git clone https://github.com/open-ev-data/open-ev-data-api.git
cd open-ev-data-api
# Install Rust toolchain
rustup install stable
rustup default stable
# Build all crates
cargo build --all
# Run tests
cargo test --all

# Point to local dataset
export DATASET_PATH=../open-ev-data-dataset/src
# Build and run ETL
cargo run -p ev-etl -- \
--input $DATASET_PATH \
--output ./output \
--formats json,sqlite,csv
# Validate output
cargo run -p ev-etl -- \
  --validate ./output/vehicles.json

# Use test database
cargo run -p ev-server -- \
--database ./output/vehicles.db \
--port 3000
# Run with hot reload (cargo-watch)
cargo watch -x 'run -p ev-server'
# Run integration tests
cargo test -p ev-server --test integration

- Dataset Release: Triggers ETL artifact generation
- API Development: Happens independently
- API Release: semantic-release handles versioning
- Docker Build: Automatic on API release
- Deployment: Manual or automatic depending on environment
- GraphQL endpoint alongside REST
- WebSocket support for real-time updates
- Advanced search with Elasticsearch
- Caching layer with Redis
- Rate limiting and API keys
- Usage analytics and telemetry
- Popular vehicle tracking
- Search query analysis
- Performance monitoring dashboard
- Automated data quality scoring
- Community contribution workflow
- Diff visualization for updates
- Historical data tracking (versioned snapshots)
The OpenEV Data API provides a comprehensive solution for transforming the layered dataset into multiple consumption formats:
Core Strengths:
- Build-Time Compilation: Data validation happens once, not on every request
- Multiple Formats: Single source, five output formats
- Type Safety: Rust's type system prevents invalid states
- Performance: Optimized for read-heavy workloads
- Portability: Works embedded (SQLite) or client-server (PostgreSQL)
- Automation: Fully automated CI/CD pipeline
Architecture Benefits:
- Clean separation of concerns (Hexagonal Architecture)
- Testable and maintainable codebase
- Independent deployability of components
- Multiple deployment strategies supported
Integration Points:
- Dataset updates trigger artifact regeneration
- API releases follow semantic versioning
- Multiple consumption patterns supported (files, API, embedded)