The OpenEV Data API is engineered using Hexagonal Architecture (Ports and Adapters), implemented as a Rust Cargo workspace. This architectural choice strictly enforces the Dependency Rule, ensuring that the business logic (Core) remains independent of frameworks, databases, and external interfaces.
The system serves two primary objectives:
1. Data Processing Pipeline (ETL): Transform the layered JSON dataset into multiple ready-to-consume formats:
   - Complete canonical JSON (all vehicles expanded and validated)
   - SQLite database (embedded, ready-to-query)
   - PostgreSQL schema + data (production-ready)
   - CSV export (analysis and spreadsheet integration)
   - XML export (legacy system compatibility)
2. API Server: Provide a high-performance REST API for querying vehicle data, deployable in multiple environments (containers, serverless, edge).
The system prioritizes correctness (via strict typing), reproducibility (deterministic builds), and portability (multiple output formats and deployment targets).
Build-Time (CI/CD Pipeline):
- Dataset compilation happens during the release process
- Validation and merge logic executed once
- Multiple output artifacts generated and attached to releases
- Ensures data quality before distribution
Runtime (API Server):
- Serves pre-compiled, validated data
- No runtime validation overhead
- Optimized for read performance
- Stateless and horizontally scalable
The workspace is divided into three distinct crates, each representing a specific architectural layer:
- `crates/ev-core` (The Core / Inner Hexagon):
  - Role: Pure Domain Library
  - Responsibility: Defines domain types (`Vehicle`, `Battery`, `Charging`, etc.), validation rules, and business logic. Contains no I/O, HTTP, or database code. Shared dependency for all other crates.
  - Principle: "Make invalid states unrepresentable"
  - Key Features:
    - Rust structs mirroring the JSON schema
    - Serde serialization/deserialization
    - Type-safe enums for classifications
    - Validation logic for data integrity
    - Schema generation support
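To give a flavor of the "make invalid states unrepresentable" principle, here is a minimal, std-only sketch of how such a domain type might look. The field and type names are illustrative, not the crate's actual schema, and the serde derives the real crate would carry are omitted:

```rust
// Illustrative domain types in the spirit of ev-core: a type-safe
// enum for classification and a constructor that refuses to build an
// invalid value. Names are hypothetical; the real crate mirrors the
// JSON schema and adds #[derive(Serialize, Deserialize)].

#[derive(Debug, PartialEq)]
enum VehicleType {
    Suv,
    Sedan,
    Hatchback,
}

#[derive(Debug)]
struct Battery {
    pack_capacity_kwh_gross: Option<f64>,
    pack_capacity_kwh_net: Option<f64>,
}

impl Battery {
    // Refuse to construct a battery with no capacity at all:
    // the invalid state is unrepresentable past this point.
    fn new(gross: Option<f64>, net: Option<f64>) -> Result<Self, String> {
        if gross.is_none() && net.is_none() {
            return Err("battery needs a gross or net capacity".into());
        }
        Ok(Battery { pack_capacity_kwh_gross: gross, pack_capacity_kwh_net: net })
    }
}
```

Downstream code that holds a `Battery` never needs to re-check the capacity invariant, because no value without one can exist.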
- `crates/ev-etl` (Data Processing Pipeline):
  - Role: CLI Tool for Batch Processing
  - Responsibility: Executes during CI/CD to transform source data into distributable artifacts
  - Key Features:
    - Reads layered JSON files from the dataset repository
    - Implements deep merge strategy (base.json → year → variant)
    - Validates against JSON schema and domain types
    - Generates multiple output formats:
      - `vehicles.json` - Complete canonical dataset
      - `vehicles.db` - SQLite database with full schema
      - `vehicles.sql` - PostgreSQL DDL + INSERT statements
      - `vehicles.csv` - Flattened tabular export
      - `vehicles.xml` - XML representation
    - Provides validation reports and statistics
- `crates/ev-server` (API Server):
  - Role: HTTP REST API Server
  - Responsibility: Exposes vehicle data through HTTP endpoints
  - Key Features:
    - RESTful API design
    - Query by make, model, year, variant
    - Search and filter capabilities
    - OpenAPI/Swagger documentation
    - Multiple deployment targets:
      - Standalone binary (Linux/Windows/macOS)
      - Docker container
      - Kubernetes deployment
      - Serverless functions (future)
- Dependency Inversion: Dependencies point inward; the outer Adapter layers depend on the Core's abstractions, and the Core depends on nothing.
- Single Source of Truth: The dataset repository is the canonical source; all artifacts are derived.
- Zero-Cost Abstractions: Prefer Rust generics and traits over runtime polymorphism, so abstraction costs are resolved at compile time.
- Type-Driven Development: Leverage Rust's type system to prevent invalid states.
- Deterministic Builds: Same input always produces identical output.
- Format Agnostic Core: Core domain logic is independent of serialization formats.
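The last two principles can be sketched together: the Core exposes plain domain data, each output format implements a trait, and generic functions monomorphize the dispatch at compile time. The names here (`OutputFormat`, the `Vehicle` fields) are hypothetical, not the workspace's real API:

```rust
// Hypothetical illustration of a format-agnostic core: the domain
// type knows nothing about serialization; each format implements a
// trait, and generics resolve dispatch at compile time (no vtable).

struct Vehicle {
    make: String,
    model: String,
    year: u16,
}

trait OutputFormat {
    fn render(&self, v: &Vehicle) -> String;
}

struct CsvFormat;
impl OutputFormat for CsvFormat {
    fn render(&self, v: &Vehicle) -> String {
        format!("{},{},{}", v.make, v.model, v.year)
    }
}

struct XmlFormat;
impl OutputFormat for XmlFormat {
    fn render(&self, v: &Vehicle) -> String {
        format!("<vehicle make=\"{}\" model=\"{}\" year=\"{}\"/>", v.make, v.model, v.year)
    }
}

// Generic over the format: monomorphized per call site.
fn export<F: OutputFormat>(fmt: &F, vehicles: &[Vehicle]) -> String {
    vehicles.iter().map(|v| fmt.render(v)).collect::<Vec<_>>().join("\n")
}
```

Adding a new output format is then a new `impl` rather than a change to the Core.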
This structure represents the physical layout of the Monorepo/Workspace.
open-ev-data-api/
├── .github/
│ ├── workflows/
│ │ ├── ci.yml # Quality Gates (Clippy, Fmt, Test)
│ │ ├── etl-artifacts.yml # ETL Pipeline: Build and Attach Artifacts
│ │ └── release.yml # Semantic Release + Docker Build
│ └── CODEOWNERS # Governance
├── .cargo/
│ └── config.toml # Global build flags
├── crates/ # [WORKSPACE MEMBERS]
│ │
│ ├── ev-core/ # [LAYER: DOMAIN]
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs # Library entry point
│ │ ├── domain/ # Domain Entities
│ │ │ ├── mod.rs
│ │ │ ├── vehicle.rs # Main aggregate root
│ │ │ ├── battery.rs # Battery specifications
│ │ │ ├── charging.rs # Charging capabilities
│ │ │ ├── powertrain.rs # Motor and drivetrain
│ │ │ ├── range.rs # Range and efficiency
│ │ │ └── types.rs # Common types and enums
│ │ └── validation/ # Validation rules
│ │ └── mod.rs
│ │
│ ├── ev-etl/ # [LAYER: DATA PROCESSING]
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── main.rs # CLI entry point
│ │ ├── ingest/ # Data ingestion
│ │ │ ├── mod.rs
│ │ │ ├── reader.rs # File system reader
│ │ │ └── parser.rs # JSON parsing
│ │ ├── merge/ # Deep merge logic
│ │ │ ├── mod.rs
│ │ │ └── strategy.rs # Merge precedence rules
│ │ ├── validate/ # Validation pipeline
│ │ │ └── mod.rs
│ │ └── output/ # Output generators
│ │ ├── mod.rs
│ │ ├── json.rs # Canonical JSON output
│ │ ├── sqlite.rs # SQLite database generator
│ │ ├── postgresql.rs # PostgreSQL schema + data
│ │ ├── csv.rs # CSV export
│ │ └── xml.rs # XML export
│ │
│ └── ev-server/ # [LAYER: API SERVER]
│ ├── Cargo.toml
│ └── src/
│ ├── main.rs # Binary entry point
│ ├── config.rs # Configuration management
│ ├── api/ # HTTP Handlers
│ │ ├── mod.rs
│ │ ├── routes.rs # Route definitions
│ │ ├── vehicles.rs # Vehicle endpoints
│ │ ├── search.rs # Search endpoints
│ │ └── health.rs # Health check
│ ├── db/ # Database layer
│ │ ├── mod.rs
│ │ ├── sqlite.rs # SQLite connection
│ │ └── postgresql.rs # PostgreSQL connection (optional)
│ └── models/ # API response models
│ └── mod.rs
│
├── tests/ # [INTEGRATION TESTS]
│ ├── etl_pipeline_test.rs # ETL processing tests
│ ├── merge_logic_test.rs # Merge strategy tests
│ ├── output_format_test.rs # Output validation tests
│ └── api_integration_test.rs # API endpoint tests
├── fixtures/ # [TEST DATA]
│ └── sample_vehicles/ # Sample vehicle data for testing
├── schemas/ # [DATABASE SCHEMAS]
│ ├── sqlite/
│ │ └── schema.sql # SQLite schema definition
│ └── postgresql/
│ └── schema.sql # PostgreSQL schema definition
├── docker/
│ ├── Dockerfile # API server container
│ └── docker-compose.yml # Local development setup
├── Cargo.toml # Workspace Root Config
├── .releaserc.json # Semantic Release Config
├── README.md # Project Overview
└── docs/
├── ARCHITECTURE.md # This Document
└── RUST_GUIDELINES.md # Rust Development Standards
This diagram visualizes the dependencies and data flow boundaries. All dependencies point inward toward ev-core.
graph TD
subgraph "Data Sources"
Dataset[Dataset Repository<br/>Layered JSON Files]
end
subgraph "Build-Time (CI/CD)"
ETL[ev-etl CLI]
subgraph "Output Artifacts"
JSON[vehicles.json]
SQLite[vehicles.db]
PostgreSQL[vehicles.sql]
CSV[vehicles.csv]
XML[vehicles.xml]
end
end
subgraph "Runtime (Production)"
Server[ev-server<br/>REST API]
DB[(Database<br/>SQLite/PostgreSQL)]
Client[HTTP Clients]
end
subgraph "Core Domain"
Core[ev-core<br/>Domain Types]
end
Dataset -->|Read| ETL
Core -->|Used by| ETL
Core -->|Used by| Server
ETL -->|Generate| JSON
ETL -->|Generate| SQLite
ETL -->|Generate| PostgreSQL
ETL -->|Generate| CSV
ETL -->|Generate| XML
SQLite -.->|Deploy| DB
PostgreSQL -.->|Deploy| DB
Server -->|Query| DB
Client -->|HTTP Request| Server
Server -->|JSON Response| Client
- ev-core is the foundation - no dependencies
- ev-etl depends on ev-core for types and validation
- ev-server depends on ev-core for types and serialization
- Output artifacts are standalone - no runtime dependencies on Rust code
This sequence diagram illustrates both the Build-Time Pipeline (artifact generation) and the Runtime Path (API consumption).
sequenceDiagram
autonumber
participant Contributor
participant DatasetRepo as Dataset Repository
participant CI as GitHub Actions
participant ETL as ev-etl CLI
participant Artifacts as Release Artifacts
participant Deploy as Deployment
participant API as ev-server
participant Client as API Consumer
note over Contributor, Artifacts: BUILD-TIME PIPELINE
Contributor->>DatasetRepo: Commit JSON changes
DatasetRepo->>CI: Trigger workflow
CI->>ETL: Run cargo build --release -p ev-etl
ETL->>ETL: Load layered JSON files
ETL->>ETL: Deep merge (base → year → variant)
ETL->>ETL: Validate against schema
ETL->>Artifacts: Generate vehicles.json
ETL->>Artifacts: Generate vehicles.db (SQLite)
ETL->>Artifacts: Generate vehicles.sql (PostgreSQL)
ETL->>Artifacts: Generate vehicles.csv
ETL->>Artifacts: Generate vehicles.xml
CI->>DatasetRepo: Attach artifacts to release
note over Deploy, Client: RUNTIME PATH
Deploy->>API: Deploy ev-server + vehicles.db
API->>API: Load database into memory
Client->>API: GET /api/v1/vehicles/list?make=tesla
API->>API: Query local database
API-->>Client: Return JSON response
Client->>API: GET /api/v1/vehicles/code/tesla:model_3:2024:model_3
API->>API: Query by unique code
API-->>Client: Return vehicle details
- Trigger: Dataset repository receives commits
- Compilation: ETL reads and merges layered JSON
- Validation: Schema validation + business rules
- Generation: Multiple output formats created
- Distribution: Artifacts attached to GitHub releases
- Initialization: Server loads pre-built database
- Request Handling: REST endpoints process queries
- Query Execution: Fast lookups in local database
- Response: JSON serialization and delivery
The project uses the Rust 2024 Edition. The technology stack for the ecosystem is summarized below.

- Language: Rust
- Edition: `2024` (latest stable edition)
- Toolchain: `stable` (version 1.85+)
- MSRV: 1.92.0 (Minimum Supported Rust Version)
| Crate | Version | Usage |
|---|---|---|
| serde | 1.0 | Serialization/deserialization framework |
| serde_json | 1.0 | JSON parsing and generation |
| anyhow | 1.0 | Application-level error handling (ETL, Server) |
| thiserror | 2.0 | Library-level error definitions (ev-core) |
| Crate | Version | Usage |
|---|---|---|
| walkdir | 2.5+ | Recursive directory traversal |
| jsonschema | 0.25+ | JSON schema validation |
| rusqlite | 0.34+ | SQLite database generation |
| postgres | 0.19+ | PostgreSQL SQL generation |
| csv | 1.3+ | CSV serialization |
| quick-xml | 0.37+ | XML serialization |
| rayon | 1.10+ | Parallel processing of vehicle files |
| Crate | Version | Usage |
|---|---|---|
| axum | 0.8+ | Web framework for REST API |
| tokio | 1.42+ | Async runtime |
| tower | 0.5+ | Middleware and service abstractions |
| tower-http | 0.6+ | HTTP middleware (CORS, compression, tracing) |
| rusqlite | 0.34+ | SQLite query layer |
| sqlx | 0.8+ | PostgreSQL async query layer (optional) |
| utoipa | 5.3+ | OpenAPI documentation generation |
| tracing | 0.1+ | Structured logging |
| tracing-subscriber | 0.3+ | Log collection and formatting |
| Crate | Version | Usage |
|---|---|---|
| criterion | 0.5+ | Benchmarking |
| proptest | 1.6+ | Property-based testing |
| tempfile | 3.14+ | Temporary file creation for tests |
| mockall | 0.13+ | Mocking framework |
- Container Runtime: Docker 27.0+
- Container Orchestration: Kubernetes 1.30+ (optional)
- CI/CD: GitHub Actions
- Release Automation: semantic-release
- Database (Production):
- SQLite 3.45+ (embedded mode)
- PostgreSQL 16+ (server mode)
- OpenAPI Specification: 3.1.0
- REST API Versioning: URL-based (`/api/v1/`)
- Response Format: JSON (RFC 8259)
- Date/Time Format: ISO 8601
- Character Encoding: UTF-8
- Recursively scan the dataset repository's `src/` directory
- Identify all manufacturer directories (first level)
- Identify all model directories (second level)
- Collect all JSON files: `base.json`, year directories, and variant files

- Base Files: `src/<make>/<model>/base.json`
- Year Base Files: `src/<make>/<model>/<year>/<vehicle_slug>.json`
- Variant Files: `src/<make>/<model>/<year>/<vehicle_slug>_<variant_slug>.json`
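Under these path conventions, classifying a discovered file reduces to inspecting its relative path components. A std-only sketch follows; note that because slugs themselves contain underscores (e.g. `model_3`), telling a year base apart from a variant needs the vehicle slug as context. The names are illustrative, not the real `ingest` module:

```rust
use std::path::Path;

// Classify a path relative to the dataset's src/ root according to
// the layout above. Illustrative; the real reader may differ.
#[derive(Debug, PartialEq)]
enum SourceFile {
    ModelBase, // src/<make>/<model>/base.json
    YearBase,  // src/<make>/<model>/<year>/<vehicle_slug>.json
    Variant,   // src/<make>/<model>/<year>/<vehicle_slug>_<variant_slug>.json
}

fn classify(rel: &Path, vehicle_slug: &str) -> Option<SourceFile> {
    let parts: Vec<&str> = rel.iter().filter_map(|c| c.to_str()).collect();
    match parts.as_slice() {
        [_make, _model, "base.json"] => Some(SourceFile::ModelBase),
        [_make, _model, year, file]
            if year.chars().all(|c| c.is_ascii_digit()) && file.ends_with(".json") =>
        {
            let stem = file.trim_end_matches(".json");
            match stem.strip_prefix(vehicle_slug) {
                Some("") => Some(SourceFile::YearBase),
                Some(rest) if rest.starts_with('_') => Some(SourceFile::Variant),
                _ => None,
            }
        }
        _ => None,
    }
}
```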
The ETL implements a deterministic deep merge with the following precedence (lowest to highest):

1. Model Base (`base.json`) - Shared attributes across all years
2. Year Base (`<vehicle_slug>.json`) - Specific year configuration
3. Variant (`<vehicle_slug>_<variant_slug>.json`) - Delta from year base
- Objects: Deep merge by key (recursive)
- Scalars (string, number, boolean): Replace (higher precedence wins)
- Arrays: Complete replacement (no concatenation)
- Null values: Not allowed (use explicit empty states instead)
- Unknown keys: Validation failure
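These merge semantics can be sketched over a minimal JSON-like value type (std-only here; the real pipeline presumably operates on `serde_json::Value` instead):

```rust
use std::collections::BTreeMap;

// Minimal JSON-like value to illustrate the merge rules above.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Num(f64),
    Bool(bool),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

// Merge `overlay` (higher precedence) into `base`: objects merge
// recursively by key; scalars and arrays are replaced wholesale.
fn deep_merge(base: Value, overlay: Value) -> Value {
    match (base, overlay) {
        (Value::Object(mut b), Value::Object(o)) => {
            for (k, v) in o {
                let merged = match b.remove(&k) {
                    Some(existing) => deep_merge(existing, v),
                    None => v,
                };
                b.insert(k, merged);
            }
            Value::Object(b)
        }
        // Scalars and arrays: the higher-precedence side wins outright.
        (_, overlay) => overlay,
    }
}
```

Applying `deep_merge(deep_merge(model_base, year_base), variant)` then yields the canonical vehicle for that variant.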
- Each year base file produces one canonical vehicle
- Each variant file produces one additional canonical vehicle
- Example: `model_3.json` + `model_3_long_range.json` = 2 canonical vehicles
Each canonical vehicle must pass:
- JSON Schema Validation: Against `schema.json` from the dataset repository
- Required Fields Check: All mandatory fields present
- Type Validation: Correct data types for all fields
- Business Rules:
- At least one battery capacity (gross or net)
- At least one charge port
- At least one rated range entry
- At least one source
- Valid slug patterns (lowercase, alphanumeric + underscore)
- Valid ISO codes (country, currency)
- Referential Integrity: Variant files reference valid base vehicles
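A sketch of how these business rules might be expressed, evaluating every rule and collecting all failures rather than stopping at the first (the struct fields here are illustrative, not the real schema):

```rust
// Illustrative business-rule checks in the spirit of the validation
// pipeline: no fail-fast, all errors collected. Hypothetical fields.
struct CanonicalVehicle {
    battery_kwh_gross: Option<f64>,
    battery_kwh_net: Option<f64>,
    charge_ports: Vec<String>,
    range_entries: Vec<f64>,
    sources: Vec<String>,
    make_slug: String,
}

// Slug pattern: lowercase, alphanumeric plus underscore.
fn valid_slug(s: &str) -> bool {
    !s.is_empty()
        && s.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
}

fn business_errors(v: &CanonicalVehicle) -> Vec<String> {
    let mut errs = Vec::new();
    if v.battery_kwh_gross.is_none() && v.battery_kwh_net.is_none() {
        errs.push("missing battery capacity (gross or net)".to_string());
    }
    if v.charge_ports.is_empty() {
        errs.push("at least one charge port required".to_string());
    }
    if v.range_entries.is_empty() {
        errs.push("at least one rated range entry required".to_string());
    }
    if v.sources.is_empty() {
        errs.push("at least one source required".to_string());
    }
    if !valid_slug(&v.make_slug) {
        errs.push(format!("invalid slug: {}", v.make_slug));
    }
    errs
}
```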
Format: Single JSON file with array of all canonical vehicles
Structure:
{
"schema_version": "1.0.0",
"generated_at": "2025-12-25T12:00:00Z",
"vehicle_count": 1234,
"vehicles": [
{ /* canonical vehicle 1 */ },
{ /* canonical vehicle 2 */ },
...
],
"metadata": {
"etl_version": "1.0.0",
"dataset_commit": "abc123def",
"processing_time_ms": 5432
}
}

Use Cases:
- Direct JSON consumption by applications
- Import into other systems
- Data analysis and exploration
Schema Design: Normalized relational structure
Tables:
- `vehicles` - Core vehicle information
- `battery_specs` - Battery specifications
- `charging_specs` - Charging capabilities
- `charge_ports` - Physical charge ports
- `motors` - Electric motor details
- `range_ratings` - Range by test cycle
- `sources` - Data sources
- `variants` - Variant metadata
Indexes:
- Primary keys on all tables
- Composite index on (make_slug, model_slug, year, trim_slug)
- Index on make_slug, model_slug separately
- Full-text search index on model names
Use Cases:
- Embedded applications
- Desktop applications
- Quick queries without server setup
- API server data source (embedded mode)
Contents:
- Complete DDL (CREATE TABLE, CREATE INDEX)
- INSERT statements for all data
- Views for common queries
- Functions for search operations
Features:
- JSONB columns for complex nested data
- GiST indexes for JSONB queries
- Full-text search with tsvector
- Materialized views for aggregations
Use Cases:
- Production API deployments
- Advanced analytics
- Multi-tenant scenarios
- High-concurrency environments
Format: Flattened denormalized structure
Columns:
- Vehicle identification (make, model, year, trim, variant)
- Key specifications (battery, range, charging)
- Performance metrics
- Pricing information
- Source URLs (concatenated)
Handling Complex Fields:
- Arrays: Pipe-separated values (
value1|value2|value3) - Objects: Dot notation (
battery.pack_capacity_kwh_net) - Nested structures: Flattened to top level
Use Cases:
- Spreadsheet analysis (Excel, Google Sheets)
- Data science workflows (pandas, R)
- Business intelligence tools
- Legacy system integration
Format: Hierarchical XML structure
Root Element: <vehicles>
Vehicle Structure:
<vehicle id="oed:tesla:model_3:2024:base">
<make slug="tesla">Tesla</make>
<model slug="model_3">Model 3</model>
<year>2024</year>
<battery>
<pack_capacity_kwh_net>60.0</pack_capacity_kwh_net>
<!-- ... -->
</battery>
<!-- ... -->
</vehicle>

Features:
- XML Schema (XSD) generation
- Namespace support
- XSLT transformation support
Use Cases:
- Enterprise system integration
- SOAP-based services
- Government/regulatory systems
- Legacy XML pipelines
- Collect all errors (don't fail fast)
- Categorize by severity: ERROR, WARNING, INFO
- Generate detailed error report with file paths and line numbers
- Total vehicles processed
- Variants generated
- Files scanned
- Validation failures
- Processing time per stage
- `0`: Success - all vehicles valid
- `1`: Validation failures - some vehicles invalid
- `2`: Schema errors - malformed JSON
- `3`: File system errors - cannot read files
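A std-only sketch of mapping pipeline outcomes to these exit codes (the real `main.rs` may wire this through `std::process::ExitCode` differently):

```rust
// Map ETL outcomes to the documented exit codes. Illustrative only.
enum EtlOutcome {
    Success,
    ValidationFailures,
    SchemaErrors,
    FileSystemErrors,
}

fn exit_code(outcome: &EtlOutcome) -> u8 {
    match outcome {
        EtlOutcome::Success => 0,
        EtlOutcome::ValidationFailures => 1,
        EtlOutcome::SchemaErrors => 2,
        EtlOutcome::FileSystemErrors => 3,
    }
}
```

A CI step can then branch on the code, e.g. treat `1` as a soft failure that still uploads the validation report.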
Pattern: Three-layer architecture
- Presentation Layer: HTTP handlers and routing
- Service Layer: Business logic and queries
- Data Layer: Database abstraction
Health check endpoint
Response:
{
"status": "healthy",
"version": "1.0.0",
"database": "connected",
"vehicle_count": 1234
}

List all vehicles with pagination and filtering
Query Parameters:
- `make`: Filter by manufacturer (slug)
- `model`: Filter by model (slug)
- `year`: Filter by year
- `vehicle_type`: Filter by type (suv, sedan, etc.)
- `min_range_km`: Minimum range
- `max_range_km`: Maximum range
- `page`: Page number (default: 1)
- `per_page`: Items per page (default: 20, max: 100)
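Pagination defaults and limits like these are typically normalized before any query runs. A std-only sketch of that step (the real handler presumably deserializes the raw parameters with axum's `Query` extractor first):

```rust
// Normalize raw pagination parameters to the documented defaults and
// limits: page >= 1, per_page in 1..=100 with a default of 20.
// Illustrative; the real server's handling may differ.
fn normalize_pagination(page: Option<u32>, per_page: Option<u32>) -> (u32, u32) {
    let page = page.unwrap_or(1).max(1);
    let per_page = per_page.unwrap_or(20).clamp(1, 100);
    (page, per_page)
}

// Offset for a SQL LIMIT/OFFSET query over the normalized values.
fn offset(page: u32, per_page: u32) -> u32 {
    (page - 1) * per_page
}
```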
Response:
{
"vehicles": [ /* vehicle summaries */ ],
"pagination": {
"page": 1,
"per_page": 20,
"total": 1234,
"total_pages": 62
}
}

Get specific vehicle by unique code
Path Parameters:
- `code`: Vehicle unique code (format: `make:model:year:filename`)
Response: Full canonical vehicle object
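Parsing that code is a simple four-way split on `:`. A std-only sketch (the struct and function names are hypothetical):

```rust
// Parse a vehicle code of the form make:model:year:filename.
// Illustrative only; the real server may parse differently.
#[derive(Debug, PartialEq)]
struct VehicleCode {
    make: String,
    model: String,
    year: u16,
    filename: String,
}

fn parse_code(code: &str) -> Option<VehicleCode> {
    let mut parts = code.splitn(4, ':');
    let make = parts.next()?.to_string();
    let model = parts.next()?.to_string();
    let year: u16 = parts.next()?.parse().ok()?;
    let filename = parts.next()?.to_string();
    if make.is_empty() || model.is_empty() || filename.is_empty() {
        return None;
    }
    Some(VehicleCode { make, model, year, filename })
}
```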
Full-text search across vehicles
Query Parameters:
- `q`: Search query (minimum 2 characters)
- `page`, `per_page`: Pagination
Response: Ranked search results with pagination
List all manufacturers with model information
Response:
{
"makes": [
{ "slug": "tesla", "name": "Tesla", "vehicle_count": 42, "models": ["Model 3", "Model S", "Model X", "Model Y"] },
{ "slug": "byd", "name": "BYD", "vehicle_count": 38, "models": ["Dolphin", "Seal", "Atto 3"] }
]
}

Configuration via environment variables:
- `DATABASE_URL`: SQLite or PostgreSQL connection string
- `PORT`: Server port (default: 3000)
- `HOST`: Bind address (default: 0.0.0.0)
- `LOG_LEVEL`: Logging level (debug, info, warn, error)
- `CORS_ORIGINS`: Allowed CORS origins
- `MAX_PAGE_SIZE`: Maximum items per page
- `ENABLE_COMPRESSION`: Enable gzip compression
- `ENABLE_OPENAPI`: Enable OpenAPI endpoint
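A std-only sketch of loading a few of these settings with their documented defaults. Taking the lookup as a closure keeps the function testable without mutating the process environment; the `vehicles.db` fallback for `DATABASE_URL` is an assumption of this sketch, not a documented default:

```rust
// Load a subset of the documented settings with their defaults.
// Illustrative; the real config.rs may use a settings crate instead.
struct Config {
    database_url: String,
    port: u16,
    host: String,
}

// In main, this would be called as: load_config(|k| std::env::var(k).ok())
fn load_config(get: impl Fn(&str) -> Option<String>) -> Config {
    Config {
        // Assumed fallback; the doc does not specify a DATABASE_URL default.
        database_url: get("DATABASE_URL").unwrap_or_else(|| "vehicles.db".to_string()),
        port: get("PORT").and_then(|p| p.parse().ok()).unwrap_or(3000),
        host: get("HOST").unwrap_or_else(|| "0.0.0.0".to_string()),
    }
}
```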
Targets:
- Cold start: < 100ms (with SQLite embedded)
- Response time (p50): < 10ms
- Response time (p99): < 50ms
- Throughput: > 10,000 req/s (single instance, cached)
- Memory footprint: < 100MB (with in-memory SQLite)
Triggered by: Push to main or release creation
Workflow: ETL Artifacts Generation
name: Generate Data Artifacts
on:
push:
branches: [main]
release:
types: [created]
jobs:
build-etl:
- Checkout API repository
- Install Rust toolchain
- Build ev-etl in release mode
generate-artifacts:
- Run ev-etl against dataset source
- Generate all output formats
- Validate all artifacts
- Calculate checksums
publish-artifacts:
- Upload to GitHub Release
- Tag with semantic version
- Generate release notes

Artifacts Produced:

- `vehicles.json` + `vehicles.json.sha256`
- `vehicles.db` + `vehicles.db.sha256`
- `vehicles.sql` + `vehicles.sql.sha256`
- `vehicles.csv` + `vehicles.csv.sha256`
- `vehicles.xml` + `vehicles.xml.sha256`
- `validation-report.txt`
- `statistics.json`
Triggered by: Push, PR, or release
Workflow 1: Continuous Integration
name: CI
on: [push, pull_request]
jobs:
test:
- cargo fmt --check
- cargo clippy -- -D warnings
- cargo test --all-features
- cargo test --doc
build:
- cargo build --release -p ev-etl
- cargo build --release -p ev-server

Workflow 2: Release
name: Release
on:
push:
branches: [main]
jobs:
semantic-release:
- Run semantic-release
- Generate changelog
- Create GitHub release
build-binaries:
- Cross-compile for multiple platforms
- Upload binaries to release
build-docker:
- Build Docker image
- Push to container registry
- Tag with semantic version

Release Artifacts:

- Binaries: `ev-etl-linux-x64`, `ev-etl-windows-x64`, `ev-etl-macos-arm64`
- Binaries: `ev-server-linux-x64`, `ev-server-windows-x64`, `ev-server-macos-arm64`
- Docker image: `ghcr.io/open-ev-data/ev-server:latest`
- Docker image: `ghcr.io/open-ev-data/ev-server:v1.2.3`
Use Case: Single-server deployments, edge locations, development
Setup:
# Download artifacts
wget https://github.com/.../vehicles.db
wget https://github.com/.../ev-server-linux-x64
# Run server
DATABASE_URL=vehicles.db ./ev-server-linux-x64

Characteristics:
- Zero external dependencies
- < 100MB total footprint
- Single binary deployment
- Fast startup time
Use Case: Cloud deployments, Kubernetes, scalability
Docker Compose:
version: '3.8'
services:
api:
image: ghcr.io/open-ev-data/ev-server:latest
environment:
- DATABASE_URL=vehicles.db
ports:
- "3000:3000"
volumes:
      - ./vehicles.db:/app/vehicles.db:ro

Kubernetes Deployment:
- Deployment with multiple replicas
- ConfigMap for database file
- HorizontalPodAutoscaler
- Ingress for external access
Use Case: High-concurrency production, multi-tenant
Setup:
# Import schema
psql -d openev -f vehicles.sql
# Run server
DATABASE_URL=postgresql://user:pass@localhost/openev ./ev-server

Characteristics:
- Advanced query capabilities
- Connection pooling
- Read replicas support
- Full ACID compliance
Important: For detailed Rust coding standards, best practices, and implementation guidelines, see RUST_GUIDELINES.md.
# Clone API repository
git clone https://github.com/open-ev-data/open-ev-data-api.git
cd open-ev-data-api
# Install Rust toolchain
rustup install stable
rustup default stable
# Build all crates
cargo build --all
# Run tests
cargo test --all

# Point to local dataset
export DATASET_PATH=../open-ev-data-dataset/src
# Build and run ETL
cargo run -p ev-etl -- \
--input $DATASET_PATH \
--output ./output \
--formats json,sqlite,csv
# Validate output
cargo run -p ev-etl -- \
  --validate ./output/vehicles.json

# Use test database
cargo run -p ev-server -- \
--database ./output/vehicles.db \
--port 3000
# Run with hot reload (cargo-watch)
cargo watch -x 'run -p ev-server'
# Run integration tests
cargo test -p ev-server --test integration

- Dataset Release: Triggers ETL artifact generation
- API Development: Happens independently
- API Release: semantic-release handles versioning
- Docker Build: Automatic on API release
- Deployment: Manual or automatic depending on environment
- GraphQL endpoint alongside REST
- WebSocket support for real-time updates
- Advanced search with Elasticsearch
- Caching layer with Redis
- Rate limiting and API keys
- Usage analytics and telemetry
- Popular vehicle tracking
- Search query analysis
- Performance monitoring dashboard
- Automated data quality scoring
- Community contribution workflow
- Diff visualization for updates
- Historical data tracking (versioned snapshots)
The OpenEV Data API provides a comprehensive solution for transforming the layered dataset into multiple consumption formats:
Core Strengths:
- Build-Time Compilation: Data validation happens once, not on every request
- Multiple Formats: Single source, five output formats
- Type Safety: Rust's type system prevents invalid states
- Performance: Optimized for read-heavy workloads
- Portability: Works embedded (SQLite) or client-server (PostgreSQL)
- Automation: Fully automated CI/CD pipeline
Architecture Benefits:
- Clean separation of concerns (Hexagonal Architecture)
- Testable and maintainable codebase
- Independent deployability of components
- Multiple deployment strategies supported
Integration Points:
- Dataset updates trigger artifact regeneration
- API releases follow semantic versioning
- Multiple consumption patterns supported (files, API, embedded)