Skip to content

Proposal: sharedDatabases — SQLite as local data source #45

@a6b8

Description

@a6b8

Proposal: sharedDatabases — SQLite as Local Data Source

Summary

Extend FlowMCP with a new source: 'sqlite' type that enables schemas to query local SQLite databases instead of HTTP APIs. Databases are registered as shared resources (analogous to sharedLists) and made available to multiple schemas. Query execution happens entirely inside the core executor — handlers never get direct DB access.

Motivation

FlowMCP currently only supports HTTP-based data sources. This limits tools to external APIs with rate limits, API keys, and latency. Many valuable datasets (address directories, token metadata, registry snapshots) could be served locally:

  • No API keys — Zero cost per query
  • No rate limits — Instant responses
  • Offline-capable — Works without internet
  • Large datasets — 8M+ rows queryable in milliseconds
  • Historical snapshots — Registry/market data frozen in time

Design Principles

Following the existing spec philosophy:

Principle Application
Deterministic over clever SQL query declared in main, same input = same result
Declare over code Query is static, hashable, no dynamic SQL
Inject over import DB resolved by core, not by schema
Hash over trust DB has SHA-256 hash in meta.json
Constrain over permit Read-only, SELECT only, no DDL/DML

Architecture

flowchart TD
    A[Schema loaded] --> B{main.source?}
    B -->|http — default| C[HttpExecutor — unchanged]
    B -->|sqlite| D[SqliteExecutor — NEW]
    
    D --> D1["Resolve DB path from sharedDatabases"]
    D1 --> D2["db.prepare(query).all(bindings)"]
    D2 --> D3["Rows → postRequest handler → response"]
Loading

No Mixing Rule

A schema is either source: 'http' or source: 'sqlite' — never both. There is no way to combine HTTP calls and SQLite queries within a single schema.

Database Storage

Databases live as shared resources in a dedicated directory:

~/.flowmcp/
├── schemas/
├── shared-lists/
└── databases/              ← NEW
    └── eth-addresses/
        ├── meta.json       ← version, hash, table info
        └── eth-addresses.db

meta.json Format

{
    "name": "ethAddresses",
    "version": "1.0.0",
    "description": "8M Ethereum addresses with labels and balances",
    "hash": "sha256:a1b2c3...",
    "tables": [
        {
            "name": "addresses",
            "rows": 8000000,
            "columns": ["address", "balance", "label", "firstSeen"]
        }
    ],
    "sizeBytes": 524288000,
    "updatedAt": "2026-02-20"
}

Schema Format

New Top-Level Fields

Field Type Required Description
source 'http' | 'sqlite' No (default: 'http') Executor discriminator
databases Array<{ ref, version }> Only if source: 'sqlite' Declared DB dependencies

Conditional Fields by Source

Field source: 'http' source: 'sqlite'
root Required Forbidden
databases Forbidden Required
requiredServerParams Optional Forbidden
routes.method Required Forbidden
routes.path Required Forbidden
routes.database Forbidden Required (ref name)
routes.query Forbidden Required (SELECT statement)
parameters.location insert | query | body bind

Example Schema

export const main = {
    source: 'sqlite',
    namespace: 'ethaddresses',
    name: 'WhaleTracker',
    description: 'Query Ethereum whale addresses from local database',
    version: '2.0.0',
    tags: ['ethereum', 'addresses', 'local'],
    databases: [
        { ref: 'ethAddresses', version: '1.0.0' }
    ],
    routes: {
        queryWhales: {
            database: 'ethAddresses',
            query: 'SELECT address, balance, label FROM addresses WHERE balance > @minBalance ORDER BY balance DESC LIMIT @limit',
            description: 'Find whale addresses by minimum balance',
            parameters: [
                {
                    position: { key: 'minBalance', value: '{{USER_PARAM}}', location: 'bind' },
                    z: { primitive: 'number()', options: ['min(0)'] }
                },
                {
                    position: { key: 'limit', value: '{{USER_PARAM}}', location: 'bind' },
                    z: { primitive: 'number()', options: ['default(20)', 'max(1000)'] }
                }
            ],
            output: {
                mimeType: 'application/json',
                schema: {
                    type: 'array',
                    items: {
                        type: 'object',
                        properties: {
                            address: { type: 'string' },
                            balance: { type: 'number' },
                            label: { type: 'string', nullable: true }
                        }
                    }
                }
            },
            tests: [
                { _description: 'Top 10 whales', minBalance: 10000, limit: 10 }
            ]
        }
    }
}

Execution Flow

1. Read schema file as string
2. Static security scan (existing + new SQL scan)
3. Dynamic import
4. Extract main export
5. Validate main block (source-aware conditional validation)
6. Resolve sharedDatabases (check ~/.flowmcp/databases/, verify hash)
7. If handlers export exists:
   - Call handlers({ sharedLists, libraries })
   - Register ONLY postRequest handlers (no preRequest for sqlite)
8. On tool call:
   a. Validate parameters (z-block, unchanged)
   b. Open DB connection (read-only mode)
   c. db.prepare(query).all(bindings)
   d. Apply LIMIT cap (max 1000 rows)
   e. Pass rows to postRequest handler (if exists)
   f. Wrap in standard response envelope
   g. Close DB connection

Handler Restrictions for SQLite

Handler HTTP SQLite
preRequest Yes — modify struct + payload No — query is declarative
postRequest Yes — transform response Yes — transform rows

The handler receives { response } where response is an array of row objects. No db object is injected. The handler transforms data, nothing more.

// Correct — handler transforms rows
export const handlers = ({ sharedLists, libraries }) => ({
    queryWhales: {
        postRequest: async ({ response }) => {
            const formatted = response
                .map((row) => ({
                    ...row,
                    balanceEth: row.balance / 1e18
                }))
            return { response: formatted }
        }
    }
})

Security

SQL Static Scan

Before execution, the query string is scanned. Rejected patterns:

Pattern Reason
INSERT, UPDATE, DELETE No data mutation
DROP, CREATE, ALTER No DDL
ATTACH No foreign DB access
PRAGMA (except table_info) No DB configuration
; (semicolons) No multi-statement
-- (SQL comments) No comment injection

Runtime Constraints

Constraint Value
DB opened in read-only mode { readonly: true }
Max rows per query 1000
Query timeout 5 seconds
Only prepared statements No string concatenation

New location: 'bind'

Parameters with location: 'bind' are passed as prepared statement bindings:

// Schema declares:
position: { key: 'minBalance', value: '{{USER_PARAM}}', location: 'bind' }

// Core executes:
db.prepare('SELECT ... WHERE balance > @minBalance').all({ minBalance: 100 })

This is SQL-injection-safe by design — values are never interpolated into the query string.

Hashing

SQLite route hashes follow the same pattern, replacing HTTP-specific fields:

SHA256(JSON.stringify({
    namespace,
    version,
    source: 'sqlite',
    database: 'ethAddresses',
    route: {
        name: 'queryWhales',
        query: 'SELECT ...',
        parameters: [...],
        output: {...}
    },
    sharedListRefs: [...]
}))

CLI Extensions

# Register a database
flowmcp db add <name> <path-to-db>
# → copies to ~/.flowmcp/databases/<name>/
# → generates meta.json with hash + table info

# List registered databases
flowmcp db list

# Inspect database
flowmcp db info <name>

# Remove database
flowmcp db remove <name>

# Verify integrity
flowmcp db verify <name>

What Does NOT Change

Feature Unchanged
namespace, name, version, tags, docs Identical metadata
Parameter z-block validation Same primitives and options
output schema Same JSON Schema subset
tests format Same test declarations
Cherry-Pick Groups Same tool references
Shared Lists Can still be used for enum interpolation
Two-export model main + handlers
Standard response envelope { status, messages, data }

Dependencies

  • better-sqlite3 (MIT) added to flowmcp-core
  • SQLite itself is Public Domain

Suggested Implementation Order

  1. Spec document — Extend spec v2.1.0 with source: 'sqlite' section
  2. SqliteExecutor — New executor in flowmcp-core (~100 lines)
  3. MainValidator — Source-aware conditional validation
  4. SQL Security Scan — Static analysis for query strings
  5. CLI: flowmcp db — Database management commands in flowmcp-cli
  6. Proof of Concept — MCP Registry snapshot as first sharedDatabase

Example Use Cases

Database Rows Description
ethAddresses ~8M Ethereum address directory with labels
mcpRegistry ~500 MCP Server Registry snapshot
tokenMetadata ~50K ERC-20 token directory
contractLabels ~100K Known contract labels (Etherscan)
duneCache variable Cached Dune Analytics query results

This proposal follows the existing FlowMCP v2.0.0 patterns (sharedLists, two-export model, dependency injection, security scanning) and extends them for local data sources.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions