Proposal: sharedDatabases — SQLite as local data source

## Proposal: `sharedDatabases` — SQLite as Local Data Source

### Summary

Extend FlowMCP with a new `source: 'sqlite'` type that enables schemas to query local SQLite databases instead of HTTP APIs. Databases are registered as **shared resources** (analogous to `sharedLists`) and made available to multiple schemas. Query execution happens entirely inside the core executor — handlers never get direct DB access.

### Motivation

FlowMCP currently only supports HTTP-based data sources. This limits tools to external APIs with rate limits, API keys, and latency. Many valuable datasets (address directories, token metadata, registry snapshots) could be served locally:

- **No API keys** — Zero cost per query
- **No rate limits** — Instant responses
- **Offline-capable** — Works without internet
- **Large datasets** — 8M+ rows queryable in milliseconds
- **Historical snapshots** — Registry/market data frozen in time

### Design Principles

Following the existing spec philosophy:

| Principle | Application |
|-----------|-------------|
| **Deterministic over clever** | SQL query declared in `main`, same input = same result |
| **Declare over code** | Query is static, hashable, no dynamic SQL |
| **Inject over import** | DB resolved by core, not by schema |
| **Hash over trust** | DB has SHA-256 hash in `meta.json` |
| **Constrain over permit** | Read-only, SELECT only, no DDL/DML |

### Architecture

```mermaid
flowchart TD
    A[Schema loaded] --> B{main.source?}
    B -->|http — default| C[HttpExecutor — unchanged]
    B -->|sqlite| D[SqliteExecutor — NEW]
    
    D --> D1["Resolve DB path from sharedDatabases"]
    D1 --> D2["db.prepare(query).all(bindings)"]
    D2 --> D3["Rows → postRequest handler → response"]
```

### No Mixing Rule

A schema is either `source: 'http'` or `source: 'sqlite'` — never both. There is no way to combine HTTP calls and SQLite queries within a single schema.

### Database Storage

Databases live as shared resources in a dedicated directory:

```
~/.flowmcp/
├── schemas/
├── shared-lists/
└── databases/              ← NEW
    └── eth-addresses/
        ├── meta.json       ← version, hash, table info
        └── eth-addresses.db
```

#### `meta.json` Format

```json
{
    "name": "ethAddresses",
    "version": "1.0.0",
    "description": "8M Ethereum addresses with labels and balances",
    "hash": "sha256:a1b2c3...",
    "tables": [
        {
            "name": "addresses",
            "rows": 8000000,
            "columns": ["address", "balance", "label", "firstSeen"]
        }
    ],
    "sizeBytes": 524288000,
    "updatedAt": "2026-02-20"
}
```

### Schema Format

#### New Top-Level Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `source` | `'http' \| 'sqlite'` | No (default: `'http'`) | Executor discriminator |
| `databases` | `Array<{ ref, version }>` | Only if `source: 'sqlite'` | Declared DB dependencies |

#### Conditional Fields by Source

| Field | `source: 'http'` | `source: 'sqlite'` |
|-------|-------------------|---------------------|
| `root` | Required | Forbidden |
| `databases` | Forbidden | Required |
| `requiredServerParams` | Optional | Forbidden |
| `routes.method` | Required | Forbidden |
| `routes.path` | Required | Forbidden |
| `routes.database` | Forbidden | Required (ref name) |
| `routes.query` | Forbidden | Required (SELECT statement) |
| `parameters.location` | `insert \| query \| body` | `bind` |

#### Example Schema

```javascript
export const main = {
    source: 'sqlite',
    namespace: 'ethaddresses',
    name: 'WhaleTracker',
    description: 'Query Ethereum whale addresses from local database',
    version: '2.0.0',
    tags: ['ethereum', 'addresses', 'local'],
    databases: [
        { ref: 'ethAddresses', version: '1.0.0' }
    ],
    routes: {
        queryWhales: {
            database: 'ethAddresses',
            query: 'SELECT address, balance, label FROM addresses WHERE balance > @minBalance ORDER BY balance DESC LIMIT @limit',
            description: 'Find whale addresses by minimum balance',
            parameters: [
                {
                    position: { key: 'minBalance', value: '{{USER_PARAM}}', location: 'bind' },
                    z: { primitive: 'number()', options: ['min(0)'] }
                },
                {
                    position: { key: 'limit', value: '{{USER_PARAM}}', location: 'bind' },
                    z: { primitive: 'number()', options: ['default(20)', 'max(1000)'] }
                }
            ],
            output: {
                mimeType: 'application/json',
                schema: {
                    type: 'array',
                    items: {
                        type: 'object',
                        properties: {
                            address: { type: 'string' },
                            balance: { type: 'number' },
                            label: { type: 'string', nullable: true }
                        }
                    }
                }
            },
            tests: [
                { _description: 'Top 10 whales', minBalance: 10000, limit: 10 }
            ]
        }
    }
}
```

### Execution Flow

```
1. Read schema file as string
2. Static security scan (existing + new SQL scan)
3. Dynamic import
4. Extract main export
5. Validate main block (source-aware conditional validation)
6. Resolve sharedDatabases (check ~/.flowmcp/databases/, verify hash)
7. If handlers export exists:
   - Call handlers({ sharedLists, libraries })
   - Register ONLY postRequest handlers (no preRequest for sqlite)
8. On tool call:
   a. Validate parameters (z-block, unchanged)
   b. Open DB connection (read-only mode)
   c. db.prepare(query).all(bindings)
   d. Apply LIMIT cap (max 1000 rows)
   e. Pass rows to postRequest handler (if exists)
   f. Wrap in standard response envelope
   g. Close DB connection
```

### Handler Restrictions for SQLite

| Handler | HTTP | SQLite |
|---------|------|--------|
| `preRequest` | Yes — modify struct + payload | **No** — query is declarative |
| `postRequest` | Yes — transform response | **Yes** — transform rows |

The handler receives `{ response }` where `response` is an array of row objects. No `db` object is injected. The handler transforms data, nothing more.

```javascript
// Correct — handler transforms rows
export const handlers = ({ sharedLists, libraries }) => ({
    queryWhales: {
        postRequest: async ({ response }) => {
            const formatted = response
                .map((row) => ({
                    ...row,
                    balanceEth: row.balance / 1e18
                }))
            return { response: formatted }
        }
    }
})
```

### Security

#### SQL Static Scan

Before execution, the query string is scanned. Rejected patterns:

| Pattern | Reason |
|---------|--------|
| `INSERT`, `UPDATE`, `DELETE` | No data mutation |
| `DROP`, `CREATE`, `ALTER` | No DDL |
| `ATTACH` | No foreign DB access |
| `PRAGMA` (except `table_info`) | No DB configuration |
| `;` (semicolons) | No multi-statement |
| `--` (SQL comments) | No comment injection |

#### Runtime Constraints

| Constraint | Value |
|------------|-------|
| DB opened in read-only mode | `{ readonly: true }` |
| Max rows per query | 1000 |
| Query timeout | 5 seconds |
| Only prepared statements | No string concatenation |

### New `location: 'bind'`

Parameters with `location: 'bind'` are passed as prepared statement bindings:

```javascript
// Schema declares:
position: { key: 'minBalance', value: '{{USER_PARAM}}', location: 'bind' }

// Core executes:
db.prepare('SELECT ... WHERE balance > @minBalance').all({ minBalance: 100 })
```

This is SQL-injection-safe by design — values are never interpolated into the query string.

### Hashing

SQLite route hashes follow the same pattern, replacing HTTP-specific fields:

```javascript
SHA256(JSON.stringify({
    namespace,
    version,
    source: 'sqlite',
    database: 'ethAddresses',
    route: {
        name: 'queryWhales',
        query: 'SELECT ...',
        parameters: [...],
        output: {...}
    },
    sharedListRefs: [...]
}))
```

### CLI Extensions

```bash
# Register a database
flowmcp db add <name> <path-to-db>
# → copies to ~/.flowmcp/databases/<name>/
# → generates meta.json with hash + table info

# List registered databases
flowmcp db list

# Inspect database
flowmcp db info <name>

# Remove database
flowmcp db remove <name>

# Verify integrity
flowmcp db verify <name>
```

### What Does NOT Change

| Feature | Unchanged |
|---------|-----------|
| `namespace`, `name`, `version`, `tags`, `docs` | Identical metadata |
| Parameter `z`-block validation | Same primitives and options |
| `output` schema | Same JSON Schema subset |
| `tests` format | Same test declarations |
| Cherry-Pick Groups | Same tool references |
| Shared Lists | Can still be used for enum interpolation |
| Two-export model | `main` + `handlers` |
| Standard response envelope | `{ status, messages, data }` |

### Dependencies

- `better-sqlite3` (MIT) added to `flowmcp-core`
- SQLite itself is **Public Domain**

### Suggested Implementation Order

1. **Spec document** — Extend spec v2.1.0 with `source: 'sqlite'` section
2. **SqliteExecutor** — New executor in `flowmcp-core` (~100 lines)
3. **MainValidator** — Source-aware conditional validation
4. **SQL Security Scan** — Static analysis for query strings
5. **CLI: `flowmcp db`** — Database management commands in `flowmcp-cli`
6. **Proof of Concept** — MCP Registry snapshot as first `sharedDatabase`

### Example Use Cases

| Database | Rows | Description |
|----------|------|-------------|
| `ethAddresses` | ~8M | Ethereum address directory with labels |
| `mcpRegistry` | ~500 | MCP Server Registry snapshot |
| `tokenMetadata` | ~50K | ERC-20 token directory |
| `contractLabels` | ~100K | Known contract labels (Etherscan) |
| `duneCache` | variable | Cached Dune Analytics query results |

---

*This proposal follows the existing FlowMCP v2.0.0 patterns (sharedLists, two-export model, dependency injection, security scanning) and extends them for local data sources.*

Principle	Application
Deterministic over clever	SQL query declared in `main`, same input = same result
Declare over code	Query is static, hashable, no dynamic SQL
Inject over import	DB resolved by core, not by schema
Hash over trust	DB has SHA-256 hash in `meta.json`
Constrain over permit	Read-only, SELECT only, no DDL/DML

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: sharedDatabases — SQLite as local data source #45

Proposal: `sharedDatabases` — SQLite as Local Data Source

Summary

Motivation

Design Principles

Architecture

No Mixing Rule

Database Storage

`meta.json` Format

Schema Format

New Top-Level Fields

Conditional Fields by Source

Example Schema

Execution Flow

Handler Restrictions for SQLite

Security

SQL Static Scan

Runtime Constraints

New `location: 'bind'`

Hashing

CLI Extensions

What Does NOT Change

Dependencies

Suggested Implementation Order

Example Use Cases

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Field	Type	Required	Description
`source`	`'http' \| 'sqlite'`	No (default: `'http'`)	Executor discriminator
`databases`	`Array<{ ref, version }>`	Only if `source: 'sqlite'`	Declared DB dependencies

Field	`source: 'http'`	`source: 'sqlite'`
`root`	Required	Forbidden
`databases`	Forbidden	Required
`requiredServerParams`	Optional	Forbidden
`routes.method`	Required	Forbidden
`routes.path`	Required	Forbidden
`routes.database`	Forbidden	Required (ref name)
`routes.query`	Forbidden	Required (SELECT statement)
`parameters.location`	`insert \| query \| body`	`bind`

Handler	HTTP	SQLite
`preRequest`	Yes — modify struct + payload	No — query is declarative
`postRequest`	Yes — transform response	Yes — transform rows

Pattern	Reason
`INSERT`, `UPDATE`, `DELETE`	No data mutation
`DROP`, `CREATE`, `ALTER`	No DDL
`ATTACH`	No foreign DB access
`PRAGMA` (except `table_info`)	No DB configuration
`;` (semicolons)	No multi-statement
`--` (SQL comments)	No comment injection

Constraint	Value
DB opened in read-only mode	`{ readonly: true }`
Max rows per query	1000
Query timeout	5 seconds
Only prepared statements	No string concatenation

Feature	Unchanged
`namespace`, `name`, `version`, `tags`, `docs`	Identical metadata
Parameter `z`-block validation	Same primitives and options
`output` schema	Same JSON Schema subset
`tests` format	Same test declarations
Cherry-Pick Groups	Same tool references
Shared Lists	Can still be used for enum interpolation
Two-export model	`main` + `handlers`
Standard response envelope	`{ status, messages, data }`

Database	Rows	Description
`ethAddresses`	~8M	Ethereum address directory with labels
`mcpRegistry`	~500	MCP Server Registry snapshot
`tokenMetadata`	~50K	ERC-20 token directory
`contractLabels`	~100K	Known contract labels (Etherscan)
`duneCache`	variable	Cached Dune Analytics query results

Proposal: sharedDatabases — SQLite as local data source #45

Description

Proposal: sharedDatabases — SQLite as Local Data Source

Summary

Motivation

Design Principles

Architecture

No Mixing Rule

Database Storage

meta.json Format

Schema Format

New Top-Level Fields

Conditional Fields by Source

Example Schema

Execution Flow

Handler Restrictions for SQLite

Security

SQL Static Scan

Runtime Constraints

New location: 'bind'

Hashing

CLI Extensions

What Does NOT Change

Dependencies

Suggested Implementation Order

Example Use Cases

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Proposal: `sharedDatabases` — SQLite as Local Data Source

`meta.json` Format

New `location: 'bind'`