4 changes: 2 additions & 2 deletions .github/workflows/validate.yml
@@ -18,10 +18,10 @@ jobs:
       - run: bun install
 
       - name: Extract metadata
-        run: bun run bin/cli.ts extract-metadata examples/v1/metadata.json /tmp/databases
+        run: bun run bin/cli.ts extract-table-metadata examples/v1/table_metadata.json /tmp/databases
 
       - name: Extract field values
-        run: bun run bin/cli.ts extract-field-values examples/v1/metadata.json examples/v1/field-values.json /tmp/databases
+        run: bun run bin/cli.ts extract-field-values examples/v1/table_metadata.json examples/v1/field_values.json /tmp/databases
 
       - name: Diff examples
         run: diff -r examples/v1/databases /tmp/databases
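The `Extract metadata` and `Extract field values` steps above both parse the same flat `table_metadata.json`. A hypothetical sketch of its top-level shape (only the three arrays are guaranteed by the spec; everything else here is an illustrative assumption):

```typescript
// Hypothetical model of the flat table_metadata.json consumed by the
// workflow steps above. Only the three top-level arrays are specified;
// treating their elements as opaque keeps the sketch honest.
interface TableMetadataDoc {
  databases: unknown[];
  tables: unknown[];
  fields: unknown[];
}

// Mirrors the kind of summary the CLI prints after extraction.
function summarize(raw: string): string {
  const doc = JSON.parse(raw) as TableMetadataDoc;
  return `Extracted ${doc.databases.length} databases, ${doc.tables.length} tables, ${doc.fields.length} fields`;
}
```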
82 changes: 23 additions & 59 deletions README.md
@@ -2,13 +2,13 @@
 
 Metabase represents database metadata — synced databases, their tables, and their fields — as a tree of YAML files. Files are diff-friendly: numeric IDs are omitted entirely, and foreign keys use natural-key tuples like `["Sample Database", "PUBLIC", "ORDERS"]` instead of database identifiers.
 
-This repository contains the specification, examples, and a CLI that converts the JSON returned by Metabase's `GET /api/database/metadata` endpoint into YAML.
+This repository contains the specification, examples, and a CLI that converts the `table_metadata.json` downloaded from a Metabase workspace page into YAML.
 
 ## Specification
 
 The format is defined in **[core-spec/v1/spec.md](core-spec/v1/spec.md)** (v1.0.4). It covers entity keys, field types, folder structure, sampled field values, and the shape of each entity.
 
-Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)** — both the raw `metadata.json` returned by the endpoint and the extracted YAML tree.
+Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)** — both the raw `table_metadata.json` and the extracted YAML tree.
 
 ### Entities
 
@@ -20,42 +20,30 @@ Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)
 
 ## Obtaining metadata
 
-Metadata is fetched on demand from a running Metabase instance via `GET /api/database/metadata`. The response is a flat JSON document with three arrays — `databases`, `tables`, and `fields` — streamed so that even warehouses with very large schemas can be exported without exhausting server memory.
-
-Authenticate with either a session token (`X-Metabase-Session`) or an API key (`X-API-Key`):
-
-```sh
-curl "$METABASE_URL/api/database/metadata" \
-  -H "X-API-Key: $METABASE_API_KEY" \
-  -o metadata.json
-```
+Metadata is downloaded as `table_metadata.json` from the Metabase workspace page (Workspaces → the relevant workspace → "Download table_metadata.json"). The file is a flat JSON document with three arrays — `databases`, `tables`, and `fields` — that even warehouses with very large schemas can produce without exhausting server memory.
 
 ### Extracting metadata to YAML
 
 The CLI turns that JSON into the human- and agent-friendly YAML tree described in the spec:
 
 ```sh
-bunx @metabase/database-metadata extract-metadata <input-file> <output-folder>
+bunx @metabase/database-metadata extract-table-metadata <input-file> <output-folder>
 ```
 
-- `<input-file>` — path to the `metadata.json` produced by the API.
+- `<input-file>` — path to the `table_metadata.json` downloaded from the workspace page.
 - `<output-folder>` — destination directory. Database folders are created directly under it.
 
 ### Extracting field values
 
-Metabase keeps a sampled list of distinct values for each field that's low-cardinality enough to enumerate (the same list that powers filter dropdowns in the UI). Fetch it and extract it alongside the metadata:
+Metabase keeps a sampled list of distinct values for each field that's low-cardinality enough to enumerate (the same list that powers filter dropdowns in the UI). Download `field_values.json` from the same workspace page and extract it alongside the metadata:
 
 ```sh
-curl "$METABASE_URL/api/database/field-values" \
-  -H "X-API-Key: $METABASE_API_KEY" \
-  -o field-values.json
-
 bunx @metabase/database-metadata extract-field-values <metadata-file> <field-values-file> <output-folder>
 ```
 
-- `<metadata-file>` — the same `metadata.json` used by `extract-metadata`. Field values reference fields by numeric ID, which the CLI resolves to natural keys using the metadata.
-- `<field-values-file>` — path to the `field-values.json` returned by the endpoint.
-- `<output-folder>` — destination directory; typically the same one used for `extract-metadata`, so values files land next to the table YAMLs they belong to.
+- `<metadata-file>` — the same `table_metadata.json` used by `extract-table-metadata`. Field values reference fields by numeric ID, which the CLI resolves to natural keys using the metadata.
+- `<field-values-file>` — path to the `field_values.json` downloaded from the workspace page.
+- `<output-folder>` — destination directory; typically the same one used for `extract-table-metadata`, so values files land next to the table YAMLs they belong to.
 
 One YAML file is written per field that has values. Fields with empty samples are skipped; field IDs not present in the metadata are reported as orphans and skipped. See the spec's [Field Values](core-spec/v1/spec.md#field-values) section for the on-disk shape and when agents should consult these files.
 
@@ -73,18 +61,18 @@ Omit `--file` to write `spec.md` into the current directory.
 
 The following is the **default** workflow for a project that wants to use Metabase metadata. It is a convention, not a requirement — teams are free to organize things differently.
 
-### 1. A `.metabase/` directory at the repo root
+### 1. A `.metadata/` directory at the repo root
 
-Create a top-level `.metabase/` directory and **add it to `.gitignore`**. This is where the raw `metadata.json` and the extracted `databases/` YAML tree live:
+Create a top-level `.metadata/` directory and **add it to `.gitignore`**. This is where the raw `table_metadata.json` and the extracted `databases/` YAML tree live:
 
 ```
-.metabase/
-├── metadata.json
+.metadata/
+├── table_metadata.json
 └── databases/
     └── …
 ```
 
-### 2. Why `.metabase/` should not be committed
+### 2. Why `.metadata/` should not be committed
 
 On a large data warehouse the metadata export can easily reach **hundreds of megabytes or several gigabytes**. Committing it:
 
@@ -94,44 +82,20 @@ On a large data warehouse the metadata export can easily reach **hundreds of meg
 
 Each developer (or a CI job) fetches metadata on demand from their own Metabase instance instead.
 
-### 3. Credentials via a gitignored `.env` file
-
-Check in an **`.env.template`** at the repo root with placeholders:
-
-```env
-METABASE_URL=https://metabase.example.com
-METABASE_API_KEY=
-```
+### 3. Download from the workspace page and extract
 
-Each developer copies it to `.env` (also gitignored) and fills in the real values:
+Each developer downloads `table_metadata.json` (and optionally `field_values.json`) from the Metabase workspace page and drops them into `.metadata/`. Then run the extractors:
 
 ```sh
-cp .env.template .env
-# edit .env to set METABASE_URL and METABASE_API_KEY
-```
-
-### 4. Fetch and extract on demand
-
-With `.env` populated, the end-to-end flow is:
-
-```sh
-set -a; source .env; set +a
-
-mkdir -p .metabase
-curl -sf "$METABASE_URL/api/database/metadata" \
-  -H "X-API-Key: $METABASE_API_KEY" \
-  -o .metabase/metadata.json
-
-curl -sf "$METABASE_URL/api/database/field-values" \
-  -H "X-API-Key: $METABASE_API_KEY" \
-  -o .metabase/field-values.json
+mkdir -p .metadata
+# Drop table_metadata.json (and optionally field_values.json) from the workspace page into .metadata/
 
-rm -rf .metabase/databases
-bunx @metabase/database-metadata extract-metadata .metabase/metadata.json .metabase/databases
-bunx @metabase/database-metadata extract-field-values .metabase/metadata.json .metabase/field-values.json .metabase/databases
+rm -rf .metadata/databases
+bunx @metabase/database-metadata extract-table-metadata .metadata/table_metadata.json .metadata/databases
+bunx @metabase/database-metadata extract-field-values .metadata/table_metadata.json .metadata/field_values.json .metadata/databases
 ```
 
-After this, tools and agents should read the YAML tree under `.metabase/databases/` — not `metadata.json` or `field-values.json`, which exist only as input to the extractors.
+After this, tools and agents should read the YAML tree under `.metadata/databases/` — not `table_metadata.json` or `field_values.json`, which exist only as input to the extractors.
 
 ## Publishing to NPM
 
@@ -145,7 +109,7 @@ The workflow requires an `NPM_RELEASE_TOKEN` secret with publish access to the `
 
 ```sh
 bun install
-bun bin/cli.ts extract-metadata examples/v1/metadata.json /tmp/.metabase/databases
+bun bin/cli.ts extract-table-metadata examples/v1/table_metadata.json /tmp/.metadata/databases
 ```
 
 ### Scripts
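The README changes above specify that `extract-field-values` skips fields with empty samples and reports field IDs missing from the metadata as orphans. A sketch of that resolution step, with assumed data shapes (the CLI's real internals may differ):

```typescript
// Sketch of ID-to-natural-key resolution for field values, as described
// in the README above. The shapes are illustrative assumptions.
type FieldKey = [database: string, schema: string, table: string, field: string];

interface ValuesEntry {
  field_id: number;
  values: unknown[];
}

function resolveFieldValues(
  fieldsById: Map<number, FieldKey>,
  entries: ValuesEntry[],
): { resolved: Array<{ key: FieldKey; values: unknown[] }>; orphans: number[] } {
  const resolved: Array<{ key: FieldKey; values: unknown[] }> = [];
  const orphans: number[] = [];
  for (const entry of entries) {
    if (entry.values.length === 0) continue; // empty samples: no file written
    const key = fieldsById.get(entry.field_id);
    if (key === undefined) {
      orphans.push(entry.field_id); // ID absent from the metadata: report and skip
      continue;
    }
    resolved.push({ key, values: entry.values });
  }
  return { resolved, orphans };
}
```

This is also why the metadata and values files must come from the same download: a mismatched pair silently widens the orphan list instead of failing loudly.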
10 changes: 5 additions & 5 deletions bin/cli.test.ts
@@ -5,8 +5,8 @@ import { join, resolve } from "path";
 
 const REPO_ROOT = resolve(import.meta.dirname, "..");
 const CLI = "bin/cli.ts";
-const EXAMPLE_INPUT = "examples/v1/metadata.json";
-const EXAMPLE_FIELD_VALUES = "examples/v1/field-values.json";
+const EXAMPLE_INPUT = "examples/v1/table_metadata.json";
+const EXAMPLE_FIELD_VALUES = "examples/v1/field_values.json";
 
 type RunResult = {
   stdout: string;
@@ -47,7 +47,7 @@ describe("cli", () => {
   });
 });
 
-describe("extract-metadata", () => {
+describe("extract-table-metadata", () => {
   let workdir: string;
 
   beforeEach(() => {
@@ -60,7 +60,7 @@
 
   it("extracts the bundled example into YAML files", () => {
     const { stdout, exitCode } = runCli([
-      "extract-metadata",
+      "extract-table-metadata",
       EXAMPLE_INPUT,
       workdir,
     ]);
@@ -73,7 +73,7 @@ });
   });
 
   it("errors when arguments are missing", () => {
-    const { stderr, exitCode } = runCli(["extract-metadata"]);
+    const { stderr, exitCode } = runCli(["extract-table-metadata"]);
    expect(exitCode).toBe(1);
    expect(stderr).toContain(
      "<input-file> and <output-folder> arguments are required",
8 changes: 4 additions & 4 deletions bin/cli.ts
@@ -3,7 +3,7 @@
 import { parseArgs } from "node:util";
 
 import { extractFieldValues } from "../src/extract-field-values.js";
-import { extractMetadata } from "../src/extract-metadata.js";
+import { extractTableMetadata } from "../src/extract-table-metadata.js";
 import { extractSpec } from "../src/extract-spec.js";
 
 type ParsedValues = {
@@ -14,7 +14,7 @@ type ParsedValues = {
 const HELP = `Usage: database-metadata <command> [arguments] [options]
 
 Commands:
-  extract-metadata <input-file> <output-folder>   Extract metadata JSON into YAML files
+  extract-table-metadata <input-file> <output-folder>   Extract metadata JSON into YAML files
                                                   Writes one YAML per database + one per table
                                                   with fields nested inside.
 
@@ -50,7 +50,7 @@ function handleExtractMetadata(positionals: string[]): void {
     process.exit(1);
   }
 
-  const stats = extractMetadata({ inputFile, outputFolder });
+  const stats = extractTableMetadata({ inputFile, outputFolder });
   console.log(
     `Extracted ${stats.databases} databases, ${stats.tables} tables, ${stats.fields} fields`,
   );
@@ -96,7 +96,7 @@ function main(): void {
   }
 
   switch (command) {
-    case "extract-metadata":
+    case "extract-table-metadata":
      return handleExtractMetadata(positionals);
    case "extract-field-values":
      return handleExtractFieldValues(positionals);
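The `cli.ts` hunks above rename a command inside a switch-based dispatcher. The pattern itself, reduced to a self-contained sketch (the handler bodies are stand-ins, not the real extract functions):

```typescript
// Table-driven command dispatch, mirroring the switch in bin/cli.ts.
// Handler bodies are stand-ins for the real extract functions.
type Handler = (positionals: string[]) => string;

const handlers: Record<string, Handler> = {
  "extract-table-metadata": (args) => `extract-table-metadata with ${args.length} args`,
  "extract-field-values": (args) => `extract-field-values with ${args.length} args`,
};

function dispatch(command: string, positionals: string[]): string {
  const handler = handlers[command];
  if (handler === undefined) {
    throw new Error(`Unknown command: ${command}`);
  }
  return handler(positionals);
}
```

A lookup table like this makes a rename a one-line change per command, at the cost of the exhaustiveness checking that a `switch` over a union type can provide.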
14 changes: 7 additions & 7 deletions core-spec/v1/spec.md
@@ -8,7 +8,7 @@ Metabase database metadata is a read-only snapshot of databases, tables, and fie
 
 The format is designed to be **portable** and **reviewable**: numeric IDs are omitted or replaced with human-readable natural keys (database name, `[database, schema, table]` tuples, etc.). Files can be diffed, grepped, and edited by hand.
 
-The raw API response (`metadata.json`) is a single flat JSON document with `databases`, `tables`, and `fields` arrays, optimized for transport rather than reading. It can be arbitrarily large — tens or hundreds of megabytes on warehouses with many tables — and is not intended for direct consumption. Tools and humans should read the extracted YAML tree under `databases/` instead, where each entity lives in its own small file and foreign keys are resolved to natural-key tuples.
+The raw `table_metadata.json` (downloaded from the Metabase workspace page) is a single flat JSON document with `databases`, `tables`, and `fields` arrays, optimized for transport rather than reading. It can be arbitrarily large — tens or hundreds of megabytes on warehouses with many tables — and is not intended for direct consumption. Tools and humans should read the extracted YAML tree under `databases/` instead, where each entity lives in its own small file and foreign keys are resolved to natural-key tuples.
 
 ## Table of Contents
 
@@ -123,10 +123,10 @@ Common semantic types, grouped by purpose:
 
 ## Folder Structure
 
-By convention, metadata is extracted under a `.metabase/databases/` directory, with each database occupying its own folder. The exporter itself doesn't enforce this location; it writes the tree below into whatever folder the caller passes.
+By convention, metadata is extracted under a `.metadata/databases/` directory, with each database occupying its own folder. The exporter itself doesn't enforce this location; it writes the tree below into whatever folder the caller passes.
 
 ```
-.metabase/
+.metadata/
 └── databases/
     └── {database}/
         ├── {database}.yaml
@@ -268,13 +268,13 @@ Field values are **sampled, not exhaustive**: Metabase caps the list (typically
 
 ### Extraction order
 
-**Field values must be extracted *after* metadata, never before or in isolation.** The raw `field-values.json` references fields by numeric `field_id` only; resolving those IDs to the natural-key tuples used everywhere in this format requires the metadata index. The extractor takes both `metadata.json` and `field-values.json` as inputs, and the two **must come from the same Metabase instance at the same point in time** — a stale metadata file paired with a fresh values file (or vice versa) will silently drop entries as orphans whenever a field has been added, removed, or had its ID reassigned.
+**Field values must be extracted *after* metadata, never before or in isolation.** The raw `field_values.json` references fields by numeric `field_id` only; resolving those IDs to the natural-key tuples used everywhere in this format requires the metadata index. The extractor takes both `table_metadata.json` and `field_values.json` as inputs, and the two **must come from the same Metabase workspace download at the same point in time** — a stale metadata file paired with a fresh values file (or vice versa) will silently drop entries as orphans whenever a field has been added, removed, or had its ID reassigned.
 
 The recommended workflow is therefore strictly sequential:
 
-1. Fetch `metadata.json` from the Metabase instance.
-2. Run `extract-metadata` to write the database/table/field YAML tree.
-3. Fetch `field-values.json` from the **same** instance, ideally back-to-back with step 1.
+1. Download `table_metadata.json` from the Metabase workspace page.
+2. Run `extract-table-metadata` to write the database/table/field YAML tree.
+3. Download `field_values.json` from the **same** workspace, ideally back-to-back with step 1.
 4. Run `extract-field-values` against the same output folder to drop per-field values files into the existing tree.
 
 Agents reading the tree can rely on this ordering: any `{table}/{field}.yaml` file is guaranteed to have a corresponding entry in the parent `{table}.yaml`'s `fields` array.
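The spec hunk above ends with a guarantee: every `{table}/{field}.yaml` values file has a matching entry in the parent `{table}.yaml`'s `fields` array. A sketch of how a consumer might verify that invariant (the in-memory shapes are assumptions, not the real YAML schema):

```typescript
// Checks the spec's ordering guarantee: no per-field values file may
// reference a field missing from its parent table's `fields` array.
// Shapes are illustrative assumptions, not the real YAML schema.
function findDanglingValuesFiles(
  tableFields: Map<string, string[]>, // table name -> field names from {table}.yaml
  valuesFiles: Array<{ table: string; field: string }>, // discovered {table}/{field}.yaml
): Array<{ table: string; field: string }> {
  return valuesFiles.filter(
    ({ table, field }) => !(tableFields.get(table) ?? []).includes(field),
  );
}
```

An empty result confirms the invariant; a non-empty one indicates extraction ran out of order or against mismatched downloads.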
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
 {
   "name": "@metabase/database-metadata",
-  "version": "1.0.4",
+  "version": "1.0.5",
   "description": "CLI tool to extract Metabase database metadata into YAML files",
   "license": "SEE LICENSE IN LICENSE.txt",
   "repository": {
14 changes: 7 additions & 7 deletions src/extract-field-values.test.ts
@@ -13,8 +13,8 @@ import yaml from "js-yaml";
 import { extractFieldValues } from "./extract-field-values.js";
 
 const REPO_ROOT = resolve(import.meta.dirname, "..");
-const EXAMPLE_METADATA = join(REPO_ROOT, "examples/v1/metadata.json");
-const EXAMPLE_FIELD_VALUES = join(REPO_ROOT, "examples/v1/field-values.json");
+const EXAMPLE_METADATA = join(REPO_ROOT, "examples/v1/table_metadata.json");
+const EXAMPLE_FIELD_VALUES = join(REPO_ROOT, "examples/v1/field_values.json");
 
 describe("extractFieldValues", () => {
   let workdir: string;
@@ -110,7 +110,7 @@
        },
      ],
    };
-    const fieldValuesFile = join(workdir, "field-values.json");
+    const fieldValuesFile = join(workdir, "field_values.json");
    writeFileSync(fieldValuesFile, JSON.stringify(fieldValues));
 
    const stats = extractFieldValues({
@@ -137,7 +137,7 @@
        },
      ],
    };
-    const fieldValuesFile = join(workdir, "field-values.json");
+    const fieldValuesFile = join(workdir, "field_values.json");
    writeFileSync(fieldValuesFile, JSON.stringify(fieldValues));
 
    const stats = extractFieldValues({
@@ -154,7 +154,7 @@
  });
 
  it("joins nested JSON field paths with dots in the filename", () => {
-    const metadataPath = join(workdir, "metadata.json");
+    const metadataPath = join(workdir, "table_metadata.json");
    writeFileSync(
      metadataPath,
      JSON.stringify({
@@ -168,7 +168,7 @@
      }),
    );
 
-    const fieldValuesPath = join(workdir, "field-values.json");
+    const fieldValuesPath = join(workdir, "field_values.json");
    writeFileSync(
      fieldValuesPath,
      JSON.stringify({
@@ -219,7 +219,7 @@
        },
      ],
    };
-    const fieldValuesFile = join(workdir, "field-values.json");
+    const fieldValuesFile = join(workdir, "field_values.json");
    writeFileSync(fieldValuesFile, JSON.stringify(fieldValues));
 
    extractFieldValues({