5 changes: 1 addition & 4 deletions .github/workflows/validate.yml
@@ -18,10 +18,7 @@ jobs:
- run: bun install

- name: Extract metadata
run: bun run bin/cli.ts extract-table-metadata examples/v1/table_metadata.json /tmp/databases

- name: Extract field values
run: bun run bin/cli.ts extract-field-values examples/v1/table_metadata.json examples/v1/field_values.json /tmp/databases
run: bun run bin/cli.ts extract-table-metadata examples/v1/metadata.json /tmp/databases

- name: Diff examples
run: diff -r examples/v1/databases /tmp/databases
41 changes: 13 additions & 28 deletions README.md
@@ -2,13 +2,13 @@

Metabase represents database metadata — synced databases, their tables, and their fields — as a tree of YAML files. Files are diff-friendly: numeric IDs are omitted entirely, and foreign keys use natural-key tuples like `["Sample Database", "PUBLIC", "ORDERS"]` instead of database identifiers.
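For illustration, a natural-key reference might be serialized roughly as follows; the key names here are hypothetical, and the authoritative shapes are defined in the spec below.

```yaml
# Hypothetical sketch; key names are illustrative only (see core-spec/v1/spec.md).
name: ORDER_ID
# Foreign keys reference their target by natural key, never by numeric id:
fk_target: ["Sample Database", "PUBLIC", "ORDERS"]
```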

This repository contains the specification, examples, and a CLI that converts the `table_metadata.json` downloaded from a Metabase workspace page into YAML.
This repository contains the specification, examples, and a CLI that converts the `metadata.json` downloaded from a Metabase instance into YAML.

## Specification

The format is defined in **[core-spec/v1/spec.md](core-spec/v1/spec.md)** (v1.0.4). It covers entity keys, field types, folder structure, sampled field values, and the shape of each entity.
The format is defined in **[core-spec/v1/spec.md](core-spec/v1/spec.md)** (v1.0.4). It covers entity keys, field types, folder structure, and the shape of each entity.

Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)** — both the raw `table_metadata.json` and the extracted YAML tree.
Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)** — both the raw `metadata.json` and the extracted YAML tree.

### Entities

@@ -20,7 +20,7 @@ Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)

## Obtaining metadata

Metadata is downloaded as `table_metadata.json` from the Metabase workspace page (Workspaces → the relevant workspace → "Download table_metadata.json"). The file is a flat JSON document with three arrays (`databases`, `tables`, and `fields`) that even warehouses with very large schemas can produce without exhausting server memory.
Metadata is fetched from Metabase's `GET /api/ee/serialization/metadata/export` endpoint as a `metadata.json` file: a flat JSON document with three arrays (`databases`, `tables`, and `fields`), streamed so that even warehouses with very large schemas can be exported without exhausting server memory.
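As a rough sketch, fetching it with `curl` might look like the following; the `x-api-key` header and the placeholder variables are assumptions and depend on how your instance authenticates API requests:

```sh
# Sketch only: assumes an API key that can reach the export endpoint.
# METABASE_URL and METABASE_API_KEY are placeholders for your own instance.
curl -H "x-api-key: $METABASE_API_KEY" \
  "$METABASE_URL/api/ee/serialization/metadata/export" \
  -o metadata.json
```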

### Extracting metadata to YAML

@@ -30,23 +30,9 @@ The CLI turns that JSON into the human- and agent-friendly YAML tree described i
bunx @metabase/database-metadata extract-table-metadata <input-file> <output-folder>
```

- `<input-file>` — path to the `table_metadata.json` downloaded from the workspace page.
- `<input-file>` — path to the `metadata.json` downloaded from Metabase.
- `<output-folder>` — destination directory. Database folders are created directly under it.
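For example, against the bundled sample metadata (the same invocation the CI validation workflow runs via `bun run bin/cli.ts`):

```sh
bunx @metabase/database-metadata extract-table-metadata examples/v1/metadata.json /tmp/databases
```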

### Extracting field values

Metabase keeps a sampled list of distinct values for each field that's low-cardinality enough to enumerate (the same list that powers filter dropdowns in the UI). Download `field_values.json` from the same workspace page and extract it alongside the metadata:

```sh
bunx @metabase/database-metadata extract-field-values <metadata-file> <field-values-file> <output-folder>
```

- `<metadata-file>` — the same `table_metadata.json` used by `extract-table-metadata`. Field values reference fields by numeric ID, which the CLI resolves to natural keys using the metadata.
- `<field-values-file>` — path to the `field_values.json` downloaded from the workspace page.
- `<output-folder>` — destination directory; typically the same one used for `extract-table-metadata`, so values files land next to the table YAMLs they belong to.

One YAML file is written per field that has values. Fields with empty samples are skipped; field IDs not present in the metadata are reported as orphans and skipped. See the spec's [Field Values](core-spec/v1/spec.md#field-values) section for the on-disk shape and when agents should consult these files.

### Extracting the spec

The bundled spec can be extracted to any file — convenient for agents that need to read it locally:
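A minimal example, based on the `extract-spec` help text in `bin/cli.ts` (the destination path is arbitrary):

```sh
bunx @metabase/database-metadata extract-spec --file .metadata/spec.md
```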
@@ -63,11 +49,11 @@ The following is the **default** workflow for a project that wants to use Metaba

### 1. A `.metadata/` directory at the repo root

Create a top-level `.metadata/` directory and **add it to `.gitignore`**. This is where the raw `table_metadata.json` and the extracted `databases/` YAML tree live:
Create a top-level `.metadata/` directory and **add it to `.gitignore`**. This is where the raw `metadata.json` and the extracted `databases/` YAML tree live:

```
.metadata/
├── table_metadata.json
├── metadata.json
└── databases/
└── …
```
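The ignore rule itself is a single line, for example:

```
# .gitignore
.metadata/
```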
@@ -82,20 +68,19 @@ On a large data warehouse the metadata export can easily reach **hundreds of meg

Each developer (or a CI job) fetches metadata on demand from their own Metabase instance instead.

### 3. Download from the workspace page and extract
### 3. Download from Metabase and extract

Each developer downloads `table_metadata.json` (and optionally `field_values.json`) from the Metabase workspace page and drops them into `.metadata/`. Then run the extractors:
Each developer downloads `metadata.json` from their Metabase instance and drops it into `.metadata/`. Then run the extractor:

```sh
mkdir -p .metadata
# Drop table_metadata.json (and optionally field_values.json) from the workspace page into .metadata/
# Drop metadata.json from Metabase into .metadata/

rm -rf .metadata/databases
bunx @metabase/database-metadata extract-table-metadata .metadata/table_metadata.json .metadata/databases
bunx @metabase/database-metadata extract-field-values .metadata/table_metadata.json .metadata/field_values.json .metadata/databases
bunx @metabase/database-metadata extract-table-metadata .metadata/metadata.json .metadata/databases
```

After this, tools and agents should read the YAML tree under `.metadata/databases/` — not `table_metadata.json` or `field_values.json`, which exist only as input to the extractors.
After this, tools and agents should read the YAML tree under `.metadata/databases/` — not `metadata.json`, which exists only as input to the extractor.

## Publishing to NPM

@@ -109,7 +94,7 @@ The workflow requires an `NPM_RELEASE_TOKEN` secret with publish access to the `

```sh
bun install
bun bin/cli.ts extract-table-metadata examples/v1/table_metadata.json /tmp/.metadata/databases
bun bin/cli.ts extract-table-metadata examples/v1/metadata.json /tmp/.metadata/databases
```

### Scripts
43 changes: 1 addition & 42 deletions bin/cli.test.ts
@@ -5,8 +5,7 @@ import { join, resolve } from "path";

const REPO_ROOT = resolve(import.meta.dirname, "..");
const CLI = "bin/cli.ts";
const EXAMPLE_INPUT = "examples/v1/table_metadata.json";
const EXAMPLE_FIELD_VALUES = "examples/v1/field_values.json";
const EXAMPLE_INPUT = "examples/v1/metadata.json";

type RunResult = {
stdout: string;
@@ -81,46 +80,6 @@ describe("cli", () => {
});
});

describe("extract-field-values", () => {
let workdir: string;

beforeEach(() => {
workdir = mkdtempSync(join(tmpdir(), "database-metadata-values-cli-"));
});

afterEach(() => {
rmSync(workdir, { recursive: true, force: true });
});

it("extracts the bundled example field values", () => {
const { stdout, exitCode } = runCli([
"extract-field-values",
EXAMPLE_INPUT,
EXAMPLE_FIELD_VALUES,
workdir,
]);
expect(exitCode).toBe(0);
expect(stdout).toContain("Extracted values for 4 fields");

const statePath = join(
workdir,
"Sample Database/schemas/PUBLIC/tables/PEOPLE/STATE.yaml",
);
expect(existsSync(statePath)).toBe(true);
});

it("errors when arguments are missing", () => {
const { stderr, exitCode } = runCli([
"extract-field-values",
EXAMPLE_INPUT,
]);
expect(exitCode).toBe(1);
expect(stderr).toContain(
"<metadata-file>, <field-values-file>, and <output-folder> arguments are required",
);
});
});

describe("extract-spec", () => {
let workdir: string;

39 changes: 4 additions & 35 deletions bin/cli.ts
@@ -2,7 +2,6 @@

import { parseArgs } from "node:util";

import { extractFieldValues } from "../src/extract-field-values.js";
import { extractTableMetadata } from "../src/extract-table-metadata.js";
import { extractSpec } from "../src/extract-spec.js";

@@ -18,11 +17,6 @@ Commands:
Writes one YAML per database + one per table
with fields nested inside.

extract-field-values <metadata-file> <field-values-file> <output-folder>
Extract field values JSON into YAML files
placed next to each table YAML, one per
field that has sampled values.

extract-spec Copy the bundled spec.md into a target file
--file <path> Destination file (default: ./spec.md)

@@ -39,7 +33,7 @@ function parseArguments() {
});
}

function handleExtractMetadata(positionals: string[]): void {
async function handleExtractMetadata(positionals: string[]): Promise<void> {
const inputFile = positionals[1];
const outputFolder = positionals[2];

@@ -50,43 +44,20 @@ function handleExtractMetadata(positionals: string[]): void {
process.exit(1);
}

const stats = extractTableMetadata({ inputFile, outputFolder });
const stats = await extractTableMetadata({ inputFile, outputFolder });
console.log(
`Extracted ${stats.databases} databases, ${stats.tables} tables, ${stats.fields} fields`,
);
process.exit(0);
}

function handleExtractFieldValues(positionals: string[]): void {
const metadataFile = positionals[1];
const fieldValuesFile = positionals[2];
const outputFolder = positionals[3];

if (!metadataFile || !fieldValuesFile || !outputFolder) {
console.error(
"Error: <metadata-file>, <field-values-file>, and <output-folder> arguments are required",
);
process.exit(1);
}

const stats = extractFieldValues({
metadataFile,
fieldValuesFile,
outputFolder,
});
console.log(
`Extracted values for ${stats.fieldsWithValues} fields (${stats.fieldsSkipped} skipped, ${stats.orphans} orphans)`,
);
process.exit(0);
}

function handleExtractSpec(values: ParsedValues): void {
const { target } = extractSpec({ file: values.file ?? "spec.md" });
console.log(`Spec extracted to ${target}`);
process.exit(0);
}

function main(): void {
async function main(): Promise<void> {
const { values, positionals } = parseArguments();
const command = positionals[0];

@@ -98,8 +69,6 @@ function main(): void {
switch (command) {
case "extract-table-metadata":
return handleExtractMetadata(positionals);
case "extract-field-values":
return handleExtractFieldValues(positionals);
case "extract-spec":
return handleExtractSpec(values);
default:
@@ -108,4 +77,4 @@
}
}

main();
await main();
5 changes: 5 additions & 0 deletions bun.lock
