From d01815440d821bfa2071b318a66ed19843dcd88f Mon Sep 17 00:00:00 2001
From: Yogesh Rao
Date: Wed, 13 May 2026 12:52:14 +0530
Subject: [PATCH] feat: improve rill-model skill score from 45% to 84%
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hey @begelundmuller 👋

I ran your skills through `tessl skill review` at work and found some targeted improvements for `rill-model`. Here's the full before/after:

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| rill-canvas | 45% | 45% | — |
| rill-connector | 45% | 45% | — |
| rill-development | 44% | 44% | — |
| rill-explore | 52% | 52% | — |
| rill-metrics-view | 45% | 45% | — |
| **rill-model** | **45%** | **84%** | **+39%** |
| rill-rillyaml | 40% | 40% | — |
| rill-theme | 46% | 46% | — |

Changes in rill-model:
- Rewrote frontmatter description with specific actions, domain keywords, and "Use when..." clause
- Compressed model categories into a concise table
- Condensed synthetic data section
- Added validation & troubleshooting workflow section
- Created REFERENCE.md companion quick-reference property table

Note: These skills are auto-generated via rill init --agent agentsmd. Consider applying similar improvements to the generation source in the Rill CLI.
---
 skills/rill-model/REFERENCE.md | 27 +++++++++++++++++++++
 skills/rill-model/SKILL.md     | 43 +++++++++++++++++++++-------------
 2 files changed, 54 insertions(+), 16 deletions(-)
 create mode 100644 skills/rill-model/REFERENCE.md

diff --git a/skills/rill-model/REFERENCE.md b/skills/rill-model/REFERENCE.md
new file mode 100644
index 0000000..8390ca6
--- /dev/null
+++ b/skills/rill-model/REFERENCE.md
@@ -0,0 +1,27 @@
+# Rill Model Property Quick Reference
+
+| Property | Type | Default | Description |
+|----------|------|---------|-------------|
+| `type` | string | required | Must be `model` |
+| `materialize` | boolean | auto | `true` for cross-connector, `false` for same-connector |
+| `incremental` | boolean | `false` | Enable incremental data loading |
+| `connector` | string | — | Input connector name (e.g., `bigquery`, `snowflake`, `duckdb`) |
+| `sql` | string | required | SQL query — plain SELECT, no trailing semicolon |
+| `partitions` | object | — | Glob-based or SQL-based partition configuration |
+| `state` | object | — | Watermark state for incremental models (alternative to partitions) |
+| `output.connector` | string | default OLAP | Output connector (e.g., `clickhouse`, `duckdb`) |
+| `output.incremental_strategy` | string | varies | `partition_overwrite`, `merge`, or `append` |
+| `output.unique_key` | array | — | Columns for merge deduplication |
+| `output.order_by` | string | — | Required for ClickHouse output |
+| `output.partition_by` | string | — | Column/expression for table partitioning |
+| `output.ttl` | string | — | ClickHouse data retention (e.g., `event_time + INTERVAL 90 DAY DELETE`) |
+| `refresh.cron` | string | — | Cron schedule for source model refresh |
+| `change_mode` | string | `reset` | How spec changes apply: `reset`, `manual`, or `patch` |
+| `dev` | object | — | Development-only property overrides (e.g., limited partitions) |
+| `timeout` | string | — | Max ingestion wait time (e.g., `72h`) |
+| `pre_exec` | string | — | SQL to run before main query (DuckDB/ClickHouse) |
+| `post_exec` | string | — | SQL to run after main query (DuckDB/ClickHouse) |
+| `stage.connector` | string | — | Staging connector for incompatible source→output pairs |
+| `stage.path` | string | — | Staging path (e.g., `s3://bucket/staging/`) |
+
+See [SKILL.md](SKILL.md) for full examples, dialect-specific notes, and the complete JSON schema.
diff --git a/skills/rill-model/SKILL.md b/skills/rill-model/SKILL.md
index d0d98dc..7bc3243 100644
--- a/skills/rill-model/SKILL.md
+++ b/skills/rill-model/SKILL.md
@@ -1,10 +1,12 @@
 ---
 name: rill-model
-description: Detailed instructions and examples for developing model resources in Rill
+description: "Creates and configures Rill model YAML and SQL files for data pipelines — source models ingesting from S3, BigQuery, Snowflake, or GCS into DuckDB or ClickHouse, derived models with SQL joins, incremental and partition-based ingestion, materialization, dev partition limits, and refresh schedules. Use when the user needs to create or edit a Rill model, configure cross-connector ETL, set up incremental or partitioned data loading, or write SQL transformations in a Rill project."
 ---
 
 # Instructions for developing a model in Rill
 
+> **Quick reference**: See [REFERENCE.md](REFERENCE.md) for a property lookup table covering all model configuration options.
+
 ## Introduction
 
 Models are resources that specify ETL or transformation logic, outputting a tabular dataset to one of the project's connectors. They are typically found near the root of the project's DAG, referencing only connectors and other models.
@@ -19,13 +21,13 @@ Models in Rill are similar to models in dbt, but support additional advanced fea
 
 ### Model categories
 
-When reasoning about a model, consider these attributes:
-
-- **Source model**: References external data, typically reading from a SQL database or object store connector and writing to an OLAP connector.
-- **Derived model**: References other models, usually performing joins or formatting columns to prepare denormalized tables for metrics views and dashboards.
-- **Incremental model**: Contains logic for incrementally loading data, processing only new or changed records.
-- **Partitioned model**: Loads data in well-defined increments (e.g., daily partitions), enabling scalability and idempotent incremental runs.
-- **Materialized model**: Outputs a physical table rather than a SQL view.
+| Category | Description | Typical DAG position |
+|----------|-------------|---------------------|
+| Source | Reads from external connector (SQL DB, object store) into OLAP | Root — no parent models |
+| Derived | Joins/transforms other models for metrics views and dashboards | Middle/leaf — references parent models |
+| Incremental | Processes only new or changed records | Any — uses `incremental: true` |
+| Partitioned | Loads data in well-defined chunks (e.g., daily partitions) | Any — uses `partitions:` |
+| Materialized | Creates a physical table (vs. SQL view) | Any — uses `materialize: true` |
 
 ### Performance considerations
 
@@ -37,14 +39,7 @@ Models are usually expensive resources that can take a long time to run. Create
 
 ### Generating synthetic data for prototyping
 
-When developing models for prototyping or demonstration purposes where external data sources are not yet available, generate a `SELECT` query that returns realistic synthetic data with these characteristics:
-- Use realistic column names and data types that match typical business scenarios
-- Always include a time/timestamp column for time-series analysis
-- Generate 6-12 months of historical data with approximately 10,000 rows to enable meaningful analysis
-- Space out timestamps realistically across the time period rather than clustering them
-- Use realistic data distributions (e.g., varying quantities, diverse categories, plausible geographic distributions)
-
-Only generate synthetic data when the user explicitly requests mock data or when required external sources don't exist in the project. If real data sources are available, always prefer using them.
+When external data sources are unavailable and the user requests mock data, generate a `SELECT` query returning realistic synthetic data: include a timestamp column, 6-12 months of history (~10,000 rows), realistic distributions, and diverse categories. Always prefer real data sources when available.
 
 ## Materialization
 
@@ -196,6 +191,22 @@ refresh:
 
 By default, cron refreshes are disabled in local development. If you need to test them locally, add `run_in_dev: true` under `refresh:`.
 
+## Validation and troubleshooting
+
+After creating or editing a model, verify it works correctly:
+
+1. **Check for errors**: Run `rill start` (or use the Rill Developer UI) and verify the model appears without errors in the project status.
+2. **Verify row counts**: For source models, confirm data was ingested by querying the output table (e.g., `SELECT COUNT(*) FROM `).
+3. **Test incremental runs**: For incremental models, trigger a second run and verify only new/changed data is processed — check that row counts increase as expected without duplicates.
+4. **Validate partitions**: For partitioned models, verify partition status shows processed partitions. If a partition fails, only that partition needs reprocessing.
+5. **Check dev partitions**: In development, ensure dev partition overrides are limiting data volume as intended before running against full production data.
+
+**Common errors:**
+- Model SQL fails silently → check that the SQL is a plain `SELECT` without trailing semicolons
+- Cross-connector model not materializing → verify `materialize: true` is set (required when input ≠ output connector)
+- Incremental state not updating → ensure the `state:` query runs against the output OLAP connector, not the source connector
+- Partition glob matches nothing → use introspection tools (e.g., `list_bucket_files`) to verify the path pattern
+
 ## Advanced concepts
 
 ### Staging connectors