This file is the primary reference for Claude Code when building, extending, or debugging this project. Read it in full before writing any code. When in doubt, refer back here rather than making assumptions.
Build a Databricks-native cloud inventory platform called StackQL Cloud Inventory. It is a Streamlit app deployed on Databricks Apps that lets engineers define StackQL queries, test them interactively, schedule them as Databricks Jobs, and surface results via Unity Catalog to DBSQL dashboards and Genie spaces.
The full architecture is described in ARCHITECTURE.md. The user-facing overview is in README.md. This file contains the implementation instructions.
stackql-cloud-inventory/
|- app/
| |- src/
| | |- pages/
| | | |- ide.py
| | | |- schedules.py
| | | |- inventory.py
| | | |- providers.py
| | |- components/
| | | |- editor.py
| | | |- results_table.py
| | |- db/
| | | |- __init__.py
| | | |- models.py
| | | |- service.py
| | |- services/
| | | |- __init__.py
| | | |- query_service.py
| | | |- job_service.py
| | | |- ai_service.py
| | |- jobs/
| | | |- run_query.py
| | |- main.py
| |- tests/
| | |- test_db_service.py
| | |- test_query_service.py
| | |- test_job_service.py
| | |- test_ai_service.py
| |- requirements.txt
| |- app.yml
|- infra/
| |- bundles/
| | |- databricks.yml
| | |- resources/
| | |- jobs/
| | |- schemas/
| |- stackql-deploy/
| |- stackql_manifest.yml
| |- resources/
|- scripts/
| |- init_lakebase.sql
|- .vscode/
| |- launch.json
|- .gitignore
|- README.md
|- ARCHITECTURE.md
|- CLAUDE.md
| Layer | Technology | Notes |
|---|---|---|
| App framework | Streamlit | Databricks Apps serverless runtime |
| SQL editor | streamlit-code-editor | Wraps the Ace editor. Do not use st.text_area as primary editor. |
| AI assistant | anthropic SDK | Direct API, not via a Databricks-hosted model |
| StackQL execution | pystackql | Wraps the StackQL binary |
| Database | SQLAlchemy 2.x + psycopg2-binary | SQLAlchemy 2.x style only - not legacy 1.x |
| Databricks integration | databricks-sdk | Jobs API and SecretsAPI |
| Cron parsing | croniter | Human-readable cron preview in Schedules page |
| Local dev env loading | python-dotenv | No-op in production |
Python version: 3.11+. Type hints required throughout.
Two schemas in Lakebase. DDL lives in scripts/init_lakebase.sql.
stackql_app - app metadata, managed by the Streamlit app:
- `queries(id, name, description, query_text, provider, created_by, created_at, updated_at)`
- `schedules(id, query_id, job_id, cron_expression, target_schema, target_table, is_active, last_run_at, last_run_status, created_at, updated_at)`
- `provider_config(id, provider, env_var_name, secret_scope, secret_key, created_by, created_at)` - unique on `(provider, env_var_name)`
stackql_inventory - cloud inventory results, managed by scheduled Jobs:
- Tables created dynamically by Job runs, named by the user at schedule creation time
- Materialised views over those tables for dashboard consumption
All business logic goes through services. Pages and components must not contain SQL, SDK calls, or API calls directly.
Dataclass models as the transport layer between service and UI. Define Query, Schedule, ProviderConfig. No SQLAlchemy ORM classes - use raw SQL in db/service.py with dataclasses as the return type.
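A minimal sketch of the three dataclasses; field order, defaults, and optionality are assumptions based on the schema columns above:

```python
# app/src/db/models.py - transport dataclasses (sketch)
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Query:
    name: str
    query_text: str
    provider: str
    description: str = ""
    created_by: str = ""
    id: int | None = None
    created_at: datetime | None = None
    updated_at: datetime | None = None


@dataclass
class Schedule:
    query_id: int
    cron_expression: str
    target_schema: str
    target_table: str
    job_id: str | None = None
    is_active: bool = True
    last_run_at: datetime | None = None
    last_run_status: str | None = None
    id: int | None = None
    created_at: datetime | None = None
    updated_at: datetime | None = None


@dataclass
class ProviderConfig:
    provider: str
    env_var_name: str
    secret_scope: str
    secret_key: str
    created_by: str = ""
    id: int | None = None
    created_at: datetime | None = None
```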
All Lakebase CRUD. Connection string built from LAKEBASE_* env vars. Use SQLAlchemy create_engine with pool_pre_ping=True. Expose:
- `get_queries() -> list[Query]`
- `get_query(id: int) -> Query | None`
- `save_query(query: Query) -> int`
- `update_query(id: int, query: Query) -> None`
- `delete_query(id: int) -> None`
- `get_schedules() -> list[Schedule]`
- `save_schedule(schedule: Schedule) -> int`
- `update_schedule(id: int, schedule: Schedule) -> None`
- `delete_schedule(id: int) -> None`
- `get_provider_config() -> list[ProviderConfig]`
- `save_provider_config(config: ProviderConfig) -> None`
- `delete_provider_config(id: int) -> None`
- `get_inventory_tables() -> list[dict]` - lists tables and row counts in the `stackql_inventory` schema
- `get_inventory_preview(table_name: str, limit: int = 100) -> pd.DataFrame`
- `refresh_materialised_view(view_name: str) -> None`
No raw SQL outside this module.
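A sketch of the connection setup and one read path, assuming a psycopg2 URL built from the `LAKEBASE_*` vars and the `db.models` import path from the layout above:

```python
# app/src/db/service.py - connection and one read path (sketch)
import logging
import os

from sqlalchemy import create_engine, text

from db.models import Query

logger = logging.getLogger(__name__)


def _engine():
    url = (
        f"postgresql+psycopg2://{os.environ['LAKEBASE_USER']}:{os.environ['LAKEBASE_PASSWORD']}"
        f"@{os.environ['LAKEBASE_HOST']}:{os.environ.get('LAKEBASE_PORT', '5432')}"
        f"/{os.environ['LAKEBASE_DATABASE']}"
    )
    # pool_pre_ping detects stale connections before they surface as query errors
    return create_engine(url, pool_pre_ping=True)


def get_queries() -> list[Query]:
    with _engine().connect() as conn:
        rows = conn.execute(
            text(
                "SELECT id, name, description, query_text, provider, created_by, "
                "created_at, updated_at FROM stackql_app.queries ORDER BY name"
            )
        ).mappings()
        return [Query(**row) for row in rows]
```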
Wraps pystackql. Protocol:
- Call `db_svc.get_provider_config()`, filter by the selected provider
- For each config entry, call `WorkspaceClient().secrets.get_secret(scope, key)` to resolve the value
- Inject resolved values into `os.environ`
- Track which keys were injected
- Instantiate `StackQL(download_dir="/tmp/stackql")`
- Call `stackql.execute(query_text)` and coerce the result to a `pd.DataFrame`
- In a `finally` block, remove all injected keys from `os.environ`
- Return the DataFrame
Never store resolved secret values in session state, logs, or any persistent store. The finally block must run even if execution fails.
If STACKQL_LOCAL_DEV=true is set, read credentials from env vars directly (offline fallback for development without workspace access).
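A sketch of that execution path, assuming the service-module layout above; the base64 decode reflects how the Secrets API returns values, and the `finally` cleanup is the non-negotiable part:

```python
# app/src/services/query_service.py - execution protocol (sketch)
import base64
import os

import pandas as pd
from databricks.sdk import WorkspaceClient
from pystackql import StackQL

from db import service as db_svc


def execute(query_text: str, provider: str) -> pd.DataFrame:
    injected: list[str] = []
    try:
        # In local dev mode, credentials are expected directly in the environment
        if os.getenv("STACKQL_LOCAL_DEV", "").lower() != "true":
            ws = WorkspaceClient()
            for cfg in db_svc.get_provider_config():
                if cfg.provider != provider:
                    continue
                resp = ws.secrets.get_secret(cfg.secret_scope, cfg.secret_key)
                # get_secret returns the value base64-encoded
                os.environ[cfg.env_var_name] = base64.b64decode(resp.value).decode("utf-8")
                injected.append(cfg.env_var_name)
        stackql = StackQL(download_dir="/tmp/stackql")
        result = stackql.execute(query_text)
        return result if isinstance(result, pd.DataFrame) else pd.DataFrame(result)
    finally:
        # Always scrub injected credentials, even if execution raised
        for key in injected:
            os.environ.pop(key, None)
```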
Wraps Databricks Jobs SDK. Expose:
- `create_inventory_job(query: Query, schedule: Schedule) -> str` - returns the `job_id`
- `update_inventory_job(job_id: str, schedule: Schedule) -> None`
- `delete_inventory_job(job_id: str) -> None`
- `pause_inventory_job(job_id: str) -> None`
- `resume_inventory_job(job_id: str) -> None`
- `get_job_run_status(job_id: str) -> dict | None`
When creating a job, the task environment config must reference secret scope keys using Databricks interpolation syntax ({{secrets/scope/key}}), not resolved values. The job runner script is app/src/jobs/run_query.py.
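A sketch of job creation with secret interpolation in the task environment; cluster sizing, node type, the secret key name, and the Quartz-cron note are assumptions, and the bundle's template job may be cloned instead of creating from scratch:

```python
# app/src/services/job_service.py - job creation (sketch)
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

from db.models import Query, Schedule

SECRET_SCOPE = "stackql-inventory"  # assumed scope name


def create_inventory_job(query: Query, schedule: Schedule) -> str:
    ws = WorkspaceClient()
    task = jobs.Task(
        task_key="run_stackql_query",
        spark_python_task=jobs.SparkPythonTask(
            python_file="app/src/jobs/run_query.py",
            parameters=[
                "--query-id", str(query.id),
                "--target-schema", schedule.target_schema,
                "--target-table", schedule.target_table,
            ],
        ),
        new_cluster=compute.ClusterSpec(
            spark_version="15.4.x-scala2.12",  # assumption
            node_type_id="Standard_DS3_v2",    # assumption
            num_workers=1,
            spark_env_vars={
                # Interpolated by Databricks at run time - never resolve values here
                "LAKEBASE_PASSWORD": f"{{{{secrets/{SECRET_SCOPE}/lakebase-password}}}}",
            },
        ),
    )
    created = ws.jobs.create(
        name=f"stackql-inventory-{schedule.target_table}",
        tasks=[task],
        # Databricks schedules use Quartz cron syntax; the stored expression
        # may need conversion before being passed here
        schedule=jobs.CronSchedule(
            quartz_cron_expression=schedule.cron_expression,
            timezone_id="UTC",
        ),
    )
    return str(created.job_id)
```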
Wraps the Anthropic API. Protocol:
- Call `WorkspaceClient().secrets.get_secret("stackql-inventory", "anthropic-api-key")` to resolve the key
- Instantiate `anthropic.Anthropic(api_key=key)`
- Pass the key directly to the client - do not set the `ANTHROPIC_API_KEY` env var
- Discard the resolved key after the client is instantiated (it is held by the client object only for the duration of the call)
Expose:
- `stream_chat(messages: list[dict], mode: str) -> Generator[str, None, None]` - mode is `"query"` or `"results"`
- `extract_sql_from_response(response: str) -> str | None` - returns the first SQL fenced code block, stripped of markers, or None
Two system prompts:
Query writing mode - orient the assistant as a StackQL expert. Include: StackQL SQL syntax (SELECT only in query mode), table naming convention (provider.service.resource), requirement for WHERE clause parameters (region, project, subscriptionId etc.), instruction to format SQL in a fenced sql code block.
Results interpretation mode - orient the assistant as a cloud infrastructure analyst. Include: focus on summarising what the data shows, flagging notable findings (high counts, unusual states, cost or security concerns), and suggesting follow-up queries. Instruct it not to repeat the data verbatim.
If STACKQL_LOCAL_DEV=true, fall back to os.getenv("ANTHROPIC_API_KEY").
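A sketch of the streaming call and SQL extraction, assuming the Anthropic streaming helper and a placeholder model id; the prompt constants stand in for the two system prompts above:

```python
# app/src/services/ai_service.py - streaming and SQL extraction (sketch)
import base64
import os
import re
from collections.abc import Generator

import anthropic
from databricks.sdk import WorkspaceClient

QUERY_PROMPT = "..."    # query-writing system prompt (see above)
RESULTS_PROMPT = "..."  # results-interpretation system prompt (see above)

_SQL_BLOCK = re.compile(r"```sql\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def _client() -> anthropic.Anthropic:
    if os.getenv("STACKQL_LOCAL_DEV", "").lower() == "true":
        return anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    resp = WorkspaceClient().secrets.get_secret("stackql-inventory", "anthropic-api-key")
    return anthropic.Anthropic(api_key=base64.b64decode(resp.value).decode("utf-8"))


def stream_chat(messages: list[dict], mode: str) -> Generator[str, None, None]:
    system = QUERY_PROMPT if mode == "query" else RESULTS_PROMPT
    try:
        with _client().messages.stream(
            model="claude-sonnet-4-20250514",  # assumed model id
            max_tokens=2048,
            system=system,
            messages=messages,
        ) as stream:
            yield from stream.text_stream
    except anthropic.APIError as exc:
        # Surfaced by the chat panel via st.error (see failure modes below)
        yield f"AI request failed: {exc}"


def extract_sql_from_response(response: str) -> str | None:
    match = _SQL_BLOCK.search(response)
    return match.group(1).strip() if match else None
```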
Standalone script executed by Databricks Jobs. Accepts CLI args: --query-id, --target-schema, --target-table. Workflow:
- Read `LAKEBASE_*` env vars (injected by the job environment config from the secret scope)
- Load the query text from Lakebase `stackql_app.queries` by query ID
- Resolve provider credentials from the job environment (same secret scope injection pattern)
- Execute via pystackql
- Write the result DataFrame to `{target_schema}.{target_table}` in Lakebase (replace semantics)
- Call `REFRESH MATERIALIZED VIEW` if a view named `{target_table}_mv` exists
- Update `schedules.last_run_at` and `schedules.last_run_status`
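A sketch of the runner's control flow; `write_inventory_table` and `update_schedule_run_status` are hypothetical helpers (not in the service list above) standing in for the replace-write and status-update steps:

```python
# app/src/jobs/run_query.py - job runner (sketch)
import argparse
import logging
from datetime import datetime, timezone

from db import service as db_svc
from services import query_service

logger = logging.getLogger(__name__)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--query-id", type=int, required=True)
    parser.add_argument("--target-schema", default="stackql_inventory")
    parser.add_argument("--target-table", required=True)
    args = parser.parse_args()

    query = db_svc.get_query(args.query_id)
    if query is None:
        raise RuntimeError(f"Query {args.query_id} not found in stackql_app.queries")

    status = "SUCCESS"
    try:
        df = query_service.execute(query.query_text, query.provider)
        # Replace semantics: drop and recreate the target table each run (hypothetical helper)
        db_svc.write_inventory_table(df, args.target_schema, args.target_table)
        # Only refreshes if the view exists; the service can no-op otherwise (assumption)
        db_svc.refresh_materialised_view(f"{args.target_table}_mv")
    except Exception:
        status = "FAILED"
        logger.exception("Inventory run failed")
        raise
    finally:
        # Hypothetical helper updating schedules.last_run_at / last_run_status
        db_svc.update_schedule_run_status(args.query_id, status, datetime.now(timezone.utc))


if __name__ == "__main__":
    main()
```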
All pages use layout="wide". The IDE page is two-column. All other pages are single-column. No custom CSS unless strictly necessary for the editor component.
Layout: Two columns, approximately 65/35 width split. Left column contains the editor and results. Right column contains the AI chat panel.
Left column - editor:
Use streamlit-code-editor (`from code_editor import code_editor`). Configure with:
- `lang="sql"`
- toolbar buttons for Run (maps to the submit command) and Clear
- `showLineNumbers: True`
- height as a min/max tuple, e.g. `[10, 30]`
Store editor content in st.session_state["ide_editor_content"]. When the editor fires a submit command (Run button or Ctrl+Enter), response["text"] contains the query.
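A sketch of the `code_editor` call (encapsulated in the editor component described further below); the button dictionaries follow streamlit-code-editor's custom-buttons format, and the Clear command list is an assumption:

```python
# components/editor.py internals - code_editor configuration (sketch)
import streamlit as st
from code_editor import code_editor

BUTTONS = [
    {"name": "Run", "feather": "Play", "hasText": True,
     "commands": ["submit"], "alwaysOn": True},
    {"name": "Clear", "feather": "X", "hasText": True,
     "commands": ["selectall", "del"], "alwaysOn": True},  # command list is an assumption
]

response = code_editor(
    st.session_state.get("ide_editor_content", ""),
    lang="sql",
    height=[10, 30],                        # min/max rows
    options={"showLineNumbers": True},
    buttons=BUTTONS,
)

# The submit command (Run button or Ctrl+Enter) sets type == "submit";
# response["text"] carries the editor contents at that moment
submitted = response.get("type") == "submit"
if submitted and response.get("text"):
    st.session_state["ide_editor_content"] = response["text"]
```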
Left column - toolbar row (between editor and results):
- Provider `st.selectbox` populated from `db_svc.get_provider_config()`, unique provider names
- Run button - triggers `query_service.execute(query_text, provider)` with `st.spinner`
- Save Query button - opens `st.dialog` with name and description fields
- Explain Query button - sets `st.session_state["ide_pending_chat_prompt"]` to a pre-built prompt containing the current query text
Left column - results area:
On success: results_table component renders the DataFrame.
On empty result: st.info("Query returned no rows.")
On error: st.error(error_message) with the raw StackQL error.
Left column - sidebar:
Query library in st.sidebar. List saved queries as buttons. Clicking loads query into editor via session state. Delete button with popover confirmation per query.
Right column - AI chat panel:
Mode toggle at top: st.segmented_control or st.radio with options "Write Query" and "Interpret Results". Stored in st.session_state["ide_chat_mode"].
Chat history in st.session_state["ide_chat_messages"] as list[dict] with role and content keys. Render history on every rerender using st.chat_message.
At the top of the right column render loop, check for st.session_state["ide_pending_chat_prompt"]. If present, treat it as a submitted user message (append to history, call AI service, clear the key).
st.chat_input fixed at the bottom. On submit, append user message and call ai_service.stream_chat() via st.write_stream.
After each assistant response, check for a SQL code block using ai_service.extract_sql_from_response(). If found, render an "Insert into editor" st.button that sets st.session_state["ide_editor_content"] to the extracted SQL.
"Interpret Results" pre-populate:
The results component renders an "Interpret Results" button. Clicking it serialises the current DataFrame as a markdown table (top 20 rows max) and sets st.session_state["ide_pending_chat_prompt"] with the data and an interpretation request.
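A sketch of the right-column render loop tying the pending-prompt hand-off, streaming, and SQL insertion together; the `ide_chat_last_sql` key, the prompt placeholder text, and the mode-label mapping are assumptions:

```python
# IDE page, right column - chat render loop (sketch)
import streamlit as st
from services import ai_service

st.session_state.setdefault("ide_chat_messages", [])

user_input = st.chat_input("Ask about StackQL or your results")
# Explain Query / Interpret Results buttons stage a prompt via this shared key
pending = st.session_state.pop("ide_pending_chat_prompt", None)
prompt = pending or user_input

for msg in st.session_state["ide_chat_messages"]:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt:
    st.session_state["ide_chat_messages"].append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        reply = st.write_stream(
            ai_service.stream_chat(
                st.session_state["ide_chat_messages"],
                mode="query" if st.session_state.get("ide_chat_mode") == "Write Query" else "results",
            )
        )
    st.session_state["ide_chat_messages"].append({"role": "assistant", "content": reply})
    sql = ai_service.extract_sql_from_response(reply)
    if sql:
        st.session_state["ide_chat_last_sql"] = sql  # survives the next rerun

# Rendered outside the prompt branch so the button persists across reruns
if st.session_state.get("ide_chat_last_sql") and st.button("Insert into editor"):
    st.session_state["ide_editor_content"] = st.session_state["ide_chat_last_sql"]
    st.rerun()
```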
- Summary: `st.dataframe` of all schedules joined with query names and last run status. Use `st.column_config` for status colour coding.
- "New Schedule" opens `st.dialog`. Fields: query selector (dropdown of saved queries), cron expression (text input with live human-readable preview via `croniter` - see the sketch after this list), target schema (default `stackql_inventory`), target table name.
- On save: `db_svc.save_schedule()` then `job_svc.create_inventory_job()` with `st.spinner`.
- Per-row actions: pause/resume toggle, delete (with `st.popover` confirmation).
- Last run status as coloured indicator: SUCCESS = green, FAILED = red, no runs = grey.
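A sketch of the cron preview inside the New Schedule dialog, assuming "human-readable preview" means validating the expression with croniter and listing the next few run times:

```python
# Cron preview inside the New Schedule dialog (sketch)
from datetime import datetime, timezone

import streamlit as st
from croniter import croniter

cron_expr = st.text_input("Cron expression", value="0 6 * * *")

if cron_expr:
    if croniter.is_valid(cron_expr):
        it = croniter(cron_expr, datetime.now(timezone.utc))
        next_runs = [it.get_next(datetime).strftime("%Y-%m-%d %H:%M UTC") for _ in range(3)]
        st.caption("Next runs: " + ", ".join(next_runs))
    else:
        st.error("Invalid cron expression")
```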
- Top metrics row: total tables, total rows (sum across all inventory tables), last refresh timestamp across all tables.
- Table list as `st.dataframe` with row selection enabled. Columns: table name, row count, last updated, materialised view exists.
- On row selection, render a preview panel below: top 100 rows of the selected table as `st.dataframe`.
- Refresh button per table: calls `db_svc.refresh_materialised_view(table_name + "_mv")`.
- Table of existing mappings: provider, env var name, secret scope, secret key masked as `***`, created by, created at. Delete button per row.
- "Add Mapping" via `st.dialog` or `st.expander`. Fields: provider (free text or select from a known list: aws, azure, google, databricks, github, cloudflare), env var name, secret scope, secret key.
- On save: call `SecretsAPI.get_secret(scope, key)` - if it raises `NotFound`, show `st.error` and do not save. If found, call `db_svc.save_provider_config()`.
- Never display resolved secret values anywhere.
- "Test" button per row: resolves the secret, injects the env var, calls `stackql.execute("SHOW PROVIDERS")`, reports pass/fail, cleans up the env var. Does not surface the value.
components/editor.py
Thin wrapper around code_editor. Reads/writes st.session_state["ide_editor_content"]. Detects submit command in the response dict. Returns tuple[str | None, bool] - (query_text, submitted). All code_editor configuration is encapsulated here.
components/results_table.py
Accepts a pd.DataFrame, execution time in seconds, and an error string (mutually exclusive with DataFrame). Renders: row count + execution time caption, st.dataframe with use_container_width=True, "Interpret Results" button. Handles empty DataFrame and error states. Sets st.session_state["ide_pending_chat_prompt"] when "Interpret Results" is clicked.
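A sketch of the component contract; the 20-row cap follows the IDE-page spec, while the staged prompt wording is an assumption (note `to_markdown` needs the tabulate package):

```python
# app/src/components/results_table.py (sketch)
import pandas as pd
import streamlit as st


def results_table(df: pd.DataFrame | None, elapsed_s: float, error: str | None = None) -> None:
    if error:
        st.error(error)
        return
    if df is None or df.empty:
        st.info("Query returned no rows.")
        return

    st.caption(f"{len(df)} rows in {elapsed_s:.2f}s")
    st.dataframe(df, use_container_width=True)

    if st.button("Interpret Results"):
        sample = df.head(20).to_markdown(index=False)  # requires tabulate
        st.session_state["ide_pending_chat_prompt"] = (
            "Interpret these StackQL results:\n\n" + sample
        )
        st.rerun()
```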
Set by the Databricks App runtime in production via app.yml secret scope bindings. Set in app/.env for local development.
DATABRICKS_HOST # auto-injected by App runtime; set in .env locally
DATABRICKS_TOKEN # auto-injected by App runtime; set in .env locally
LAKEBASE_HOST
LAKEBASE_PORT # default 5432
LAKEBASE_DATABASE
LAKEBASE_USER
LAKEBASE_PASSWORD
STACKQL_LOCAL_DEV # set to "true" for offline local dev only - never in production
Cloud provider credentials and the Anthropic API key are not static env vars. They are resolved at call time from Databricks Secrets via the SDK.
The Databricks App manifest. Must declare:
command: ["python", "-m", "streamlit", "run", "src/main.py"]enventries for allLAKEBASE_*vars referencing secret scope keys instackql-inventory
databricks.yml must define resources for:
- Lakebase instance
- Lakehouse Federation foreign connection to Lakebase
- UC foreign catalog (`stackql_inventory_catalog`) over the `stackql_inventory` Lakebase schema
- UC grants on the foreign catalog
- The Databricks App referencing `../../app` as source
- App resource bindings for `LAKEBASE_*` secret scope keys
- Template Job definition used as the base when `job_service.py` clones per-schedule jobs
Use targets for dev and prod with variable substitution for instance names, catalog names, and workspace host.
stackql_manifest.yml defines demo/test cloud provider resources (AWS, GCP stacks). Follow the stateless IaC pattern: tags as natural keys, no state files. These are not required for the app to function - they provide targets for inventory queries in demo environments.
- Databricks CLI v0.200+
- Python 3.11+
- Docker
- Create and activate a virtual environment in `app/`: `python -m venv .venv && source .venv/bin/activate`, then `pip install -r requirements.txt`
- Start local Postgres: `docker run -d --name stackql-dev-pg -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres:16`
- Initialise the schema: `psql -h localhost -U postgres -f scripts/init_lakebase.sql`
- Create `app/.env` (git-ignored, never commit) with:
  - `DATABRICKS_HOST=https://<workspace>.azuredatabricks.net`
  - `DATABRICKS_TOKEN=dapi...`
  - `LAKEBASE_HOST=localhost`
  - `LAKEBASE_PORT=5432`
  - `LAKEBASE_DATABASE=postgres`
  - `LAKEBASE_USER=postgres`
  - `LAKEBASE_PASSWORD=postgres`
- Run: `databricks apps run-local --prepare-environment --debug`, or `cd app && streamlit run src/main.py`
Provide two configurations:
"Streamlit: run app"- launchesstreamlit run src/main.pywithcwdset toapp/andenvFilepointing atapp/.env"Databricks: run-local (attach)"- attaches to the debugger on port 5678 started bydatabricks apps run-local --debug
- Python 3.11+. Type hints on all function signatures.
- Use the `logging` module throughout. No `print()`.
- SQLAlchemy 2.x style only (`Session`, `text()`, `execute()`). Not legacy 1.x.
- No raw SQL in page or component files. All DB operations go through `db/service.py`.
- No SDK calls in page or component files. All Jobs operations go through `job_service.py`. All secrets resolution goes through the relevant service.
- No LLM calls in page or component files. All AI calls go through `ai_service.py`.
- Session state keys are namespaced by page: `ide_*`, `schedules_*`, `inventory_*`, `providers_*`. The single shared key is `ide_pending_chat_prompt`.
- `st.set_page_config` at the top of every page file.
- Chat history (`ide_chat_messages`) is a `list[dict]` with string values only. Do not store SDK objects, DataFrames, or Anthropic response objects in session state.
- SQL extracted from AI responses must be stripped of fenced code block markers before being placed in the editor.
- Secrets must not appear in logs, session state, error messages, or UI. If a secret resolution fails, show the scope and key name in the error - never the value.
Tests live in app/tests/. Use pytest. Run with:
cd app && pytest tests/ -v
Rules:
- Mock `pystackql.StackQL` with a fixture returning a known `pd.DataFrame` (see the conftest sketch after this list)
- Mock `databricks.sdk.WorkspaceClient` with `unittest.mock.MagicMock`
- Mock `anthropic.Anthropic` with a fixture returning a known streaming response
- Mock SQLAlchemy sessions for DB service tests
- No live cloud, workspace, or API calls in tests
- Set `STACKQL_LOCAL_DEV=true` in the test environment to short-circuit secrets resolution
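A sketch of shared fixtures implementing these rules; the patched module paths (`services.query_service.StackQL`, `services.job_service.WorkspaceClient`) are assumptions about the import layout:

```python
# app/tests/conftest.py - mock fixtures (sketch)
from unittest.mock import MagicMock

import pandas as pd
import pytest


@pytest.fixture(autouse=True)
def local_dev_env(monkeypatch):
    # Short-circuit secrets resolution in every test
    monkeypatch.setenv("STACKQL_LOCAL_DEV", "true")


@pytest.fixture
def mock_stackql(monkeypatch):
    fake = MagicMock()
    fake.execute.return_value = pd.DataFrame(
        {"instance_id": ["i-123"], "region": ["us-east-1"]}
    )
    monkeypatch.setattr("services.query_service.StackQL", MagicMock(return_value=fake))
    return fake


@pytest.fixture
def mock_workspace_client(monkeypatch):
    ws = MagicMock()
    monkeypatch.setattr("services.job_service.WorkspaceClient", MagicMock(return_value=ws))
    return ws
```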
Handle these explicitly - do not let them surface as unhandled exceptions:
| Failure | Where | How to handle |
|---|---|---|
| pystackql binary download fails | query_service, job runner | Raise RuntimeError with message directing user to check /tmp/stackql permissions |
| databricks.sdk.errors.NotFound on secret lookup | query_service, ai_service | Raise RuntimeError with scope/key name and direction to Provider Config page |
| Anthropic API error / rate limit | ai_service | Catch, yield an error string from the generator, let the chat panel display it via st.error |
| Lakebase connection failure | db/service.py | Catch on first connection attempt, raise RuntimeError. The app entrypoint should catch this and show st.error as a top-level banner. |
| Databricks Job creation failure | job_service | Catch SDK errors, re-raise with the specific permission or quota message |
| StackQL query returns empty result | query_service | Return an empty DataFrame - not an error |
| StackQL query returns error | query_service | Raise RuntimeError with the raw StackQL error message |
| code_editor component load failure | editor component | Catch, show st.error with instructions - do not fall back silently to st.text_area |
- Do not use LangChain, LangGraph, or any other LLM orchestration library. Call the Anthropic API directly.
- Do not use `dbutils.secrets.get()`. Use `WorkspaceClient().secrets.get_secret()`, which works both in the workspace and locally with a configured token.
- Do not store resolved secret values in `st.session_state`, module-level variables, or any cache.
- Do not use `@st.cache_data` or `@st.cache_resource` on functions that handle secrets.
- Do not use SQLAlchemy ORM (declarative model classes). Use `text()` with dataclass models.
- Do not use `st.text_area` as the primary SQL editor.
- Do not write SQL in page files.
- Do not write SDK calls in page files.
- Do not use `streamlit-monaco` - use `streamlit-code-editor`, which has a more complete API.
- Do not create module-level Lakebase connections or SDK clients. Instantiate them inside service functions where the connection string or credentials are available.
- Do not schedule queries by running a background thread inside the Streamlit app. All scheduling is handled by Databricks Jobs.
Do not build these unless explicitly asked:
- Multi-workspace support
- Custom StackQL provider registration via the UI
- Row-level security on inventory tables
- Email or webhook alerting on Job failures (use Databricks Job notification settings)
- Persisting chat history to Lakebase across sessions
- Editor autocomplete for StackQL provider/service/resource schema paths
- User-facing admin controls for managing the Lakebase instance or UC federation