Merged
40 commits
000379a
feat: add PATRotator for short-lived token auto-rotation (#81)
datasciencemonkey Mar 27, 2026
b27a73d
feat: wire PATRotator into app startup (#81)
datasciencemonkey Mar 27, 2026
4c546ab
chore: add secret resource with WRITE for PAT rotation (#81)
datasciencemonkey Mar 27, 2026
3c50f23
docs: PAT auto-rotation implementation plan (#81)
datasciencemonkey Mar 27, 2026
bb6782d
fix: read PAT_ROTATION_INTERVAL and PAT_TOKEN_LIFETIME from env vars …
datasciencemonkey Mar 27, 2026
b4ff448
chore: set PAT rotation interval to 2 minutes for testing (#81)
datasciencemonkey Mar 27, 2026
db131f0
fix: set default rotation to 2 min, remove config from app.yaml (#81)
datasciencemonkey Mar 27, 2026
38994f8
fix: clearer rotation log messages — INFO: starting/complete (#81)
datasciencemonkey Mar 27, 2026
fd81b1b
fix: use pat_rotator.py defaults (120s), remove env var overrides fro…
datasciencemonkey Mar 27, 2026
a81c2a8
chore: set PAT rotation to 5 min interval, 10 min lifetime (#81)
datasciencemonkey Mar 27, 2026
871b4fd
feat: resolve owner via SP + Apps API (app.creator), preserve SP cred…
datasciencemonkey Mar 27, 2026
a2ce23b
feat: 10-min rotation with 15-min lifetime, ensure_fresh() on session…
datasciencemonkey Mar 27, 2026
f420cf5
feat: session-aware rotation — skip when no active sessions (#81)
datasciencemonkey Mar 27, 2026
88b28a5
feat: interactive PAT setup — /api/pat-status + /api/configure-pat en…
datasciencemonkey Mar 27, 2026
473d59b
feat: terminal prompts for PAT on first session, remove DATABRICKS_TO…
datasciencemonkey Mar 27, 2026
b251636
refactor: remove secret scope persistence from PATRotator (#83)
datasciencemonkey Mar 27, 2026
8806962
chore: remove secret scope config — PAT prompt handles restarts (#83)
datasciencemonkey Mar 27, 2026
929ff5c
fix: make setup_claude.py token-optional — install CLI without PAT (#83)
datasciencemonkey Mar 27, 2026
ec7b0e5
feat: configure Claude CLI auth after interactive PAT setup (#83)
datasciencemonkey Mar 27, 2026
e02d79c
fix: all setup scripts install CLI without token, skip config until P…
datasciencemonkey Mar 27, 2026
5a17afb
feat: configure all CLIs (Claude, Codex, OpenCode, Gemini, Databricks…
datasciencemonkey Mar 27, 2026
7014aa0
fix: add missing lock to heartbeat test fixture
datasciencemonkey Mar 27, 2026
6ef7d7e
docs: update README and deployment guide for zero-config auth (#83)
datasciencemonkey Mar 27, 2026
70a576a
fix: strip SP creds after owner resolution, move setup to after PAT (…
datasciencemonkey Mar 27, 2026
616379b
fix: show PAT prompt instead of snake game when setup hasn't started
datasciencemonkey Mar 28, 2026
5ec5e3c
chore: remove snake game loading page — setup waits inline in terminal
datasciencemonkey Mar 28, 2026
70a5c10
fix: immediately mint controlled token on PAT configure
datasciencemonkey Mar 28, 2026
bb3174d
feat: track rotation time, fast-path expired token detection
datasciencemonkey Mar 28, 2026
644560e
feat: persist app state to ~/.coda/app_state.json
datasciencemonkey Mar 28, 2026
dd42956
feat: wire app_state.json — owner at boot, rotation every 10 min
datasciencemonkey Mar 28, 2026
c2a7e81
chore: pin all package versions in requirements.txt
datasciencemonkey Mar 28, 2026
858594f
chore: simplify rotation log message to 'CLI updated'
datasciencemonkey Mar 28, 2026
4576c26
fix: bump pyasn1→0.6.3, pyjwt→2.12.1; ignore 3 unfixable CVEs
datasciencemonkey Mar 28, 2026
9abd7a2
fix: upgrade requests to 2.33.0 from GitHub (GHSA-gc5v-m9x4-r6x2)
datasciencemonkey Mar 28, 2026
a95cd65
chore: replace mlflow[genai] with mlflow-tracing — drops pygments CVE
datasciencemonkey Mar 28, 2026
8da95c3
fix: eliminate cryptography CVE — google-auth 2.47.0 drops the dep
datasciencemonkey Mar 28, 2026
e888141
fix: upgrade mcp 1.19.0→1.26.0 (GHSA-9h52-p55h-vw2f DNS rebinding)
datasciencemonkey Mar 28, 2026
ba0597b
fix: re-eliminate cryptography dep after mcp upgrade reintroduced it
datasciencemonkey Mar 28, 2026
c568709
fix: upgrade mcp→1.26.0, ignore cryptography until 46.0.6 hits PyPI
datasciencemonkey Mar 28, 2026
d99c35a
fix: upgrade cryptography to 46.0.6 from GitHub (GHSA-m959-cc7f-wv43)
datasciencemonkey Mar 28, 2026
14 changes: 7 additions & 7 deletions README.md
@@ -27,7 +27,7 @@

🟢 **OpenCode** — Open-source agent with multi-provider support

-Every agent starts **pre-wired to your Databricks AI Gateway** — models, auth tokens, and base URLs are all configured at boot. No API keys to manage.
+Every agent installs at boot and connects to your **Databricks AI Gateway** — on first terminal session, paste a short-lived PAT and all CLIs are configured automatically. Token auto-rotates every 10 minutes.

---

@@ -63,7 +63,7 @@ This isn't just a terminal in the cloud. Running coding agents on Databricks giv
| 🐍 **Loading Screen** | Play snake while setup steps run in parallel |
| 🔄 **Workspace Sync** | Every `git commit` auto-syncs to `/Workspace/Users/{you}/projects/` |
| ✏️ **Micro Editor** | Modern terminal editor, pre-installed |
-| ⚙️ **Databricks CLI** | Pre-configured with your PAT, ready to go |
+| ⚙️ **Databricks CLI** | Installed at boot, configured interactively on first session |
| 📊 **MLflow Tracing** | Every Claude Code session is automatically traced to your Databricks MLflow experiment |

---
@@ -136,10 +136,10 @@ Tracing is skipped gracefully if `APP_OWNER` is not set (e.g., local dev without
1. Click [**Use this template**](https://github.com/datasciencemonkey/coding-agents-databricks-apps/generate) to create your own repo
2. Go to **Databricks → Apps → Create App**
3. Choose **Custom App** and connect your new repo
-4. Add your PAT as the `DATABRICKS_TOKEN` secret in **App Resources**
-5. Deploy
+4. Deploy
+5. Open the app — paste a short-lived PAT when prompted on first terminal session

-That's it. Open the app URL and start coding.
+That's it. No secrets to configure, no pre-deployment setup.

[→ Full deployment guide](docs/deployment.md) — environment variables, gateway config, and advanced options.

@@ -280,7 +280,7 @@ This template repo opens that vision up for every Databricks user — no IDE set

| Variable | Required | Description |
|----------|----------|-------------|
-| `DATABRICKS_TOKEN` | Yes | Your Personal Access Token (secret) |
+| `DATABRICKS_TOKEN` | No | Optional. If not set, the app prompts for a token on first session. Auto-rotated every 10 minutes |
| `HOME` | Yes | Set to `/app/python/source_code` in app.yaml |
| `ANTHROPIC_MODEL` | No | Claude model name (default: `databricks-claude-opus-4-6`) |
| `CODEX_MODEL` | No | Codex model name (default: `databricks-gpt-5-2`) |
@@ -289,7 +289,7 @@ This template repo opens that vision up for every Databricks user — no IDE set

### Security Model

-Single-user app — each user deploys their own instance with their own PAT. Only the token owner can access the terminal. Everyone else sees 403.
+Single-user app — the owner is resolved via the app's service principal and Apps API (`app.creator`), with no PAT required at deploy time. Authorization checks `X-Forwarded-Email` against `app.creator`. On first terminal session, the user pastes a short-lived PAT interactively. Tokens auto-rotate every 10 minutes (15-minute lifetime), with old tokens proactively revoked. On restart, the user re-pastes (no persistence by design).
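
The timing in the new Security Model text can be illustrated with a small sketch. It assumes the 10-minute rotation interval and 15-minute lifetime stated above; the `RotationClock` class and its method names are hypothetical and are not taken from `pat_rotator.py`, whose contents are not part of this diff.

```python
import time
from dataclasses import dataclass, field

ROTATION_INTERVAL_S = 600  # 10 minutes, per the README text
TOKEN_LIFETIME_S = 900     # 15 minutes, per the README text


@dataclass
class RotationClock:
    """Hypothetical helper: tracks when the current token was minted."""

    minted_at: float = field(default_factory=time.monotonic)

    def age(self, now: float) -> float:
        return now - self.minted_at

    def due_for_rotation(self, now: float) -> bool:
        # Rotate once the interval has elapsed, which leaves a 5-minute
        # margin before the token would expire on its own.
        return self.age(now) >= ROTATION_INTERVAL_S

    def is_expired(self, now: float) -> bool:
        # Past its lifetime the token is dead and the user must re-paste.
        return self.age(now) >= TOKEN_LIFETIME_S
```

The 5-minute gap between interval and lifetime is what lets one missed rotation cycle pass without the active token expiring mid-session.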

### Gunicorn

200 changes: 183 additions & 17 deletions app.py
@@ -18,8 +18,11 @@
from collections import deque

import tomllib
+import requests

+import app_state
 from utils import ensure_https
+from pat_rotator import PATRotator

# Sanitize DATABRICKS_TOKEN early — the platform sometimes injects trailing
# newlines / whitespace which causes auth failures. Cleaning it here prevents
@@ -45,6 +48,8 @@
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

+# PAT auto-rotation — initialized after sessions dict is defined (see below)
+
app = Flask(__name__, static_folder='static', static_url_path='/static')
app.secret_key = os.urandom(24)
app.config['MAX_CONTENT_LENGTH'] = 32 * 1024 * 1024 # 32 MB — aligned with Claude Code's 30 MB file limit
@@ -57,6 +62,12 @@
sessions = {}
sessions_lock = threading.Lock()

+# PAT auto-rotation (short-lived tokens, background refresh)
+# Only rotates while active sessions exist — stops when all sessions are reaped
+pat_rotator = PATRotator(
+    session_count_fn=lambda: len(sessions),
+)
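
A minimal sketch of how a session-aware loop could consume `session_count_fn`. The `SessionAwareTicker` class, `rotate_fn`, and `interval_s` are hypothetical names for illustration; the real `PATRotator` is not shown in this diff and may be structured differently.

```python
import threading
from typing import Callable


class SessionAwareTicker:
    """Hypothetical stand-in for the rotator's background loop."""

    def __init__(self, session_count_fn: Callable[[], int],
                 rotate_fn: Callable[[], None], interval_s: float):
        self._session_count_fn = session_count_fn
        self._rotate_fn = rotate_fn
        self._interval_s = interval_s
        self._stop = threading.Event()

    def tick(self) -> bool:
        """Run one cycle; return True if a rotation was performed."""
        if self._session_count_fn() == 0:
            return False  # nobody is connected, so let the token lapse
        self._rotate_fn()
        return True

    def run(self) -> None:
        # Event.wait doubles as an interruptible sleep between cycles.
        while not self._stop.wait(self._interval_s):
            self.tick()

    def stop(self) -> None:
        self._stop.set()
```

Passing a callable rather than the `sessions` dict keeps the rotator decoupled from the session store; a caller that wants a consistent count could wrap the lookup in `sessions_lock`.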

# SIGTERM graceful shutdown: notify clients before gunicorn stops the worker
shutting_down = False

@@ -250,6 +261,68 @@ def _reinit_app_git():
logger.info("Reinitialized app source git (template origin removed)")


+def _configure_all_cli_auth(token):
+    """Configure auth for ALL coding-agent CLIs after a PAT is provided.
+
+    Called from /api/configure-pat when a user supplies a PAT interactively.
+    Handles: Claude CLI (inline), Databricks CLI (via pat_rotator), and
+    Codex/OpenCode/Gemini CLIs (by re-running their setup scripts with token in env).
+    """
+    import json
+
+    home = os.environ.get("HOME", "/app/python/source_code")
+    if not home or home == "/":
+        home = "/app/python/source_code"
+
+    # 1. Configure Claude CLI (~/.claude/settings.json)
+    claude_dir = os.path.join(home, ".claude")
+    os.makedirs(claude_dir, exist_ok=True)
+
+    gateway_host = ensure_https(os.environ.get("DATABRICKS_GATEWAY_HOST", "").rstrip("/"))
+    databricks_host = ensure_https(os.environ.get("DATABRICKS_HOST", "").rstrip("/"))
+
+    if gateway_host:
+        anthropic_base_url = f"{gateway_host}/anthropic"
+    else:
+        anthropic_base_url = f"{databricks_host}/serving-endpoints/anthropic"
+
+    settings = {
+        "env": {
+            "ANTHROPIC_MODEL": os.environ.get("ANTHROPIC_MODEL", "databricks-claude-sonnet-4-6"),
+            "ANTHROPIC_BASE_URL": anthropic_base_url,
+            "ANTHROPIC_AUTH_TOKEN": token,
+            "ANTHROPIC_CUSTOM_HEADERS": "x-databricks-use-coding-agent-mode: true",
+        }
+    }
+
+    settings_path = os.path.join(claude_dir, "settings.json")
+    with open(settings_path, "w") as f:
+        json.dump(settings, f, indent=2)
+
+    logger.info(f"Claude CLI auth configured: {settings_path}")
+
+    # 2. Configure Databricks CLI (~/.databrickscfg) — already called by
+    #    configure_pat() via pat_rotator, but explicit for clarity
+    pat_rotator._write_databrickscfg(token)
+    logger.info("Databricks CLI auth configured: ~/.databrickscfg")
+
+    # 3. Re-run Codex, OpenCode, Gemini setup scripts with token in env
+    #    They are idempotent: detect CLI already installed, just write config files
+    env = {**os.environ, "DATABRICKS_TOKEN": token}
+    for script in ["setup_codex.py", "setup_opencode.py", "setup_gemini.py"]:
+        try:
+            result = subprocess.run(
+                ["uv", "run", "python", script],
+                env=env, capture_output=True, text=True, timeout=60
+            )
+            if result.returncode == 0:
+                logger.info(f"CLI config updated: {script}")
+            else:
+                logger.warning(f"CLI config failed: {script}: {result.stderr[:200]}")
+        except Exception as e:
+            logger.warning(f"CLI config error: {script}: {e}")
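
The Claude settings step above could be factored into a pure builder plus a writer, which makes the gateway fallback easy to test in isolation. This is only a sketch: `build_claude_settings` and `write_settings` are hypothetical helpers, and the atomic, owner-only write is a variation that the diff itself does not implement.

```python
import json
import os
import tempfile


def build_claude_settings(token: str, databricks_host: str,
                          gateway_host: str = "",
                          model: str = "databricks-claude-sonnet-4-6") -> dict:
    """Hypothetical pure helper mirroring the settings dict in the diff."""
    if gateway_host:
        base_url = f"{gateway_host.rstrip('/')}/anthropic"
    else:
        base_url = f"{databricks_host.rstrip('/')}/serving-endpoints/anthropic"
    return {
        "env": {
            "ANTHROPIC_MODEL": model,
            "ANTHROPIC_BASE_URL": base_url,
            "ANTHROPIC_AUTH_TOKEN": token,
            "ANTHROPIC_CUSTOM_HEADERS": "x-databricks-use-coding-agent-mode: true",
        }
    }


def write_settings(settings: dict, path: str) -> None:
    """Write via a temp file and rename, since the file holds a token."""
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    # mkstemp creates the file readable and writable by the owner only.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(settings, f, indent=2)
        os.replace(tmp_path, path)  # atomic swap on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Because rotation rewrites this file while a CLI may be reading it, the rename avoids a reader ever seeing a half-written `settings.json`.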


def run_setup():
with setup_lock:
setup_state["status"] = "running"
@@ -301,9 +374,27 @@ def run_setup():


 def get_token_owner():
-    """Get the owner email from DATABRICKS_TOKEN at startup."""
+    """Get the owner email. Priority: Apps API (app.creator) > PAT (current_user.me).
+
+    Uses the auto-provisioned SP to call the Apps API — no PAT needed for
+    owner resolution. Falls back to PAT-based lookup for backward compat.
+    """
+    from databricks.sdk import WorkspaceClient
+
+    # 1. Try Apps API via SP credentials (no PAT needed)
+    app_name = os.environ.get("DATABRICKS_APP_NAME")
+    if app_name:
+        try:
+            w = WorkspaceClient() # auto-detects SP credentials
+            app = w.apps.get(name=app_name)
+            owner = app.creator
+            logger.info(f"Owner resolved from app.creator: {owner}")
+            return owner
+        except Exception as e:
+            logger.warning(f"Could not resolve owner via Apps API: {e}")
+
+    # 2. Fallback: PAT-based resolution
     try:
-        from databricks.sdk import WorkspaceClient
         host = ensure_https(os.environ.get("DATABRICKS_HOST", ""))
         token = os.environ.get("DATABRICKS_TOKEN")
         if not host or not token:
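
The priority order in this hunk (Apps API first, PAT second) can be expressed as a generic fallback chain. The sketch below is hypothetical: `resolve_owner` and the resolver names are illustrative only. Unlike the diff, it also falls through when a resolver succeeds but returns an empty value.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)

Resolver = Callable[[], Optional[str]]


def resolve_owner(resolvers: Iterable[tuple[str, Resolver]]) -> Optional[str]:
    """Return the first non-empty owner; a failing resolver falls through."""
    for name, resolver in resolvers:
        try:
            owner = resolver()
        except Exception as exc:
            logger.warning("Owner resolution via %s failed: %s", name, exc)
            continue
        if owner:
            return owner
    return None
```

With this shape, each lookup (SP-based or PAT-based) can be tested with a stub and the ordering is visible at the call site.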
@@ -611,7 +702,7 @@ def cleanup_stale_sessions():
def authorize_request():
"""Check authorization before processing any request."""
# Skip auth for health check, setup status, and Socket.IO (has own auth via connect event)
-    if request.path in ("/health", "/api/setup-status") or request.path.startswith("/socket.io"):
+    if request.path in ("/health", "/api/setup-status", "/api/pat-status", "/api/configure-pat", "/api/app-state") or request.path.startswith("/socket.io"):
return None

authorized, user = check_authorization()
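
As the exempt list grows, the inline tuple could be pulled into named constants. A minimal sketch, assuming the same paths as the changed line above; `PUBLIC_PATHS`, `PUBLIC_PREFIXES`, and `is_public_path` are hypothetical names that do not appear in the PR.

```python
# Exact paths that bypass the owner check (copied from the changed line).
PUBLIC_PATHS = frozenset({
    "/health",
    "/api/setup-status",
    "/api/pat-status",
    "/api/configure-pat",
    "/api/app-state",
})

# Path prefixes that bypass the owner check (Socket.IO authenticates on connect).
PUBLIC_PREFIXES = ("/socket.io",)


def is_public_path(path: str) -> bool:
    """Return True if the request path skips authorization."""
    return path in PUBLIC_PATHS or path.startswith(PUBLIC_PREFIXES)
```

Keeping the list in one place also makes it easier to review which endpoints are reachable without the `X-Forwarded-Email` check.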
@@ -650,10 +741,6 @@ def set_security_headers(response):

@app.route("/")
def index():
-    with setup_lock:
-        status = setup_state["status"]
-    if status in ("pending", "running"):
-        return send_from_directory("static", "loading.html")
return send_from_directory("static", "index.html")


@@ -662,6 +749,12 @@ def get_setup_status():
return jsonify(_get_setup_state_snapshot())


+@app.route("/api/app-state")
+def get_app_state():
+    """Admin endpoint: persisted app state (owner, last rotation)."""
+    return jsonify(app_state.get_state())
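
The `app_state` module itself is not part of this diff. As a rough sketch of what a JSON-backed store behind `get_state()` and `set_app_owner()` might look like, assuming a single state file such as `~/.coda/app_state.json` (named in the commit list); the `AppState` class and its internals are hypothetical.

```python
import json
import os
import threading
from pathlib import Path


class AppState:
    """Hypothetical JSON-backed state store; not the PR's implementation."""

    def __init__(self, path: Path):
        self._path = path
        self._lock = threading.Lock()

    def get_state(self) -> dict:
        with self._lock:
            return self._read()

    def set_app_owner(self, owner: str) -> None:
        self._update("app_owner", owner)

    def _read(self) -> dict:
        # A missing or corrupt file is treated as empty state.
        try:
            return json.loads(self._path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            return {}

    def _update(self, key: str, value) -> None:
        with self._lock:
            state = self._read()
            state[key] = value
            self._path.parent.mkdir(parents=True, exist_ok=True)
            tmp = self._path.with_suffix(".tmp")
            tmp.write_text(json.dumps(state, indent=2))
            os.replace(tmp, self._path)  # swap in the new file atomically
```

Only non-secret metadata (owner, last rotation time) belongs in such a file, which matches the README's "no persistence by design" statement for tokens.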


@app.route("/health")
def health():
with sessions_lock:
@@ -682,6 +775,79 @@ def get_version():
return jsonify({"version": APP_VERSION})


+@app.route("/api/pat-status")
+def pat_status():
+    """Check if a valid, usable PAT is configured."""
+    host = ensure_https(os.environ.get("DATABRICKS_HOST", ""))
+    token = os.environ.get("DATABRICKS_TOKEN", "").strip()
+
+    if not token or pat_rotator.is_token_expired:
+        # No token, or token lifetime exceeded (rotation stopped while no sessions)
+        return jsonify({"configured": False, "valid": False,
+                        "workspace_host": host})
+
+    # Validate with direct HTTP — avoids SDK auth fallback to SP
+    try:
+        resp = requests.get(f"{host}/api/2.0/preview/scim/v2/Me",
+                            headers={"Authorization": f"Bearer {token}"}, timeout=10)
+        if resp.status_code == 200:
+            user = resp.json().get("userName", "unknown")
+            return jsonify({"configured": True, "valid": True, "user": user})
+        return jsonify({"configured": True, "valid": False,
+                        "workspace_host": host})
+    except Exception:
+        return jsonify({"configured": True, "valid": False,
+                        "workspace_host": host})

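Both `pat_status()` above and `configure_pat()` below validate a token with the same SCIM `Me` request. A shared helper could look roughly like the following sketch. The `whoami` function and its injectable `http_get` parameter are hypothetical; in the app, `requests.get` would be passed as `http_get`.

```python
from typing import Any, Callable, Optional

SCIM_ME_PATH = "/api/2.0/preview/scim/v2/Me"  # same endpoint the diff calls


def whoami(host: str, token: str, http_get: Callable[..., Any]) -> Optional[str]:
    """Return the SCIM userName for `token`, or None if rejected or unreachable.

    `http_get` is expected to behave like `requests.get`: it accepts a URL plus
    `headers` and `timeout` keyword arguments and returns an object exposing
    `status_code` and `json()`.
    """
    try:
        resp = http_get(
            f"{host}{SCIM_ME_PATH}",
            headers={"Authorization": f"Bearer {token}"},
            timeout=10,
        )
    except Exception:
        return None  # network failure is treated the same as an invalid token
    if resp.status_code != 200:
        return None
    return resp.json().get("userName")
```

Injecting the HTTP call keeps the helper testable without a workspace, and it preserves the diff's stated reason for using direct HTTP: no SDK fallback to service principal auth.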

+@app.route("/api/configure-pat", methods=["POST"])
+def configure_pat():
+    """Accept a user-provided PAT, validate it, and start rotation."""
+    data = request.json
+    token = data.get("token", "").strip()
+    if not token:
+        return jsonify({"error": "Token required"}), 400
+
+    # Validate the token — direct HTTP, no SDK fallback
+    host = ensure_https(os.environ.get("DATABRICKS_HOST", ""))
+    try:
+        resp = requests.get(f"{host}/api/2.0/preview/scim/v2/Me",
+                            headers={"Authorization": f"Bearer {token}"}, timeout=10)
+        if resp.status_code != 200:
+            return jsonify({"error": "Invalid token"}), 400
+        user = resp.json().get("userName", "unknown")
+    except Exception as e:
+        return jsonify({"error": f"Token validation failed: {e}"}), 400
+
+    # Immediately mint a controlled short-lived token from the user-pasted PAT.
+    # This gives us a token ID we own — all future rotations can revoke the old one.
+    # The user-pasted PAT becomes unused after this (expires per its own lifetime).
+    os.environ["DATABRICKS_TOKEN"] = token
+    pat_rotator._current_token = token
+    pat_rotator._current_token_id = None
+    rotated = pat_rotator._rotate_once()
+    if rotated:
+        token = pat_rotator.token # use the newly minted token from here on
+    else:
+        # Rotation failed — fall back to user-pasted token (still valid)
+        pat_rotator._write_databrickscfg(token)
+    pat_rotator.start()
+
+    # Configure all CLI tools (Claude, Codex, OpenCode, Gemini, Databricks)
+    _configure_all_cli_auth(pat_rotator.token or token)
+
+    # Run setup now that we have a valid token (installs CLIs, configures agents)
+    # Only run if setup hasn't completed yet
+    with setup_lock:
+        if setup_state["status"] != "complete":
+            setup_thread = threading.Thread(target=run_setup, daemon=True, name="setup-thread")
+            setup_thread.start()
+            logger.info("Setup triggered after PAT configuration")
+
+    logger.info(f"PAT configured interactively by {user} — rotation started")
+    return jsonify({"status": "ok", "user": user, "message": "Token configured. Auto-rotation started."})
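
The "mint a controlled token, then revoke the previous one" step described in the comments above can be sketched against an abstract token API. Everything here is hypothetical (`TokenAPI`, `MintedToken`, `Rotator`); the PR's `PATRotator._rotate_once()` is not shown in this diff, and a real implementation would presumably wrap the workspace Token API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class MintedToken:
    token_id: str
    value: str


class TokenAPI(Protocol):
    """Hypothetical interface over whatever service mints and revokes tokens."""

    def create(self, lifetime_seconds: int) -> MintedToken: ...
    def revoke(self, token_id: str) -> None: ...


class Rotator:
    def __init__(self, api: TokenAPI, lifetime_seconds: int = 900):
        self._api = api
        self._lifetime = lifetime_seconds
        self.current: Optional[MintedToken] = None

    def rotate_once(self) -> bool:
        """Mint a new token, then revoke the one it replaces."""
        try:
            new = self._api.create(self._lifetime)
        except Exception:
            return False  # keep using the current token
        old, self.current = self.current, new
        if old is not None:
            try:
                self._api.revoke(old.token_id)
            except Exception:
                pass  # the old token still expires on its own lifetime
        return True
```

On the first rotation there is no known token ID, so nothing is revoked. That mirrors the comment in the diff that the user-pasted PAT simply goes unused and expires per its own lifetime.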


@app.route("/api/session", methods=["POST"])
def create_session():
"""Create a new terminal session."""
@@ -923,28 +1089,28 @@ def initialize_app(local_dev=False):
if not local_dev:
signal.signal(signal.SIGTERM, handle_sigterm)

-    # Remove OAuth credentials - force PAT auth only
-    os.environ.pop("DATABRICKS_CLIENT_ID", None)
-    os.environ.pop("DATABRICKS_CLIENT_SECRET", None)
+    # SP credentials preserved — needed for Apps API (owner resolution) and secret persistence

-    # Determine app owner from DATABRICKS_TOKEN
+    # Resolve owner: Apps API (app.creator via SP) > PAT (current_user.me)
     app_owner = get_token_owner()
     if app_owner:
-        logger.info(f"App owner (from token): {app_owner}")
+        logger.info(f"App owner: {app_owner}")
         os.environ["APP_OWNER"] = app_owner
+        app_state.set_app_owner(app_owner)
     else:
         logger.warning("Could not determine app owner - authorization disabled")

+    # Strip SP credentials — only needed for owner resolution above.
+    # Keeping them causes SDK to silently fall back to SP auth when PAT is dead.
+    os.environ.pop("DATABRICKS_CLIENT_ID", None)
+    os.environ.pop("DATABRICKS_CLIENT_SECRET", None)
+    logger.info("SP credentials stripped — PAT-only auth from this point")
+
     # Start background cleanup thread
     cleanup_thread = threading.Thread(target=cleanup_stale_sessions, daemon=True)
     cleanup_thread.start()
     logger.info(f"Started session cleanup thread (timeout={SESSION_TIMEOUT_SECONDS}s, interval={CLEANUP_INTERVAL_SECONDS}s)")

-    # Start setup in background thread — app starts immediately with loading screen
-    setup_thread = threading.Thread(target=run_setup, daemon=True, name="setup-thread")
-    setup_thread.start()
-    logger.info("Started background setup thread")
-

if __name__ == "__main__":
# Local dev — no SIGTERM handler (SIG_DFL), no shutting_down flag
3 changes: 0 additions & 3 deletions app.yaml
@@ -4,15 +4,12 @@ command:
env:
- name: HOME
value: /app/python/source_code
-  - name: DATABRICKS_TOKEN
-    valueFrom: DATABRICKS_TOKEN
- name: ANTHROPIC_MODEL
value: databricks-claude-opus-4-6
- name: GEMINI_MODEL
value: databricks-gemini-3-1-pro
- name: CODEX_MODEL
value: databricks-gpt-5-2
-  #OPTIONAL: Move to the new Databricks Gateway if you have access (recommended), otherwise it will default to the older endpoint
- name: DATABRICKS_GATEWAY_HOST
valueFrom: DATABRICKS_GATEWAY_HOST
- name: CLAUDE_CODE_DISABLE_AUTO_MEMORY