feat: store hashed tokens instead of plaintext #8217
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
-- Make token NOT NULL again
ALTER TABLE token ALTER COLUMN token SET NOT NULL;
Potential down-migration failure: When rolling back step 2 (token_hash_pk_swap.down.sql), this restores token NOT NULL, but new tokens created after the forward migration will have token = NULL (since the code no longer populates it). This means the rollback will fail with ERROR: column "token" of relation "token" contains null values if any new tokens were created after migration.
You'd need to backfill or drop the new null rows before restoring NOT NULL:
-- Delete tokens that were created after the forward migration (no plaintext token)
DELETE FROM token WHERE token IS NULL;
-- Then restore NOT NULL
ALTER TABLE token ALTER COLUMN token SET NOT NULL;

- Update all base.sql fixtures to include token_hash/token_prefix columns
- Keep plaintext token for webhook tokens (needed for URL reconstruction)
- Restore get_token_by_prefix to query DB for webhook tokens
- Fix down migration to delete NULL-token rows before restoring NOT NULL
- Update parser fixture standalone schema
- Update EE dedicated_worker_ee.rs to use token_hash/token_prefix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Write to token column alongside token_hash until MIN_VERSION_SUPPORTS_TOKEN_HASH (1.649.0) is reached. This ensures older workers can still authenticate during rolling upgrades. Remove the separate UPDATE in new_webhook_token since create_token_internal now writes plaintext directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check MIN_VERSION_SUPPORTS_TOKEN_HASH at runtime: write plaintext to token column while old workers exist, switch to NULL once all workers are >= 1.649.0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
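The runtime gate described in this commit can be sketched roughly as follows. Only `MIN_VERSION_SUPPORTS_TOKEN_HASH` (1.649.0) comes from the commit message; the function name, the tuple representation of versions, and the comparison shape are illustrative assumptions, not the actual Windmill code:

```rust
// Illustrative sketch of the rolling-upgrade gate. Versions are modeled as
// (major, minor, patch) tuples, which compare lexicographically in Rust.
const MIN_VERSION_SUPPORTS_TOKEN_HASH: (u64, u64, u64) = (1, 649, 0);

/// Decide whether to keep writing the plaintext `token` column:
/// while the oldest worker is below 1.649.0, write plaintext alongside
/// the hash so it can still authenticate; once every worker supports
/// token_hash, write NULL instead.
fn write_plaintext_token(oldest_worker_version: (u64, u64, u64)) -> bool {
    oldest_worker_version < MIN_VERSION_SUPPORTS_TOKEN_HASH
}

fn main() {
    // An old worker still in the fleet: keep plaintext for compatibility.
    assert!(write_plaintext_token((1, 648, 3)));
    // All workers upgraded: stop writing plaintext, store NULL.
    assert!(!write_plaintext_token((1, 649, 0)));
    println!("gate ok");
}
```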
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/ai-fast review this PR
Resolve conflicts in monitor.rs and native-triggers/lib.rs, keeping the token_prefix-based queries from the store-hash branch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…OT NULL Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/ai-fast review this PR
Claude finished @hugocasa's task in 1m 55s

PR Review: Store hashed tokens instead of plaintext

Overall this is a well-designed security improvement. The two-phase migration approach, …

Bugs / Issues

1. Potential panic on short tokens (…). There are 7 instances of … The superadmin path at … Similarly, …

2. Google watch renewal fails when token is orphaned (…)

Migration Concerns

3. Native trigger migration deletes orphaned triggers (…):

DELETE FROM native_trigger WHERE webhook_token_hash IS NULL;

This silently deletes any … The deletion is the right thing to do for data integrity, but consider logging or counting how many rows are deleted, or adding a …

4. Native trigger migration backfill uses prefix JOIN (…):

UPDATE native_trigger nt SET webhook_token_hash = t.token_hash
FROM token t WHERE t.token_prefix = nt.webhook_token_prefix;

If two tokens share the same 10-char prefix (unlikely but possible), this JOIN could match the wrong token for a trigger. Since the prefix space is 62^10, the probability is negligible in practice, but it's worth noting. A …

Design Observations (Non-blocking)

5. The …

6. The rotation creates a new token copying …

7. …

INSERT INTO token_expiry_notification (token, expiration) VALUES ($1, $2)

The column is still named …

Summary

The core security goal (removing plaintext tokens from the DB) is well-implemented. The main actionable item is the short token panic (item 1), which could cause a runtime crash on malformed input. Everything else is either already acknowledged in the PR's TODO section or is a low-probability edge case.
/ai-fast
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ner/expiration in rotate Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…fication column Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ration Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/ai-fast review this PR
Claude finished @hugocasa's task in 1m 48s

PR Review: Store hashed tokens instead of plaintext

This is a well-designed security improvement. The two-phase migration, auth cache keyed by raw token, …

Bugs

1. Panic on short tokens — still present (7 sites in …). This was flagged in both previous reviews and remains unfixed. While … The fix is straightforward — use …

Previously Flagged Issues — Now Addressed

2. Native trigger orphan handling (migration) — Fixed. The previous reviews flagged that …

3. Down migration NULL token handling — Fixed

4. Migration …

5. …

Design Observations (Non-blocking)

6. …

UPDATE token_expiry_notification
SET token = encode(sha256(token::bytea), 'hex');

This assumes all existing values in … However, looking at the code path: …

7. Google channel renewal still hard-fails on missing token (…). As acknowledged in the PR description's TODO, …

8. Native trigger migration backfill prefix collision risk (very low). The prefix JOIN at …

Summary

The core security design is sound and most previous review issues have been addressed. The only remaining actionable bug is the short token panic (item 1) — a malformed …
Manual Testing Checklist

Token basics
Backward compatibility
Native triggers (requires Google or Nextcloud integration)
Migration
Edge cases
Claude finished @hugocasa's task in 5m 20s

PR Review: Store hashed tokens instead of plaintext

Overall this is a well-designed security improvement. The two-phase migration approach, auth cache keyed by raw token, …

Bug: Panic on short tokens (still present)

This was flagged in all three previous reviews and remains unfixed. Most critical path — …:

token_prefix: Some(token[0..TOKEN_PREFIX_LEN].to_string()),

The … Other affected locations in …

Fix: Replace all bare … with:

fn safe_token_prefix(token: &str) -> String {
    token.get(..TOKEN_PREFIX_LEN).unwrap_or(token).to_string()
}

Recommendation: …
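As a standalone illustration of the non-panicking helper recommended in this review (assuming `TOKEN_PREFIX_LEN` is 10, per the reviews' "shorter than 10 characters" wording), here is a minimal runnable sketch showing why the bare slice panics and the `get`-based helper does not:

```rust
const TOKEN_PREFIX_LEN: usize = 10;

/// Non-panicking prefix: falls back to the whole token when it is
/// shorter than TOKEN_PREFIX_LEN, instead of slicing out of bounds.
/// str::get also returns None on a non-char-boundary index, so this
/// never panics on multi-byte UTF-8 either.
fn safe_token_prefix(token: &str) -> String {
    token.get(..TOKEN_PREFIX_LEN).unwrap_or(token).to_string()
}

fn main() {
    // Long token: first 10 chars, same result as token[0..TOKEN_PREFIX_LEN].
    assert_eq!(safe_token_prefix("abcdefghijklmnop"), "abcdefghij");
    // Short token (e.g. a SUPERADMIN_SECRET like "changeme"): the bare
    // slice token[0..10] would panic here; the helper returns it unchanged.
    assert_eq!(safe_token_prefix("changeme"), "changeme");
    println!("prefix ok");
}
```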
rotate_webhook_token now atomically inserts the new token and deletes the old one in a single transaction, preventing token leaks. Returns new_token_hash so callers can clean up the new token if their subsequent trigger update fails (which involves external HTTP calls and cannot be in the same DB transaction). - Handler: wraps post-rotation work; deletes new token on failure - Google renewal: deletes new token if service_config update fails - Tests updated to match new atomic semantics Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ilure rotate_webhook_token now keeps the old token alive and returns old_token_hash. Callers delete it only after the trigger row has been successfully updated. If the external service call or DB update fails, the trigger keeps working with the old token. Worst case: if the best-effort delete fails, the old token leaks as an extra DB row — harmless compared to breaking the trigger. Also update summarized_schema.txt for renamed columns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
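The ordering these two commits converge on (insert the new token, update the trigger, and only then best-effort delete the old one, so a failed external call never breaks the trigger) can be modeled with a small in-memory sketch. The `Store` type, `rotate` signature, and the `update_ok` flag standing in for the external HTTP call are all illustrative assumptions, not the actual Windmill code:

```rust
use std::collections::HashSet;

/// Toy token store: a set of token hashes plus the hash the trigger
/// row currently points at.
struct Store {
    tokens: HashSet<String>,
    trigger_token: String,
}

impl Store {
    /// Rotate following the commits' ordering: insert the new token first,
    /// attempt the trigger update (modeled by `update_ok`), and delete the
    /// old hash only after the update succeeded. On failure the trigger
    /// keeps its old, still-valid token and the new row is cleaned up.
    fn rotate(&mut self, new_hash: &str, update_ok: bool) -> Result<(), &'static str> {
        self.tokens.insert(new_hash.to_string());
        if !update_ok {
            // External call / trigger update failed: remove the new token.
            self.tokens.remove(new_hash);
            return Err("trigger update failed; old token kept");
        }
        let old = std::mem::replace(&mut self.trigger_token, new_hash.to_string());
        // Best-effort delete of the old token; if this failed in the real
        // system, the old hash would merely leak as an unused row.
        self.tokens.remove(&old);
        Ok(())
    }
}

fn main() {
    let mut s = Store {
        tokens: HashSet::from(["old".to_string()]),
        trigger_token: "old".to_string(),
    };
    // Failed update: trigger still uses the old token, new one cleaned up.
    assert!(s.rotate("new1", false).is_err());
    assert_eq!(s.trigger_token, "old");
    assert!(s.tokens.contains("old") && !s.tokens.contains("new1"));
    // Successful update: trigger moves to the new token, old one deleted.
    assert!(s.rotate("new2", true).is_ok());
    assert_eq!(s.trigger_token, "new2");
    assert!(!s.tokens.contains("old"));
    println!("rotate ok");
}
```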
This commit updates the EE repository reference after PR #437 was merged in windmill-ee-private. Previous ee-repo-ref: 7aef8b06cb6f54c2bc89dd57b70947deed72553c New ee-repo-ref: 2d0823a471014e2bc2d898c63518323946b7474f Automated by sync-ee-ref workflow.
🤖 Updated
/ai-fast re-review this PR
Claude finished @rubenfiszel's task in 2m 55s

PR Re-Review: Store hashed tokens instead of plaintext

Overall this is a well-designed security improvement. The two-phase migration, auth cache keyed by raw token, …

Status of previously flagged issues

…

Bug: Panic on short tokens (4th time flagged)

There are 8 instances of token[0..TOKEN_PREFIX_LEN].to_string() that will panic if a token shorter than 10 characters is provided:

token_prefix: Some(token[0..TOKEN_PREFIX_LEN].to_string()),
token_prefix: Some(token[0..TOKEN_PREFIX_LEN].to_string()),

Additionally, there are 4 instances of &token[..TOKEN_PREFIX_LEN] that will also panic.

Most critical path — auth.rs:436 (superadmin secret).

Fix: Add a helper and use it everywhere:

fn safe_prefix(token: &str) -> String {
    token.get(..TOKEN_PREFIX_LEN).unwrap_or(token).to_string()
}

Or just use the inline pattern at each site: …

Design observations (non-blocking)

1. Migration …:

UPDATE token_expiry_notification SET token = encode(sha256(token::bytea), 'hex');

But the new code in …

2. Google renewal TODO

3. Native trigger backfill prefix collision

Positive observations

Summary

The core design is sound and most previous issues have been addressed. The only remaining actionable bug is the short token panic — a malformed …
/ai fix

There are 8 instances of token[0..TOKEN_PREFIX_LEN].to_string() that will panic if a token shorter than 10 characters is provided:

auth.rs — 7 sites on the main auth path (lines 284, 308, 324, 373, 387, 402, 436):

token_prefix: Some(token[0..TOKEN_PREFIX_LEN].to_string()),

users.rs:1651 — login path:

token_prefix: Some(token[0..TOKEN_PREFIX_LEN].to_string()),

Additionally, there are 4 instances of &token[..TOKEN_PREFIX_LEN] that will also panic.

Most critical path — auth.rs:436 (superadmin secret): The SUPERADMIN_SECRET env var is user-configured and can easily be set to something shorter than 10 chars (e.g., "changeme"). Any instance with a short secret will crash on first auth attempt.

users.rs:531 (logout): A malformed cookie or Authorization: Bearer x header will crash the server.

Fix: Add a helper and use it everywhere:

fn safe_prefix(token: &str) -> String {
    token.get(..TOKEN_PREFIX_LEN).unwrap_or(token).to_string()
}
Claude finished @rubenfiszel's task in 4m 16s

Fix: Short token panic prevention

Summary

Added …
Add safe_token_prefix() helper that uses .get(..TOKEN_PREFIX_LEN).unwrap_or(token) instead of direct slice indexing, preventing panics when tokens are shorter than 10 characters (e.g., short superadmin secrets or malformed Bearer tokens). Co-authored-by: Ruben Fiszel <rubenfiszel@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- … (token_hash) in the token table instead of plaintext, so a DB compromise doesn't expose usable credentials
- token_prefix (first 10 chars) column for listing/deletion without needing plaintext
- token column is still written alongside the hash until MIN_VERSION_SUPPORTS_TOKEN_HASH (1.650.0) is reached by all workers, then NULL is written instead

TODO for @hugocasademont
Remove get_token_by_prefix (native triggers)

get_token_by_prefix retrieves the plaintext token from the DB so it can be embedded in webhook callback URLs. It's used in two places:

- handler.rs:237 — trigger update: already handles None gracefully (deletes old token, creates new one)
- google/external.rs:327 — renew_channel: hard-fails on None ("Webhook token not found")

Once MIN_VERSION_SUPPORTS_TOKEN_HASH is met and plaintext stops being written, all new webhook tokens will have token = NULL, breaking Google trigger watch renewal.

Recommended fix — always delete + create a new token on renewal:

- A create_webhook_token_for_trigger(db, trigger) that inserts a token directly using the trigger's email/owner/workspace_id/script_path (no ApiAuthed needed)
- Change renew_channel to: delete old token by prefix → create new token → use it in the webhook URL → update webhook_token_prefix on the trigger
- get_token_by_prefix can be removed entirely — every renewal just creates a fresh token

This matches the pattern handler.rs already uses on trigger update.

Test plan
- cargo check passes (CE + EE + native_trigger features)
- token_hash + token_prefix populated, token column has plaintext (pre-1.650.0) or NULL (post-1.650.0)

🤖 Generated with Claude Code
Companion EE PR: https://github.com/windmill-labs/windmill-ee-private/pull/437
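The delete + create renewal flow recommended in the TODO above can be modeled as a toy in-memory sketch. Every name here (Tokens, renew, the prefix/hash strings) is a hypothetical stand-in, not the real renew_channel or create_webhook_token_for_trigger code:

```rust
use std::collections::HashMap;

/// Toy model of the token table: token_prefix -> token_hash.
/// No plaintext is stored, mirroring the post-migration schema.
struct Tokens(HashMap<String, String>);

/// Renew by replacement: drop the old token by prefix, insert a fresh
/// one, and return the new prefix so the caller can rebuild the webhook
/// URL and update webhook_token_prefix on the trigger row. Nothing ever
/// needs to read plaintext back from storage.
fn renew(tokens: &mut Tokens, old_prefix: &str, new_prefix: &str, new_hash: &str) -> String {
    tokens.0.remove(old_prefix);
    tokens.0.insert(new_prefix.to_string(), new_hash.to_string());
    new_prefix.to_string()
}

fn main() {
    let mut t = Tokens(HashMap::from([("oldprefix0".to_string(), "oldhash".to_string())]));
    let trigger_prefix = renew(&mut t, "oldprefix0", "newprefix0", "newhash");
    // The trigger row now points at the fresh token; the old row is gone.
    assert_eq!(trigger_prefix, "newprefix0");
    assert!(!t.0.contains_key("oldprefix0"));
    assert!(t.0.contains_key("newprefix0"));
    println!("renew ok");
}
```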