Skip to content

feat: store hashed tokens instead of plaintext#8217

Merged
rubenfiszel merged 38 commits intomainfrom
store-hash
Mar 17, 2026
Merged

feat: store hashed tokens instead of plaintext#8217
rubenfiszel merged 38 commits intomainfrom
store-hash

Conversation

@rubenfiszel
Copy link
Copy Markdown
Contributor

@rubenfiszel rubenfiszel commented Mar 4, 2026

Summary

  • Store SHA-256 hashed tokens (token_hash) in the token table instead of plaintext, so a DB compromise doesn't expose usable credentials
  • Add token_prefix (first 10 chars) column for listing/deletion without needing plaintext
  • Auth cache stays keyed by raw token — zero-cost cache hits, only hash on cache miss
  • Two-phase migration: (1) add columns + backfill + build indexes (allows concurrent reads), (2) instant PK swap (~2ms exclusive lock)
  • Plaintext token column is still written alongside the hash until MIN_VERSION_SUPPORTS_TOKEN_HASH (1.650.0) is reached by all workers, then NULL is written instead

TODO for @hugocasademont

Remove get_token_by_prefix (native triggers)

get_token_by_prefix retrieves the plaintext token from the DB so it can be embedded in webhook callback URLs. It's used in two places:

  1. handler.rs:237 — trigger update: already handles None gracefully (deletes old token, creates new one)
  2. google/external.rs:327renew_channel: hard-fails on None ("Webhook token not found")

Once MIN_VERSION_SUPPORTS_TOKEN_HASH is met and plaintext stops being written, all new webhook tokens will have token = NULL, breaking Google trigger watch renewal.

Recommended fix — always delete + create a new token on renewal:

  • Add a low-level create_webhook_token_for_trigger(db, trigger) that inserts a token directly using the trigger's email/owner/workspace_id/script_path (no ApiAuthed needed)
  • Update renew_channel to: delete old token by prefix → create new token → use it in the webhook URL → update webhook_token_prefix on the trigger
  • Then get_token_by_prefix can be removed entirely — every renewal just creates a fresh token

This matches the pattern handler.rs already uses on trigger update.

Test plan

  • cargo check passes (CE + EE + native_trigger features)
  • Migration applies cleanly on fresh and existing DBs
  • Create token via UI → verify token_hash+token_prefix populated, token column has plaintext (pre-1.650.0) or NULL (post-1.650.0)
  • Authenticate with Bearer token → hash-based lookup works
  • List tokens → prefix displays correctly
  • Delete token by prefix → deletion works
  • Login/logout cycle → session tokens work
  • Monitor cleanup of expired tokens uses prefix in logs
  • Native trigger webhook re-creation works

🤖 Generated with Claude Code

Companion EE PR: https://github.com/windmill-labs/windmill-ee-private/pull/437

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

This comment was marked as outdated.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Mar 4, 2026

Deploying windmill with  Cloudflare Pages  Cloudflare Pages

Latest commit: cb144ef
Status:⚡️  Build in progress...

View logs

Comment thread backend/windmill-native-triggers/src/lib.rs Outdated
Comment thread backend/migrations/20260316000000_token_hash.up.sql
Comment on lines +15 to +16
-- Make token NOT NULL again
ALTER TABLE token ALTER COLUMN token SET NOT NULL;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential down-migration failure: When rolling back step 2 (token_hash_pk_swap.down.sql), this restores token NOT NULL, but new tokens created after the forward migration will have token = NULL (since the code no longer populates it). This means the rollback will fail with ERROR: column "token" of relation "token" contains null values if any new tokens were created after migration.

You'd need to backfill or drop the new null rows before restoring NOT NULL:

-- Delete tokens that were created after the forward migration (no plaintext token)
DELETE FROM token WHERE token IS NULL;
-- Then restore NOT NULL
ALTER TABLE token ALTER COLUMN token SET NOT NULL;

Comment thread backend/tests/fixtures/preserve_on_behalf_of.sql
Comment thread backend/windmill-api-auth/src/auth.rs
rubenfiszel and others added 5 commits March 4, 2026 08:01
- Update all base.sql fixtures to include token_hash/token_prefix columns
- Keep plaintext token for webhook tokens (needed for URL reconstruction)
- Restore get_token_by_prefix to query DB for webhook tokens
- Fix down migration to delete NULL-token rows before restoring NOT NULL
- Update parser fixture standalone schema
- Update EE dedicated_worker_ee.rs to use token_hash/token_prefix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Write to token column alongside token_hash until MIN_VERSION_SUPPORTS_TOKEN_HASH
(1.649.0) is reached. This ensures older workers can still authenticate
during rolling upgrades. Remove the separate UPDATE in new_webhook_token
since create_token_internal now writes plaintext directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check MIN_VERSION_SUPPORTS_TOKEN_HASH at runtime: write plaintext to
token column while old workers exist, switch to NULL once all workers
are >= 1.649.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rubenfiszel
Copy link
Copy Markdown
Contributor Author

/ai-fast reviewi this PR

@claude

This comment was marked as outdated.

hugocasa and others added 9 commits March 12, 2026 10:36
Resolve conflicts in monitor.rs and native-triggers/lib.rs,
keeping the token_prefix-based queries from the store-hash branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…OT NULL

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@hugocasa
Copy link
Copy Markdown
Collaborator

/ai-fast review this PR

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 13, 2026

Claude finished @hugocasa's task in 1m 55s —— View job


PR Review: Store hashed tokens instead of plaintext

  • Read key changed files (auth, migrations, native triggers, users)
  • Analyze security and correctness
  • Post review findings

Overall this is a well-designed security improvement. The two-phase migration approach, the MIN_VERSION_SUPPORTS_TOKEN_HASH gating, and the auth cache staying keyed by raw token are all solid choices. Here are the findings:


Bugs / Issues

1. Potential panic on short tokens (auth.rs)

There are 7 instances of token[0..TOKEN_PREFIX_LEN] that will panic if a token shorter than 10 chars is provided. While rd_string(32) always produces 32-char tokens, this could still be triggered by:

  • A malformed Authorization: Bearer xx header
  • Short cookie values from corrupted state

The superadmin path at auth.rs:437 already uses the safe pattern: token.get(..TOKEN_PREFIX_LEN).unwrap_or(token). The other 7 sites should too:

  • auth.rs:284, auth.rs:308, auth.rs:324, auth.rs:373, auth.rs:387, auth.rs:402
  • users.rs:1651

Similarly, mcp/oauth_server.rs:389 and oauth_server.rs:533 use &access_token[..TOKEN_PREFIX_LEN] which could panic, though these are on internally-generated tokens so the risk is very low.

Fix this →

2. Google watch renewal fails when token is orphaned (google/external.rs:328-334)

renew_channel calls rotate_webhook_token and if it returns None (token deleted or missing), it returns a hard error. This is fine for now since there's always a token row during the transition period, but after MIN_VERSION_SUPPORTS_TOKEN_HASH is met and plaintext stops being stored, the rotate_webhook_token function should still work (it looks up by token_hash, not plaintext), so this path should remain functional. However, the error message at line 332 says "no user context to create a fresh one" — consider adding a fallback that creates a fresh token (as the PR description's TODO recommends) rather than failing the renewal entirely.


Migration Concerns

3. Native trigger migration deletes orphaned triggers (20260313000000_native_trigger_token_hash.up.sql:13)

DELETE FROM native_trigger WHERE webhook_token_hash IS NULL;

This silently deletes any native_trigger row whose webhook_token_prefix doesn't match a token.token_prefix. This could happen if:

  • A token was deleted but the trigger wasn't cleaned up
  • Prefix collision caused the JOIN to match a different token

The deletion is the right thing to do for data integrity, but consider logging or counting how many rows are deleted, or adding a DO NOTHING approach (set a sentinel hash value) so the trigger can be detected and re-created rather than silently lost.

4. Native trigger migration backfill uses prefix JOIN (20260313000000_native_trigger_token_hash.up.sql:7-10)

UPDATE native_trigger nt SET webhook_token_hash = t.token_hash
FROM token t WHERE t.token_prefix = nt.webhook_token_prefix;

If two tokens share the same 10-char prefix (unlikely but possible), this JOIN could match the wrong token for a trigger. Since the prefix space is 62^10, the probability is negligible in practice, but it's worth noting. A DISTINCT ON or additional filtering by workspace_id and email would be safer.


Design Observations (Non-blocking)

5. delete_token can delete multiple tokens with same prefix (users.rs:1951-1960)

The DELETE FROM token WHERE email = $1 AND token_prefix = $2 could delete multiple tokens if a user has tokens with colliding 10-char prefixes. The code already returns tokens_deleted.len() in the response message, which is good for visibility. At scale, consider whether a UNIQUE(email, token_prefix) constraint or using token_hash for deletion would be safer.

6. rotate_webhook_token doesn't copy expiration or owner (lib.rs:727-782)

The rotation creates a new token copying label, email, scopes, workspace_id, and super_admin, but doesn't copy expiration or owner. For webhook tokens, having no expiration is probably intentional (they live as long as the trigger), but the missing owner column means the new token won't have an owner set. Verify this is acceptable for token permission checks.

7. token_expiry_notification table stores token_hash as FK (lib.rs:622-631)

INSERT INTO token_expiry_notification (token, expiration) VALUES ($1, $2)

The column is still named token but now stores a hash. This is a naming inconsistency that could confuse future developers. Not blocking, but worth a rename in a follow-up migration (e.g., token_hash).


Summary

The core security goal (removing plaintext tokens from the DB) is well-implemented. The main actionable item is the short token panic (item 1) which could cause a runtime crash on malformed input. Everything else is either already acknowledged in the PR's TODO section or is a low-probability edge case.

Copy link
Copy Markdown
Collaborator

@hugocasa hugocasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ai-fast

@hugocasa
Copy link
Copy Markdown
Collaborator

/ai-fast

hugocasa and others added 4 commits March 13, 2026 23:21
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ner/expiration in rotate

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…fication column

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@hugocasa
Copy link
Copy Markdown
Collaborator

/ai-fast review this PR

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 13, 2026

Claude finished @hugocasa's task in 1m 48s —— View job


PR Review: Store hashed tokens instead of plaintext

  • Read key changed files (auth, users, native triggers, migrations)
  • Check if previous review issues were addressed
  • Analyze security, correctness, and edge cases
  • Post review findings

This is a well-designed security improvement. The two-phase migration, auth cache keyed by raw token, MIN_VERSION_SUPPORTS_TOKEN_HASH gating, and rotate_webhook_token approach are all solid. Here are the findings from this review:


Bugs

1. Panic on short tokens — still present (7 sites in auth.rs + 1 in users.rs)

This was flagged in both previous reviews and remains unfixed. token[0..TOKEN_PREFIX_LEN] will panic if a token shorter than 10 characters is provided:

  • auth.rs:284, auth.rs:308, auth.rs:324, auth.rs:373, auth.rs:387, auth.rs:402 — regular token auth paths
  • auth.rs:436 — superadmin secret path (the secret is user-configured and could easily be shorter than 10 chars)
  • users.rs:1651 — login path

While rd_string(32) always produces 32-char tokens, a malformed Authorization: Bearer x header or a short superadmin secret will crash the server.

The fix is straightforward — use token.get(..TOKEN_PREFIX_LEN).unwrap_or(token).to_string() everywhere. This is a one-line change at each site.

Fix this →


Previously Flagged Issues — Now Addressed

2. Native trigger orphan handling (migration) — Fixed

The previous reviews flagged that 20260313000000_native_trigger_token_hash.up.sql deleted orphaned triggers silently. This has been fixed — orphaned triggers now get a placeholder hash and an error message ("Webhook token not found during migration — re-create this trigger to fix") instead of being deleted. Good improvement.

3. Down migration NULL token handling — Fixed

20260304000001_token_hash_pk_swap.down.sql:16 correctly deletes NULL-token rows before restoring NOT NULL.

4. token_expiry_notification column naming — Fixed

Migration 20260313000001_rename_token_expiry_notification_column.up.sql correctly converts existing plaintext values to hashes and renames the column to token_hash.

5. rotate_webhook_token copies owner and expiration — Fixed

lib.rs:763-775 now copies old.owner and old.expiration to the new token, addressing the previous review concern.


Design Observations (Non-blocking)

6. token_expiry_notification migration hashes plaintext values (20260313000001)

UPDATE token_expiry_notification
SET token = encode(sha256(token::bytea), 'hex');

This assumes all existing values in token_expiry_notification.token are plaintext. If any rows were already inserted with a hash (e.g., from the new code running after 20260304000000 but before this migration), they'd be double-hashed. Worth verifying the ordering guarantees — since 20260313000001 runs after 20260304000000, and the new code in register_token_expiry_notification already passes token_hash, any rows created between the two migrations would indeed be double-hashed.

However, looking at the code path: register_token_expiry_notification in lib.rs:622 inserts token_hash into a column named token. So between migrations 20260304000000 (which starts writing hashes) and 20260313000001 (which hashes the column), new rows will contain hashes that get re-hashed. The impact is low — these rows just won't match in check_expiring_tokens's JOIN, so some expiry notifications could be missed during the transition window.

7. Google channel renewal still hard-fails on missing token (google/external.rs:328-335)

As acknowledged in the PR description's TODO, renew_channel returns a hard error when rotate_webhook_token returns None. The handler.rs update path (lines 255-273) already has the correct pattern — it falls back to new_webhook_token. The renew_channel path should get the same treatment before MIN_VERSION_SUPPORTS_TOKEN_HASH is bumped to avoid breaking Google watch renewals. Since this is explicitly called out as a TODO for @hugocasademont, it's tracked.

8. Native trigger migration backfill prefix collision risk (very low)

The prefix JOIN at 20260313000000:7-10 could theoretically match the wrong token if two tokens share a 10-char prefix. Adding AND t.workspace_id = nt.workspace_id would make this more robust, though the 62^10 prefix space makes collisions negligible in practice.


Summary

The core security design is sound and most previous review issues have been addressed. The only remaining actionable bug is the short token panic (item 1) — a malformed Authorization: Bearer x header or a short superadmin secret will crash the server. This should be fixed before merge.

View job

@hugocasa
Copy link
Copy Markdown
Collaborator

hugocasa commented Mar 13, 2026

Manual Testing Checklist

Token basics

  • Create a token via Settings > Tokens > Add token — verify it works for API calls
  • List tokens — verify the token prefix shows correctly (10 chars)
  • Delete a token by prefix — verify it no longer authenticates
  • Login/logout — verify session token is created on login and invalidated on logout

Backward compatibility

  • Check the token table: newly created tokens should have token_hash and token_prefix populated
  • If MIN_VERSION is below 1.650.0 (rolling upgrade scenario), verify the plaintext token column is also populated
  • If MIN_VERSION is at or above 1.650.0, verify the plaintext token column is NULL

Native triggers (requires Google or Nextcloud integration)

  • Create a native trigger — verify webhook works and receives events
  • Update a native trigger (change the script path) — verify webhook still works with the new token
  • Delete a native trigger — verify the webhook token is also cleaned up from the token table
  • Check native_trigger.webhook_token_hash is populated (not the old webhook_token_prefix)

Migration

  • Verify the migration runs cleanly on a DB with existing tokens
  • Check that token_hash and token_prefix columns are backfilled for all existing tokens
  • Verify the token_expiry_notification.token_hash column was renamed and existing values hashed

Edge cases

  • Verify superadmin secret still works for authentication
  • Verify workspace-scoped tokens still respect workspace boundaries
  • Verify expired tokens are still rejected

@hugocasa hugocasa marked this pull request as draft March 13, 2026 22:41
@hugocasa hugocasa marked this pull request as ready for review March 16, 2026 13:24
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 16, 2026

Claude finished @hugocasa's task in 5m 20s —— View job


PR Review: Store hashed tokens instead of plaintext

  • Read key changed files (auth, users, migrations, native triggers, oauth)
  • Analyze security, correctness, and migration safety
  • Check previous review findings
  • Post comprehensive review with testing instructions

Overall this is a well-designed security improvement. The two-phase migration approach, auth cache keyed by raw token, MIN_VERSION_SUPPORTS_TOKEN_HASH gating, and rotate_webhook_token pattern are all solid. Here are the findings:


Bug: Panic on short tokens (still present)

This was flagged in all three previous reviews and remains unfixed. token[0..TOKEN_PREFIX_LEN] will panic on tokens shorter than 10 characters.

Most critical path — auth.rs:436 (superadmin secret):

token_prefix: Some(token[0..TOKEN_PREFIX_LEN].to_string()),

The SUPERADMIN_SECRET env var is user-configured and can easily be shorter than 10 chars. Any instance with a short secret will crash the API server on auth.

Other affected locations in auth.rs: lines 284, 308, 324, 373, 387, 402 — these handle tokens from Authorization: Bearer headers which could theoretically be malformed/short.

users.rs:531 — logout path, token comes from Bearer header. Lines 1651, 1753, 1841 use rd_string(32) so are safe in practice but should still use the defensive pattern.

Fix: Replace all bare token[0..TOKEN_PREFIX_LEN] with token.get(..TOKEN_PREFIX_LEN).unwrap_or(token).to_string(), or extract a helper:

fn safe_token_prefix(token: &str) -> String {
    token.get(..TOKEN_PREFIX_LEN).unwrap_or(token).to_string()
}

Fix this →


Recommendation: mcp_oauth_refresh_token.access_token semantic change without migration

oauth_server.rs:431 — The access_token column in mcp_oauth_refresh_token now stores the hash of the access token. There's no migration to hash existing plaintext values in this column. Existing rows from before this PR will have plaintext in access_token, and the DELETE FROM token WHERE token_hash = $1 at line 520 will silently fail to match (non-fatal since the error is caught, but the old access token won't be cleaned up on refresh).

Suggestion: Either add a migration to hash existing values, or rename the column to access_token_hash for clarity, or at minimum add a code comment explaining why existing rows are acceptable.


Recommendation: rotate_webhook_token doesn't migrate token_expiry_notification rows

lib.rs:729-786 — When rotating a webhook token, the old token is deleted and a new one created. If the old token had a token_expiry_notification row, it becomes orphaned (references a deleted token_hash). The new token gets no notification row.

Suggestion: Either copy the notification to the new hash, or add ON DELETE CASCADE from token_expiry_notification.token_hash to token.token_hash.


Migration Notes (non-blocking)

Orphan sentinel hash (20260316000002:15-18): Uses sha256('') as a placeholder. This works but is a magic value. A clearly-invalid non-hex string like '__orphaned__' would be more explicit.

Prefix JOIN for backfill (20260316000002:7-10): Joins on token_prefix (10 chars). Collision probability is negligible (62^10 space) but adding AND t.workspace_id = nt.workspace_id would be more robust.

Down migration is destructive (20260316000001.down.sql:16): DELETE FROM token WHERE token IS NULL destroys all tokens created after the forward migration. This is correct behavior but could surprise operators — consider adding a -- WARNING: destroys post-migration tokens comment.


Positive Observations

  • Migration strategy is excellent. Splitting into 4 migrations (add columns + backfill + indexes, instant PK swap, native trigger hash, token_expiry_notification rename) minimizes lock duration and allows partial rollback.
  • Backward compatibility is thoughtful. The MIN_VERSION_SUPPORTS_TOKEN_HASH gate ensures plaintext is still written for rolling upgrades, and idx_token_plaintext keeps old workers fast.
  • Cache invalidation transition is clean. The DB trigger now sends token_prefix instead of plaintext, and invalidate_token_from_cache evicts by prefix — no plaintext ever touches the notification channel post-migration.
  • Token rotation pattern is solid. rotate_webhook_token correctly copies metadata within a transaction, and callers handle None gracefully (or with a visible error on the trigger).
  • Test coverage is good. 6 integration tests cover hash consistency, CRUD lifecycle, logout, backward compat, and rotation.
  • Google renewal TODO is well-documented. The PR description clearly outlines the fix needed before MIN_VERSION_SUPPORTS_TOKEN_HASH is bumped.

Testing Instructions

To verify the token hashing changes: Go to Settings > Tokens (user settings page) and create a new token — it should display a token prefix (first 10 chars) in the token list. Copy the full token value shown at creation time and use it in an API call (curl -H "Authorization: Bearer <token>" http://localhost:8000/api/version) to verify hash-based authentication works. Delete the token from the list by clicking the delete button — verify it no longer authenticates. Log out and log back in to verify session tokens work. If you have access to the database, check the token table: token_hash and token_prefix should be populated for all rows; the token column should be NULL if the instance version is >= 1.650.0, or contain the plaintext otherwise. For native triggers, create/update a webhook trigger and verify native_trigger.webhook_token_hash is populated. The superadmin secret (SUPERADMIN_SECRET env var) should also still work for authentication — note the short-token panic bug above means secrets shorter than 10 characters will crash the server.


@hugocasa hugocasa changed the title feat: store hashed tokens instead of plaintext [ee] feat: store hashed tokens instead of plaintext Mar 16, 2026
hugocasa and others added 5 commits March 16, 2026 19:11
rotate_webhook_token now atomically inserts the new token and deletes
the old one in a single transaction, preventing token leaks.

Returns new_token_hash so callers can clean up the new token if their
subsequent trigger update fails (which involves external HTTP calls
and cannot be in the same DB transaction).

- Handler: wraps post-rotation work; deletes new token on failure
- Google renewal: deletes new token if service_config update fails
- Tests updated to match new atomic semantics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ilure

rotate_webhook_token now keeps the old token alive and returns
old_token_hash. Callers delete it only after the trigger row has been
successfully updated. If the external service call or DB update fails,
the trigger keeps working with the old token.

Worst case: if the best-effort delete fails, the old token leaks as an
extra DB row — harmless compared to breaking the trigger.

Also update summarized_schema.txt for renamed columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit updates the EE repository reference after PR #437 was merged in windmill-ee-private.

Previous ee-repo-ref: 7aef8b06cb6f54c2bc89dd57b70947deed72553c

New ee-repo-ref: 2d0823a471014e2bc2d898c63518323946b7474f

Automated by sync-ee-ref workflow.
@windmill-internal-app
Copy link
Copy Markdown
Contributor

🤖 Updated ee-repo-ref.txt to 2d0823a471014e2bc2d898c63518323946b7474f after windmill-ee-private PR #437 was merged.

@windmill-internal-app windmill-internal-app Bot changed the title [ee] feat: store hashed tokens instead of plaintext feat: store hashed tokens instead of plaintext Mar 17, 2026
@rubenfiszel
Copy link
Copy Markdown
Contributor Author

/ai-fast re-review this PR

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 17, 2026

Claude finished @rubenfiszel's task in 2m 55s —— View job


PR Re-Review: Store hashed tokens instead of plaintext

  • Read key changed files (auth, users, native triggers, migrations, oauth)
  • Check if previous review issues were addressed
  • Analyze security, correctness, and edge cases
  • Post comprehensive review findings

Overall this is a well-designed security improvement. The two-phase migration, auth cache keyed by raw token, MIN_VERSION_SUPPORTS_TOKEN_HASH gating, and rotate_webhook_token approach are all solid.


Status of previously flagged issues

# Issue Status
1 Short token panic Still present (see below)
2 Orphaned triggers deleted silently Fixed — now sets placeholder hash + error message
3 Down migration NULL handling Fixed — deletes NULL rows before restoring NOT NULL
4 token_expiry_notification naming Fixed — migration 20260316000003 renames to token_hash
5 rotate_webhook_token missing owner/expiration Fixedlib.rs:763-775 copies both
6 mcp_oauth_refresh_token.access_token naming Fixed — migration 20260316000004 renames to access_token_hash
7 Google renewal hard-fail Acknowledged — tracked as TODO for @hugocasademont

Bug: Panic on short tokens (4th time flagged)

There are 8 instances of token[0..TOKEN_PREFIX_LEN].to_string() that will panic if a token shorter than 10 characters is provided:

auth.rs — 7 sites on the main auth path (lines 284, 308, 324, 373, 387, 402, 436):

token_prefix: Some(token[0..TOKEN_PREFIX_LEN].to_string()),

users.rs:1651 — login path:

token_prefix: Some(token[0..TOKEN_PREFIX_LEN].to_string()),

Additionally, there are 4 instances of &token[..TOKEN_PREFIX_LEN] that will also panic:

  • lib.rs:523 (create_token_internal) — safe, uses rd_string(32)
  • users.rs:531 (logout) — unsafe, token comes from Authorization: Bearer header
  • users.rs:1753 (create_session_token) — safe, uses rd_string(32)
  • users.rs:1841 (impersonate) — safe, uses rd_string(32)

Most critical path — auth.rs:436 (superadmin secret): The SUPERADMIN_SECRET env var is user-configured and can easily be set to something shorter than 10 chars (e.g., "changeme"). Any instance with a short secret will crash on first auth attempt.

users.rs:531 (logout): A malformed cookie or Authorization: Bearer x header will crash the server.

Fix: Add a helper and use it everywhere:

fn safe_prefix(token: &str) -> String {
    token.get(..TOKEN_PREFIX_LEN).unwrap_or(token).to_string()
}

Or just use the inline pattern at each site: token.get(..TOKEN_PREFIX_LEN).unwrap_or(token).to_string()

Fix this →


Design observations (non-blocking)

1. token_expiry_notification double-hashing window

Migration 20260316000003 hashes all existing values:

UPDATE token_expiry_notification SET token = encode(sha256(token::bytea), 'hex');

But the new code in register_token_expiry_notification (lib.rs:622) already writes token_hash into the token column. Any notification rows inserted between migration 20260316000000 (new code deploys) and 20260316000003 (this migration runs) will be double-hashed, causing them to not match in check_expiring_tokens. Impact is low — some expiry notifications may be missed during the brief transition window.

2. Google renewal TODO

renew_channel (google/external.rs:330-335) still hard-fails when rotate_webhook_token returns None. The PR description's TODO outlines the right fix (create a fresh token without ApiAuthed). This should be implemented before MIN_VERSION_SUPPORTS_TOKEN_HASH is bumped to avoid breaking Google watch renewals. Since this is explicitly tracked as a TODO for @hugocasademont, just noting it's still pending.

3. Native trigger backfill prefix collision

20260316000002:7-10 joins native_trigger to token on token_prefix alone. Adding AND t.workspace_id = nt.workspace_id would be more robust, though the 62^10 prefix space makes collisions negligible.


Positive observations

  • Migration strategy is excellent. Four migrations minimize lock duration and allow partial rollback
  • Backward compatibility is well-thought-out. MIN_VERSION_SUPPORTS_TOKEN_HASH gate + idx_token_plaintext keep old workers fast during rolling upgrades
  • Cache invalidation transition is clean. DB trigger sends token_prefix instead of plaintext
  • Token rotation pattern is solid. rotate_webhook_token correctly copies all metadata within a transaction
  • Test coverage is good. 6 integration tests cover hash consistency, CRUD lifecycle, logout, backward compat, and rotation
  • handler.rs update path is well-implemented. Falls back to new_webhook_token when rotate_webhook_token returns None (lines 263-279)

Summary

The core design is sound and most previous issues have been addressed. The only remaining actionable bug is the short token panic — a malformed Authorization: Bearer x header or a short superadmin secret will crash the server. This should be fixed before merge.

@rubenfiszel
Copy link
Copy Markdown
Contributor Author

/ai fix There are 8 instances of token[0..TOKEN_PREFIX_LEN].to_string() that will panic if a token shorter than 10 characters is provided:

auth.rs — 7 sites on the main auth path (lines 284, 308, 324, 373, 387, 402, 436):

token_prefix: Some(token[0..TOKEN_PREFIX_LEN].to_string()),

users.rs:1651 — login path:

token_prefix: Some(token[0..TOKEN_PREFIX_LEN].to_string()),

Additionally, there are 4 instances of &token[..TOKEN_PREFIX_LEN] that will also panic:

lib.rs:523 (create_token_internal) — safe, uses rd_string(32)
users.rs:531 (logout) — unsafe, token comes from Authorization: Bearer header
users.rs:1753 (create_session_token) — safe, uses rd_string(32)
users.rs:1841 (impersonate) — safe, uses rd_string(32)

Most critical path — auth.rs:436 (superadmin secret): The SUPERADMIN_SECRET env var is user-configured and can easily be set to something shorter than 10 chars (e.g., "changeme"). Any instance with a short secret will crash on first auth attempt.

users.rs:531 (logout): A malformed cookie or Authorization: Bearer x header will crash the server.

Fix: Add a helper and use it everywhere:

fn safe_prefix(token: &str) -> String {
token.get(..TOKEN_PREFIX_LEN).unwrap_or(token).to_string()
}

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 17, 2026

Claude finished @rubenfiszel's task in 4m 16s —— View job


Fix: Short token panic prevention

  • Read auth.rs to find all unsafe token[0..TOKEN_PREFIX_LEN] sites
  • Read users.rs to find unsafe sites
  • Add safe_token_prefix helper function in windmill-common/src/auth.rs
  • Fix all 7 unsafe sites in auth.rs
  • Fix unsafe sites in users.rs (lines 531, 1651, 1753, 1841)
  • Fix lib.rs:523 in windmill-api-auth
  • Fix oauth_server.rs (lines 389, 533)
  • Fix native-triggers/lib.rs:767
  • Run cargo check --features all_sqlx_features
  • Commit and push

Summary

Added safe_token_prefix() helper in windmill-common/src/auth.rs that uses .get(..TOKEN_PREFIX_LEN).unwrap_or(token) instead of direct slice indexing. Replaced all 12 instances of unsafe token[0..TOKEN_PREFIX_LEN] / &token[..TOKEN_PREFIX_LEN] across:

  • windmill-api-auth/src/auth.rs — 7 sites (critical: superadmin secret path)
  • windmill-api-users/src/users.rs — 3 sites (critical: logout path)
  • windmill-api-auth/src/lib.rs — 1 site
  • windmill-native-triggers/src/lib.rs — 1 site
  • windmill-api/src/mcp/oauth_server.rs — 2 sites (these + a few others used rd_string(32) so were safe in practice, but fixed for consistency)
    | Branch: store-hash

Add safe_token_prefix() helper that uses .get(..TOKEN_PREFIX_LEN).unwrap_or(token)
instead of direct slice indexing, preventing panics when tokens are shorter than
10 characters (e.g., short superadmin secrets or malformed Bearer tokens).

Co-authored-by: Ruben Fiszel <rubenfiszel@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rubenfiszel rubenfiszel merged commit f2be625 into main Mar 17, 2026
9 of 10 checks passed
@rubenfiszel rubenfiszel deleted the store-hash branch March 17, 2026 01:15
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 17, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants