auth: extract ?o=/?a= from DATABRICKS_HOST env var#5337
Conversation
Pasting a SPOG URL from the Databricks UI into DATABRICKS_HOST drops the workspace identifier on the way into the SDK: the SDK strips query parameters in fixHostIfNeeded without promoting them to WorkspaceID. API calls then hit the SPOG without an X-Databricks-Org-Id header and the server answers with HTML, surfacing as "received HTML response instead of JSON". The bundle YAML host field is already normalized via NormalizeHostURL, but the env var bypassed that path. Run the existing ExtractHostQueryParams normalization on DATABRICKS_HOST at the top of root.Execute before any SDK code runs. Promote ?o= and ?a= to DATABRICKS_WORKSPACE_ID and DATABRICKS_ACCOUNT_ID when those env vars are unset, and rewrite DATABRICKS_HOST without the query string. Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
shreyas-goenka
left a comment
There was a problem hiding this comment.
Are we sure this is how we want to approach this? Should we not fix the parser in the SDK? This seems a bit hacky?
| os.Setenv(envHost, params.Host) | ||
| if params.WorkspaceID != "" { | ||
| if _, set := os.LookupEnv(envWorkspaceID); !set { | ||
| os.Setenv(envWorkspaceID, params.WorkspaceID) |
There was a problem hiding this comment.
What if the environment variables are set but conflict with the ?o parameter? Should we error in that case?
There was a problem hiding this comment.
Today existing env vars always win: if DATABRICKS_WORKSPACE_ID is set, we leave it alone and just strip the query from DATABRICKS_HOST. Same behavior as the SDK PR (existing Config.WorkspaceID is preserved).
I'd lean against erroring, the env var is the more explicit signal, and a hard error here would also diverge from the SDK behavior. Happy to revisit if there's a concrete scenario where silent-preserve bites users.
There was a problem hiding this comment.
Fair enough, lets add a todo here so some future agent or human can remove this.
There was a problem hiding this comment.
Added in 9db4b3b, right next to the env-var-precedence branch so it's visible at the relevant code site (the top-of-file TODO covers deleting the whole normalizer on the next SDK bump).
One small behavior fix piggybacked: an explicitly-empty DATABRICKS_WORKSPACE_ID / DATABRICKS_ACCOUNT_ID is now treated the same as unset, so a paste-empty env doesn't block ?o= / ?a= promotion. That also makes the t.Setenv switch in the test cleaner (see the other thread).
|
Commit: 8a44c5e |
|
Yeah, I agree the SDK is the right long-term home, and I've got that fix open in databricks/databricks-sdk-go#1699, same logic, in Doing both because:
Happy to add a |
Stopgap until the SDK fix lands and the CLI bumps to a version that includes it; then this normalizer can be deleted. Co-authored-by: Isaac
shreyas-goenka
left a comment
There was a problem hiding this comment.
Approving to unblock, thanks!
| for _, tt := range tests { | ||
| t.Run(tt.name, func(t *testing.T) { | ||
| if tt.preserveHostUnset { | ||
| os.Unsetenv(envHost) |
There was a problem hiding this comment.
nit: use t.Setenv and set to "".
Benefits:
- blocks parallel runs for this test.
- Automatically cleaned up once hte test runs. os.UnsetEnvs can handle around and might affect integration tests.
There was a problem hiding this comment.
Done in 9db4b3b, t.Setenv throughout, os.Unsetenv gone. Also collapsed the redundant wantXxxSet bools since the function now treats empty and unset identically and we can just compare against os.Getenv directly.
| os.Setenv(envHost, params.Host) | ||
| if params.WorkspaceID != "" { | ||
| if _, set := os.LookupEnv(envWorkspaceID); !set { | ||
| os.Setenv(envWorkspaceID, params.WorkspaceID) |
There was a problem hiding this comment.
Fair enough, lets add a todo here so some future agent or human can remove this.
- Treat an explicitly-empty DATABRICKS_WORKSPACE_ID/ACCOUNT_ID the same as unset, so a paste-empty env doesn't block ?o=/?a= promotion. - Add a TODO at the env-var-precedence branch so the silent-preserve behavior is easy to revisit when the SDK fix lands. - Use t.Setenv exclusively in the table-driven test (parallel-safe, auto-cleanup) and assert against os.Getenv directly. Co-authored-by: Isaac
| // NormalizeDatabricksHostEnv extracts ?o=/?workspace_id= and ?a=/?account_id= | ||
| // from DATABRICKS_HOST and promotes them to DATABRICKS_WORKSPACE_ID and |
There was a problem hiding this comment.
Let's already do w=?
There was a problem hiding this comment.
Where does ?w= come from? I checked the codebase and don't see it referenced anywhere (?o= is the SPOG convention, ?workspace_id= is the explicit alias). The SDK PR (#1699) only handles ?o= and ?workspace_id= too. Happy to add ?w= as another alias if it's a real URL pattern users paste, just want to make sure I'm picking up the right convention.
| // Promote ?o=/?a= query parameters in DATABRICKS_HOST to | ||
| // DATABRICKS_WORKSPACE_ID/DATABRICKS_ACCOUNT_ID before the SDK reads the | ||
| // env var. Without this, SPOG URLs pasted from the Databricks UI lose | ||
| // their workspace routing identifier and API calls return HTML. |
There was a problem hiding this comment.
Should the normalization be scoped to the CLI execution rather than changing the actual environment value? Do we see this potentially changing environment variable in unintended ways?
There was a problem hiding this comment.
Good call, refactored in 8a44c5e.
Before: NormalizeDatabricksHostEnv() mutated os.Setenv for DATABRICKS_HOST/WORKSPACE_ID/ACCOUNT_ID at the top of root.Execute.
Now: NormalizeDatabricksConfigFromEnv(ctx, cfg) sets the SDK Config fields directly. Same end result (?o=/?a= promoted, host stripped), but the process environment is left untouched. Called from the three places that actually build a Config from env: MustAccountClient, MustWorkspaceClient, and cmd/api's makeCommand.
Subprocesses (terraform) still get the normalized values via auth.Env(cfg) -> b.AuthEnv(), since terraform's env is derived from cfg, not os.Environ.
Confirmed the SDK env loader (config_attributes.go:84) skips fields that are already set, so our pre-populated cfg.Host survives without being clobbered by the unmodified DATABRICKS_HOST env value.
Replaces NormalizeDatabricksHostEnv (which called os.Setenv on DATABRICKS_HOST/WORKSPACE_ID/ACCOUNT_ID at the top of root.Execute) with NormalizeDatabricksConfigFromEnv(ctx, cfg), which sets the SDK Config fields directly. Same end result (?o=/?a= promoted, host stripped), but the process environment is left untouched. Called from the three places that build a Config from env: MustAccountClient, MustWorkspaceClient, and cmd/api's makeCommand. Subprocesses (terraform) still get the normalized values via auth.Env(cfg) -> b.AuthEnv(), which derives env from cfg rather than reading os.Environ. Co-authored-by: Isaac
|
Commit: a57fe14 |
Why
Pasting a SPOG URL from the Databricks UI (e.g.
https://acme.azuredatabricks.net/?o=12345) intoDATABRICKS_HOSTdrops the workspace identifier before any API call goes out. The SDK strips path and query fromHostinfixHostIfNeededwithout promoting?o=toWorkspaceID, so the request goes to the SPOG hostname without anX-Databricks-Org-Idheader. The server can't route it and answers with the login HTML page, which surfaces as:The bundle YAML
workspace.hostfield is already normalized viaNormalizeHostURLinbundle/config/workspace.go, anddatabricks apihandles?o=per call (#5137). The env-var path was the remaining gap.Changes
Before:
DATABRICKS_HOST=https://acme.databricks.net/?o=12345reached the SDK with the query intact, the SDK dropped it, andWorkspaceIDstayed empty.Now:
auth.NormalizeDatabricksHostEnvruns once at the top ofroot.Execute, before the SDK reads anything. It uses the existingExtractHostQueryParamshelper to promote?o=/?workspace_id=toDATABRICKS_WORKSPACE_IDand?a=/?account_id=toDATABRICKS_ACCOUNT_ID(only when those env vars are unset), then rewritesDATABRICKS_HOSTwithout the query string.A follow-up PR will push the same normalization into the SDK's
fixHostIfNeededso Python/Java/JS SDK users and any direct Go-SDK callers get the same fix without going through the CLI.Test plan
go test ./libs/auth/covers the new helper with table-driven cases:?o=promotion,?a=+?o=together, existingDATABRICKS_WORKSPACE_IDis preserved, hosts without query are untouched, non-numeric?o=is dropped, unsetDATABRICKS_HOSTis a no-op.go test ./cmd/root/passes../task checks.X-Databricks-Org-Idheader and got HTML back; after the fix the header is258628866953061andbundle validateproceeds to the workspace API.