Fix default sanitizer to only redact #-prefixed secrets; add input_files_dir to scaffolder #4

Open
matyas-jirat-keboola wants to merge 4 commits into main from feature/default-sanitizer-config

matyas-jirat-keboola (Collaborator) commented Feb 26, 2026

Changes

Fix: create_default_sanitizer() only redacts #-prefixed secret values

  • create_default_sanitizer() called extract_values(), which collects every string value from the secrets dict regardless of key name
  • As a result, non-sensitive metadata fields such as oauthVersion: "2.0" ended up in sensitive_values, and the literal string replacement corrupted cassette URLs (e.g. api.xro/2.0 → api.xro/REDACTED)
  • Fix: switch to the existing _collect_hash_values(), which respects the Keboola #-prefix convention: only values under #-prefixed keys are treated as sensitive
  • DefaultSanitizer(config=...) already used _collect_hash_values(); this change aligns create_default_sanitizer() with the same behaviour
  • Tests updated to use #-prefixed keys matching the real secrets file format
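To illustrate the difference, here is a minimal Python sketch of the two collection strategies and of literal string redaction. The helpers extract_all_string_values(), collect_hash_values() and redact() are hypothetical stand-ins written for this example, not the library's extract_values() / _collect_hash_values(); in particular, how the real helper treats non-string values under a #-prefixed key is an assumption here.

```python
from typing import Any


def extract_all_string_values(secrets: Any) -> list[str]:
    """Old behaviour (sketch): every string anywhere in the dict counts as sensitive."""
    found: list[str] = []
    if isinstance(secrets, dict):
        for value in secrets.values():
            found.extend(extract_all_string_values(value))
    elif isinstance(secrets, list):
        for item in secrets:
            found.extend(extract_all_string_values(item))
    elif isinstance(secrets, str):
        found.append(secrets)
    return found


def collect_hash_values(secrets: Any) -> list[str]:
    """Fixed behaviour (sketch): only strings stored under '#'-prefixed keys count."""
    found: list[str] = []
    if isinstance(secrets, dict):
        for key, value in secrets.items():
            if isinstance(key, str) and key.startswith("#") and isinstance(value, str):
                found.append(value)
            else:
                # Plain metadata keys (and non-string values) are searched
                # recursively, but their own string values are never collected.
                found.extend(collect_hash_values(value))
    elif isinstance(secrets, list):
        for item in secrets:
            found.extend(collect_hash_values(item))
    return found


def redact(text: str, sensitive_values: list[str]) -> str:
    """Literal string replacement of each sensitive value."""
    for value in sensitive_values:
        text = text.replace(value, "REDACTED")
    return text


# Hypothetical example data shaped like an OAuth config.secrets.json.
secrets = {
    "authorization": {
        "oauth_api": {
            "credentials": {
                "oauthVersion": "2.0",
                "#data": "tok-123",
                "#appSecret": "s3cr3t",
            }
        }
    }
}
url = "https://example.test/api.xro/2.0/Invoices?token=tok-123"

print(redact(url, extract_all_string_values(secrets)))
# https://example.test/api.xro/REDACTED/Invoices?token=REDACTED
print(redact(url, collect_hash_values(secrets)))
# https://example.test/api.xro/2.0/Invoices?token=REDACTED
```

Because redaction is a plain substring replacement, any short metadata value that also appears in a URL path gets replaced; restricting the set to #-prefixed values avoids that while still catching the real tokens.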

Feature: input_files_dir parameter for writer component scaffolding

  • Added input_files_dir: Path | None to scaffold_from_json() and _scaffold_single_test()
  • Added a TestScaffolder._copy_input_files() static method that reads each test's config.json and copies the referenced files from input_files_dir into the test's in/tables/ or in/files/, based on the storage.input.tables[].destination / storage.input.files[].destination entries
  • The copy happens before recording so writer components find their input data during the recording run
  • Silently skips if input_files_dir does not exist or a referenced source file is missing
  • Documented the standard tests/setup/input_files/ repo layout convention in the module docstring
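A rough sketch of the copy step, under stated assumptions: the function name copy_input_files() and its data_dir parameter are hypothetical, each source file in input_files_dir is assumed to be named after the mapping's destination, and the test's config.json is assumed to sit directly in data_dir. It is not the actual TestScaffolder._copy_input_files() implementation.

```python
import json
import shutil
from pathlib import Path
from typing import Optional


def copy_input_files(data_dir: Path, input_files_dir: Optional[Path]) -> list[Path]:
    """Copy input files referenced by data_dir/config.json into data_dir/in/.

    Hypothetical sketch: the source file is looked up as
    input_files_dir / <destination> for every input mapping.
    """
    copied: list[Path] = []
    if input_files_dir is None or not input_files_dir.is_dir():
        return copied  # silently skip when there is nothing to copy from

    config_path = data_dir / "config.json"
    if not config_path.is_file():
        return copied
    config = json.loads(config_path.read_text(encoding="utf-8"))
    storage_input = config.get("storage", {}).get("input", {})

    # storage.input.tables[] -> in/tables/, storage.input.files[] -> in/files/
    for section in ("tables", "files"):
        for mapping in storage_input.get(section, []):
            destination = mapping.get("destination")
            if not destination:
                continue
            source = input_files_dir / destination
            if not source.is_file():
                continue  # referenced source file is missing: skip silently
            target = data_dir / "in" / section / destination
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            copied.append(target)
    return copied
```

With a tests/setup/input_files/orders.csv and a table mapping whose destination is orders.csv, the file would land in in/tables/orders.csv before the writer is run.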

Test plan

  • Record cassettes for an OAuth component with oauthVersion in secrets.json (e.g. Xero) and verify URLs are not corrupted
  • Replay cassettes and verify no CannotOverwriteExistingCassetteException
  • Confirm actual secrets (#data, #appSecret) are still redacted in cassettes
  • Scaffold a writer component test with input_files_dir pointing to a directory with CSVs and verify files are copied into in/tables/ before recording

🤖 Generated with Claude Code

matyas-jirat-keboola and others added 4 commits February 26, 2026 14:09
…rameter

- Add input_files_dir param to scaffold_from_json() so writer component
  input CSVs can be supplied without going through the CLI layer
- Add TestScaffolder._copy_input_files() static method (previously lived
  only in datadirtest/__main__.py, invisible to API users)
- Expand module docstring with tests/setup/ layout convention and a
  concrete writer example so other sessions can discover the feature

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously _copy_input_files() was called after all tests were recorded,
meaning writer components couldn't find their input CSVs during the live
API run. Move the copy into _scaffold_single_test(), after the directory
structure is written but before _record_test() is called.

The files now live in in/tables/ or in/files/ for both recording and replay.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
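The ordering fix can be outlined as follows. This is a hypothetical skeleton with the three steps passed in as callables, so it only demonstrates the sequence described in the commit message above, not the real signature of _scaffold_single_test().

```python
from pathlib import Path
from typing import Callable, Optional


def scaffold_single_test(
    test_dir: Path,
    input_files_dir: Optional[Path],
    write_structure: Callable[[Path], None],
    copy_input_files: Callable[[Path, Optional[Path]], object],
    record_test: Callable[[Path], None],
) -> None:
    """Hypothetical outline of the fixed step order."""
    write_structure(test_dir)                    # 1. write the test directory structure
    copy_input_files(test_dir, input_files_dir)  # 2. stage input files in in/tables/ or in/files/
    record_test(test_dir)                        # 3. live recording run sees the inputs
```

Because the copy now happens per test and before recording, the same files are present for both the live recording run and later replays.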
Replaces extract_values() with _collect_hash_values() so that non-sensitive
metadata fields (oauthVersion, id, created, etc.) are skipped. Previously,
short values like "2.0" were added to sensitive_values, causing URL path
corruption in cassettes (e.g. api.xro/2.0 → api.xro/REDACTED).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous tests used plain keys (e.g. {"api_key": "my-key"}) which
did not reflect the actual config.secrets.json structure. In Keboola,
only #-prefixed keys are encrypted secrets — non-prefixed fields like
oauthVersion or id are plain metadata that must not be redacted from
cassettes (doing so was the bug fixed in 58bd94c).

Update both failing tests to use #-prefixed keys, and add an explicit
assertion that non-prefixed metadata values are not collected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
matyas-jirat-keboola changed the title from "Fix create_default_sanitizer to only redact #-prefixed secret values" to "Fix default sanitizer to only redact #-prefixed secrets; add input_files_dir to scaffolder" on Feb 26, 2026