feat(cli,mcp): add namespace support to destination smoke tests#999
Conversation
- Generate zz_deleteme_yyyymmdd_hhmm namespaces for test isolation - Add --namespace-suffix for concurrent run disambiguation - Add --reuse-namespace for multi-test runs on same namespace - Inject namespace into smoke test source streams and records - Include namespace in DestinationSmokeTestResult Co-Authored-By: AJ Steers <aj@airbyte.io>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1773874618-namespace-support' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1773874618-namespace-support'PR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful ResourcesCommunity SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
Co-Authored-By: AJ Steers <aj@airbyte.io>
📝 WalkthroughWalkthroughAdded namespace support for destination smoke tests: auto-generate timestamped namespaces or reuse an existing one. The namespace is propagated from CLI/MCP through core test runner into the smoke-test source, which applies it to streams and emitted records. Changes
Sequence DiagramsequenceDiagram
participant User as User/CLI
participant CLI as CLI Handler
participant Core as Core Runner
participant Source as Smoke Test Source
participant Dest as Destination
User->>CLI: destination_smoke_test(--namespace-suffix / --reuse-namespace)
CLI->>Core: run_destination_smoke_test(namespace_suffix, reuse_namespace)
Core->>Core: compute namespace (generate or reuse)
Core->>Source: get_smoke_test_source(namespace=...)
Source->>Source: discover(config with namespace)
Source->>Source: _build_streams_from_scenarios(namespace=...)
Source-->>Core: configured source
Core->>Source: read(config with namespace)
Source->>Dest: emit records/statuses with namespace
Dest-->>Source: write to namespaced destination
Source-->>Core: read complete
Core-->>CLI: DestinationSmokeTestResult(namespace=...)
CLI-->>User: show result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Would you like me to add a short developer note or changelog entry suggestion to accompany these changes, wdyt? 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@airbyte/_util/destination_smoke_tests.py`:
- Around line 197-198: Add an explicit validation that rejects callers passing
both namespace_suffix and reuse_namespace together: in the function that accepts
the parameters (look for parameters named namespace_suffix and reuse_namespace
in destination_smoke_tests.py) check at start that not (namespace_suffix and
reuse_namespace) and raise a clear exception (e.g., ValueError or the CLI's
argument error type) if both are provided; do the same validation where those
parameters are handled around the block referenced at lines ~230-233 so the
mutually-exclusive intent is enforced rather than silently letting
reuse_namespace override namespace_suffix.
- Around line 40-55: The generated namespace in generate_namespace uses
minute-resolution timestamp ("%Y%m%d_%H%M") causing collisions for runs within
the same minute; change the timestamp to include finer resolution (e.g., seconds
"%S" or milliseconds) or append a short collision-resistant token (e.g., first 8
chars of uuid4) to make namespaces unique across parallel/quick reruns while
preserving the optional namespace_suffix and NAMESPACE_PREFIX usage in the
function.
In `@airbyte/cli/pyab.py`:
- Around line 703-711: Update the help text for the click option
"--reuse-namespace" (parameter reuse_namespace) to soften the cleanup claim:
remove or rephrase "cleanup is skipped" and instead state that specifying an
exact namespace prevents generation of a new namespace and that the CLI does not
automatically perform cleanup of that namespace (so users must manage cleanup
themselves); keep the note about it being useful for running a second test
against an already-populated namespace.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 99d7510c-c9ce-45f1-a9a7-37153893006d
📒 Files selected for processing (4)
airbyte/_util/destination_smoke_tests.pyairbyte/cli/pyab.pyairbyte/cli/smoke_test_source/source.pyairbyte/mcp/local.py
Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Pull request overview
Adds per-run namespace isolation for destination smoke tests so each run writes into its own dedicated schema/namespace (using a zz_deleteme_... naming convention), exposed consistently through the shared helper, CLI, and MCP tool.
Changes:
- Introduces namespace generation + propagation through
run_destination_smoke_test()and returns the namespace inDestinationSmokeTestResult. - Extends the smoke-test source connector to accept a
namespaceconfig and apply it to discovered streams and emitted records. - Adds
--namespace-suffix/--reuse-namespaceoptions to the CLI and matching parameters to the MCP tool.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
airbyte/_util/destination_smoke_tests.py |
Adds namespace generation/selection and includes namespace in the result model. |
airbyte/cli/smoke_test_source/source.py |
Adds namespace to the source spec; applies it to streams and record messages. |
airbyte/cli/pyab.py |
Exposes namespace controls via CLI flags and passes them to the shared helper. |
airbyte/mcp/local.py |
Adds MCP tool parameters for namespace controls and passes them to the shared helper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Devin, reply to any feedback you haven't replied to. Thx |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte/cli/smoke_test_source/source.py (1)
399-404:⚠️ Potential issue | 🟠 MajorConfirm: This namespace gets dropped when flowing through
from_read_result()Your code here is correct, but
AirbyteMessageIterator.from_read_result()(lines 103–113 inairbyte/_message_iterators.py) hardcodesnamespace=Nonewith an explicit comment that it doesn't handle namespaces. When a destination writes from aReadResult(e.g., aftersource.read()is called), the messages flow throughfrom_read_result()and the namespace is dropped—this happens in bothdestination.write(source_data=read_result)calls at lines 216 and 221 inairbyte/destinations/base.py.The test at
test_destination_write_from_read_result()confirms this pattern: source →read()→ReadResult→destination.write(). Is this namespace loss intentional, or does it need fixing?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@airbyte/cli/smoke_test_source/source.py` around lines 399 - 404, from_read_result currently hardcodes namespace=None and thus drops namespaces when converting a ReadResult into AirbyteMessage objects; update AirbyteMessageIterator.from_read_result to preserve per-record namespace by reading namespace from the incoming AirbyteRecord (or ReadResult record) and passing it to AirbyteRecordMessage (instead of None), so that messages yielded by from_read_result keep record.namespace and downstream destination.write calls (which consume ReadResult) retain namespaces; locate and change the logic in AirbyteMessageIterator.from_read_result and ensure any use of namespace=None is replaced with the record's namespace when present.
♻️ Duplicate comments (2)
airbyte/_util/destination_smoke_tests.py (2)
58-59:⚠️ Potential issue | 🟠 MajorCould we make generated namespaces collision-resistant, wdyt?
Minute-level timestamps can collide in parallel/rapid reruns and weaken the isolation this PR introduces.
Possible fix
from datetime import datetime, timezone +from uuid import uuid4 @@ - ts = now.strftime("%Y%m%d_%H%M") - return f"{NAMESPACE_PREFIX}_{ts}_{suffix}" + ts = now.strftime("%Y%m%d_%H%M%S") + token = uuid4().hex[:8] + return f"{NAMESPACE_PREFIX}_{ts}_{token}_{suffix}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@airbyte/_util/destination_smoke_tests.py` around lines 58 - 59, The generated namespace uses minute-level timestamps (ts = now.strftime("%Y%m%d_%H%M")) which can collide; update the namespace generation in destination_smoke_tests.py to be collision-resistant by including higher-resolution time (seconds or milliseconds) or appending a short random/UUID component so the returned string constructed from NAMESPACE_PREFIX, ts and suffix is unique (refer to the ts variable, NAMESPACE_PREFIX and the return expression) — ensure the resulting name format remains parseable but adds uniqueness to avoid parallel/rapid rerun collisions.
201-202:⚠️ Potential issue | 🟡 MinorCould we reject
namespace_suffixandreuse_namespacetogether, wdyt?Right now
reuse_namespacesilently wins, which makes conflicting inputs ambiguous.Possible guardrail
def run_destination_smoke_test( @@ ) -> DestinationSmokeTestResult: @@ + if namespace_suffix and reuse_namespace: + raise PyAirbyteInputError( + message="`namespace_suffix` and `reuse_namespace` are mutually exclusive.", + ) + # Determine namespace namespace = reuse_namespace or generate_namespace( namespace_suffix=namespace_suffix, )Also applies to: 235-238
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@airbyte/_util/destination_smoke_tests.py` around lines 201 - 202, Add an explicit validation that disallows passing both namespace_suffix and reuse_namespace together: in the function that declares the parameters namespace_suffix and reuse_namespace add a guard at the top that checks if both are not None and raise a clear exception (e.g., ValueError) with a message stating they are mutually exclusive; apply the same guard where the alternate signature/overload appears (the other spot referencing namespace_suffix/reuse_namespace around the later block) so callers get a deterministic error instead of reuse_namespace silently winning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@airbyte/_util/destination_smoke_tests.py`:
- Around line 227-229: The docstring for the parameter reuse_namespace currently
states "cleanup is skipped" which is misleading because this function has no
cleanup behavior; update the docstring for reuse_namespace (the parameter in
this module's destination_smoke_tests function) to remove the claim about
skipping cleanup and instead say something like: "When set, no new namespace is
generated; any cleanup of the reused namespace is the caller's responsibility."
Ensure you change only the docstring text referencing reuse_namespace to reflect
the actual behavior.
---
Outside diff comments:
In `@airbyte/cli/smoke_test_source/source.py`:
- Around line 399-404: from_read_result currently hardcodes namespace=None and
thus drops namespaces when converting a ReadResult into AirbyteMessage objects;
update AirbyteMessageIterator.from_read_result to preserve per-record namespace
by reading namespace from the incoming AirbyteRecord (or ReadResult record) and
passing it to AirbyteRecordMessage (instead of None), so that messages yielded
by from_read_result keep record.namespace and downstream destination.write calls
(which consume ReadResult) retain namespaces; locate and change the logic in
AirbyteMessageIterator.from_read_result and ensure any use of namespace=None is
replaced with the record's namespace when present.
---
Duplicate comments:
In `@airbyte/_util/destination_smoke_tests.py`:
- Around line 58-59: The generated namespace uses minute-level timestamps (ts =
now.strftime("%Y%m%d_%H%M")) which can collide; update the namespace generation
in destination_smoke_tests.py to be collision-resistant by including
higher-resolution time (seconds or milliseconds) or appending a short
random/UUID component so the returned string constructed from NAMESPACE_PREFIX,
ts and suffix is unique (refer to the ts variable, NAMESPACE_PREFIX and the
return expression) — ensure the resulting name format remains parseable but adds
uniqueness to avoid parallel/rapid rerun collisions.
- Around line 201-202: Add an explicit validation that disallows passing both
namespace_suffix and reuse_namespace together: in the function that declares the
parameters namespace_suffix and reuse_namespace add a guard at the top that
checks if both are not None and raise a clear exception (e.g., ValueError) with
a message stating they are mutually exclusive; apply the same guard where the
alternate signature/overload appears (the other spot referencing
namespace_suffix/reuse_namespace around the later block) so callers get a
deterministic error instead of reuse_namespace silently winning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9c63bb19-d24a-48c4-9b92-9bbcecc5ab04
📒 Files selected for processing (4)
airbyte/_util/destination_smoke_tests.pyairbyte/cli/pyab.pyairbyte/cli/smoke_test_source/source.pyairbyte/mcp/local.py
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte/cli/pyab.py
- airbyte/mcp/local.py
…implemented) Co-Authored-By: AJ Steers <aj@airbyte.io>
|
Good catch — this is a pre-existing limitation in PyAirbyte's write pipeline, not introduced by this PR. The However, the smoke test harness doesn't go through The |
Summary
Adds namespace isolation to destination smoke tests using a
zz_deleteme_yyyymmdd_hhmmnaming convention. Every smoke test run now writes to a dedicated namespace, making test data easy to identify and safe for automated cleanup (any namespace containingdeletemeis fair game forDROP SCHEMA-style cleanup scripts).Changes across 4 files:
airbyte/_util/destination_smoke_tests.py—generate_namespace()helper, newnamespace_suffixandreuse_namespaceparams onrun_destination_smoke_test(), namespace passed through to source config,namespacefield added toDestinationSmokeTestResultairbyte/cli/smoke_test_source/source.py—namespaceconfig field in spec, set on streams indiscover()and on records inread()airbyte/cli/pyab.py—--namespace-suffixand--reuse-namespaceCLI optionsairbyte/mcp/local.py— corresponding MCP tool parametersReview & Testing Checklist for Human
run_destination_smoke_test()always generates a namespace unlessreuse_namespaceis provided. Verify this is the desired behavior, since pre-existing callers will now get a namespace injected where there was none before.namespaceis a required field onDestinationSmokeTestResult(no default) — this is a breaking change to the model. Any code constructing this result withoutnamespacewill fail.reuse_namespacedocstring says "cleanup is skipped" (implying cleanup exists otherwise), but there is no cleanup code in thefinallyblock. Verify this is acceptable for the first increment.destination-dev-null(JDK) failed — the JDK destination received records withnamespace=nulldespite the Python pipeline correctly setting namespace on both catalog and records. This may be a pre-existing PyAirbyte write-pipeline issue with namespace propagation to JDK destinations.Notes
namespace = config.get("namespace")call insource.pyread()is evaluated inside the per-stream loop but doesn't vary per stream — minor inefficiency, not a bug.PLR0913/PLR0917lint suppression was added to the MCP tool since it now has 9 parameters.Link to Devin session: https://app.devin.ai/sessions/9c72389579884c06bf18cef11c4550e8
Requested by: Aaron ("AJ") Steers (@aaronsteers)
Summary by CodeRabbit
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.