Skip to content

Multiplex upload#5

Merged
mhusbynflow merged 5 commits intomasterfrom
multiplex-upload
Mar 10, 2026
Merged

Multiplex upload#5
mhusbynflow merged 5 commits intomasterfrom
multiplex-upload

Conversation

@mhusbynflow
Copy link
Copy Markdown
Collaborator

No description provided.

mhusbynflow and others added 4 commits March 10, 2026 09:38
… errors

The annotation endpoint returns validation errors in a different format
({"validation": [...]}) than the standard {"error": ...} response.
A dedicated exception preserves the structured error list so callers can
inspect individual validation failures.

FLOW-547

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change _raise_for_error fallback from "Unknown error" to the full body
dict when the response has no "error" key. This preserves structured
annotation validation data ({"validation": [...]}) in the exception.

Also replace magic HTTP status code numbers with HTTPStatus constants
in _transport.py and exceptions.py.

FLOW-547

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generalize the chunked upload method so it can be reused for
multiplexed and annotation uploads. The method now takes an endpoint
and a static extra_fields dict instead of sample-specific parameters.

is_last_sample is now sent as a static per-file flag rather than
being computed per-chunk, relying on the API's updated finalization
logic (is_last_sample AND data.is_ready).

FLOW-547

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upload multiplexed reads and an annotation sheet in a single call.
The annotation is uploaded first so reads files are not uploaded if
the annotation is invalid.

- Annotation hard validation errors raise AnnotationValidationError
- Annotation warnings are auto-accepted by default (ignore_warnings=True)
  and included in the result for inspection
- Set ignore_warnings=False to raise AnnotationValidationError on warnings
- Supports single-end and paired-end reads with chunked uploads

FLOW-547

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mhusbynflow
Copy link
Copy Markdown
Collaborator Author

@claude please review this PR

@claude
Copy link
Copy Markdown

claude bot commented Mar 10, 2026

Claude finished @mhusbynflow's task in 1m 42s —— View job


PR Review: Multiplex Upload

Todo:

  • Gather context and read changed files
  • Review flowbio/v2/samples.py (core changes)
  • Review flowbio/v2/exceptions.py
  • Review flowbio/v2/_transport.py
  • Review flowbio/v2/__init__.py
  • Review test files
  • Post comprehensive review

Summary

This PR adds upload_multiplexed_data() for uploading multiplexed reads + annotation sheets, along with AnnotationValidationError, MultiplexedUpload, get_annotation_template(), and transport layer improvements. The implementation is clean and well-tested. A few issues to address below.


Issues

1. Docstring incorrect: BadRequestError vs AnnotationValidationError for warnings

flowbio/v2/samples.py, line 296

The docstring says:

:raises BadRequestError: If ``ignore_warnings=False`` and the annotation has warnings.

But _upload_annotation actually raises AnnotationValidationError in that path (not BadRequestError):

# samples.py:471-473
if not ignore_warnings:
    raise AnnotationValidationError(
        errors=e.message["warnings"],
    ) from e

The test at test_ignore_warnings_false_raises_annotation_validation_error confirms AnnotationValidationError is raised. The docstring should read :raises AnnotationValidationError: here. Fix this →


2. Misleading field name: expected_file_size sends chunk offset, not file size

flowbio/v2/samples.py, line 445

form_data: dict[str, str] = {
    "filename": file_path.name,
    "expected_file_size": str(chunk_index * chunk_size),  # ← this is the byte offset
    ...
}

chunk_index * chunk_size is the byte offset of the start of each chunk (0, 40, 80...), not the file size. The variable file_size is computed on line 430 but never sent. This naming is confusing — if the API field is truly named expected_file_size, the comment is worth noting; if this field is meant to be the chunk start offset, the field should be renamed. Fix this →


3. File reopened on every chunk — minor inefficiency

flowbio/v2/samples.py, lines 450–452

for chunk_index in chunks:
    ...
    with open(file_path, "rb") as f:
        f.seek(chunk_index * chunk_size)
        chunk = f.read(chunk_size)

The file is opened, seeked, and closed on every iteration. For large files with many chunks this adds overhead. Consider opening the file once before the loop and seeking inside:

with open(file_path, "rb") as f:
    for chunk_index in chunks:
        f.seek(chunk_index * chunk_size)
        chunk = f.read(chunk_size)
        ...

Fix this →


4. Type inconsistency in form_data

flowbio/v2/samples.py, line 443

form_data: dict[str, str] = {
    ...
    "is_last": is_last_chunk,   # bool, not str
    "data": data_id,            # str | None, not str
    ...
}

The type annotation says dict[str, str] but the dict contains booleans and None values. httpx will coerce them, but the annotation is misleading and mypy/pyright would flag this. Either change the annotation to dict[str, Any] or explicitly convert values to strings.


Observations / Minor Notes

  • AnnotationValidationError used for both hard errors and warnings — when ignore_warnings=False and warnings are present, the same exception class is used for warnings as for hard validation errors (with the errors attribute containing warning dicts). This is consistent but callers can't easily distinguish "errors" from "warnings" without inspecting the source. Could be documented more explicitly.

  • MultiplexedUpload.warnings: list[dict] — the inner dict structure is untyped. A TypedDict would make the API more self-documenting, though this is low priority.

  • Module docstring (samples.py, lines 1–26) — the quick-start examples only show upload_sample. Now that upload_multiplexed_data is public, a short example there would help discoverability.


Tests

Test coverage is thorough: single-end, paired-end, annotation before reads ordering, chunked upload, validation errors, warning auto-retry, and ignore_warnings=False. The test_uploads_annotation_before_reads test setup has a minor awkwardness (the first .mock() call on ann_route is immediately overwritten by the second .mock() call with the side effect), but functionally the test captures the correct behaviour.


Overall: Good, well-structured addition. The docstring fix (#1) is the only correctness issue; the rest are code quality improvements.

- Fix docstring: `ignore_warnings=False` raises
  `AnnotationValidationError`, not `BadRequestError`
- Open file once before chunk loop instead of reopening per chunk
- Fix `form_data` type annotation to `str | bool | list[str] | None`
- Add comment clarifying `expected_file_size` is a byte offset
@mhusbynflow
Copy link
Copy Markdown
Collaborator Author

@claude changes should have been addressed. Please check

@claude
Copy link
Copy Markdown

claude bot commented Mar 10, 2026

Claude finished @mhusbynflow's task in 48s —— View job


Review of Addressed Feedback

Todo:


Issues #2, #3, and #4 are fully resolved. Issue #1 is partially fixed — one spot still has the wrong exception class.


Remaining Issue

#1 (partial fix): :param ignore_warnings: description still says BadRequestError

The :raises lines (300–302) are now correct:

:raises AnnotationValidationError: If ``ignore_warnings=False``
    and the annotation has warnings.

But the ignore_warnings parameter description on line 295–296 still references the wrong exception:

# samples.py lines 293–296
:param ignore_warnings: If ``True`` (the default), annotation
    warnings are automatically accepted and included in the
    result. If ``False``, warnings cause a
    :class:`BadRequestError` to be raised.   # ← still wrong

Should be :class:AnnotationValidationError`` to be raised.` Fix this →


Resolved Issues

# Issue Status
#2 expected_file_size misleading — added clarifying comment ✅ Fixed
#3 File reopened per chunk — now opened once before loop ✅ Fixed
#4 form_data annotation was dict[str, str] — now dict[str, str | bool | list[str] | None] ✅ Fixed
#1 :raises lines corrected to AnnotationValidationError ✅ Fixed (partial)

One small fix remaining on the ignore_warnings param description.

@mhusbynflow mhusbynflow merged commit 7c6a246 into master Mar 10, 2026
2 checks passed
@mhusbynflow mhusbynflow deleted the multiplex-upload branch March 10, 2026 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant