Skip to content

typing: graduate opencontractserver.tasks.* from mypy baseline#1541

Merged
JSv4 merged 16 commits into
mainfrom
claude/fix-issue-1482-8mA5R
May 7, 2026
Merged

typing: graduate opencontractserver.tasks.* from mypy baseline#1541
JSv4 merged 16 commits into
mainfrom
claude/fix-issue-1482-8mA5R

Conversation

@JSv4
Copy link
Copy Markdown
Collaborator

@JSv4 JSv4 commented May 5, 2026

Closes #1482. Continues draining the mypy baseline established in #1331 (tracker: #1447).

Summary

Graduates all sixteen opencontractserver.tasks.* modules out of the mypy baseline. The package is now type-clean.

Removed from mypy.ini — sixteen [mypy-…] blocks:
agent_tasks, analyzer_tasks, badge_tasks, corpus_tasks, data_extract_tasks, doc_analysis_tasks, doc_tasks, embeddings_task, export_tasks, export_tasks_v2, extract_orchestrator_tasks, fork_tasks, import_tasks, import_tasks_v2, lookup_tasks, memory_tasks.

Pruned from docs/typing/mypy_baseline.txt — 227 lines (6813 → 6586).

Notable per-file fixes

  • badge_tasks: dropped the User = get_user_model() runtime-call-as-type pattern in favour of a direct from opencontractserver.users.models import User (mirrors the conftest.py pattern used elsewhere in the project).
  • doc_analysis_tasks: filter Anthropic response.content blocks via hasattr(resp, "text") so list comprehensions over the union type are legal; renamed the accumulator to avoid str | list shadowing.
  • analyzer_tasks: tightened request_gremlin_manifest signature to int and coerced add(*doc_ids) to ints since analyzed_documents rejects str.
  • data_extract_tasks: scoped # type: ignore[valid-type] on the output_type = list[output_type] runtime-parametric pattern; scoped # type: ignore[typeddict-item] on two TypedDict(**dict) expansions that mypy cannot prove cover all required keys.
  • doc_tasks: validate FieldFile.name before Storage.open; thread parent_id through the FUNSD annotation dict so it satisfies the TypedDict; cast page_data.bounds to BoundingBoxPythonType; widen annotation_map keys to int | str to match the declared return type; coerce file_type to FileTypeEnum before get_components_by_mimetype.
  • embeddings_task: cast _create_embedding_for_annotation to the HasEmbeddingMixin-typed callable signature expected by _apply_dual_embedding_strategy; explicit result: dict[str, Any] annotation so subscripted update / append cooperate with the mixed-shape result dict.
  • export_tasks / export_tasks_v2: handle None returns from package_corpus_for_export / package_label_set_for_export with RuntimeError before constructing the TypedDict; replace output_bytes (BytesIO) with ContentFile when calling FieldFile.save; rename inner FUNSD page-data variable to keep its tuple[int, str, str] type distinct from the bytes payload; fix UserExport.error typo (the actual field is errors).
  • import_tasks / import_tasks_v2: swap AbstractBaseUser for the project's concrete User (same pattern as corpuses/folder_service); narrow the V2-only branch so data_json.get(...) returns V2-typed values instead of the V1∩V2 object lower bound; widen aggregator dict to dict[str | int, int] to match import_doc_annotations's return type.
  • fork_tasks: replace implicit Optional[list[str]] = None defaults; guard FieldFile.name accesses; annotate empty folder_map / structural_set_map accumulators.
  • corpus_tasks: drop a stale return summary from a -> None task; coerce the str | int Celery args to int once at function entry; rename inner execution variable to avoid shadowing the outer loop binding; scoped # type: ignore[attr-defined] on Celery's dynamically-attached Signature.s / .si attributes.
  • extract_orchestrator_tasks: early-return when extract_id is None; scoped # type: ignore[attr-defined] on task_func.si.

Cross-cutting type tightening

  • LabelLookupPythonType.{text_labels, doc_labels} narrowed from dict[str | int, AnnotationLabelPythonType] to dict[str, ...] (the producer in utils.etl.build_label_lookups already keys by str(pk)).
  • build_label_lookups(corpus_id: ...) changed from str to int to match every call site.
  • utils.importing.load_or_create_labels and utils.importing.import_doc_annotations now accept Mapping instead of dict, so callers can pass OpenContractDocExport / AnnotationLabelPythonType TypedDicts directly without cast().
  • import_tasks._validate_sidecar_schema and _apply_sidecar_annotations accept str | None for sidecar_path since callers may pass a fallback that is None.

Verification

  • python -m mypy --config-file mypy.ini opencontractserver configSuccess: no issues found in 1032 source files.
  • All 16 graduated modules import cleanly under config.settings.mypy.

Test plan

  • CI passes (mypy / lint / backend tests).
  • Spot-check that the baseline file no longer contains any opencontractserver/tasks/ lines: grep -c "opencontractserver/tasks/" docs/typing/mypy_baseline.txt0.
  • Spot-check mypy.ini no longer contains any [mypy-opencontractserver.tasks.*] blocks.

https://claude.ai/code/session_01Mze1ELoLJu4feUTYWFq7Wt


Generated by Claude Code

…#1482)

Continues draining the mypy baseline established in #1331. Graduates all
sixteen task modules out of the baseline.

Removed from mypy.ini
- opencontractserver.tasks.{agent_tasks, analyzer_tasks, badge_tasks,
  corpus_tasks, data_extract_tasks, doc_analysis_tasks, doc_tasks,
  embeddings_task, export_tasks, export_tasks_v2,
  extract_orchestrator_tasks, fork_tasks, import_tasks, import_tasks_v2,
  lookup_tasks, memory_tasks}

Pruned from docs/typing/mypy_baseline.txt
- 227 lines (6813 -> 6586)

Highlights of the per-file fixes
- badge_tasks: dropped the `User = get_user_model()` runtime-call-as-type
  pattern in favour of a direct ``from opencontractserver.users.models
  import User`` (mirrors the conftest pattern used elsewhere).
- doc_analysis_tasks: filter Anthropic ``response.content`` blocks via
  ``hasattr(resp, "text")`` so list comprehensions on the union type are
  legal; renamed the accumulator to avoid the ``str | list`` shadowing.
- analyzer_tasks: tightened ``request_gremlin_manifest`` signature to
  ``int`` and coerce ``add(*doc_ids)`` to ints since ``analyzed_documents``
  rejects str.
- data_extract_tasks: scoped ``# type: ignore[valid-type]`` on the
  ``output_type = list[output_type]`` runtime-parametric pattern; scoped
  ``# type: ignore[typeddict-item]`` on two ``TypedDict(**dict)``
  expansions that mypy cannot prove cover all required keys.
- doc_tasks: validate FieldFile.name before passing to Storage.open;
  threaded the ``parent_id`` field through the FUNSD annotation dict so
  it satisfies the TypedDict; cast ``page_data.bounds`` to
  BoundingBoxPythonType where the dataclass field is widened to
  ``dict[str, int | float]``; widened ``annotation_map`` keys to
  ``int | str`` to match the function's declared return type; coerced
  ``file_type`` to FileTypeEnum before calling
  ``get_components_by_mimetype``.
- embeddings_task: cast ``_create_embedding_for_annotation`` to the
  ``HasEmbeddingMixin``-typed callable signature expected by
  ``_apply_dual_embedding_strategy``; explicit annotation
  ``result: dict[str, Any]`` so the mixed-shape result dict cooperates
  with subscripted update / append.
- export_tasks(_v2): handled None returns from
  ``package_corpus_for_export`` / ``package_label_set_for_export`` by
  raising RuntimeError before constructing the TypedDict; replaced
  ``output_bytes`` (BytesIO) with ContentFile when calling
  FieldFile.save; renamed the inner FUNSD page-data variable to keep its
  ``tuple[int, str, str]`` type distinct from the bytes payload; fixed
  the ``UserExport.error`` typo (the actual field is ``errors``).
- import_tasks(_v2): swapped ``AbstractBaseUser`` for the project's
  concrete ``User`` (same pattern as corpuses/folder_service); tightened
  the V2-only branch so ``data_json.get(...)`` returns the V2-typed
  values instead of the V1∩V2 ``object`` lower bound; widened
  aggregator dict to ``dict[str | int, int]`` to match
  ``import_doc_annotations``'s declared return type.
- fork_tasks: replaced implicit ``Optional[list[str]] = None`` parameter
  defaults; guarded FieldFile.name accesses; annotated the empty
  ``folder_map`` / ``structural_set_map`` accumulators.
- corpus_tasks: dropped a stale ``return summary`` from the
  ``-> None`` task; coerced the ``str | int`` celery args to ``int``
  once at function entry; renamed the inner ``execution`` loop variable
  to avoid shadowing the outer ``for doc_id, execution in
  execution_map.items()`` binding; scoped ``# type: ignore[attr-defined]``
  on Celery's dynamically-attached ``Signature.s`` / ``.si`` attributes.
- extract_orchestrator_tasks: early-return when ``extract_id`` is None;
  scoped ``# type: ignore[attr-defined]`` on ``task_func.si``.

Cross-cutting type tightening
- ``LabelLookupPythonType.{text_labels, doc_labels}`` narrowed from
  ``dict[str | int, AnnotationLabelPythonType]`` to ``dict[str, ...]``
  (the producer in ``utils.etl.build_label_lookups`` already keys by
  ``str(pk)``). Updated the ``corpus_id`` parameter on
  ``build_label_lookups`` from ``str`` to ``int`` to match every call
  site.
- ``utils.importing.load_or_create_labels`` and
  ``utils.importing.import_doc_annotations`` now accept ``Mapping``
  instead of ``dict``, so callers can pass ``OpenContractDocExport`` /
  ``AnnotationLabelPythonType`` TypedDicts directly without ``cast()``.
- ``import_tasks._validate_sidecar_schema`` and ``_apply_sidecar_annotations``
  accept ``str | None`` for the ``sidecar_path`` argument since the
  caller passes a fallback name that can be None.

Verification
- python -m mypy --config-file mypy.ini opencontractserver config
  -> "Success: no issues found in 1032 source files"
- All 16 graduated modules import cleanly under config.settings.mypy.
@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review: PR #1541 — Graduate opencontractserver.tasks.* from mypy baseline

Overall this is a high-value PR. Turning off 16 ignore_errors = True blocks in one pass is significant, and the typing exercise uncovered several genuine runtime bugs. The description is unusually thorough, which is appreciated.


Real Bugs Fixed (strong positive)

These are not just type annotation housekeeping — they are actual runtime defects that were hiding behind the baseline:

  • agent_tasks.py: async for event in agent.stream(...)async for event in await agent.stream(...). Missing await on a coroutine returning an async generator is a real async bug.
  • export_tasks_v2.py: export.error = Trueexport.errors = str(e). Wrong attribute name silently swallowed export failure state.
  • export_tasks.py: FieldFile.save(..., BytesIO)FieldFile.save(..., ContentFile(...)). Django's FieldFile.save needs a File-like object, not a raw BytesIO; this could silently corrupt exports depending on the storage backend.
  • doc_analysis_tasks.py: Filtering response.content blocks via hasattr(resp, "text") correctly handles ToolUseBlock items in the Anthropic SDK union, which would otherwise raise AttributeError.
  • doc_tasks.py: Early guard on FieldFile.name before Storage.open turns a cryptic "str | None is not str" crash into a clear ValueError.
  • analyzer_tasks.py: Coercing doc_ids to int before .add() fixes a real ORM mismatch.
  • fork_tasks.py / corpus_tasks.py: Explicit Optional[list[str]] = None and int coercions eliminate silent type widening at Celery boundaries.

Potential Regression: FUNSD Export Key Type

This is the one item I'd want resolved before merge.

In doc_tasks.py, package_doc_for_funsd_export, the key used to be:

page = str(page_data.page_index)   # string key
annotation_map[page] = ...

This PR changes it to:

page = page_data.page_index        # int key
annotation_map[page] = ...

annotation_map is returned and consumed in export_tasks.py:package_funsd_exports, which still does:

if str(index) in funsd_annotations:   # unchanged — looks up string keys

str(0) in {0: [...]} evaluates to False, so FUNSD annotations will be silently omitted from every export after this change. The fix is either:

  • Change export_tasks.py to if index in funsd_annotations:, or
  • Keep string keys and update the dict annotation to dict[str, ...]

Neither direction is wrong; the current state leaves both sides inconsistent.


Minor / Follow-up Items

cast(V2Type, ...) then checking if corpus_export is None (export_tasks_v2.py):

corpus_export = cast(OpenContractCorpusV2Type, package_corpus_for_export(...))
...
if corpus_export is None or label_set_export is None:
    raise RuntimeError(...)

After cast, mypy considers corpus_export non-optional, so the None branch is dead code from a type perspective. The check is still useful defensively but consider reversing the order: check for None first, then cast the non-None value. That way the cast actually reflects what the type system can prove.

# type: ignore[arg-type] on temperature (data_extract_tasks.py line ~444):

temperature=extract_temperature,  # type: ignore[arg-type]

The PR description doesn't mention this one. What is the actual mismatch — is extract_temperature float | None and the parameter float? A narrow guard (if extract_temperature is not None) would be cleaner than silencing the error.

memory_tasks.py# type: ignore[arg-type] on values_list(named=True): Reasonable workaround for a known Django stubs limitation. Consider a brief comment so the next reader doesn't wonder.

Pre-existing mutable default in utils/importing.py:

def load_or_create_labels(..., existing_labels: dict[str, AnnotationLabel] = {}):

Not introduced by this PR, but now that this file is touched: {} as a default is a classic Python footgun. Changing to None and guarding with existing_labels or {} inside the function body would be a safe same-line fix.

process_corpus_action return type (corpus_tasks.py): Dropping return summary from a -> None task is correct per the type annotation, but the PR description should call out that any caller relying on the Celery task result dict will now receive None. A quick grep for .get() on the result of this task would confirm no callers are affected.


Typing Approach

The use of scoped # type: ignore[valid-type] and # type: ignore[typeddict-item] (as opposed to module-level suppression) is exactly the right pattern for genuinely unprovable runtime constructions. The cast("list[Any]", tools) with an explanatory comment about list invariance is also correct and well-documented.

The move from AbstractBaseUser to the concrete User model in TYPE_CHECKING blocks (import_tasks.py, import_tasks_v2.py) is consistent with the project's existing pattern (badge_tasks.py, corpuses/folder_service).


Summary

The PR is in good shape. The FUNSD key regression is the one blocking item — everything else is either clearly correct or a low-risk follow-up. Once that lookup inconsistency is resolved, the baseline reduction and the genuine bug fixes make this well worth merging.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

JSv4 added 3 commits May 4, 2026 22:26
Fixes:
- Apply pyupgrade rewrites to import_tasks_v2.py
- Fix FUNSD annotation_map key type regression: doc_tasks.py now uses int
  keys consistently, export_tasks.py looks up by int (not str(index)),
  and all return/parameter type annotations updated accordingly
- Narrow dict[str|int,...] to dict[str,...] in etl.build_label_lookups
  and build_document_export (keys are always str(pk))
- Cast get_component_by_name() results in embeddings.py, embeddings_task.py
  and doc_tasks.py to the specific subtype (BaseEmbedder / BaseParser /
  BaseThumbnailGenerator) to satisfy mypy
- Append FunsdAnnotationType directly rather than via dict() to preserve
  the TypedDict type
Tests added:
- test_convert_doc_to_funsd_annotation_map_uses_int_keys: regression guard
  that annotation_map keys are int, not str (FUNSD key-type fix)
- test_convert_doc_to_funsd_empty_doc_returns_empty_map: empty-annotation path
- test_export_tasks_typing_fixes.py (new file):
  - FinalizeExportTestCase: ContentFile save round-trip, seekable-buffer behaviour
  - OnDemandPostProcessorsTestCase: RuntimeError on missing export file,
    no-op when post_processors is empty
  - PackageFunsdExportsIntKeyTestCase: int-key annotation lookup (regression),
    empty-annotation pages, multi-page multi-annotation packaging
  - PackageCorpusExportV2ErrorsFieldTestCase: export.errors field populated
    on failure (was non-existent export.error before the fix)
@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review — PR #1541: Graduate opencontractserver.tasks.* from mypy baseline

This is a well-executed typing graduation. The mechanical changes are clean, the # type: ignore annotations are correctly scoped, and the PR description is thorough. Importantly, the process surfaced several genuine pre-existing bugs (not just annotation noise), which are fixed here. A few observations below, roughly ordered by severity.


Genuine Bug Fixes (Good Catches)

1. Silent FUNSD annotation loss (doc_tasks.py / export_tasks.py)

The most impactful fix. doc_tasks.py was building annotation_map with int keys but export_tasks.py looked them up via str(index), so every FUNSD export silently produced empty annotation lists. Both sides now use int keys consistently. The regression tests in test_doc_tasks.py and test_export_tasks_typing_fixes.py cover this well.

2. Wrong field name in V2 export error handling (export_tasks_v2.py line ~267)

# Before
export.error = True
# After
export.errors = str(e)

The field is errors: str, not error: bool. The old code was silently swallowed by Django (unknown attribute assignment on a model instance doesn't raise), leaving exports in an inconsistent state with no error message recorded.

3. M2M add receiving strings (analyzer_tasks.py)

# Before
analysis.analyzed_documents.add(*doc_ids)  # doc_ids: list[int | str]
# After
analysis.analyzed_documents.add(*[int(doc_id) for doc_id in doc_ids])

Celery JSON serialisation widens int to str, so this was a latent runtime bug on every Celery-dispatched call.

4. process_corpus_action returning a value from a -> None task (corpus_tasks.py)

Dropped the stale return summary. Celery ignores task return values by default, so this was harmless, but it's a correctness fix.

5. request_gremlin_manifest not guarding None return (analyzer_tasks.py)

Added return analyzer_manifests or []. The original return analyzer_manifests could return None if the Gremlin request produced no results, breaking callers that iterate the list.


One Type-Correctness Nit

export_tasks_v2.pycast before None check (~line 167)

corpus_export = cast(
    OpenContractCorpusV2Type,
    package_corpus_for_export(corpus, v2_format=True),
)
...
if corpus_export is None or label_set_export is None:
    raise RuntimeError(...)

cast is a runtime no-op so this is safe at runtime, but mypy will now infer corpus_export: OpenContractCorpusV2Type (non-optional) and may flag corpus_export is None as always-false, depending on strictness. Cleaner to check for None first, then cast:

_corpus_export = package_corpus_for_export(corpus, v2_format=True)
_label_set_export = package_label_set_for_export(corpus.label_set)
if _corpus_export is None or _label_set_export is None:
    raise RuntimeError(...)
corpus_export = cast(OpenContractCorpusV2Type, _corpus_export)
label_set_export = _label_set_export

This is minor and does not affect runtime behaviour.


Pre-existing Issue Surfaced (Out of Scope, FYI)

Mutable default argument in utils/importing.py

def load_or_create_labels(
    ...
    existing_labels: dict[str, AnnotationLabel] = {},  # ← shared across calls
) -> dict[str, AnnotationLabel]:

This is a Python anti-pattern where the default {} is shared across all call sites that don't pass the argument explicitly. In practice it won't cause problems here if the dict is never mutated through the default (it's only read), but it's fragile. Not introduced by this PR, just noting it.


Minor Notes

  • badge_tasks.py: Switching from get_user_model() to a direct import of User is consistent with how the rest of the project handles this (e.g., conftest.py, corpuses/folder_service). Not objectionable.
  • data_extract_tasks.py: temperature=extract_temperature # type: ignore[arg-type] — the suppression is scoped, but it would be worth confirming extract_temperature is typed as float | None while the parameter expects float. If so, a simple temperature=extract_temperature or 0.0 (or a proper None guard) might be cleaner than the ignore.
  • doc_tasks.py: The [dict(row) for row in corpus_data] materialisation passed to _create_document_processed_notifications is correct (converts ValuesQuerySet rows to plain dicts as the signature expects).
  • Scoped # type: ignore comments: All ignores are narrow (e.g., [attr-defined], [valid-type], [typeddict-item]). That's the right approach.

Test Coverage

The new test file test_export_tasks_typing_fixes.py is solid — it covers the ContentFile wrapping in finalize_export, the RuntimeError guard in on_demand_post_processors, the int-key regression in package_funsd_exports, and the errors field fix in package_corpus_export_v2. The additions to test_doc_tasks.py round out the FUNSD key-type coverage. Good test quality overall.


Summary: The PR achieves its stated goal cleanly and, more importantly, the typing exercise uncovered and fixed several real production bugs. The nit about cast order in export_tasks_v2.py is the only change I'd consider before merge. Everything else looks correct.

- export_tasks_v2.py: reorder so the None check on package_corpus_for_export
  happens before the cast, allowing mypy to actually narrow the type
  (review #1541 nit). Cast is then a runtime no-op for the narrowed value.
- test_export_tasks_typing_fixes.py: black reformat picked up by CI.
@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review: PR #1541 — Graduate opencontractserver.tasks.* from mypy baseline

This is a solid, well-scoped typing PR. The diff is large but nearly all of it is legitimate type-annotation work. More importantly, the process uncovered several real production bugs that are fixed along the way. Notes below.


Real Bug Fixes (Excellent — these are the most valuable parts of this PR)

1. Silent export-failure tracking (export_tasks_v2.py line ~266)
The old export.error = True referenced a non-existent model field. Django's dynamic attribute assignment meant this was silently setting a throwaway instance attribute — it was never persisted. The UserExport model has an errors = TextField(...) field, not an error boolean. This fix from export.error = Trueexport.errors = str(e) turns a silent failure-masking bug into a properly persisted error string. The regression test in test_export_tasks_typing_fixes.py correctly validates this.

2. FUNSD annotation key type mismatch (doc_tasks.py / export_tasks.py)
convert_doc_to_funsd was building an annotation map with str keys (e.g. '0'), while package_funsd_exports was looking up by integer index (if index in funsd_annotations). This meant all FUNSD annotations were silently dropped during export — a data-loss bug. The int-key fix plus the regression test in test_doc_tasks.py (checking assertIsInstance(key, int)) correctly pins this.

3. ContentFile wrapping in finalize_export (export_tasks.py line ~88)
Passing raw BytesIO to FieldFile.save() is incorrect — Django's FieldFile.save() expects a File or ContentFile wrapper. Some storage backends (S3, GCS) would silently fail or produce corrupt files. Wrapping with ContentFile(output_bytes.read()) is the right idiom.

4. None-return guard for package_corpus_for_export / package_label_set_for_export
Both functions can return None, and the old code was passing them directly into TypedDict fields. Adding the RuntimeError guard before TypedDict construction is correct — it makes the failure loud and attributable rather than silently corrupting the export payload.


Type System Changes (Good)

  • LabelLookupPythonType narrowing from dict[str | int, …] to dict[str, …] is verified correct — the producer (build_label_lookups) always emits str(pk) keys.
  • Mapping instead of dict for load_or_create_labels / import_doc_annotations is the right Liskov substitution (TypedDicts are Mappings, not dicts).
  • # type: ignore comments are all correctly scoped to Celery's dynamic .s() / .si() / .si() attributes, which genuinely lack stubs. No broad suppression.
  • build_label_lookups(corpus_id: int) correcting from str to int aligns with every call site.

Minor Concerns

assert s3 is not None in package_funsd_exports (export_tasks.py):

assert s3 is not None  # set above when STORAGE_BACKEND == "AWS"

Python assert statements are stripped when running with the -O (optimize) flag. In Celery workers this is unlikely but not impossible. Prefer an explicit guard:

if s3 is None:
    raise RuntimeError("S3 client unavailable for AWS storage backend")

This is a minor nit since Celery workers rarely run with -O, but it's more robust.

Celery serialization comment in corpus_tasks.py (~line 187):

# Coerce ids that may have arrived as strings (Celery JSON serialisation
# widens ``int`` to ``str``) ...

Celery's standard JSON serializer actually preserves int types — it doesn't widen them to str. The coercion is still defensively correct (and harmless), but the comment is misleading. If the values are arriving as strings it's more likely due to the caller constructing them from URL params or form data. Suggest updating the comment to something like: # Guard against callers that pass string IDs (e.g. from Celery task args constructed via URL params).

process_analyzer early None guard (corpus_tasks.py ~line 104):
The new ValueError("process_analyzer requires a non-null analyzer") replaces what would have been an AttributeError on .id. This is strictly better, but worth confirming no existing callsite intentionally passes analyzer=None as a no-op. A quick grep of callers to verify would be prudent.


Test Coverage

The new test_export_tasks_typing_fixes.py file covers exactly the four real bugs found:

  • ContentFile wrapping in finalize_export
  • RuntimeError on missing export.file in on_demand_post_processors
  • Int keys in FUNSD annotation map (via package_funsd_exports)
  • export.errors field (vs. the non-existent export.error) in package_corpus_export_v2

The test_doc_tasks.py additions (int-key regression + empty-doc baseline) are also well-targeted. Good test coverage for a typing PR.


Summary

Approve with minor suggestions. The type graduation is clean, # type: ignore uses are correctly scoped, and the PR surfaced four genuine production bugs (silent failure tracking, FUNSD data loss, BytesIO storage incompatibility, and None propagation into TypedDicts). The two actionable suggestions are: (1) replace assert s3 is not None with an explicit RuntimeError guard, and (2) correct the Celery serialization comment in corpus_tasks.py. Neither is a blocker.

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review — PR #1541: Graduate opencontractserver/tasks/* from mypy baseline

Overview

This PR finishes the typing graduation of all 16 opencontractserver/tasks/* modules. It removes 16 [mypy-…] baseline blocks from mypy.ini and prunes 227 lines from docs/typing/mypy_baseline.txt. Importantly, several changes in this PR fix real runtime bugs, not just appease mypy.


Real Bug Fixes (beyond pure typing)

These are the most important changes in the PR, each backed by a regression test:

agent_tasks.py — missing await

# Before (silently iterates a coroutine, never yields):
async for event in agent.stream(user_message, store_messages=False):
# After (correct):
async for event in await agent.stream(user_message, store_messages=False):

This was a silent failure: iterating a coroutine object directly produces no events. The fix is correct.

export_tasks_v2.py — wrong field name in exception handler

# Before (AttributeError would shadow the original exception):
export.error = True
# After:
export.errors = str(e)

The double-masking bug (wrong field raises AttributeError which eats the original exception) is now correctly handled and tested.

export_tasks.pyBytesIO passed to FieldFile.save

# Before (FieldFile.save requires a File, not a raw BytesIO):
export.file.save(filename, output_bytes)
# After:
export.file.save(filename, ContentFile(output_bytes.read()))

Correct fix. The test test_finalize_export_reads_from_seekable_buffer also verifies the explicit seek(0) contract.

export_tasks.py — FUNSD annotation_map int/str key mismatch

doc_tasks.convert_doc_to_funsd built annotation_map with int page keys. package_funsd_exports looked up with str(index). The result: all FUNSD annotations were silently dropped. Both sides now use int consistently, and test_annotation_on_page_zero_appears_in_zip_with_int_key is a solid regression guard.


Code Quality — Positive Observations

  • Scoped # type: ignore comments ([attr-defined], [valid-type], [typeddict-item]) are precise rather than blanket, which is the right practice. The Celery .s / .si attribute ignores are unavoidable until upstream adds stubs.
  • Mapping over dict for load_or_create_labels and import_doc_annotations is the idiomatic solution for accepting TypedDicts without cast(). Good call.
  • build_label_lookups(corpus_id: int) correction removes a long-standing mismatch between the declared str type and integer call-sites (lookup_tasks, export_tasks_v2).
  • LabelLookupPythonType narrowing from dict[str | int, …] to dict[str, …] is safe: build_label_lookups already keys via str(pk). Narrowing this removes width from every consumer.
  • _validate_sidecar_schema / _apply_sidecar_annotations accepting str | None for sidecar_path matches how callers pass a fallback that may be None.
  • Guard on FieldFile.name before Storage.open (doc_tasks, fork_tasks) prevents a runtime TypeError for documents without an uploaded file.

Minor Suggestions / Observations

1. import_tasks_v2.py — redundant membership test in walrus comprehension

source_ids: list[int] = [
    new_id
    for old_id in rel_data.get("source_annotation_ids", [])
    if str(old_id) in annot_id_map              # ← redundant
    and (new_id := annot_id_map.get(str(old_id))) is not None
]

The str(old_id) in annot_id_map guard is redundant when the walrus expression already checks is not None. A get on a missing key returns None, so the None check is sufficient:

source_ids: list[int] = [
    new_id
    for old_id in rel_data.get("source_annotation_ids", [])
    if (new_id := annot_id_map.get(str(old_id))) is not None
]

Not a correctness issue, but two dict lookups per element is worth cleaning up.

2. data_extract_tasks.py — unaddressed temperature type ignore

temperature=extract_temperature,  # type: ignore[arg-type]

This suppression isn't mentioned in the PR description. If extract_temperature can be None or an unexpected type, this may be masking a real caller bug. Worth a brief comment explaining why the ignore is needed (e.g., "the function accepts float | str but extract_temperature is narrower/wider than the stub").

3. corpus_tasks.py — removed return summary was dead code

The PR correctly removes return summary from a -> None task. Worth noting this was unreachable dead code (Celery never uses the return value of shared_task functions in fire-and-forget mode). No action needed; just calling it out as a positive cleanup.

4. import_tasks.pyTYPE_CHECKING import alias

if TYPE_CHECKING:
    from opencontractserver.users.models import User as AbstractBaseUser

Aliasing User as AbstractBaseUser is slightly confusing since they are distinct abstractions. Naming it User directly (as done in import_tasks_v2.py) would be clearer. Minor nit.


Test Coverage

The new test_export_tasks_typing_fixes.py (326 lines) covers all four primary bug fixes with good edge cases:

  • FinalizeExportTestCase — file save + seek behaviour
  • OnDemandPostProcessorsTestCase — missing file guard + no-op path
  • PackageFunsdExportsIntKeyTestCase — int-key regression, empty annotation, multi-page
  • PackageCorpusExportV2ErrorsFieldTestCaseerrors field on failure

test_doc_tasks.py additions cover the int-key regression and the empty-annotation base case.

One gap: there are no tests for the await agent.stream() fix in agent_tasks.py. The async streaming path is hard to unit-test, but a simple mock that verifies the awaitable is consumed would prevent a future regression.


Summary

This is a well-executed graduation PR. The real bug fixes (missing await, wrong error field, BytesIO / ContentFile, FUNSD key mismatch) make this more valuable than a pure type-annotation pass. The targeted # type: ignore scoping and the Mapping widening are idiomatic. The test coverage is solid for all behaviour changes. The minor items above are suggestions, not blockers.

Verdict: Approve with optional follow-ups on the redundant membership test and the temperature ignore explanation.

JSv4 added 2 commits May 5, 2026 00:37
…ignore, rename UserModel alias

- import_tasks_v2.py: walrus comprehension drops the redundant
  'str(old_id) in annot_id_map' guard. dict.get returns None for unknown
  keys, so the 'is not None' check on the walrus result is sufficient
  (one dict lookup per element instead of two).
- data_extract_tasks.py: explain why temperature=extract_temperature
  carries '# type: ignore[arg-type]'. The downstream stub declares
  temperature: float; we intentionally pass None for Claude models so
  the temperature=0 override engages. Tracked in #1381.
- import_tasks.py: rename TYPE_CHECKING 'User as AbstractBaseUser' to
  'User as UserModel' for consistency with import_tasks_v2.py. The alias
  is required because the runtime 'User = get_user_model()' would
  otherwise collide with the type-only import.
@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review: PR #1541 — Typing graduation for opencontractserver.tasks.*

This is a clean, well-scoped typing PR. The baseline-drain approach is sound — removing 227 lines from the baseline and all 16 [mypy-…] ignore blocks in one shot is a meaningful checkpoint. Several genuine bugs were caught by the type-checker in the process; those are worth calling out explicitly.


Real bugs fixed by the type-checker (not just annotation noise)

  1. agent_tasks.py — missing await on agent.stream() (async for event in await agent.stream(...))
    This was a real runtime bug. Without await, the async for iterates over a Coroutine object rather than the AsyncGenerator, which raises TypeError at runtime. The fix is correct.

  2. export_tasks_v2.pyexport.errorexport.errors
    UserExport has no error field. Before this fix, any exception in package_corpus_export_v2 would cause the except-block itself to raise AttributeError, hiding the original error and leaving backend_lock=True forever. The errors field fix + regression test is exactly right.

  3. doc_tasks.py / export_tasks.py — FUNSD annotation_map key type mismatch
    doc_tasks.convert_doc_to_funsd was building annotation_map with page = str(page_data.page_index) (string key), while export_tasks.package_funsd_exports was looking up with if index in funsd_annotations (integer index). After the fix both sides use int keys consistently, and all FUNSD annotations that were previously silently dropped will now be included in exports. The regression test is a good guard.

  4. export_tasks.pyContentFile wrapper on FieldFile.save
    BytesIO doesn't fully satisfy Django's internal File protocol. Wrapping with ContentFile(output_bytes.read()) is the correct fix.

  5. fork_tasks.pyFieldFile.name guard before Storage.open
    An unset FieldFile.name would crash default_storage.open(None). The if old_label_set.icon and old_label_set.icon.name: guard is correct.

  6. analyzer_tasks.pyreturn analyzer_manifests or []
    Prevents a None return from a function typed as list[AnalyzerManifest].

  7. extract_orchestrator_tasks.py — early return on extract_id is None
    Extract.objects.get(pk=None) would raise DoesNotExist/TypeError depending on the DB backend. The explicit guard + logger.error is better.


Type-system tightening — all look correct

  • LabelLookupPythonType.{text_labels, doc_labels} narrowed to dict[str, …] — The producer in build_label_lookups already emits str(pk) keys, so the narrowing matches reality.
  • build_label_lookups(corpus_id: int) — Every call site passes an integer. Fixing the declared type is correct.
  • load_or_create_labels / import_doc_annotations accept Mapping — Correct application of Liskov; TypedDicts satisfy Mapping but not dict.
  • _validate_sidecar_schema / _apply_sidecar_annotations accept str | None for sidecar_path — Correct; callers may pass a fallback None.

Observations / suggestions

1. Remaining # type: ignore — tracked appropriately?

The scoped ignores in data_extract_tasks.py reference issue #1381 inline, which is good. The ignores in import_tasks_v2.py for unpack_label_set_from_export / unpack_corpus_from_export / prepare_import_labels don't have issue references:

labelset_obj = unpack_label_set_from_export(label_set_data, user_obj)  # type: ignore[arg-type]

Are these tracked somewhere, or is the intent to fix them in the same typing-graduation series? If not tracked yet, worth adding a # TODO(#NNNN) reference so they don't go stale.

2. cast("list[Any]", tools) in agent_tasks.py

This silences the invariant-list complaint but also loses type safety for callers passing a list[str]. The real fix is to widen the AgentAPI.for_corpus/for_document tools parameter to Sequence[str | CoreTool | …] (which is covariant), which would let mypy accept list[str] without a cast. If that's tracked under the same umbrella, the cast is fine short-term, but worth noting.

3. assert s3 is not None in export_tasks.py

assert s3 is not None  # set above when STORAGE_BACKEND == "AWS"

assert statements are stripped with python -O. In a production task that runs under Celery, this is equivalent to no guard if optimizations are enabled. Consider a proper guard:

if s3 is None:
    raise RuntimeError("S3 client is None in AWS storage branch — should not happen")

4. corpus_tasks.py — dropped return summary

The return summary removal is correct per the -> None annotation, but it's worth confirming no call site relied on the returned dict. Quick grep in the repo would be sufficient; the callers are Celery delay()/apply_async() invocations so in practice nothing captures the return value, but explicit verification is cheap.

5. Double blank line in CHANGELOG.md — Minor: there are two blank lines after the new entry (lines 10-11 in the diff). One is conventional.


Test coverage

Solid. test_export_tasks_typing_fixes.py is a well-structured new test file with four focused test classes targeting the real bugs. The additions to test_doc_tasks.py for int-keyed annotation_map are exactly the right regression guards. Good to see the tests are exercising the actual data flow (building a real zip, reading it back) rather than just patching everything away.


Summary

Approve with minor suggestions. The type fixes are correct, the real bugs uncovered are all addressed, test coverage is present for the most important behavioral changes, and the approach of scoped # type: ignore with documented reasons is appropriate for the remaining edge cases. The observations above (tracked ignores, cast vs. API widening, assert in production path) are all non-blocking but worth addressing before the baseline drain reaches the tests package.

JSv4 added 2 commits May 5, 2026 22:30
…GELOG

Three follow-ups from the Claude review on PR #1541:

- Replace assert s3 is not None in export_tasks.py with a real
  RuntimeError guard. Asserts are stripped under python -O and
  Celery tasks may run with optimisations enabled in production, so
  the assert effectively no-ops in the AWS branch. The new guard
  fails loud with a descriptive message — defence-in-depth for a
  branch that should be unreachable but covers any future ordering
  changes to the s3 initialisation above.

- Add # TODO(#1482) references to the three # type: ignore
  suppressions in import_tasks_v2.py. The dict<->TypedDict bridging
  is not a permanent state — it goes away once
  unpack_label_set_from_export / unpack_corpus_from_export /
  prepare_import_labels widen their signatures to Mapping[str, Any]
  in the next typing-graduation chunk. The TODO refs ensure these
  ignores can't quietly outlive the issue they belong to.

- Drop the stray double blank line between the new tasks-graduation
  changelog entry and the Docker hardening entry. The project norm
  is one blank line between entries.
@claude
Copy link
Copy Markdown

claude Bot commented May 6, 2026

Code Review

Overview

This PR graduates all 16 opencontractserver.tasks.* modules from the mypy baseline. Beyond typing cleanup, several real bugs were fixed in the process — which is exactly the value of this exercise. The changes are methodical, the scope is well-bounded, and the test coverage directly targets the behavior changes.


Real Bugs Fixed (worth calling out explicitly)

These are not just annotation cleanup — they fix actual runtime or data-correctness problems:

agent_tasks.py — Missing await on agent.stream() (line ~202). The coroutine was never iterated because async for event in <coroutine> silently fails; the fix adds the required await. This was a genuine runtime bug.

export_tasks_v2.pyexport.error = Trueexport.errors = str(e). The field error doesn't exist on UserExport; this would have raised AttributeError in the exception handler, masking the original failure and leaving backend_lock = True forever. Now the actual error string is recorded and the lock is cleared.

FUNSD key type mismatchdoc_tasks.py was building annotation_map with int keys, but export_tasks.py was looking them up with str(index). The result: all FUNSD annotations were silently dropped during export. The fix aligns both sides on int keys and adds regression tests in both test_doc_tasks.py and test_export_tasks_typing_fixes.py. Good catch.

export_tasks.pyexport.file.save(filename, output_bytes) where output_bytes is a BytesIO. Django's FieldFile.save expects a File instance, not a raw BytesIO. Wrapping in ContentFile is correct.

corpus_tasks.py / extract_orchestrator_tasks.py — Coercing Celery str | int args to int at function entry is correct; Celery's JSON serializer can widen integers to strings depending on how tasks are called, and downstream APIs (e.g. bulk_queue, CorpusActionExecution.get) expect concrete int.


Type Narrowing Quality

LabelLookupPythonType narrowing (dict[str | int, …]dict[str, …]) is sound — the producer (build_label_lookups) has always keyed by str(pk). Narrowing corpus_id on build_label_lookups from str to int matches every call site.

Mapping widening for load_or_create_labels / import_doc_annotations is correct and Liskov-friendly. TypedDicts are structurally compatible with Mapping[str, Any].

cast() usage is appropriate and minimal — used for get_component_by_name return types (untyped registry), Celery's list invariance issue, and the V2 corpus type narrowing after a None-check. No broad cast(Any, …) escapes.

# type: ignore scoping is properly scoped with error codes on every instance (e.g. [attr-defined], [valid-type], [typeddict-item]). No bare type: ignore entries.


Minor Issues / Suggestions

import_tasks_v2.pyTODO(#1482) markers: Three ignores are tagged TODO(#1482) referencing the "typing-graduation umbrella." Since this PR is #1482 and those utilities (unpack_label_set_from_export, unpack_corpus_from_export, prepare_import_labels) live outside the tasks package, the TODOs should reference the broader tracker (#1447) or a follow-up issue, not the current one. Minor but will cause confusion when this PR merges.

doc_tasks.py — early ValueError for missing file names:

if not doc.pawls_parse_file.name or not doc.pdf_file.name:
    raise ValueError(f"Document {doc_id} is missing pawls_parse_file or pdf_file")

This is a behavioral improvement (fail-fast with a clear message vs. a confusing TypeError from Storage.open(None)), but worth noting in the CHANGELOG/PR notes since callers that caught the old exception type will need to handle ValueError instead.

memory_tasks.py # type: ignore[arg-type]: The ignore on conversation.chat_messages.filter(...) passed to database_sync_to_async(lambda: list(...)) suppresses a legitimate mypy complaint about QuerySet[Row4] vs the expected iterable type. A brief note explaining why mypy can't infer this (the named-tuple values_list return type is opaque to mypy's Django plugin) would help future readers.

corpus_tasks.py comment style: The comment block added for Celery arg coercion is useful, but per the project style guide in CLAUDE.md, inline comments should be kept to one short line for non-obvious WHY reasoning. Consider condensing to: # Celery JSON serialisation may widen int → str.

data_extract_tasks.py temperature ignore: The # type: ignore[arg-type] # TODO(#1381) is fine as a bridge. Just make sure #1381 is still open and tracked — the described fix (widening temperature to Optional[float]) is straightforward.


Test Coverage

The new tests are solid:

  • test_convert_doc_to_funsd_annotation_map_uses_int_keys directly guards the FUNSD key regression.
  • test_convert_doc_to_funsd_empty_doc_returns_empty_map is a useful boundary case.
  • test_export_tasks_typing_fixes.py tests all four behavior changes (ContentFile, missing-file guard, int-key FUNSD lookup, errors field). The temp-file cleanup with try/finally is correct.

One gap: there's no test for the agent.stream() missing-await fix. That's hard to unit-test (needs a running agent), but a note in the PR description about manual verification would be reassuring.


Summary

High-quality typing graduation PR with real bug fixes bundled in. The three TODO(#1482) markers in import_tasks_v2.py should be updated to reference the correct tracking issue before merge. Everything else looks correct and consistent with project conventions.

JSv4 added 2 commits May 6, 2026 07:58
CI fixes:
- ``extract_thumbnail`` (``doc_tasks.py``): the typing graduation wrapped
  ``get_components_by_mimetype(file_type)`` in ``FileTypeEnum(file_type)``,
  which raised ``ValueError`` because ``FileTypeEnum`` members are short
  labels (``"pdf"``) and ``file_type`` carries the MIME string
  (``"application/pdf"``). The helper already handles MIME strings via
  ``FileTypeEnum.from_mimetype`` internally, so the wrapping is removed
  and the unused import along with it. Restores
  ``test_pdf_thumbnail_extraction`` (3 orientations) and
  ``test_txt_thumbnail_extraction``.

- ``test_package_annotated_docs.py``: setUp now creates a ``LabelSet`` and
  attaches it to the test corpus. The PR's ``RuntimeError`` guard in
  ``export_tasks.py`` (``Failed to package corpus or label set``) tightens
  the contract — passing ``None`` for ``label_set`` is no longer silently
  accepted into ``OpenContractsExportDataJsonPythonType``. Both
  ``test_skips_failed_placeholder_and_packages_successful_docs`` and
  ``test_all_failed_produces_empty_annotated_docs_without_crashing`` now
  set up a complete corpus + label_set fixture.

- ``test_export_tasks_typing_fixes::test_errors_field_populated_on_failure``:
  the test asserted ``package_corpus_export_v2`` swallows the failure,
  but the production code records the error on the export AND re-raises
  so Celery can mark FAILURE. Test now wraps the call in
  ``assertRaises(Corpus.DoesNotExist)`` and pins both the recording side
  effects (``backend_lock`` cleared, ``errors`` populated) and the
  propagation contract.

Review feedback:
- ``import_tasks_v2.py``: three ``# type: ignore[arg-type] # TODO(#1482)``
  markers retargeted to ``TODO(#1447)`` — the lifts they describe live in
  ``utils.importing``, which is not part of this PR's scope (#1482); the
  broader umbrella tracker (#1447) is the correct destination.

- ``memory_tasks.py``: expand the one-line ``[arg-type]`` ignore comment
  into a paragraph explaining *why* mypy needs the suppression
  (``values_list(..., named=True)`` returns a synthesised ``QuerySet[Row*]``
  that the django-stubs Manager descriptor does not propagate through a
  custom Manager).

- ``corpus_tasks.py``: condense the Celery-int-coercion comment to one
  line per the project's terse-comment style.
@claude
Copy link
Copy Markdown

claude Bot commented May 6, 2026

Code Review: PR #1541 — Graduate opencontractserver.tasks.* from mypy baseline

Overall assessment: Strong PR, approve with minor notes.

This is a high-quality typing graduation that goes well beyond cosmetic fixes — it surfaces and repairs several real bugs along the way. The test coverage is excellent. Notes below are mostly observations and one or two minor concerns.


Real bugs fixed (not just type noise)

These are the most valuable changes in the PR:

  • export_tasks_v2.py: export.error = Trueexport.errors = str(e). The previous code would have raised AttributeError inside the except block, masking the original failure entirely and leaving backend_lock set. Critical fix.
  • FUNSD int/str key mismatch (doc_tasks.py + export_tasks.py): doc_tasks.py was building annotation_map with str page index keys, while export_tasks.py looked them up with int(index) — or vice versa. Every FUNSD export silently dropped all annotations. The regression test in test_doc_tasks.py and the PackageFunsdExportsIntKeyTestCase suite are exactly right.
  • doc_analysis_tasks.py: resp.text was being accessed unconditionally on the Anthropic response union type. The hasattr(resp, "text") guard prevents an AttributeError on tool-call or other non-text content blocks.
  • corpus_tasks.py: return summary removed from a -> None task — the return value was silently discarded by Celery anyway, but this was a type violation that could mislead callers.
  • export_tasks.py: BytesIOContentFile for FieldFile.save(). Some storage backends (especially S3 via django-storages) require a proper Django file wrapper; raw BytesIO can fail silently on non-local backends.
  • fork_tasks.py: list[str] = None default is an invalid annotation in strict mode. Fixed to Optional[list[str]] = None.

Type annotation quality

Good patterns used throughout:

  • cast() is used surgically where the runtime guarantees a narrower type than mypy can infer (e.g., after v2_format=True, after is_v2 branch).
  • All # type: ignore comments are scoped ([attr-defined], [valid-type], etc.) with explanatory prose. No bare ignores.
  • Mapping instead of dict in load_or_create_labels / import_doc_annotations is the right call — these functions only read, so widening to Mapping allows TypedDicts to pass without cast().
  • TODO markers with issue numbers (#1381, #1447) for ignores that represent deferred work.

One concern — temperature=extract_temperature # type: ignore[arg-type] in data_extract_tasks.py:

# ``get_structured_response_and_sources_from_document`` currently declares
# ``temperature: float``; we intentionally pass ``None`` for Claude models…
temperature=extract_temperature,  # type: ignore[arg-type]

This is passing None to a float parameter and relying on downstream behaviour (temperature=0 override in _structured_response_raw). The comment is honest about the debt and #1381 is tracked, but this is a semantic landmine for non-Claude models that don't have that override. Worth making sure the issue description covers that risk.


# type: ignore[typeddict-item] on dynamic TypedDict construction

Two instances in data_extract_tasks.py:

span_data: TextSpanData = TextSpanData(**anno_json)  # type: ignore[typeddict-item]
pawls_by_index[pg_ind] = PawlsPagePythonType(**page_obj)  # type: ignore[typeddict-item]

These unpack unvalidated external data (user annotation JSON, PAWLS parse file) into TypedDicts. The except Exception blocks immediately after handle parse failures correctly, so this is safe in practice. But consider whether a TypedDict(**data) spread is the right idiom here vs. explicit field extraction — if the JSON ever contains unexpected extra keys, some TypedDict implementations will silently discard them. Leaving this as-is is fine for now, just noting the pattern.


Test coverage

The test suite additions are thorough:

  • test_export_tasks_typing_fixes.py (328 lines): covers finalize_export, on_demand_post_processors, the FUNSD int-key regression, and the V2 errors field fix. The _make_tiny_zip helper is clean and reusable.
  • test_doc_tasks.py: two new cases — int-key regression guard and empty-document baseline.
  • test_package_annotated_docs.py: fixture update to attach a LabelSet so the new RuntimeError guard doesn't break existing tests.

The test_finalize_export_reads_from_seekable_buffer test is a nice edge-case guard — it verifies the seek(io.SEEK_SET) inside finalize_export actually does what it claims even when the caller doesn't rewind.

Minor: PackageCorpusExportV2ErrorsFieldTestCase uses corpus_pk 999_999_999 to trigger Corpus.DoesNotExist. This is fine but slightly fragile — the test would silently pass even if the assertion changed, since the assertRaises block catches the exception before refresh_from_db runs. The current structure is correct (the with block catches, then we assert on the DB state outside), so this is just an observation.


Nit: import ordering in import_tasks.py

if TYPE_CHECKING:
    from opencontractserver.annotations.models import AnnotationLabel

    # Aliased to avoid colliding with the runtime ``User = get_user_model()``
    # below; matches the pattern in ``import_tasks_v2.py``.
    from opencontractserver.users.models import User as UserModel
    from opencontractserver.utils.metadata_file_parser import DocumentMetadata

The alias comment is helpful. The OpenContractDocExport import was removed from the runtime imports — make sure that doesn't surface as a NameError anywhere else in the file that wasn't updated (mypy should catch it, but worth a grep).


Summary

Area Status
Type correctness ✅ Clean — all 16 modules graduate to zero mypy errors
Real bug fixes ✅ At least 7 substantive fixes, not just cosmetic
Test coverage ✅ Good regression suite, covers the non-obvious bugs
# type: ignore hygiene ✅ Scoped and documented; deferred work tracked with issue numbers
Behavioural regressions risk Low — changes are either additive guards or tightening already-required invariants
Concern temperature=None passed as float (issue #1381) — deferred but documented

This is exactly the kind of typing work that pays dividends: the mypy pass found real bugs that wouldn't have been caught in normal testing. Approving.

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Code Review: PR #1541 — Graduate opencontractserver.tasks.* from mypy baseline

Overall this is a high-quality typing PR. The consistent # type: ignore[…] scoping, the explanatory comments on each suppression, and the test coverage for the runtime bugs uncovered here all set a strong precedent for the rest of the graduation effort.


Real Bugs Fixed (highlights — these alone justify the PR)

1. Missing await on agent.stream() (agent_tasks.py)

# Before
async for event in agent.stream(user_message, store_messages=False):
# After
async for event in await agent.stream(user_message, store_messages=False):

This was a genuine async bug that mypy caught — iterating over a Coroutine directly rather than the AsyncGenerator it resolves to. Without the await, the async for block would have silently no-op'd, producing empty agent responses. Good catch.

2. UserExport.errorUserExport.errors (export_tasks_v2.py)
This field-name typo would have silently swallowed error state on V2 export failures. The fix is correct.

3. FUNSD annotation_map key type inconsistency (doc_tasks.py / export_tasks.py)
doc_tasks.convert_doc_to_funsd was building annotation_map with str page-index keys (page = str(page_data.page_index)) while export_tasks.package_funsd_exports was looking up with str(index). The fix switches both to int. The regression test in test_export_tasks_typing_fixes.py::PackageFunsdExportsIntKeyTestCase is exactly right. Note: the corresponding test in test_doc_tasks.py::test_convert_doc_to_funsd_annotation_map_uses_int_keys expects int keys — make sure export_tasks.py's lookup (if index in funsd_annotations) matches (it does post-fix).

4. BytesIOContentFile in finalize_export (export_tasks.py)
FieldFile.save requires a File-like object; passing a raw BytesIO works on local storage but fails silently (or errors) on cloud backends. Wrapping with ContentFile is the correct fix.


Type Improvements Worth Highlighting

LabelLookupPythonType key narrowing (str | intstr)
The producer utils.etl.build_label_lookups already uses str(pk) as keys, so narrowing the TypedDict is correct. The corresponding corpus_id: str → int fix on build_label_lookups resolves the callers in export_tasks_v2.py and lookup_tasks.py without casts.

Mapping acceptance in load_or_create_labels / import_doc_annotations
Widening from dict to Mapping lets TypedDicts pass directly, eliminating cast() at every call site. Good DRY improvement.

request_gremlin_manifest parameter narrowing (str | intint) — minor concern
The defensive str | int was there historically because Celery can serialize task arguments as JSON and widen int → str on some configurations. The PR addresses this pattern explicitly for corpus_tasks.py (int_document_ids = [int(d) for d in document_ids]) but not here. If Celery is configured to pass str IDs to this task, the narrowed signature will fail at runtime (though mypy is now satisfied). Consider adding an int(gremlin_id) coercion inside the function body for defensive robustness:

def request_gremlin_manifest(gremlin_id: int) -> list[AnalyzerManifest]:
    gremlin_id = int(gremlin_id)  # defensive coercion — Celery JSON may widen int→str

# type: ignore audit

All suppressions are scoped (e.g. [attr-defined], [valid-type], [arg-type]) and annotated with either a reason or a TODO(#1447) back-reference. The pattern is consistent and readable. A few specific notes:

  • corpus_tasks.py:516 — # type: ignore[misc] on the lambda: the underlying issue is that mypy can't infer the lambda type with a captured loop variable. Registering the callback as a named helper function would eliminate the suppression entirely, but it's a minor style point for a Celery on_commit callback.
  • data_extract_tasks.pytemperature=extract_temperature # type: ignore[arg-type] with tracker Anthropic models silently fail in doc_extract_query_task — return tool calls/thinking instead of structured output #1381 is correct hygiene. No action needed here.
  • import_tasks_v2.py — several # TODO(#1447) suppressions around dict/TypedDict widening. These are well-explained and scoped. Acceptable incremental technical debt.

cast() usage

cast("list[Any]", tools) in agent_tasks.py (lines 389, 1174) works around list invariance. The underlying fix would be to change AgentAPI.for_corpus / for_document to accept Sequence[str | CoreTool | Callable] (which is covariant). That's a follow-up for when the agent module graduates (as mypy itself suggested in the baseline error). The cast is safe here since list[str] is an Iterable[str | CoreTool | Callable].

cast(type[BaseParser], ...) / cast(type[BaseEmbedder], ...) / cast(type[BaseThumbnailGenerator], ...) — all correct, since get_component_by_name returns Any and the cast narrows to the expected class.


process_corpus_action return type change

The function was annotated -> None but had a stray return summary that made mypy report it as invalid. Removing the return is the correct fix. Callers (process_corpus_action.delay(...)) don't use the return value of a Celery task directly, so this is safe. Worth noting in case any caller was using .get() on the AsyncResult and expecting the dict.


Test Coverage Assessment

Bug Test
int-keyed annotation_map in convert_doc_to_funsd test_doc_tasks.py::test_convert_doc_to_funsd_annotation_map_uses_int_keys
FUNSD int-key lookup in package_funsd_exports test_export_tasks_typing_fixes.py::PackageFunsdExportsIntKeyTestCase
ContentFile in finalize_export test_export_tasks_typing_fixes.py::FinalizeExportTestCase
RuntimeError on missing export file test_export_tasks_typing_fixes.py::OnDemandPostProcessorsTestCase
UserExport.errors field fix Not directly tested (V2 export happy-path test would exercise this)
Missing await on agent.stream() Not tested (async agent task — harder to unit test, but worth an integration test)

The missing-await fix is the highest-impact unfixed runtime bug from a test-coverage perspective. Even a minimal test that mocks agent.stream to return an AsyncGenerator and asserts the streaming loop is entered would prevent regression.


Minor nits

  1. doc_tasks.py — new guard raises ValueError before the PAWLS file open. The error message says "missing pawls_parse_file or pdf_file" but only pawls_parse_file is used below. Consider narrowing the guard to only check pawls_parse_file.name (the only field actually opened at that point), or document that pdf_file is checked as a pre-condition for the caller.

  2. fork_tasks.pystructural_set_map: dict[int, StructuralAnnotationSet] is now explicitly typed. The multiline parenthesised assignment structural_set_map: dict[int, StructuralAnnotationSet] = (\n {}\n) is stylistically unusual; a single-line structural_set_map: dict[int, StructuralAnnotationSet] = {} is cleaner.

  3. import_tasks_v2.py::_import_document_with_annotations return type: changed from dict[str, int] to dict[str | int, int]. This correctly matches import_doc_annotations's declared return type. The aggregator all_annot_id_maps: dict[str | int, int] follows. Good.


Verdict

Approve with the optional follow-up suggestions above. The PR is production-ready — it fixes real runtime bugs, all # type: ignore suppressions are scoped and justified, and the new tests are targeted regression guards. The request_gremlin_manifest Celery coercion note and the missing await test are the only items worth considering before merge.

JSv4 added 2 commits May 6, 2026 23:50
After widening label_data_dict to Mapping[str, Mapping[str, Any]] in
load_or_create_labels, calling .copy() on the inner Mapping fails because
Mapping is read-only. Build a mutable dict via dict(label_data) instead
and rename the local to label_data_copy for clarity.
@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Code Review: typing — graduate opencontractserver.tasks.* from mypy baseline

Overview

Solid graduation PR that removes 16 ignore_errors = True blocks from mypy.ini and prunes 227 lines from the baseline. Importantly, several of the type errors turn out to correspond to real runtime bugs that the type system was silently hiding. The scope and approach are appropriate.


Real Bug Fixes (worth calling out explicitly)

  1. Missing await in agent_tasks.py (line ~345 in diff):

    # Before (silently returns a coroutine, iterates nothing):
    async for event in agent.stream(...)
    # After (correct):
    async for event in await agent.stream(...)

    This is a significant async correctness fix. Silent data loss (no events processed) would only surface at runtime.

  2. Wrong field name in export_tasks_v2.py:

    # Before (AttributeError silently masked the original exception):
    export.error = True
    # After:
    export.errors = str(e)

    The original except block would have raised AttributeError instead of recording the failure, causing Celery to see a different error than the actual one.

  3. BytesIOContentFile in export_tasks.py:

    export.file.save(filename, ContentFile(output_bytes.read()))

    BytesIO isn't compatible with FieldFile.save's File[Any] expectation. This is a real bug, not just a type annotation fix.

  4. FUNSD annotation_map key type (strint):
    The doc_tasks.py producer now uses int keys and export_tasks.py consumer uses if index in funsd_annotations (int lookup). Prior code would silently drop all FUNSD annotations. The regression test test_convert_doc_to_funsd_annotation_map_uses_int_keys correctly validates this.

  5. Spurious return summary removed from corpus_tasks.process_corpus_action: The function was declared -> None but returned a dict. Callers that expected a dict would have gotten None in a hypothetical future refactor.


Concerns

Comment verbosity

The project's CLAUDE.md specifies: "Never write multi-line docstrings or multi-line comment blocks — one short line max." Several # type: ignore suppressions are preceded by multi-line explanation blocks that violate this:

  • memory_tasks.py: 6-line block explaining the QuerySet[Row*] inference issue
  • data_extract_tasks.py: 7-line block explaining the temperature argument
  • doc_tasks.py: 8-line block around get_components_by_mimetype
  • import_tasks_v2.py: multiple 3-4 line blocks for the dictTypedDict mismatch

The underlying reasoning is non-obvious and worth preserving, but it should be collapsed. For example:

# type: ignore[arg-type]  # temperature accepts None for Claude models; widen upstream in #1381

Consider trimming these to the essential one-liner before merge.

cast() vs. proper interface fixes in embeddings_task.py

embed_func=cast(
    "Callable[[HasEmbeddingMixin, BaseEmbedder, str], bool]",
    _create_embedding_for_annotation,
),

_create_embedding_for_annotation takes an Annotation (a concrete subtype of HasEmbeddingMixin), not HasEmbeddingMixin itself. This is a covariant/contravariant mismatch — the cast is safe in practice since Annotation IS a HasEmbeddingMixin, but the correct fix would be to widen _create_embedding_for_annotation's first parameter to HasEmbeddingMixin. The cast papers over the issue rather than fixing it. Suggest adding a TODO comment referencing a follow-up issue.

badge_tasks.py — direct User import

from opencontractserver.users.models import User

This replaces get_user_model() with a concrete import. The project uses a custom User model and the CLAUDE.md mentions the conftest.py pattern. This is internally consistent, but means badge_tasks now hard-codes the concrete model rather than respecting Django's AUTH_USER_MODEL setting. If the project ever swaps the user model (unlikely but possible), this module would break silently. A comment noting this deliberate choice would help future readers.

Early return in extract_orchestrator_tasks.run_extract

if extract_id is None:
    logger.error("run_extract called without an extract_id — aborting")
    return

Returning silently from a task that received no extract ID means the calling code gets no feedback. Consider raising an exception instead so Celery marks the task as FAILED and the issue is visible in monitoring. The signature already documents Optional[str | int] as the input type, so None is technically valid input — but silently aborting is operationally invisible.


Type Correctness

  • LabelLookupPythonType narrowing (dict[str | int, ...]dict[str, ...]): Correct — build_label_lookups already emits str(pk) keys. Good cross-cutting fix.
  • Mapping instead of dict in load_or_create_labels / import_doc_annotations: Correct. Accepting Mapping is more Liskov-correct for read-only lookup functions.
  • request_gremlin_manifest return fix (or [] guard): Correct. The return type was list[...] but the value could be None.
  • Celery str | int coercions in corpus_tasks: Correct defensive pattern for Celery JSON serialization widening.
  • TODO(#1447) pattern for deferred dict → TypedDict fixes in import_tasks_v2.py: Acceptable as tracked debt.

Test Coverage

The new tests are well-structured and directly target the real bugs found:

  • FinalizeExportTestCaseContentFile fix
  • OnDemandPostProcessorsTestCasefile.name guard
  • PackageFunsdExportsIntKeyTestCase — int-key regression (including multi-page case)
  • PackageCorpusExportV2ErrorsFieldTestCaseerrors field fix
  • test_convert_doc_to_funsd_annotation_map_uses_int_keys — FUNSD key type regression
  • test_convert_doc_to_funsd_empty_doc_returns_empty_map — empty-doc baseline

One gap: there are no tests for the await agent.stream() fix in agent_tasks.py. This is understandable (async Celery task tests are complex to set up) but it is the most impactful bug fix in the PR. Consider adding at minimum a unit test that mocks agent.stream as a coroutine returning an async generator and verifies events are consumed.


Summary

Recommend: Approve with minor requests

  • Trim multi-line type: ignore comment blocks to single-line (project style)
  • Consider raising instead of returning silently in run_extract when extract_id is None
  • Add a TODO comment on the embeddings_task cast() calls noting the proper fix
  • Optionally add a test for the await agent.stream() fix

The bug fixes (especially the missing await and the wrong field name) alone justify merging this PR. The type coverage improvements are a bonus.

Targets the previously-uncovered patch lines so codecov ticks above 90%:
- ``pii_highlighter_claude``: new tests directly exercise the
  ``hasattr(resp, "text")`` filter and the ``claude_blocks`` rename.
- ``run_extract``: explicit guard test for ``extract_id is None``.
- ``process_analyzer``: ValueError guard test when analyzer is None.
- ``convert_doc_to_funsd``: ValueError test when pawls/pdf files missing.
- ``package_funsd_exports``: AWS S3 + GCS storage-backend branches.
- ``package_corpus_export_v2``: RuntimeError when corpus packager
  returns None (covers the new guard and the errors-field error path).
- ``package_annotated_docs``: RuntimeError tests for None packager
  return values.
- ``_setup_corpus_and_labels``: RuntimeError tests when label-set or
  corpus unpack returns None.
@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Code Review — PR #1541: Graduate opencontractserver.tasks.* from mypy baseline

Overall this is a high-quality typing PR. The scope is well-defined, the baseline pruning is clean, and the most important finding is that several type errors were masking real production bugs — not just annotation issues. The test additions are targeted and well-structured. A few items below deserve attention before merge.


Real Bug Fixes Uncovered (High Priority)

These deserve explicit call-out because they affect runtime behavior, not just type-checking:

1. Missing await in agent_tasks.py (line ~204)

-async for event in agent.stream(user_message, store_messages=False):
+async for event in await agent.stream(user_message, store_messages=False):

This was a silent bug: agent.stream() returns a Coroutine[..., AsyncGenerator], not the generator itself. Iterating the coroutine without await would silently yield nothing, meaning agent responses were never streamed. The fix is correct and critical.

2. FUNSD annotation map key type (doc_tasks.py / export_tasks.py)
The annotation map changed from str-keyed to int-keyed, and export_tasks.py was previously looking up via str(index). This means all FUNSD exports were silently omitting every annotation prior to this fix. The regression tests in test_export_tasks_typing_fixes.py and test_doc_tasks.py confirm the fix. Verify that the export_tasks.py lookup also uses int keys now (not just that the dict is typed correctly).

3. UserExport.errors typo in export_tasks_v2.py

-export.error = True   # AttributeError silently swallowed in except handler
+export.errors = ...   # correct field name

The error handler was raising AttributeError instead of recording the failure, and the original exception was lost. The regression test (test_errors_field_populated_on_failure) correctly covers this.

4. hasattr(resp, "text") filter in doc_analysis_tasks.py

-claude_response = [resp.text for resp in claude_blocks]        # crashes on ToolUseBlock
+claude_text_parts = [resp.text for resp in claude_blocks if hasattr(resp, "text")]

Without this guard, any Anthropic response that included a ToolUseBlock would raise AttributeError. The PII highlighter test suite is solid here.


Correctness Concerns

5. LabelLookupPythonType narrowing — breaking change risk

-text_labels: dict[str | int, AnnotationLabelPythonType]
+text_labels: dict[str, AnnotationLabelPythonType]

The PR description says the producer always emits str(pk) keys, which is true in build_label_lookups. But there are three things to verify:

  • Are there any callers that write to a LabelLookupPythonType with integer keys? A dict[str, ...] rejects d[42] = val at the type level but not at runtime — so if any caller is still passing ints, it would silently produce the wrong lookup behavior.
  • import_tasks_v2 is listed as widening its aggregator dict to dict[str | int, int] — confirm this doesn't create a mismatch with the narrowed LabelLookupPythonType downstream.

6. build_label_lookups(corpus_id: str → int) — breaking API change

All internal call sites are int, which is correct. But this is a public utility function with a documented str signature. Any external integration that was passing "123" will now fail. A note in the changelog about this breaking change would help. (The CHANGELOG entry mentions it but only in passing.)

7. process_corpus_action drops its return summary

-    return summary

This task previously returned {"actions_processed": ..., "executions_queued": ...}. If any caller uses .get() on the Celery result (e.g., in synchronous flows or tests), it now gets None. A quick grep for callers consuming the return value would confirm this is safe.

8. finalize_export buffer seek behavior

The test test_finalize_export_reads_from_seekable_buffer only checks bool(self.export.file.name) — it confirms a file was saved but not that the complete file content was saved after advancing the buffer position. If finalize_export doesn't explicitly buf.seek(0) before wrapping in ContentFile, partial bytes would be saved silently. Consider strengthening this test to verify file size or checksum matches the full buffer.


Code Quality

Positive patterns:

  • All # type: ignore comments are scoped to specific error codes ([attr-defined], [valid-type], etc.) rather than bare # type: ignore. This is exactly right.
  • The cast("list[Any]", tools) in agent_tasks.py is accompanied by a comment explaining the invariance constraint — well done.
  • Renaming result → failure_record, execution → doc_execution eliminates shadowing correctly without over-engineering.
  • Mapping widening in load_or_create_labels / import_doc_annotations is a clean DRY improvement — TypedDicts can now be passed directly.
  • from opencontractserver.users.models import User replacing User = get_user_model() in badge_tasks.py / import_tasks.py is the correct pattern (consistent with conftest.py).

Minor nits:

  • corpus_tasks.py keeps a comment block where the 16 [mypy-…] blocks were removed:

    # --- opencontractserver.tasks ---
    #
    # Issue #1482 graduated all 16 modules out of the baseline. The package is
    # now type-clean.

    This comment will silently rot. The PR description and CHANGELOG already document this; consider removing the mypy.ini comment entirely to avoid future confusion.

  • The data_extract_tasks.py temperature=extract_temperature # type: ignore[arg-type] comment block (7 lines of explanation) is thorough but longer than CLAUDE.md recommends for inline comments. The issue tracker reference #1381 is sufficient — the rest belongs in the commit/PR description.


Test Coverage

The test additions are well-targeted:

  • test_corpus_tasks_typing_coverage.py — guards analyzer=None (new ValueError)
  • test_doc_tasks.py — int key regression, empty-doc path, missing-files ValueError
  • test_export_tasks_typing_fixes.py — ContentFile save, RuntimeError guards, FUNSD regression with AWS/GCP storage variants
  • test_extract_tasks.pyrun_extract None guard
  • test_import_tasks_v2_typing_coverage.py — None from unpackers
  • test_pii_highlighter_claude_typing.py — exhaustive block-filter coverage

One gap: test_analyze_tasks.py — the mark_analysis_complete int() coercion fix (converting str doc IDs to int) and the request_gremlin_manifest or [] guard don't appear to have dedicated tests. Consider adding a test that passes string doc IDs to mark_analysis_complete to confirm the coercion works correctly.


Summary

The PR is merge-ready with one important verification item:

  • Confirm the export_tasks.py FUNSD lookup also uses int keys (not just typed as int), since the annotation map key type changed in doc_tasks.py. The regression tests suggest it does, but the actual export_tasks.py diff wasn't visible in the review.

The await fix in agent_tasks.py, the UserExport.errors typo fix, and the FUNSD key fix are the most impactful changes here — they correct real silent failures that type-checking correctly surfaced.

@JSv4 JSv4 merged commit 24a9b29 into main May 7, 2026
14 checks passed
@JSv4 JSv4 deleted the claude/fix-issue-1482-8mA5R branch May 7, 2026 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

typing: graduate opencontractserver.tasks.* from mypy baseline

2 participants