fix: do not post_process before finally in ModelOutputThunk.astream #580

Merged
psschwei merged 4 commits into generative-computing:main from psschwei:drop-finally
Mar 6, 2026

Conversation

psschwei (Member) commented Mar 5, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

  • Link to Issue: Do not post_process before finally in ModelOutputThunk.astream #577

  • Removed the finally block in ModelOutputThunk.astream that called post_process even when an exception occurred during generation

  • Exceptions from generation now propagate directly instead of being swallowed by secondary failures in post_process

  • post_process and output parsing now only run on the successful completion path
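
The change described above can be sketched as follows. This is a hypothetical, simplified stand-in for the actual mellea code (the real `ModelOutputThunk.astream` has more machinery); it only illustrates the control-flow fix: `post_process` moves from a `finally` block to the success path, so a generation exception propagates instead of being masked by a secondary failure in post-processing.

```python
import asyncio


class ModelOutputThunk:
    """Simplified sketch of the streaming thunk (hypothetical names)."""

    def __init__(self):
        self.value = None
        self._chunks = []

    def post_process(self):
        # Assemble and parse the final output from the streamed chunks.
        self.value = "".join(self._chunks)

    async def astream(self, generator):
        # Before the fix: a try/finally wrapped this loop and called
        # post_process() even when the generator raised, so an error inside
        # post_process() could swallow the original generation exception.
        async for chunk in generator:
            self._chunks.append(chunk)
        # After the fix: post_process() runs only when generation completed
        # successfully; generation exceptions propagate unchanged.
        self.post_process()
```

On the failure path, `self.value` is never populated and the caller sees the original exception from the generator.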

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation once the rest of the PR is populated)

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
github-actions bot (Contributor) commented Mar 5, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

mergify bot commented Mar 5, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@psschwei psschwei marked this pull request as draft March 5, 2026 01:43
psschwei added 2 commits March 4, 2026 20:47
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
psschwei (Member, Author) commented Mar 5, 2026

@nrfulton feel free to treat this with full skepticism

jakelorocco (Contributor) commented
I think this looks good and is a return to what we had before. Let's do a full clean install and test run to make sure the changes to core didn't break anything. Also, I believe the always-run post_process change was originally made to accommodate telemetry span closure, so we should make sure this change doesn't cause exceptions or errors there.
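
The telemetry concern above can be reconciled with the fix: span closure can stay in a `finally` block while post-processing stays on the success path. The sketch below is hypothetical (the `Span` class is a stand-in for a real telemetry span, e.g. OpenTelemetry), not the mellea implementation.

```python
import asyncio


class Span:
    """Hypothetical telemetry span (stand-in for a real tracing API)."""

    def __init__(self):
        self.closed = False
        self.error = None

    def record_exception(self, exc):
        self.error = exc

    def end(self):
        self.closed = True


async def stream_with_span(generator, span):
    # The span is always closed in `finally`, so telemetry survives failures,
    # but post-processing (here: joining chunks) runs only on success.
    chunks = []
    try:
        async for chunk in generator:
            chunks.append(chunk)
        return "".join(chunks)  # success-path-only post-processing
    except Exception as exc:
        span.record_exception(exc)
        raise  # propagate the original generation error
    finally:
        span.end()
```

This separates the two responsibilities the old `finally` block conflated: resource/telemetry cleanup (must always run) versus output parsing (must run only on success).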

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
psschwei (Member, Author) commented Mar 5, 2026

Well, trying to run the tests crashed my laptop, so that's not promising... need to look into why

psschwei (Member, Author) commented Mar 5, 2026

I ran the vllm tests remotely and they all passed. Well, technically I ran uv run pytest, so everything that could run in our remote environment did. No test failures, everything either passed or skipped (skipped ones all seem to have been either because they were marked always skip or because they needed ollama).

full logs:

============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.0, pluggy-1.6.0 -- /remote/.venv/bin/python
cachedir: .pytest_cache
rootdir: /remote
configfile: pyproject.toml
testpaths: test, docs
plugins: nbmake-1.5.5, anyio-4.11.0, asyncio-1.3.0, timeout-2.4.0, Faker-37.12.0, langsmith-0.6.6, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 470 items

======================================================================
Heavy GPU Test Process Isolation Active
======================================================================
Running 22 heavy GPU test module(s) in separate processes
to ensure GPU memory is fully released between modules.


[1/22] Running: /remote/docs/examples/aLora/101_example.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/aLora/101_example.py::101_example.py PASSED                [100%]

=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
============================== 1 passed in 54.52s ==============================
[W305 18:55:26.352945863 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/aLora/101_example.py

[2/22] Running: /remote/docs/examples/aLora/102_example.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/aLora/102_example.py::102_example.py SKIPPED (uncondit...) [100%]

=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
============================== 1 skipped in 5.07s ==============================
[W305 18:55:35.902405995 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/aLora/102_example.py

[3/22] Running: /remote/docs/examples/aLora/example_readme_generator.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item / 1 error

==================================== ERRORS ====================================
_______ ERROR collecting docs/examples/aLora/example_readme_generator.py _______
docs/examples/aLora/example_readme_generator.py:6: in <module>
    generate_readme(
cli/alora/readme_generator.py:268: in generate_readme
    m = start_session()
        ^^^^^^^^^^^^^^^
mellea/stdlib/session.py:187: in start_session
    backend = backend_class(model_id, model_options=model_options, **backend_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
mellea/backends/ollama.py:85: in __init__
    raise Exception(err)
E   Exception: could not create OllamaModelBackend: ollama server not running at None
------------------------------- Captured stdout --------------------------------
=== 18:55:39-ERROR ======
could not create OllamaModelBackend: ollama server not running at None
=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
=========================== short test summary info ============================
ERROR docs/examples/aLora/example_readme_generator.py - Exception: could not ...
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 1.76s ===============================
[W305 18:55:40.315207687 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✗ Module failed: /remote/docs/examples/aLora/example_readme_generator.py

[4/22] Running: /remote/docs/examples/aLora/make_training_data.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/aLora/make_training_data.py::make_training_data.py SKIPPED [100%]

=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
============================== 1 skipped in 1.29s ==============================
[W305 18:55:43.153322853 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/aLora/make_training_data.py

[5/22] Running: /remote/docs/examples/aLora/stembolts_intrinsic.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/aLora/stembolts_intrinsic.py::stembolts_intrinsic.py SKIPPED [100%]

=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
============================== 1 skipped in 1.52s ==============================
[W305 18:55:47.194924197 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/aLora/stembolts_intrinsic.py

[6/22] Running: /remote/docs/examples/intrinsics/answer_relevance.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/intrinsics/answer_relevance.py::answer_relevance.py PASSED [100%]

=============================== warnings summary ===============================
.venv/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285
  /remote/.venv/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
======================== 1 passed, 1 warning in 53.85s =========================
[W305 18:56:46.328156176 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/intrinsics/answer_relevance.py

[7/22] Running: /remote/docs/examples/intrinsics/answerability.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/intrinsics/answerability.py::answerability.py PASSED       [100%]

=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
============================== 1 passed in 28.36s ==============================
[W305 18:57:19.244868382 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/intrinsics/answerability.py

[8/22] Running: /remote/docs/examples/intrinsics/citations.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/intrinsics/citations.py::citations.py PASSED               [100%]

=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
============================== 1 passed in 29.24s ==============================
[W305 18:57:52.175508878 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/intrinsics/citations.py

[9/22] Running: /remote/docs/examples/intrinsics/context_relevance.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/intrinsics/context_relevance.py::context_relevance.py PASSED [100%]

=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
============================== 1 passed in 28.40s ==============================
[W305 18:58:25.766422279 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/intrinsics/context_relevance.py

[10/22] Running: /remote/docs/examples/intrinsics/hallucination_detection.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/intrinsics/hallucination_detection.py::hallucination_detection.py PASSED [100%]

=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
============================== 1 passed in 36.00s ==============================
[W305 18:59:05.671154334 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/intrinsics/hallucination_detection.py

[11/22] Running: /remote/docs/examples/intrinsics/intrinsics.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/intrinsics/intrinsics.py::intrinsics.py PASSED             [100%]

=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
========================= 1 passed in 75.32s (0:01:15) =========================
[W305 19:00:24.797084174 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/intrinsics/intrinsics.py

[12/22] Running: /remote/docs/examples/intrinsics/query_rewrite.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/intrinsics/query_rewrite.py::query_rewrite.py PASSED       [100%]

=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
============================== 1 passed in 28.28s ==============================
[W305 19:00:56.019855949 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/intrinsics/query_rewrite.py

[13/22] Running: /remote/docs/examples/mify/rich_document_advanced.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

docs/examples/mify/rich_document_advanced.py::rich_document_advanced.py SKIPPED [100%]

=============================== warnings summary ===============================
<frozen abc>:106
  <frozen abc>:106: DeprecationWarning: Use BaseMetaSerializer() instead.

.venv/lib/python3.12/site-packages/docling_core/transforms/serializer/markdown.py:490
.venv/lib/python3.12/site-packages/docling_core/transforms/serializer/markdown.py:490
.venv/lib/python3.12/site-packages/docling_core/transforms/serializer/markdown.py:490
  /remote/.venv/lib/python3.12/site-packages/docling_core/transforms/serializer/markdown.py:490: DeprecationWarning: Field `annotations` is deprecated; use `meta` instead.
    for ann in item.annotations

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
  • mify.py: Ollama not available (port 11434 not listening)
  • rich_table_execute_basic.py: Ollama not available (port 11434 not listening)
================== 1 skipped, 4 warnings in 93.17s (0:01:33) ===================
[W305 19:02:34.749979509 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/docs/examples/mify/rich_document_advanced.py

[14/22] Running: /remote/test/backends/test_huggingface.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 19 items

test/backends/test_huggingface.py::test_adapters PASSED                  [  5%]
test/backends/test_huggingface.py::test_system_prompt PASSED             [ 10%]
test/backends/test_huggingface.py::test_constraint_lora_with_requirement PASSED [ 15%]
test/backends/test_huggingface.py::test_constraint_lora_override PASSED  [ 21%]
test/backends/test_huggingface.py::test_constraint_lora_override_does_not_override_alora PASSED [ 26%]
test/backends/test_huggingface.py::test_llmaj_req_does_not_use_alora PASSED [ 31%]
test/backends/test_huggingface.py::test_instruct PASSED                  [ 36%]
test/backends/test_huggingface.py::test_multiturn PASSED                 [ 42%]
test/backends/test_huggingface.py::test_chat PASSED                      [ 47%]
test/backends/test_huggingface.py::test_format PASSED                    [ 52%]
test/backends/test_huggingface.py::test_generate_from_raw PASSED         [ 57%]
test/backends/test_huggingface.py::test_generate_from_raw_with_format PASSED [ 63%]
test/backends/test_huggingface.py::test_async_parallel_requests PASSED   [ 68%]
test/backends/test_huggingface.py::test_async_avalue PASSED              [ 73%]
test/backends/test_huggingface.py::test_generate_with_lock PASSED        [ 78%]
test/backends/test_huggingface.py::test_generate_with_lock_does_not_block_when_awaiting_value PASSED [ 84%]
test/backends/test_huggingface.py::test_streaming_error_with_intrinsics PASSED [ 89%]
test/backends/test_huggingface.py::test_error_during_generate_with_lock PASSED [ 94%]
test/backends/test_huggingface.py::test_assert_correct_adapters PASSED   [100%]

=============================== warnings summary ===============================
test/backends/test_huggingface.py::test_generate_with_lock
  /remote/.venv/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================== 19 passed, 1 warning in 37.85s ========================
[W305 19:03:17.949617640 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/test/backends/test_huggingface.py

[15/22] Running: /remote/test/backends/test_huggingface_tools.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

test/backends/test_huggingface_tools.py::test_tool PASSED                [100%]

========================= 1 passed in 61.38s (0:01:01) =========================
[W305 19:04:24.321115926 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/test/backends/test_huggingface_tools.py

[16/22] Running: /remote/test/backends/test_openai_vllm.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 7 items

test/backends/test_openai_vllm.py::test_instruct PASSED                  [ 14%]
test/backends/test_openai_vllm.py::test_multiturn PASSED                 [ 28%]
test/backends/test_openai_vllm.py::test_chat PASSED                      [ 42%]
test/backends/test_openai_vllm.py::test_chat_stream PASSED               [ 57%]
test/backends/test_openai_vllm.py::test_format PASSED                    [ 71%]
test/backends/test_openai_vllm.py::test_generate_from_raw PASSED         [ 85%]
test/backends/test_openai_vllm.py::test_generate_from_raw_with_format PASSED [100%]

======================== 7 passed in 113.10s (0:01:53) =========================
[W305 19:06:22.112444565 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/test/backends/test_openai_vllm.py

[17/22] Running: /remote/test/backends/test_vllm.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 8 items

test/backends/test_vllm.py::test_system_prompt PASSED                    [ 12%]
test/backends/test_vllm.py::test_instruct PASSED                         [ 25%]
test/backends/test_vllm.py::test_multiturn PASSED                        [ 37%]
test/backends/test_vllm.py::test_format PASSED                           [ 50%]
test/backends/test_vllm.py::test_generate_from_raw PASSED                [ 62%]
test/backends/test_vllm.py::test_generate_from_raw_with_format PASSED    [ 75%]
test/backends/test_vllm.py::test_async_parallel_requests PASSED          [ 87%]
test/backends/test_vllm.py::test_async_avalue PASSED                     [100%]

======================== 8 passed in 174.13s (0:02:54) =========================
[W305 19:09:22.833300376 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/test/backends/test_vllm.py

[18/22] Running: /remote/test/backends/test_vllm_tools.py
----------------------------------------------------------------------
============================= test session starts ==============================
collecting ... collected 1 item

test/backends/test_vllm_tools.py::test_tool PASSED                       [100%]

=============================== warnings summary ===============================
test/backends/test_vllm_tools.py::test_tool
  /remote/.venv/lib/python3.12/site-packages/mistral_common/tokens/tokenizers/sentencepiece.py:125: FutureWarning: `get_control_token` is deprecated. Use `get_special_token` instead.
    warnings.warn("`get_control_token` is deprecated. Use `get_special_token` instead.", FutureWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================== 1 passed, 1 warning in 101.63s (0:01:41) ===================
[W305 19:11:09.829667643 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/test/backends/test_vllm_tools.py

[19/22] Running: /remote/test/cli/test_alora_train_integration.py
----------------------------------------------------------------------
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.0, pluggy-1.6.0 -- /remote/.venv/bin/python
cachedir: .pytest_cache
rootdir: /remote
configfile: pyproject.toml
plugins: nbmake-1.5.5, anyio-4.11.0, asyncio-1.3.0, timeout-2.4.0, Faker-37.12.0, langsmith-0.6.6, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 2 items

test/cli/test_alora_train_integration.py::test_alora_training_integration PASSED [ 50%]
test/cli/test_alora_train_integration.py::test_lora_training_integration PASSED [100%]

=============================== warnings summary ===============================
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
  /remote/.venv/lib/python3.12/site-packages/trl/trainer/utils.py:103: DeprecationWarning: This class is deprecated and will be removed in version 0.20.0. To train on completion only, please use the parameter `completion_only_loss` of `SFTConfig` instead.
    warnings.warn(

test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
  /remote/.venv/lib/python3.12/site-packages/trl/trainer/sft_config.py:257: DeprecationWarning: `max_seq_length` is deprecated and will be removed in version 0.20.0. Use `max_length` instead.
    warnings.warn(

test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
  /remote/.venv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:678: DeprecationWarning: Failed to apply the formatting function due to the following error: string index out of range. This may be because the function is designed for batched input. Please update it to process one example at a time (i.e., accept and return a single example). For now, we will attempt to apply the function in batched mode, but note that batched formatting is deprecated and will be removed in version 0.21.
    warnings.warn(

test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
  /remote/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/pin_memory.py:57: DeprecationWarning: The argument 'device' of Tensor.pin_memory() is deprecated. Please do not pass this argument. (Triggered internally at /pytorch/aten/src/ATen/native/Memory.cpp:46.)
    return data.pin_memory(device)

test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
  /remote/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/pin_memory.py:57: DeprecationWarning: The argument 'device' of Tensor.is_pinned() is deprecated. Please do not pass this argument. (Triggered internally at /pytorch/aten/src/ATen/native/Memory.cpp:31.)
    return data.pin_memory(device)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================= 2 passed, 10 warnings in 26.38s ========================
[W305 19:11:40.768958447 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/test/cli/test_alora_train_integration.py

[20/22] Running: /remote/test/core/test_component_typing.py
----------------------------------------------------------------------
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.0, pluggy-1.6.0 -- /remote/.venv/bin/python
cachedir: .pytest_cache
rootdir: /remote
configfile: pyproject.toml
plugins: nbmake-1.5.5, anyio-4.11.0, asyncio-1.3.0, timeout-2.4.0, Faker-37.12.0, langsmith-0.6.6, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 8 items

test/core/test_component_typing.py::test_mot_init_typing PASSED          [ 12%]
test/core/test_component_typing.py::test_simple_component_parsing PASSED [ 25%]
test/core/test_component_typing.py::test_subclassed_component_parsing PASSED [ 37%]
test/core/test_component_typing.py::test_component_parsing_fails PASSED  [ 50%]
test/core/test_component_typing.py::test_incorrect_type_override PASSED  [ 62%]
test/core/test_component_typing.py::test_generating SKIPPED (Ollama ...) [ 75%]
test/core/test_component_typing.py::test_message_typing SKIPPED (Oll...) [ 87%]
test/core/test_component_typing.py::test_generating_with_sampling SKIPPED [100%]

========================= 5 passed, 3 skipped in 0.39s =========================
[W305 19:11:45.201739531 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/test/core/test_component_typing.py

[21/22] Running: /remote/test/stdlib/components/intrinsic/test_rag.py
----------------------------------------------------------------------
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.0, pluggy-1.6.0 -- /remote/.venv/bin/python
cachedir: .pytest_cache
rootdir: /remote
configfile: pyproject.toml
plugins: nbmake-1.5.5, anyio-4.11.0, asyncio-1.3.0, timeout-2.4.0, Faker-37.12.0, langsmith-0.6.6, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 9 items

test/stdlib/components/intrinsic/test_rag.py::test_answerability PASSED  [ 11%]
test/stdlib/components/intrinsic/test_rag.py::test_query_rewrite PASSED  [ 22%]
test/stdlib/components/intrinsic/test_rag.py::test_citations PASSED      [ 33%]
test/stdlib/components/intrinsic/test_rag.py::test_context_relevance PASSED [ 44%]
test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection PASSED [ 55%]
test/stdlib/components/intrinsic/test_rag.py::test_answer_relevance PASSED [ 66%]
test/stdlib/components/intrinsic/test_rag.py::test_answer_relevance_classifier PASSED [ 77%]
test/stdlib/components/intrinsic/test_rag.py::test_query_clarification_positive PASSED [ 88%]
test/stdlib/components/intrinsic/test_rag.py::test_query_clarification_negative PASSED [100%]

=============================== warnings summary ===============================
test/stdlib/components/intrinsic/test_rag.py::test_query_rewrite
test/stdlib/components/intrinsic/test_rag.py::test_citations
test/stdlib/components/intrinsic/test_rag.py::test_context_relevance
test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection
test/stdlib/components/intrinsic/test_rag.py::test_answer_relevance
test/stdlib/components/intrinsic/test_rag.py::test_query_clarification_positive
  /remote/.venv/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================== 9 passed, 6 warnings in 48.43s ========================
[W305 19:12:38.577049376 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/test/stdlib/components/intrinsic/test_rag.py

[22/22] Running: /remote/test/stdlib/sampling/test_think_budget_forcing.py
----------------------------------------------------------------------
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.0, pluggy-1.6.0 -- /remote/.venv/bin/python
cachedir: .pytest_cache
rootdir: /remote
configfile: pyproject.toml
plugins: nbmake-1.5.5, anyio-4.11.0, asyncio-1.3.0, timeout-2.4.0, Faker-37.12.0, langsmith-0.6.6, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 2 items

test/stdlib/sampling/test_think_budget_forcing.py::test_think_big SKIPPED [ 50%]
test/stdlib/sampling/test_think_budget_forcing.py::test_think_little SKIPPED [100%]

============================== 2 skipped in 0.37s ==============================
[W305 19:12:42.143880933 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
✓ Module passed: /remote/test/stdlib/sampling/test_think_budget_forcing.py

======================================================================
Failed modules (1):
  /remote/docs/examples/aLora/example_readme_generator.py:
    (module failed but couldn't parse specific test names)
======================================================================


=============================== Skipped Examples ===============================
The following examples were skipped during collection:

  • hello_world.py: Ollama not available (port 11434 not listening)
  • react_using_mellea.py: Ollama not available (port 11434 not listening)
  • react.py: Ollama not available (port 11434 not listening)
  • react_instruct.py: Ollama not available (port 11434 not listening)
  • contexts_with_sampling.py: Ollama not available (port 11434 not listening)
  • generate_with_context.py: Ollama not available (port 11434 not listening)
  • generative_gsm8k.py: Ollama not available (port 11434 not listening)
  • generative_slots.py: Ollama not available (port 11434 not listening)
  • generative_slots_with_requirements.py: Ollama not available (port 11434 not listening)
  • investment_advice.py: Ollama not available (port 11434 not listening)
  • decision_aides.py: Ollama not available (port 11434 not listening)
  • summarize_and_decide.py: Ollama not available (port 11434 not listening)
  • summarizers.py: Ollama not available (port 11434 not listening)
  • helpers.py: Ollama not available (port 11434 not listening)
  • vision_litellm_backend.py: Ollama not available (port 11434 not listening)
  • vision_ollama_chat.py: Ollama not available (port 11434 not listening)
  • vision_openai_examples.py: Ollama not available (port 11434 not listening)
  • 101_with_gen_slots.py: Ollama not available (port 11434 not listening)
  • advanced_with_m_instruct.py: Ollama not available (port 11434 not listening)
  • 101_email.py: Ollama not available (port 11434 not listening)
  • 101_email_comparison.py: Ollama not available (port 11434 not listening)
  • 101_email_with_requirements.py: Ollama not available (port 11434 not listening)
  • 101_email_with_validate.py: Ollama not available (port 11434 not listening)
  • advanced_email_with_validate_function.py: Ollama not available (port 11434 not listening)
  • langchain_messages.py: Ollama not available (port 11434 not listening)
  • m_decomp_result.py: Example marked to always skip (skip_always marker)
  • python_decompose_example.py: Ollama not available (port 11434 not listening)
  • python_decompose_result.py: Example marked to always skip (skip_always marker)
  • client.py: Example marked to always skip (skip_always marker)
  • m_serve_example_simple.py: Ollama not available (port 11434 not listening)
  • pii_serve.py: Example marked to always skip (skip_always marker)
  • mcp_example.py: Example marked to always skip (skip_always marker)
  • lazy.py: Ollama not available (port 11434 not listening)
  • lazy_fib.py: Ollama not available (port 11434 not listening)
  • lazy_fib_sample.py: Ollama not available (port 11434 not listening)
  • simple_example.py: Ollama not available (port 11434 not listening)
  • states.py: Ollama not available (port 11434 not listening)
  • mify.py: Ollama not available (port 11434 not listening)
  • rich_table_execute_basic.py: Ollama not available (port 11434 not listening)
  • context_docs.py: Ollama not available (port 11434 not listening)
  • researcher.py: Ollama not available (port 11434 not listening)
  • table.py: Ollama not available (port 11434 not listening)
  • mellea_pdf.py: Example marked to always skip (skip_always marker)
  • simple_rag_with_filter.py: Example marked to always skip (skip_always marker)
  • guardian.py: Ollama not available (port 11434 not listening)
  • guardian_huggingface.py: Ollama not available (port 11434 not listening)
  • repair_with_guardian.py: Ollama not available (port 11434 not listening)
  • creating_a_new_type_of_session.py: Ollama not available (port 11434 not listening)
  • sofai_graph_coloring.py: Ollama not available (port 11434 not listening)
  • telemetry_example.py: Ollama not available (port 11434 not listening)
  • interpreter_example.py: Ollama not available (port 11434 not listening)
  • smolagents_example.py: Ollama not available (port 11434 not listening)
  • tool_decorator_example.py: Ollama not available (port 11434 not listening)
  • compositionality_with_generative_slots.py: Ollama not available (port 11434 not listening)
  • context_example.py: Ollama not available (port 11434 not listening)
  • document_mobject.py: Ollama not available (port 11434 not listening)
  • example.py: Ollama not available (port 11434 not listening)
  • instruct_validate_repair.py: Ollama not available (port 11434 not listening)
  • model_options_example.py: Ollama not available (port 11434 not listening)
  • sentiment_classifier.py: Ollama not available (port 11434 not listening)
  • simple_email.py: Ollama not available (port 11434 not listening)
  • table_mobject.py: Ollama not available (port 11434 not listening)
====================== no tests ran in 1133.53s (0:18:53) ======================
!!!! _pytest.outcomes.Exit: Heavy GPU tests completed in isolated processes !!!!

@psschwei (Member Author) commented Mar 5, 2026

Will run Ollama locally now.
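For context, the `Ollama not available (port 11434 not listening)` skips in the run above come from a collection-time reachability check against the default Ollama port. A minimal sketch of such a probe — the helper name and exact logic are illustrative, not the repo's actual implementation:

```python
import socket


def ollama_available(host: str = "localhost", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if something is listening on host:port.

    Illustrative stand-in for the kind of check behind the
    "Ollama not available (port 11434 not listening)" skip messages;
    not the repo's actual helper.
    """
    try:
        # create_connection raises OSError (refused, timeout, etc.)
        # when nothing is listening on the target port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With `ollama serve` running locally, the port accepts connections and the probe returns `True`, so the examples are collected instead of skipped.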

@psschwei (Member Author) commented Mar 6, 2026

Ollama tests pass, except for the decompose test, which should be fixed by #569:

$ uv run pytest -m ollama
======================================================= test session starts ========================================================
platform darwin -- Python 3.12.12, pytest-9.0.0, pluggy-1.6.0
rootdir: /Users/paulschw/generative-computing/mellea-pr-580
configfile: pyproject.toml
testpaths: test, docs
plugins: nbmake-1.5.5, anyio-4.11.0, timeout-2.4.0, asyncio-1.3.0, langsmith-0.6.6, Faker-37.12.0, cov-7.0.0
timeout: 900.0s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 516 items / 354 deselected / 2 skipped / 162 selected

test/backends/test_litellm_ollama.py ........                                                                                [  4%]
test/backends/test_mellea_tool.py ..                                                                                         [  6%]
test/backends/test_ollama.py .....X....                                                                                      [ 12%]
test/backends/test_openai_ollama.py .............                                                                            [ 20%]
test/backends/test_tool_calls.py ...                                                                                         [ 22%]
test/backends/test_vision_ollama.py ....                                                                                     [ 24%]
test/backends/test_vision_openai.py ....                                                                                     [ 27%]
test/core/test_astream_incremental.py ......                                                                                 [ 30%]
test/core/test_component_typing.py ...                                                                                       [ 32%]
test/core/test_model_output_thunk.py ..                                                                                      [ 33%]
test/stdlib/components/test_genslot.py ...................                                                                   [ 45%]
test/stdlib/requirements/test_requirement.py .....                                                                           [ 48%]
test/stdlib/sampling/test_majority_voting.py ..                                                                              [ 50%]
test/stdlib/sampling/test_sampling_ctx.py ..                                                                                 [ 51%]
test/stdlib/sampling/test_sofai_graph_coloring.py ...                                                                        [ 53%]
test/stdlib/sampling/test_sofai_sampling.py .                                                                                [ 53%]
test/stdlib/sampling/test_think_budget_forcing.py ..                                                                         [ 54%]
test/stdlib/test_chat_view.py ..                                                                                             [ 56%]
test/stdlib/test_functional.py ....                                                                                          [ 58%]
test/stdlib/test_session.py s.......                                                                                         [ 63%]
test/telemetry/test_tracing.py ....                                                                                          [ 66%]
docs/examples/agents/react/react_from_scratch/react.py .                                                                     [ 66%]
docs/examples/agents/react/react_from_scratch/react_instruct.py .                                                            [ 67%]
docs/examples/agents/react/react_using_mellea.py .                                                                           [ 67%]
docs/examples/context/contexts_with_sampling.py .                                                                            [ 68%]
docs/examples/generative_slots/generate_with_context.py .                                                                    [ 69%]
docs/examples/generative_slots/generative_gsm8k.py .                                                                         [ 69%]
docs/examples/generative_slots/generative_slots.py .                                                                         [ 70%]
docs/examples/generative_slots/generative_slots_with_requirements.py .                                                       [ 70%]
docs/examples/generative_slots/inter_module_composition/decision_aides.py .                                                  [ 71%]
docs/examples/generative_slots/inter_module_composition/summarize_and_decide.py .                                            [ 72%]
docs/examples/generative_slots/inter_module_composition/summarizers.py .                                                     [ 72%]
docs/examples/generative_slots/investment_advice.py .                                                                        [ 73%]
docs/examples/hello_world.py .                                                                                               [ 74%]
docs/examples/helper/helpers.py .                                                                                            [ 74%]
docs/examples/image_text_models/vision_litellm_backend.py .                                                                  [ 75%]
docs/examples/image_text_models/vision_ollama_chat.py .                                                                      [ 75%]
docs/examples/image_text_models/vision_openai_examples.py .                                                                  [ 76%]
docs/examples/information_extraction/101_with_gen_slots.py .                                                                 [ 77%]
docs/examples/information_extraction/advanced_with_m_instruct.py .                                                           [ 77%]
docs/examples/instruct_validate_repair/101_email.py .                                                                        [ 78%]
docs/examples/instruct_validate_repair/101_email_comparison.py .                                                             [ 79%]
docs/examples/instruct_validate_repair/101_email_with_requirements.py .                                                      [ 79%]
docs/examples/instruct_validate_repair/101_email_with_validate.py .                                                          [ 80%]
docs/examples/instruct_validate_repair/advanced_email_with_validate_function.py .                                            [ 80%]
docs/examples/library_interop/langchain_messages.py .                                                                        [ 81%]
docs/examples/m_decompose/python/python_decompose_example.py F                                                               [ 82%]
docs/examples/m_serve/m_serve_example_simple.py .                                                                            [ 82%]
docs/examples/melp/lazy.py .                                                                                                 [ 83%]
docs/examples/melp/lazy_fib.py .                                                                                             [ 83%]
docs/examples/melp/lazy_fib_sample.py .                                                                                      [ 84%]
docs/examples/melp/simple_example.py .                                                                                       [ 85%]
docs/examples/melp/states.py .                                                                                               [ 85%]
docs/examples/mify/mify.py .                                                                                                 [ 86%]
docs/examples/mify/rich_table_execute_basic.py .                                                                             [ 87%]
docs/examples/mini_researcher/context_docs.py .                                                                              [ 87%]
docs/examples/mini_researcher/researcher.py .                                                                                [ 88%]
docs/examples/mobject/table.py .                                                                                             [ 88%]
docs/examples/safety/guardian.py .                                                                                           [ 89%]
docs/examples/safety/guardian_huggingface.py .                                                                               [ 90%]
docs/examples/safety/repair_with_guardian.py .                                                                               [ 90%]
docs/examples/sessions/creating_a_new_type_of_session.py .                                                                   [ 91%]
docs/examples/sofai/sofai_graph_coloring.py .                                                                                [ 91%]
docs/examples/telemetry/telemetry_example.py .                                                                               [ 92%]
docs/examples/tools/interpreter_example.py .                                                                                 [ 93%]
docs/examples/tools/smolagents_example.py .                                                                                  [ 93%]
docs/examples/tools/tool_decorator_example.py .                                                                              [ 94%]
docs/examples/tutorial/compositionality_with_generative_slots.py .                                                           [ 95%]
docs/examples/tutorial/context_example.py .                                                                                  [ 95%]
docs/examples/tutorial/document_mobject.py .                                                                                 [ 96%]
docs/examples/tutorial/example.py .                                                                                          [ 96%]
docs/examples/tutorial/instruct_validate_repair.py .                                                                         [ 97%]
docs/examples/tutorial/model_options_example.py .                                                                            [ 98%]
docs/examples/tutorial/sentiment_classifier.py .                                                                             [ 98%]
docs/examples/tutorial/simple_email.py .                                                                                     [ 99%]
docs/examples/tutorial/table_mobject.py .                                                                                    [100%]

============================================================= FAILURES =============================================================
_______________________________________________ usecase: python_decompose_example.py _______________________________________________
Example failed with exit code 1.
Stderr:
  0%|          | 0/2 [00:00<?, ?it/s]
  [... 22 stalled tqdm progress bars (all 0/2) trimmed ...]
Traceback (most recent call last):
  File "/Users/paulschw/generative-computing/mellea-pr-580/docs/examples/m_decompose/python/python_decompose_example.py", line 226, in <module>
    main()
  File "/Users/paulschw/generative-computing/mellea-pr-580/docs/examples/m_decompose/python/python_decompose_example.py", line 205, in main
    result = run_decompose(task_prompt)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulschw/generative-computing/mellea-pr-580/docs/examples/m_decompose/python/python_decompose_example.py", line 33, in run_decompose
    result = decompose(
             ^^^^^^^^^^
  File "/Users/paulschw/generative-computing/mellea-pr-580/cli/decompose/pipeline.py", line 135, in decompose
    ).parse()
      ^^^^^^^
  File "/Users/paulschw/generative-computing/mellea-pr-580/cli/decompose/prompt_modules/_prompt_modules.py", line 34, in parse
    return self._parser(self.__str__())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulschw/generative-computing/mellea-pr-580/cli/decompose/prompt_modules/subtask_constraint_assign/_subtask_constraint_assign.py", line 104, in _default_parser
    raise TagExtractionError(
cli.decompose.prompt_modules.subtask_constraint_assign._exceptions.TagExtractionError: Module Error "subtask_constraint_assign"; LLM failed to generate correct tags for extraction: "<assigned_constraints>"


------------------------------------------------------- Captured stdout call -------------------------------------------------------
======================================================================
Mellea Decompose Example
======================================================================

Original Task:

Write a short blog post about the benefits of morning exercise.
Include a catchy title, an introduction paragraph, three main benefits
with explanations, and a conclusion that encourages readers to start
their morning exercise routine.

Running decomposition pipeline...

=== 23:01:58-INFO ======
SUCCESS
=== 23:02:01-INFO ======
SUCCESS
=== 23:02:04-INFO ======
SUCCESS
=== 23:02:05-INFO ======
SUCCESS
=== 23:02:07-INFO ======
SUCCESS
=== 23:02:08-INFO ======
SUCCESS
=== 23:02:20-INFO ======
SUCCESS
=== 23:02:24-INFO ======
SUCCESS
=== 23:02:26-INFO ======
SUCCESS
=== 23:02:31-INFO ======
SUCCESS
=== 23:02:35-INFO ======
SUCCESS
=== 23:02:38-INFO ======
SUCCESS
=== 23:02:42-INFO ======
SUCCESS
=== 23:02:47-INFO ======
SUCCESS
=== 23:02:51-INFO ======
SUCCESS
=== 23:02:51-INFO ======
SUCCESS
=== 23:02:52-INFO ======
SUCCESS
=== 23:02:53-INFO ======
SUCCESS
=== 23:02:54-INFO ======
SUCCESS
=== 23:02:54-INFO ======
SUCCESS
=== 23:02:55-INFO ======
SUCCESS
=== 23:02:56-INFO ======
SUCCESS
========================================================= warnings summary =========================================================
test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
test/backends/test_litellm_ollama.py::test_generate_from_raw
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_litellm_ollama.py::test_async_avalue
  /Users/paulschw/generative-computing/mellea-pr-580/.venv/lib/python3.12/site-packages/aiohttp/connector.py:963: DeprecationWarning: enable_cleanup_closed ignored because https://github.com/python/cpython/pull/118960 is fixed in Python version sys.version_info(major=3, minor=12, micro=12, releaselevel='final', serial=0)
    super().__init__(

test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
  /Users/paulschw/generative-computing/mellea-pr-580/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='1 + 1 eq...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
  /Users/paulschw/generative-computing/mellea-pr-580/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Subject:...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
  /Users/paulschw/generative-computing/mellea-pr-580/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='yes', ro...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_gen_slot
test/backends/test_litellm_ollama.py::test_generate_from_raw
  /Users/paulschw/generative-computing/mellea-pr-580/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='{\n    "...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_async_parallel_requests
  /Users/paulschw/generative-computing/mellea-pr-580/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/streaming_handler.py:1855: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.12/migration/
    obj_dict = processed_chunk.dict()

test/backends/test_litellm_ollama.py::test_async_parallel_requests
  /Users/paulschw/generative-computing/mellea-pr-580/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Goodbye!...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_mellea_tool.py::test_from_callable_generation
  /Users/paulschw/generative-computing/mellea-pr-580/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Hello! H...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_tool_calls.py::test_tool_called_from_context_action
  <frozen abc>:106: DeprecationWarning: Use BaseMetaSerializer() instead.

test/backends/test_vision_ollama.py::test_image_block_construction
  /Users/paulschw/generative-computing/mellea-pr-580/test/backends/test_vision_ollama.py:38: DeprecationWarning: 'mode' parameter is deprecated and will be removed in Pillow 13 (2026-10-15)
    random_image = Image.fromarray(random_pixel_data, "RGB")

test/backends/test_vision_openai.py::test_image_block_construction
  /Users/paulschw/generative-computing/mellea-pr-580/test/backends/test_vision_openai.py:48: DeprecationWarning: 'mode' parameter is deprecated and will be removed in Pillow 13 (2026-10-15)
    random_image = Image.fromarray(random_pixel_data, "RGB")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================= Skipped Examples =========================================================
The following examples were skipped during collection:

  • m_decomp_result.py: Example marked to always skip (skip_always marker)
  • python_decompose_result.py: Example marked to always skip (skip_always marker)
  • client.py: Example marked to always skip (skip_always marker)
  • pii_serve.py: Example marked to always skip (skip_always marker)
  • mcp_example.py: Example marked to always skip (skip_always marker)
  • mellea_pdf.py: Example marked to always skip (skip_always marker)
  • simple_rag_with_filter.py: Example marked to always skip (skip_always marker)
========================================================== tests coverage ==========================================================
________________________________________ coverage: platform darwin, python 3.12.12-final-0 _________________________________________

Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
===================================================== short test summary info ======================================================
FAILED docs/examples/m_decompose/python/python_decompose_example.py::python_decompose_example.py - Example failed with exit code 1.
================== 1 failed, 159 passed, 3 skipped, 354 deselected, 1 xpassed, 19 warnings in 1054.11s (0:17:34) ===================

@psschwei psschwei marked this pull request as ready for review March 6, 2026 04:14
@psschwei psschwei requested a review from a team as a code owner March 6, 2026 04:14
@planetf1 (Contributor) commented Mar 6, 2026
This also closes #432 which I observed in testing

@jakelorocco (Contributor) left a comment


lgtm; tests pass and potential issues with telemetry spans were addressed
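The approved change can be sketched as follows. This is a minimal illustration only, not Mellea's actual code: `astream_fixed` and `post_process` here are hypothetical stand-ins for the real `ModelOutputThunk.astream` and its post-processing step. The point is the control flow: post-processing runs only on the successful completion path, so an exception raised mid-stream propagates directly instead of being masked by a secondary failure in a `finally` block.

```python
import asyncio


class GenerationError(Exception):
    """Stand-in for an error raised during model generation."""


async def astream_fixed(chunks, post_process, fail_at=None):
    # Simplified stand-in for ModelOutputThunk.astream: collect streamed
    # chunks, then post-process ONLY if the stream completed without error.
    collected = []
    for i, chunk in enumerate(chunks):
        if fail_at is not None and i == fail_at:
            # The generation error propagates directly; it is no longer
            # swallowed by a post_process call in a finally block.
            raise GenerationError(f"stream failed at chunk {i}")
        collected.append(chunk)
        await asyncio.sleep(0)  # yield control, as a real stream would
    # Successful completion path: safe to post-process the full output.
    return post_process("".join(collected))


def post_process(text):
    # A post-processing step that would itself fail on partial output --
    # exactly the secondary failure the fix avoids triggering.
    if not text.endswith("!"):
        raise ValueError("incomplete output")
    return text.upper()


ok = asyncio.run(astream_fixed(["hel", "lo", "!"], post_process))

err = None
try:
    asyncio.run(astream_fixed(["hel", "lo", "!"], post_process, fail_at=1))
except GenerationError as e:
    err = e
```

With the old `finally`-based flow, the second call would have invoked `post_process` on the partial string `"hel"` and raised `ValueError`, hiding the original `GenerationError`; with the fix, the caller sees the real cause.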

@psschwei psschwei added this pull request to the merge queue Mar 6, 2026
Merged via the queue into generative-computing:main with commit af25037 Mar 6, 2026
4 checks passed
