Feat: add the to_dict() #1288

Open
rodrigobnogueira wants to merge 26 commits into aio-libs:master from rodrigobnogueira:feat/to-dict

Conversation

@rodrigobnogueira
Member

What do these changes do?

Implement to_dict() methods for MultiDict, CIMultiDict, MultiDictProxy, and CIMultiDictProxy. This method groups values with the same key into a list.

Implementation Details

  • C Extension: Added multidict_to_dict in _multidict.c with direct PyDict and PyList manipulation for performance.
  • Python Fallback: Added generic to_dict in _multidict_py.py.
  • Tests: Added comprehensive test coverage in tests/test_to_dict.py, including verification of order preservation, case-insensitivity, mixed types, and proxy mutation isolation.
  • Memory Safety: Added a new isolated leak check (tests/isolated/multidict_to_dict.py) and integrated it into the CI suite (tests/test_leaks.py). Validated by temporarily inserting an intentional leak into the code and confirming the check caught it (the leak was removed afterward).

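The isolated leak-check approach described above can be sketched in a few lines. This is a hypothetical, simplified harness (the repository's actual tests/test_leaks.py may work differently, e.g. running the check in a subprocess); the function name check_no_leak and its parameters are made up for the sketch, and only the stdlib gc module is assumed:

```python
import gc


def check_no_leak(operation, warmup=5, iterations=1000, tolerance=50):
    """Crude leak check: run `operation` repeatedly and compare the
    number of gc-tracked objects before and after the run.

    A genuine per-call leak would grow the object count roughly
    linearly with `iterations`, far beyond `tolerance`.
    """
    # Warm up so caches and interned objects are created up front.
    for _ in range(warmup):
        operation()
    gc.collect()
    before = len(gc.get_objects())

    for _ in range(iterations):
        operation()
    gc.collect()
    after = len(gc.get_objects())

    assert after - before < tolerance, (
        f"possible leak: {after - before} new objects after {iterations} calls"
    )


# Example: grouping pairs into a dict should not accumulate objects.
data = [("a", 1), ("b", 2), ("a", 3)]
check_no_leak(lambda: {k: [v for kk, v in data if kk == k] for k, _ in data})
```

The point of running the operation many times is that a leak of even one reference per call dominates ordinary allocator noise, which is what the tolerance absorbs.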
Example

md = MultiDict([("a", 1), ("b", 2), ("a", 3)])
md.to_dict()

# Result: {'a': [1, 3], 'b': [2]}
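For reference, the grouping logic of the pure-Python fallback can be sketched like this. It is a minimal illustration operating on plain (key, value) pairs rather than a real MultiDict; the actual code in _multidict_py.py may differ:

```python
def to_dict(items):
    """Group values that share a key into a list, preserving the order
    in which keys and values are first encountered.

    `items` stands in for the (key, value) pairs a MultiDict yields;
    this is an illustrative sketch, not the library's actual code.
    """
    result = {}
    for key, value in items:
        # setdefault creates the list on first sight of a key,
        # then appends subsequent values in encounter order.
        result.setdefault(key, []).append(value)
    return result


print(to_dict([("a", 1), ("b", 2), ("a", 3)]))
# {'a': [1, 3], 'b': [2]}
```

A single pass with setdefault keeps the transformation O(n) in the number of stored items.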

Are there changes in behavior for the user?

No existing behavior changes. This adds a new method, to_dict().

Related issue number

Fixes #783 (Add to_dict method)

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes

rodrigo.nogueira added 2 commits January 22, 2026 14:36
The md_pop_one function in hashtable.h was missing a Py_DECREF(identity)
call when the key was not found. This caused a reference leak on the
identity object, which is particularly problematic for CIMultiDict where
a new lowercase string is created for each lookup.

Fixes: aio-libs#1273
@codspeed-hq

codspeed-hq bot commented Jan 25, 2026

Merging this PR will not alter performance

✅ 245 untouched benchmarks


Comparing rodrigobnogueira:feat/to-dict (a5920ec) with master (81fc6b9)

Open in CodSpeed

@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Jan 25, 2026
@codecov

codecov bot commented Jan 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.86%. Comparing base (81fc6b9) to head (a5920ec).
⚠️ Report is 9 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #1288    +/-   ##
========================================
  Coverage   99.85%   99.86%            
========================================
  Files          26       28     +2     
  Lines        3513     3657   +144     
  Branches      253      258     +5     
========================================
+ Hits         3508     3652   +144     
  Misses          3        3            
  Partials        2        2            
| Flag   | Coverage Δ |
|--------|------------|
| CI-GHA | 99.86% <100.00%> (+<0.01%) ⬆️ |
| pytest | 99.86% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@rodrigobnogueira
Member Author

About the Coveralls Coverage Metrics
The Coveralls report shows "40.44% coverage on new lines". Coveralls is reporting MyPy type-precision coverage there, not pytest test-execution coverage.

@rodrigobnogueira
Member Author

rodrigobnogueira commented Jan 26, 2026

"""Benchmark comparing C vs Python implementation of to_dict()."""

import timeit
import gc
from multidict._multidict import MultiDict as CMultiDict
from multidict._multidict_py import MultiDict as PyMultiDict


def create_multidict(cls, num_keys: int, vals_per_key: int):
    items = [(f"key{i % num_keys}", f"value{i}") for i in range(num_keys * vals_per_key)]
    return cls(items)


def benchmark_to_dict(md, iterations: int = 1000) -> float:
    # Disable GC during timing to reduce noise
    gc_old = gc.isenabled()
    gc.disable()
    try:
        def run():
            md.to_dict()
        return timeit.timeit(run, number=iterations)
    finally:
        if gc_old:
            gc.enable()


def run_benchmark(num_keys: int, vals_per_key: int, iterations: int = 1000) -> None:
    total_items = num_keys * vals_per_key
    c_md = create_multidict(CMultiDict, num_keys, vals_per_key)
    py_md = create_multidict(PyMultiDict, num_keys, vals_per_key)
    
    c_time = benchmark_to_dict(c_md, iterations)
    py_time = benchmark_to_dict(py_md, iterations)
    
    speedup = py_time / c_time if c_time > 0 else 0
    
    print(f"| {num_keys:>6} | {vals_per_key:>6} | {total_items:>7} | {c_time*1000:>10.2f} | {py_time*1000:>10.2f} | {speedup:>7.2f}x |")


def main() -> None:
    print("\n" + "=" * 80)
    print("to_dict() Benchmark: C Extension vs Pure Python")
    print("=" * 80)
    print(f"\n{'Iterations per test:':30} 1000")
    print(f"{'Time unit:':30} milliseconds (total for 1000 calls)\n")
    
    print("|  Keys  | V/Key  |  Total  |   C (ms)   |   Py (ms)  | Speedup |")
    print("|--------|--------|---------|------------|------------|---------|")
    
    scenarios = [
        (10, 1),
        (10, 10),
        (100, 10),
        (1000, 10),
        (100, 100),
        # Larger datasets to target >2s execution time for Python
        (2000, 20),   # 40,000 items
        (5000, 10),   # 50,000 items
    ]
    
    for num_keys, vals_per_key in scenarios:
        run_benchmark(num_keys, vals_per_key)
    
    print("\n" + "=" * 80)
    print("Speedup = Python time / C time (higher is better for C)")
    print("=" * 80 + "\n")


if __name__ == "__main__":
    main()

================================================================================
to_dict() Benchmark: C Extension vs Pure Python
================================================================================

Iterations per test: 1000
Time unit: milliseconds (total for 1000 calls)

|  Keys  | V/Key  |  Total  |   C (ms)   |   Py (ms)  | Speedup |
|--------|--------|---------|------------|------------|---------|
|     10 |      1 |      10 |       0.73 |       1.77 |   2.43x |
|     10 |     10 |     100 |       3.58 |       9.16 |   2.56x |
|    100 |     10 |    1000 |      37.63 |      93.52 |   2.49x |
|   1000 |     10 |   10000 |     456.58 |     858.80 |   1.88x |
|    100 |    100 |   10000 |     315.51 |     715.76 |   2.27x |
|   2000 |     20 |   40000 |    1694.38 |    3462.59 |   2.04x |
|   5000 |     10 |   50000 |    2684.84 |    4875.64 |   1.82x |

Speedup = Python time / C time (higher is better for C)

@rodrigobnogueira rodrigobnogueira marked this pull request as ready for review January 26, 2026 02:27
Member

@Vizonex Vizonex left a comment

I would've applied some smarter logic, like traversing all the keys and gathering the values with something like getall(...), since it would be a bit faster due to spending less time checking whether or not something is a list. Otherwise I think this is a good start in the right direction. There might be a quicker method you can try, which I've attempted to visualize for you; I haven't benchmarked it, but just in case, here was my idea.

Its advantage mainly revolves around not needing to check whether or not a key has already been used.

from multidict import MultiDict
# in case you need a visual of the logic I'm trying to explain
def to_dict(md: MultiDict[str]):
    return {k: md.getall(k) for k in md.keys()}

@rodrigobnogueira
Member Author

rodrigobnogueira commented Jan 27, 2026

Hello @Vizonex ,

I've posted about using getall in this comment: #783 (comment)

I haven't looked into the getall details yet, but it's not viable to use as-is in the to_dict() transformation: I couldn't even run the most demanding cases of the benchmark I posted in a previous message.

Using the getall function we get:

================================================================================
to_dict() Benchmark: C Extension vs Pure Python
================================================================================

Iterations per test: 1000
Time unit: milliseconds (total for 1000 calls)

|  Keys  | V/Key  |  Total  |   C (ms)   |   Py (ms)  | Speedup |
|--------|--------|---------|------------|------------|---------|
|     10 |      1 |      10 |       0.93 |      18.91 |  20.27x |
|     10 |     10 |     100 |       5.66 |     495.79 |  87.56x |
|    100 |     10 |    1000 |      41.58 |    4644.99 | 111.72x |
|   1000 |     10 |   10000 |     424.65 |   53012.13 | 124.84x |

Speedup = Python time / C time (higher is better for C)
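The blow-up is consistent with the shape of that comprehension: keys() yields one entry per stored item (duplicates included), and each getall(...) call is itself a scan over the stored items, so the total work grows roughly quadratically, while a merged implementation groups everything in one pass. Here is a stdlib-only illustration of the two shapes, using a plain list of pairs in place of a MultiDict (the function names are made up for the sketch):

```python
def to_dict_getall_style(items):
    """Quadratic shape: one full scan of `items` per key occurrence,
    mirroring {k: md.getall(k) for k in md.keys()} on a MultiDict."""
    def getall(key):
        return [v for k, v in items if k == key]  # O(n) scan per call
    return {k: getall(k) for k, _ in items}       # one call per stored item

def to_dict_single_pass(items):
    """Linear shape: group values in a single traversal."""
    result = {}
    for key, value in items:
        result.setdefault(key, []).append(value)
    return result

pairs = [("a", 1), ("b", 2), ("a", 3)]
assert to_dict_getall_style(pairs) == to_dict_single_pass(pairs) == {"a": [1, 3], "b": [2]}
```

Both produce the same result; only the asymptotic cost differs, which would explain why the getall variant could not complete the larger benchmark scenarios.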

Member

@Vizonex Vizonex left a comment

Converter looks good. I have no complaints with this one. Great job with the pytest module also.


Labels

bot:chronographer:provided There is a change note present in this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhancement: to_dict method

3 participants