Conversation
The md_pop_one function in hashtable.h was missing a Py_DECREF(identity) call when the key was not found. This caused a reference leak on the identity object, which is particularly problematic for CIMultiDict where a new lowercase string is created for each lookup. Fixes: aio-libs#1273
for more information, see https://pre-commit.ci
0fe5e80 to
ed6d373
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1288 +/- ##
========================================
Coverage 99.85% 99.86%
========================================
Files 26 28 +2
Lines 3513 3657 +144
Branches 253 258 +5
========================================
+ Hits 3508 3652 +144
Misses 3 3
Partials 2 2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
About the Coveralls Coverage Metrics |
"""Benchmark comparing C vs Python implementation of to_dict()."""
import timeit
import gc
from multidict._multidict import MultiDict as CMultiDict
from multidict._multidict_py import MultiDict as PyMultiDict
def create_multidict(cls, num_keys: int, vals_per_key: int):
items = [(f"key{i % num_keys}", f"value{i}") for i in range(num_keys * vals_per_key)]
return cls(items)
def benchmark_to_dict(md, iterations: int = 1000) -> float:
# Disable GC during timing to reduce noise
gc_old = gc.isenabled()
gc.disable()
try:
def run():
md.to_dict()
return timeit.timeit(run, number=iterations)
finally:
if gc_old:
gc.enable()
def run_benchmark(num_keys: int, vals_per_key: int, iterations: int = 1000) -> None:
total_items = num_keys * vals_per_key
c_md = create_multidict(CMultiDict, num_keys, vals_per_key)
py_md = create_multidict(PyMultiDict, num_keys, vals_per_key)
c_time = benchmark_to_dict(c_md, iterations)
py_time = benchmark_to_dict(py_md, iterations)
speedup = py_time / c_time if c_time > 0 else 0
print(f"| {num_keys:>6} | {vals_per_key:>6} | {total_items:>7} | {c_time*1000:>10.2f} | {py_time*1000:>10.2f} | {speedup:>7.2f}x |")
def main() -> None:
print("\n" + "=" * 80)
print("to_dict() Benchmark: C Extension vs Pure Python")
print("=" * 80)
print(f"\n{'Iterations per test:':30} 1000")
print(f"{'Time unit:':30} milliseconds (total for 1000 calls)\n")
print("| Keys | V/Key | Total | C (ms) | Py (ms) | Speedup |")
print("|--------|--------|---------|------------|------------|---------|")
scenarios = [
(10, 1),
(10, 10),
(100, 10),
(1000, 10),
(100, 100),
# Larger datasets to target >2s execution time for Python
(2000, 20), # 40,000 items
(5000, 10), # 50,000 items
]
for num_keys, vals_per_key in scenarios:
run_benchmark(num_keys, vals_per_key)
print("\n" + "=" * 80)
print("Speedup = Python time / C time (higher is better for C)")
print("=" * 80 + "\n")
if __name__ == "__main__":
main()to_dict() Benchmark: C Extension vs Pure PythonIterations per test: 1000
=====================================
|
There was a problem hiding this comment.
I would've applied some smarter logic like traverse all keys and gathering all the values up with something like getall(...) when going over keys since it would be a bit faster due to less time needing to check weather or not something is a list. Otherwise I think this is a good start in the right direction. There might be a quicker method though that you can try which I've attempted to visualize for you although I haven't benchmarked it but just incase here was my idea.
It's advantages mainly revolve around not needing to check on weather or not a key has already been used.
from multidict import MultiDict
# incase you need a visual of the logic I'm trying to explain
def to_dict(md: MutliDict[str]):
return {k: md.getall(k) for k in md.keys()}|
Hello @Vizonex , I've posted about using I haven't look into the Using the ================================================================================
|
| Keys | V/Key | Total | C (ms) | Py (ms) | Speedup |
|---|---|---|---|---|---|
| 10 | 1 | 10 | 0.93 | 18.91 | 20.27x |
| 10 | 10 | 100 | 5.66 | 495.79 | 87.56x |
| 100 | 10 | 1000 | 41.58 | 4644.99 | 111.72x |
| 1000 | 10 | 10000 | 424.65 | 53012.13 | 124.84x |
================================================================================
Speedup = Python time / C time (higher is better for C)
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
ee4cb7d to
d58c07f
Compare
Vizonex
left a comment
There was a problem hiding this comment.
Converter looks good. I have no complaints with this one. Great job with the pytest module also.
What do these changes do?
Implement
to_dict()methods forMultiDict,CIMultiDict,MultiDictProxy, andCIMultiDictProxy. This method groups values with the same key into a list.Implementation Details
multidict_to_dictin_multidict.cwith directPyDictandPyListmanipulation for performance.to_dictin_multidict_py.py.tests/test_to_dict.py, including verification of order preservation, case-insensitivity, mixed types, and proxy mutation isolation.tests/isolated/multidict_to_dict.py) and integrated it into the CI suite (tests/test_leaks.py). Validated with intentional leak inserted in code (removed after test).Example
Are there changes in behavior for the user?
No existing behavior changes. This adds a new method
to_dict()Related issue number
Fixes #783 (Add
to_dictmethod)Checklist