Skip to content

py/map: Convert map implementation to preserve insertion order. #10

Open
andrewleech wants to merge 3 commits intomasterfrom
py-map-ordered
Open

py/map: Convert map implementation to preserve insertion order. #10
andrewleech wants to merge 3 commits intomasterfrom
py-map-ordered

Conversation

@andrewleech
Copy link
Copy Markdown
Owner

Summary

py/map: Convert map implementation to preserve insertion order.

Testing

TBD

Trade-offs and Alternatives

TBD

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 25, 2026

Code size report:

Reference:  docs/esp32: Replace 'esptool.py' by 'esptool' in command line example. [e4920d6]
Comparison: tests: Add ordered dict tests and benchmarks. [merge of 2197cfe]
  mpy-cross:   +64 +0.017% 
   bare-arm:   +36 +0.064% 
minimal x86:  +128 +0.068% 
   unix x64:  +696 +0.081% standard
      stm32: +7440 +1.886% PYBV10[incl +396(bss)]
      esp32:  +356 +0.020% ESP32_GENERIC[incl -32(data)]
     mimxrt:  +320 +0.083% TEENSY40
        rp2:  +544 +0.059% RPI_PICO_W
       samd:  +320 +0.116% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:  +606 +0.133% VIRT_RV32

Repository owner deleted a comment from mpy-reviewer Bot Feb 25, 2026
@andrewleech
Copy link
Copy Markdown
Owner Author

/review

Copy link
Copy Markdown

@mpy-reviewer mpy-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several blocking issues need addressing before this can land. The most critical are: an out-of-bounds read in mp_map_init_copy that ASAN catches on dict_fixed.py, a memory leak from m_del being called with the wrong byte size on every map dealloc/rehash, and heap-locked allocation failures in settrace_stackless (heapalloc.py, extreme_exc.py, heapalloc_super.py crash with MemoryError: memory allocation failed, heap is locked). DEVLOG.md should not be committed to the repository. Also missing: a MICROPY_PY_MAP_ORDERED compile-time toggle to select between implementations (noted as TODO in the DEVLOG itself).

CI Issues: sanitize_address fails on basics/dict_fixed.py (out-of-bounds read in mp_map_init_copy). settrace_stackless fails on micropython/heapalloc.py, micropython/extreme_exc.py, and micropython/heapalloc_super.py — all crash with MemoryError: memory allocation failed, heap is locked, indicating the new implementation triggers heap allocation during previously allocation-free operations.

Comment thread DEVLOG.md Outdated
@@ -0,0 +1,314 @@
# MicroPython Ordered Dict Development Log
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't commit this to the repo.

Comment thread py/map.c Outdated
map->is_ordered = src->is_ordered;
size_t n = MP_MAP_TABLE_BYTE_SIZE(map->alloc);
map->table = m_malloc0(n);
memcpy(map->table, src->table, n);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src->table for an ordered map (including ROM maps) has only src->alloc * sizeof(mp_map_elem_t) bytes — there are no hash indices appended. Copying MP_MAP_TABLE_BYTE_SIZE(src->alloc) bytes is an out-of-bounds read. This is what ASAN catches in dict_fixed.py. Need:

size_t n = src->is_ordered
    ? src->alloc * sizeof(mp_map_elem_t)
    : MP_MAP_TABLE_BYTE_SIZE(src->alloc);

Comment thread py/map.c
mp_map_lookup(map, old_table[i].key, MP_MAP_LOOKUP_ADD_IF_NOT_FOUND)->value = old_table[i].value;
}
}
m_del(mp_map_elem_t, old_table, old_alloc);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_del(mp_map_elem_t, old_table, old_alloc) frees old_alloc * sizeof(mp_map_elem_t) bytes, but the allocation was MP_MAP_TABLE_BYTE_SIZE(old_alloc) bytes. Memory leak of MP_MAP_INDEX_SIZE(old_alloc) * old_alloc bytes on every rehash. Same bug in mp_map_compact and in mp_map_deinit (which uses the pre-existing m_del call that now also has the wrong size). Need m_free(old_table, MP_MAP_TABLE_BYTE_SIZE(old_alloc)) at all three sites.

Comment thread py/map.c
// Skip if dict is now empty (no point compacting an empty dict).
if (map->filled > 0 && map->used - map->filled > map->filled / 2) {
mp_map_compact(map);
// After compact, original slot is invalid; return value in first empty slot
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After mp_map_compact(), map->used == map->filled and &map->table[map->used] is the next insertion slot. Storing the deleted value there with key = MP_OBJ_NULL is fragile — any reentrant lookup that inserts will clobber that slot before the caller reads slot->value. Pass the value out via an out-param or save it before calling compact.

Comment thread py/obj.h Outdated
void mp_map_init(mp_map_t *map, size_t n);
void mp_map_init_fixed_table(mp_map_t *map, size_t n, const mp_obj_t *table);
void mp_map_init_copy(mp_map_t *map, const mp_map_t *src);
mp_map_t *mp_map_new(size_t n);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mp_map_new and mp_map_free are declared here but not defined anywhere in this diff.

Comment thread py/map.c Outdated
// Macros and functions to deal with key/value table and hash table.
// map->table points to the key/value table, then the hash table follows,
// which can be uint8_t, uint16_t, or uint32_t depending on allocation size.
#define MP_MAP_IS_UINT8(alloc) ((alloc) < 255)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be < 256. With 1-based indexing, alloc=255 stores max index 255 which fits in uint8_t.

Suggested change
#define MP_MAP_IS_UINT8(alloc) ((alloc) < 255)
#define MP_MAP_IS_UINT8(alloc) ((alloc) < 256)

Comment thread py/mpconfig.h Outdated

// Whether to support dicts with >65535 elements (requires uint32_t hash indices)
#ifndef MICROPY_PY_MAP_LARGE
#define MICROPY_PY_MAP_LARGE (1)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should default to 0 — most embedded targets won't need >65535-element dicts, and enabling this by default adds the uint32_t index paths to all builds.

Comment thread py/obj.h Outdated
size_t used : (8 * sizeof(size_t) - 3);
size_t used : (8 * sizeof(size_t) - 3); // next insertion index in dense array
size_t alloc;
size_t filled; // number of non-deleted entries (for O(1) len())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a full size_t filled costs +4 or +8 bytes per mp_map_t. Every module dict, class dict, and instance dict pays this. Is there a way to track it more cheaply, e.g., derive it from used minus the tombstone count?

@andrewleech andrewleech force-pushed the py-map-ordered branch 9 times, most recently from 740bcf8 to fb2071d Compare March 29, 2026 11:40
dpgeorge and others added 3 commits March 31, 2026 11:35
Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
@andrewleech andrewleech force-pushed the master branch 7 times, most recently from ce2c0c9 to 9f396bb Compare May 1, 2026 23:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants