Fix #571: Loose-island collider generation hangs and crashes for large assets#603
Fix #571: Loose-island collider generation hangs and crashes for large assets#603Weisl merged 13 commits intoWeisl:mainfrom
Conversation
TestGetFaceIslands covers: single triangle, two disconnected triangles, adjacent faces forming one island, many disconnected islands, and a 50x50 quad grid (2500 connected faces, one island). Two performance benchmarks (always-pass, print stats to stdout): test_performance_large_connected_island: 50x50 quad grid, 2500 connected quad faces, 50 trials. Exercises traversal and vertex-deduplication cost on a single large island. test_performance_many_disconnected_islands: 11,000 disconnected triangles, 50 trials. Exercises the high-island-count case; with the initial recursive implementation this benchmark raises RecursionError (get_face_islands recurses once per island), documenting the failure mode that the following commit fixes. _PERF_VERSION = 1 marks the baseline for the initial recursive implementation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Convert _get_face_islands() from recursive to iterative, using BMFace.tag (set by _get_linked_faces() during traversal) to skip already-processed faces. Rename to _get_face_islands() as it has no production call sites (the one external call, test_mesh_split_by_island.py, is updated in the same commit). Fix the mutable default argument (face_islands=[]) to face_islands=None: Python evaluates default argument values once at function definition time, so the same list object would be reused across calls, causing islands from a previous call to appear in subsequent ones. Convert _get_linked_faces() from recursive DFS to iterative DFS using an explicit stack. The recursive version was O(N^2) due to extend() copying sub-call results up the call stack at every level, and would also hit Python's recursion limit for large connected islands, preventing local reproduction of the crash in Weisl#571. The iterative version is O(N) — each face is visited and appended exactly once. Since _get_linked_faces() no longer recurses, remove the recursion limit setup from _get_face_islands() and import sys. The artificially high recursion limit that _get_linked_faces() had been setting also masked a separate RecursionError in Welzl's sphere algorithm, which the reviewer found and was fixed in PR Weisl#594. Replace O(N^2) linear-scan vertex deduplication in construct_python_faces() with a dict keyed on BMVert objects for O(1) lookup. Remove the bm.faces.ensure_lookup_table() call from inside the island loop. The original recursive implementation called faces[0] on a BMFaceSeq, which is integer index access and does require ensure_lookup_table(). The iterative replacement uses `for face in faces:` instead, so neither _get_linked_faces nor construct_python_faces accesses faces by integer index any longer, making the call unnecessary. Update test import to match the renamed _get_face_islands(). Bump _PERF_VERSION to 2 to mark the new performance baseline. Measured on an Apple M4 Max (Blender 5.0.1, 50x50 quad grid, 50 trials): Before (v1, recursive + O(N^2) vertex scan): mean: 150.807 ms median: 149.910 ms std dev: 2.614 ms min: 147.464 ms max: 160.670 ms After (v2, iterative + O(1) dict lookup): mean: 2.030 ms median: 1.776 ms std dev: 1.030 ms min: 1.700 ms max: 6.486 ms ~84x speedup (median). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests three scenarios in the loose-island collider path: - create_objs_from_island must not leave the source object in edit mode - create_objs_from_island must return one object per disconnected island - apply_all_modifiers must only be called when my_use_modifier_stack is True TestApplyModifiersGuard drives get_pre_processed_mesh_objs directly via _OperatorSpy with create_objs_from_island patched to a no-op, so the modifier-application decision can be verified in isolation from the island-splitting step. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses failing case attached to Weisl#571. Bug 1 (root cause): apply_all_modifiers was called unconditionally at the loose-island split site, regardless of my_use_modifier_stack. Any object with a modifier therefore had it baked into the mesh before island detection even when "Use Modifier Stack" was OFF. Fix: guard apply_all_modifiers with `if self.my_use_modifier_stack`. Bug 2: create_objs_from_island switched to edit mode via mode_set + bmesh.from_edit_mesh, then called bm.free() on the resulting bmesh. That bmesh is owned by Blender's edit mesh system, so freeing it is incorrect. The mode switch also synchronously evaluates the full modifier stack for the viewport display, which is unnecessary and potentially expensive. The object was additionally left in edit mode after the function returned. Fix: use bmesh.new() + bm.from_mesh(obj.data) instead, which reads the same data without any mode switch and frees the caller-owned bmesh correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TestPrimitivePostprocessingDoesNotHideTmpMeshes: verifies that primitive_postprocessing() does not hide tmp_meshes on each call. Pre-fix it hides all N objects on every call, producing O(N²) total hide_set() calls for N islands. Fails pre-fix, will pass after the hide loop is moved to reset_to_initial_state(). - TestCustomSetParentPreservesTransform: regression guard for the custom_set_parent() fix. Verifies that after replacing the bpy.ops.object.parent_set() call with direct Python assignment, child.parent is set and child.matrix_world is preserved (±1e-5). Also includes test_child_world_position_when_location_set_without_depsgraph_update, a regression test that prevents a latent positional bug: when child.location is set after the last depsgraph evaluation (as cylinder and sphere colliders do), the cached matrix_world is stale and would cause custom_set_parent to place the child at the old cursor/origin position rather than the intended centre. The prefs lookup in primitive_postprocessing() requires the add-on to be registered. setUpClass patches _prim_mod.base_package to the extension key found in bpy.context.preferences.addons so no re-enable is needed. A skip guard handles environments where the add-on is absent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
custom_set_parent(): replace bpy.ops.object.parent_set() with direct Python
assignment. The operator triggers a full depsgraph evaluation on every call;
with N islands that is N full rebuilds, each growing more expensive as scene
objects accumulate.
The child's world matrix is built from child.location, child.rotation_euler,
and child.scale rather than child.matrix_world, because matrix_world is only
updated by the depsgraph. For cylinder and sphere colliders the location is
set after the last depsgraph-updating operation, so matrix_world is stale and
would capture the old cursor/origin position rather than the intended centre:
mtx = Matrix.LocRotScale(child.location, child.rotation_euler, child.scale)
child.parent = parent
child.matrix_parent_inverse = parent.matrix_world.inverted()
child.matrix_world = mtx
Box colliders are unaffected: their position is baked into mesh vertices (no
obj.location set) or set via obj.matrix_world = M (which updates the cache
immediately), so matrix_world was never stale when custom_set_parent ran.
primitive_postprocessing() / reset_to_initial_state(): move the tmp_meshes
hide loop out of primitive_postprocessing() (called once per collider) and
into reset_to_initial_state() (called once per execute()). The old placement
ran N iterations over N objects → N² hide_set() calls for N islands. For the
11k-island asset this was ~121 million calls. The loop now runs exactly once
with N calls total.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
unique_name() restarted its counter from 1 on every call. With N islands
all sharing the same name base (same parent), total bpy.data.objects
lookups were 1+2+…+N ≈ N²/2 — about 60 million for 11k islands.
Fix: add an optional cache parameter (a plain dict) that callers can pass
to resume the search from the previous position rather than restarting
from 1. The dict maps base name → last used count; each subsequent call
for the same base starts immediately after the last issued number.
The cache is wired through the instance:
__init__: self._naming_cache = {} (always initialised)
execute(): self._naming_cache = {} (fresh per generation)
collider_name: passes self._naming_cache as cache=
class_collider_name: forwards cache= to unique_name()
Direct callers outside the operator hierarchy (regenerate_name,
convert_to_shape, user_groups, preferences) pass no cache, preserving
their existing search-from-1 semantics.
TestUniqueNameDoesNotRestartCounterPerCall patches create_name_number with
a counting wrapper and verifies that N sequential unique_name() calls with
N pre-existing names make O(N) total calls, not O(N²).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…StructRNA Three test cases in TestRemoveObjectsCleansUpMeshData: 1. test_mesh_data_removed_with_objects: verifies that bpy.types.Mesh data blocks exclusively owned by removed objects (users == 1) are absent from bpy.data.meshes after the call. Pre-fix: bpy.data.objects.remove() is called per-object but leaves mesh data blocks orphaned. Fails with the current implementation. 2. test_shared_mesh_data_not_removed: verifies that mesh data shared by multiple objects is not removed. Passes with the current implementation (regression guard only). 3. test_handles_stale_object_reference: verifies that remove_objects() does not raise ReferenceError when called with a stale Python wrapper. get_pre_processed_mesh_objs() appends tmp_ob to self.tmp_meshes then removes it via the modifier-stack path; the next execute() cycle passes the stale reference back through remove_objects(). Fails with the current implementation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… StructRNA handling Three changes in a single final implementation: 1. Use bpy.data.batch_remove() instead of calling bpy.data.objects.remove() in a loop. batch_remove() processes all data-blocks in a single pass, avoiding per-object depsgraph invalidation. 2. Collect each object's associated bpy.types.Mesh data block (when users == 1, meaning the object owns it exclusively) and include it in the same batch_remove() call. Previously those blocks were left orphaned in bpy.data.meshes; for the loose-island case with ~11k islands this accumulated ~22k orphaned meshes per execute cycle. 3. Wrap per-object data access in try/except ReferenceError. get_pre_processed_mesh_objs() appends tmp_ob to self.tmp_meshes then removes it via the modifier-stack path; the stale Python wrapper remains in the list and accessing ob.data on a removed StructRNA raises ReferenceError. Stale entries are now skipped silently. Also consolidates the hand-rolled removal loop in cancel_cleanup() to use remove_objects(), so it gains the same batch behaviour and mesh cleanup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…firmation Two changes: 1. fix_inverse_matrix() gains update_depsgraph=True parameter. When False the final view_layer.update() is skipped, letting the caller perform one batch update after processing all objects. The standalone COLLISION_OT_FixColliderTransform operator continues to call it with the default True so its per-call behaviour is unchanged. 2. The confirmation loop in add_bounding_primitive.modal() is split into two passes: Pass 1 — origin recentre, custom rotation, modifier cleanup, display settings. Each collider is independent, so no depsgraph update is needed between iterations. Pass 2 — fix_inverse_matrix with update_depsgraph=False, bracketed by a single view_layer.update() before (so Pass-1 location changes are visible) and a single view_layer.update() after (to propagate matrix resets). For the 10k-island pathological case this reduces the number of depsgraph evaluations at confirmation from 2N ≈ 20,000 to 2. TestFixInverseMatrixUpdateDepsgraph verifies that the batched path (update_depsgraph=False + single update) produces identical vertex positions and matrix_world to the per-call path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
set_origin_to_center_of_mass() called evaluated_depsgraph_get() on every iteration. Each preceding obj.location = com assignment dirtied the depsgraph, so every subsequent call forced a full scene re-evaluation — O(N) per call × N colliders = O(N²) total. At 11k objects this caused the confirmation loop to hang for several minutes. Fix by adding an optional depsgraph parameter to set_origin_to_center_of_mass() and hoisting a single evaluated_depsgraph_get() call before the confirmation loop. Each object's evaluated data is still correct because it was unmodified when the depsgraph was fetched; the hidden per-iteration re-evaluations are eliminated entirely. TestSetOriginToCoMDepsgraphParam verifies that the pre-fetched depsgraph path produces vertex positions and object location identical to the per-call path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Hi, Test File: ’ll do some more testing on my end to better understand the root cause, but I wanted to share this with you in case you have a chance to look into it. Let me know if you need more details or if there’s anything else I can provide to help debug this. |
|
Interesting! This is what I see when I try that file locally, which as far as I can tell is the same before and after the changes. |
|
(As a note - I also gave it a try on a Windows machine and see the same there, too, so at least that much is ruled out.) So maybe also a screenshot or copy of your current settings for the add-on could help! |
|
Looks like this only happens on Linux—Windows is fine. Any idea why the OS would make a difference? windows_losse_parts.mp4 |
|
Is it possible it is just a graphics issue, though the colliders themselves are correct? E.g., if you produce the colliders on Linux, save them, and open the file on Windows, does it look correct? What about going the other direction? |
|
Just to confirm: Is the weird behavior with the array-of-cubes a regression introduced by these changes, or does it also appear like that in |
|
main.mp4 |
|
OK I will set up a Linux VM locally and see if I can reproduce the issue! Am I right you are using Ubuntu? |
|
It appears that both |
|
I am using Mint (Ubuntu based) |
|
Just a quick update: I did manage to get a VM setup with Mint today, and can reproduce the problem on my end, so should be able to get it figured out :) |
py_verts stores v.co (a live BMesh-backed reference) rather than v.co.copy() (an independent Vector). After bm.free() those references are dangling; on Linux/glibc freed pages are quickly zeroed or reused so vertices collapse to the origin. On Windows/macOS the allocator leaves pages intact longer, hiding the bug — which is why loose-island collider positions are wrong on Linux but correct on the other platforms. The aliasing tests zero all BMVert coords immediately after construct_python_faces() / _get_face_islands() return, then assert that py_verts still holds the original values. This is fully deterministic: if py_verts aliases live BMesh memory the zeroing is immediately visible in the stored entries; independent copies are unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…esh memory py_verts.append(v.co) stored a live mathutils.Vector backed by the BMesh's internal C memory pool. create_objs_from_island() calls bm.free() before iterating py_verts, leaving every stored reference dangling. On Linux/glibc the freed pages are quickly zeroed or reused, so all vertices collapse to the origin — producing colliders with wrong positions and sizes that differ from what was previewed. On Windows/macOS the allocator leaves freed pages intact long enough that the bug does not manifest. Fix: py_verts.append(v.co.copy()) — each entry is now an independent mathutils.Vector owned by Python, unaffected by bm.free(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Ok! I managed to fix it locally with d81fb2f. Please give it a try and see if it works again :) |
Weisl
left a comment
There was a problem hiding this comment.
The Use Loose Island command is working as expected now. Thank you for all your hard work. I appreciate the general performance and cleanup improvements, especially the significant reduction in dependency graph updates. This benefits the entire addon.
Looking at the code and checking the comments, I feel like I can learn a lot from you! Great update.
| def unique_name(name, digits=3, exclude=None): | ||
| """Function to find a unique name using a loop""" | ||
| count = 1 | ||
| def unique_name(name, digits=3, exclude=None, cache=None): |
There was a problem hiding this comment.
I'm having trouble understanding the cache parameter. It appears to be either "None" or an empty dictionary, but I'm not sure where it's set or used. I understand that it is used for counting in the naming process, but I don't know how.
There was a problem hiding this comment.
Hey! So the idea behind the default None value is just that other call sites outside add_bounding_primitive.py don't have to be updated to continue functioning as they were. (Though looking at this, there's only one other call site and it would probably also benefit from using the cache - I'll take care of that quick in a follow-up.)
Where we get a benefit is when we know we need to create a bunch of unique names in one go (i.e. tons of islands). In that case, we don't need/want to start counting from 1 every time we need a new name - we want to count from the last number we know we used. As you noted, the add_bounding_object base class basically creates an empty one when a new operator instance is created, but also each time the base execute() implementation is called. So we have a clean cache for each execute() call, which may get used thousands of times for that single run.





Issue
Generating colliders from a mesh with an EdgeSplit modifier and "Use Modifier Stack" disabled caused Simple Collider to hang indefinitely or crash with a
RecursionError. The root cause was thatapply_all_modifiers()was called unconditionally before island detection, splitting what should have been a single mesh into ~11k separate triangle islands. Once that many islands were in play, a cascade of quadratic and linear-per-island operations made the operator unusable. Post-hang generation also left thousands of orphaned mesh data blocks inbpy.data.meshes.The root causes were spread across several files and compounded each other; no single change was sufficient to unblock the pathological case.
Root causes and fixes
1. RecursionError + O(N²) island detection (
mesh_split_by_island.py)Problem:
_get_linked_faces()was a recursive DFS that hit Python's default recursion limit on large connected islands. Asys.setrecursionlimitworkaround masked a separateRecursionErrorin Welzl's algorithm (fixed in PR #594). Theextend()-based result accumulation was also O(N²) — every level of recursion copied partial results up the call stack.construct_python_faces()used a linear scan (v.co not in py_verts) for vertex deduplication, another O(N²) operation.get_face_islands()carried a mutable default argument (face_islands=[]) that caused islands from one call to bleed into the next.Fix: Convert both functions to iterative algorithms.
_get_linked_faces()uses an explicit stack andBMFace.tagto track visited faces — O(N) with no recursion._get_face_islands()iterates over all faces and skips already-tagged ones, eliminating the recursiveget_face_islands()wrapper and the mutable-default-argument bug.construct_python_faces()replaces the list scan with adictkeyed onBMVertobjects for O(1) lookup.bm.faces.ensure_lookup_table()is removed since the iterative code never accesses faces by integer index.Measured speedup (Apple M4 Max, Blender 5.0.1, 50×50 grid, 50 trials):
~84× speedup (median).
2. Modifiers applied unconditionally; edit-mode leak (
mesh_split_by_island.py,add_bounding_primitive.py)Problem:
apply_all_modifiers()was called at the loose-island split site regardless ofmy_use_modifier_stack. Any object with a modifier had it baked into the mesh before island detection even when "Use Modifier Stack" was OFF.create_objs_from_island()switched to edit mode viabpy.ops.object.mode_set()+bmesh.from_edit_mesh(), then calledbm.free()on the resulting bmesh — which is owned by Blender's edit-mesh system and must not be freed by the caller. The object was also left in edit mode after the function returned.Fix: Guard
apply_all_modifiers()withif self.my_use_modifier_stack. Replace themode_set + from_edit_meshpattern increate_objs_from_island()withbmesh.new() + bm.from_mesh(obj.data), which reads the same mesh data without any mode switch, owns the bmesh correctly, and leaves the object in OBJECT mode.3. O(N²) hide_set calls; O(N) depsgraph evaluations during parenting (
add_bounding_primitive.py)Problem:
primitive_postprocessing()(called once per collider) iteratedself.tmp_meshesand calledhide_set(True)on every entry — N iterations over N objects = N² calls for N islands (~121 million for 11k islands).custom_set_parent()usedbpy.ops.object.parent_set(), which triggers a full depsgraph evaluation on every call; with N islands that is N full rebuilds, each growing more expensive as the scene accumulates objects.Fix: Move the
tmp_mesheshide loop fromprimitive_postprocessing()toreset_to_initial_state()(called once perexecute()), reducing N² calls to N. Replacebpy.ops.object.parent_set()with direct Python matrix assignment:child.matrix_worldis only updated by the depsgraph; for cylinder and sphere collidersobj.locationis set after the last depsgraph-updating operation, somatrix_worldis stale and would capture the old cursor/origin position rather than the intended center.child.location,child.rotation_euler, andchild.scaleare always current because they are stored directly on the object, not computed by the depsgraph. This defers all depsgraph evaluation to Blender's natural update cycle.4. O(N²) name search in
unique_name()(add_bounding_primitive.py)Problem:
unique_name()reset its counter to 1 on every call. With N islands all sharing the same name base, totalbpy.data.objectslookups were 1 + 2 + … + N ≈ N²/2 — about 60 million for 11k islands.Fix: Add an optional
cacheparameter (a plaindictmapping base name → last used count). Each subsequent call for the same base starts from the cached position rather than restarting at 1, reducing total lookups to O(N). The cache is initialised in__init__(), reset to{}at the start of eachexecute(), and threaded throughcollider_name()andclass_collider_name(). All callers outside the operator hierarchy (e.g.regenerate_name,convert_to_shape) pass no cache and retain their existing search-from-1 semantics.5. Orphaned mesh data + stale StructRNA in
remove_objects()(add_bounding_primitive.py)Problem: The per-object
bpy.data.objects.remove()loop left associatedbpy.types.Meshdata blocks orphaned inbpy.data.meshes. For the 11k-island case this accumulated ~22k orphaned meshes per execute cycle. Additionally,get_pre_processed_mesh_objs()appendstmp_obtoself.tmp_meshesthen removes it via the modifier-stack path; the stale Python StructRNA wrapper remained in the list, and accessingob.dataon it raisedReferenceErroron the next call toremove_objects().Fix: Switch to
bpy.data.batch_remove(). Before building the removal list, collect each object'sbpy.types.Meshwhenob.data.users == 1(exclusively owned) and add it to the same batch. Wrap per-object data access intry/except ReferenceErrorto silently skip stale wrappers. Also consolidate the hand-rolled removal loop incancel_cleanup()to callremove_objects(), giving it the same batch behaviour and mesh cleanup.6. O(N²) depsgraph re-evaluations in the confirmation loop (
add_bounding_primitive.py,utility_operators.py)Problem: Two independent sources of per-object depsgraph updates in the confirmation loop compounded to 2N evaluations for N colliders (~20k for 10k islands):
set_origin_to_center_of_mass()calledevaluated_depsgraph_get()internally; each precedingobj.location = comdirtied the depsgraph, so every call forced a full scene re-evaluation (O(N²) total).fix_inverse_matrix()calledview_layer.update()once per object (N evaluations), and the confirmation loop called it again before eachfix_inverse_matrix()call (another N).Fix: Add an optional
depsgraphparameter toset_origin_to_center_of_mass(); hoist a singleevaluated_depsgraph_get()call before the loop and pass it to every iteration. Addupdate_depsgraph=Truetofix_inverse_matrix(); set it toFalsein the confirmation loop and bracket the entire pass with twoview_layer.update()calls (one before, one after), reducing 2N evaluations to 2.Test status
tests/test_mesh_split_by_island.py— 7/7 passtest_single_islandtest_two_islandstest_many_islandstest_connected_faces_single_islandtest_large_connected_islandtest_performance_large_connected_islandtest_performance_many_disconnected_islandstests/test_bounding_object_operator.py— 14/14 pass (1 skipped without macOS extension support)TestApplyModifiersGuard.test_modifier_not_applied_when_stack_disabledTestApplyModifiersGuard.test_modifier_applied_when_stack_enabledTestCreateObjsFromIsland.test_does_not_leave_object_in_edit_modeTestCreateObjsFromIsland.test_returns_one_object_per_islandTestCustomSetParentPreservesTransform.test_child_parent_is_setTestCustomSetParentPreservesTransform.test_child_world_transform_preservedTestCustomSetParentPreservesTransform.test_child_world_position_when_location_set_without_depsgraph_updateTestFixInverseMatrixUpdateDepsgraph.test_batched_result_matches_individualTestPrimitivePostprocessingDoesNotHideTmpMeshes.test_does_not_hide_tmp_meshesTestRemoveObjectsCleansUpMeshData.test_mesh_data_removed_with_objectsTestRemoveObjectsCleansUpMeshData.test_shared_mesh_data_not_removedTestRemoveObjectsCleansUpMeshData.test_handles_stale_object_referenceTestSetOriginToCoMDepsgraphParam.test_prefetched_depsgraph_matches_resultTestUniqueNameDoesNotRestartCounterPerCall.test_call_count_is_linear_not_quadraticTestPrimitivePostprocessingDoesNotHideTmpMeshesrequires the add-on to be fully enabled so that its customScene.active_physics_materialproperty is registered. On platforms not listed inblender_manifest.toml(currentlywindows-x64andlinux-x64) the extension fails to enable and the test is skipped cleanly. A later PR will addmacos-arm64support, but I confirmed the test passes on macOS with local changes.Files changed
bmesh_operations/mesh_split_by_island.pycollider_shapes/add_bounding_primitive.pycollider_operators/utility_operators.pyupdate_depsgraphparameter forfix_inverse_matrix()tests/test_mesh_split_by_island.pytests/test_bounding_object_operator.py