Skip to content

fix: use compiled spdlog with hidden visibility to prevent symbol collision#332

Merged
jhchouuu merged 1 commit into
mainfrom
fix/spdlog-visibility-te-collision
May 21, 2026
Merged

fix: use compiled spdlog with hidden visibility to prevent symbol collision#332
jhchouuu merged 1 commit into
mainfrom
fix/spdlog-visibility-te-collision

Conversation

@jhchouuu
Copy link
Copy Markdown
Collaborator

@jhchouuu jhchouuu commented May 21, 2026

Summary

  • Switch spdlog from header-only (spdlog::spdlog_header_only) to compiled static library (spdlog::spdlog) with CXX_VISIBILITY_PRESET=hidden and VISIBILITY_INLINES_HIDDEN=ON, preventing spdlog symbols from leaking into MORI's dynamic symbol table.

Problem

When TransformerEngine is imported before MORI shmem init, the process crashes with free(): invalid size (SIGABRT).
Root cause: Both MORI and rocroller (loaded transitively via libtransformer_engine.solibhipblaslt.solibrocroller.so) statically link spdlog. TE loads its core library with ctypes.RTLD_GLOBAL, which promotes rocroller's spdlog symbols into the global namespace. MORI's spdlog calls (e.g. registry::initialize_logger) are then interposed by rocroller's copy, and cross-library object lifetime mismatch causes heap corruption.
GDB backtrace confirms the interposition:
#11 spdlog::details::registry::initialize_logger() ← from libmori_application.so #10 spdlog::sinks::ansicolor_sink::set_formatter() ← from librocroller.so (!) #7 spdlog::details::full_formatter::~full_formatter() ← from librocroller.so #6 free() → SIGABRT

Fix

Before After
target_compile_options(spdlog PRIVATE -fvisibility=hidden ...) set_target_properties(spdlog PROPERTIES CXX_VISIBILITY_PRESET hidden VISIBILITY_INLINES_HIDDEN ON)
spdlog::spdlog_header_only spdlog::spdlog

With compiled mode, spdlog is built into libspdlog.a with hidden visibility. When linked into MORI's .so files, the spdlog symbols are present but hidden from the dynamic symbol table — they cannot be interposed by rocroller's copy.

Test plan

  • TE container: mori-only → PASS
  • TE container: mori-then-te → PASS
  • TE container: te-then-mori → PASS (was CRASH)
  • JAX container: BUILD_XLA_FFI_OPS=ON pip install -e . → PASS
  • JAX container: AOT .hsaco kernel generation → PASS

…lision

Both MORI and rocroller (loaded via hipblaslt/TransformerEngine) statically
link spdlog. When TransformerEngine loads libtransformer_engine.so with
RTLD_GLOBAL, rocroller's spdlog symbols enter the global namespace and
interpose MORI's spdlog calls (e.g. registry::initialize_logger dispatches
into rocroller's set_formatter), causing heap corruption (free(): invalid
size / SIGABRT).

Switch from spdlog::spdlog_header_only to the compiled spdlog::spdlog
target and set CXX_VISIBILITY_PRESET=hidden + VISIBILITY_INLINES_HIDDEN=ON.
This compiles spdlog into a static archive with hidden visibility; when
linked into MORI's .so files, the spdlog symbols are not exported to the
dynamic symbol table and cannot be interposed.

The previous header-only mode (introduced in aceb70c for an XLA build fix)
made target_compile_options on the spdlog target ineffective because MORI
never linked the compiled spdlog library — the visibility flags were applied
to an unused target. The XLA build concern no longer applies after the
header decoupling work in commits 53a8513 and 40c0820.

Tested: TE (all 3 init orders PASS), JAX/XLA (BUILD_XLA_FFI_OPS=ON PASS).
@jhchouuu jhchouuu force-pushed the fix/spdlog-visibility-te-collision branch from 6d0a579 to a805e11 Compare May 21, 2026 06:12
@jhchouuu jhchouuu requested review from Copilot and pemeliya May 21, 2026 06:14
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts MORI’s build configuration to prevent runtime crashes caused by cross-library spdlog symbol interposition when other dependencies (e.g., rocroller via hipblaslt/TransformerEngine loaded with RTLD_GLOBAL) also statically link spdlog.

Changes:

  • Switch MORI’s logging dependency from spdlog::spdlog_header_only to the compiled spdlog::spdlog target.
  • Apply hidden symbol visibility settings to the spdlog target via CXX_VISIBILITY_PRESET=hidden and VISIBILITY_INLINES_HIDDEN=ON.
  • Add detailed build-system documentation in CMakeLists.txt explaining the symbol-collision failure mode being addressed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jhchouuu jhchouuu merged commit 8af97fc into main May 21, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants