Commit 4ed4014

Authored by pablogsal, encukou, and vstinner
gh-149202: Fix frame pointer unwinding on s390x and ARM (GH-149362)
-fno-omit-frame-pointer is not enough to make every target walkable by the simple manual frame pointer unwinder. The helper used by test_frame_pointer_unwind used to assume the frame pointer named a two-word record where fp[0] was the previous frame pointer and fp[1] was the return address. That is only the generic layout used by some targets. This patch keeps that default, but moves the slots behind named offsets so architecture-specific layouts can describe where the backchain and return address really live.

On s390x, GCC and Clang do not emit a usable backchain unless -mbackchain is enabled. Without it, the unwinder stops at the current C frame and the test reports no Python frames. Once backchains are present, the helper must also stop at the current thread's known C stack bounds; otherwise it can follow the final backchain far enough to dereference an invalid frame and segfault.

For Linux s390x backchain frames, the documented z/Architecture stack-frame layout saves r14, the return-address register, at byte offset 112 from the frame pointer, so read the return address from that named slot instead of fp[1].

The 112-byte offset comes from Linux's s390 debugging documentation: its Stack Frame Layout table shows z/Architecture backchain frames with the backchain at offset 0 and saved r14 of the caller function at offset 112:
https://www.kernel.org/doc/html/v5.3/s390/debugging390.html#stack-frame-layout

This helper remains scoped to Linux s390x backchain frames. GNU SFrame's s390x notes state that the s390x ELF ABI does not generally mandate where RA and FP are saved, or whether they are saved at all:
https://sourceware.org/binutils/docs/sframe-spec.html#s390x

As Jens Remus noted, -fno-omit-frame-pointer is not needed when -mbackchain is present.

On 32-bit ARM, GCC defaults to Thumb mode on common armhf toolchains. The Thumb prologue keeps the saved frame pointer and link register at offsets that depend on the generated frame, which breaks the fp[0]/fp[1] walk used by the helper. Use -marm when it is supported for frame-pointer builds, and teach the helper the GCC ARM-mode slots where the previous frame pointer is at fp[-1] and the saved LR return address is at fp[0].

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
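The idea of putting the two slots behind named offsets can be illustrated with a toy model. The sketch below is not the helper from this commit: it simulates memory as a Python dict, assumes a 64-bit word and a downward-growing stack, and uses hypothetical names (`FrameLayout`, `walk`, `build_stack`). Only the offset values (0/1 for the generic layout, -1/0 for GCC ARM mode, 0 and 112 bytes for s390x backchain frames) come from the commit message.

```python
from dataclasses import dataclass

WORD = 8  # assumed 64-bit word size for this illustration


@dataclass(frozen=True)
class FrameLayout:
    """Word offsets, relative to fp, of the two slots the walker reads."""
    next_offset: int    # slot holding the caller's frame pointer
    return_offset: int  # slot holding the return address


GENERIC = FrameLayout(next_offset=0, return_offset=1)
GCC_ARM_MODE = FrameLayout(next_offset=-1, return_offset=0)
S390X_BACKCHAIN = FrameLayout(next_offset=0, return_offset=112 // WORD)


def walk(memory, fp, layout, max_frames=200):
    """Collect return addresses by following saved frame pointers.

    `memory` is a toy model: a dict mapping byte addresses to word values.
    """
    return_addrs = []
    while len(return_addrs) < max_frames:
        next_slot = fp + layout.next_offset * WORD
        return_slot = fp + layout.return_offset * WORD
        if next_slot not in memory or return_slot not in memory:
            break  # stand-in for "slot is outside the known stack"
        return_addrs.append(memory[return_slot])
        next_fp = memory[next_slot]
        if next_fp <= fp:  # callers must live at higher addresses
            break
        fp = next_fp
    return return_addrs


def build_stack(layout, frames):
    """Build toy memory for (fp, return_addr) pairs, innermost frame first."""
    memory = {}
    for i, (fp, ret) in enumerate(frames):
        next_fp = frames[i + 1][0] if i + 1 < len(frames) else 0
        memory[fp + layout.next_offset * WORD] = next_fp
        memory[fp + layout.return_offset * WORD] = ret
    return memory


frames = [(0x7000, 0xAAA), (0x7100, 0xBBB), (0x7200, 0xCCC)]
for layout in (GENERIC, GCC_ARM_MODE, S390X_BACKCHAIN):
    stack = build_stack(layout, frames)
    assert walk(stack, 0x7000, layout) == [0xAAA, 0xBBB, 0xCCC]
```

In this model, walking a stack built with one layout using a different layout finds no return address in the expected slot and yields nothing, which is the kind of mismatch the named offsets are meant to avoid.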
1 parent 646853d commit 4ed4014

9 files changed

Lines changed: 302 additions & 26 deletions

Doc/howto/perf_profiling.rst

Lines changed: 2 additions & 2 deletions
@@ -218,8 +218,8 @@ How to obtain the best results
 ------------------------------

 For best results, keep frame pointers enabled. On supported GCC-compatible
-toolchains, CPython builds itself with ``-fno-omit-frame-pointer`` and, when
-available, ``-mno-omit-leaf-frame-pointer`` by default. These flags allow
+toolchains, CPython builds itself with ``-fno-omit-frame-pointer`` and similar
+flags (see :option:`--without-frame-pointers` for details). These flags allow
 profilers to unwind using only the frame pointer and not on DWARF debug
 information. This is because as the code that is interposed to allow ``perf``
 support is dynamically generated it doesn't have any DWARF debugging information

Doc/using/configure.rst

Lines changed: 13 additions & 5 deletions
@@ -784,11 +784,19 @@ also be used to improve performance.

 Disable frame pointers, which are enabled by default (see :pep:`831`).

-By default, the build appends ``-fno-omit-frame-pointer`` (and
-``-mno-omit-leaf-frame-pointer`` when the compiler supports it) to
-``BASECFLAGS`` so profilers, debuggers, and system tracing tools
-(``perf``, ``eBPF``, ``dtrace``, ``gdb``) can walk the C call stack
-without DWARF metadata. The flags propagate to third-party C
+By default, the build appends flags to generate frame or backchain
+pointers to ``BASECFLAGS``:
+
+- ``-fno-omit-frame-pointer`` and/or ``-mno-omit-leaf-frame-pointer``
+are added when the compiler supports them.
+- ``-marm`` is added on 32-bit ARM when supported,
+- on s390x platforms, when supported, ``-mbackchain`` is added *instead*.
+of the above frame pointer flags.
+
+Frame pointers enable profilers, debuggers, and system tracing tools
+(``perf``, ``eBPF``, ``dtrace``, ``gdb``) to walk the C call stack
+without DWARF metadata.
+The flags propagate to third-party C
 extensions through :mod:`sysconfig`. On compilers that do not
 understand them, the build silently skips them.

Doc/whatsnew/3.15.rst

Lines changed: 13 additions & 0 deletions
@@ -2552,6 +2552,19 @@ Build changes
 and :option:`-X dev <-X>` is passed to the Python or Python is built in :ref:`debug mode <debug-build>`.
 (Contributed by Donghee Na in :gh:`141770`.)

+.. _whatsnew315-frame-pointers:
+
+* CPython is now built with frame pointers enabled by default
+(:pep:`831`). Pass :option:`--without-frame-pointers` to opt out.
+
+Authors of C extensions and native libraries built with custom build
+systems should ensure the unwind chain is intact.
+This is usually done by adding ``-fno-omit-frame-pointer`` and
+similar flags to ``CFLAGS``. See :option:`--without-frame-pointers`
+documentation for the specific flags Python uses.
+
+(Contributed by Pablo Galindo Salgado and Savannah Ostrowski in :gh:`149201`.)
+
 .. _whatsnew315-windows-tail-calling-interpreter:

 * 64-bit builds using Visual Studio 2026 (MSVC 18) may now use the new

Lib/test/test_frame_pointer_unwind.py

Lines changed: 10 additions & 0 deletions
@@ -21,6 +21,16 @@


 def _frame_pointers_expected(machine):
+    _Py_WITH_FRAME_POINTERS = getattr(
+        _testinternalcapi,
+        "_Py_WITH_FRAME_POINTERS",
+        -1,
+    )
+    if _Py_WITH_FRAME_POINTERS > 0:
+        return True
+    if _Py_WITH_FRAME_POINTERS == 0:
+        return False
+
     cflags = " ".join(
         value for value in (
             sysconfig.get_config_var("PY_CORE_CFLAGS"),
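The hunk above adds a three-way decision: a positive macro value means frame pointers are expected, zero means they are not, and a missing attribute (the `-1` default) falls through to the older flag-based check. A minimal standalone sketch of that pattern follows. It uses `types.SimpleNamespace` objects as stand-ins for `_testinternalcapi`, and the `fallback` callable is a hypothetical placeholder for the rest of the function, which this diff does not show.

```python
import types


def frame_pointers_expected(module, fallback):
    """Return True/False from the build macro, else defer to `fallback()`.

    `module` stands in for _testinternalcapi; `fallback` is a hypothetical
    callable representing the pre-existing CFLAGS-based heuristic.
    """
    value = getattr(module, "_Py_WITH_FRAME_POINTERS", -1)
    if value > 0:
        return True
    if value == 0:
        return False
    return fallback()  # macro not exported: use the heuristic instead


with_fp = types.SimpleNamespace(_Py_WITH_FRAME_POINTERS=1)
without_fp = types.SimpleNamespace(_Py_WITH_FRAME_POINTERS=0)
unknown = types.SimpleNamespace()  # attribute absent, as on older builds

for label, module in (("with", with_fp), ("without", without_fp), ("unknown", unknown)):
    print(label, frame_pointers_expected(module, fallback=lambda: True))
```

Using `-1` as the `getattr` default keeps "explicitly disabled" (0) distinct from "not reported", so the heuristic only runs when the build gives no answer.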
Lines changed: 4 additions & 3 deletions
@@ -1,4 +1,5 @@
 Enable frame pointers by default for GCC-compatible CPython builds, including
-``-mno-omit-leaf-frame-pointer`` when the compiler supports it, so profilers
-and debuggers can unwind native interpreter frames more reliably. Users can pass
-``--without-frame-pointers`` to opt out.
+``-mno-omit-leaf-frame-pointer``, ``-marm`` on 32-bit ARM, and/or ``-mbackchain``
+on s390x platforms when the compiler supports them, so profilers and debuggers
+can unwind native interpreter frames more reliably. Users can pass
+:option:`--without-frame-pointers` to ``./configure`` to opt out.
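Because the flags are described as propagating through `sysconfig`, one way to see which of them a given interpreter was built with is to scan its recorded compiler flags. The snippet below is a rough sketch rather than an official check: `unwind_flags_in` is a hypothetical helper, the flag list is taken from the documentation changes in this commit, and reading the `CFLAGS` config variable is an assumption (it may be empty or `None` on some platforms).

```python
import sysconfig

# Flags named in this commit's documentation changes.
UNWIND_FLAGS = (
    "-fno-omit-frame-pointer",
    "-mno-omit-leaf-frame-pointer",
    "-marm",
    "-mbackchain",
)


def unwind_flags_in(cflags):
    """Return the unwinding-related flags present in a CFLAGS string."""
    tokens = (cflags or "").split()
    return [flag for flag in UNWIND_FLAGS if flag in tokens]


# Inspect the running interpreter; the result depends on how it was built.
print(unwind_flags_in(sysconfig.get_config_var("CFLAGS")))
```

An empty list does not prove the unwind chain is broken; it only means none of these specific flags were recorded in that variable.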

Modules/_testinternalcapi.c

Lines changed: 139 additions & 16 deletions
@@ -63,6 +63,40 @@
 static const uintptr_t min_frame_pointer_addr = 0x1000;
 #define MAX_UNWIND_FRAMES 200

+#ifdef __s390x__
+// Linux's s390 "Stack Frame Layout" table documents that z/Architecture
+// backchain frames start with the backchain at offset 0 and store "saved r14
+// of caller function" at offset 112. The same document's register table
+// identifies r14 as the return-address register, so this backchain unwinder
+// reads the return address from fp + 112.
+// https://www.kernel.org/doc/html/v5.3/s390/debugging390.html#stack-frame-layout
+//
+// This is only for Linux s390x backchain frames. The s390x ELF ABI does not
+// generally mandate where RA and FP are saved, or whether they are saved at all.
+// https://sourceware.org/binutils/docs/sframe-spec.html#s390x
+# define S390X_FRAME_RETURN_ADDRESS_OFFSET 112
+#endif
+
+// The generic manual unwinder treats the frame pointer as a two-word record:
+// fp[0] is the previous frame pointer and fp[1] is the return address. That is
+// not true for every architecture, even with frame pointers enabled, so these
+// offsets describe the actual slots used by each supported frame layout.
+#if defined(__arm__) && !defined(__thumb__) && !defined(__clang__)
+// GCC ARM mode keeps the caller's fp one word below fp and the saved LR at
+// fp[0], so the return address is not in the generic fp[1] slot.
+# define FRAME_POINTER_NEXT_OFFSET (-1)
+# define FRAME_POINTER_RETURN_OFFSET 0
+#elif defined(__s390x__)
+// s390x backchain frames keep the previous frame pointer at fp[0], but save the
+// return-address register in the ABI register save area rather than fp[1].
+# define FRAME_POINTER_NEXT_OFFSET 0
+# define FRAME_POINTER_RETURN_OFFSET \
+    (S390X_FRAME_RETURN_ADDRESS_OFFSET / (Py_ssize_t)sizeof(uintptr_t))
+#else
+# define FRAME_POINTER_NEXT_OFFSET 0
+# define FRAME_POINTER_RETURN_OFFSET 1
+#endif
+

 static PyObject *
 _get_current_module(void)
@@ -329,15 +363,96 @@ get_jit_backend(PyObject *self, PyObject *Py_UNUSED(args))
 #endif
 }

+static int
+stack_address_is_valid(uintptr_t addr, uintptr_t stack_min, uintptr_t stack_max)
+{
+    if (addr < min_frame_pointer_addr) {
+        return 0;
+    }
+    if (stack_min != 0 && (addr < stack_min || addr >= stack_max)) {
+        return 0;
+    }
+    return 1;
+}
+
+static int
+frame_pointer_slot_is_valid(uintptr_t *frame_pointer, Py_ssize_t offset,
+                            uintptr_t stack_min, uintptr_t stack_max)
+{
+    uintptr_t fp_addr = (uintptr_t)frame_pointer;
+    uintptr_t slot_addr;
+    uintptr_t delta = (uintptr_t)Py_ABS(offset) * sizeof(uintptr_t);
+    if (offset < 0) {
+        if (fp_addr < delta) {
+            return 0;
+        }
+        slot_addr = fp_addr - delta;
+    }
+    else {
+        if (fp_addr > UINTPTR_MAX - delta) {
+            return 0;
+        }
+        slot_addr = fp_addr + delta;
+    }
+    if (!stack_address_is_valid(slot_addr, stack_min, stack_max)) {
+        return 0;
+    }
+    if (stack_max != 0) {
+        if (slot_addr > UINTPTR_MAX - sizeof(uintptr_t)) {
+            return 0;
+        }
+        if (slot_addr + sizeof(uintptr_t) > stack_max) {
+            return 0;
+        }
+    }
+    return 1;
+}
+
+static int
+next_frame_pointer_is_valid(uintptr_t *frame_pointer, uintptr_t *next_fp,
+                            uintptr_t stack_min, uintptr_t stack_max)
+{
+    uintptr_t fp_addr = (uintptr_t)frame_pointer;
+    uintptr_t next_addr = (uintptr_t)next_fp;
+    if (!stack_address_is_valid(next_addr, stack_min, stack_max)) {
+        return 0;
+    }
+    if ((next_addr % sizeof(uintptr_t)) != 0) {
+        return 0;
+    }
+#if _Py_STACK_GROWS_DOWN
+    return next_addr > fp_addr;
+#else
+    return next_addr < fp_addr;
+#endif
+}
+
 static PyObject *
 manual_unwind_from_fp(uintptr_t *frame_pointer)
 {
-    int stack_grows_down = _Py_STACK_GROWS_DOWN;
+    uintptr_t stack_min = 0;
+    uintptr_t stack_max = 0;
+
+#ifdef __s390x__
+    Py_BUILD_ASSERT(S390X_FRAME_RETURN_ADDRESS_OFFSET % sizeof(uintptr_t) == 0);
+#endif

     if (frame_pointer == NULL) {
         return PyList_New(0);
     }

+    PyThreadState *tstate = _PyThreadState_GET();
+    if (tstate != NULL) {
+        _PyThreadStateImpl *tstate_impl = (_PyThreadStateImpl *)tstate;
+#if _Py_STACK_GROWS_DOWN
+        stack_min = tstate_impl->c_stack_hard_limit;
+        stack_max = tstate_impl->c_stack_top;
+#else
+        stack_min = tstate_impl->c_stack_top;
+        stack_max = tstate_impl->c_stack_hard_limit;
+#endif
+    }
+
     PyObject *result = PyList_New(0);
     if (result == NULL) {
         return NULL;
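The three validation helpers added in this hunk can be modelled outside C to see which addresses they accept. The following is a sketch under stated assumptions, not a port of the real code: it assumes a 64-bit `uintptr_t`, a downward-growing stack, and passes plain integers instead of pointers. The function names mirror the C helpers, while the constants `WORD` and `UINTPTR_MAX` are illustrative.

```python
WORD = 8                       # assumed sizeof(uintptr_t)
UINTPTR_MAX = 2**64 - 1        # assumed 64-bit address space
MIN_FRAME_POINTER_ADDR = 0x1000


def stack_address_is_valid(addr, stack_min, stack_max):
    if addr < MIN_FRAME_POINTER_ADDR:
        return False
    # stack_min == 0 means the bounds are unknown, so only the floor applies.
    if stack_min != 0 and not (stack_min <= addr < stack_max):
        return False
    return True


def frame_pointer_slot_is_valid(fp_addr, offset, stack_min, stack_max):
    delta = abs(offset) * WORD
    if offset < 0:
        if fp_addr < delta:
            return False           # subtraction would wrap below zero
        slot_addr = fp_addr - delta
    else:
        if fp_addr > UINTPTR_MAX - delta:
            return False           # addition would wrap past the top
        slot_addr = fp_addr + delta
    if not stack_address_is_valid(slot_addr, stack_min, stack_max):
        return False
    # When an upper bound is known, the whole word must fit below it.
    if stack_max != 0 and slot_addr + WORD > stack_max:
        return False
    return True


def next_frame_pointer_is_valid(fp_addr, next_addr, stack_min, stack_max):
    """Model of the check for a downward-growing stack."""
    if not stack_address_is_valid(next_addr, stack_min, stack_max):
        return False
    if next_addr % WORD != 0:
        return False               # saved frame pointers must be word-aligned
    return next_addr > fp_addr     # callers live at higher addresses


# Toy stack bounds: [0x7000, 0x8000)
LO, HI = 0x7000, 0x8000
assert frame_pointer_slot_is_valid(0x7F00, 14, LO, HI)      # s390x: fp + 112
assert not frame_pointer_slot_is_valid(0x7FF8, 1, LO, HI)   # slot at stack_max
assert not next_frame_pointer_is_valid(0x7F00, 0, LO, HI)   # null backchain
```

The last assertion reflects the failure mode described in the commit message: a terminating backchain value is rejected before the walker tries to dereference it.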
@@ -357,7 +472,21 @@ manual_unwind_from_fp(uintptr_t *frame_pointer)
                          MAX_UNWIND_FRAMES);
             return NULL;
         }
-        uintptr_t return_addr = frame_pointer[1];
+        if (!stack_address_is_valid(fp_addr, stack_min, stack_max)) {
+            break;
+        }
+        if (!frame_pointer_slot_is_valid(frame_pointer,
+                                         FRAME_POINTER_NEXT_OFFSET,
+                                         stack_min, stack_max)) {
+            break;
+        }
+        if (!frame_pointer_slot_is_valid(frame_pointer,
+                                         FRAME_POINTER_RETURN_OFFSET,
+                                         stack_min, stack_max)) {
+            break;
+        }
+        uintptr_t *next_fp = (uintptr_t *)frame_pointer[FRAME_POINTER_NEXT_OFFSET];
+        uintptr_t return_addr = frame_pointer[FRAME_POINTER_RETURN_OFFSET];

         PyObject *addr_obj = PyLong_FromUnsignedLongLong(return_addr);
         if (addr_obj == NULL) {
@@ -372,22 +501,10 @@ manual_unwind_from_fp(uintptr_t *frame_pointer)
         Py_DECREF(addr_obj);
         depth++;

-        uintptr_t *next_fp = (uintptr_t *)frame_pointer[0];
-        // Stop if the frame pointer is extremely low.
-        if ((uintptr_t)next_fp < min_frame_pointer_addr) {
+        if (!next_frame_pointer_is_valid(frame_pointer, next_fp,
+                                         stack_min, stack_max)) {
             break;
         }
-        uintptr_t next_addr = (uintptr_t)next_fp;
-        if (stack_grows_down) {
-            if (next_addr <= fp_addr) {
-                break;
-            }
-        }
-        else {
-            if (next_addr >= fp_addr) {
-                break;
-            }
-        }
         frame_pointer = next_fp;
     }

@@ -3170,6 +3287,12 @@ module_exec(PyObject *module)
         return 1;
     }

+#ifdef _Py_WITH_FRAME_POINTERS
+    if (PyModule_AddIntMacro(module, _Py_WITH_FRAME_POINTERS) < 0) {
+        return 1;
+    }
+#endif
+
     return 0;
 }

configure

Lines changed: 103 additions & 0 deletions
Some generated files are not rendered by default.