Fix PDB sizing/root parsing, LF_ARRAY sizing, and .pdata functions #33

Open
SaveEditors wants to merge 6 commits into zeroKilo:master from SaveEditors:codex/pdb-enum-import-fix

SaveEditors commented Mar 23, 2026

Summary

  • fix imported PDB enums so they use the size of their underlying type instead of always using 8 bytes
  • fix root stream page counting so PDBs with sub-page root directories parse correctly
  • fix CodeView LF_ARRAY imports so the stored byte length is converted to an element count using the resolved element type size
  • turn .pdata entries into real Ghidra functions instead of leaving label-only Function_<addr> symbols
  • add headless validation scripts for PDB array-length regression checks and skipped-array auditing

What Was Broken

TPIStream.AddEnumType always created EnumDataType with a size of 8, even when the CodeView LF_ENUM record specified a smaller underlying type.

PDBFile also rounded root stream page counts incorrectly. When the PDB root directory or a root stream was smaller than one page, the loader could compute 0 pages and fail to read the stream layout correctly.

TPIStream.AddArrayType treated the CodeView LF_ARRAY length value as an element count. CodeView stores the array's total byte length in that field, so on real samples every array whose element type is wider than one byte was imported oversized.

.pdata processing only created labels. Ghidra would show Function_<addr> symbols without actually creating functions at those entry points.

What Changed

TPIStream now resolves enum storage size from the enum's CodeView underlying type. Primitive backing types map directly to the correct byte width, and non-primitive cases fall back to the resolved Ghidra datatype length.
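
The size resolution described above can be sketched roughly as follows. This is a standalone illustration, not the code in TPIStream: the class name, the method name, and the `resolvedLength` parameter are hypothetical, and the table covers only a subset of the CodeView primitive type indices (values as I recall them from the public CodeView headers).

```java
import java.util.Map;

// Hypothetical sketch: pick an enum's storage size from its CodeView underlying type.
final class EnumSizeSketch {
    // Illustrative subset of CodeView primitive type indices mapped to byte widths.
    private static final Map<Integer, Integer> PRIMITIVE_SIZES = Map.ofEntries(
        Map.entry(0x0010, 1), // T_CHAR
        Map.entry(0x0020, 1), // T_UCHAR
        Map.entry(0x0011, 2), // T_SHORT
        Map.entry(0x0021, 2), // T_USHORT
        Map.entry(0x0012, 4), // T_LONG
        Map.entry(0x0022, 4), // T_ULONG
        Map.entry(0x0074, 4), // T_INT4
        Map.entry(0x0075, 4), // T_UINT4
        Map.entry(0x0013, 8), // T_QUAD
        Map.entry(0x0023, 8)  // T_UQUAD
    );

    // underlyingTypeIndex: the type index stored in the LF_ENUM record.
    // resolvedLength: length of the already-resolved datatype, used only when
    // the index is not a known primitive (assumed to be supplied by the caller).
    static int enumSize(int underlyingTypeIndex, int resolvedLength) {
        Integer primitive = PRIMITIVE_SIZES.get(underlyingTypeIndex);
        return primitive != null ? primitive : resolvedLength;
    }
}
```

With this shape, a byte-backed enum (`T_UCHAR`) resolves to 1 byte instead of the previous fixed 8, and an index outside the table simply returns whatever length the caller resolved.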

PDBFile now rounds page counts up for the root directory and root streams, so a stream smaller than one page still occupies one page and small modern PDBs are parsed instead of being dropped.
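
For reference, the page-count arithmetic amounts to a ceiling division. The sketch below is a minimal standalone version; the class and method names are hypothetical and do not mirror PDBFile. A truncating division such as `byteCount / pageSize` yields 0 for any stream shorter than one page, which matches the failure described above, though whether the old code did exactly that is an assumption.

```java
// Hypothetical sketch: number of pages needed to hold byteCount bytes.
final class PageCountSketch {
    static int pagesFor(long byteCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (byteCount < 0) {
            throw new IllegalArgumentException("byteCount must not be negative");
        }
        // Ceiling division: any partial page still needs a whole page.
        return (int) ((byteCount + pageSize - 1) / pageSize);
    }
}
```

A 200-byte root directory with 4096-byte pages needs one page under this rule, while an empty stream still needs none.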

TPIStream now rebuilds arrays using the final resolved element size, converts byte length to element count, clears stale array datatypes before rebuild, and logs skipped-array reasons in aggregate so unresolved cases are no longer silent.
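
The byte-length conversion and the aggregated skip accounting can be sketched like this. It is a standalone illustration under assumptions: the class, the `SkipReason` names, and the order of the checks are hypothetical, chosen to match the three categories reported by the audit script rather than copied from TPIStream.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch: convert an LF_ARRAY byte length into an element count,
// counting skipped arrays per reason instead of dropping them silently.
final class ArrayLengthSketch {
    enum SkipReason { ZERO_LENGTH, MISSING_ELEMENT_TYPE, NON_DIVISIBLE }

    private final Map<SkipReason, Integer> skipped = new EnumMap<>(SkipReason.class);

    // byteLength: total size stored in the LF_ARRAY record.
    // elementSize: size of the resolved element type, or 0 if it could not be resolved.
    OptionalLong elementCount(long byteLength, int elementSize) {
        if (elementSize <= 0) {
            return skip(SkipReason.MISSING_ELEMENT_TYPE);
        }
        if (byteLength <= 0) {
            return skip(SkipReason.ZERO_LENGTH);
        }
        if (byteLength % elementSize != 0) {
            return skip(SkipReason.NON_DIVISIBLE);
        }
        return OptionalLong.of(byteLength / elementSize);
    }

    // Number of arrays skipped so far for the given reason.
    int skippedCount(SkipReason reason) {
        return skipped.getOrDefault(reason, 0);
    }

    private OptionalLong skip(SkipReason reason) {
        skipped.merge(reason, 1, Integer::sum);
        return OptionalLong.empty();
    }
}
```

A 40-byte array of 4-byte elements becomes 10 elements here, where treating 40 as the element count would have produced a 160-byte array.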

XEXHeader.ProcessPData now attempts disassembly and explicit function creation for each .pdata entry instead of only assigning a label.

Validation

  • reproduced the original enum bug with a regression harness: a byte-backed enum imported as 8 bytes before the fix
  • verified the patched loader imports byte/word/dword/qword enums as 1/2/4/8 bytes
  • verified a synthetic native PDB parses and imports end to end in headless Ghidra
  • Mesh.xex + Mesh.pdb: ValidatePdbArrayLengths.java reported checked=782 skipped=175 mismatches=0
  • Aurora.xex: headless import succeeded and .pdata defined 35421 functions
  • Mesh.xex: headless import succeeded and .pdata defined 3649 functions
  • JRPC2.xex: XeCLI rgh ghidra install-loader --archive <zip> installed this build cleanly into Ghidra 12.0.4, and rgh ghidra analyze then imported/analyzed successfully with .pdata defining 459 functions
  • XDRPC.xex: headless import succeeded and .pdata defined 488 functions
  • xbdm.xex: headless import succeeded and .pdata loaded 259 entries while defining 252 new functions
  • AuditPdbArrayImport.java on Mesh.pdb classified the remaining skipped arrays as 38 zero-length, 119 missing element type, and 18 non-divisible byte lengths; those are broader pre-existing type-resolution gaps, not regressions from this PR

Addresses #26, #30, and #31.

Validations were completed with the help of XeCLI.

SaveEditors changed the title from "Fix PDB enum import sizing and root stream parsing" to "Fix PDB sizing/root parsing, LF_ARRAY sizing, and .pdata functions" on Mar 25, 2026