
Release: library system overhaul (v0.4.8) #106

Merged
thiagoralves merged 1 commit into main from development
May 14, 2026

@thiagoralves
Contributor

Summary

Promotes development → main to ship the library system overhaul (PR #105) as v0.4.8.

What's included

  • Library system overhaul (#105): disk-source-of-truth pattern, OSCAT 100% V3 import parity, manifest folder hierarchy, function-level tree-shaking via per-symbol LibraryChunk slices, library.json metadata + documentation validation.
  • WSTRING literal codegen fix (L"..." → u"..."; wchar_t is 32-bit on Linux/AVR).
  • DATE/TOD/DT literal codegen now emits nanosecond int64 values matching the runtime representation.
  • Diagnostic POU context annotation for programmatic consumers (OpenPLC Editor).

Test plan

🤖 Generated with Claude Code

… V3 import, manifest hierarchy (#105)

* feat(libs): add Additional Function Blocks library (RTC, INTEGRAL, DERIVATIVE, PID, RAMP, HYSTERESIS)

Brings IEC 61131-3 Annex E "Additional Function Blocks" over from
MatIEC's lib/*.txt as a new bundled .stlib archive. Sources live under
libs/sources/additional-function-blocks/ for inspection and round-trip
through the standard rebuild-libs script.

The six blocks are ported verbatim from MatIEC where the ST is pure
(INTEGRAL, DERIVATIVE, PID, RAMP, HYSTERESIS) and adapted minimally
where MatIEC used compiler-specific extensions (RTC). RTC's MatIEC
pragma `{__SET_VAR(data__->,CURRENT_TIME,,__CURRENT_TIME)}` is replaced
with a plain call to a new STruC++ runtime function CURRENT_DT(),
keeping the FB body in standard ST.

Runtime additions:
- CURRENT_DT() — wall-clock IEC_DT from std::chrono::system_clock,
  parallel to TIME() (scan-cycle elapsed). Registered in the std
  function registry.

Type checker:
- New IEC §6.6.2.2 date/time arithmetic rules:
    DT/DATE/TOD - DT/DATE/TOD = TIME (duration between instants)
    DT/DATE/TOD ± TIME       = DT/DATE/TOD (instant offset)
    TIME + DT/DATE/TOD       = DT/DATE/TOD (commutative addition)
  Without these, RTC's `OFFSET := PDT - CURRENT_TIME` (and any user
  date arithmetic) would fail type-check. C++ side already worked —
  these are int64_t aliases — so this is type-checker-only.
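
A minimal sketch of how the three rules above could be expressed as a
result-type lookup. The names (`IecTimeType`, `dateTimeResultType`) are
hypothetical and not taken from the STruC++ checker, and restricting
instant - instant to same-kind pairs is an assumption of this sketch.

```typescript
// Hypothetical model of the date/time arithmetic rules; not the real checker.
type IecTimeType = "TIME" | "DT" | "DATE" | "TOD";
type ArithOp = "+" | "-";

// DT, DATE and TOD are "instants"; TIME is a duration.
const isInstant = (t: IecTimeType): boolean => t !== "TIME";

/** Returns the result type, or undefined when this sketch rejects the pair. */
function dateTimeResultType(
  left: IecTimeType,
  op: ArithOp,
  right: IecTimeType,
): IecTimeType | undefined {
  if (isInstant(left) && isInstant(right)) {
    // instant - instant = TIME (assumption: both operands of the same kind)
    return op === "-" && left === right ? "TIME" : undefined;
  }
  if (isInstant(left) && right === "TIME") {
    return left; // instant ± TIME = instant
  }
  if (left === "TIME" && isInstant(right)) {
    return op === "+" ? right : undefined; // only addition commutes
  }
  return undefined; // TIME op TIME is outside the scope of this sketch
}
```

The negative cases mentioned in the test list fall out of the same table:
`DT - DT` can only yield TIME, never DT, and `TIME - DT` is rejected.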

Build wiring:
- New script scripts/generate-additional-fb.mjs and matching
  npm run build:additional-fb that compiles the six .st sources into
  libs/additional-function-blocks.stlib, in dependency order
  (INTEGRAL/DERIVATIVE before PID).
- rebuild-libs.mjs now rebuilds the new archive between iec-standard-fb
  and OSCAT, and copies it into vscode-extension/bundled-libs.

Tests (24 new, all passing):
- tests/library/additional-fb-library.test.ts (14): manifest contents,
  per-FB signatures, recompilation from embedded sources, end-to-end
  user-program integration for every FB.
- tests/semantic/date-time-arithmetic.test.ts (10): the new DT
  arithmetic rule across all type pairs, plus negative cases that the
  rule must NOT permit (DT → TIME assignment, DT - DT → DT).

Total: 1596 → 1620 tests, 0 regressions.

* fix(libs): make RTC hardware-agnostic (drop CURRENT_DT() coupling)

The previous RTC body called CURRENT_DT() to read wall-clock time on
every scan, which would have prevented the FB from compiling on
hardware-agnostic targets such as Arduino (no std::chrono). The block
is supposed to be self-contained: count from absolute zero internally,
re-anchor on the rising edge of IN to the user-supplied PDT, then
advance from PDT at real-time rate.

New behaviour:

  - Internal state: ANCHOR_DT (DT, defaulted to zero) +
                    ANCHOR_TIME (TIME, defaulted to T#0s).
  - Every scan: CDT := ANCHOR_DT + (TIME() - ANCHOR_TIME).
  - On rising edge of IN: ANCHOR_DT := PDT, ANCHOR_TIME := TIME().
  - Q := IN, mirroring the input as before.

CURRENT_DT() stays in the runtime and the std-function registry — it
remains useful for user code that wants wall-clock time on platforms
that ship one. RTC just doesn't depend on it. Users who want RTC to
reflect wall-clock time can drive PDT externally with `clock(IN := edge,
PDT := CURRENT_DT())` on those platforms.

Tests:
- Updated the existing user-program integration test to exercise RTC
  across the rising edge in two scans (pre-anchor and post-anchor),
  rather than a single FALSE-IN snapshot.
- Added a regression guard that scans the embedded rtc.st (with
  comments stripped) and asserts the executable body never calls
  CURRENT_DT() — catches accidental re-coupling in future edits.

* fix(libs): RTC = MatIEC body verbatim with __CURRENT_TIME → TO_DT(TIME())

The earlier fix corrected one wrong assumption (CURRENT_DT() coupling), then
overshot in the opposite direction with a ground-up rewrite around an
ANCHOR_DT/ANCHOR_TIME pair. MatIEC's RTC was already hardware-agnostic:
its `__CURRENT_TIME` global is the same monotonic scan-cycle quantity
that STruC++ exposes through TIME(). The right move is a 1:1 substitution.

Body now mirrors lib/rtc.txt directly:

  CURRENT_TIME : DT ;
  ...
  CURRENT_TIME := TO_DT(TIME()) ;          (* was {__SET_VAR(__CURRENT_TIME)} *)
  IF IN THEN
    IF NOT PREV_IN THEN OFFSET := PDT - CURRENT_TIME ; END_IF ;
    CDT := CURRENT_TIME + OFFSET ;
  ELSE
    CDT := CURRENT_TIME ;
  END_IF ;
  Q := IN ;  PREV_IN := IN ;

The TO_DT() wrapper is the smallest accommodation STruC++ needs over
MatIEC: STruC++'s type checker requires an explicit conversion where
MatIEC's pragma did a raw bit-copy. Both reduce to the same int64
underneath, so the runtime semantics are identical.
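
For readers who want to trace the body above scan by scan, here is a small
TypeScript model of it. This is an illustration only: `rtcScan` and
`RtcState` are hypothetical names, plain numbers stand in for the int64
nanosecond values, and `now` plays the role of TO_DT(TIME()).

```typescript
// Hypothetical scan-by-scan model of the RTC body; not generated code.
interface RtcState {
  prevIn: boolean; // PREV_IN
  offset: number;  // OFFSET
}

function rtcScan(
  state: RtcState,
  input: boolean, // IN
  pdt: number,    // PDT, the preset date-time
  now: number,    // stands in for TO_DT(TIME())
): { q: boolean; cdt: number } {
  const currentTime = now;
  let cdt: number;
  if (input) {
    // Rising edge of IN: remember how far PDT is from the scan clock.
    if (!state.prevIn) {
      state.offset = pdt - currentTime;
    }
    cdt = currentTime + state.offset;
  } else {
    cdt = currentTime;
  }
  state.prevIn = input;
  return { q: input, cdt };
}

// Three scans: IN low, rising edge with PDT = 1000, then IN held high.
const rtcState: RtcState = { prevIn: false, offset: 0 };
const scan1 = rtcScan(rtcState, false, 1000, 5);
const scan2 = rtcScan(rtcState, true, 1000, 10);
const scan3 = rtcScan(rtcState, true, 9999, 25);
```

In this trace CDT follows the scan clock while IN is low, snaps to PDT on
the rising edge, and then advances from PDT at the clock's rate; a changed
PDT while IN stays high is ignored because OFFSET is only captured on the edge.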

The earlier-added regression guard (the embedded rtc.st must not call
CURRENT_DT() in its executable body) still applies and continues to
pass. The two-step rising-edge integration test also still passes
unchanged — RTC's external contract didn't change with this revert.

* feat(libs): library.json metadata + per-block documentation

Establishes libs/sources/<lib-name>/ as the canonical, on-disk source
of truth for hand-authored libraries. Each lib's source directory now
ships a library.json carrying every .stlib manifest field that isn't
derivable from ST: name, version, namespace, description, isBuiltin,
plus per-FB / per-function documentation strings surfaced in editor
hover dialogs.

Architecture:

  libs/sources/iec-standard-fb/
    library.json
    edge_detection.st, bistable.st, counter.st, timer.st

  libs/sources/additional-function-blocks/
    library.json
    integral.st, derivative.st, rtc.st, pid.st, ramp.st, hysteresis.st

  libs/<lib-name>.stlib            ← pure build artifact

The build flow:
  1. Read library.json
  2. Read .st files
  3. compileStlib(sources, options-from-config)
  4. Merge library.json's `blocks[name].documentation` (and
     `functions[name].documentation`) into the resulting manifest's
     functionBlocks[] / functions[] entries
  5. Validate every doc'd name appears in the compiled manifest —
     unknown names fail the build (catches typos and stale entries when
     an FB is renamed)
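
Steps 4 and 5 can be sketched as follows. This is not the code in
src/library/library-config.ts; `mergeDocsSketch` and the two interfaces are
hypothetical, and exact-match (case-sensitive) name lookup is an assumption.

```typescript
// Hypothetical sketch of "merge docs, then fail on unknown names".
interface DocEntrySketch {
  documentation: string;
}
interface ManifestEntrySketch {
  name: string;
  documentation?: string;
}

function mergeDocsSketch(
  entries: ManifestEntrySketch[],
  docs: Record<string, DocEntrySketch>,
  kind: string,
): void {
  const byName = new Map<string, ManifestEntrySketch>();
  for (const entry of entries) {
    byName.set(entry.name, entry);
  }
  const unknown: string[] = [];
  for (const [name, doc] of Object.entries(docs)) {
    const entry = byName.get(name);
    if (entry) {
      entry.documentation = doc.documentation;
    } else {
      unknown.push(name); // typo or stale entry after a rename
    }
  }
  if (unknown.length > 0) {
    throw new Error(`library.json documents unknown ${kind}: ${unknown.join(", ")}`);
  }
}
```

Collecting all unknown names before throwing means one build failure reports
every stale entry at once rather than one per run.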

OSCAT stays on the legacy archive-round-trip path because its true
upstream is the CODESYS .library file at
tests/fixtures/codesys/oscat_basic_335_codesys3.library and the
codesys-importer doesn't yet write disk sources at build time. A TODO
in rebuild-libs.mjs notes the asymmetry as a follow-up.

Documentation prose for the 22 standard FBs and the 6 additional FBs
is taken from the openplc-editor's hardcoded library catalog
(src/frontend/data/library/{standard,additional}-function-blocks.ts)
verbatim, so the eventual migration of the editor onto these stlib
archives surfaces identical hover text to existing users.

New module: src/library/library-config.ts (LibraryConfig schema +
loadLibraryConfig + applyLibraryConfigDocumentation, with full
validation that fails loud on malformed JSON, missing required fields,
or doc entries referencing unknown symbols).

Manifest types: LibraryFBEntry and LibraryFunctionEntry gain optional
`documentation: string` fields. The earlier-added per-variable
documentation field on LibraryVarType was removed per the
"forget per-input/output docs" decision — we'll revisit if/when there's
a concrete use case.

Tests (21 new):
- tests/library/library-config.test.ts (18): loader validation,
  documentation merge, unknown-name reporting, case sensitivity.
- tests/library/additional-fb-library.test.ts (+2): every FB has a
  documentation string; RTC's prose matches the editor's verbatim.
- tests/library/std-fb-library.test.ts (+2): every FB has a
  documentation string; TON/TOF/TP prose matches the editor's verbatim.

Total: 1621 → 1642 tests, 0 regressions.

* build(libs): untrack hand-authored .stlib archives, make `npm run build` produce them

The .stlib archives are pure build artefacts now that
libs/sources/<lib-name>/ is the canonical source of truth (.st files
+ library.json). Tracking them in git creates stale-blob churn on
every library or compiler change — every PR that touches semantics or
a doc string would carry a 50-100KB binary diff.

Untracked:
  libs/iec-standard-fb.stlib
  libs/additional-function-blocks.stlib

Tracked exception (documented in .gitignore):
  libs/oscat-basic.stlib — kept until the codesys-importer can
  cleanly re-extract OSCAT from its CODESYS .library fixture. The
  importer currently drops the closing `*)` of trailing block
  comments on 397/559 sources, so a fresh re-import produces an
  archive that fails to recompile. Round-trip-from-archive remains
  the OSCAT path for now (already documented as a follow-up TODO).

Build flow:
  npm install && npm run build   →  generates libs/*.stlib
  vitest globalSetup runs the same script before tests so the test
  suite always sees freshly-built archives.

Build script changes:
  - "build" now runs scripts/rebuild-libs.mjs (which does tsc + lib
    generation in one step). The previous `tsc`-only behaviour is
    preserved as "build:tsc-only" for cases that genuinely just want
    type emission (rare).
  - "prepack" runs the full build so `npm pack` always sees libs.
  - "clean" now also wipes the gitignored .stlib outputs.

vscode-extension/esbuild.mjs: previously did
  fs.cpSync(libs/, bundled-libs/, { recursive: true })
which would have copied the new libs/sources/ subtree into every
.vsix at packaging time. Replaced with a selective copy that only
picks up *.stlib at the top level of libs/. The recursive cp was a
latent bug — sources are 50KB+ today and would only grow.
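
The selection rule behind the selective copy can be sketched as a pure
filter. `topLevelStlibs` and `DirEntryLike` are hypothetical names; the
interface is shaped so that the entries returned by Node's
`fs.readdirSync(dir, { withFileTypes: true })` would satisfy it, and the
actual esbuild.mjs code may look different.

```typescript
// Hypothetical filter: keep only regular files named *.stlib at the top level.
interface DirEntryLike {
  name: string;
  isFile(): boolean;
}

function topLevelStlibs(entries: DirEntryLike[]): string[] {
  return entries
    .filter((entry) => entry.isFile() && entry.name.endsWith(".stlib"))
    .map((entry) => entry.name)
    .sort();
}
```

Because directories never pass `isFile()`, the libs/sources/ subtree is
skipped without needing an explicit exclusion list.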

LibraryConfig schema: added optional `globalConstants?: Record<string,
number>` so libraries with compile-time integer constants (e.g.
OSCAT's STRING_LENGTH=254, LIST_LENGTH=254) can declare them in
library.json. Validator rejects non-finite numbers and array shapes.
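
A minimal sketch of the validation described above, assuming the field
arrives as parsed JSON of unknown shape. `validateGlobalConstantsSketch` is
a hypothetical name and the error wording is invented for illustration.

```typescript
// Hypothetical validator for an optional globalConstants field.
function validateGlobalConstantsSketch(value: unknown): Record<string, number> {
  if (value === undefined) {
    return {}; // the field is optional
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("globalConstants must be an object mapping names to numbers");
  }
  const result: Record<string, number> = {};
  for (const [name, raw] of Object.entries(value as Record<string, unknown>)) {
    // Rejects strings, NaN and ±Infinity alike.
    if (typeof raw !== "number" || !Number.isFinite(raw)) {
      throw new Error(`globalConstants.${name} must be a finite number`);
    }
    result[name] = raw;
  }
  return result;
}
```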

Tests: +4 in tests/library/library-config.test.ts covering the new
globalConstants validation paths. Suite: 1642 → 1646 (no regressions).

Note on OSCAT migration follow-up:
  When the codesys-importer regression is fixed, OSCAT can move to
  libs/sources/oscat-basic/ with a library.json (already drafted in
  this branch but reverted along with the half-finished CODESYS-import
  build path), the .library fixture stays at tests/fixtures/codesys/,
  and rebuild-libs.mjs gains a third helper that runs the importer +
  compileStlib at build time. Round-trip path goes away. Until then,
  oscat-basic.stlib stays tracked.

* fix(libs): bring OSCAT onto the disk-source-of-truth model

OSCAT now follows the same pattern as the other libs: the canonical
source-of-truth is on disk under libs/sources/oscat-basic/ (the
binary CODESYS V3 .library file plus library.json), and the .stlib
archive in libs/ is a pure build artefact regenerated by `npm run
build`.

The fresh CODESYS V3 OSCAT 3.35 .library file replaces the older
fixture at tests/fixtures/codesys/oscat_basic_335_codesys3.library.
The V2.3 .lib stays in tests/fixtures/codesys/ since it's only a
v2.3-importer test fixture.

To make this work, three problems in the V3 importer had to be
addressed (the third is worked around, the first two are fixed):

1. Boundary record's column A was being dropped (used `i < boundary`
   instead of `i <= boundary`). For ~390 POUs whose trailing block
   comment closed at exactly the boundary record, the closing `*)`
   went missing and the source failed to recompile.
   Fix: src/library/codesys-import/v3-parser.ts, include the boundary
   record's column A.

2. Long implementations (> ~127 lines) overflow into the boundary
   record itself, with additional impl lines packed at stride-10
   varint positions (varint[0], [10], [20], …). The previous parser
   never read those, leaving DCF77, _ARRAY_SORT, HOLIDAY, SEQUENCE_4/8,
   ESR_MON_B8, CRC_GEN, SUN_POS and several others truncated.
   Fix: read the boundary record's stride-10 overflow entries.

3. The V3 GVL extractor still misses OSCAT's VAR_GLOBAL block (which
   instantiates LANGUAGE/MATH/PHYS/SETUP/LOCATION as instances of
   the CONSTANTS_* struct types). HOLIDAY and SUN_POS fail to compile
   without those globals.
   Workaround: a hand-authored libs/sources/oscat-basic/globals.st
   carries the VAR_GLOBAL block. The build script now appends any
   .st files in the lib's source dir to the imported sources, so
   supplements drop in without script changes. When the V3 GVL
   extractor is fixed, globals.st can be removed.

Two POUs (DCF77, UTC_TO_LTIME) still get truncated mid-revision-history
even with the boundary + stride-10 fixes — the importer drops the
trailing portion entirely. They're filtered out at build time with a
warning; CALENDAR_CALC is filtered transitively because its body calls
UTC_TO_LTIME and would otherwise leave an unresolved C++ symbol.
The new oscat-v3-library.test.ts pins the exclusion list so a future
parser fix surfaces here as a "this test should be removed" signal.

Net change vs the old archive: 8 POUs gained from the fresher v3
source (FLOW_CONTROL, SEQUENCE_64, SRAMP, TMAX, TMIN, TOF_1, TP_1,
TP_1D), 3 lost to the importer regression. The auto-generated OSCAT
g++ instantiation test now passes against the rebuilt archive.

LibraryConfig schema gains optional `globalConstants?: Record<string,
number>`; OSCAT's library.json declares STRING_LENGTH=254 and
LIST_LENGTH=254 there. Validator rejects non-finite or non-numeric
values.

rebuild-libs.mjs:
  - rebuildLibraryFromCodesys() runs the V3 importer over the .library
    file in the lib's source dir, then runs the same compile-and-write
    path the disk-backed libs use.
  - Drops POUs with unbalanced block comments + transitive callers.
  - Appends supplemental .st files (alphabetical) after the imported
    sources so they can reference imported TYPEs.

vscode-extension/esbuild.mjs: previously did
  fs.cpSync(libs/, bundled-libs/, { recursive: true })
which would have copied the new libs/sources/ tree into every .vsix.
Replaced with a selective top-level *.stlib copy.

Tests: +18 in tests/library/oscat-v3-library.test.ts (identity,
known-good FBs/functions/types, the new v3-only POUs, the
intentionally-skipped POUs). codesys-import.test.ts updated to point
the V3 importer test fixture at libs/sources/oscat-basic/oscat_basic_335.library.

Suite: 1646 → 1664 (+18, no regressions).

* fix(libs): improve V3 parser; revert OSCAT migration to round-trip

The previous commit moved OSCAT to libs/sources/oscat-basic/ and tried
to build it from the fresh CODESYS V3 .library file via the
codesys-importer. That worked for ~99.6% of OSCAT (557/559 POUs) once
the V3 parser bugs below were fixed, but covering the remaining 0.4%
required a stack of build-time workarounds (drop list, transitive
filter, hand-authored globals.st supplement) that the user correctly
pushed back on as hacky.

After deeper investigation I confirmed there are two genuine
reverse-engineering gaps in the V3 parser that I can't close without
significantly more time on the binary format. The parser improvements
themselves are real and worth keeping; the OSCAT migration is reverted
until the gaps are closed.

V3 PARSER IMPROVEMENTS (kept):

  src/library/codesys-import/v3-parser.ts

  1. Boundary record off-by-one. The implementation extraction loop
     used `i < boundary` and dropped the column-A varint of the
     boundary record itself, which is frequently the closing `*)` of
     a trailing block comment. Loop is now `i <= boundary`. This alone
     resolves about 390 OSCAT POUs that previously ended mid-comment
     and failed to recompile.

  2. Stride-10 overflow in fat boundary records. When a POU's body is
     longer than ~127 lines, CODESYS packs the overflow into the
     boundary record itself, with additional impl lines at varint
     positions [0], [10], [20], … The parser now reads those stride-10
     positions, recovering ~60-150 additional impl lines per long POU.
     This resolves the bulk of HOLIDAY, SEQUENCE_4/8, ESR_MON_B8,
     CRC_GEN, _ARRAY_SORT, SUN_POS and friends.
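
The two fixes can be illustrated on toy data. This sketch deliberately
simplifies the record format: `implColumnA` treats column A as an array of
already-decoded lines and `strideOverflow` just picks every tenth varint.
Both function names and that data layout are assumptions, not the structure
of v3-parser.ts.

```typescript
// Fix 1 (toy model): the loop bound must include the boundary record itself.
function implColumnA(columnA: string[], boundary: number): string[] {
  const lines: string[] = [];
  for (let i = 0; i <= boundary; i++) { // was `i < boundary`
    lines.push(columnA[i]);
  }
  return lines;
}

// Fix 2 (toy model): overflow entries sit at positions 0, 10, 20, ...
function strideOverflow(varints: number[], stride: number = 10): number[] {
  const pickedEntries: number[] = [];
  for (let i = 0; i < varints.length; i += stride) {
    pickedEntries.push(varints[i]);
  }
  return pickedEntries;
}
```

With `i < boundary`, a POU whose closing `*)` sits in the boundary record
loses exactly that line, which matches the mid-comment truncation described above.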

KNOWN GAPS (documented in v3-parser.ts module header):

  1. The very longest POUs (DCF77 / UTC_TO_LTIME, both with R62/R97
     boundary records of 1484/616+ varints) transition from stride-10
     to a non-uniform stride (~11) partway through. Pure stride-10
     over-reads (junk shared strings from later rows leak in); pure
     `varint > maxCol0` filtering under-reads (drops legitimate-but-shared
     lines like `END_IF;`, `ELSE`, `*)`). 2/559 POUs.

  2. VAR_GLOBAL extraction misses OSCAT's GVL that instantiates
     LANGUAGE/MATH/PHYS/SETUP/LOCATION as instances of the CONSTANTS_*
     struct types (the struct TYPEs themselves are extracted cleanly).

OSCAT REVERT:

  libs/oscat-basic.stlib stays tracked again (gitignore exception).
  libs/sources/oscat-basic/ removed; the V3 .library fixture is back
  at tests/fixtures/codesys/oscat_basic_335_codesys3.library and the
  V3-importer tests reference it from there. rebuild-libs.mjs uses
  the simple round-trip-from-archive path for OSCAT (read embedded
  sources, recompile, write back) — no filter list, no transitive
  drop, no hand-authored supplements. The other two libs
  (iec-standard-fb, additional-function-blocks) keep building from
  libs/sources/ as before.

  When the two V3-parser gaps above are closed, OSCAT can move to
  libs/sources/oscat-basic/ alongside the other libs; the gitignore
  exception is removed; and rebuildOscatFromArchive goes away.

The new tests/library/oscat-v3-library.test.ts is dropped — it
codified an exclusion list that no longer applies.

Test suite: 1646 passing (no change from prior commit), all clean.

* fix(codesys-v3): extract VAR_GLOBAL blocks (gap #2 of 2 closed)

The V3 .library format encodes a GVL the same way as a POU at the
structural level — a 4-varint header record carries the section
keyword in column B (varint[1]), and subsequent records hold the body
in column A. The keyword is the only difference: `FUNCTION` /
`FUNCTION_BLOCK` / `PROGRAM` / `TYPE Foo :` / `VAR_GLOBAL` (optionally
CONSTANT, RETAIN, PERSISTENT).

The header-detection loop in extractFromRecords only matched the first
four. GVL objects fell through and were silently dropped; the only
remaining classification path (handleBareGVL) checks for
"VAR_GLOBAL"-prefixed declarations, but at that point the declaration
text has already been built without the header line, so it never
matched either.

Fix: add `GVL_DECL_RE.test(headerStr)` to the header-detection
condition. The classifyPOU GVL branch then picks it up and emits a
proper `GVL_N.gvl.st` source with VAR_GLOBAL header + body lines +
END_VAR — exactly what compileStlib expects.
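
A sketch of what the widened header check could look like. The two regular
expressions here are written for illustration and are assumptions; the real
`GVL_DECL_RE` in the importer may differ.

```typescript
// Hypothetical patterns; only the idea (POU keywords OR a VAR_GLOBAL header) is from the commit.
const POU_HEADER_SKETCH = /^\s*(FUNCTION_BLOCK|FUNCTION|PROGRAM|TYPE)\b/i;
const GVL_DECL_SKETCH = /^\s*VAR_GLOBAL(\s+(CONSTANT|RETAIN|PERSISTENT))*\s*$/i;

function isSectionHeader(headerStr: string): boolean {
  return POU_HEADER_SKETCH.test(headerStr) || GVL_DECL_SKETCH.test(headerStr);
}
```

Anchoring the GVL pattern to the whole string keeps an identifier such as
`VAR_GLOBAL_FOO` from being mistaken for a header.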

Recovered for OSCAT 3.35:
- GVL_0: VAR_GLOBAL CONSTANT (STRING_LENGTH=250, LIST_LENGTH=250)
- GVL_1: VAR_GLOBAL (MATH/PHYS/LANGUAGE/SETUP/LOCATION instances of
         the CONSTANTS_* struct types — required for HOLIDAY,
         SUN_POS and the date-localisation helpers to compile)
- GVL_2: VAR_GLOBAL PERSISTENT RETAIN (empty)
- GVL_3: VAR_GLOBAL RETAIN (empty)

Why the V2.3 importer caught these and V3 didn't: V2.3 stores source
text inline in the binary so a regex sweep over the raw bytes
(`/VAR_GLOBAL.../`) finds the GVL block directly. V3 stores source as
indexed entries in a separate string table and references them from
per-POU .object files; the text never appears contiguously in the
binary, so a regex sweep wouldn't work even if we tried it. The V3
parser had to reach the same conclusion structurally — this commit
adds that one missing keyword to the header detector.

The remaining V3 reverse-engineering gap (long-impl boundary records
transitioning from stride-10 to non-uniform stride for DCF77 /
UTC_TO_LTIME) is unchanged. OSCAT's .stlib stays tracked until that's
also resolved.

Tests: +2 in tests/library/codesys-import.test.ts pinning the GVL
extraction (the CONSTANTS_* instances and the STRING_LENGTH /
LIST_LENGTH constants). Suite: 1648 passing (was 1646, no regressions).

* feat(libs): 100% OSCAT V3 import + folder hierarchy in manifests

Two related changes that together bring OSCAT fully onto the
disk-source-of-truth pattern the other bundled libs already use,
with the .library file as canonical source.

V3 codesys-importer:
  - Replace fixed stride-10 boundary-record extraction with a row-
    delimiter algorithm (rows terminate on a 7+ zero run). Long OSCAT
    POUs (DCF77, UTC_TO_LTIME) shift from 10-wide rows to 11-wide
    partway through the trailing comment block; the previous code
    over-read into junk.
  - Use R0[5] as the authoritative impl line count instead of
    inferring from boundary record length.
  - Header detection accepts both n=4 col-1 (FB / PROGRAM / TYPE /
    GVL) and n=3 col-0 (FUNCTION) shapes.
  - VAR_GLOBAL CONSTANT integer blocks promote to compileStlib's
    globalConstants option (constexpr template parameters) instead
    of emitting a runtime GVL the codegen can't use as a template
    argument. Empty PERSISTENT RETAIN GVLs are dropped.
  - Folder hierarchy from .meta files now surfaces as
    ExtractedPOU.category (e.g. "POUs/Time&Date").
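
The row-delimiter idea can be sketched on a flat array of varints: a run of
seven or more zeros ends the current row, while shorter zero runs are kept
as data. `splitRowsOnZeroRun` is a hypothetical name, and treating the zeros
purely as separators is an assumption about the record layout.

```typescript
// Hypothetical row splitter: rows end at a run of >= minZeroRun zeros.
function splitRowsOnZeroRun(varints: number[], minZeroRun: number = 7): number[][] {
  const rows: number[][] = [];
  let row: number[] = [];
  let i = 0;
  while (i < varints.length) {
    if (varints[i] !== 0) {
      row.push(varints[i]);
      i++;
      continue;
    }
    // Measure the zero run starting at i.
    let j = i;
    while (j < varints.length && varints[j] === 0) {
      j++;
    }
    if (j - i >= minZeroRun) {
      if (row.length > 0) rows.push(row);
      row = []; // delimiter reached, start a new row
    } else {
      for (let k = i; k < j; k++) row.push(0); // short run is ordinary data
    }
    i = j;
  }
  if (row.length > 0) rows.push(row);
  return rows;
}
```

Unlike a fixed stride, this does not care whether a row is 10 or 11 entries
wide, which is the property the DCF77 / UTC_TO_LTIME case needs.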

Library hierarchy (manifest-only — codegen unchanged):
  - LibraryFunctionEntry / LibraryFBEntry / LibraryTypeEntry gain an
    optional category field (slash-separated path, undefined = root).
    archive.sources entries mirror it so --decompile-lib can recreate
    the folder layout without re-parsing source.
  - compileLibrary regex-derives POU-name → category from input
    sources and tags emitted manifest entries. Sources stay flat in
    the archive; only metadata describes hierarchy.
  - rebuildLibraryFromDisk walks subfolders recursively; a file at
    libs/sources/<lib>/some_category/foo.st gets category
    "some_category". Existing flat libs (iec-standard-fb,
    additional-function-blocks) emit no category fields, so their
    archives stay byte-identical to pre-hierarchy.
  - CLI --compile-lib <dir> derives categories from relative paths;
    --decompile-lib recreates the folder tree on disk. Flat archives
    extract flat (backwards compatible).
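
Deriving a category from a source file's path relative to the library
directory could look like the sketch below. `categoryFromRelativePath` is a
hypothetical helper, not the CLI's actual function.

```typescript
// Hypothetical: "some_category/foo.st" -> "some_category", "foo.st" -> undefined (root).
function categoryFromRelativePath(relPath: string): string | undefined {
  const parts = relPath.split(/[\\/]/).filter((part) => part.length > 0);
  parts.pop(); // drop the file name
  return parts.length > 0 ? parts.join("/") : undefined;
}
```

Returning `undefined` for root-level files is what lets flat libraries emit
no category field at all.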

OSCAT migration:
  - libs/sources/oscat-basic/ now holds the canonical .library file
    plus library.json (codesysSource + STRING_LENGTH/LIST_LENGTH
    overrides for runtime compatibility). rebuild-libs runs the
    importer at build time.
  - libs/oscat-basic.stlib is no longer tracked; the .gitignore
    exception is removed. Build pipeline (vitest globalSetup, npm
    run build) refreshes it from the V3 source.
  - Removed the 1.2 MB duplicate .library file from
    tests/fixtures/codesys/; tests reference the canonical copy
    under libs/sources/.

Verified: 1653 tests pass, OSCAT extracts with zero warnings, DCF77
and UTC_TO_LTIME compile cleanly, decompile round-trip recreates the
OSCAT folder tree (POUs/Time&Date/DCF77.st, POUs/Buffer Management/
BUFFER_COMP.st, etc.) exactly as CODESYS displays it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(libs): auto-extract inline documentation blocks into manifest

CODESYS V3 .library files don't carry documentation as a separate
field — but every OSCAT POU embeds a structured `(* version X.Y /
programmer / tested by / description … *)` block right after its
VAR sections, and the same convention exists in the .object decl
records. Parse it during compileStlib and surface it as the
manifest entry's `documentation` field, eliminating the need for a
hand-curated `blocks: {}` / `functions: {}` doc map in library.json
for any codesys-imported library.

Library compiler:
  - `extractDocBlock(region)` finds the first `(* … *)` whose body
    contains version/programmer/tested-by trigger words, skipping
    inline variable annotations like `(* Laufvariable Stack *)`
    that some POUs put earlier in the source.
  - `buildDocByPouName` segments multi-POU sources at every top-
    level POU header (counter.st in iec-standard-fb concatenates 15
    counter variants; each region needs its own doc lookup).
  - Both this and `buildCategoryByPouName` now uppercase their map
    keys to bridge the parser's identifier-canonicalization gap —
    OSCAT's `FT_Profile` source name becomes `FT_PROFILE` in the
    manifest, and the lookup must hit either spelling.

library.json's existing `applyLibraryConfigDocumentation` post-step
still runs, so a hand-curated documentation entry there overrides
whatever the auto-extractor pulled from inline source. This keeps
the override path open for translations or curated rewrites without
regressing the auto-extraction win.

Coverage on OSCAT: 545/545 manifest entries (100%) of FUNCTIONs and
FUNCTION_BLOCKs gain documentation. Hand-authored libs without
inline doc blocks (iec-standard-fb, additional-function-blocks) are
unaffected — their library.json docs continue to apply unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codesys-v3): structural doc extraction (drop trigger-word regex)

The previous doc extractor scanned source text for `(* … *)` blocks
containing OSCAT-style trigger words ("version", "programmer",
"tested by"). That heuristic was content-based and fragile: any
hand-authored library that didn't follow OSCAT's convention would
miss out, and a body comment that happened to contain those words
could be mis-tagged as documentation.

CODESYS actually reserves a deterministic structural slot for the
POU's variables-pane comment: the records of the .object decl
sub-object that come AFTER the last `END_VAR` (FBs / FUNCs / PROGs)
or `END_TYPE` (TYPEs). Body comments live in the impl sub-object so
they cannot bleed into this slot, and inline variable annotations
like `(* Laufvariable Stack *)` always sit BEFORE the last END_VAR
so they cannot shadow the doc either. Whatever `(* … *)` block lives
in that trailing decl slot IS the POU's documentation, regardless of
what words it happens to contain.

Switch the V3 importer to extract from this slot at the record level
in `extractDocFromDeclRecords`. Documentation now flows from
ExtractedPOU.documentation → pouToSources → compileLibrary's
`source.documentation` field → manifest entries. The library compiler
no longer scans source text at all; if `source.documentation` isn't
set the entry stays undocumented (and library.json's
`applyLibraryConfigDocumentation` post-step still applies as the
override mechanism for hand-authored libs).
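
The structural rule can be illustrated on decl records that are already
decoded to text lines. The real `extractDocFromDeclRecords` works at the
record level; this `docFromDeclLines` helper is a hypothetical
simplification that only shows the "after the last END_VAR / END_TYPE" idea.

```typescript
// Hypothetical text-level model of the trailing doc slot.
function docFromDeclLines(declLines: string[]): string | undefined {
  let lastEnd = -1;
  declLines.forEach((line, index) => {
    if (/^\s*(END_VAR|END_TYPE)\b/i.test(line)) {
      lastEnd = index;
    }
  });
  if (lastEnd < 0) {
    return undefined;
  }
  // Only text after the last END_VAR / END_TYPE is considered.
  const tail = declLines.slice(lastEnd + 1).join("\n");
  const match = /\(\*([\s\S]*?)\*\)/.exec(tail);
  return match ? match[1].trim() : undefined;
}
```

An inline annotation before the last END_VAR is never looked at, so no
trigger-word filtering is needed to keep it out of the result.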

Coverage on OSCAT (auto-extracted, no library.json doc map needed):
  - 373/373 FUNCTIONs
  - 172/172 FUNCTION_BLOCKs
  - 13/14 TYPEs (FRACTION ships with no trailing comment, correctly
    extracted as undocumented)

Tests pin both the structural correctness and the anti-fragility:
BUFFER_COMP starts its body with `(* search for first character
match *)` and we verify that comment does NOT appear in its
documentation field — the V3 record split keeps body comments out
of the doc slot by construction.

LibraryTypeEntry now also carries `documentation?: string` (was a
schema gap — TYPEs had `category` but no doc field).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(libs): add SEMA to iec-standard-fb (parity with MatIEC)

MatIEC's iec_std_FB.h ships 23 standard function blocks but our
iec-standard-fb library only carried 22 — SEMA was overlooked when
the disk-source pattern was first set up. The implementation comes
straight from MatIEC's lib/sema.txt verbatim (it's a 2-line latch
on top of CLAIM / RELEASE inputs producing a BUSY output) and the
documentation string matches what openplc-editor's
standard-function-blocks.ts already exposes for SEMA.

After this commit our standard FB lib matches MatIEC byte-for-byte:
  - bistable.st   → SR, RS                            (2)
  - sema.st       → SEMA                              (1)
  - edge_detection.st → R_TRIG, F_TRIG                (2)
  - counter.st    → CTU/CTD/CTUD × {base,DINT,LINT,
                    UDINT,ULINT}                      (15)
  - timer.st      → TP, TON, TOF                      (3)
                                                  Total: 23

The additional-function-blocks library was already complete
(DERIVATIVE, HYSTERESIS, INTEGRAL, PID, RAMP, RTC) — verified
against MatIEC's *_st.txt + rtc.txt set, all six FBs and their
docs were already present.

Tests updated: archive-FB-count assertion bumped 22→23, SEMA added
to the cross-validation source list, and a new IO-signature test
pins SEMA's CLAIM/RELEASE inputs and BUSY output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(libs): synthesize iec-std-functions.stlib from StdFunctionRegistry

The strucpp compiler bakes IEC 61131-3 standard functions (ADD/SUB/
MUX/SHL/CONCAT/SQRT/...) into a runtime registry — they're intrinsics,
not compiled .stlib content. Editor tooling that wants to render
"Arithmetic" / "BitShift" / "TypeConversion" / etc. as drag-and-drop
library entries previously had to ship its own hand-mirrored copy of
the registry.

This commit makes the registry the canonical source: rebuild-libs
walks `StdFunctionRegistry.getAll()` and emits a metadata-only
.stlib (no sources, no codegen output) carrying every std function
as a `LibraryFunctionEntry`. Tooling reads it through the same path
it uses for compiled libraries — agnostic of where the manifest
came from.

Schema:
  - LibraryFunctionEntry gains `variadic?: { minArgs: number }` so
    variable-arg functions like ADD/MUL (≥ 2 operands) and MUX
    (≥ 2 inputs after a selector) survive the round-trip.
  - Generic-type names (ANY_NUM, ANY_INT, ANY_REAL, ANY_BIT, ANY_STRING,
    ANY_ELEMENTARY, ANY) flow through as bare strings; consumers unify
    by matching identical names across params and return type. The
    library-manifest docstring spells out the contract.
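
How a consumer might use these two schema pieces is sketched below. The
entry shape is reduced to what the text mentions; `acceptsArgCount` and
`bindGenerics` are hypothetical consumer-side helpers, and the sketch
ignores implicit type promotion and the constraints each ANY_* group implies.

```typescript
// Hypothetical consumer-side helpers for a metadata-only function entry.
interface FunctionEntrySketch {
  name: string;
  paramTypes: string[];
  returnType: string;
  variadic?: { minArgs: number };
}

function acceptsArgCount(fn: FunctionEntrySketch, argCount: number): boolean {
  return fn.variadic ? argCount >= fn.variadic.minArgs : argCount === fn.paramTypes.length;
}

/** Binds each generic name (ANY_*) to one concrete type, or fails on a clash. */
function bindGenerics(paramTypes: string[], argTypes: string[]): Map<string, string> | undefined {
  const bound = new Map<string, string>();
  if (paramTypes.length === 0) {
    return argTypes.length === 0 ? bound : undefined;
  }
  for (let i = 0; i < argTypes.length; i++) {
    // Assumption: variadic tails reuse the last declared parameter type.
    const param = paramTypes[Math.min(i, paramTypes.length - 1)];
    const arg = argTypes[i];
    if (!param.startsWith("ANY")) {
      if (param !== arg) return undefined;
      continue;
    }
    const previous = bound.get(param);
    if (previous !== undefined && previous !== arg) return undefined;
    bound.set(param, arg);
  }
  return bound;
}
```

The return type is then resolved by looking its generic name up in the same
map, which is the "identical names across params and return type" contract.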

Synthesis (scripts/rebuild-libs.mjs):
  - `synthesizeStdFunctionsLibrary` runs alongside the disk-backed
    and codesys-import paths.
  - Categories from the registry's lowercase identifiers
    (numeric/trig/arithmetic/...) map to capitalized display names
    (Numerical/Arithmetic/...) — both numeric and trig fold into
    "Numerical" since IEC 61131-3 groups them visually; the rest are
    one-to-one.
  - applyLibraryConfigDocumentation runs as a post-step, so
    libs/sources/iec-std-functions/library.json can ship a
    `functions: { ADD: { documentation: "..." } }` map later without
    touching this script.

Output: libs/iec-std-functions.stlib carries 81 functions across 10
display categories (Numerical: 16, Arithmetic: 5, Selection: 6,
Comparison: 6, Bitwise: 4, BitShift: 4, TypeConversion: 20,
CharacterString: 9, Time: 6, System: 5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(libs): add displayName to library manifest

Editor library trees and package-manager UIs need a human-readable
label for each library — kebab-case identifiers like
`iec-standard-fb` or `oscat-basic` work fine for filenames and
dependency declarations but read poorly as folder titles in a
drag-and-drop block picker. Add an optional `displayName` field that
flows from `library.json` straight through to the compiled manifest.

`displayName` is purely advisory metadata: when unset, consumers fall
back to `name`, so existing .stlib archives that don't ship one
serialize identically to the pre-displayName format.

Bundled libraries:
  - iec-standard-fb        → "Standard Function Blocks"
  - additional-function-blocks → "Additional Function Blocks"
  - oscat-basic            → "OSCAT Basic"
  - iec-std-functions      → "Standard Functions"

Wired through both build paths:
  - rebuild-libs.mjs disk-backed and codesys-import paths set
    `manifest.displayName` from `config.displayName` after compile.
  - synthesizeStdFunctionsLibrary picks it up the same way.

LibraryConfig validator accepts the new field as optional string;
malformed values surface at build time alongside the existing schema
errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codesys-v3): strip redundant top-level POUs/ prefix from categories

CODESYS V3 libraries follow a fixed project-explorer convention: every
code object (FUNCTION_BLOCK, FUNCTION, PROGRAM) lives under a top-
level "POUs" folder, types under "Data types", globals under "Global
Variables". When the archive is compiled into a single .stlib, the
library itself becomes the POUs container — re-emitting the "POUs"
folder inside it nests every block one level deeper than downstream
tooling expects.

OSCAT showed the symptom most loudly: the editor's library tree
rendered

  OSCAT Basic
    POUs                            ← redundant
      Buffer Management
      Engineering/automation
      …

instead of

  OSCAT Basic
    Buffer Management
    Engineering/automation
    …

Strip the prefix in the V3 parser when assigning the per-POU
`category`. POUs sitting directly under "POUs" with no subcategory
become uncategorized (root-level under their library); deeper paths
lose their leading segment. Other top-level folders pass through
unchanged — "Data types" and "Global Variables" carry meaningful
classification information that downstream tooling uses to render
TYPEs / GVLs differently.
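
The rule above can be sketched as a small path helper. This is an illustration only: the function name is hypothetical, and it assumes the category is a `/`-joined folder path with a case-sensitive `POUs` segment.

```typescript
// Hypothetical helper: returns the category without a leading "POUs" segment.
// `undefined` stands for "uncategorized" (root-level under the library).
function stripPousPrefix(folderPath: string): string | undefined {
  const segments = folderPath.split("/").filter((segment) => segment.length > 0);
  // Other top-level folders ("Data types", "Global Variables") pass through.
  if (segments[0] !== "POUs") return folderPath;
  const rest = segments.slice(1);
  return rest.length > 0 ? rest.join("/") : undefined;
}
```

Under these assumptions `POUs/Logic/Others` becomes `Logic/Others`, a bare `POUs` becomes uncategorized, and `Data types` is returned unchanged.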

Tests updated: the OSCAT folder-hierarchy assertion now expects
`Time&Date` / `Buffer Management` / `Mathematical` / `Logic/Others`
rather than the `POUs/`-prefixed versions, and explicitly rejects any
surviving POUs prefix so a regression here surfaces immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codegen): #undef SP in generated header to dodge AVR macro collision

Compiling a project that pulled in additional-function-blocks for an
AVR target failed with

  generated.hpp:169:14: error: expected unqualified-id before 'volatile'
       IEC_REAL SP;
                ^

`<avr/io.h>` defines `SP` as `(*(volatile uint8_t *)(0x3D))` for the
AVR Stack Pointer; once the preprocessor sees PID's `IEC_REAL SP;`
struct member, the field name expands into garbage and avr-gcc bails.
Macros are global by definition and expand before namespace
resolution, so namespace wrapping would not help. Renaming the IEC
variable inside the library is not an option either: the user can't
be expected to rename PID's standard `SP` setpoint pin to dodge a
target-specific header.

Add `#undef SP` next to the existing `#undef OVERFLOW` (math.h
collision on macOS / glibc) in the generated header, after the
runtime includes and before the namespace open. The undef is a no-op
on platforms where the macro isn't defined, so emitting it
unconditionally is safe. Conditional emission keyed on target would
sit at the wrong layer: codegen doesn't know whether the project will
ultimately compile under avr-gcc or g++.

Trigger: any project pulling in additional-function-blocks via the
strucpp library system saw this on AVR builds because PID's struct
declaration ends up in the project's generated.hpp regardless of
whether PID itself is instantiated. Reported when adding an RTC FB
to an Irrigation Controller project; RTC isn't the offender, but it
caused the additional-function-blocks bundle to be linked, dragging
PID along.

Test added in tests/backend/codegen.test.ts pins the list of undef'd
macros and their position in the header (after the last `#include`,
before the namespace open) so a regression on either dimension
surfaces here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(runtime): make CURRENT_DT() compile on AVR + add platform override

avr-gcc's libstdc++ ships `<chrono>` but omits `system_clock`, so the
unconditional `system_clock::now()` in CURRENT_DT() failed to
type-check on Arduino targets even when nothing in the user's program
called CURRENT_DT(). Inline functions are still parsed, so the body
needs to be valid for every target.

Resolution priority CURRENT_DT() now follows (highest first):

  1. `__CURRENT_DT_NS` when non-zero — the platform integration
     delivered a real wall-clock value (RTC chip, NTP, OpenPLC v4
     runtime syscall wrapper, etc.). Honoured on every target. VPP
     packages targeting hardware with an RTC select path (1) by
     writing `__CURRENT_DT_NS` from their platform glue; nothing else
     in the runtime needs to change.
  2. std::chrono::system_clock on hosted targets — REPL, test runner,
     g++ builds without an explicit RTC. Same behaviour as before.
  3. `__CURRENT_TIME_NS` (time since program start) on AVR. The
     scan-cycle clock the runtime already maintains is monotonic and
     advances per scan, so programs that diff two CURRENT_DT()
     readings still see meaningful elapsed time — just measured from
     boot rather than from the Unix epoch. Better than returning
     `IEC_DT(0)` (a fixed Unix epoch) which would make every diff
     come out as zero.

The chrono path is excluded from compilation on AVR via `#ifdef
__AVR__` rather than a runtime check: even an `if (false)`-style
guard isn't enough, because the body needs to be wholly absent on the
target where the symbols don't exist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(types): canonical IEC base-type registry shipped as iec-types.json

Hoist every IEC 61131-3 elementary-type fact (byte size, logical bit
width, signedness, C++ alias, wire format, PLCopen TC6 XML mapping)
into a single hand-authored TS module that's also emitted as
libs/iec-types.json on every npm run build. Downstream tooling — the
OpenPLC Editor's variables table / debugger / XML emitter, the future
xml2st replacement — reads the JSON instead of maintaining its own
type tables, eliminating the multi-source drift that surfaced when
WSTRING was added on the strucpp side but never flowed through.

In-repo, type-registry and type-utils now derive from the same source
(removing two stale duplicates of the elementary-type list); the AST
ELEMENTARY_TYPES map is built from the canonical registry's `bits`
field so BOOL keeps its IEC-spec 1-bit logical width while every other
type collapses to byteSize*8.

Tests: a 38-case pin suite covers entry shape, byte/bit invariants,
PLCopen XML naming convention, lookup helpers, and the on-disk JSON
artefact staying in sync with IEC_BASE_TYPES.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(types): strict STRING/WSTRING separation + nanosecond date literals

Reported scenario: a WSTRING program variable initialised with a
STRING literal (`'foo'`) silently produced broken C++:

    error: no matching function for call to
    'IECWStringVar<254>::IECWStringVar(const char [22])'

Per IEC 61131-3 the two literal kinds are NOT interchangeable —
single quotes mean STRING, double quotes mean WSTRING. The bug had
several layers; this commit closes all of them and adds the strict
checking the standard implies.

Frontend
  * AST builder: WSTRING (`"foo"`), DATE (`D#…`), TIME_OF_DAY
    (`TOD#…`), and DATE_AND_TIME (`DT#…`) literals were lexed but
    not built — they fell through to the default INT 0 branch and
    silently corrupted every downstream type check. Each gets its
    own LiteralExpression case now.

Type-checker
  * New checkVarBlocks pass validates every variable initialiser
    against its declared type using the same `validateAssignment`
    used by AssignmentStatement. Without it, a STRING literal in a
    WSTRING declaration reached codegen unchecked.
  * type-utils canonicalises elementary names through the iec-types
    registry so DT ↔ DATE_AND_TIME and TOD ↔ TIME_OF_DAY are treated
    as the same type for assignability (they are aliases per the
    IEC standard, not separate types).

Codegen
  * STRING literals → `"…"` (const char*), WSTRING literals →
    `u"…"` (const char16_t* — what IECWStringVar's string ctor binds
    to). `L"…"` (wchar_t) was wrong: 32-bit on Linux/AVR.
  * Default WSTRING declaration (no initialiser) emits `u""`.
  * DATE/TOD/DT literals were emitted as the raw IEC string verbatim
    (`TOD#12:00`) — invalid C++ and only "worked" by accident
    because the AST-builder gap fed 0 instead. Now parsed to int64
    nanoseconds via new `parseDate/Tod/Dt LiteralToNs` helpers.
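
As an illustration of the literal-to-nanoseconds conversion, here is a minimal sketch for the TOD case only. It is not the helper added by this commit: the function name, the accepted grammar (optional seconds and fraction), and the use of a plain `number` are assumptions. A TOD value stays below 2^53 nanoseconds, so `number` is exact here; DATE and DT values counted from the Unix epoch exceed that range and would need `bigint`.

```typescript
const NS_PER_SECOND = 1_000_000_000;

// Hypothetical sketch: convert `TOD#hh:mm[:ss[.fraction]]` to nanoseconds since midnight.
function todLiteralToNsSketch(literal: string): number {
  const pattern = /^(?:TOD|TIME_OF_DAY)#(\d{1,2}):(\d{2})(?::(\d{2})(?:\.(\d{1,9}))?)?$/i;
  const match = pattern.exec(literal.trim());
  if (match === null) {
    throw new Error(`not a TOD literal: ${literal}`);
  }
  const hours = Number(match[1] ?? "0");
  const minutes = Number(match[2] ?? "0");
  const seconds = Number(match[3] ?? "0");
  // Right-pad the fraction to nine digits so ".5" means 500 000 000 ns.
  const fractionNs = Number(((match[4] ?? "") + "000000000").slice(0, 9));
  if (hours > 23 || minutes > 59 || seconds > 59) {
    throw new Error(`TOD literal out of range: ${literal}`);
  }
  return ((hours * 60 + minutes) * 60 + seconds) * NS_PER_SECOND + fractionNs;
}
```

Under these assumptions `TOD#12:00` yields 43 200 seconds' worth of nanoseconds, which is the integer the generated C++ would carry instead of the raw literal text.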

Runtime
  * STRING_TO_WSTRING / WSTRING_TO_STRING (templated, plus aliases
    for the wrapped *Var types) — codepoint-by-codepoint transcoding
    in the BMP. Out-of-range codepoints get the documented lossy
    narrow rather than a hard failure.
  * iec_std_lib.hpp now `#include`s the string headers it depends
    on so the conversion templates are visible without callers
    pulling them in separately.

Tests: a 10-case pin suite covers parser tagging, type-checker
rejection of mismatches, codegen u/no-prefix emission, struct-field
emission, and STRING_TO_WSTRING / WSTRING_TO_STRING resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(diagnostics): annotate CompileError with POU + section context

Errors are emitted from many places (parser, AST builder, project
model, semantic analyzer, codegen) all carrying just (line, column,
file). Programmatic consumers like the OpenPLC Editor need richer
location data: which POU, which section (var-block vs body), and a
body-relative line number that matches what the user sees in Monaco.

A single post-pass walks the AST after parsing/analysis, builds an
interval table per POU, and decorates every error in place with the
new fields. Standalone CLI use is unaffected — formatDiagnostic still
prints the absolute (file, line) — but consumers reading the result
get enough structure to route diagnostics into the right POU tab and
remap body-line numbers without parsing error strings or doing their
own AST walk.

Var-block errors keep the file line directly (the editor's vars-text
view aligns with the per-POU file's line numbering); body errors gain
a bodyLine offset 1-indexed from the first body statement; var-block
errors anchored at a specific declaration also gain variableName so
editors can highlight the exact row.
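
A rough sketch of such a post-pass, assuming a precomputed interval table. The `section`, `bodyLine`, and `variableName` fields are named in this PR; `pouName`, the `"var"` section value, and the interval shape are hypothetical stand-ins.

```typescript
// Hypothetical interval record built from the AST, one per POU.
interface PouIntervalSketch {
  pouName: string;
  startLine: number;
  endLine: number;
  bodyStartLine: number; // file line of the first body statement
  varDecls: { name: string; line: number }[];
}

interface CompileErrorSketch {
  line: number;
  column: number;
  message: string;
  pouName?: string;
  section?: "var" | "body";
  bodyLine?: number;
  variableName?: string;
}

// Decorates errors in place; errors outside any POU are left untouched.
function annotateErrorsSketch(errors: CompileErrorSketch[], pous: PouIntervalSketch[]): void {
  for (const error of errors) {
    if (error.line <= 0) continue; // defensive fall-through for line=0
    const pou = pous.find((p) => error.line >= p.startLine && error.line <= p.endLine);
    if (pou === undefined) continue;
    error.pouName = pou.pouName;
    if (error.line >= pou.bodyStartLine) {
      error.section = "body";
      // 1-indexed from the first body statement.
      error.bodyLine = error.line - pou.bodyStartLine + 1;
    } else {
      error.section = "var";
      const decl = pou.varDecls.find((d) => d.line === error.line);
      if (decl !== undefined) error.variableName = decl.name;
    }
  }
}
```

For a POU whose body starts at file line 13, an error at file line 21 is tagged as body line 9.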

Tests cover the user-reported WSTRING-init scenario, plain body
mismatches, multi-POU programs (one error per POU), and the
defensive line=0 fall-through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(il): apply IL→ST transpilation to every additionalSources entry

Phase 0 IL detection ran only on the primary source. The OpenPLC
Editor's program.st splitter feeds per-POU files via
additionalSources, and any of them can be IL — without this pass an
IL POU like

    FUNCTION_BLOCK State_Display
      VAR State : INT; END_VAR
      LD State
      ST Out
    END_FUNCTION_BLOCK

reaches the ST parser as raw IL and fails with the gigantic chevrotain
"expecting one of these 1400 token sequences but found 'LD'" error
that's almost impossible for users to read.

Each additional source now goes through the same transpileILSource
call as the primary, with errors attributed to the source's own
fileName.

Tests pin the multi-source IL case directly so a future refactor
that re-introduces the gap fails loudly here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(diagnostics): prefer bodyLine over file line in formatDiagnostic

The OpenPLC Editor's console renders a `[POU / body line N]` prefix
above each error using the new POU-context fields, then prints
strucpp's gcc-style snippet underneath.  formatDiagnostic kept using
the raw file line for both the header column (`Manual_Override.st:21`)
and the snippet gutter (`21 |`), so the user saw two different line
numbers for the same error:

    [MANUAL_OVERRIDE / body line 9]
    Manual_Override.st:21:10: error: ...
       21 |   asd := "hello world";

When `error.section === 'body'` and `error.bodyLine` is set, the
header column and gutter both display `bodyLine` — the same number
the user sees in Monaco — while the source content is still read from
`error.line` (where the text physically lives in the per-POU file).
For non-body errors the behaviour is unchanged: `error.line` flows
through as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(diagnostics): make body-line preference opt-in in formatDiagnostic

The previous commit unconditionally swapped to bodyLine for body
diagnostics, which would change line numbers for the strucpp CLI and
the vscode-extension consumers — both of which want the absolute file
line they've always reported.

formatDiagnostic gains an optional third parameter
`{ preferBodyLine?: boolean }`. Default `false` ⇒ existing callers
(CLI, vscode-extension) see no behaviour change; pin tests added to
prove the file-line rendering is unchanged when no option is passed.
The OpenPLC Editor passes `{ preferBodyLine: true }` to get the
Monaco-aligned numbering it already advertises in its bracketed POU
prefix.

Var-block errors are deliberately a no-op even with the flag set:
their `line` already aligns with the editor's vars-text Monaco view.
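
The opt-in can be pictured with the following sketch. It only models the header line; the real `formatDiagnostic` also prints a source snippet, and its first two parameters are not shown in this PR, so the signature below is an assumption.

```typescript
interface DiagnosticSketch {
  file: string;
  line: number;
  column: number;
  message: string;
  section?: string;
  bodyLine?: number;
}

// Picks the line number to display; the source text is still read from `line`.
function displayLineSketch(error: DiagnosticSketch, options: { preferBodyLine?: boolean } = {}): number {
  if (options.preferBodyLine === true && error.section === "body" && error.bodyLine !== undefined) {
    return error.bodyLine;
  }
  // Default path and var-block errors keep the absolute file line.
  return error.line;
}

function formatHeaderSketch(error: DiagnosticSketch, options: { preferBodyLine?: boolean } = {}): string {
  const shownLine = displayLineSketch(error, options);
  return `${error.file}:${shownLine}:${error.column}: error: ${error.message}`;
}
```

Callers that pass no options get the file line exactly as before, which is what the pin tests guard.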

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(library): introduce LibraryChunk type for per-symbol tree-shaking

Phase 1 of function-level tree-shaking. Adds the chunked representation
to the library type system so subsequent phases can populate and consume
it incrementally without breaking existing emit paths.

- `LibraryChunk` — one top-level symbol's contribution to the library's
  header + cpp output, plus its dep edges to other symbols.
- `LibraryChunkDep` — a graph edge naming the owning library and the
  target symbol. Same-archive edges use the literal `"this"`.
- `LibraryCompileResult.chunks` and `StlibArchive.chunks` — optional in
  Phase 1; populated from Phase 2, becomes authoritative in Phase 4
  when the legacy `headerCode`/`cppCode` blob fields are retired.

No runtime behaviour change: nothing emits or reads `chunks` yet. The
type addition exists so Phase 2's compiler instrumentation and Phase 3's
codegen shake can be developed against a real type rather than
gradually-typed scaffolding.

Verification: tsc clean, full vitest suite green (1724 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(codegen): emit chunk-boundary markers around top-level declarations

Phase 2a of function-level tree-shaking. Adds an opt-in compile option
that wraps each top-level emitted declaration with comment markers in
both header and cpp output:

  //@chunk:begin:<kind>:<NAME>
  ... declaration body ...
  //@chunk:end:<kind>:<NAME>

Kinds: `type` (struct/enum/alias), `inlineGlobal`, `functionBlock`,
`function`. Programs are marked as `functionBlock` because their emitted
shape matches an FB (class + per-TU implementation).

Plumbed through:
- `CompileOptions.emitChunkMarkers` — public surface flag, off by default
- `CodeGenOptions.emitChunkMarkers` — internal flag forwarded to
  `TypeCodeGenOptions` so type-codegen can wrap each generated type
- `emitHeaderChunkMarker` / `emitCppChunkMarker` — no-op when off

The library compiler (Phase 2b) will turn this on and slice the emitted
header/cpp into per-symbol chunks for the new `LibraryChunk[]` field.
Production callers leave the flag off; their output is identical to
before.

Verification:
- `tsc --noEmit` clean
- Full vitest suite green (1724 passed)
- Smoke test confirms markers appear when flag is on (6 header + 4 cpp
  markers for a small Type+FB+Program fixture) and stay absent when off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(library): extract per-symbol chunks + dep graph at library compile

Phase 2b of function-level tree-shaking. With Phase 2a's chunk markers
in place, the library compiler now slices emitted header/cpp into
per-symbol chunks and computes the cross-symbol dep graph by walking
the AST.

New module `src/library/library-chunks.ts` owns three things:

  - `extractChunkSlices(code, eol)` parses the codegen markers in a
    single output stream into per-symbol text slices and returns the
    marker-stripped text for the legacy headerCode/cppCode blobs.

  - `buildChunks(header, cpp, eol, ast, libName, deps)` merges header
    and cpp slices by `<kind>:<name>` key, preserves declaration
    order, and stitches each chunk together with its dep edges.

  - `computeChunkDeps` walks the chunk's AST subtree collecting
    `FunctionCallExpression`, `VariableExpression`, `TypeReference`,
    `FunctionBlockDeclaration.extends/implements`, and
    `InterfaceDeclaration.extends` references; resolves them through
    a symbol→library ownership map (built from the archive's own AST
    plus each declared dep's manifest); drops anything unresolved
    (built-ins, codegen-injected helpers); sorts deterministically.
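
To make the slicing step concrete, here is a minimal sketch of a marker parser over one output stream. It is not the module's actual implementation: the return shape, the error handling, and the rule that chunks do not nest are assumptions; only the marker syntax and the `(code, eol)` parameters come from this PR.

```typescript
interface ChunkSliceSketch {
  kind: string;
  name: string;
  text: string;
}

// Splits marker-annotated codegen output into per-symbol slices and
// returns the same text with the marker lines removed.
function extractChunkSlicesSketch(code: string, eol: string): { slices: ChunkSliceSketch[]; stripped: string } {
  const markerPattern = /^\s*\/\/@chunk:(begin|end):([^:]+):(\S+)\s*$/;
  const slices: ChunkSliceSketch[] = [];
  const strippedLines: string[] = [];
  let open: { kind: string; name: string; lines: string[] } | undefined;

  for (const line of code.split(eol)) {
    const marker = markerPattern.exec(line);
    if (marker === null) {
      strippedLines.push(line);
      if (open !== undefined) open.lines.push(line);
      continue;
    }
    const edge = marker[1] ?? "";
    const kind = marker[2] ?? "";
    const name = marker[3] ?? "";
    if (edge === "begin") {
      // Assumption: top-level chunks never nest.
      if (open !== undefined) throw new Error(`nested chunk ${kind}:${name}`);
      open = { kind, name, lines: [] };
    } else {
      if (open === undefined || open.kind !== kind || open.name !== name) {
        throw new Error(`unbalanced end marker ${kind}:${name}`);
      }
      slices.push({ kind: open.kind, name: open.name, text: open.lines.join(eol) });
      open = undefined;
    }
  }
  if (open !== undefined) throw new Error(`unterminated chunk ${open.kind}:${open.name}`);
  return { slices, stripped: strippedLines.join(eol) };
}
```

Lines outside any marker pair stay in the stripped text but belong to no slice, which matches the idea that a chunk owns only its own declaration.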

`compileLibrary` always enables chunk markers now (the option is
internal to the codegen — production user compiles don't touch this
path). After the compile, it builds the chunks, replaces the
result's `headerCode`/`cppCode` with the marker-stripped text, and
exposes the chunks on the result.

`compileStlib` carries the chunks through into `StlibArchive.chunks`.
Phase 4 will switch the consumer codegen to read these and retire
the legacy blob fields.

Verification:
- 11 new vitest tests in `tests/library/library-chunks.test.ts`
  covering chunk extraction across all four kinds, intra-library
  type/function-call/global-read dep edges, cross-library dep
  edges, deterministic dep ordering, self-dep elimination, and
  marker-stripping of the archive blobs.
- All 1735 tests pass (1724 existing + 11 new); no regressions.
- Bundled libs auto-rebuilt with chunks via `rebuild-libs.mjs`'s
  pre-test setup:
    iec-standard-fb              → 23 chunks (FBs)
    additional-function-blocks   → 6 chunks
    oscat-basic                  → 564 chunks (172 FB + 373 fn +
                                    14 type + 5 inline-global —
                                    MATH/PHYS/LANGUAGE/SETUP/LOCATION)
    iec-std-functions            → 0 chunks (synthetic, no ST source)
- Archive `headerCode`/`cppCode` blobs contain zero chunk markers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(codegen): function-level tree-shake — emit only reachable chunks

Phase 3 of function-level tree-shaking. The consumer codegen now
emits only the library chunks reachable from the user's AST,
following the chunk dep graph baked into each archive at library
compile time. Library-level inclusion (today's behaviour: enable
oscat-basic = inline its entire ~7000-line C++ blob) is retired.

New tree-shake driver in `src/index.ts`:

  - `collectUsedSymbols(ast, archives, includeEverything)` replaces
    `collectUsedLibraries`. Walks the user's AST collecting names from
    every cross-symbol reference kind (FunctionCallExpression,
    VariableExpression, TypeReference, FB extends/implements,
    InterfaceDeclaration extends). Resolves each name through a
    symbol→chunk index, BFS-traverses the chunk dep edges, and
    returns `Map<libraryName, Set<chunkName>>`. Test builds bypass
    the shake.
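
The reachability step can be sketched as a plain breadth-first traversal. This is an illustration under assumptions: the AST walk is replaced by a ready-made list of referenced names, the interfaces are simplified stand-ins for `LibraryChunk` / `LibraryChunkDep`, and a name defined by several archives is resolved to the first one in load order.

```typescript
interface ChunkDepSketch {
  library: string; // "this" means the archive that owns the chunk
  symbol: string;
}

interface ChunkSketch {
  name: string;
  deps: ChunkDepSketch[];
}

interface ArchiveSketch {
  name: string;
  chunks: ChunkSketch[];
}

function collectReachableSketch(usedNames: string[], archives: ArchiveSketch[]): Map<string, Set<string>> {
  // Index: symbol name -> owning archive, and (archive, symbol) -> chunk.
  const owner = new Map<string, string>();
  const chunkByKey = new Map<string, ChunkSketch>();
  for (const archive of archives) {
    for (const chunk of archive.chunks) {
      if (!owner.has(chunk.name)) owner.set(chunk.name, archive.name);
      chunkByKey.set(`${archive.name}::${chunk.name}`, chunk);
    }
  }

  const queue: ChunkDepSketch[] = [];
  for (const name of usedNames) {
    const library = owner.get(name);
    if (library !== undefined) queue.push({ library, symbol: name });
  }

  const reachable = new Map<string, Set<string>>();
  while (queue.length > 0) {
    const next = queue.shift();
    if (next === undefined) break;
    const chunk = chunkByKey.get(`${next.library}::${next.symbol}`);
    if (chunk === undefined) continue; // unresolved names are ignored
    let seen = reachable.get(next.library);
    if (seen === undefined) {
      seen = new Set<string>();
      reachable.set(next.library, seen);
    }
    if (seen.has(next.symbol)) continue;
    seen.add(next.symbol);
    for (const dep of chunk.deps) {
      const library = dep.library === "this" ? next.library : dep.library;
      queue.push({ library, symbol: dep.symbol });
    }
  }
  return reachable;
}
```

A program that references only `DRIVER_1` would, under these assumptions, pull in `DRIVER_1` plus whatever its dep edges name (for example `TON` from another archive) and nothing else.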

New emission API on `CodeGenerator`:

  - `addLibraryChunks(archive, reachableNames)` replaces
    `addLibraryPreamble(name, headerCode, cppCode)`. Header and cpp
    emission loops iterate per-archive in load order; for each
    archive only chunks whose name appears in `reachable` are
    emitted, in `chunks[]` array order (= declaration order). FB
    chunks get a `class X;` forward decl emitted first so
    intra-library FB-to-FB references resolve regardless of
    declaration order.

`library-loader.ts` carries the `chunks` field through
`loadStlibArchive` (previously dropped during validation).

Verification on bundled libs:

  - Program using only `TON`:
      headerCode = 1782 bytes
      TON: included; DRIVER_1, LANGUAGE, MATH, GEN_SIN: stripped
  - Program using only `DRIVER_1`:
      headerCode = 2268 bytes
      DRIVER_1 + TON (its declared dep): included
      OSCAT inline globals, OSCAT helpers: stripped

This resolves the AVR per-variable-size error that previously
prevented any project enabling OSCAT (`CONSTANTS_LANGUAGE`,
`CONSTANTS_MATH`, etc. — 30–80 KB inline globals) from compiling
on Mega targets, even when nothing referenced those globals.

All 1735 tests pass; no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(library): retire legacy headerCode/cppCode blob fields

Phase 4 of function-level tree-shaking — cleanup pass. With Phase 3
moving every consumer to chunk-based emission, the library-wide
`headerCode` / `cppCode` blob fields on `StlibArchive` no longer
serve any purpose: they're redundant with `chunks` (whose
concatenation yields the same text) and offer no API the codegen
still uses.

Removed:

  - `StlibArchive.headerCode` and `StlibArchive.cppCode` fields. The
    archive carries `chunks` (now required, no longer optional) as
    its single source of compiled C++ output.
  - `compileStlib`'s `extractNamespaceBody` + `stripDependencyPreambles`
    post-processing. Chunks own only their own declaration's text by
    construction — there's no dependency-preamble code to strip
    because nothing inlines dependencies into a child library's
    output anymore.
  - `stripDependencyPreambles` helper from `library-utils.ts` (no
    callers).
  - `headerCode` / `cppCode` validators in `loadStlibArchive`. Replaced
    with a `chunks` array validator so a synthesised `.stlib` that
    forgets the field fails fast at load time instead of silently
    producing empty output downstream.

Updated `scripts/rebuild-libs.mjs` so the synthetic `iec-std-functions`
archive emits `chunks: []` (it has no codegen output — it's a pure
manifest, the runtime ships the C++ implementations).

Test fixtures and assertions migrated from the old blob shape to
chunks (e.g. assert `chunks.length > 0` instead of
`headerCode/cppCode` truthiness; assert concatenated chunk text
contains expected symbols instead of grepping a blob). Tests that
used to validate "rejects missing headerCode" / "rejects missing
cppCode" collapsed into one "rejects missing chunks" check.

The legacy `extractNamespaceBody` export stays — it's part of the
public API and may still be useful for decompile / tooling consumers
that work with the user's compile output (not library archives).

Verification: 1734 tests pass; no regressions. The one-test drop
vs. Phase 3 comes from the headerCode + cppCode validator pair
merging into a single chunks validator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(library): chunk-shake correctness fuzz across all bundled libs

Phase 5 of function-level tree-shaking — correctness gate. For every
FB and every type chunk in every bundled library (oscat-basic,
iec-standard-fb, additional-function-blocks), the fuzz synthesises a
minimal `PROGRAM Main; VAR ... END_VAR; END_PROGRAM` that references
that one symbol in isolation, then compiles with strucpp and asserts
the compile succeeds.

A failure means the chunk's dep edges (computed at library compile
time by `collectReferencedNames`) missed something — most likely an
implicit conversion, an IEC intrinsic dispatch path, or a transitive
type referenced only inside a struct field. The remediation is in
the extractor in `src/library/library-chunks.ts`, not in the fuzz.

Coverage:
  - Every functionBlock chunk: standalone instance declaration
  - Every type chunk: standalone variable declaration
  - Function chunks and inline-global chunks intentionally skipped —
    they're transitively covered by FB / type fuzz (any FB or type
    that pulls a function / global pulls them through dep edges).
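
A minimal sketch of how such fuzz cases could be generated. The function names, the `probe` variable name, and the exact ST layout are assumptions; the sketch also assumes an empty program body is accepted and that FB instances and typed variables share the same declaration syntax.

```typescript
interface ChunkRefSketch {
  kind: string; // "type" | "inlineGlobal" | "functionBlock" | "function"
  name: string;
}

// Builds a minimal program that references one symbol in isolation.
function synthesizeFuzzProgramSketch(symbol: string): string {
  return [
    "PROGRAM Main",
    "  VAR",
    `    probe : ${symbol};`,
    "  END_VAR",
    "END_PROGRAM",
    "",
  ].join("\n");
}

// One case per functionBlock / type chunk; functions and inline globals are
// skipped because they are covered transitively through dep edges.
function buildFuzzCasesSketch(chunks: ChunkRefSketch[], exempt: Set<string>): { name: string; source: string }[] {
  return chunks
    .filter((chunk) => chunk.kind === "functionBlock" || chunk.kind === "type")
    .filter((chunk) => !exempt.has(chunk.name))
    .map((chunk) => ({ name: chunk.name, source: synthesizeFuzzProgramSketch(chunk.name) }));
}
```

Each generated source would then be compiled with the library enabled, and the test asserts only that compilation succeeds.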

Result: 218 fuzz cases pass.
  - 23 iec-standard-fb FBs
  - 6 additional-function-blocks FBs
  - 172 oscat-basic FBs
  - 14 oscat-basic types
  - 3 "library has at least one shake-relevant chunk" sanity assertions

A `STANDALONE_EXEMPT` set is provided for symbols that legitimately
can't be referenced standalone (abstract bases, types needing init
args). Currently empty — every bundled symbol in the current library
set is directly instantiable.

Full suite: 1952 tests pass; no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(codegen): drop #undef SP — no longer reachable post-glue split

`#undef SP` was guarding against `<avr/io.h>` redefining `SP` as the
AVR stack-pointer register. After the Arduino-side runtime glue
split (openplc-editor's arduino_runtime_glue.{cpp,h} + slimmed
Baremetal.ino), no translation unit that parses `generated.hpp`
also includes `<avr/io.h>`:

  - The Arduino .ino has `<Arduino.h>` (→ `<avr/io.h>`) but no
    longer includes `generated.hpp`.
  - Library-side .cpp files include `generated.hpp` but never pull
    `<avr/io.h>` (`openplc.h` is platform-clean; iec_*.hpp don't
    reach for AVR headers).
  - Runtime v4 and host test/REPL builds are Linux/Mac — `<avr/io.h>`
    isn't even on the include path.

`SP` therefore never becomes a macro in any TU we emit into, and the
undef is dead code. `#undef OVERFLOW` stays — it's still hit on every
platform via `<cmath>` → `<math.h>`'s legacy SVID error constant.

Test in `tests/backend/codegen.test.ts` updated to pin only the
OVERFLOW undef + its placement (after includes, before namespace open).

Verification: 1952 tests pass; no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codegen): generic std-fn calls with FB outputs + bare literals

Two related codegen bugs that produced unbuildable C++ from valid
IEC ST in the GARAGEDO_CTL test project:

1. `inferExprType` for `VariableExpression` ignored `fieldAccess`,
   so `CTU_UDINT2.CV` (UDINT output of a CTU_UDINT FB instance)
   inferred as `CTU_UDINT` — the FB type, not the field's element
   type.  The downstream literal-cast in `harmonizeStdFuncArgs`
   then synthesised `static_cast<IEC_CTU_UDINT>(10)`, an alias that
   doesn't exist because FBs (unlike types) don't emit
   `IEC_<NAME>` aliases.  Fix: walk `fieldAccess` through
   `resolveMemberType`, returning the leaf field's type.

2. `harmonizeStdFuncArgs` skipped casting a bare literal whenever
   its inferred IEC type matched the dominant variable type
   (`MUL(int_var, 10)` where both inferred as INT).  C++ template
   deduction still failed: the variable side lowers to
   `IECVar<int>` while a bare literal lowers to raw `int` —
   distinct `T`s.  Fix: always cast bare literals when paired with
   at least one IECVar argument, regardless of IEC-type match.

Together these produce well-typed C++ for every pattern the
GARAGEDO_CTL pou_WELL.cpp surfaced:

  DIV(CTU_UDINT2.CV, static_cast<IEC_UDINT>(10))     ← was IEC_CTU_UDINT
  DIV(CTU3.CV,       static_cast<IEC_INT>(10))       ← was IEC_CTU
  MUL(WATER_METER_DR, static_cast<IEC_INT>(10))      ← was missing entirely
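
The second fix can be illustrated with a small sketch of the casting rule. The argument shape and function name are hypothetical, and taking the first non-literal argument's type as the "dominant" type is an assumption made to keep the example short.

```typescript
interface ArgSketch {
  text: string;          // already-lowered C++ expression text
  iecType: string;       // inferred IEC type, e.g. "INT"
  isBareLiteral: boolean;
}

// Casts every bare literal to the variable side's IEC alias so that C++
// template deduction sees one consistent T.
function harmonizeArgsSketch(args: ArgSketch[]): string[] {
  const firstVariable = args.find((arg) => !arg.isBareLiteral);
  // All-literal calls are left alone in this sketch.
  if (firstVariable === undefined) return args.map((arg) => arg.text);
  return args.map((arg) =>
    arg.isBareLiteral ? `static_cast<IEC_${firstVariable.iecType}>(${arg.text})` : arg.text,
  );
}
```

The cast is applied even when the literal's inferred IEC type already equals the variable's, since the mismatch is between `IECVar<int>` and raw `int`, not between IEC types.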

Verification: 4 new regression tests in
`tests/backend/codegen-functions.test.ts` cover both bugs across
INT/UDINT and REAL operand types and both vanilla / library-FB
sources.  Full suite green (1956 passed, +4 from this change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(codesys-import): drop unused headerCol tracking

The header column index was captured and immediately discarded via
`void headerCol;` — leftover from incremental development. The decl
body walk only needs `headerRecIdx`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thiagoralves merged commit 23435b5 into main on May 14, 2026
5 checks passed