Merged
45 commits
040c25c
Add serialization size stats
josephbirkner Jan 14, 2026
9ec0d0b
Add byte array specialization and processing
Waguramu Feb 4, 2026
1240d9d
Add byte array
Waguramu Feb 4, 2026
5d1bf1e
Remove extraneous output
Waguramu Feb 4, 2026
c50a0f3
Use passkey pattern to make Node construction rights more obvious
josephbirkner Feb 4, 2026
988514e
Remove byteArrayData_ column
Waguramu Feb 5, 2026
b5e58d9
Introduce a generic ADL-based Model::resolve function.
josephbirkner Feb 6, 2026
6b01ea8
Address PR comments.
josephbirkner Feb 10, 2026
74b02fc
Add a Bytes value type which is returned by ValueType4CType<ByteArray…
Waguramu Feb 10, 2026
0d13c00
Merge remote-tracking branch 'origin/improvement/generic-resolve' int…
josephbirkner Feb 10, 2026
bd975b3
Fix mp_key usage.
josephbirkner Feb 10, 2026
2f93401
Merge pull request #135 from Klebert-Engineering/improvement/generic-…
josephbirkner Feb 12, 2026
2484898
Do not use k* prefixes. Use fmt::format instead of custom hex encodin…
Waguramu Feb 12, 2026
f59f6f7
Allow ByteArray conversion for strings, bool and int
Waguramu Feb 12, 2026
9f4b9a5
Fix duplication
Waguramu Feb 12, 2026
85304dc
Fix tests
Waguramu Feb 12, 2026
eb25799
Fix tests
Waguramu Feb 12, 2026
6976c70
Merge pull request #134 from Klebert-Engineering/byte-array
Waguramu Feb 13, 2026
019f2e2
Migrate storage containers to noserde::Buffer
josephbirkner Feb 18, 2026
fecb1ba
Point noserde CPM dependency to josephbirkner fork
josephbirkner Feb 18, 2026
8989316
Remove stale StringRange bitsery serializer
josephbirkner Feb 18, 2026
d667eba
Use vector<uint8_t> instead of stringstream.
josephbirkner Feb 19, 2026
5ed51b5
Enable fast serialization for ArrayArena.
josephbirkner Feb 19, 2026
3922313
Introduce compactHeads_ for arrays.
josephbirkner Feb 20, 2026
ae9f4ea
model: add ModelColumn and tagged type validation
josephbirkner Feb 24, 2026
d337c2e
model: Finish code orga for ModelColumn infrastructure.
josephbirkner Feb 24, 2026
d2f3936
test: migrate complex serialization reads to vector input
josephbirkner Feb 24, 2026
1337f58
Remove struct layout validator.
josephbirkner Mar 4, 2026
3b9c1b0
Simplify ModelColumn serialization wire format
josephbirkner Mar 4, 2026
9e3157f
Move singleton array storage to dedicated feature branch
josephbirkner Mar 4, 2026
d5313cc
Add fixedSize array flag.
josephbirkner Mar 4, 2026
c324b29
Add split TwoPart storage for object fields and array arenas
josephbirkner Mar 5, 2026
c97ea20
model: avoid slicing in model_ptr upcasts
josephbirkner Mar 9, 2026
e9c17b1
Merge remote-tracking branch 'origin/v0.6.3' into sync/noserde
josephbirkner Mar 9, 2026
e4b4ed2
Merge remote-tracking branch 'origin/noserde' into sync/split
josephbirkner Mar 9, 2026
9a3911b
model: address split storage review comments
josephbirkner Mar 9, 2026
f7cdab7
expr: Add Unique Identifier to Expressions
johannes-wolf Mar 10, 2026
75c11e6
diagnostics: Rework Diagnostics
johannes-wolf Mar 10, 2026
a52ceee
expr: Make eval Const Again
johannes-wolf Mar 10, 2026
c6e04e3
diagnostics: Cursor Fixes
johannes-wolf Mar 11, 2026
33c42ac
diagnostics: Remove Environment & AST Dependencies
johannes-wolf Mar 13, 2026
c80b031
Merge pull request #137 from Klebert-Engineering/feature/split-field-…
josephbirkner Mar 16, 2026
1223db0
Merge pull request #140 from Klebert-Engineering/rework-diagnostics-f…
josephbirkner Mar 16, 2026
202d867
model: document split storage and address review issues
josephbirkner Mar 16, 2026
87a0469
Merge pull request #136 from Klebert-Engineering/noserde
josephbirkner Mar 16, 2026
3 changes: 3 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -71,6 +71,7 @@ add_library(simfil ${LIBRARY_TYPE}
src/value.cpp
src/overlay.cpp
src/exception-handler.cpp
src/expression-visitor.cpp
src/model/model.cpp
src/model/nodes.cpp
src/model/string-pool.cpp)
@@ -94,8 +95,10 @@ target_sources(simfil PUBLIC
include/simfil/transient.h
include/simfil/simfil.h
include/simfil/exception-handler.h
include/simfil/expression-visitor.h

include/simfil/model/arena.h
include/simfil/model/column.h
include/simfil/model/string-pool.h
include/simfil/model/model.h
include/simfil/model/nodes.h
39 changes: 32 additions & 7 deletions docs/simfil-dev-guide.md
@@ -62,6 +62,24 @@ Objects and arrays do not embed child nodes directly. Instead, they maintain `Mo

`StringPool` maintains the mapping between strings and the `StringId` integers stored in object fields. The base `Model` interface exposes `lookupStringId` so that serialization code such as `ModelNode::toJson` can recover human-readable field names. `ModelPool::setStrings` allows a pool to adopt a different `StringPool`, populating any missing field names along the way. This operation is used by higher-level components that need to merge data from several pools into a unified string namespace.

### ModelColumn

The primitive storage building block below `ModelPool` and `ArrayArena` is `ModelColumn<T, RecordsPerPage, StoragePolicy>`. A model column stores a single fixed-width record stream and exposes bulk byte operations for serialization and deserialization. The generic implementation accepts three families of types:

- fixed-width scalar types (`bool`, fixed-width integers, fixed-width enums, `float`, `double`)
- explicitly tagged external record types via `MODEL_COLUMN_TYPE(expected_size)`
- other approved native POD records that are trivially copyable and standard-layout

The column implementation assumes little-endian hosts and treats the in-memory representation as the wire representation. `bytes()` returns the canonical payload bytes for the current record stream; `assign_bytes()` and `read_payload_from_bitsery()` perform the inverse operation. For vector-backed columns this is one contiguous bulk copy; for segmented storage the same payload is copied chunk-by-chunk while preserving the same wire layout.
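
As a rough illustration of the bulk-copy idea described above, the following sketch round-trips a vector of fixed-width records through a byte payload. The helper names `toPayload` and `fromPayload` are hypothetical and are not part of the `ModelColumn` API. Like the column itself, the sketch assumes a little-endian host and trivially copyable records.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper (not simfil API): copies the in-memory representation
// of a contiguous record vector into a byte payload with one bulk memcpy.
template <class T>
std::vector<std::uint8_t> toPayload(const std::vector<T>& records)
{
    static_assert(std::is_trivially_copyable_v<T>, "records must be trivially copyable");
    std::vector<std::uint8_t> out(records.size() * sizeof(T));
    if (!out.empty())
        std::memcpy(out.data(), records.data(), out.size());
    return out;
}

// Hypothetical inverse: rebuilds the records from a payload.
// Returns false if the payload length is not a whole number of records.
template <class T>
bool fromPayload(const std::vector<std::uint8_t>& payload, std::vector<T>& records)
{
    static_assert(std::is_trivially_copyable_v<T>, "records must be trivially copyable");
    if (payload.size() % sizeof(T) != 0)
        return false;
    records.resize(payload.size() / sizeof(T));
    if (!payload.empty())
        std::memcpy(records.data(), payload.data(), payload.size());
    return true;
}
```

Rejecting a payload whose length is not a multiple of `sizeof(T)` is one possible policy for this sketch; it is not necessarily what `assign_bytes()` does.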

`RecordsPerPage` defines the number of records stored per page, not the page size in bytes. The effective page size is `RecordsPerPage * sizeof(T)`, and segmented storage requires that value to be a multiple of the record size. This keeps page boundaries aligned with record boundaries and lets callers reason about capacity in record counts instead of byte counts.
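
The relationship between record counts and page bytes can be checked at compile time. The following is a minimal sketch built around a hypothetical `PageLayout` helper; it only illustrates the arithmetic and is not the actual segmented storage implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not simfil API): derives page geometry from a record
// type and a records-per-page count.
template <class T, std::size_t RecordsPerPage>
struct PageLayout
{
    static_assert(RecordsPerPage > 0, "a page must hold at least one record");

    static constexpr std::size_t recordSize = sizeof(T);
    static constexpr std::size_t pageBytes = RecordsPerPage * sizeof(T);

    // Page boundaries always coincide with record boundaries.
    static_assert(pageBytes % recordSize == 0, "page size must be a multiple of the record size");

    // Page that holds the record with the given logical index.
    static constexpr std::size_t pageOf(std::size_t recordIndex)
    {
        return recordIndex / RecordsPerPage;
    }

    // Position of that record inside its page, counted in records.
    static constexpr std::size_t slotOf(std::size_t recordIndex)
    {
        return recordIndex % RecordsPerPage;
    }
};
```

With `PageLayout<std::uint32_t, 1024>`, each page holds 1024 records, and callers can address records by index without ever computing byte offsets themselves.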

### Split pair columns with `TwoPart`

`TwoPart<A, B>` is a logical pair type used when a compound record should behave like `{A, B}` in C++ but should not pay struct-padding costs on the wire. `ModelColumn<TwoPart<A, B>>` specializes the generic column by storing the `first()` and `second()` members in two synchronized child columns. Reads and writes still happen through a pair-like ref proxy, but serialization concatenates the dense payload of the first column and the dense payload of the second column.

The main current use is object member storage. `detail::ObjectField` is defined as `TwoPart<StringId, ModelNodeAddress>`, so object fields still behave like `(name, value)` pairs while the wire payload remains dense and deterministic regardless of host padding rules.
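
To make the padding argument concrete, here is a small sketch of split pair storage. `SplitFieldColumn`, `NameId`, `Address`, and `PaddedField` are hypothetical names introduced for illustration, and the 16-bit and 32-bit member widths are assumptions chosen to provoke padding; none of this is the actual `TwoPart` specialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed member widths for illustration only.
using NameId = std::uint16_t;
using Address = std::uint32_t;

// A plain struct typically carries padding between the two members.
struct PaddedField
{
    NameId name;
    Address value;
};
static_assert(sizeof(PaddedField) >= sizeof(NameId) + sizeof(Address),
              "a struct is never smaller than the sum of its members");

// Hypothetical split column: two synchronized vectors, one per member.
class SplitFieldColumn
{
public:
    void push_back(NameId name, Address value)
    {
        names_.push_back(name);
        values_.push_back(value);
    }

    std::size_t size() const { return names_.size(); }
    NameId nameAt(std::size_t i) const { return names_[i]; }
    Address valueAt(std::size_t i) const { return values_[i]; }

    // Dense payload: all names first, then all values, with no padding bytes.
    std::vector<std::uint8_t> payload() const
    {
        const std::size_t nameBytes = names_.size() * sizeof(NameId);
        const std::size_t valueBytes = values_.size() * sizeof(Address);
        std::vector<std::uint8_t> out(nameBytes + valueBytes);
        if (!out.empty()) {
            std::memcpy(out.data(), names_.data(), nameBytes);
            std::memcpy(out.data() + nameBytes, values_.data(), valueBytes);
        }
        return out;
    }

private:
    std::vector<NameId> names_;
    std::vector<Address> values_;
};
```

Under these assumed widths, each field costs six payload bytes, whereas an array of `PaddedField` would typically cost eight bytes per field on common ABIs.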

### Value representation

`Value` is the runtime carrier for scalar and structured results:
@@ -127,25 +145,32 @@ classDiagram

`BaseArray<ModelType, ModelNodeType>` provides the generic implementation of array behaviour for model pools. It owns a pointer to an `ArrayArena<ModelNodeAddress, …>` and an `ArrayIndex` into that arena. The base class implements `type()` (always `Array`), `at()`, `size()`, and `iterate()` in terms of the arena. `Array` itself is a thin wrapper over `BaseArray<ModelPool, ModelNode>` that adds convenience overloads for appending scalars, which internally delegate to `ModelPool::newSmallValue` or `ModelPool::newValue` and then record the resulting address in the arena.

`BaseObject<ModelType, ModelNodeType>` plays the same role for object nodes. It stores key–value pairs as `{StringId, ModelNodeAddress}` elements inside an `ArrayArena`. The base class implements `type()` (always `Object`), `get(StringId)`, `keyAt()`, `at()` (interpreting the array as an ordered sequence of fields), and `iterate()`. The concrete `Object` subclass adds convenience `addField` overloads for common scalar types and an `extend` method that copies all fields from another `Object`.
`BaseObject<ModelType, ModelNodeType>` plays the same role for object nodes. It stores key–value pairs as `detail::ObjectField` elements inside an `ArrayArena`; that type is currently `TwoPart<StringId, ModelNodeAddress>`, so names and child addresses are physically stored in split columns while the API still behaves like a logical pair sequence. The base class implements `type()` (always `Object`), `get(StringId)`, `keyAt()`, `at()` (interpreting the array as an ordered sequence of fields), and `iterate()`. The concrete `Object` subclass adds convenience `addField` overloads for common scalar types and an `extend` method that copies all fields from another `Object`.

`ProceduralObject` extends `Object` with a bounded number of synthetic fields. These fields are represented as `std::function<ModelNode::Ptr(LambdaThisType const&)>` callbacks in a `small_vector`. Accessors such as `get`, `at`, `keyAt`, and `iterate` first consult the procedural fields and then fall back to the underlying `Object` storage. This pattern makes it possible to expose computed members alongside stored ones without materialising them permanently in the arena.

`OverlayNode` is an orthogonal mechanism that wraps an arbitrary underlying node and maintains a separate map `<StringId, Value>` of overlay children. Calls to `get` and `iterate` first visit the injected children and then delegate to the wrapped node. The overlay itself derives from `MandatoryDerivedModelNodeBase` and uses an `OverlayNodeStorage` `Model` implementation to resolve access.

### Array arena details

The `ArrayArena` template implements the append-only sequences used by arrays and objects. Conceptually, it manages a collection of logical arrays, each of which may consist of one or more “chunks” backed by a single `segmented_vector<ElementType, PageSize>`. A logical array is identified by an `ArrayIndex`. For each index, the arena stores a head `Chunk` in `heads_` and, if the array grows beyond the head’s capacity, additional continuation chunks in `continuations_`.
The `ArrayArena` template implements the append-only sequences used by arrays and objects. Conceptually, it manages a collection of logical arrays, each of which may use one of two physical representations:

- a regular growable chunk chain backed by `heads_`, `continuations_`, and `data_`
- a singleton handle backed by `singletonValues_` and `singletonOccupied_`

Regular arrays behave like the historical arena implementation. Each logical array is identified by an `ArrayIndex` and starts with a head `Chunk` in `heads_`. If the array grows beyond the head’s capacity, the arena allocates continuation chunks in `continuations_`. Each chunk records an `offset` into `data_`, a `capacity`, and a `size`. For a head chunk, `size` also tracks the total logical length of the array; for continuation chunks, `size` is local to that chunk. The `next` and `last` indices form a singly-linked list from the head to the tail chunk.

`new_array(initialCapacity, fixedSize)` controls which representation is chosen. If `fixedSize` is `false`, even `initialCapacity == 1` creates a regular growable array. If `fixedSize` is `true` and `initialCapacity == 1`, the arena instead returns a singleton handle. That handle represents a 0-or-1 element logical array with no head chunk allocation. This is useful for storage patterns where one-element arrays are common and known not to grow later.

Each `Chunk` records an `offset` into the `data_` vector, a `capacity`, and a `size`. For a head chunk, `size` also tracks the total logical length of the array; for continuation chunks, `size` expresses the number of valid elements in that chunk only. The `next` and `last` indices form a singly-linked list from the head to the tail chunk. `new_array(initialCapacity)` reserves a contiguous region in `data_`, initialises the head chunk with the offset and capacity, and returns a fresh `ArrayIndex`.
When a caller appends an element to a regular array via `push_back` or `emplace_back`, the arena calls `ensure_capacity_and_get_last_chunk_unlocked`. This function locates the current tail chunk (either the head or a continuation). If the tail still has spare capacity, it is returned directly; otherwise, the function allocates a new continuation chunk with capacity doubled relative to the previous tail, extends `data_`, links the new chunk into `continuations_`, and updates the head’s `last` pointer. Singleton handles do not use this growth path; they allow at most one element and reject further appends.

When a caller appends an element via `push_back` or `emplace_back`, the arena calls `ensure_capacity_and_get_last_chunk`. This function locates the current tail chunk (either the head or a continuation). If the tail still has spare capacity, it is returned directly; otherwise, the function allocates a new continuation chunk with capacity doubled relative to the previous tail, extends `data_` accordingly, links the new chunk into `continuations_`, and updates the head’s `last` pointer. This growth strategy guarantees amortised constant time for appends while avoiding large reallocations.
Element access via `at(ArrayIndex, i)` dispatches by representation. Singleton handles resolve directly against `singletonValues_`. Compact arenas resolve against the compact head metadata. Regular arrays walk the chunk list, subtracting full chunk capacities from the requested index until the index falls within the current chunk’s capacity and size. This keeps the public API uniform while allowing denser storage for the common singleton case.
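
The chunk walk for regular arrays can be sketched as a pure function. The sketch assumes that every continuation chunk has exactly twice the capacity of its predecessor and that all chunks before the target are full. `ChunkPosition` and `locate` are hypothetical names, and the real arena additionally tracks offsets into `data_` and per-chunk sizes.

```cpp
#include <cstddef>

// Hypothetical result type: which chunk of the chain holds the element,
// and where inside that chunk it sits.
struct ChunkPosition
{
    std::size_t chunk;       // 0 = head chunk, 1 = first continuation, ...
    std::size_t localIndex;  // index relative to the start of that chunk
};

// Walks a chain whose chunk capacities are headCapacity, 2 * headCapacity,
// 4 * headCapacity, ... and subtracts full chunk capacities from the logical
// index until it falls inside the current chunk.
constexpr ChunkPosition locate(std::size_t headCapacity, std::size_t index)
{
    // Assumption of this sketch: a zero head capacity is treated as one so
    // that the loop always terminates.
    std::size_t capacity = headCapacity > 0 ? headCapacity : 1;
    std::size_t chunk = 0;
    while (index >= capacity) {
        index -= capacity;
        capacity *= 2;
        ++chunk;
    }
    return ChunkPosition{chunk, index};
}
```

Because capacities grow geometrically in this model, the loop runs a number of times that is logarithmic in the logical index.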

Element access via `at(ArrayIndex, i)` walks the chunk list for the target array. It subtracts full chunk capacities from the requested index until the index falls within the current chunk’s capacity and size, and then returns a reference to `data_[offset + localIndex]`. This guarantees O(number_of_chunks) access in the worst case, but in practice the number of chunks per array remains small because capacities grow geometrically.
The arena also supports a compact serialization mode. In that mode, `compactHeads_` stores only `{offset, size}` metadata for each regular array, while `data_` already contains a dense payload without chunk gaps. Runtime head chunks are materialized lazily from `compactHeads_` when a later mutation requires growable chunk state again. This allows serialized arenas to stay compact without forcing the mutable runtime representation onto the wire.

The arena also provides higher-level iteration facilities. The `begin(array)`/`end(array)` pair yields an iterator over the elements of a specific logical array. The `iterate(ArrayIndex, lambda)` helper executes a callback on every element and supports two signatures: a unary callback receiving a reference to the element, and a binary callback receiving both the element and its global index. This is used by `BaseArray::iterate` to implement `ModelNode::iterate` efficiently without allocating intermediate containers.
The higher-level iteration facilities follow the same dispatch rules. `begin(array)`/`end(array)` iterate one logical array, while the top-level arena iterator skips the sentinel head entry and also yields singleton handles. `iterate(ArrayIndex, lambda)` supports unary callbacks receiving a value and binary callbacks receiving both a value and its logical index. This is used by `BaseArray::iterate` and `BaseObject::iterate` to expose child traversal without materializing temporary containers.

Thread-safety is conditional. If `ARRAY_ARENA_THREAD_SAFE` is defined, the arena uses a shared mutex to protect growth and element access. Appends and `new_array` take an exclusive lock only when allocating new chunks; reads can proceed with shared locks. Simfil itself does not require the arena to be thread-safe as long as model construction happens before concurrent evaluation, but the hooks are present for embedders that need concurrent writers.
Thread-safety is conditional. If `ARRAY_ARENA_THREAD_SAFE` is defined, the arena uses a shared mutex to protect growth and element access. Reads use shared locks, while mutations and compact-to-runtime materialization take an exclusive lock. Simfil itself does not require the arena to be thread-safe as long as model construction happens before concurrent evaluation, but the hooks are present for embedders that need concurrent writers.

## Parser, tokens, and AST

13 changes: 9 additions & 4 deletions docs/simfil-language.md
@@ -136,7 +136,7 @@ count(mylist.*)

## Types

Simfil supports the following scalar types: `null`, `bool`, `int`, `float` (double precision), `string` and `re`.
Simfil supports the following scalar types: `null`, `bool`, `int`, `float` (double precision), `string`, `bytes` and `re`.
Additionally, the `model` type represents compound object/array container nodes.
All values but `null` and `false` are considered `true`, implicit boolean conversion takes place for operators
`and` and `or` only.
@@ -151,6 +151,11 @@ The following types can be target types for a cast:
* `int` - Converts the value to an integer. Returns 0 on failure.
* `float` - Converts the value to a float. Returns 0 on failure.
* `string` - Converts the value to a string. Boolean values are converted to either "true" or "false".
* `bytes` - Converts the value to bytes.

Byte literals are written using the `b` prefix, e.g. `b"hello"` or `b'hello'`.
Escape sequences `\n`, `\r`, `\t`, `\\`, `\"`, and `\'` are supported.
Bytes can also be written explicitly using `\xNN` (hex), e.g. `b"\x41\x00"`.

## Operators

@@ -161,12 +166,12 @@ The following types can be target types for a cast:
| `[ a ]` | Array/Object subscript, index expression can be of type `int` or `string`. |
| `{ a }` | Sub-Query (inside sub-query `_` represents the value the query is applied to). |
| `. b` or `a . b` | Direct field access; returns the value of field `b` or `null`. |
| `a as b` | Cast a to type b (one of `bool`, `int`, `float` or `string`). |
| `a as b` | Cast a to type b (one of `bool`, `int`, `float`, `string` or `bytes`). |
| `a ?` | Get boolean value of `a` (see ##Types). |
| `a ...` | Unpacks `a` to a list of values (see function `range` under [Functions](#Functions) for example) |
| `typeof a` | Returns the type of the value of its expression (`"null"`, `"bool"`, `"int"`, `"float"` or `"string"`). |
| `typeof a` | Returns the type of the value of its expression (`"null"`, `"bool"`, `"int"`, `"float"`, `"string"` or `"bytes"`). |
| `not a` | Boolean not. |
| `# a` | Returns the length of a string or array value. |
| `# a` | Returns the length of a string, bytes, or array value. |
| `~ a` | Bitwise not. |
| `- a` | Unary minus. |
| `a * b` | Multiplication. |
129 changes: 129 additions & 0 deletions include/simfil/byte-array.h
@@ -0,0 +1,129 @@
// Copyright (c) Navigation Data Standard e.V. - See "LICENSE" file.
#pragma once

#include <cstdint>
#include <cstring>
#include <iterator>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

#include <fmt/format.h>

namespace simfil
{

struct ByteArray
{
    std::string bytes;

    ByteArray() = default;

    explicit ByteArray(const char* data)
        : bytes(data)
    {}

    explicit ByteArray(std::string_view data)
        : bytes(data)
    {}

    explicit ByteArray(std::string data)
        : bytes(std::move(data))
    {}

    auto operator==(const ByteArray&) const -> bool = default;

    [[nodiscard]] static std::optional<ByteArray> fromHex(std::string_view hex)
    {
        if (hex.size() % 2 != 0)
            return std::nullopt;

        std::string decoded;
        decoded.reserve(hex.size() / 2);
        for (size_t i = 0; i < hex.size(); i += 2) {
            const auto upper = decodeHexNibble(hex[i]);
            const auto lower = decodeHexNibble(hex[i + 1]);
            if (upper < 0 || lower < 0)
                return std::nullopt;
            decoded.push_back(static_cast<char>((upper << 4) | lower));
        }

        return ByteArray{std::move(decoded)};
    }

    [[nodiscard]] std::optional<int64_t> decodeBigEndianI64() const
    {
        if (bytes.size() > 8) {
            for (size_t i = 8; i < bytes.size(); ++i) {
                if (static_cast<unsigned char>(bytes[i]) != 0)
                    return std::nullopt;
            }
        }

        const size_t count = bytes.size() <= 8 ? bytes.size() : 8;
        uint64_t value = 0;
        for (size_t i = 0; i < count; ++i) {
            value = (value << 8) | static_cast<unsigned char>(bytes[i]);
        }

        int64_t signedValue = 0;
        std::memcpy(&signedValue, &value, sizeof(signedValue));
        // [SonarCloud annotation, line 71] Replace "std::memcpy" invocation with "std::bit_cast".
        return signedValue;
    }

    [[nodiscard]] std::string toHex(bool uppercase = true) const
    {
        std::string out;
        out.reserve(bytes.size() * 2);

        if (uppercase) {
            for (unsigned char byte : bytes)
                fmt::format_to(std::back_inserter(out), FMT_STRING("{:02X}"), byte);
        } else {
            for (unsigned char byte : bytes)
                fmt::format_to(std::back_inserter(out), FMT_STRING("{:02x}"), byte);
        }

        return out;
    }

    [[nodiscard]] std::string toLiteral() const
    {
        std::string out;
        out.reserve(bytes.size() + 3);
        out += "b\"";

        for (unsigned char byte : bytes) {
            switch (byte) {
            case '\\': out += "\\\\"; break;
            // [SonarCloud annotation, line 99] Convert this string literal to a raw string literal.
            case '"': out += "\\\""; break;
            // [SonarCloud annotation, line 100] Convert this string literal to a raw string literal.
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (byte < 0x20 || byte >= 0x7f)
                    fmt::format_to(std::back_inserter(out), FMT_STRING("\\x{:02X}"), byte);
                else
                    out.push_back(static_cast<char>(byte));
                break;
            }
        }

        out.push_back('"');
        return out;
    }

    [[nodiscard]] static auto decodeHexNibble(char c) -> int
    {
        if ('0' <= c && c <= '9')
            return c - '0';
        if ('a' <= c && c <= 'f')
            return c - 'a' + 10;
        if ('A' <= c && c <= 'F')
            return c - 'A' + 10;
        return -1;
    }
};

} // namespace simfil
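
The hex decoding and big-endian folding steps in the header above can be exercised in isolation. The following standalone sketch mirrors `decodeHexNibble` and the accumulation loop of `decodeBigEndianI64` as `constexpr` functions so that they can be checked at compile time. The names `hexNibble`, `hexByte`, and `foldBigEndian` are hypothetical, and the sketch deliberately omits `fmt`, `std::optional`, and the trailing-zero check for inputs longer than eight bytes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Mirrors ByteArray::decodeHexNibble: value of one hex digit, or -1.
constexpr int hexNibble(char c)
{
    if ('0' <= c && c <= '9')
        return c - '0';
    if ('a' <= c && c <= 'f')
        return c - 'a' + 10;
    if ('A' <= c && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: combines two hex digits into one byte value,
// or returns -1 if either digit is invalid (fromHex returns nullopt there).
constexpr int hexByte(char high, char low)
{
    const int upper = hexNibble(high);
    const int lower = hexNibble(low);
    return (upper < 0 || lower < 0) ? -1 : ((upper << 4) | lower);
}

// Hypothetical helper: folds at most the first eight bytes into an unsigned
// 64-bit value, most significant byte first, like the loop in
// decodeBigEndianI64 before the value is reinterpreted as int64_t.
template <std::size_t N>
constexpr std::uint64_t foldBigEndian(std::array<unsigned char, N> bytes)
{
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < N && i < 8; ++i)
        value = (value << 8) | bytes[i];
    return value;
}
```

Note that inputs shorter than eight bytes are not sign-extended by this folding: a single byte `0xFF` yields 255 here.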