Merged
1 change: 1 addition & 0 deletions .env.example
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
GEMINI_API_KEY=your_api_key_here
2 changes: 1 addition & 1 deletion .github/workflows/mvn.yml
@@ -12,7 +12,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-20.04, windows-2022, macos-12]
os: [ubuntu-latest, windows-latest, macos-latest]
java: [17]
steps:
- uses: actions/checkout@v4
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
.env

# IntelliJ IDEA
.idea
.attach*
29 changes: 29 additions & 0 deletions GEMINI.md
@@ -0,0 +1,29 @@
This library provides an engine for processing decision tables in Java. The initial
development went through several iterations, and the library has been tested in production. But
the original design has flaws: the code is not very modular, extensible, or maintainable.

I am planning a rewrite of the java-decita library. My main goal is to make it more modular,
i.e. to implement different table storage formats and logic extensions (such as new row
types for tables) as pluggable modules. This implies creating a functional core of the library
that is free of such modules and their implementation details.

The functional core should be very simple and generic: it should specify generic
interfaces for all its components, and the components should be implemented as separate
pluggable modules. The core should not depend on any specific component implementation,
and the components should not depend on the core. The core should only provide the generic
logic for making decisions based on decision tables, while the components provide the
implementations needed for specific use cases.

The important thing to note is that the core should be pure, i.e. it should not produce any side
effects. It should only provide a way to process the decision tables and return the results.
It's the application's responsibility to handle the results and produce side effects if needed.
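
As a minimal sketch of this purity constraint (the class, the `decide` signature, and the
`customer.age` fact name below are illustrative assumptions, not the library's actual API):

```java
import java.util.Map;

public class PureCoreSketch {
    /**
     * Hypothetical pure decision function: it performs no I/O, mutates nothing,
     * and always returns the same outcome for the same facts.
     */
    static String decide(Map<String, Integer> facts) {
        // The core only maps inputs to an outcome; producing side effects
        // (logging, persistence, notifications) is the caller's job.
        return facts.getOrDefault("customer.age", 0) >= 21 ? "allow" : "deny";
    }

    public static void main(String[] args) {
        System.out.println(decide(Map.of("customer.age", 30))); // prints "allow"
    }
}
```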

The second important thing is that the core should use some kind of snapshot of all the loaded
decision tables, their rules, and conditions. This snapshot should be immutable and should serve
as the basis for concrete decision-making sessions. It will probably be best to represent
this snapshot as a graph whose nodes are decision tables, rules, conditions, and specific data
accessors, and whose edges are the relationships between them.

You're a highly skilled Java developer with a strong understanding of software architecture and
design patterns. Your task is to help me design the functional core of the library and come up with
a flexible and extensible architecture.
35 changes: 35 additions & 0 deletions doc/GEMINI.md
@@ -0,0 +1,35 @@
When working on this project, you must adhere to the following guidelines:

1. **Adhere to Existing Architectural Decision Records (ADRs):** Before making any changes, review
the existing ADRs located in the `doc/adr` directory. Your work must be consistent with the
decisions documented there, and you should not contradict or ignore any of the accepted ADRs.
Superseded ADRs should be ignored, but you should still be aware of them for historical context.

2. **Create New ADRs When Appropriate:** For any significant architectural decision, you must
propose a new ADR, discuss it with the user, and then create the ADR document. A decision is
considered "significant" if it affects the overall structure, dependencies, or technical approach
of the project. New ADRs should be created by the user with the help of the `adr` tool, because
the tool assumes interactive editing of the newly created ADR, which gemini-cli does not support.

A new ADR should be linked to the relevant existing ADRs, if applicable. The linking can be done
using the `adr link` command, which allows you to specify the relationship type (e.g.,
"amends"). The command to link an ADR is:
```bash
adr link SOURCE LINK TARGET REVERSE-LINK
# e.g. record that ADR 5 amends ADR 2:
adr link 5 amends 2 "is amended by"
```
where `SOURCE` is the number of the ADR you are linking from, `LINK` is the type of link (e.g.,
"amends"), `TARGET` is the number of the ADR you are linking to, and `REVERSE-LINK` is the
type of link in the opposite direction (e.g., "is amended by").

3. **ADR Format:** Each new ADR must follow the standard format presented in
`doc/adr/templates/template.md`:
* **Status:** One of the following: "Accepted" or "Superseded"
* **Brief Summary:** A concise description of the decision, typically in the form of "We are
going to do something because we want something else".
* **Context:** Describe the problem or the need that requires a decision.
* **Options:** Present and analyze at least two (preferably four or five) potential solutions.
* **Decision:** Clearly state the chosen solution. Justify why the chosen solution was
selected over the alternatives.
* **Consequences:** Discuss the implications of the decision, including any risks or
challenges that may arise from it.
19 changes: 19 additions & 0 deletions doc/adr/0001-record-architecture-decisions.md
@@ -0,0 +1,19 @@
# 1. Record architecture decisions

Date: 2025-06-29

## Status

Accepted

## Context

We need to record the architectural decisions made on this project.

## Decision

We will use Architecture Decision Records, as [described by Michael Nygard](http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions).

## Consequences

See Michael Nygard's article, linked above. For a lightweight ADR toolset, see Nat Pryce's [adr-tools](https://github.com/npryce/adr-tools).
86 changes: 86 additions & 0 deletions doc/adr/0002-separation-of-engine-and-connectors.md
@@ -0,0 +1,86 @@
# 2. Separation of Engine and Connectors

Date: 2025-06-29

## Status

Accepted

is amended by [5. Pure Functional Core](0005-pure-functional-core.md)

is realized by [6. ValueProvider Design for Remote Data and Composability](0006-valueprovider-design-for-remote-data-and-composability.md)

is upheld by [7. Type Discovery at Graph Construction Time](0007-type-discovery-at-graph-construction-time.md)

## Brief summary

We will separate the core decision-making engine from the data connectors to enhance modularity and
extensibility, allowing for the easy integration of various data sources.

## Context

The current version of the `java-decita` library has a monolithic architecture in which the
engine and data-access logic are tightly coupled. This lack of a clear separation of concerns
hinders maintainability and testing. The library needs a rewrite to improve its flexibility and
maintainability. The primary goals are:

1. **Decouple Storage from Logic:** Allow decision tables to be loaded from various sources (e.g.,
CSV, JSON, databases) without affecting the core decision-making logic.
2. **Extensible Core Logic:** Enable the addition of new functionalities, such as custom condition
types (e.g., "matches regex") or new result interpreters (like generic side effect producers,
i.e. actions changing context state) without modifying the core library.

## Options

1. **Monolithic Architecture:** Continue with the existing design where the engine and data access
logic are tightly coupled.
* Pros: Simple for the current use case. No major architectural refactoring is required.
* Cons: Difficult to extend, hard to maintain, testing is more complex. Fails to meet the
primary goals. Adding new storage formats or logic types would require modifying the core
library, leading to a brittle and complex codebase over time.
2. **Abstract Class-based Extension**. Provide abstract base classes for the engine's components.
Users would extend these classes to implement custom functionality.
* Pros: Relatively simple to implement and understand.
* Cons: Creates a tighter coupling between the core and its extensions compared to an
interface-based approach. It can lead to fragile and complex inheritance hierarchies.
3. **Configuration-based Modules**. Define modules and their implementations in a central
configuration file (e.g., XML or JSON). The core library would read this configuration to wire up
the application.
* Pros: Makes the module composition very explicit.
* Cons: Adds a layer of complexity for both the library maintainers and its users. It can be
cumbersome to manage, and runtime errors due to misconfiguration can be hard to debug.
4. **Modular Architecture with Service Provider Interface (SPI)**. Redesign the library around a
minimal core and a set of well-defined Service Provider Interfaces (SPIs). Extensions (plugins)
are implemented against these interfaces and discovered at runtime using Java's standard
`ServiceLoader`.
* Pros:
* **Maximum Decoupling:** The core is completely isolated from implementation details.
* **High Extensibility:** New features can be added simply by dropping a new JAR onto the
classpath.
* **Clear Separation of Concerns:** Enforces a clean architecture.
* **Standard Approach:** Uses `ServiceLoader`, a standard and well-understood mechanism in
the Java ecosystem.
* Cons: Requires a more significant upfront design and implementation effort compared to other
options.

## Decision

We have decided to adopt the 'Modular Architecture with Service Provider Interface (SPI)' approach.
The core engine will be responsible for the decision-making logic and will operate on an abstract
data model. Data connectors will be responsible for loading decision tables from various sources and
transforming them into the abstract model consumed by the engine. This will be enforced by a defined
Service Provider Interface (SPI).

This approach directly and effectively addresses all the stated goals. It provides the highest
degree of decoupling and extensibility, which is crucial for the long-term health and evolution of
the library. By relying on standard Java mechanisms (`ServiceLoader`), we ensure that the plugin
model is robust and familiar to Java developers. The initial investment in a clean, modular design
will pay significant dividends by making future development faster, easier, and safer.
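
A minimal sketch of the SPI approach, using the standard `ServiceLoader` mechanism the decision
names (the `TableSource` interface and its `formatName` method are hypothetical names for
illustration, not the library's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical SPI: each connector implements this to load tables from one format. */
interface TableSource {
    /** Short identifier of the supported storage format, e.g. "csv". */
    String formatName();
}

public class SpiDiscoveryDemo {
    /** Collects the format names of all connectors found on the classpath. */
    static List<String> discoveredFormats() {
        List<String> formats = new ArrayList<>();
        for (TableSource source : ServiceLoader.load(TableSource.class)) {
            formats.add(source.formatName());
        }
        return formats;
    }

    public static void main(String[] args) {
        // With no META-INF/services registration on the classpath, the list is empty.
        System.out.println("Discovered connectors: " + discoveredFormats());
    }
}
```

A connector JAR would then register its implementation under `META-INF/services/` (keyed by the
interface's fully qualified name) and be picked up automatically when dropped on the classpath.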

## Consequences

This decision will lead to a more modular and extensible architecture. It will be much easier to add
support for new data formats in the future without changing the core library. The separation will
also improve the testability of the core engine and the connectors in isolation. The initial
development effort might be slightly higher due to the need to define clear interfaces and a data
abstraction layer. We will need to create at least one connector (e.g., for CSV files) as part of
the initial implementation.
@@ -0,0 +1,117 @@
# 3. Shared Component Management and Evaluation Optimization

Date: 2025-06-29

## Status

Accepted

is supported by [5. Pure Functional Core](0005-pure-functional-core.md)

is implemented by [6. ValueProvider Design for Remote Data and Composability](0006-valueprovider-design-for-remote-data-and-composability.md)

## Brief summary

We will implement a stateless, immutable `DecisionGraph` to represent the decision logic, which will
be built once and cached. For each decision-making session, a new, lightweight `ExecutionContext`
will be created to store the evaluation results, ensuring thread-safety and efficient reuse of
shared components.

## Context

The library needs to efficiently handle scenarios where components are reused across the system.
This includes:

1. **Chained Tables:** The outcome of one decision table can be the input for another.
2. **Shared Conditions:** The same condition (e.g., `customer.age > 21`) can appear in multiple
rules, potentially across different tables.
3. **Shared Data Fragments:** The same piece of data (e.g., the variable `customer.age`) is used in
many different conditions.

A naive implementation would create duplicate objects and re-evaluate the same logic repeatedly,
leading to poor performance and high memory usage. The solution must also be thread-safe to allow
for simultaneous decision computations in a multi-threaded environment.

## Options

### 1. Centralized Registry/Locator Model (Rejected)

This model uses a central, stateful `ComponentRegistry` to store and dispense unique instances of
all shared components (tables, conditions, value providers).

* **How it Works:** Components are identified by a canonical key. The registry is queried before
creating any new component. If an instance already exists for a key, it's returned; otherwise, a
new one is created and stored. Evaluation results are cached in a per-session `ExecutionContext`
to avoid re-computation.
* **Pros:** Memory efficient (Flyweight pattern), simple caching logic.
* **Cons:** The registry can become a large, stateful "god object," complicating testing and
reasoning. The logic for generating canonical keys is complex and potentially brittle.

### 2. Immutable Evaluation Graph Model (Rejected)

This model treats the entire set of tables and their components as a single, immutable Directed
Acyclic Graph (DAG) of dependencies.

* **How it Works:** The loading phase builds a graph where nodes are `ValueProvider`s and edges
represent dependencies. Sharing is inherent, as multiple parent nodes can reference the same child
node. The engine traverses this graph, and evaluation results are cached on the nodes themselves.
* **Pros:** Highly performant due to explicit dependency tracking, which allows for optimal
evaluation paths (e.g., pruning dead branches). The graph is immutable and stateless.
* **Cons:** A naive implementation has two major flaws. First, the graph would need to be rebuilt
from scratch for every decision session, which is inefficient. Second, caching results directly on
the graph's nodes makes it stateful during evaluation, rendering it **not thread-safe**.

### 3. Stateless Graph Template with a Session Context (Chosen)

This model refines the graph approach to solve its flaws by strictly separating the static structure
from the per-session execution state.

* **How it Works:**
1. **`DecisionGraph` (The Template):** An immutable, thread-safe graph representing all static
logic is built **once** from the source files and cached. This is the expensive part, but
it's only done once.
2. **`ExecutionContext` (The Session State):** For **each** decision-making call, a new,
lightweight `ExecutionContext` object is created. This object is a simple map that caches the
evaluation results for the nodes of the `DecisionGraph` for that specific session only.
3. **`DecisionEngine`:** The engine takes both the shared `DecisionGraph` and the private
`ExecutionContext` as arguments. It uses the context to check for cached results before
evaluating a node. All mutable state is confined to the thread-local context.

* **Pros:**
* **Efficient:** The expensive graph construction is a one-time cost.
* **Thread-Safe:** The shared `DecisionGraph` is immutable. The mutable state is confined to the
`ExecutionContext`, which is never shared between threads.
* **Clean Architecture:** Enforces a clear separation between the static logic (the template)
and the dynamic execution state (the context).
* **Cons:** Higher initial implementation complexity than the registry model, but this is justified
by the performance and safety gains.
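
The template/context split above can be sketched as follows (the class and method names are
illustrative assumptions in the spirit of the chosen model, not the library's actual API):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node of the immutable, shared DecisionGraph template. */
interface GraphNode<T> {
    T compute(ExecutionContext ctx);
}

/** Per-session result cache; created fresh for each decision call, never shared. */
final class ExecutionContext {
    private final Map<GraphNode<?>, Object> results = new HashMap<>();

    @SuppressWarnings("unchecked")
    <T> T evaluate(GraphNode<T> node) {
        if (!results.containsKey(node)) {
            // Each node is computed at most once per session.
            results.put(node, node.compute(this));
        }
        return (T) results.get(node);
    }
}

public class GraphSessionDemo {
    public static void main(String[] args) {
        final int[] computations = {0};
        // A condition node shared by many rules; the node itself holds no mutable state.
        GraphNode<Boolean> isAdult = ctx -> {
            computations[0]++;
            return true;
        };

        ExecutionContext session = new ExecutionContext();
        session.evaluate(isAdult);
        session.evaluate(isAdult); // second lookup is served from the session cache
        System.out.println("computed " + computations[0] + " time(s)"); // prints "computed 1 time(s)"
    }
}
```

Because all mutation is confined to `ExecutionContext`, two threads can evaluate the same graph
concurrently simply by giving each its own context instance.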

## Decision

We will adopt the **Stateless Graph Template with a Session Context** model.

This solution leverages the performance and explicit dependency tracking of the graph model
while solving its inherent reusability and concurrency problems. It
avoids the "god object" issue of the centralized registry and provides a clean, scalable, and
thread-safe architecture that is essential for a high-performance, modern library. The strict
separation of the immutable template from the per-session context is the key to achieving all our
stated goals.

## Consequences

* **What becomes easier:**
* Concurrent execution is inherently safe due to the separation of immutable logic and
per-session state.
* Reasoning about and testing the decision logic is simpler because the `DecisionGraph` is a
pure, stateless structure.
* The system is more extensible, as new types of nodes can be added to the graph without
affecting the core evaluation logic.
* **What becomes more difficult:**
* The initial implementation is more complex compared to a simple stateful model.
* Debugging might require inspecting both the static graph and the dynamic execution context,
which could be more involved.
* **Risks to be mitigated:**
* The initial construction of the `DecisionGraph` could be a performance bottleneck for very
large sets of rules; this may require optimization.
* Memory management for the `ExecutionContext` must be handled carefully to avoid leaks,
especially in high-throughput scenarios.