Merged
1 change: 1 addition & 0 deletions .env.example
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
GEMINI_API_KEY=your_api_key_here
2 changes: 1 addition & 1 deletion .github/workflows/mvn.yml
@@ -12,7 +12,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-20.04, windows-2022, macos-12]
os: [ubuntu-latest, windows-latest, macos-latest]
java: [17]
steps:
- uses: actions/checkout@v4
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
.env

# IntelliJ IDEA
.idea
.attach*
29 changes: 29 additions & 0 deletions GEMINI.md
@@ -0,0 +1,29 @@
This library provides an engine for processing decision tables in Java. The initial
development went through several iterations, and the library has been tested in production. But
the original design has flaws: the code is not very modular, extensible, or maintainable.

I am planning a rewrite of the java-decita library. My main goal is to make it more modular,
i.e. to implement different table storage formats and logic extensions (such as new row
types for tables) as pluggable modules. This implies creating a functional core of the library
that is free of such modules and their implementation details.

The functional core should be very simple and generic: it should specify generic
interfaces for all its components, and the components should be implemented as separate
pluggable modules. The core should not depend on any specific component implementation,
and the components should not depend on the core. The core should only provide the generic
logic for making decisions based on decision tables, while the components provide the
implementations needed for specific use cases.

The important thing to note is that the core should be pure, i.e. it should not produce any side
effects. It should only provide a way to process the decision tables and return the results.
It's the application's responsibility to handle the results and produce side effects if needed.
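
As a minimal sketch of this purity constraint (the class, the `decide` signature, and the
`customer.age` fact name below are illustrative assumptions, not the library's actual API):

```java
import java.util.Map;

public class PureCoreSketch {
    /**
     * Hypothetical pure decision function: it performs no I/O, mutates nothing,
     * and always returns the same outcome for the same facts.
     */
    static String decide(Map<String, Integer> facts) {
        // The core only maps inputs to an outcome; producing side effects
        // (logging, persistence, notifications) is the caller's job.
        return facts.getOrDefault("customer.age", 0) >= 21 ? "allow" : "deny";
    }

    public static void main(String[] args) {
        System.out.println(decide(Map.of("customer.age", 30))); // prints "allow"
    }
}
```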

The second important thing is that the core should use some kind of snapshot of all the loaded
decision tables, their rules, and conditions. This snapshot should be immutable and should serve
as the basis for concrete decision-making sessions. It will probably be best to represent
this snapshot as a graph whose nodes are decision tables, rules, conditions, and specific data
accessors, and whose edges are the relationships between them.

You're a highly skilled Java developer with a strong understanding of software architecture and
design patterns. Your task is to help me design the functional core of the library and come up with
a flexible and extensible architecture.
35 changes: 35 additions & 0 deletions doc/GEMINI.md
@@ -0,0 +1,35 @@
When working on this project, you must adhere to the following guidelines:

1. **Adhere to Existing Architectural Decision Records (ADRs):** Before making any changes, review
the existing ADRs located in the `doc/adr` directory. Your work must be consistent with the
decisions documented there, and you should not contradict or ignore any of the accepted ADRs.
Superseded ADRs should be ignored, but you should still be aware of them for historical context.

2. **Create New ADRs When Appropriate:** For any significant architectural decision, you must
propose a new ADR, discuss it with the user, and then create the ADR document. A decision is
considered "significant" if it affects the overall structure, dependencies, or technical approach
of the project. New ADRs should be created by the user with the help of the `adr` tool, because
the tool assumes interactive editing of the newly created ADR, which gemini-cli does not support.

A new ADR should be linked to the relevant existing ADRs, if applicable. The linking can be done
using the `adr link` command, which allows you to specify the relationship type (e.g.,
"amends"). The command to link an ADR is:
```bash
adr link SOURCE LINK TARGET REVERSE-LINK
# e.g. record that ADR 5 amends ADR 2:
adr link 5 amends 2 "is amended by"
```
where `SOURCE` is the number of the ADR you are linking from, `LINK` is the type of link (e.g.,
"amends"), `TARGET` is the number of the ADR you are linking to, and `REVERSE-LINK` is the
type of link in the opposite direction (e.g., "is amended by").

3. **ADR Format:** Each new ADR must follow the standard format presented in
`doc/adr/templates/template.md`:
* **Status:** One of the following: "Accepted" or "Superseded"
* **Brief Summary:** A concise description of the decision, typically in the form of "We are
going to do something because we want something else".
* **Context:** Describe the problem or the need that requires a decision.
* **Options:** Present and analyze at least two (preferably four or five) potential solutions.
* **Decision:** Clearly state the chosen solution. Justify why the chosen solution was
selected over the alternatives.
* **Consequences:** Discuss the implications of the decision, including any risks or
challenges that may arise from it.
19 changes: 19 additions & 0 deletions doc/adr/0001-record-architecture-decisions.md
@@ -0,0 +1,19 @@
# 1. Record architecture decisions

Date: 2025-06-29

## Status

Accepted

## Context

We need to record the architectural decisions made on this project.

## Decision

We will use Architecture Decision Records, as [described by Michael Nygard](http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions).

## Consequences

See Michael Nygard's article, linked above. For a lightweight ADR toolset, see Nat Pryce's [adr-tools](https://github.com/npryce/adr-tools).
86 changes: 86 additions & 0 deletions doc/adr/0002-separation-of-engine-and-connectors.md
@@ -0,0 +1,86 @@
# 2. Separation of Engine and Connectors

Date: 2025-06-29

## Status

Accepted

is amended by [5. Pure Functional Core](0005-pure-functional-core.md)

is realized by [6. ValueProvider Design for Remote Data and Composability](0006-valueprovider-design-for-remote-data-and-composability.md)

is upheld by [7. Type Discovery at Graph Construction Time](0007-type-discovery-at-graph-construction-time.md)

## Brief summary

We will separate the core decision-making engine from the data connectors to enhance modularity and
extensibility, allowing for the easy integration of various data sources.

## Context

The current version of the `java-decita` library has a monolithic architecture in which the
engine and data-access logic are tightly coupled. This lack of a clear separation of concerns
hinders maintainability and testing. The library needs a rewrite to improve its flexibility and
maintainability. The primary goals are:

1. **Decouple Storage from Logic:** Allow decision tables to be loaded from various sources (e.g.,
CSV, JSON, databases) without affecting the core decision-making logic.
2. **Extensible Core Logic:** Enable the addition of new functionalities, such as custom condition
types (e.g., "matches regex") or new result interpreters (like generic side effect producers,
i.e. actions changing context state) without modifying the core library.

## Options

1. **Monolithic Architecture:** Continue with the existing design where the engine and data access
logic are tightly coupled.
* Pros: Simple for the current use case. No major architectural refactoring is required.
* Cons: Difficult to extend, hard to maintain, testing is more complex. Fails to meet the
primary goals. Adding new storage formats or logic types would require modifying the core
library, leading to a brittle and complex codebase over time.
2. **Abstract Class-based Extension**. Provide abstract base classes for the engine's components.
Users would extend these classes to implement custom functionality.
* Pros: Relatively simple to implement and understand.
* Cons: Creates a tighter coupling between the core and its extensions compared to an
interface-based approach. It can lead to fragile and complex inheritance hierarchies.
3. **Configuration-based Modules**. Define modules and their implementations in a central
configuration file (e.g., XML or JSON). The core library would read this configuration to wire up
the application.
* Pros: Makes the module composition very explicit.
* Cons: Adds a layer of complexity for both the library maintainers and its users. It can be
cumbersome to manage, and runtime errors due to misconfiguration can be hard to debug.
4. **Modular Architecture with Service Provider Interface (SPI)**. Redesign the library around a
minimal core and a set of well-defined Service Provider Interfaces (SPIs). Extensions (plugins)
are implemented against these interfaces and discovered at runtime using Java's standard
`ServiceLoader`.
* Pros:
* **Maximum Decoupling:** The core is completely isolated from implementation details.
* **High Extensibility:** New features can be added simply by dropping a new JAR onto the
classpath.
* **Clear Separation of Concerns:** Enforces a clean architecture.
* **Standard Approach:** Uses `ServiceLoader`, a standard and well-understood mechanism in
the Java ecosystem.
* Cons: Requires a more significant upfront design and implementation effort compared to other
options.

## Decision

We have decided to adopt the 'Modular Architecture with Service Provider Interface (SPI)' approach.
The core engine will be responsible for the decision-making logic and will operate on an abstract
data model. Data connectors will be responsible for loading decision tables from various sources and
transforming them into the abstract model consumed by the engine. This will be enforced by a defined
Service Provider Interface (SPI).

This approach directly and effectively addresses all the stated goals. It provides the highest
degree of decoupling and extensibility, which is crucial for the long-term health and evolution of
the library. By relying on standard Java mechanisms (`ServiceLoader`), we ensure that the plugin
model is robust and familiar to Java developers. The initial investment in a clean, modular design
will pay significant dividends by making future development faster, easier, and safer.
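
A minimal sketch of the SPI approach, using the standard `ServiceLoader` mechanism the decision
names (the `TableSource` interface and its `formatName` method are hypothetical names for
illustration, not the library's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical SPI: each connector implements this to load tables from one format. */
interface TableSource {
    /** Short identifier of the supported storage format, e.g. "csv". */
    String formatName();
}

public class SpiDiscoveryDemo {
    /** Collects the format names of all connectors found on the classpath. */
    static List<String> discoveredFormats() {
        List<String> formats = new ArrayList<>();
        for (TableSource source : ServiceLoader.load(TableSource.class)) {
            formats.add(source.formatName());
        }
        return formats;
    }

    public static void main(String[] args) {
        // With no META-INF/services registration on the classpath, the list is empty.
        System.out.println("Discovered connectors: " + discoveredFormats());
    }
}
```

A connector JAR would then register its implementation under `META-INF/services/` (keyed by the
interface's fully qualified name) and be picked up automatically when dropped on the classpath.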

## Consequences

This decision will lead to a more modular and extensible architecture. It will be much easier to add
support for new data formats in the future without changing the core library. The separation will
also improve the testability of the core engine and the connectors in isolation. The initial
development effort might be slightly higher due to the need to define clear interfaces and a data
abstraction layer. We will need to create at least one connector (e.g., for CSV files) as part of
the initial implementation.
@@ -0,0 +1,117 @@
# 3. Shared Component Management and Evaluation Optimization

Date: 2025-06-29

## Status

Accepted

is supported by [5. Pure Functional Core](0005-pure-functional-core.md)

is implemented by [6. ValueProvider Design for Remote Data and Composability](0006-valueprovider-design-for-remote-data-and-composability.md)

## Brief summary

We will implement a stateless, immutable `DecisionGraph` to represent the decision logic, which will
be built once and cached. For each decision-making session, a new, lightweight `ExecutionContext`
will be created to store the evaluation results, ensuring thread-safety and efficient reuse of
shared components.

## Context

The library needs to efficiently handle scenarios where components are reused across the system.
This includes:

1. **Chained Tables:** The outcome of one decision table can be the input for another.
2. **Shared Conditions:** The same condition (e.g., `customer.age > 21`) can appear in multiple
rules, potentially across different tables.
3. **Shared Data Fragments:** The same piece of data (e.g., the variable `customer.age`) is used in
many different conditions.

A naive implementation would create duplicate objects and re-evaluate the same logic repeatedly,
leading to poor performance and high memory usage. The solution must also be thread-safe to allow
for simultaneous decision computations in a multi-threaded environment.

## Options

### 1. Centralized Registry/Locator Model (Rejected)

This model uses a central, stateful `ComponentRegistry` to store and dispense unique instances of
all shared components (tables, conditions, value providers).

* **How it Works:** Components are identified by a canonical key. The registry is queried before
creating any new component. If an instance already exists for a key, it's returned; otherwise, a
new one is created and stored. Evaluation results are cached in a per-session `ExecutionContext`
to avoid re-computation.
* **Pros:** Memory efficient (Flyweight pattern), simple caching logic.
* **Cons:** The registry can become a large, stateful "god object," complicating testing and
reasoning. The logic for generating canonical keys is complex and potentially brittle.

### 2. Immutable Evaluation Graph Model (Rejected)

This model treats the entire set of tables and their components as a single, immutable Directed
Acyclic Graph (DAG) of dependencies.

* **How it Works:** The loading phase builds a graph where nodes are `ValueProvider`s and edges
represent dependencies. Sharing is inherent, as multiple parent nodes can reference the same child
node. The engine traverses this graph, and evaluation results are cached on the nodes themselves.
* **Pros:** Highly performant due to explicit dependency tracking, which allows for optimal
evaluation paths (e.g., pruning dead branches). The graph is immutable and stateless.
* **Cons:** A naive implementation has two major flaws. First, the graph would need to be rebuilt
from scratch for every decision session, which is inefficient. Second, caching results directly on
the graph's nodes makes it stateful during evaluation, rendering it **not thread-safe**.

### 3. Stateless Graph Template with a Session Context (Chosen)

This model refines the graph approach to solve its flaws by strictly separating the static structure
from the per-session execution state.

* **How it Works:**
1. **`DecisionGraph` (The Template):** An immutable, thread-safe graph representing all static
logic is built **once** from the source files and cached. This is the expensive part, but
it's only done once.
2. **`ExecutionContext` (The Session State):** For **each** decision-making call, a new,
lightweight `ExecutionContext` object is created. This object is a simple map that caches the
evaluation results for the nodes of the `DecisionGraph` for that specific session only.
3. **`DecisionEngine`:** The engine takes both the shared `DecisionGraph` and the private
`ExecutionContext` as arguments. It uses the context to check for cached results before
evaluating a node. All mutable state is confined to the thread-local context.

* **Pros:**
* **Efficient:** The expensive graph construction is a one-time cost.
* **Thread-Safe:** The shared `DecisionGraph` is immutable. The mutable state is confined to the
`ExecutionContext`, which is never shared between threads.
* **Clean Architecture:** Enforces a clear separation between the static logic (the template)
and the dynamic execution state (the context).
* **Cons:** Higher initial implementation complexity than the registry model, but this is justified
by the performance and safety gains.
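
The template/context split above can be sketched as follows (the class and method names are
illustrative assumptions in the spirit of the chosen model, not the library's actual API):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node of the immutable, shared DecisionGraph template. */
interface GraphNode<T> {
    T compute(ExecutionContext ctx);
}

/** Per-session result cache; created fresh for each decision call, never shared. */
final class ExecutionContext {
    private final Map<GraphNode<?>, Object> results = new HashMap<>();

    @SuppressWarnings("unchecked")
    <T> T evaluate(GraphNode<T> node) {
        if (!results.containsKey(node)) {
            // Each node is computed at most once per session.
            results.put(node, node.compute(this));
        }
        return (T) results.get(node);
    }
}

public class GraphSessionDemo {
    public static void main(String[] args) {
        final int[] computations = {0};
        // A condition node shared by many rules; the node itself holds no mutable state.
        GraphNode<Boolean> isAdult = ctx -> {
            computations[0]++;
            return true;
        };

        ExecutionContext session = new ExecutionContext();
        session.evaluate(isAdult);
        session.evaluate(isAdult); // second lookup is served from the session cache
        System.out.println("computed " + computations[0] + " time(s)"); // prints "computed 1 time(s)"
    }
}
```

Because all mutation is confined to `ExecutionContext`, two threads can evaluate the same graph
concurrently simply by giving each its own context instance.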

## Decision

We will adopt the **Stateless Graph Template with a Session Context** model.

This solution leverages the performance and explicit dependency tracking of the graph model
while solving its inherent reusability and concurrency problems. It
avoids the "god object" issue of the centralized registry and provides a clean, scalable, and
thread-safe architecture that is essential for a high-performance, modern library. The strict
separation of the immutable template from the per-session context is the key to achieving all our
stated goals.

## Consequences

* **What becomes easier:**
* Concurrent execution is inherently safe due to the separation of immutable logic and
per-session state.
* Reasoning about and testing the decision logic is simpler because the `DecisionGraph` is a
pure, stateless structure.
* The system is more extensible, as new types of nodes can be added to the graph without
affecting the core evaluation logic.
* **What becomes more difficult:**
* The initial implementation is more complex compared to a simple stateful model.
* Debugging might require inspecting both the static graph and the dynamic execution context,
which could be more involved.
* **Risks to be mitigated:**
* The initial construction of the `DecisionGraph` could be a performance bottleneck for very
large sets of rules; this may require optimization.
* Memory management for the `ExecutionContext` must be handled carefully to avoid leaks,
especially in high-throughput scenarios.