Skip to content
266 changes: 266 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,266 @@
<!--
SPDX-FileCopyrightText: 2025 ash_neo4j contributors <https://github.com/diffo-dev/ash_neo4j/graphs.contributors>

SPDX-License-Identifier: MIT
-->

# AGENTS.md — AshNeo4j

AI agent guidance for the AshNeo4j source repository.

## What this project is

AshNeo4j is an `Ash.DataLayer` that stores Ash resources as nodes in a Neo4j graph database.
It is a library published on hex.pm and maintained at `diffo-dev/ash_neo4j`. Its primary consumer
is the Diffo project; upstream bugs found while working in Diffo belong here.

## Before making changes

1. Read `usage-rules.md` — the canonical rules for using AshNeo4j, including naming conventions,
relationship semantics, aggregate kinds, and the test sandbox.
2. Understand the label system (see **Label system** below) — the label concept is
a frequent source of bugs and the most important thing to get right.
3. Run `mix test` before and after your change to confirm nothing regressed.

## Fixing bugs

Before writing any fix, review existing test coverage for the affected behaviour. If the bug
has no test, write the failing test first — this confirms the reproduction and guards the fix
against regression. Only then implement the fix and verify the test passes.

## Project structure

```
lib/
data_layer.ex — Ash.DataLayer behaviour: CRUD, aggregates, calculations,
transaction, enrichments (OPTIONAL MATCH → source attributes)
cypher.ex — Cypher string helpers: node/2, relationship/3, expression/5,
parameterized_node/3, render/1, run/1
cypher/query.ex — Typed clause structs (Match, Where, Return, …) and builder
functions for every query shape used by the data layer
query_helper.ex — Translates Ash.Query (filter, sort, offset, limit) into
a Cypher.Query; entry point is query_nodes/1
resource/info.ex — All DSL introspection: label/1, module_label/1, domain_label/1,
domain_fragment_label/1, all_labels/1, label_pair/1,
mapping/1, relate/1, translations/1, and relationship helpers
resource_mapping.ex — %ResourceMapping{} struct (module, label, module_label,
domain_fragment_label, all_labels, label_pair,
properties, edges, guards, skip)
edge_descriptor.ex — %EdgeDescriptor{} struct (relationship, label, direction,
destination_label)
neo4j_helper.ex — Low-level node/edge operations via Bolty
data_layer/cast.ex — Casts Neo4j return values to Ash types
data_layer/dump.ex — Dumps Ash values to Neo4j-compatible primitives
data_layer/type_classifier.ex — Classifies types as :ash_json (embedded/struct/map) or scalar
sandbox.ex — AshNeo4j.Sandbox: per-test transaction isolation
util.ex — short_name/1, to_camel_case/1, reverse/1
persisters/
persist_labels.ex — Computes and persists domain_label, module_label, label,
domain_fragment_label, all_labels, label_pair
persist_translations.ex — Builds attribute → property name keyword list; excludes
belongs_to source attributes and skip attributes
persist_relate.ex — Merges explicit relate DSL with default auto-generated edges
persist_relationship_attributes.ex — Maps source attributes to relationship names
persist_mapping.ex — Bakes __ash_neo4j_mapping__/0 onto each resource module
verifiers/
verify_labels_pascal_case.ex
verify_relate.ex
verify_guard.ex
verify_properties_camel_case.ex
verify_enrichable.ex
verify_attribute_type.ex

test/
support/resource/ — Test resources (Post, Comment, Author, Specification, …)
support/srm.ex — Test domain (Srm)
blog_test.exs — CRUD, filter, relationship tests
aggregate_test.exs — All aggregate kinds including filtered and expr aggregates
calculation_test.exs — Expression calculations
data_layer/ — Unit tests for Cast, Dump, TypeClassifier, Info
```

## Label system

Every node has several distinct label concepts. Getting them confused is the most common
source of bugs:

| Name | Persisted as | Example | When used |
|---|---|---|---|
| `domain_label` | `:domain_label` | `:Servo` | Written on CREATE; also part of MATCH via `label_pair` |
| `module_label` | `:module_label` | `:ShelfInstance` | Written on CREATE; also part of MATCH via `label_pair` |
| `label` | `:label` | `:Instance` | May differ from `module_label` when a resource fragment declares a base type; written on CREATE only |
| `domain_fragment_label` | `:domain_fragment_label` | `:Telco` | Written on CREATE only — from a domain fragment using `AshNeo4j.DataLayer.Domain`; `nil` when none declared |
| `all_labels` | `:all_labels` | `[:Servo, :ShelfInstance, :Instance, :Telco]` | Full CREATE label list — `[domain_label, module_label, label, domain_fragment_label]` deduped |
| `label_pair` | `:label_pair` | `[:Servo, :ShelfInstance]` | MATCH label list — always `[domain_label, module_label]`; uniquely identifies this resource type |

**Key invariant:** `all_labels` are written on `CREATE`. For `MATCH` / `UPDATE` / `DELETE`,
use `mapping.label_pair` — always `[domain_label, module_label]`. This two-label combination
uniquely identifies the exact resource type and prevents cross-fragment contamination.

`Cypher.node(:s, [:Servo, :ShelfInstance])` produces `"(s:Servo:ShelfInstance)"` — correct.
`Cypher.node(:s, [:Instance])` produces `"(s:Instance)"` — scans every resource extending the same fragment.
`Cypher.node(:s, [:ShelfInstance])` produces `"(s:ShelfInstance)"` — scopes to module but not domain (avoid).

`mapping.label_pair` always holds `[domain_label, module_label]`. Use it for all MATCH patterns.

## Translations (attribute ↔ property name mapping)

`mapping.properties` is a keyword list of `{ash_attribute_name, neo4j_property_name}` pairs
built by `PersistTranslations`. Rules:

- `snake_case` attributes → `camelCase` properties (via `Util.to_camel_case/1`).
- The `:id` attribute is special: its property name is the camelCase of the Ash type's short
name (e.g. `Ash.Type.UUID` → property `:uuid`). This avoids colliding with Neo4j's internal
`id` field.
- `belongs_to` source attributes (e.g. `specification_id`) are **excluded** from translations.
They are not stored as node properties; their values come from `enrichments/3` (reading the
OPTIONAL MATCH destination node). Do not re-add them to translations.
- Attributes listed in the `skip` DSL option are also excluded.

The `convert_node_to_resource_impl/4` loop iterates translations and reads node properties.
Because `belongs_to` source attributes are excluded, the loop does not touch them — their
values must survive intact from the enrichments map that seeds the accumulator.

## Enrichments (OPTIONAL MATCH → source attributes)

After a read query `MATCH (s:Label) OPTIONAL MATCH (s)-[r]-(d) RETURN s, r, d`, `enrichments/3`
in `DataLayer` processes each `{edge, dest_node}` pair and populates:

- `belongs_to` relationships: sets `source_attribute` (e.g. `specification_id`) from
`dest_node.properties[destination_property]`.
- `has_one` reverse relationships: sets `destination_attribute` from source node property.
- `many_to_many` relationships: converts dest_node to a resource struct and appends to a list.

The lookup uses `mapping.edges` (from `mapping.module`). If an edge returned by the OPTIONAL
MATCH has no matching entry in `mapping.edges` (wrong label, wrong direction, or missing relate
entry), `enrichments/3` silently returns `acc` unchanged and the source attribute remains nil.

`edge_direction/2` determines direction by comparing `dest_node.id` with `edge.start` /
`edge.end`:
- `dest_node.id == edge.start` → `:incoming` (destination is the start of the edge)
- `dest_node.id == edge.end` → `:outgoing` (destination is the end of the edge)

## PersistRelate: explicit vs default edges

`PersistRelate` builds `mapping.edges` from two sources:

1. **Explicit entries** — the `relate` list in the resource's `neo4j do` block:
`{relationship_name, edge_label, direction, destination_label}`.
2. **Default entries** — auto-generated for any Ash relationship that has no explicit entry.
Default edge label = `String.upcase(relationship.type)` (e.g. `:BELONGS_TO`), default
destination label = last segment of `relationship.destination` module name.

Explicit entries always take precedence. If a relationship is declared in a fragment's
`neo4j do` block, check whether the extending resource's `relate` DSL correctly merges those
entries — a mismatch between the explicit edge label and the default generates a wrong label
in `mapping.edges`, causing enrichments to silently fail.

## Aggregate execution paths

`run_aggregate_for_ids/6` selects one of four paths based on the aggregate's properties:

| Condition | Path | Description |
|---|---|---|
| `aggregate.field` is an `Ash.Query.Calculation` | expr path | Loads full dest records, evaluates Ash expression per record in Elixir |
| `aggregate_has_filter?(aggregate)` is true | filtered path | Loads full dest records, applies `Ash.Filter.Runtime.filter_matches`, computes aggregate in Elixir |
| field type is `:ash_json` (embedded/struct/map) | embedded path | Runs `collect(d.prop)` in Cypher, casts each raw JSON value via `Cast.cast/3` in Elixir |
| otherwise | Cypher path | Fully pushed down: `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `collect`, `head(collect(...))` |

`aggregate_has_filter?` treats `%Ash.Filter{expression: true}` as "no filter" (Ash always
attaches a trivial filter to unfiltered aggregates). Do not change this sentinel check.

## Cypher.Query builders

Every query shape used by the data layer has a typed builder in `Cypher.Query`. Builders
return `%Cypher.Query{clauses: [...], params: %{}}` structs that `Cypher.render/1` turns into
a `{cypher_string, params}` tuple for `Cypher.run/1`.

`Cypher.node(variable, labels)` takes a list of label atoms and produces `"(var:L1:L2)"`.
`Cypher.parameterized_node/3` does the same with a property map for parameterized MATCH patterns.

All MATCH/UPDATE/DELETE builders accept `atom() | [atom()]` for source label parameters — pass
`mapping.label_pair` (a list) for all resource operations. Single-atom callers still work for
destination labels (which remain a single label in most patterns).

The aggregate builders (`aggregate_per_record`, `aggregate_total`, `related_nodes`) use a
`labels_string/1` private helper to render `[domain, module]` as `"Domain:Module"` inside
string-interpolated Cypher patterns — `"(s:#{labels_string(label_pair)})"`. When modifying
aggregate builders, use `labels_string/1` for the source pattern, not direct atom interpolation.

## Running tests

Tests require a running Neo4j instance (configured in `config/runtime.exs` via `BOLT_URL`
or similar). `AshNeo4j.Sandbox` wraps each test in a transaction that rolls back on completion.

```sh
mix test # full suite
mix test test/blog_test.exs # single file
mix test test/blog_test.exs:LINE # single test
mix test --max-failures 5 # stop early
```

The sandbox uses `Process` dictionary flags (`ash_neo4j_in_sandbox_tx`,
`ash_neo4j_tx_stack`). Tests that bypass the sandbox or start their own transactions may
interfere with isolation — check the sandbox implementation before adding transaction logic
in tests.

## Raising upstream bugs

When a bug is found in a dependency (Bolty, Ash, Spark), raise a GitHub issue on that
repository. Use **diffo issue #125** as the style reference:

- **## Description** — explain what the system does, what the code path is, and where it
breaks. Include a short Cypher or Elixir snippet if it makes the failure concrete.
- **## What we need** — state the correct behaviour plainly.
- **## Why it matters** — explain the practical impact.

Do not attempt to locate or fix the root cause in the dependency. Add useful hypotheses
as a follow-up comment, then leave it with the upstream maintainers.

## Common agent mistakes

- **Not using `mapping.label_pair` for MATCH.** All read, update, delete, and aggregate queries
must use `mapping.label_pair` (`[domain_label, module_label]`) as the source node pattern.
Using `mapping.label` alone matches every resource that extends the same fragment. Using
`mapping.module_label` alone (without domain) risks collisions across domains.

- **Re-adding `belongs_to` source attributes to translations.** They are intentionally excluded
by `PersistTranslations`. Their values come from enrichments (the OPTIONAL MATCH result).
Including them in translations would cause the property-read loop to overwrite the
enriched value with nil (the attribute has no corresponding node property).

- **Assuming `Verifier.get_option(dsl, [:neo4j], :relate, [])` picks up fragment DSL options.**
`get_entities` picks up entities from fragments; option merging behaviour for `relate` (a
list option) must be verified separately. If a fragment's explicit `relate` entries are not
visible, `PersistRelate` generates default edges with wrong labels (e.g. `:BELONGS_TO`
instead of `:SPECIFIED_BY`), causing enrichments to silently fail.

- **Using a single label in aggregate Cypher builders** (`aggregate_per_record`,
`aggregate_total`, `related_nodes`). These use `"(s:#{labels_string(source_label)})"` with a
`labels_string/1` helper. Always pass `mapping.label_pair` as the source label here too.

- **Registering a transformer under `persisters:`** and expecting `before?`/`after?` ordering
relative to other transformers to be honoured. Persisters always run after ALL transformers.
Ordering declarations that target transformers from a persister are silently ignored.

- **Using `List.delete/2` to filter domain labels** from destination node labels. It removes
only the first occurrence. If the source domain label happens to match a destination node
label, only one instance is removed. Prefer `List.delete_at` or label filtering by explicit
set membership when precision matters.

- **Treating `domain_label` alone as a MATCH label.** The domain label is part of `label_pair`
and is used in MATCH, but always paired with `module_label`. Matching on domain label alone
would return every node in the domain, not just the target resource.

- **Forgetting to update `relation_read` in `Cypher.Query`** when changing MATCH label logic.
The `relationship_read/7` builder emits a separate `MATCH (s:SrcLabel)-[r:EdgeLabel]-(d:DestLabel)`
pattern. It must use the same multi-label source pattern as `node_read`.

- **Changing `aggregate_has_filter?` sentinel without understanding Ash's trivial filter.**
Ash attaches `%Ash.Filter{expression: true}` to every aggregate, even unfiltered ones. The
check `%Ash.Filter{expression: true} -> false` is intentional — it means "no user filter".
Removing or loosening it routes all aggregates through the Elixir path unnecessarily.

- **Modifying `Cypher.render/1` to reorder clauses.** The clause list is ordered; render
outputs them in insertion order. Query correctness depends on this ordering. Always add
clauses in the correct semantic position in the builder, not in render.
Loading
Loading