"Deepening the roots of knowledge in the soil of data to bear the fruits of organically connected wisdom."
Gopedia is a high-efficiency Enterprise Knowledge Graph Platform specializing in Ingestion and RAG (Retrieval-Augmented Generation). It integrates fragmented information into a cohesive "knowledge neural network," providing a foundation for Enterprise Ontology where relationship reasoning and contextual understanding are at the core.
- Pluggable (Root): Seamlessly connect any external data source at a workspace/project scale.
- Scalable (Stem): High-throughput pipelines built on gRPC/Protobuf.
- Relational (Rhizome): Beyond simple storage: building an organic network of knowledge using vector and graph databases.
- Actionable (Fruit): Transforming retrieved data into decision-ready reports and insights.
Inspired by the Rhizome (a horizontal, non-hierarchical, and infinitely expandable root system), Gopedia's architecture ensures that every component is modular yet organically linked.
The "entry point" where nutrients (data) are absorbed. It registers entire Project Workspaces (directories or repos) and defines connection standards for external sources like Databases, APIs, Streams, and File Systems.
The transport system for data, divided into two vital flows:
- Phloem (Ingestion): Root → Stem → Rhizome. Encapsulates raw data, structures it hierarchically (Project → Document → L1/L2/L3), handles NLP tasks (sentence splitting, entity masking), and records the results in the Rhizome via Smart Sinks.
- Xylem (RAG): Rhizome → Leaf/Fruit. Retrieves L3 chunks via vector search, optionally reranks them with a cross-encoder, and reconstructs parent structural context (L2 sections, tables, code blocks) for rich prompt injection.
Version note (pipelines): The diagrams below match the current mainline code and Rev4 design notes (chunking, atomic L3, retrieval policy). For a reproducible build identifier, run `git describe --tags` (a pre-release id looks like `v0.1.0-…-g<hash>`). Authoritative, maintained diagrams and gap lists live in Phloem (ingest) and Xylem (RAG); deeper stage-by-stage text is in `doc/design/phloem/pipeline.md` and `doc/design/xylem/pipeline.md`.
Phloem (Ingestion): gRPC `IngestMarkdown` → domain pipeline (wiki / code) → `DefaultSink` → PostgreSQL, Qdrant, optional Redis (Tuber).
```mermaid
flowchart TB
    subgraph clients [Clients]
        Root[Root / property.root_props]
    end
    subgraph grpc [Phloem gRPC]
        S[Server.IngestMarkdown]
        R[Registry by domain]
        S --> R
    end
    subgraph pipelines [Pipelines]
        W[WikiPipeline]
        C[CodePipeline]
        R --> W
        R --> C
    end
    W --> P1[TOC + heading chunker]
    C --> P2[Code TOC + code chunker]
    P1 --> Sink[DefaultSink.Write]
    P2 --> Sink
    subgraph rhizome [Rhizome]
        PG[(PostgreSQL)]
        QD[(Qdrant)]
        RD[(Redis optional)]
    end
    Sink --> PG
    Sink --> QD
    Sink -.-> RD
    Root -->|IngestRequest| S
```
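The dispatch shown in the diagram can be sketched as a registry keyed by domain, followed by a sink that fans out to the stores. This is a minimal Python illustration, not the project's code: `Chunk`, `REGISTRY`, `InMemorySink`, `ingest`, and both toy chunkers are hypothetical names, and plain lists stand in for PostgreSQL and Qdrant.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Chunk:
    """Hypothetical unit handed from a pipeline to the sink."""
    domain: str
    heading: str
    text: str


def wiki_pipeline(markdown: str) -> List[Chunk]:
    """Toy heading chunker: start a new chunk at every Markdown heading."""
    chunks: List[Chunk] = []
    heading, body = "(preamble)", []
    for line in markdown.splitlines():
        if line.startswith("#"):
            if body:
                chunks.append(Chunk("wiki", heading, "\n".join(body)))
            heading, body = line.lstrip("#").strip(), []
        elif line.strip():
            body.append(line)
    if body:
        chunks.append(Chunk("wiki", heading, "\n".join(body)))
    return chunks


def code_pipeline(source: str) -> List[Chunk]:
    """Toy code chunker: one chunk per top-level Python `def` block."""
    chunks: List[Chunk] = []
    name, body = "(module)", []
    for line in source.splitlines():
        if line.startswith("def "):
            if body:
                chunks.append(Chunk("code", name, "\n".join(body)))
            name, body = line[4:].split("(")[0], [line]
        elif line.strip():
            body.append(line)
    if body:
        chunks.append(Chunk("code", name, "\n".join(body)))
    return chunks


# Registry by domain, mirroring the "Registry by domain" node above.
REGISTRY: Dict[str, Callable[[str], List[Chunk]]] = {
    "wiki": wiki_pipeline,
    "code": code_pipeline,
}


@dataclass
class InMemorySink:
    rows: List[Chunk] = field(default_factory=list)    # stand-in for PostgreSQL
    vectors: List[str] = field(default_factory=list)   # stand-in for Qdrant

    def write(self, chunks: List[Chunk]) -> None:
        for chunk in chunks:
            self.rows.append(chunk)
            self.vectors.append(chunk.text)  # a real sink would store embeddings


def ingest(domain: str, content: str, sink: InMemorySink) -> int:
    """Pick the pipeline for `domain`, chunk the content, and write to the sink."""
    pipeline = REGISTRY.get(domain)
    if pipeline is None:
        raise ValueError(f"no pipeline registered for domain {domain!r}")
    chunks = pipeline(content)
    sink.write(chunks)
    return len(chunks)
```

Keeping the registry as a plain mapping means a new domain (for example a ticket pipeline) only needs one more entry, which is the property the "Pluggable" principle describes.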
Xylem (RAG): `flows.xylem_flow` (CLI or `GET /api/search` via subprocess): query embedding → Qdrant L3 search → optional rerank → `fetch_rich_context` (PostgreSQL).
```mermaid
flowchart TB
    subgraph entry [Entry]
        HTTP[GET /api/search]
        CLI[flows.xylem_flow.cli search]
    end
    subgraph xylem [retriever]
        RE[retrieve_and_enrich]
        PC[project_config]
        EMB{embed backend}
        EOpenAI[OpenAI query embed]
        ELocal[local e5 query embed]
        QD[qdrant L3 top-k]
        RR{rerank?}
        RER[CrossEncoder rerank]
        FR[fetch_rich_context]
    end
    subgraph stores [Stores]
        Qdr[(Qdrant)]
        PG2[(PostgreSQL)]
    end
    HTTP --> CLI
    CLI --> RE
    RE --> PC
    RE --> EMB
    EMB -->|local| ELocal
    EMB -->|openai| EOpenAI
    ELocal --> QD
    EOpenAI --> QD
    QD --> Qdr
    RE --> RR
    RR -->|no| FR
    RR -->|yes| RER
    RER --> FR
    FR --> PG2
```
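The retrieval path in the diagram (embed the query, take the top-k L3 hits, optionally rerank, then fetch parent context) can be sketched end to end with toy components. The names `retrieve_and_enrich` and `fetch_rich_context` come from the diagram, but their signatures and bodies here are assumptions; a bag-of-words vector stands in for the OpenAI/e5 embedding, and dictionaries with invented ids stand in for Qdrant and PostgreSQL.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Toy stores: chunk id -> (parent section id, text). All data is invented.
L3_CHUNKS: Dict[str, Tuple[str, str]] = {
    "l3-1": ("sec-install", "Install the stack with Docker Compose."),
    "l3-2": ("sec-install", "Verify the health endpoint returns JSON."),
    "l3-3": ("sec-rag", "Search retrieves chunks by vector similarity."),
}
L2_SECTIONS: Dict[str, str] = {"sec-install": "Install", "sec-rag": "Retrieval"}


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real backend returns dense vectors."""
    return Counter(text.lower().replace(".", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_l3(query: str, top_k: int) -> List[Tuple[str, float]]:
    """Score every L3 chunk against the query and keep the best top_k."""
    q = embed(query)
    scored = [(cid, cosine(q, embed(text))) for cid, (_, text) in L3_CHUNKS.items()]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:top_k]


def fetch_rich_context(chunk_id: str) -> Dict[str, str]:
    """Attach the parent L2 section title to an L3 hit."""
    section_id, text = L3_CHUNKS[chunk_id]
    return {"chunk_id": chunk_id, "section": L2_SECTIONS[section_id], "text": text}


def retrieve_and_enrich(
    query: str,
    top_k: int = 2,
    rerank: Optional[Callable[[str, str], float]] = None,
) -> List[Dict[str, str]]:
    hits = search_l3(query, top_k)
    if rerank is not None:
        # Re-order the vector hits by the (query, text) score of the reranker.
        hits = sorted(hits, key=lambda h: rerank(query, L3_CHUNKS[h[0]][1]), reverse=True)
    return [fetch_rich_context(cid) for cid, _ in hits]
```

Reranking only reorders the hits that vector search already returned, so `top_k` bounds both the reranker's cost and the amount of context fetched afterwards.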
The "Knowledge Soil." This layer handles identity and relationship reasoning using Polyglot Persistence:
- PostgreSQL: For canonical storage, strict structural hierarchy, idempotency hashing, and Tuber entities (`keyword_so`).
- Qdrant: For semantic vector search.
- TypeDB: For relationship reasoning and deep graph traversal.
- Leaf (Indexing View): Domain-specific views such as Markdown, Code, or Ticket indexes.
- Fruit (Reports): Final templates or generated answers that combine data from multiple Roots and Leaves into a human-readable format.
Gopedia does not just "chunk" data; it categorizes it into a meaningful hierarchy to ensure high-fidelity retrieval and idempotency.
| Level | Entity | Description |
|---|---|---|
| Project | `projects` | The root workspace container. Has a globally stable `machine_id`. |
| Doc | `documents` | The logical file anchor within a Project. |
| L1 | `knowledge_l1` | Document snapshot/revision. Holds the Table of Contents and summary. |
| L2 | `knowledge_l2` | The "Skeleton" of data (Sections, Tables, AST structures, Logic flows). |
| L3 | `knowledge_l3` | Atomic content (e.g., sentences) that is vectorized for search. |
| Keyword | `keyword_so` | Tuber entity (Tags/Keywords) mapped to a stable `machine_id`. |
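The L1/L2/L3 levels in the table can be mirrored as nested records, with a content hash acting as the idempotency key. The sketch below is illustrative only: the class and field names are hypothetical, the regex sentence splitter is a simplification, and SHA-256 over whitespace-normalized text is an assumed hashing scheme, since the hashing Gopedia actually uses is not specified here.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import Dict, List


def content_hash(text: str) -> str:
    """Assumed idempotency key: SHA-256 of the whitespace-normalized text."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class L3:
    text: str     # atomic content, here one sentence
    digest: str   # hash used to detect unchanged content


@dataclass
class L2:
    title: str    # structural "skeleton" unit, here one section
    chunks: List[L3] = field(default_factory=list)


@dataclass
class L1:
    toc: List[str]       # table of contents for this revision
    sections: List[L2]
    revision_hash: str   # changes only when the document content changes


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_l1(sections: Dict[str, str]) -> L1:
    """Build one document revision from a {section title: body} mapping."""
    l2s = [
        L2(title, [L3(s, content_hash(s)) for s in split_sentences(body)])
        for title, body in sections.items()
    ]
    revision = content_hash("\n".join(f"{t}\n{b}" for t, b in sections.items()))
    return L1(toc=list(sections), sections=l2s, revision_hash=revision)
```

Under this scheme, re-ingesting a file whose only change is whitespace yields the same `revision_hash`, so a sink could skip the write instead of creating a duplicate revision.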
Gardener aggregate metrics (Recall@5, MRR@10, nDCG@10) across saved evaluation reports. The first seven points use the `universitas_factual_v1` (44q) definition; the last point, `mcp*`, is mcp-2.1.0 osteon 30q, which is a different dataset, so do not read a rising line from v0.6b to `mcp*` as a product improvement. Full tables, per-report links, P@3, and why osteon scores can look high are documented in `doc/rag-test-reports/README.md` (see the IR metrics snapshot section).
```mermaid
xychart-beta
    title "IR aggregate (7× universitas 44q, last: mcp-2.1.0 osteon 30q*)"
    x-axis [v0.2, v0.3, v0.4, "v0.5.0", "v0.5.1", "v0.6a", "v0.6b", "mcp*"]
    y-axis "score" 0.0 --> 1.0
    line [0.79, 0.64, 0.50, 0.50, 0.61, 0.50, 0.48, 1.00]
    line [0.39, 0.56, 0.32, 0.28, 0.37, 0.36, 0.37, 0.95]
    line [0.49, 0.58, 0.37, 0.34, 0.45, 0.40, 0.40, 0.96]
```
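For readers new to the three metrics, the sketch below computes them for a single query using the standard textbook definitions with binary relevance. It is not Gardener's implementation, which may differ in details such as graded relevance or tie handling; the function names and document ids are made up. An aggregate like the one charted would be the mean of these per-query values.

```python
import math
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant document in the top k, else 0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Binary-gain DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def mean(values: List[float]) -> float:
    """Average per-query scores into one aggregate (MRR is the mean reciprocal rank)."""
    return sum(values) / len(values) if values else 0.0
```

With the ranking `["d3", "d1", "d7", "d2", "d9"]` and relevant set `{"d1", "d2", "d4"}`, Recall@5 is 2/3 because `d4` is never retrieved, and the reciprocal rank is 1/2 because the first relevant hit sits at rank 2.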
→ Details (full table, report links, mcp notes): `doc/rag-test-reports/README.md#ir-metrics-snapshot`
We are currently transitioning into the Rev2 (Growth & Fruition) phase.
- Verify (Germination) COMPLETED: Validating the flow from Markdown and Code sources into the Rhizome.
- Expand (Growth) IN PROGRESS: Activating distributed processing via Project-level Ingestion, Tuber `machine_id` mappings, AST parsing, and NLP entity extraction (NER).
- Connect (Fruition): Full integration with the GeneSo ecosystem, featuring complex RAG Fruit (Skill Engine) and ReBAC (Relationship-Based Access Control via SpiceDB).
- API server: `go run ./cmd/api` listens on `GOPEDIA_HTTP_ADDR` (default `127.0.0.1:8787`). Routes: `GET /api/health`, `GET /api/search?q=...`, `POST /api/ingest` with JSON `{"path":"/abs/path"}`.
- CLI client: `go run ./cmd/gopedia …` talks to `GOPEDIA_API_URL` (default `http://127.0.0.1:8787`). Examples: `gopedia server`, `gopedia search "Introduction"`, `gopedia ingest /path/to/project`.
- Python: the API runs `python3 -m property.root_props.run` and `python3 -m flows.xylem_flow.cli` from the repo root. Set `GOPEDIA_REPO_ROOT` if `go.mod` is not discoverable from the process cwd.
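The three routes listed above can also be called from a script using only the Python standard library. This is a hedged sketch: the helper names are hypothetical, the `Content-Type: application/json` header on ingest is an assumption, and because the response schemas are not described here, `call` returns the raw body text rather than parsing it.

```python
import json
import os
import urllib.parse
import urllib.request

# Same variable and default as the CLI client described above.
DEFAULT_BASE_URL = os.environ.get("GOPEDIA_API_URL", "http://127.0.0.1:8787")


def health_request(base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build GET /api/health."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/api/health", method="GET")


def search_request(query: str, base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build GET /api/search?q=... with the query URL-encoded."""
    qs = urllib.parse.urlencode({"q": query})
    return urllib.request.Request(f"{base_url.rstrip('/')}/api/search?{qs}", method="GET")


def ingest_request(path: str, base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build POST /api/ingest with the JSON body {"path": ...}."""
    body = json.dumps({"path": path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/ingest",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not documented here
        method="POST",
    )


def call(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request to a running API server and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the API server running, `call(health_request())` performs the health check and `call(search_request("Introduction"))` mirrors `gopedia search "Introduction"`. Building the request separately from sending it keeps the URL and body logic checkable without a live server.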
| Topic | Link |
|---|---|
| Phloem (ingestion): diagram + gaps | doc/design/phloem/README.md |
| Phloem: pipeline stages (code-aligned) | doc/design/phloem/pipeline.md |
| Xylem (RAG + rerank): diagram + gaps | doc/design/xylem/README.md |
| Xylem: pipeline stages (code-aligned) | doc/design/xylem/pipeline.md |
| Chunking / L3 / retrieval strategy (Rev4) | doc/design/Rev4/ |
| Rhizome overview (Rev2) | doc/design/Rev2/01-overview.md |
| Run + API | doc/guide/run.md |
| RAG test reports + IR version chart (full table & notes) | doc/rag-test-reports/README.md#ir-metrics-snapshot |
Prerequisites (minimum environment for install: Kubernetes version, CPU/RAM, tools):
- Kubernetes `v1.28+` or a Docker Compose-based dev stack
- At least `4 vCPU / 8 GB RAM` (for a three-node-style setup, `8 vCPU / 16 GB RAM` is recommended)
- Required tools: `git`, `docker`, `docker compose`; optional: `go`, `python`, `node`
Install (under ~5 minutes)
- Copy-pasteable commands are documented for Docker Compose in the guides below.
- For a quick local stack, follow the Compose commands in:
  - Details: `doc/guide/install.md`
  - Short version: `doc/guide/quick-install-guide.md`
Verify it works
- Success if `curl http://127.0.0.1:18787/api/health` returns JSON.
- Success if `GET /api/search?q=test` returns results.
Tear down
docker compose -f docker-compose.dev.yml --env-file .env down -v
First scenario (under ~10 minutes)
- One demo path you can run right after install: create sample notes in an Obsidian vault, ingest them, then check the search API.
- Next, run quality evaluation with gardener_gopedia and reproduce agent-style queries with gopedia_mcp.
Production inquiries: contact@cloudbro.ai (Cloudbro channel).
Korean README: README(kor).md
This project is licensed under the Apache 2.0 License.