Using natural language to explore databases unlocks new opportunities for non-technical users. This tool, built with Neo4j and LangChain, combines semantic search and graph queries to provide intuitive, code-free access to data insights.
docker run \
--name neo4j \
--restart always \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password \
-e NEO4J_PLUGINS=\[\"apoc\"\] \
--mount type=bind,source=$(pwd)/db_data,destination=/data \
neo4j:latest
source: https://neo4j.com/docs/operations-manual/current/docker/introduction/
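Neo4j can take a little while to accept connections after the container starts, so it may help to wait for the Bolt port before loading data. The sketch below uses only the Python standard library. The helper name `wait_for_port` and its defaults are illustrative and not part of this project.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

With the port mapping from the docker command, `wait_for_port("localhost", 7687)` checks the Bolt port. An open port only shows that the server is listening, not that authentication will succeed.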
The project uses a two-step process:
- Data Loading: Load event data from CSV into Neo4j to:
  - Create the graph structure with nodes and relationships
  - Generate combined text properties for each event
  - Create embeddings for all events using OpenAI's embedding model
  - Store embeddings in the Neo4j database
- Semantic Search: Search the graph database to:
  - Perform semantic searches using the pre-generated embeddings
  - Generate Cypher queries based on natural language questions
  - Return detailed answers based on the graph database
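The data loading step can be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: the CSV column names, the sample values "Summer Residency" and "Berlin", and the helper names are made up for the example, and the embedding call to OpenAI is left out.

```python
import csv
import io

# Hypothetical CSV layout; the real dataset's headers may differ.
SAMPLE_CSV = """id,event,project,location,number_of_participants
15,call for applications,Summer Residency,Berlin,12
"""

# Parameterised Cypher that upserts one Event node per row (assumed property names).
MERGE_EVENTS = """
UNWIND $events AS ev
MERGE (e:Event {id: ev.id})
SET e.name = ev.name,
    e.number_of_participants = ev.number_of_participants,
    e.combined_text = ev.combined_text
"""


def build_combined_text(row: dict) -> str:
    """Join project, event and location into one string to embed later."""
    parts = [
        ("Project", row.get("project")),
        ("Event", row.get("event")),
        ("Location", row.get("location")),
    ]
    # Skip fields that are missing or blank so the text stays clean.
    return ". ".join(f"{label}: {value.strip()}" for label, value in parts if value and value.strip())


def rows_to_event_params(csv_text: str) -> list[dict]:
    """Turn CSV rows into parameter dicts for the MERGE_EVENTS query."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        events.append(
            {
                "id": row["id"],
                "name": row["event"],
                "number_of_participants": int(row["number_of_participants"]),
                "combined_text": build_combined_text(row),
            }
        )
    return events
```

The resulting list would be passed as the `$events` parameter of `MERGE_EVENTS`. The embedding for each `combined_text` would then be computed and stored on the same node in a later step.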
The project provides a command-line interface for interacting with the system. First create a virtual environment and install the dependencies with uv:
uv venv --python 3.13
uv sync
Load events data from a CSV file and create embeddings:
# Load data using the default sample dataset
events load
# Load data from a custom CSV URL
events load --csv-url https://example.com/events_data.csv
Search for events using natural language queries:
# Basic search
events search --query "Find jazz concerts with more than 50 participants"
# Verbose mode (shows detailed results including Cypher query and raw results)
events search --query "Find outdoor music events" --verbose
# Load sample data
events load
# Search for specific events
events search --query "What are the music events that have more than 1 coordinator and more than 50 participants?"
# Get detailed search results
events search --query "Find events in which Alice Jones participated" --verbose
(:Event)-[:HAS_TOPIC]->(:Tag)
(:Event)-[:BELONGS_TO]->(:Category)
(:Event)-[:TAKES_PLACE_IN]->(:Location)
(:Event)-[:PART_OF]->(:Project)
(:Coordinator)-[:COORDINATES]->(:Event)
(:Coordinator)-[:COORDINATES]->(:Project)
(:Guest)-[:PARTICIPATES_IN]->(:Event)
| Key | Value |
|---|---|
| elementId | 4:c6515374-8168-481f-a45b-bcfd3d32f193:14 |
| id | "15" |
| name | "call for applications" |
| number_of_participants | 12 |
| start_date | "Sun Aug 01 2021 00:00:00 GMT+0200 (Central European Summer Time)" |
| end_date | "Wed Aug 25 2021 00:00:00 GMT+0200 (Central European Summer Time)" |
| combined_text | <to create embeddings from "project", "event" & "location"> |
| embedding | |
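The `start_date` and `end_date` values in this example are stored as JavaScript-style date strings, which makes comparisons such as "after 31.01.2021" awkward. A small sketch of parsing them in Python, which also yields the day of the week, is shown below. The function names are illustrative, and the format string assumes English day and month abbreviations (the default C locale).

```python
from datetime import datetime

# Matches e.g. "Sun Aug 01 2021 00:00:00 GMT+0200" once the zone name is removed.
JS_DATE_FORMAT = "%a %b %d %Y %H:%M:%S GMT%z"

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def parse_js_date(value: str) -> datetime:
    """Parse a JavaScript Date string into a timezone-aware datetime."""
    # Drop the trailing "(Central European Summer Time)" style suffix.
    without_zone_name = value.split(" (", 1)[0].strip()
    return datetime.strptime(without_zone_name, JS_DATE_FORMAT)


def day_of_week(value: str) -> str:
    """Return the English weekday name for a JavaScript Date string."""
    return WEEKDAYS[parse_js_date(value).weekday()]
```

For the row above, `day_of_week("Sun Aug 01 2021 00:00:00 GMT+0200 (Central European Summer Time)")` returns `"Sunday"`.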
- Find all events coordinated by a given coordinator
- Find music outdoor events with more than 100 participants
- Find events in which one person participated but another did not
- Find events held in places atypical for cultural events
- Find music workshops with more than one coordinator
- Which music events with more than 1 coordinator and more than 50 participants took place after 31.01.2021?
- Find events in which Alice Jones participated alone / with other people.
- Find outdoor music events with more than 100 participants.
- Find jazz concerts with more than 50 participants that took place after 31.03.2021
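To give a sense of what the generated Cypher might look like, here is a hand-written sketch for the question about music events with more than one coordinator and more than 50 participants. It is not output from the tool. It assumes that "music" is stored as a `Category` with a `name` property, which the schema does not confirm, and the function name is illustrative.

```python
from textwrap import dedent


def music_events_query(min_coordinators: int = 1, min_participants: int = 50) -> tuple[str, dict[str, int]]:
    """Build a parameterised Cypher query and its parameters."""
    query = dedent(
        """
        MATCH (e:Event)-[:BELONGS_TO]->(c:Category)
        WHERE toLower(c.name) CONTAINS 'music'   // assumption: Category has a `name` property
          AND e.number_of_participants > $min_participants
        MATCH (co:Coordinator)-[:COORDINATES]->(e)
        WITH e, count(DISTINCT co) AS coordinators
        WHERE coordinators > $min_coordinators
        RETURN e.name AS event, e.number_of_participants AS participants, coordinators
        ORDER BY participants DESC
        """
    ).strip()
    params = {"min_coordinators": min_coordinators, "min_participants": min_participants}
    return query, params
```

Keeping the thresholds as query parameters rather than formatting them into the string avoids quoting problems and lets the same query text be reused for different limits.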
Hybrid Search Flow (Textual Representation)
- User Query: The process begins with a user submitting a query.
- Initiate Search: The search process starts.
- Generate Cypher: A Cypher query is generated based on the user's query.
- Cypher LLM: A Language Model assists in crafting the Cypher query.
- Extract Cypher: The generated Cypher query is extracted as text.
- Execute Graph Query: The Cypher query is executed against the graph database.
- Filter Results: The initial results from the graph query are filtered.
- Vector Search Needed?: A decision is made whether to perform vector search or not, based on the graph query results.
- Based on decision:
  - If Yes:
    - Execute Vector Search: The vector search process begins.
    - Expand Query: The search query is expanded for better vector search.
    - Vector Search: The vector search is executed.
    - Re-rank: The vector search results are re-ranked.
    - Format Vector: The re-ranked vector search results are formatted.
  - If No:
    - Format Graph: The graph search results are formatted.
- Compose Answer: Information from the graph and/or vector search is combined to generate an answer.
- Final Answer: The composed answer is presented.
This textual representation conveys the essential steps and branching logic of the hybrid search flow.
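The branching logic of this flow can be sketched in Python with the individual steps passed in as plain functions. This is an illustration under stated assumptions, not the project's implementation: the function names and signatures are hypothetical, the "vector search needed?" rule (too few non-empty graph rows) is a guess at one reasonable criterion, and re-ranking is reduced to sorting by an assumed `score` field.

```python
import re
from typing import Callable

FENCE = "`" * 3  # Markdown code fence that LLMs often wrap queries in
CYPHER_BLOCK = re.compile(FENCE + r"(?:cypher)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_cypher(llm_output: str) -> str:
    """Return the Cypher text, unwrapping a fenced block if one is present."""
    match = CYPHER_BLOCK.search(llm_output)
    return (match.group(1) if match else llm_output).strip()


def hybrid_search(
    question: str,
    generate_cypher: Callable[[str], str],
    run_graph_query: Callable[[str], list[dict]],
    expand_query: Callable[[str], str],
    vector_search: Callable[[str], list[dict]],
    compose_answer: Callable[[str, list[dict], str], str],
    min_graph_results: int = 1,
) -> str:
    """Run the graph query first and fall back to vector search if it finds too little."""
    cypher = extract_cypher(generate_cypher(question))
    rows = run_graph_query(cypher)
    # Filter results: drop rows in which every value is None.
    rows = [row for row in rows if any(value is not None for value in row.values())]

    if len(rows) < min_graph_results:  # assumed decision rule
        hits = vector_search(expand_query(question))
        # Re-rank: highest similarity score first (the `score` key is an assumption).
        hits = sorted(hits, key=lambda hit: hit.get("score", 0.0), reverse=True)
        return compose_answer(question, hits, "vector")
    return compose_answer(question, rows, "graph")
```

Passing the steps in as callables keeps the control flow testable without a running database or LLM. In a real setup they would wrap the Cypher-generating model, the Neo4j query, and the embedding search.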
- convert dates to days of week
- I used this tutorial https://huggingface.co/learn/cookbook/en/rag_with_knowledge_graphs_neo4j