GraphForge Tutorial

A step-by-step guide to building, querying, and analyzing graphs with GraphForge.

Installation
Your First Graph
Querying with Cypher
Working with Persistence
Using Transactions
Advanced Queries
Real-World Example: Citation Network
Best Practices
Next Steps

Installation

Install GraphForge using uv (recommended) or pip:

# Using uv
uv add graphforge

# Using pip
pip install graphforge

Verify the installation:

from graphforge import GraphForge
print("GraphForge installed successfully!")

Your First Graph

Let's build a simple social network.

Step 1: Create an In-Memory Graph

from graphforge import GraphForge

# Create an in-memory graph (data is not persisted)
db = GraphForge()

Step 2: Add Nodes

Nodes represent entities in your graph. Use the Python API to create nodes:

# Create people with labels and properties
alice = db.create_node(
    labels=['Person'],           # Node labels
    name='Alice',                # Properties
    age=30,
    city='NYC'
)

bob = db.create_node(
    labels=['Person'],
    name='Bob',
    age=25,
    city='NYC'
)

charlie = db.create_node(
    labels=['Person'],
    name='Charlie',
    age=35,
    city='LA'
)

print(f"Created node with ID: {alice.id}")  # Auto-generated ID

Step 3: Add Relationships

Relationships connect nodes:

# Create directed relationships
knows_bob = db.create_relationship(
    src=alice,
    dst=bob,
    rel_type='KNOWS',
    since=2015,                  # Relationship properties
    strength='strong'
)

knows_charlie = db.create_relationship(
    src=alice,
    dst=charlie,
    rel_type='KNOWS',
    since=2018,
    strength='medium'
)

print(f"Created relationship with ID: {knows_bob.id}")

Step 4: Query the Graph

Use Cypher queries to explore your graph:

# Find all people Alice knows
results = db.execute("""
    MATCH (a:Person)-[r:KNOWS]->(friend:Person)
    WHERE a.name = 'Alice'
    RETURN friend.name AS name, r.since AS since
    ORDER BY r.since
""")

print("Alice knows:")
for row in results:
    name = row['name'].value      # Access .value to get Python types
    since = row['since'].value
    print(f"  - {name} (since {since})")

Output:

Alice knows:
  - Bob (since 2015)
  - Charlie (since 2018)

Querying with Cypher

GraphForge supports the full openCypher language for declarative graph queries.

Basic Pattern Matching

# Match all nodes
results = db.execute("MATCH (n) RETURN n")
print(f"Total nodes: {len(results)}")

# Match nodes by label
results = db.execute("""
    MATCH (p:Person)
    RETURN p.name AS name, p.age AS age
""")

for row in results:
    print(f"{row['name'].value} is {row['age'].value} years old")

Filtering with WHERE

# Find people over 25
results = db.execute("""
    MATCH (p:Person)
    WHERE p.age > 25
    RETURN p.name AS name
""")

# Multiple conditions
results = db.execute("""
    MATCH (p:Person)
    WHERE p.age > 25 AND p.city = 'NYC'
    RETURN p.name AS name
""")

Traversing Relationships

# Find who knows whom
results = db.execute("""
    MATCH (a:Person)-[r:KNOWS]->(b:Person)
    RETURN a.name AS from, b.name AS to, r.since AS since
""")

# Two-hop traversal
results = db.execute("""
    MATCH (a:Person)-[:KNOWS]->(:Person)-[:KNOWS]->(c:Person)
    WHERE a.name = 'Alice'
    RETURN c.name AS friend_of_friend
""")

Aggregations

# Count nodes
results = db.execute("""
    MATCH (p:Person)
    RETURN count(*) AS total
""")
print(f"Total people: {results[0]['total'].value}")

# Group and aggregate
results = db.execute("""
    MATCH (p:Person)
    RETURN p.city AS city, count(*) AS population
    ORDER BY population DESC
""")

for row in results:
    print(f"{row['city'].value}: {row['population'].value} people")

# Multiple aggregations
results = db.execute("""
    MATCH (p:Person)
    RETURN
        count(*) AS total,
        avg(p.age) AS avg_age,
        min(p.age) AS youngest,
        max(p.age) AS oldest
""")

row = results[0]
print(f"Total: {row['total'].value}")
print(f"Average age: {row['avg_age'].value:.1f}")
print(f"Age range: {row['youngest'].value}-{row['oldest'].value}")

Working with Persistence

So far, we've used in-memory graphs that disappear when the program exits. Let's persist data to disk.

Creating a Persistent Graph

from pathlib import Path

# Specify a file path for SQLite storage
db = GraphForge("my-research-graph.db")

# Add data (works the same as in-memory)
db.execute("CREATE (p:Person {name: 'Alice', age: 30})")
db.execute("CREATE (p:Person {name: 'Bob', age: 25})")

# Save to disk
db.close()

Loading an Existing Graph

# Later, in a new session...
db = GraphForge("my-research-graph.db")

# Data is still there!
results = db.execute("MATCH (p:Person) RETURN p.name AS name")
for row in results:
    print(f"Found: {row['name'].value}")

db.close()

Incremental Updates

# Session 1: Initial data
db = GraphForge("knowledge-base.db")
db.execute("CREATE (:Concept {name: 'Graph Databases'})")
db.close()

# Session 2: Add more data
db = GraphForge("knowledge-base.db")
db.execute("CREATE (:Concept {name: 'SQL Databases'})")
db.execute("""
    MATCH (gdb:Concept {name: 'Graph Databases'})
    MATCH (sql:Concept {name: 'SQL Databases'})
    CREATE (gdb)-[:DIFFERENT_FROM]->(sql)
""")
db.close()

# Session 3: Query accumulated data
db = GraphForge("knowledge-base.db")
results = db.execute("MATCH (c:Concept) RETURN count(*) AS count")
print(f"Total concepts: {results[0]['count'].value}")  # 2
db.close()

Using Transactions

Transactions provide ACID guarantees—atomicity, consistency, isolation, and durability.

Basic Transaction

db = GraphForge("my-graph.db")

# Begin a transaction
db.begin()

# Make changes
db.execute("CREATE (p:Person {name: 'Diana', age: 28})")
db.execute("CREATE (p:Person {name: 'Eve', age: 32})")

# Commit to save changes
db.commit()

db.close()

Rollback on Error

db = GraphForge("my-graph.db")

try:
    db.begin()

    db.execute("CREATE (p:Person {name: 'Frank', age: 40})")

    # Some operation that might fail
    if some_validation_fails:
        raise ValueError("Validation failed!")

    db.commit()
except Exception as e:
    print(f"Error: {e}")
    db.rollback()  # Discard all changes since begin()
finally:
    db.close()

Atomic Multi-Step Operations

db = GraphForge("production-db.db")

try:
    db.begin()

    # Multi-step operation that must succeed or fail atomically
    db.execute("MATCH (p:Person {id: 123}) SET p.status = 'inactive'")
    db.execute("MATCH (p:Person {id: 123})-[r:WORKS_AT]->() DELETE r")
    db.execute("CREATE (:AuditLog {action: 'deactivate', user_id: 123})")

    db.commit()
    print("User deactivated successfully")
except Exception as e:
    db.rollback()
    print(f"Deactivation failed: {e}")
finally:
    db.close()

Advanced Queries

CREATE: Building Graphs with Cypher

You can use Cypher CREATE instead of the Python API:

# Create nodes
db.execute("CREATE (p:Person {name: 'Alice', age: 30})")

# Create nodes with relationships in one statement
db.execute("""
    CREATE (a:Person {name: 'Alice'})-[:KNOWS {since: 2020}]->(b:Person {name: 'Bob'})
""")

# Create with RETURN
results = db.execute("""
    CREATE (p:Person {name: 'Charlie', age: 35})
    RETURN p.name AS name, p.age AS age
""")
print(f"Created: {results[0]['name'].value}")

SET: Updating Properties

# Update single property
db.execute("""
    MATCH (p:Person {name: 'Alice'})
    SET p.age = 31
""")

# Update multiple properties
db.execute("""
    MATCH (p:Person {name: 'Alice'})
    SET p.age = 31, p.city = 'Boston', p.active = true
""")

# Update relationship properties
db.execute("""
    MATCH (a:Person {name: 'Alice'})-[r:KNOWS]->(b:Person {name: 'Bob'})
    SET r.strength = 'strong'
""")

DELETE: Removing Data

# Delete a node (must have no relationships — use DETACH DELETE to also remove them)
db.execute("""
    MATCH (p:Person {name: 'Charlie'})
    DETACH DELETE p
""")

# Delete only a relationship
db.execute("""
    MATCH (a:Person {name: 'Alice'})-[r:KNOWS]->(b:Person {name: 'Bob'})
    DELETE r
""")

# Delete multiple elements
db.execute("""
    MATCH (p:Person)
    WHERE p.age < 25
    DELETE p
""")

MERGE: Idempotent Creation

MERGE creates nodes if they don't exist, or matches them if they do:

# Safe to run multiple times
db.execute("MERGE (p:Person {name: 'Alice'})")
db.execute("MERGE (p:Person {name: 'Alice'})")  # Matches existing node

# Check result
results = db.execute("MATCH (p:Person {name: 'Alice'}) RETURN count(*) AS count")
print(results[0]['count'].value)  # 1, not 2!

# Useful for ETL pipelines
db.execute("MERGE (p:Person {name: 'Bob', email: 'bob@example.com'})")

Real-World Example: Citation Network

Let's build a realistic citation network graph.

Setup

from graphforge import GraphForge

db = GraphForge("citation-network.db")

Load Papers

papers = [
    {"id": "P1", "title": "Graph Neural Networks", "year": 2021, "citations": 150},
    {"id": "P2", "title": "Deep Learning Fundamentals", "year": 2019, "citations": 500},
    {"id": "P3", "title": "GNN Applications in NLP", "year": 2022, "citations": 80},
    {"id": "P4", "title": "Attention Is All You Need", "year": 2017, "citations": 2000},
]

for paper in papers:
    db.execute(f"""
        CREATE (:Paper {{
            id: '{paper['id']}',
            title: '{paper['title']}',
            year: {paper['year']},
            citations: {paper['citations']}
        }})
    """)

print(f"Loaded {len(papers)} papers")

Add Authors

authors = [
    {"name": "Alice Smith", "affiliation": "MIT"},
    {"name": "Bob Jones", "affiliation": "Stanford"},
    {"name": "Charlie Brown", "affiliation": "MIT"},
]

for author in authors:
    db.execute(f"""
        MERGE (a:Author {{name: '{author['name']}'}})
        SET a.affiliation = '{author['affiliation']}'
    """)

Link Authors to Papers

authorships = [
    ("Alice Smith", "P1"),
    ("Alice Smith", "P3"),
    ("Bob Jones", "P2"),
    ("Charlie Brown", "P1"),
    ("Charlie Brown", "P4"),
]

for author_name, paper_id in authorships:
    db.execute(f"""
        MATCH (a:Author {{name: '{author_name}'}})
        MATCH (p:Paper {{id: '{paper_id}'}})
        CREATE (a)-[:AUTHORED]->(p)
    """)

Add Citation Links

citations = [
    ("P1", "P2"),  # P1 cites P2
    ("P1", "P4"),  # P1 cites P4
    ("P3", "P1"),  # P3 cites P1
    ("P3", "P2"),  # P3 cites P2
]

for citing_id, cited_id in citations:
    db.execute(f"""
        MATCH (citing:Paper {{id: '{citing_id}'}})
        MATCH (cited:Paper {{id: '{cited_id}'}})
        CREATE (citing)-[:CITES]->(cited)
    """)

Analysis 1: Most Prolific Authors

results = db.execute("""
    MATCH (a:Author)-[:AUTHORED]->(p:Paper)
    RETURN a.name AS author, count(p) AS paper_count
    ORDER BY paper_count DESC
""")

print("Most prolific authors:")
for row in results:
    print(f"  {row['author'].value}: {row['paper_count'].value} papers")

Output:

Most prolific authors:
  Alice Smith: 2 papers
  Charlie Brown: 2 papers
  Bob Jones: 1 papers

Analysis 2: Most Cited Papers

results = db.execute("""
    MATCH (p:Paper)<-[:CITES]-(citing:Paper)
    RETURN p.title AS paper, count(citing) AS citation_count
    ORDER BY citation_count DESC
""")

print("\nMost cited papers (in-network):")
for row in results:
    print(f"  {row['paper'].value}: {row['citation_count'].value} citations")

Analysis 3: Collaboration Network

results = db.execute("""
    MATCH (a1:Author)-[:AUTHORED]->(p:Paper)<-[:AUTHORED]-(a2:Author)
    WHERE a1.name < a2.name
    RETURN a1.name AS author1, a2.name AS author2, count(p) AS papers
    ORDER BY papers DESC
""")

print("\nAuthor collaborations:")
for row in results:
    print(f"  {row['author1'].value} & {row['author2'].value}: {row['papers'].value} papers")

Analysis 4: Papers by MIT Authors

results = db.execute("""
    MATCH (a:Author)-[:AUTHORED]->(p:Paper)
    WHERE a.affiliation = 'MIT'
    RETURN p.title AS paper, a.name AS author
""")

print("\nMIT papers:")
for row in results:
    print(f"  {row['paper'].value} by {row['author'].value}")

Cleanup

db.close()

Best Practices

1. Choose Storage Mode Appropriately

Use in-memory graphs for:

Quick exploration and prototyping
Throwaway analyses
Testing

Use persistent graphs for:

Long-running analyses
Incremental graph construction
Shared datasets
Production workflows

# Exploration
db = GraphForge()

# Production
db = GraphForge("production-graph.db")

2. Always Close Persistent Graphs

# Good: Using try-finally
db = GraphForge("my-graph.db")
try:
    # ... work with graph ...
    pass
finally:
    db.close()

# Better: Using context manager (if implemented)
# with GraphForge("my-graph.db") as db:
#     ... work with graph ...

3. Use Transactions for Multi-Step Operations

# Atomic updates
db.begin()
try:
    db.execute("CREATE (p:Person {name: 'Alice'})")
    db.execute("CREATE (p:Person {name: 'Bob'})")
    db.execute("""
        MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
        CREATE (a)-[:KNOWS]->(b)
    """)
    db.commit()
except Exception as e:
    db.rollback()
    raise

4. Use MERGE for Idempotent Operations

# Safe to run multiple times
db.execute("MERGE (p:Person {email: 'alice@example.com'})")
db.execute("MERGE (p:Person {email: 'alice@example.com'})")  # No duplicates

# Avoid this pattern
db.execute("CREATE (p:Person {email: 'alice@example.com'})")
db.execute("CREATE (p:Person {email: 'alice@example.com'})")  # Creates duplicate!

5. Remember to Access `.value` on Results

results = db.execute("MATCH (p:Person) RETURN p.name AS name, p.age AS age")

for row in results:
    # Correct
    name = row['name'].value
    age = row['age'].value

    # Incorrect (returns CypherValue object)
    # name = row['name']

6. Use WHERE for Complex Filtering

# Prefer this
db.execute("""
    MATCH (p:Person)
    WHERE p.age > 25 AND p.city = 'NYC'
    RETURN p
""")

# Over this (inline property matching is limited)
db.execute("""
    MATCH (p:Person {city: 'NYC'})
    WHERE p.age > 25
    RETURN p
""")

7. Order Results Before Using LIMIT

# Always order when using LIMIT
db.execute("""
    MATCH (p:Person)
    RETURN p.name AS name, p.age AS age
    ORDER BY p.age DESC
    LIMIT 10
""")

# Without ORDER BY, results are non-deterministic

Next Steps

Congratulations! You've learned the fundamentals of GraphForge.

Learn More

API Reference — Complete Python API documentation
Cypher Guide — Full openCypher subset reference
Architecture Overview — System design and internals
README.md (repository root) — Full feature list and examples

Try These Exercises

Social Network: Build a graph of friends and their relationships. Find mutual friends.
Knowledge Graph: Extract entities from a document and link them with relationships.
Time Series: Model temporal data as a graph and query time-based patterns.
Recommendation System: Build a user-item graph and find similar users or items.
Data Lineage: Track transformations in a data pipeline and query dependencies.

Join the Community

Report issues on GitHub
Read the Requirements Document for design rationale
Explore example notebooks (coming soon!)

Happy Graph Forging! 🔨📊

FilesExpand file tree

tutorial.md

Latest commit

History