IceGraph
IceGraph provides an interactive, hierarchical view of Apache Iceberg metadata. It maps the DNA of your tables—from root metadata down to individual data and delete files.

Check out the live demo: https://yanivzalach.github.io/IceGraph/

Opinionated Design: IceGraph is built exclusively for Spark Connect backends.

Table Version: IceGraph currently supports Iceberg table format version 2.

🛠 Features

  • Read-Only: The application never modifies the table.
  • Time-Travel: View the physical state of your table as of any datetime.
  • Metadata Inspector: Displays record counts, stats, and file paths.
  • Table History: Trace every metadata evolution, from schema changes to snapshot writes, across the full lifetime of the table.
  • Table File Browser: See your table's files grouped by partition, just as you're used to.
  • Branches: View all branches of the table, even those detached from the main branch.

Recommended: In production, connect with a user that has read-only permissions on the Spark Connect server, for extra peace of mind.

Mock Data Example Using Docker

Clone the repo and change into the demo directory:

cd docker_demo

Run Docker Compose:

docker compose up

Go to http://localhost:5000 and explore the tables default.events and default.logging.

Quick Start Using Docker

The easiest way to run IceGraph is via Docker Hub.

Spark Connect 3.5.4

docker run -e SPARK_REMOTE=sc://<spark-connect-ip>:15002 -p 5000:5000 yanivzalach/icegraph:latest

Other Spark Connect versions

Clone the repo, update the Spark Connect version in backend/pyproject.toml, then build from the project root:

docker build -t icegraph .

Then run with the same command:

docker run -e SPARK_REMOTE=sc://<spark-connect-ip>:15002 -p 5000:5000 icegraph
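The tuning variables documented in this README (MAX_SNAPSHOTS_TO_SHOW, MAX_DATA_FILES_TO_COLLECT, and so on) can also be passed to the container with additional -e flags. A sketch, with illustrative values and a placeholder Spark Connect address:

```shell
# Override tuning defaults at container start.
# Variable names come from this README; the values here are examples only.
docker run \
  -e SPARK_REMOTE=sc://<spark-connect-ip>:15002 \
  -e MAX_SNAPSHOTS_TO_SHOW=500 \
  -e MAX_DATA_FILES_TO_COLLECT=10000 \
  -p 5000:5000 \
  icegraph
```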

Running from Source

Prerequisites

  • npm
  • uv (Python package manager)
  • Python 3.9

1. Setup

Sync the environments:

cd backend
uv sync
cd ../frontend
npm i

2. Set Up Your Environment Variables

Create a .env file in the root of the backend directory:

SPARK_REMOTE=sc://localhost:15002 # Local testing Spark; if you run Spark Connect in Docker, change this to its IP.

If you want to change the default values of the application, you can set the following environment variables:

  • MAX_NUMBER_OF_GRAPHS_TO_COMPUTE: The maximum number of graphs to compute in parallel. Default is 15.
  • MAX_SNAPSHOTS_TO_SHOW: The maximum number of snapshots to show in the snapshot selection page. Default is 2000.
  • COMPUTE_CLEANUP_TIME_SECONDS: The time to wait, in seconds, before cleaning up computed graphs. Default is 12.
  • MAX_DATA_FILES_TO_COLLECT: The maximum number of data files to collect. Default is 5000.
  • MAX_SNAPSHOTS_TO_COMPUTE: The maximum number of snapshots to compute. Default is 50.
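Putting these together, a complete backend/.env might look like the sketch below. All values are the documented defaults except SPARK_REMOTE, which must point at your own Spark Connect server:

```shell
# backend/.env — example configuration.
# SPARK_REMOTE is required; the remaining values repeat the documented defaults
# and can be omitted if you do not need to change them.
SPARK_REMOTE=sc://localhost:15002
MAX_NUMBER_OF_GRAPHS_TO_COMPUTE=15
MAX_SNAPSHOTS_TO_SHOW=2000
COMPUTE_CLEANUP_TIME_SECONDS=12
MAX_DATA_FILES_TO_COLLECT=5000
MAX_SNAPSHOTS_TO_COMPUTE=50
```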

3. Run

Open one terminal in the backend directory and run:

uv run python main.py

Open a second terminal in the frontend directory and run:

npm run dev

Go to http://localhost:3000 and explore your tables.
