Graph Encoder Embedding

Fast Python implementation of "One-Hot Graph Encoder Embedding" by C. Shen, Q. Wang, and C. E. Priebe.

For ultra-fast C++ implementation using Ligra, see parallel-graph-encoder-embedding-ligra.

Performance

100x speedup over original Python implementation using Numba
1000x speedup available with Ligra C++ implementation

Quick Start

Basic usage with edge list graphs:

from src.DataPreprocess import numba_graph_encoder_embed
import numpy as np

# Load your graph as edge list: [source, target, weight]
edges = np.loadtxt("your_graph.csv", delimiter=",")
labels = np.random.randint(0, 5, size=(n_nodes, 1))  # Your node labels
n_nodes = int(max(edges[:, :2].max(), edges[:, :2].max())) + 1

# Get embeddings
Z, W = numba_graph_encoder_embed(edges, labels, n_nodes)

Command Line Usage

Run embedding on any graph file:

python src/Evaluation.py path/to/graph.csv --weighted --laplacian

The script generates random labels and saves embeddings to embeddings.csv.

Input Formats

Edge list (recommended): CSV with columns [source, target, weight]

0,1,1.0
1,2,0.5

Adjacency matrix: Square matrix where A[i,j] is edge weight between nodes i and j.

Key Functions

numba_graph_encoder_embed() - Fast embedding computation
DataPreprocess() - Handles input preprocessing and format conversion
Clustering() - Node clustering using embeddings
GNN() - Neural network classification on embeddings

Options

Graph preprocessing:

Laplacian=True - Use normalized Laplacian instead of adjacency matrix
DiagA=True - Add self-loops (diagonal augmentation)
Correlation=True - L2-normalize embedding rows

Learning tasks:

Clustering: Unknown cluster assignments
Semi-supervised: Some nodes have known labels
Supervised: Train/test split with fully labeled data

Examples

Clustering with unknown number of clusters:

from src.Run import Run
from utils.create_test_case import Case

case = Case(n=1000).case_10_cluster()  # Generate test graph
Run(case, "c", Y=[2,3,4,5])  # Try 2-5 clusters

Semi-supervised learning with 95% unlabeled nodes:

case = Case(n=1000).case_10()  # 95% nodes unlabeled
Run(case, "se", Learner=2)  # Neural network learning

Dependencies

pip install numpy numba sklearn tensorflow

For maximum performance, use the Ligra implementation for large graphs (>100k nodes).

Paper & Citation

Paper published in IPDPS 2024 | arxiv.

@inproceedings{lubonja2024edge,
  title={Edge-parallel graph encoder embedding},
  author={Lubonja, Ariel and Shen, Cencheng and Priebe, Carey and Burns, Randal},
  booktitle={2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
  pages={482--485},
  year={2024},
  organization={IEEE}
}

Name		Name	Last commit message	Last commit date
Latest commit History 346 Commits
runtime_benchmarks		runtime_benchmarks
src		src
tests		tests
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Graph Encoder Embedding

Performance

Quick Start

Command Line Usage

Input Formats

Key Functions

Options

Examples

Dependencies

Paper & Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Graph Encoder Embedding

Performance

Quick Start

Command Line Usage

Input Formats

Key Functions

Options

Examples

Dependencies

Paper & Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages