Fast Python implementation of "One-Hot Graph Encoder Embedding" by C. Shen, Q. Wang, and C. E. Priebe.
For ultra-fast C++ implementation using Ligra, see parallel-graph-encoder-embedding-ligra.
- 100x speedup over original Python implementation using Numba
- 1000x speedup available with Ligra C++ implementation
Basic usage with edge list graphs:
from src.DataPreprocess import numba_graph_encoder_embed
import numpy as np
# Load your graph as edge list: [source, target, weight]
edges = np.loadtxt("your_graph.csv", delimiter=",")
labels = np.random.randint(0, 5, size=(n_nodes, 1)) # Your node labels
n_nodes = int(max(edges[:, :2].max(), edges[:, :2].max())) + 1
# Get embeddings
Z, W = numba_graph_encoder_embed(edges, labels, n_nodes)Run embedding on any graph file:
python src/Evaluation.py path/to/graph.csv --weighted --laplacianThe script generates random labels and saves embeddings to embeddings.csv.
Edge list (recommended): CSV with columns [source, target, weight]
0,1,1.0
1,2,0.5
Adjacency matrix: Square matrix where A[i,j] is edge weight between nodes i and j.
numba_graph_encoder_embed()- Fast embedding computationDataPreprocess()- Handles input preprocessing and format conversionClustering()- Node clustering using embeddingsGNN()- Neural network classification on embeddings
Graph preprocessing:
Laplacian=True- Use normalized Laplacian instead of adjacency matrixDiagA=True- Add self-loops (diagonal augmentation)Correlation=True- L2-normalize embedding rows
Learning tasks:
- Clustering: Unknown cluster assignments
- Semi-supervised: Some nodes have known labels
- Supervised: Train/test split with fully labeled data
Clustering with unknown number of clusters:
from src.Run import Run
from utils.create_test_case import Case
case = Case(n=1000).case_10_cluster() # Generate test graph
Run(case, "c", Y=[2,3,4,5]) # Try 2-5 clustersSemi-supervised learning with 95% unlabeled nodes:
case = Case(n=1000).case_10() # 95% nodes unlabeled
Run(case, "se", Learner=2) # Neural network learningpip install numpy numba sklearn tensorflowFor maximum performance, use the Ligra implementation for large graphs (>100k nodes).
Paper published in IPDPS 2024 | arxiv.
@inproceedings{lubonja2024edge,
title={Edge-parallel graph encoder embedding},
author={Lubonja, Ariel and Shen, Cencheng and Priebe, Carey and Burns, Randal},
booktitle={2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
pages={482--485},
year={2024},
organization={IEEE}
}