HKBU-LAGAS/Awesome-Graph-Datasets

🌟 Awesome Graph Datasets 🌟

Awesome License: MIT

📌 We are actively collecting openly available graph datasets of diverse types for research purposes. This repository will be updated regularly.

Contents

📑 Dataset List

Plain Graphs

| Name | #nodes | #edges | #labels | Type | URL |
| --- | --- | --- | --- | --- | --- |
| PPI | 3,890 | 76,584 | 50 | undirected | [raw] [raw] [preprocessed] |
| Blogcatalog3 | 10,312 | 333,983 | 39 | undirected | [raw] [raw] [preprocessed] |
| Flickr | 80,513 | 5,899,882 | 195 | undirected | [raw] [raw] [preprocessed] |
| Amazon | 334,863 | 925,872 | 100 | undirected | [raw] [preprocessed] |
| DBLP | 425,957 | 1,049,866 | 100 | undirected | [raw] [preprocessed] |
| Youtube | 1,138,499 | 2,990,443 | 47 | undirected | [raw] [preprocessed] |
| TWeibo | 2,320,895 | 50,655,143 | 100 | directed | [raw] [preprocessed] |
| Orkut | 3,072,441 | 117,185,084 | 100 | undirected | [raw] [preprocessed] |
| LiveJournal | 3,997,962 | 34,681,189 | 100 | undirected | [raw] [preprocessed] |
| In-2004 | 1,382,908 | 16,539,643 | - | directed | [raw] [preprocessed] |
| DBLP | 5,425,963 | 17,298,032 | - | undirected | [raw] [preprocessed] |
| Pokec | 1,632,803 | 30,622,564 | - | directed | [raw] [preprocessed] |
| LiveJournal | 4,847,571 | 68,475,391 | - | directed | [raw] [preprocessed] |
| IT-2004 | 41,291,594 | 1,135,718,909 | - | directed | [raw] [preprocessed] |
| Twitter | 41,652,230 | 1,468,365,182 | - | directed | [raw] [preprocessed] |
| Friendster-small | 7,944,949 | 447,219,610 | 100 | undirected | [raw] [raw] [preprocessed] |
| Friendster | 65,608,366 | 1,806,067,135 | 100 | undirected | [raw] [raw] [preprocessed] |
| OAG | 67,768,244 | 895,368,962 | 19 | undirected | [raw] [preprocessed] |
| UK-2007 | 105,896,555 | 3,738,733,648 | - | directed | [raw] [preprocessed] |
| UK-union | 133,633,040 | 5,475,109,924 | - | directed | [raw] [preprocessed] |
| ClueWeb12 | 978,408,098 | 42,574,107,469 | - | directed | [raw] |
| ClueWeb09 | 1,684,868,322 | 7,939,635,651 | - | directed | [raw] [preprocessed] |

Please cite our papers if you publish results based on our preprocessed datasets.

@article{yang13homogeneous,
  title={Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank},
  author={Yang, Renchi and Shi, Jieming and Xiao, Xiaokui and Yang, Yin and Bhowmick, Sourav S},
  journal={Proceedings of the VLDB Endowment},
  volume={13},
  number={5},
  pages={670--683},
  year={2020},
  publisher={VLDB Endowment}
}

@article{shi13realtime,
  title={Realtime Index-Free Single Source SimRank Processing on Web-Scale Graphs},
  author={Shi, Jieming and Jin, Tianyuan and Yang, Renchi and Xiao, Xiaokui and Yang, Yin},
  journal={Proceedings of the VLDB Endowment},
  volume={13},
  number={7},
  pages={966--980},
  year={2020},
  publisher={VLDB Endowment}
}

Temporal Graphs

Signed Graphs

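| Name | #nodes | #positive-edges | #negative-edges | #attributes | #labels | bipartite | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Epinions | | | | | | | link link |
| Wikipedia | | | | | | | link link |
| Slashdot | | | | | | | link link |
| Bitcoin | | | | | | | link |
| WikiSigned | | | | | | | link |
| Reddit | | | | | | | link link link link |
| ADJNet | | | | | | | link |
| SCOTUS | | | | | | | link |
| Bonanza | | | | | | Yes | link link |
| U.S. Senate | | | | | | Yes | link link |
| U.S. House | | | | | | Yes | link link |
| Review | | | | | | Yes | link link |
| MovieLens-1M | | | | | | Yes | link |
| Amazon-Book | | | | | | Yes | link |
| Amazon-CDs | | | | | | Yes | link |
| Amazon-Music | | | | | | Yes | link |
| KuaiRec | | | | | | Yes | link |
| KuaiRand | | | | | | Yes | link |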

Attributed Graphs

| Name | Type | #nodes | #edges | #attributes | #labels | URL |
| --- | --- | --- | --- | --- | --- | --- |
| Wiki | directed | 2,405 | 17,981 | 4,973 | 19 | [raw] [preprocessed] |
| Cora | directed | 2,708 | 5,429 | 1,433 | 7 | [raw] [preprocessed] |
| Citeseer | directed | 3,312 | 4,660 | 3,703 | 6 | [raw] [preprocessed] |
| Pubmed | directed | 19,717 | 44,338 | 500 | 3 | [raw] [preprocessed] |
| BlogCatalog | undirected | 5,196 | 343,486 | 8,189 | 6 | [raw] [preprocessed] |
| PPI | undirected | 56,944 | 818,716 | 50 | 121 | [raw] [preprocessed] |
| Flickr | undirected | 7,575 | 479,476 | 12,047 | 9 | [raw] [preprocessed] |
| Facebook | undirected | 4,039 | 88,234 | 1,283 | 193 | [raw] [preprocessed] |
| ArXiv | undirected | 169,343 | 1,157,799 | 128 | 40 | [raw] [preprocessed] |
| Reddit | undirected | 232,965 | 11,606,919 | 602 | 41 | [raw] [preprocessed] |
| Yelp | undirected | 716,847 | 6,977,410 | 300 | 100 | [raw] [preprocessed] |
| Twitter | directed | 81,306 | 1,768,149 | 216,839 | 4,065 | [raw] [preprocessed] |
| Amazon2M | undirected | 2,449,029 | 61,859,140 | 100 | 47 | [raw] [raw] [preprocessed] |
| Google+ | directed | 107,614 | 13,673,453 | 15,907 | 468 | [raw] [preprocessed] |
| TWeibo | directed | 2,320,895 | 50,655,143 | 1,657 | 8 | [raw] [preprocessed] |
| MAG | directed | 59,249,719 | 978,147,253 | 2,000 | 100 | [raw] [preprocessed] |
| MAG-SC | directed | 10,541,560 | 265,219,994 | 2,784,240 | 8 | [raw] [preprocessed] |

Our datasets are also available in PyTorch Geometric. Node attributes can be loaded as a sparse matrix with the following code:

from scipy import sparse
features = sparse.load_npz("attrs.npz")
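
As a follow-up sketch, the loaded matrix can be inspected before it is fed into a model. The helper name `describe_attrs` is hypothetical, and the assumption that rows index nodes and columns index attributes is ours rather than something stated here; a tiny stand-in file is generated so the example runs without downloading a dataset.

```python
import numpy as np
from scipy import sparse


def describe_attrs(path):
    """Load a SciPy .npz sparse matrix and report basic shape statistics.

    Assumption: rows correspond to nodes and columns to attributes.
    """
    features = sparse.load_npz(path).tocsr()
    num_nodes, num_attrs = features.shape
    return {
        "num_nodes": num_nodes,
        "num_attrs": num_attrs,
        "nnz": int(features.nnz),
        "density": features.nnz / (num_nodes * num_attrs),
    }


if __name__ == "__main__":
    # Build a tiny stand-in file (hypothetical name) instead of a real attrs.npz.
    toy = sparse.csr_matrix(np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]))
    sparse.save_npz("toy_attrs.npz", toy)
    print(describe_attrs("toy_attrs.npz"))
```

If the reported shape matches the #nodes and #attributes columns of the table for the dataset in question, the file was most likely read as intended.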

Please cite our paper if you publish results based on our preprocessed datasets.

@article{yang2020scaling,
  title={Scaling Attributed Network Embedding to Massive Graphs},
  author={Yang, Renchi and Shi, Jieming and Xiao, Xiaokui and Yang, Yin and Liu, Juncheng and Bhowmick, Sourav S},
  journal={Proceedings of the VLDB Endowment},
  volume={14},
  number={1},
  pages={37--49},
  year={2021},
  publisher={VLDB Endowment}
}

Bipartite Graphs

| Name | \|U\| | \|V\| | \|E\| | URL |
| --- | --- | --- | --- | --- |
| Avito | 27,736 | 16,589 | 67,029 | [raw] [preprocessed] |
| AOL | 4,811,647 | 1,632,788 | 10,741,954 | [raw] [preprocessed] |
| DBLP | 6,001 | 1,524 | 29,257 | [raw] [preprocessed] |
| MovieLens-1M | 6,040 | 3,706 | 1,000,210 | [raw] [preprocessed] |
| KDDCup2012 | 255,170 | 1,848,114 | 2,766,394 | [raw] [preprocessed] |
| Last.fm | 359,349 | 160,168 | 17,559,531 | [raw] [preprocessed] |
| Amazon-games | 826,767 | 50,210 | 1,324,754 | [raw] [preprocessed] |
| DBLP | 6,001 | 1,308 | 29,256 | [raw] [preprocessed] |
| Wikipedia | 15,000 | 3,214 | 64,095 | [raw] [preprocessed] |
| Pinterest | 55,187 | 9,916 | 1,500,809 | [raw] [preprocessed] |
| Yelp | 31,668 | 38,048 | 1,561,406 | [raw] [preprocessed] |
| MovieLens-10M | 69,878 | 10,677 | 10,000,054 | [raw] [preprocessed] |
| Last.fm | 359,349 | 160,168 | 17,559,530 | [raw] [preprocessed] |
| MIND | 876,956 | 97,509 | 18,149,915 | [raw] [preprocessed] |
| Netflix | 480,189 | 17,770 | 100,480,507 | [raw] [preprocessed] |
| Orkut | 2,783,196 | 8,730,857 | 327,037,487 | [raw] [preprocessed] |
| MAG | 10,541,560 | 2,784,240 | 1,095,315,106 | [raw] [preprocessed] |
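
For readers who want to sanity-check a downloaded bipartite dataset against the |U|, |V|, and |E| columns above, here is a minimal sketch that builds a sparse biadjacency matrix from (u, v) pairs. The edge-list layout (zero-based integer ids, one pair per edge) and the helper name `biadjacency` are assumptions for illustration; the actual file formats of the raw and preprocessed datasets are not described here.

```python
import numpy as np
from scipy import sparse


def biadjacency(edges, num_u, num_v):
    """Build a |U| x |V| sparse 0/1 matrix from an iterable of (u, v) pairs.

    Assumption: ids are zero-based integers, u in [0, num_u), v in [0, num_v).
    Duplicate pairs are collapsed so every edge is counted once.
    """
    pairs = np.asarray(list(edges), dtype=np.int64).reshape(-1, 2)
    pairs = np.unique(pairs, axis=0)
    data = np.ones(len(pairs), dtype=np.int8)
    return sparse.csr_matrix((data, (pairs[:, 0], pairs[:, 1])), shape=(num_u, num_v))


if __name__ == "__main__":
    # Toy edge list; the last pair is a deliberate duplicate.
    toy_edges = [(0, 0), (0, 1), (1, 1), (2, 0), (2, 0)]
    B = biadjacency(toy_edges, num_u=3, num_v=2)
    print("|U|, |V| =", B.shape)
    print("|E| =", B.nnz)
    print("density =", B.nnz / (B.shape[0] * B.shape[1]))
```

Keeping the matrix in CSR form means memory grows with |E| rather than |U| × |V|, which matters for the larger entries in the table.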

Please cite our papers if you publish results based on our preprocessed datasets.

@inproceedings{yang2022efficient,
  title={Efficient and Effective Similarity Search over Bipartite Graphs},
  author={Yang, Renchi},
  booktitle={Proceedings of the ACM Web Conference 2022},
  pages={308--318},
  year={2022}
}

@inproceedings{yang2022scalable,
  title={Scalable and Effective Bipartite Network Embedding},
  author={Yang, Renchi and Shi, Jieming and Huang, Keke and Xiao, Xiaokui},
  booktitle={Proceedings of the 2022 International Conference on Management of Data},
  pages={1977--1991},
  year={2022}
}

Text-Attributed Graphs

| Name | #nodes | #edges | #labels | Domain | Node/Edge Text | Task | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ogbn-arxiv-TA | 169,343 | 1,166,243 | 40 | Academic | Node | Node Classification | URL Paper |
| Books-Children | 76,875 | 1,554,578 | 24 | E-Commerce | Node | Node Classification | URL Paper |
| Books-History | 41,551 | 358,574 | 12 | E-Commerce | Node | Node Classification | URL Paper |
| Ele-Computers | 87,229 | 721,081 | 10 | E-Commerce | Node | Node Classification | URL Paper |
| Ele-Photo | 48,362 | 500,928 | 12 | E-Commerce | Node | Node Classification | URL Paper |
| Sports-Fitness | 173,055 | 1,773,500 | 13 | E-Commerce | Node | Node Classification | URL Paper |
| CitationV8 | 1,106,759 | 6,120,897 | - | Academic | Node | Link Prediction | URL Paper |
| GoodReads | 676,084 | 8,582,324 | - | E-Commerce | Node | Link Prediction | URL Paper |
| Cora | 2,708 | 21,112 | | Co-citation | Node&Edge | | URL Paper |
| PubMed | 19,717 | 44,338 | | Co-citation | Node&Edge | | URL Paper |
| ArXiv | 169,343 | 1,166,243 | | Citation | Node&Edge | | URL Paper |
| WikiCS | 11,701 | 216,123 | | Wikipedia page | Node&Edge | | URL Paper |
| Product-subset | 54,025 | 144,638 | | Co-purchase | Node&Edge | | URL Paper |
| FB15K237 | 14,541 | 310,116 | | Knowledge graph | Node&Edge | | URL Paper |
| WN18RR | 40,943 | 93,003 | | Knowledge graph | Node&Edge | | URL Paper |
| MovieLens-1M | 9,923 | 2,000,418 | | Movie rating | Node&Edge | Recommendation | URL Paper |
| | | | | | | | URL Paper |

More datasets in paper, paper, paper

Textual-Edge Graphs in paper, paper

Bipartite Textual-Edge Graphs in paper, paper

Heterogeneous Text-Attributed Graphs in paper, paper, paper

Multiplex Text-Attributed Graphs in paper

Text-Attributed Hypergraphs in paper, paper

Dynamic Text-Attributed Graphs in paper, paper

Datasets for anomaly detection, question answering, and other tasks will be added later.

Multi-Modal Graphs

| Name | #nodes | #edges | #labels | Modality | Task | Source |
| --- | --- | --- | --- | --- | --- | --- |
| Movies | 16,672 | 218,390 | 20 | Text, Vision | Node Classification | URL Paper |
| Toys | 20,695 | 126,886 | 18 | Text, Vision | Node Classification | URL Paper |
| Grocery | 17,074 | 171,340 | 20 | Text, Vision | Node Classification | URL Paper |
| Reddit-S | 15,894 | 566,160 | 20 | Text, Vision | Node Classification | URL Paper |
| Reddit-M | 99,638 | 1,167,188 | 50 | Text, Vision | Node Classification | URL Paper |
| Goodreads-NC | 685,294 | 7,235,084 | - | Text, Vision | Node Classification | URL Paper |
| Ele-fashion | 97,766 | 199,602 | - | Text, Vision | Node Classification | URL Paper |
| Amazon-Sports | 50,250 | 356,202 | - | Text, Vision | Link Prediction | URL Paper |
| Amazon-Cloth | 125,839 | 951,271 | - | Text, Vision | Link Prediction | URL Paper |
| Goodreads-LP | 636,502 | 3,437,017 | - | Text, Vision | Link Prediction | URL Paper |

More datasets in paper, paper, paper, paper

Graph-level Datasets

Plain Graphs Text-Attributed Graphs

Dataset Repositories

| Name | Type | Collected by |
| --- | --- | --- |
| SNAP | Graphs & Networks | Stanford |
| LAW | Graphs & Networks | UNIMI |
| BioSNAP | Biomedical Networks | Stanford |
| KONECT | Graphs & Networks | Jérôme Kunegis |
| AMiner | Academic Networks | AMiner |
| UCI Network Data Repository | Graphs & Networks | UCI Datalab |
| Network Repository | Graphs & Networks | - |
| Open Academic Graph | Academic Networks | Microsoft |
| Open Graph Benchmark | Graphs & Networks | Stanford |
| TuDatasets | Graphs & Networks | Christopher Morris et al. |
| StreamingGraphs | Streaming Graphs | Yibo Yao |
| ARB | Graphs & Networks | Austin R. Benson |
| SuiteSparse Matrix Collection | Matrix/Graphs | TAMU |
| Web Data Commons | Hyperlink Graphs/Web Tables/RDFa | University of Mannheim |
| Yahoo Webscope Datasets | Graphs/Ratings/Languages/Advertising | Yahoo |
| UCI Machine Learning Repository | Multivariate/Text/Time-Series | UCI |
| Yelp Open Dataset | Businesses/Reviews/User Data | Yelp |
| Recommender Systems Datasets | Graphs/Interactions/Reviews/Ratings | UCSD |
| MIcrosoft News Dataset | User Behavior Logs | Microsoft |
| Search Query Logs | Query Logs | Jeff Huang |
| AOL DS | Query Logs | Ricardo Campos |
| AWS | - | Amazon |
| Kaggle Datasets | - | Kaggle |
| OpenML | - | OpenML |
| Datasets | - | - |
| Netzschleuder | - | - |

About

A curated list of graph datasets of various types, including plain graphs, attributed graphs, bipartite graphs, text-attributed graphs, multi-modal graphs, temporal graphs, etc.
