| title | GraphSL: An Open-Source Library for Graph Source Localization Approaches and Benchmark Datasets |
|---|---|
| tags | |
| authors | |
| affiliations | |
| date | 25 Feb 2024 |
| bibliography | paper.bib |

We introduce GraphSL, a new library for studying the graph source localization problem. Graph diffusion and graph source localization are inverse problems: graph diffusion predicts how information spreads from given sources, while graph source localization infers the sources from an observed diffusion. GraphSL supports a range of graph diffusion models for simulating information diffusion and enables the evaluation of state-of-the-art source localization approaches on established benchmark datasets. The source code of GraphSL is available in the GitHub repository. Bug reports and feedback can be directed to the GitHub issues page.
Graph diffusion is a fundamental task in graph learning that aims to predict future information diffusion given the information sources. Its inverse problem, graph source localization, is important yet rarely explored: it aims to detect the information sources given their subsequent information diffusion. As illustrated in \autoref{fig:example}, graph diffusion seeks to predict the information diffusion
Due to its importance, some open-source tools have been developed to support research on graph source localization. Two recent examples are cosasi [@McCabe2022joss] and RPaSDT [@frkaszczak2022rpasdt]. However, they do not support a variety of information diffusion simulations, and they lack real-world benchmark datasets and state-of-the-art source localization approaches. To fill this gap, we propose a new library, GraphSL: to our knowledge, the first to include both real-world benchmark datasets and recent source localization methods, enabling researchers and practitioners to easily evaluate novel techniques against appropriate baselines. These methods do not require prior assumptions about the sources (e.g., a single source or multiple sources) and can handle graph source localization under various diffusion simulation models such as Independent Cascade (IC) and Linear Threshold (LT) [@shakarian2015independent]. The GraphSL library is standardized: for instance, the tests of all source inference methods return a Metric object, which provides five performance metrics (accuracy, precision, recall, F-score, and area under the ROC curve).
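
To make the five metrics concrete, the following is a minimal sketch of how they can be computed from per-node source scores. It is an illustration under stated assumptions, not GraphSL's actual `Metric` class: the names `SourceMetrics`, `roc_auc`, and `evaluate`, as well as the 0.5 decision threshold, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceMetrics:
    """Hypothetical container for the five metrics (not GraphSL's Metric class)."""
    accuracy: float
    precision: float
    recall: float
    f_score: float
    auc: float


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a random true source receives a higher
    score than a random non-source; ties count as one half.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        return float("nan")  # AUC is undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Compare per-node source scores with ground-truth source labels (0/1)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return SourceMetrics(accuracy, precision, recall, f_score,
                         roc_auc(y_true, scores))
```

For example, `evaluate([1, 0, 0, 1, 0], [0.9, 0.2, 0.6, 0.4, 0.1])` gives an accuracy of 0.6, a precision, recall, and F-score of 0.5 each, and an AUC of 5/6.
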
GraphSL targets both developers and practical users: they are free to add algorithms and datasets for their own needs by following the guidelines in the "Contact" section of README.md.
The structure of the GraphSL library is depicted in \autoref{fig:overview}. Existing methods can be categorized into two groups: prescribed methods and graph neural network (GNN)-based methods.
Prescribed methods rely on hand-crafted rules and heuristics. For instance, LPSI assumes that nodes surrounded by larger proportions of infected nodes are more likely to be source nodes [@wang2017multiple]. NetSleuth employs the Minimum Description Length principle to identify the optimal set of source nodes and virus propagation ripple [@prakash2012spotting]. OJC identifies a set of nodes (Jordan cover) that cover all observed infected nodes with the minimum radius [@zhu2017catch].
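
The intuition behind LPSI can be sketched in a few lines. The code below is a simplified illustration in the spirit of label propagation, not GraphSL's implementation: the function name `lpsi_like_sources`, the adjacency-dictionary input, the update rule with a symmetrically normalized adjacency matrix, and the local-maximum selection criterion are all assumptions made for this example.

```python
import math


def lpsi_like_sources(adj, infected, alpha=0.5, iters=200):
    """Label-propagation sketch of the LPSI intuition (hypothetical helper).

    adj      : dict mapping each node to a list of its neighbors (undirected)
    infected : set of observed infected nodes
    Returns (candidate source set, converged node scores).
    """
    nodes = list(adj)
    deg = {v: len(adj[v]) for v in nodes}
    # Initial labels: +1 for infected nodes, -1 for uninfected nodes.
    y = {v: 1.0 if v in infected else -1.0 for v in nodes}
    g = dict(y)
    for _ in range(iters):
        # g <- alpha * S g + (1 - alpha) * y, with S = D^{-1/2} A D^{-1/2}
        g = {
            v: alpha * sum(g[u] / math.sqrt(deg[u] * deg[v]) for u in adj[v])
            + (1 - alpha) * y[v]
            for v in nodes
        }
    # An infected node is a candidate source if its score exceeds
    # the scores of all of its neighbors (a local maximum).
    sources = {v for v in infected if all(g[v] > g[u] for u in adj[v])}
    return sources, g
```

On the path graph 0-1-2-3-4 with infected nodes {1, 2, 3}, this sketch selects node 2, the only infected node whose neighbors are all infected.
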
GNN-based methods learn rules from graph data in an end-to-end manner by capturing graph topology and neighboring information. For example, GCNSI utilizes LPSI to enhance input and then applies Graph Convolutional Networks (GCN) for source identification [@dong2019multiple]; IVGD introduces a graph residual scenario to make existing graph diffusion models invertible, and it devises a new set of validity-aware layers to project inferred sources to feasible regions [@IVGD_www22]. SLVAE uses forward diffusion estimation and deep generative models to approximate source distribution, leveraging prior knowledge for generalization under arbitrary diffusion patterns [@ling2022source].
| Dataset | # Nodes | # Edges |
|---|---|---|
| Karate [@lusseau2003bottlenose] | 34 | 78 |
| Dolphins [@lusseau2003bottlenose] | 62 | 159 |
| Jazz [@gleiser2003community] | 198 | 2,742 |
| Network Science [@newman2006finding] | 1,589 | 2,742 |
| Cora-ML [@mccallum2000automating] | 2,810 | 7,981 |
| Power Grid [@watts1998collective] | 4,941 | 6,594 |
Table: \label{tab:statistics} Six benchmark graph datasets: their numbers of nodes and edges.
Aside from methods, we also release six benchmark graph datasets to facilitate research on graph source localization; their statistics are shown in \autoref{tab:statistics}. Information sources and diffusions can be generated with the function `diffusion_generation`.
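
To illustrate what generating sources and diffusions involves, the following is a minimal sketch of an Independent Cascade simulation on an adjacency dictionary. It is not the library's `diffusion_generation` function: the names `independent_cascade` and `generate_sample`, their parameters, and the uniform activation probability `p` are assumptions made for this example.

```python
import random


def independent_cascade(adj, seeds, p=0.1, rng=None):
    """Simulate one Independent Cascade run (hypothetical helper).

    adj   : dict mapping each node to a list of its neighbors
    seeds : iterable of source nodes
    p     : probability that a newly infected node infects each neighbor
    Returns the set of nodes infected when the cascade stops.
    """
    rng = rng or random.Random()
    infected = set(seeds)
    frontier = list(infected)
    while frontier:
        newly_infected = []
        for u in frontier:
            for v in adj[u]:
                # Each newly infected node gets one chance per neighbor.
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    newly_infected.append(v)
        frontier = newly_infected
    return infected


def generate_sample(adj, num_sources, p=0.1, seed=None):
    """Draw random sources, then simulate their diffusion."""
    rng = random.Random(seed)
    sources = set(rng.sample(sorted(adj), num_sources))
    return sources, independent_cascade(adj, sources, p=p, rng=rng)
```

A source localization method then receives only the infected set and the graph, and is evaluated on how well it recovers the sampled sources.
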
GraphSL is available under the MIT License. The library may be cloned from the GitHub repository or installed with pip: `pip install GraphSL`. Documentation is provided via Read the Docs, including a quickstart introducing major functionality and a detailed API reference. Extensive unit testing is employed throughout the library.

