This repository contains the official implementation of ShallKnow, a framework for improving fact-checking over knowledge graphs (KGs) by augmenting them with automatically extracted RDF triples ("shallow knowledge") from unstructured text.
ShallKnow enables more effective support or refutation of factual claims by increasing KG coverage with high-utility, external information.

| Step | Command / Notes |
|---|---|
| 1. Clone the repo and go to the triple extraction folder | `git clone https://github.com/factcheckerr/ShallKnow.git`<br>`cd ShallKnow/`<br>`cd TripleExtraction/` |
| 2. Start Docker containers (may take a few minutes to load) | `sudo docker compose up -d` |
| 3. Start LLM (Ollama) container | 1. List running containers: `sudo docker ps`<br>2. Enter the Ollama container shell (use the `<CONTAINER_ID>` whose IMAGE is `ollama/ollama:latest`): `sudo docker exec -it <CONTAINER_ID> bash`<br>3. Inside the container, pull and run the model: `ollama pull deepseek-r1:14b`, then `ollama run deepseek-r1:14b`<br>(Press Ctrl+D to exit the container shell) |
| 4. See logs (optional) | `sudo docker ps`<br>`sudo docker logs <CONTAINER_ID>` (use the container whose IMAGE is `factchecker/nebulatp:1.1.41`) |
| 5. Test triple extraction API | `curl --location --request POST 'http://localhost:5000/extract' --header 'Content-Type: application/x-www-form-urlencoded' --data-urlencode 'query=Edith Frank was married to Otto Frank and born in Frankfurt.' --data-urlencode 'components=triple_extraction'` |

| Step | Command / Notes |
|---|---|
| Clone repo & create env | `git clone https://github.com/factcheckerr/ShallKnow.git`<br>`cd ShallKnow`<br>`python3 -m venv venv`<br>`source venv/bin/activate`<br>`pip install -r requirements.txt` |
| Install Ollama (for LLMs) | Ollama download & docs |
| Run DeepSeek LLM | `ollama pull deepseek-r1:14b`<br>`ollama run deepseek-r1:14b` |
| Run Entity-Centric Paragraph Simplification | `python scripts/wikipedia_extractor_final.py deepseek-r1:14b` |
| (Advanced) Triple Extraction API | See below for Docker-based API extraction and example curl calls |

- Recommended: 64 CPU cores, 64 GB RAM, 1× NVIDIA RTX 6000 Ada GPU
- Note: A GPU is required for the LLM and relation extraction (Relik) components.

Clone the repository and create the environment:

    git clone https://github.com/factcheckerr/ShallKnow.git
    cd ShallKnow
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

Pull and run the DeepSeek LLM (see Ollama download if needed):

    ollama pull deepseek-r1:14b
    ollama run deepseek-r1:14b

Run the Entity-Centric Paragraph Simplification script:

    python scripts/wikipedia_extractor_final.py deepseek-r1:14b

To extract new triples from unstructured text via API:

    cd TripleExtraction
    sudo docker compose up

Then, run DeepSeek in the Ollama container:

    sudo docker ps   # Find the Ollama container ID
    sudo docker exec -it <container_id> bash
    # Inside the container:
    ollama pull deepseek-r1:14b
    ollama run deepseek-r1:14b

- For a folder of preprocessed articles:

      curl --location --request POST 'http://localhost:5000/dextract' \
        --header 'Content-Type: application/x-www-form-urlencoded' \
        --data-urlencode 'query=folder:/your/path/to/preprocessed_folder' \
        --data-urlencode 'components=triple_extraction'

- For a single sentence or paragraph:

      curl --location --request POST 'http://localhost:5000/extract' \
        --header 'Content-Type: application/x-www-form-urlencoded' \
        --data-urlencode 'query=Edith Frank was married to Otto Frank and born in Frankfurt.' \
        --data-urlencode 'components=triple_extraction'

Note: Use `dextract` for batch/folder processing or `extract` for a single text input.

Alternatively, use the script:

    python scripts/extract_triples.py

Adjust the API endpoint in the script if needed (default: http://localhost:5000/extract).
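
For readers who prefer calling the API from Python without the bundled script, the curl calls above can be approximated with the standard library. This is a minimal sketch, not the contents of `scripts/extract_triples.py`: the helper names `build_request` and `extract` are hypothetical, and because the response format is not documented here, the body is returned as raw text.

```python
import urllib.parse
import urllib.request

# Default from this README; change it if the API runs elsewhere.
BASE_URL = "http://localhost:5000"


def build_request(query: str, batch: bool = False) -> urllib.request.Request:
    """Build a form-encoded POST for /extract (single text) or /dextract (folder)."""
    endpoint = "dextract" if batch else "extract"
    payload = urllib.parse.urlencode(
        {"query": query, "components": "triple_extraction"}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=payload,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract(query: str, batch: bool = False, timeout: float = 600.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_request(query, batch)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the containers running, `extract("Edith Frank was married to Otto Frank and born in Frankfurt.")` should mirror the single-text curl call, and `extract("folder:/your/path/to/preprocessed_folder", batch=True)` the folder variant. The generous timeout is a guess, on the assumption that LLM-backed extraction can be slow.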
A snapshot of the top properties in our extracted triples:

| Property | Count |
|---|---|
| wdt:P17 | 21,143 |
| wdt:P276 | 8,028 |
| P-Located_in | 1,407 |
| P-Nationality | 844 |

Full CSVs and charts are available in /Prediction_files_and_AUROC_graphs.
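
A property-frequency table like the one above can be recomputed from a triple file with a few lines of Python. The sketch below assumes the triples are stored one per line in an N-Triples-style layout (subject, predicate, object separated by whitespace). That format is an assumption, and the toy input is illustrative only, not taken from the ShallKnow data.

```python
from collections import Counter
from typing import Iterable


def count_properties(lines: Iterable[str]) -> Counter:
    """Count how often each predicate occurs in N-Triples-style lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate contain no whitespace; the rest is the object.
        parts = line.split(None, 2)
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts


# Toy input for illustration only.
toy_triples = [
    "<http://example.org/Berlin> <http://example.org/locatedIn> <http://example.org/Germany> .",
    "<http://example.org/Paris> <http://example.org/locatedIn> <http://example.org/France> .",
    "<http://example.org/Ada> <http://example.org/nationality> <http://example.org/UK> .",
]

for prop, n in count_properties(toy_triples).most_common():
    print(prop, n)
```

For a real file, pass an open file handle instead of the list, since the function only needs an iterable of lines.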
All datasets are provided on Zenodo.
- KnowledgeStream: Path-based plausibility scoring for RDF triples
- FAVEL: Benchmark fact-checking evaluation platform
- GERBIL: Standardized benchmarking of KG tasks
To reproduce results for all fact-validation approaches over large knowledge graphs, we provide an updated version of the Kstream-Graph-Transformer project. This tool transforms your KG for compatibility with large-scale path-based evaluation frameworks.
Before you begin:
- Download the latest Wikidata RDF dump.
- Append the extracted triples (G* or G** or both) provided in the `/Assertions` folder to the Wikidata dump.
- Specify the location of the combined KG file in the main configuration of the Kstream-Graph-Transformer project.
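
The append step above can be sketched in Python as a streaming concatenation. This assumes that both the dump and the extracted triples are line-based N-Triples files (plain, `.gz`, or `.bz2`) and that the transformer accepts an uncompressed combined file. Neither assumption is confirmed by this README, and the function names are hypothetical.

```python
import bz2
import gzip
from pathlib import Path
from typing import BinaryIO, Iterable


def open_maybe_compressed(path: Path) -> BinaryIO:
    """Open a plain, .gz, or .bz2 file for binary reading."""
    if path.suffix == ".gz":
        return gzip.open(path, "rb")
    if path.suffix == ".bz2":
        return bz2.open(path, "rb")
    return path.open("rb")


def combine_kg(dump: Path, extra: Iterable[Path], out: Path) -> None:
    """Stream the dump, then each extra triple file, into one uncompressed file."""
    with out.open("wb") as dst:
        for src_path in [dump, *extra]:
            last_byte = b"\n"
            with open_maybe_compressed(src_path) as src:
                # Copy in 1 MiB chunks so large dumps never sit in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    dst.write(chunk)
                    last_byte = chunk[-1:]
            # Keep triples from consecutive files on separate lines.
            if last_byte != b"\n":
                dst.write(b"\n")
```

A call might look like `combine_kg(Path("wikidata_dump.nt.bz2"), [Path("Assertions/triples.nt")], Path("combined.nt"))`, where all three file names are placeholders. The resulting path is what would go into the Kstream-Graph-Transformer configuration.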
After transforming the KG, you can use FAVEL together with KnowledgeStream to run and evaluate the following baseline approaches:
- Katz (`katz`)
- PathEnt (`pathent`)
- SimRank (`simrank`)
- Adamic Adar (`adamic_adar`)
- Jaccard (`jaccard`)
- Degree Product (`degree_product`)
- PredPath (`predpath`)
- PRA (`pra`)
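
To give a feel for what the simpler baselines measure, here is a minimal sketch of the textbook neighbourhood-based definitions of Jaccard, Adamic-Adar, and degree product on a toy undirected graph. It is not the KnowledgeStream implementation, which may differ in how it treats edge direction, predicates, and normalisation. The graph and function names are illustrative assumptions.

```python
import math
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency: node -> set of neighbours


def jaccard(g: Graph, u: str, v: str) -> float:
    """Shared neighbours divided by all neighbours of u and v (0.0 if none)."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def adamic_adar(g: Graph, u: str, v: str) -> float:
    """Sum of 1 / log(degree(z)) over the common neighbours z of u and v."""
    return sum(
        1.0 / math.log(len(g[z])) for z in g[u] & g[v] if len(g[z]) > 1
    )


def degree_product(g: Graph, u: str, v: str) -> int:
    """Product of the degrees of u and v."""
    return len(g[u]) * len(g[v])


# Toy graph with edges a-c, a-d, b-c, b-d, b-e, d-e.
toy: Graph = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}

print(jaccard(toy, "a", "b"))         # 2 shared of 3 total neighbours
print(adamic_adar(toy, "a", "b"))     # 1/ln(2) + 1/ln(3)
print(degree_product(toy, "a", "b"))  # 2 * 3
```

In a fact-checking setting, `u` and `v` would play the role of the subject and object of the claim being scored, with higher scores suggesting a more plausible link.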
For step-by-step instructions, refer to the documentation in each individual repository. The combination of these tools allows for reproducible evaluation and benchmarking in line with the results reported in our paper.
Note: For COPAAL, please refer to the COPAAL documentation for instructions on setting up the KG as an endpoint and running the approach.
If you use ShallKnow in your research, please cite:
TODO: to be added later.
We welcome pull requests and issue reports! For questions and further contributions, please open an issue.
This project is licensed under the Creative Commons Attribution 4.0 International License.

