No Need to Be a Know-It-All: Fact Checking with Shallow Knowledge

This repository contains the official implementation of ShallKnow, a framework for improving fact-checking over knowledge graphs (KGs) by augmenting them with RDF triples automatically extracted from unstructured text ("shallow knowledge").

ShallKnow enables more effective support or refutation of factual claims by increasing KG coverage with high-utility, external information.

🚀 Quick Try: Shallow Knowledge Extraction

Step	Command / Notes
1. Clone the repo & go to triple extraction folder	`git clone https://github.com/factcheckerr/ShallKnow.git` `cd ShallKnow/` `cd TripleExtraction/`
2. Start Docker containers (may take a few minutes to load)	`sudo docker compose up -d`
3. Start LLM (Ollama) container	List running containers: `sudo docker ps`. Open a shell in the Ollama container (the one whose IMAGE is ollama/ollama:latest): `sudo docker exec -it <CONTAINER_ID> bash`. Inside the container, pull and run the model: `ollama pull deepseek-r1:14b` `ollama run deepseek-r1:14b`. (Press Ctrl+D to leave the model prompt, and again to exit the container shell.)
4. See logs (optional)	`sudo docker ps` `sudo docker logs <CONTAINER_ID>` (use the ID of the container whose IMAGE is factchecker/nebulatp:1.1.41)
5. Test triple extraction API	`curl --location --request POST 'http://localhost:5000/extract' --header 'Content-Type: application/x-www-form-urlencoded' --data-urlencode 'query=Edith Frank was married to Otto Frank and born in Frankfurt.' --data-urlencode 'components=triple_extraction'`

🚀 Quick Start: Complete Pipeline

Step	Command / Notes
Clone repo & create env	`git clone https://github.com/factcheckerr/ShallKnow.git` `cd ShallKnow` `python3 -m venv venv` `source venv/bin/activate` `pip install -r requirements.txt`
Install Ollama (for LLMs)	Ollama download & docs
Run DeepSeek LLM	`ollama pull deepseek-r1:14b` `ollama run deepseek-r1:14b`
Run Entity-Centric Paragraph Simplification	`python scripts/wikipedia_extractor_final.py deepseek-r1:14b`
(Advanced) Triple Extraction API	See below for Docker-based API extraction and example `curl` calls

💻 Hardware Requirements

Recommended: 64 CPU cores, 64 GB RAM, 1×NVIDIA RTX 6000 Ada GPU
Notes: A GPU is required for LLM and relation extraction (Relik) components.

🔧 Installation

git clone https://github.com/factcheckerr/ShallKnow.git
cd ShallKnow
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

🧪 Running Experiments

1. Start LLM (DeepSeek) with Ollama

ollama pull deepseek-r1:14b
ollama run deepseek-r1:14b

(See Ollama download if needed.)
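Before starting the pipeline, it can help to confirm that the model is actually available to Ollama. The sketch below is an illustration, not part of this repository: it assumes Ollama is listening on its default local port (11434) and uses Ollama's `/api/tags` endpoint, which lists locally pulled models. The helper names `fetch_tags`, `model_names`, and `has_model` are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Ollama install on its default port. Adjust if yours differs.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 10.0) -> dict:
    """Fetch the JSON listing of locally available models from Ollama."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags response payload."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str = "deepseek-r1:14b") -> bool:
    """Return True if the wanted model tag appears in the payload."""
    return wanted in model_names(tags_payload)
```

Calling `has_model(fetch_tags())` should return `True` once `ollama pull deepseek-r1:14b` has finished; if the request fails, the Ollama server is probably not running.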

2. Entity-Centric Paragraph Simplification and KG Augmentation

Run the Entity-Centric Paragraph Simplification script:

python scripts/wikipedia_extractor_final.py deepseek-r1:14b

3. 🔄 Triple Extraction API (Advanced)

To extract new triples from unstructured text via API:

cd TripleExtraction
sudo docker compose up -d  # -d runs the containers in the background so the terminal stays free

Then, run DeepSeek in the Ollama container:

sudo docker ps  # Find the Ollama container ID
sudo docker exec -it <container_id> bash
# Inside the container:
ollama pull deepseek-r1:14b
ollama run deepseek-r1:14b

Calling the Triple Extraction API

For a folder of preprocessed articles:

curl --location --request POST 'http://localhost:5000/dextract' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'query=folder:/your/path/to/preprocessed_folder' \
  --data-urlencode 'components=triple_extraction'

For a single sentence or paragraph:

curl --location --request POST 'http://localhost:5000/extract' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'query=Edith Frank was married to Otto Frank and born in Frankfurt.' \
  --data-urlencode 'components=triple_extraction'

Note: Use dextract for batch/folder processing or extract for a single text input.

3 Alternate approach

Alternatively, use the script:

python scripts/extract_triples.py

Adjust the API endpoint in the script if needed (default: http://localhost:5000/extract).
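For readers who want to call the API from their own code, the following is a minimal sketch of such a client using only the standard library. It is not the contents of `scripts/extract_triples.py`; the function names `build_request` and `call_api` are hypothetical, and the timeout value is an arbitrary assumption. It sends the same two form fields as the `curl` examples (`query` and `components`) and returns the raw response text, since the response format is not specified here.

```python
import urllib.parse
import urllib.request

# Default endpoint as stated in this README; change it if your deployment differs.
DEFAULT_ENDPOINT = "http://localhost:5000/extract"


def build_request(query: str, endpoint: str = DEFAULT_ENDPOINT) -> urllib.request.Request:
    """Build a form-encoded POST request with the fields used in the curl examples."""
    body = urllib.parse.urlencode(
        {"query": query, "components": "triple_extraction"}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def call_api(query: str, endpoint: str = DEFAULT_ENDPOINT, timeout: float = 600.0) -> str:
    """Send the request and return the response body as unparsed text."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For a single text, call `call_api("Edith Frank was married to Otto Frank and born in Frankfurt.")`. For a folder, pass the `folder:` query form and the batch endpoint, for example `call_api("folder:/your/path/to/preprocessed_folder", "http://localhost:5000/dextract")`.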

📊 Output Stats

A snapshot of the top properties in our extracted triples:

Property	Count
wdt:P17	21,143
wdt:P276	8,028
------------------	----------
P-Located_in	1,407
P-Nationality	844
------------------	----------

Full CSVs and charts are available in /Prediction_files_and_AUROC_graphs.
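A per-property count like the one above can be recomputed from a triple file. The sketch below is illustrative only and assumes the triples are stored as N-Triples (one `subject predicate object .` statement per line); the actual serialization of the files in this repository may differ, and `predicate_counts` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def predicate_counts(lines: Iterable[str]) -> Counter:
    """Count how often each predicate occurs in N-Triples lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Subject and predicate never contain whitespace in N-Triples,
        # so the second whitespace-separated token is the predicate.
        parts = line.split(None, 2)
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts
```

To list the ten most frequent properties, open the file and call `predicate_counts(f).most_common(10)`.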

📚 Additional Resources

Datasets

All datasets are provided on Zenodo.

Supporting Tools

KnowledgeStream: Path-based plausibility scoring for RDF triples
FAVEL: Benchmark fact-checking evaluation platform
GERBIL: Standardized benchmarking of KG tasks

🏆 Reproducing Results for Competing Approaches

To reproduce results for all fact-validation approaches over large knowledge graphs, we provide an updated version of the Kstream-Graph-Transformer project. This tool transforms your KG for compatibility with large-scale path-based evaluation frameworks.

Before you begin:

Download the latest Wikidata RDF dump.
Append the extracted triples (G* or G** or both) provided in the /Assertions folder to the Wikidata dump.
Specify the location of the combined KG file in the main configuration of the Kstream-Graph-Transformer project.
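The append step can be sketched as a plain file concatenation. This assumes that both the Wikidata dump and the extracted triples are uncompressed N-Triples files, which is line-based and therefore safe to concatenate; if your files are compressed or use another serialization (for example Turtle with prefix declarations), decompress or convert them first. The function name `combine_ntriples` and all file paths are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable


def _ends_with_newline(path: Path) -> bool:
    """Return True if the file is empty or its last byte is a newline."""
    if path.stat().st_size == 0:
        return True
    with path.open("rb") as f:
        f.seek(-1, 2)  # jump to the last byte of the file
        return f.read(1) == b"\n"


def combine_ntriples(dump: Path, extra_files: Iterable[Path], out: Path) -> None:
    """Write the dump followed by each extra triple file into a single output file."""
    with out.open("wb") as dst:
        for src_path in [dump, *extra_files]:
            with src_path.open("rb") as src:
                shutil.copyfileobj(src, dst)  # streams in chunks, suitable for large dumps
            # Keep statements from adjacent files on separate lines.
            if not _ends_with_newline(src_path):
                dst.write(b"\n")
```

The resulting file is the combined KG whose location goes into the Kstream-Graph-Transformer configuration.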

After transforming the KG, you can use FAVEL together with KnowledgeStream to run and evaluate the following baseline approaches:

Katz (katz)
PathEnt (pathent)
SimRank (simrank)
Adamic Adar (adamic_adar)
Jaccard (jaccard)
Degree Product (degree_product)
PredPath (predpath)
PRA (pra)

For step-by-step instructions, refer to the documentation in each individual repository. The combination of these tools allows for reproducible evaluation and benchmarking in line with the results reported in our paper.

Note: For COPAAL, refer to the COPAAL documentation for instructions on setting up the KG as its endpoint and running the approach.

📜 Citation

If you use ShallKnow in your research, please cite:

# TODO

🙏 Acknowledgements

To be added later.

🤝 Contributing and Support

We welcome pull requests and issue reports! For questions and further contributions, please open an issue.

License

This project is licensed under the Creative Commons Attribution 4.0 International License.
