CS329s Misinformation Detection App

Summary of Approach:

Train a transformer model to generate embeddings from query strings using data from the LIAR dataset, a collection of PolitiFact statements labeled by their truthfulness.
Use this model to generate an embedding space of the LIAR dataset examples.
Then, given a query, we generate an embedding, match it to the K closest embeddings in our embedding space, and use a voting-based approach to generate inferences (e.g. if 2/3 votes are 'true' we return the label 'true').

Model Information:

For our embedding model, we fine-tuned a DistilBERT (https://arxiv.org/pdf/1910.01108.pdf) model on the LIAR dataset.
For our prediction model, we used a KNN model with K=3 (breaking ties randomly) on the embedding space generated by our embedding model over the training examples in the LIAR dataset.
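The lookup-and-vote step can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: it uses plain NumPy with Euclidean distance instead of a FAISS index, the function names are hypothetical, and the 2-D vectors are toy stand-ins for DistilBERT embeddings.

```python
import random
from collections import Counter

import numpy as np


def k_nearest(index_embeddings, query_embedding, k=3):
    """Return the indices of the k rows closest to the query (Euclidean distance)."""
    distances = np.linalg.norm(index_embeddings - query_embedding, axis=1)
    return np.argsort(distances)[:k]


def majority_vote(labels, rng=random):
    """Return the most common label, breaking ties uniformly at random."""
    counts = Counter(labels)
    top = max(counts.values())
    winners = sorted(label for label, count in counts.items() if count == top)
    return rng.choice(winners)


def predict(index_embeddings, index_labels, query_embedding, k=3):
    """Label a query by voting over the labels of its k nearest neighbours."""
    neighbours = k_nearest(index_embeddings, query_embedding, k)
    return majority_vote([index_labels[i] for i in neighbours])


# Toy 2-D "embeddings" standing in for real model outputs (assumption).
train_embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0]])
train_labels = ["true", "true", "false", "false"]
prediction = predict(train_embeddings, train_labels, np.array([0.05, 0.05]))
```

With K=3 and three possible labels, a three-way tie (one vote each) is possible, which is why the tie-breaking rule matters.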

Dataset Information:

We used the LIAR dataset (https://huggingface.co/datasets/liar) for training and evaluating our model
Because the LIAR dataset is quite challenging (the highest accuracy reported in the original paper was 27%), we used a modified version of the dataset in which we combined the labels as follows: (pants-fire, false) -> false, (barely-true, half-true) -> unsure, (mostly-true, true) -> true.
All reported metrics are based on this label-joining scheme.
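The label-joining scheme amounts to a lookup table. The sketch below assumes the labels are available as the strings written above; the Hugging Face dataset may store them as integer class ids, in which case they would first need to be converted to names. `LABEL_MAP` and `merge_label` are hypothetical names, not taken from the repository.

```python
# Map each of the six original LIAR labels to one of three merged labels.
LABEL_MAP = {
    "pants-fire": "false",
    "false": "false",
    "barely-true": "unsure",
    "half-true": "unsure",
    "mostly-true": "true",
    "true": "true",
}


def merge_label(original_label):
    """Return the merged 3-class label for an original 6-class label name."""
    return LABEL_MAP[original_label]


merged = [merge_label(name) for name in ["pants-fire", "half-true", "true"]]
```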

Instructions to Use This Repo

Setup

Use python3 -m venv env to create a virtual environment.
Activate the virtual environment with source env/bin/activate, then run pip install -r requirements.txt to install the Python dependencies.
- Only tested with Python 3.6.
- You may need to run pip install -U pip to upgrade pip, as well as pip install wheel.
- You might need to run sudo apt install libomp5 libomp-dev on Linux or brew install libomp on macOS for faiss.
Install the project as a package (for imports to work) with pip install -e .
Set up a wandb account and get an API key (https://wandb.ai/), then log in using the command wandb login.

Training Models

Run python backend/train.py, which should create two models, embedding_mode.pt and prediction_model.pt, in the saved_models directory.
- IMPORTANT: you will need these models for the later steps.
- Use config/config_default to configure your run; these settings will be read by wandb.
- To run a sweep, configure config/sweep.py, go to the backend directory, run wandb sweep config/sweep.py, and then copy and paste the command it prints to launch the wandb agent (prepend nohup to the agent launch command if the process dies after a while).
Run python evaluate.py to evaluate the saved models.
- Run python evaluate.py --model_path PATH_TO_MODEL to evaluate using the embedding model, with the FAISS index + majority voter as the prediction model.
- Run python evaluate.py --artifact ARTIFACT_NAME to run evaluation using a wandb artifact (e.g. daily-tree-15-3-labels:v4).

Running the Backend Application

Use ray start --head to start the ray cluster.
- Note: if you receive a message about redis failing to start, see ray-project/ray#6146.
Use python backend/serve.py to start the application (the application will run until ray stop is called).
- Use the option --detached to have the serve instance run in the background.
Use ray stop to kill the ray cluster.

Running the Frontend Application

Use the command streamlit run frontend/app.py.

Testing

Use ray start --head to start the ray cluster.
Run some tests.
- e.g. python backend/tests/test.py --test_name throughput -n 100 launches the application, makes 100 POST requests to it, and measures the throughput.
Use ray stop to kill the ray cluster.
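
For intuition, a throughput measurement of this kind boils down to timing a batch of requests. The sketch below is an assumption about the general idea, not the contents of backend/tests/test.py: it sends requests sequentially, and `fake_request` is a hypothetical stand-in for an HTTP POST to the served application.

```python
import time


def measure_throughput(send_request, n_requests):
    """Call send_request n_requests times and return completed requests per second."""
    start = time.perf_counter()
    for _ in range(n_requests):
        send_request()
    elapsed = time.perf_counter() - start
    return n_requests / elapsed if elapsed > 0 else float("inf")


def fake_request():
    # Stand-in for a POST request to the application (assumption).
    time.sleep(0.001)


requests_per_second = measure_throughput(fake_request, 20)
```

A real test would replace `fake_request` with a function that posts a query to the running backend, and might issue requests concurrently rather than one at a time.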
