BESSTIE: Sentiment & Sarcasm Classification across English Varieties
University of Surrey, Semester 2, 2026
Coursework checklist (tick in-repo): docs/coursework_checklist.md
Report outline: docs/report_outline.md · Trim guide: docs/REPORT_TRIM.md · Main notebook plan: docs/MAIN_NOTEBOOK_PLAN.md
This repository lives on a public mirror so anyone (including the marker) can clone it without authentication:
https://github.com/momofahmi/NLP-sequence-classification (default branch: main).
The original team repository is private and may not be reachable from Colab without a GITHUB_TOKEN. The two repositories share content, but the public one is the canonical source for running the notebooks end-to-end.
Entry points (open on a Colab T4 GPU runtime):
- Main pipeline (run-everything): notebooks/main.ipynb
- 1.1 EDA: notebooks/1.1_EDA_Distributions_Yusrah_Omar.ipynb
- 2.1 Baseline TF-IDF: notebooks/2.1_Baseline_TFIDF_LogReg_Yusrah_Omar.ipynb
- 2.2 RoBERTa cross-variety: notebooks/2.2_RoBERTa_CrossVariety_Joel_Fiyin.ipynb
- 2.3 LoRA: notebooks/2.3_LoRA_Adapters_Mohamed.ipynb
- Deployment (Gradio): notebooks/run_deployment_colab.ipynb
The first code cell in each training notebook clones this repo and runs pip install -r requirements.txt. The BESSTIE dataset loads from Hugging Face: surrey-nlp/BESSTIE-CW-26.
To override the clone target (e.g. to test a fork), set REPO_URL and/or REPO_BRANCH env vars before running cell 1.
Notebooks 2.2 (RoBERTa) and 2.3 (LoRA) support DEMO_MODE: default is fast demo; set DEMO_MODE=0 before running for full experiments (see checklist).
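A minimal sketch of setting these variables from a Colab cell before running cell 1. The URL below is a hypothetical fork, not a real repository; REPO_URL, REPO_BRANCH, and DEMO_MODE are the names documented above:

```python
import os

# Point the bootstrap cell at a fork instead of the canonical repo
# (cell 1 reads REPO_URL / REPO_BRANCH if they are set).
os.environ["REPO_URL"] = "https://github.com/your-user/NLP-sequence-classification.git"
os.environ["REPO_BRANCH"] = "main"

# Run the full experiments in notebooks 2.2 and 2.3 instead of the fast demo.
os.environ["DEMO_MODE"] = "0"
```

In a terminal, the equivalent is export REPO_URL=… before launching Jupyter.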
```
git clone https://github.com/momofahmi/NLP-sequence-classification.git
cd NLP-sequence-classification
pip install -r requirements.txt
```

```python
from datasets import load_dataset

ds = load_dataset("surrey-nlp/BESSTIE-CW-26")
```

Two equivalent entry points:
| Format | Command | Time |
|---|---|---|
| Notebook | open notebooks/main.ipynb (Colab T4 recommended) | ~40s default, ~80 min with all flags on |
| Pure-Python script | python scripts/main.py | same |
Both load Joel's canonical results from reports/results/roberta_weighted/ and reports/results/roberta_sentiment/all_pool.json to reproduce every figure/table in the report. Set FROM_SCRATCH=True, RUN_ERROR_ANALYSIS=True, and RUN_BENCHMARK=True near the top of either file to retrain and re-run inference on a Colab T4.
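For illustration, the flag block near the top of either file might look like the sketch below. Only the three flag names come from this README; the defaults, comments, and surrounding logic are assumptions, not the project's actual code:

```python
# Reproduction flags (edit before running; defaults reproduce the report
# from the cached results under reports/results/).
FROM_SCRATCH = False        # True: retrain the models on a Colab T4
RUN_ERROR_ANALYSIS = False  # True: re-run inference for the error-analysis tables
RUN_BENCHMARK = False       # True: re-run the timing benchmark

if FROM_SCRATCH:
    print("Retraining from scratch (expect the long runtime)")
else:
    print("Loading cached results from reports/results/")
```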
Both files inline every helper function used by the team's domain notebooks (tokenize, WeightedTrainer, train_roberta, evaluate_on_testset, train_lora_adapter, etc.) so the marker can read the entire pipeline without jumping between files. Regenerate them with python scripts/build_main_notebook.py && python scripts/build_main_script.py.
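A WeightedTrainer typically counters class imbalance by scaling each example's cross-entropy loss by the weight of its true class. The stdlib sketch below illustrates that idea only; it is not the project's implementation, and the weights shown are made up:

```python
import math

def weighted_cross_entropy(logits, label, class_weights):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    # Numerically stable softmax over the logits
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Negative log-likelihood of the true class, times its class weight
    return -class_weights[label] * math.log(probs[label])

# With the minority class (label 1) weighted 3x, the same-confidence
# mistake on a minority example contributes 3x the loss.
loss_majority = weighted_cross_entropy([2.0, 0.5], label=0, class_weights=[1.0, 3.0])
loss_minority = weighted_cross_entropy([0.5, 2.0], label=1, class_weights=[1.0, 3.0])
```

In a Hugging Face pipeline the same effect is usually achieved by subclassing Trainer and applying the class weights inside an overridden loss computation.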