NLP-sequence-classification

BESSTIE: Sentiment & Sarcasm Classification across English Varieties
University of Surrey, Semester 2, 2026

Coursework checklist (tick in-repo): docs/coursework_checklist.md
Report outline: docs/report_outline.md · Trim guide: docs/REPORT_TRIM.md · Main notebook plan: docs/MAIN_NOTEBOOK_PLAN.md

Public mirror

This branch lives on a public mirror so anyone (including the marker) can clone it without authentication: https://github.com/momofahmi/NLP-sequence-classification (default branch: main).

The original team repository is private and may not be reachable from Colab without a GITHUB_TOKEN. The two repositories share content, but the public mirror is the canonical source for running the notebooks end-to-end.

Colab

Open the canonical entry point, notebooks/main.ipynb, on a T4 GPU runtime.

The first code cell in each training notebook clones this repo and runs pip install -r requirements.txt. The BESSTIE dataset loads from Hugging Face: surrey-nlp/BESSTIE-CW-26.

To override the clone target (e.g. to test a fork), set REPO_URL and/or REPO_BRANCH env vars before running cell 1. Notebooks 2.2 (RoBERTa) and 2.3 (LoRA) support DEMO_MODE: default is fast demo; set DEMO_MODE=0 before running for full experiments (see checklist).
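Setting those variables from a notebook cell might look like this minimal sketch (the variable names come from this README; using os.environ in a cell that runs before cell 1 is an assumption about how the clone cell picks them up, and the fork URL is a placeholder):

```python
import os

# Point the clone cell at a fork (placeholder URL, not a real fork).
os.environ["REPO_URL"] = "https://github.com/your-user/NLP-sequence-classification.git"
os.environ["REPO_BRANCH"] = "main"

# Turn off the fast demo in notebooks 2.2 / 2.3 to run the full experiments.
os.environ["DEMO_MODE"] = "0"
```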

Local setup

git clone https://github.com/momofahmi/NLP-sequence-classification.git
cd NLP-sequence-classification
pip install -r requirements.txt

Then load the dataset from Python:

from datasets import load_dataset
ds = load_dataset("surrey-nlp/BESSTIE-CW-26")

Run the whole pipeline in one command

Two equivalent entry points:

Format | Command | Time
Notebook | open notebooks/main.ipynb (Colab T4 recommended) | ~40 s default; ~80 min with all flags on
Pure-Python script | python scripts/main.py | same as the notebook

Both load Joel's canonical results from reports/results/roberta_weighted/ and reports/results/roberta_sentiment/all_pool.json to reproduce every figure/table in the report. Set FROM_SCRATCH=True, RUN_ERROR_ANALYSIS=True, and RUN_BENCHMARK=True near the top of either file to retrain and re-run inference on a Colab T4.
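The flag behaviour described above can be sketched as follows (the flag names are from this README; the gating logic is an illustrative assumption, not the actual contents of scripts/main.py):

```python
# Hypothetical sketch of the stage flags near the top of scripts/main.py.
FROM_SCRATCH = False        # True: retrain the models on a Colab T4
RUN_ERROR_ANALYSIS = False  # True: re-run inference for the error analysis
RUN_BENCHMARK = False       # True: re-time training and inference

def results_source():
    """Report where the figures/tables come from under the current flags."""
    if FROM_SCRATCH or RUN_ERROR_ANALYSIS or RUN_BENCHMARK:
        return "recomputed on GPU"
    return "loaded from reports/results/"
```

With all flags off, everything is read from the cached JSON results; flipping any flag to True triggers the corresponding GPU stage.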

Both files inline every helper function used by the team's domain notebooks (tokenize, WeightedTrainer, train_roberta, evaluate_on_testset, train_lora_adapter, etc.) so the marker can read the entire pipeline without jumping between files. Regenerate them with python scripts/build_main_notebook.py && python scripts/build_main_script.py.
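As one concrete illustration: a WeightedTrainer for imbalanced labels commonly scales the cross-entropy loss by per-class weights. The sketch below shows only a generic inverse-frequency weight computation; it is a common pattern, not the team's actual WeightedTrainer, whose definition lives in the notebooks and scripts.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / class frequency, normalised so
    the weights average to 1 (a common choice for weighted cross-entropy)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in sorted(counts.items())}

# A toy sarcasm split with a 3:1 class imbalance:
weights = inverse_frequency_weights([0, 0, 0, 1])  # ≈ {0: 0.667, 1: 2.0}
```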

About

Fiyin's branch of the NLP coursework, based on Fahmi's repository.
