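Person 1: Data preparation
- Collect a small dataset (FAERS CSV + PubMed abstracts).
- Write a script to clean the text: split it into sentences with spaCy, then lowercase and remove symbols (splitting first keeps the sentence-final punctuation the splitter relies on).
- Store the cleaned text in data/cleaned/.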
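A minimal sketch of the cleaning step, assuming one raw text file per source document. To stay dependency-free it uses a naive regex splitter as a stand-in for spaCy (it will, for example, wrongly split after "e.g."); the comment shows the spaCy v3 `sentencizer` alternative. The function names and the one-sentence-per-line output format are illustrative assumptions, not a fixed design.

```python
import re
from pathlib import Path

# Naive sentence boundary: whitespace preceded by ".", "!" or "?".
# Stand-in for spaCy; with spaCy v3 installed, an alternative is:
#   nlp = spacy.blank("en"); nlp.add_pipe("sentencizer")
#   sentences = [s.text for s in nlp(text).sents]
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Everything except lowercase letters, digits, whitespace and hyphens
# (kept for drug names such as "5-fluorouracil") counts as a symbol.
_SYMBOLS = re.compile(r"[^a-z0-9\s-]")


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in _SENT_BOUNDARY.split(text) if s.strip()]


def clean_sentence(sentence: str) -> str:
    # Lowercase, replace symbols with spaces, then collapse repeated whitespace.
    no_symbols = _SYMBOLS.sub(" ", sentence.lower())
    return re.sub(r"\s+", " ", no_symbols).strip()


def clean_text(text: str) -> list[str]:
    # Split first so the splitter still sees sentence-final punctuation.
    cleaned = (clean_sentence(s) for s in split_sentences(text))
    return [s for s in cleaned if s]


def clean_file(src: Path, out_dir: Path = Path("data/cleaned")) -> Path:
    """Clean one raw file and write one sentence per line to out_dir/<stem>.txt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{src.stem}.txt"
    sentences = clean_text(src.read_text(encoding="utf-8"))
    out_path.write_text("\n".join(sentences) + "\n", encoding="utf-8")
    return out_path
```

One caveat: NER models trained on original-case text may lose accuracy on lowercased input, so it may be worth keeping an original-case copy for the extraction step.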
Person 2: Extraction (NER)
- Implement NER with scispaCy or Med7 to detect drugs and medical events.
- Output the results as JSON: {"drug": "X", "event": "Y"}.
- Save the outputs to results/extractions.json.
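Until a scispaCy or Med7 model is wired in, the output contract can be exercised with a sketch like the one below. It swaps in a plain dictionary lookup for NER (the toy lexicons are hypothetical) and pairs every drug with every event in the same sentence, a simple co-occurrence heuristic rather than real relation extraction. It assumes the input sentences are already cleaned (lowercase, no symbols), so whitespace tokenization is enough.

```python
import json
from itertools import product
from pathlib import Path

# Hypothetical toy lexicons standing in for a trained NER model (scispaCy/Med7).
DRUG_LEXICON = {"aspirin", "ibuprofen", "warfarin"}
EVENT_LEXICON = {"rash", "nausea", "bleeding"}


def find_terms(sentence: str, lexicon: set[str]) -> list[str]:
    # Whole-token matches only, in order of appearance.
    return [tok for tok in sentence.split() if tok in lexicon]


def extract_pairs(sentence: str) -> list[dict[str, str]]:
    drugs = find_terms(sentence, DRUG_LEXICON)
    events = find_terms(sentence, EVENT_LEXICON)
    # Co-occurrence heuristic: pair every drug with every event in the sentence.
    return [{"drug": d, "event": e} for d, e in product(drugs, events)]


def save_extractions(sentences, path: Path = Path("results/extractions.json")) -> Path:
    """Write all {"drug", "event"} records as one JSON list."""
    records = [pair for s in sentences for pair in extract_pairs(s)]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path
```

With a real model, `find_terms` would be replaced by reading entity spans from the model's output; for instance, scispaCy's `en_ner_bc5cdr_md` tags CHEMICAL and DISEASE spans, which map naturally onto drug and event.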
Person 3: Classification
- Build a simple sentence classifier (ADE vs. non-ADE).
- Start with TF-IDF + logistic regression (scikit-learn).
- Expose it as a Python function: classify_sentence(text) -> (label, confidence).
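A minimal sketch of the baseline, chaining scikit-learn's `TfidfVectorizer` and `LogisticRegression` in a pipeline. The six training sentences are hypothetical placeholders that only make the example run; real training data would be labelled sentences from data/cleaned/. Confidence is taken to be the predicted probability of the returned label, which is one reasonable reading of the interface.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set; replace with labelled sentences from data/cleaned/.
TRAIN_TEXTS = [
    "patient developed a severe rash after starting amoxicillin",
    "nausea and vomiting were reported following chemotherapy",
    "warfarin was associated with major bleeding",
    "the study enrolled adults aged 18 to 65",
    "aspirin is commonly used for cardiovascular prevention",
    "participants were randomized into two groups",
]
TRAIN_LABELS = ["ADE", "ADE", "ADE", "Non-ADE", "Non-ADE", "Non-ADE"]

# TF-IDF features (unigrams + bigrams) feeding a logistic regression model.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(TRAIN_TEXTS, TRAIN_LABELS)


def classify_sentence(text: str) -> tuple[str, float]:
    """Return (label, confidence), where confidence is the label's predicted probability."""
    probs = model.predict_proba([text])[0]  # columns follow model.classes_
    best = int(probs.argmax())
    return str(model.classes_[best]), float(probs[best])
```

Fitting at import time keeps the sketch short; in practice the fitted pipeline could be saved with `joblib.dump` and loaded once by the backend.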
Person 4: Backend
- Set up a FastAPI backend.
- Create endpoints:
  - /extract → runs Person 2’s pipeline.
  - /classify → runs Person 3’s classifier.
- Make sure both endpoints return JSON responses.
Person 5: Frontend
- Build a simple React/Next.js frontend.
- Text box → sends the input to the backend's /extract and /classify endpoints.
- Highlight drugs in blue and ADEs in red.
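Highlighting needs character offsets, which the {"drug": "X", "event": "Y"} format from the extraction step does not carry, so this sketch assumes a hypothetical span format of (start, end, type) with types `DRUG` and `ADE`. It is written in Python to match the other examples; the same loop ports directly to a React component, where emitting elements is safer than injecting an HTML string.

```python
from html import escape

# Hypothetical entity types mapped to the colors in the plan.
COLORS = {"DRUG": "blue", "ADE": "red"}


def highlight(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Wrap each (start, end, type) character span in a colored <mark> tag.

    Spans are processed in start order; any span overlapping an earlier one is skipped.
    All text is HTML-escaped.
    """
    parts, cursor = [], 0
    for start, end, kind in sorted(spans):
        if start < cursor:  # overlaps a span already emitted
            continue
        parts.append(escape(text[cursor:start]))
        color = COLORS.get(kind, "inherit")
        parts.append(f'<mark style="color: {color}">{escape(text[start:end])}</mark>')
        cursor = end
    parts.append(escape(text[cursor:]))
    return "".join(parts)
```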
Person 6: Stats and visualization
- Take Person 2’s extraction results.
- Compute basic stats:
  - Frequency of ADEs.
  - Top drugs with ADEs.
- Show the results as a bar chart or table in the frontend.
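A minimal sketch of the two stats using `collections.Counter`, assuming results/extractions.json holds a list of {"drug", "event"} records. Here "top drugs" is read as the drugs appearing in the most drug-event pairs; counting distinct events per drug would be a reasonable alternative. `as_chart_rows` is a hypothetical helper that flattens counts into rows a bar chart or table component can consume.

```python
import json
from collections import Counter
from pathlib import Path


def load_extractions(path: Path = Path("results/extractions.json")) -> list[dict[str, str]]:
    return json.loads(path.read_text(encoding="utf-8"))


def ade_frequencies(records) -> Counter:
    """How often each adverse event appears across all extracted pairs."""
    return Counter(r["event"] for r in records)


def top_drugs(records, n: int = 10) -> list[tuple[str, int]]:
    """Drugs ranked by the number of drug-event pairs they appear in."""
    return Counter(r["drug"] for r in records).most_common(n)


def as_chart_rows(counts) -> list[dict]:
    """Turn (name, count) pairs into JSON-friendly rows for a chart or table."""
    return [{"name": name, "count": count} for name, count in counts]
```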
Workflow
- Person 1 finishes data prep.
- Persons 2 and 3 use the cleaned data to build their models.
- Person 4 integrates extraction and classification into the backend.
- Person 5 connects the frontend to the backend.
- Person 6 adds stats/visualization once extraction results are available.
- Overall flow: Data → NER → Classification → API → UI → Stats.