
Fix/pdf unicode and dropdown#579

Open
speedo-345 wants to merge 13 commits into AOSSIE-Org:main from speedo-345:fix/pdf-unicode-and-dropdown

Conversation


@speedo-345 speedo-345 commented Mar 15, 2026

Description:

This PR fixes two issues in the Output component:

PDF Unicode ("Tofu") Fix: Embeds the NotoSans font into the PDF worker so foreign languages, math symbols, and complex characters render correctly instead of showing missing character boxes.

Dropdown Bug Fix: Fixes a UI bug where clicking unrelated buttons kept the PDF dropdown open. It scopes the outside-click listener strictly to the pdfTriggerButton ID instead of generically targeting all buttons.
Addressed Issues:

Fixes #(TODO:issue number)

Screenshots/Recordings:

TODO: If applicable, add screenshots or recordings that demonstrate the interface before and after the changes.
Additional Notes:

Replaced the generic !event.target.closest('button') check with a strict ID check and Node verification to prevent DOM errors. Embedding the custom font ensures pdf-lib does not crash when users generate quizzes with non-English text.
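The scoped check described above can be sketched as a small helper. This is a sketch, not the PR's exact code: only the pdfTriggerButton ID and the instanceof-Node style safeguard come from this PR; the helper name and the listener wiring are illustrative.

```javascript
// Sketch of the scoped outside-click check. Only the "pdfTriggerButton" ID
// and the non-Node safeguard come from the PR; the helper is illustrative.
function clickedInsideTrigger(target, triggerId = "pdfTriggerButton") {
  // Non-DOM targets (no .closest method) count as outside clicks,
  // mirroring the PR's `event.target instanceof Node` safeguard.
  if (!target || typeof target.closest !== "function") return false;
  return target.closest(`#${triggerId}`) !== null;
}

// Inside the component, a document-level listener would then close the
// dropdown on any click that is not on (or inside) the trigger button:
//
//   const onDocMouseDown = (event) => {
//     if (!clickedInsideTrigger(event.target)) setDropdownOpen(false);
//   };
```

Scoping to one ID means clicks on unrelated buttons now close the dropdown, while clicks on the trigger itself still toggle it.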

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Check one of the checkboxes below:

  • [ ] This PR does not contain AI-generated code at all.
  • [x] This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools: TODO

Checklist

  • [x] My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • [x] My code follows the project's code style and conventions.
  • [x] If applicable, I have made corresponding changes or additions to the documentation.
  • [x] If applicable, I have made corresponding changes or additions to tests.
  • [x] My changes generate no new warnings or errors.
  • [x] I have joined the Discord server and I will share a link to this PR with the project maintainers there.
  • [x] I have read the Contribution Guidelines.
  • [x] Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • [x] I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

  • Documentation

    • Recommend Python 3.11 or 3.12 for install (3.13 incompatible with some ML libraries).
  • New Features

    • PDF export supports embedding custom fonts for improved rendering and international characters.
    • File uploader now accepts PDF, DOCX, and TXT.
  • Improvements

    • UI: font-loading feedback, disabled PDF action until ready, and improved dropdown behavior.
    • QA model initialization made more robust with guarded startup.
    • Relaxed torch version requirement (minimum updated).
  • Chores

    • Added a large dataset to .gitignore and added a runtime font library dependency.


coderabbitai Bot commented Mar 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough


Adds a large backend tarball to .gitignore, documents Python 3.11/3.12 requirement, relaxes torch pin, adds a fontkit dependency, replaces generic QA pipeline with explicit transformers model/tokenizer/device initialization and guarded startup, and implements asynchronous font loading + embedding for PDF generation and tightened file input/route names.

Changes

  • Config & Dependencies (/.gitignore, README.md, requirements.txt, eduaid_web/package.json): Ignores backend/s2v_reddit_2015_md.tar.gz; README mandates Python 3.11 or 3.12 (notes 3.13 incompatible); relaxes torch==2.5.1 to torch>=2.8.0; adds @pdf-lib/fontkit.
  • Backend — QA model init (backend/server.py): Replaces pipeline("question-answering") with explicit AutoTokenizer + AutoModelForQuestionAnswering, imports torch, sets the device, constructs a guarded pipeline at startup and records qa_model_init_error; safer dict access with output.get("Boolean_Questions", []).
  • Frontend — PDF font flow (eduaid_web/src/pages/Output.jsx, eduaid_web/src/workers/pdfWorker.js): Output.jsx loads font bytes asynchronously, gates the PDF button, and passes fontBytes to the worker; pdfWorker registers fontkit, embeds the custom font (falling back to standard fonts on error), and uses the embedded font for all text rendering.
  • Frontend — Routing & Uploads (eduaid_web/src/App.js, eduaid_web/src/pages/TextInput.jsx): Renames imports/usages from Question_Type/Text_Input to QuestionType/TextInput; updates the file input label and restricts the accept attribute to .pdf,.docx,.txt.
  • Small edits (/.gitignore): Reformats the .DS_Store line and adds a single-line ignore entry for the backend tarball.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Output as Output.jsx
    participant FontLoader as Font Loader
    participant Worker as pdfWorker.js
    participant PDFLib as pdf-lib + fontkit

    User->>Output: Click "Generate PDF"
    Output->>FontLoader: async fetch/import font bytes
    FontLoader-->>Output: fontBytes or error

    Output->>Worker: postMessage({qaPairs, mode, logoBytes, fontBytes})
    Worker->>PDFLib: register fontkit & embed fontBytes (or use fallback)
    PDFLib-->>Worker: generated PDF bytes
    Worker-->>Output: PDF Blob
    Output->>User: createObjectURL -> trigger download -> revoke
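The worker's embed-or-fall-back step in the diagram can be sketched with the pdf-lib calls stubbed out. This is a sketch under assumptions: embedCustom and embedStandard stand in for pdfDoc.embedFont(fontBytes) (after pdfDoc.registerFontkit(fontkit)) and a StandardFonts fallback respectively, and the helper name pickFont is hypothetical.

```javascript
// Sketch of the worker's font-selection step. The embedCustom/embedStandard
// callbacks are stand-ins for pdf-lib calls (pdfDoc.embedFont(fontBytes)
// after pdfDoc.registerFontkit(fontkit), and a StandardFonts fallback);
// injecting them keeps the fallback logic itself dependency-free.
async function pickFont(embedCustom, embedStandard, fontBytes) {
  if (fontBytes) {
    try {
      return await embedCustom(fontBytes);
    } catch (err) {
      // A corrupt or unsupported font should downgrade the export
      // to a standard font rather than fail it outright.
      console.error("Custom font embed failed, falling back:", err);
    }
  }
  return embedStandard();
}
```

With the real pdf-lib objects, the returned font would then be used for every drawText call, so non-Latin text renders with embedded NotoSans glyphs instead of tofu boxes.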

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Poem

🐰 I fetched the font across the net,
I handed bytes into the worker's set,
Pages hummed with every line,
QA models warmed and shone—so fine,
Hopping joy: a PDF vignette. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Fix/pdf unicode and dropdown' directly relates to the main changes: PDF Unicode rendering with font embedding and dropdown interaction fixes in the Output component.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
eduaid_web/src/pages/Output.jsx (1)

19-34: Consider providing user feedback if font loading fails.

If the font fails to load (e.g., network error, missing asset), fontBytes remains null and the button stays disabled with "Loading Font..." indefinitely. Users won't understand why PDF generation is unavailable.

Consider adding a fallback or error state:

💡 Proposed enhancement
+  const [fontLoadError, setFontLoadError] = useState(false);

   useEffect(() => {
     const loadFont = async () => {
       try {
         const fontModule = await import("../assets/fonts/NotoSans-Regular.ttf");
         const resp = await fetch(fontModule.default);
         if (!resp.ok) {
           throw new Error(`HTTP ${resp.status}: ${resp.statusText}`);
         }
         const arrayBuffer = await resp.arrayBuffer();
         setFontBytes(arrayBuffer);
       } catch (e) {
         console.error("Failed to load Noto Sans font:", e);
+        setFontLoadError(true);
       }
     };
     loadFont();
   },[]);

Then update the button text (around line 431):

{fontLoadError ? 'Font Error' : !fontBytes ? 'Loading Font...' : 'Generate PDF'}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eduaid_web/src/pages/Output.jsx` around lines 19 - 34, The font loader
useEffect (loadFont) currently swallows failures and leaves fontBytes null; add
a new state like fontLoadError with setter (e.g., [fontLoadError,
setFontLoadError]) and in loadFont's catch setFontLoadError(true) (and
optionally setFontBytes to a fallback or empty ArrayBuffer), then update the UI
where fontBytes is used (the Generate PDF button logic) to show an explicit
error label such as {fontLoadError ? 'Font Error' : !fontBytes ? 'Loading
Font...' : 'Generate PDF'} and optionally expose a retry action that re-invokes
loadFont; reference loadFont, setFontBytes, fontBytes, and the new
fontLoadError/setFontLoadError in your changes.
backend/server.py (1)

50-64: Consider adding GPU support for inference consistency.

The ExtractiveQA class doesn't move the model to GPU when available, unlike MCQGenerator (in backend/Generator/main.py:30) which uses torch.device("cuda" if torch.cuda.is_available() else "cpu"). This may cause performance inconsistency on GPU-enabled systems.

💡 Proposed enhancement for GPU support
 class ExtractiveQA:
     def __init__(self, model_name="deepset/roberta-base-squad2"):
         self.tokenizer = AutoTokenizer.from_pretrained(model_name)
         self.model = AutoModelForQuestionAnswering.from_pretrained(model_name)
+        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        self.model.to(self.device)
     
     def __call__(self, question, context):
-        inputs = self.tokenizer(question, context, return_tensors="pt")
+        inputs = self.tokenizer(question, context, return_tensors="pt").to(self.device)
         with torch.no_grad():
             outputs = self.model(**inputs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 50 - 64, The ExtractiveQA class currently
doesn't use GPU; modify its __init__ to set self.device = torch.device("cuda" if
torch.cuda.is_available() else "cpu") and move the model to that device
(self.model.to(self.device)); in __call__, after tokenization, move input
tensors to the device (e.g., inputs = {k: v.to(self.device) for k,v in
inputs.items()}) before forwarding and computing logits, keeping the
torch.no_grad() context; ensure token decoding stays the same so qa_model
continues to work.
requirements.txt (1)

1-1: Consider using a version constraint range instead of open-ended minimum.

PyTorch 2.8.0 is available (latest stable is 2.10.0). However, using torch>=2.8.0 without an upper bound introduces unnecessary risk—major version updates can introduce breaking changes in tensor operations, model behavior, or CUDA compatibility that could silently affect inference results.

Consider either:

  1. Pinning to a specific version: torch==2.8.0
  2. Using a bounded range: torch>=2.8.0,<3.0.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@requirements.txt` at line 1, The requirement line currently uses an
open-ended minimum "torch>=2.8.0"; change this to a safer constraint by either
pinning to a specific version or adding an upper bound (for example replace
"torch>=2.8.0" with a pinned "torch==2.8.0" or a bounded range like
"torch>=2.8.0,<3.0.0") so future major releases cannot introduce breaking
changes; update the single requirement line in requirements.txt accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Line 39: Update README.md to state that both Google Docs and Google Forms now
use a single service account file named credentials.json (the code's
SERVICE_ACCOUNT_FILE) and to show the required scopes (ensure FORMS_SCOPES /
auth configuration includes both documents.readonly and forms.body). Replace
references to separate files (service_account_key.json + credentials.json) with
one credentials.json, update any setup/installation steps and examples
accordingly, and note that the single service account must be created with both
Docs and Forms API access.

In `@README.md`:
- Around line 43-44: The Markdown list item is missing a space after the leading
hyphen on the line starting with "Note: Ensure you are using Python 3.11 or
3.12. Python 3.13 is currently incompatible with several required machine
learning libraries."; fix it by inserting a space after the hyphen so the line
becomes a properly formatted list item (leading "- " instead of "-").

---

Nitpick comments:
In `@backend/server.py`:
- Around line 50-64: The ExtractiveQA class currently doesn't use GPU; modify
its __init__ to set self.device = torch.device("cuda" if
torch.cuda.is_available() else "cpu") and move the model to that device
(self.model.to(self.device)); in __call__, after tokenization, move input
tensors to the device (e.g., inputs = {k: v.to(self.device) for k,v in
inputs.items()}) before forwarding and computing logits, keeping the
torch.no_grad() context; ensure token decoding stays the same so qa_model
continues to work.

In `@eduaid_web/src/pages/Output.jsx`:
- Around line 19-34: The font loader useEffect (loadFont) currently swallows
failures and leaves fontBytes null; add a new state like fontLoadError with
setter (e.g., [fontLoadError, setFontLoadError]) and in loadFont's catch
setFontLoadError(true) (and optionally setFontBytes to a fallback or empty
ArrayBuffer), then update the UI where fontBytes is used (the Generate PDF
button logic) to show an explicit error label such as {fontLoadError ? 'Font
Error' : !fontBytes ? 'Loading Font...' : 'Generate PDF'} and optionally expose
a retry action that re-invokes loadFont; reference loadFont, setFontBytes,
fontBytes, and the new fontLoadError/setFontLoadError in your changes.

In `@requirements.txt`:
- Line 1: The requirement line currently uses an open-ended minimum
"torch>=2.8.0"; change this to a safer constraint by either pinning to a
specific version or adding an upper bound (for example replace "torch>=2.8.0"
with a pinned "torch==2.8.0" or a bounded range like "torch>=2.8.0,<3.0.0") so
future major releases cannot introduce breaking changes; update the single
requirement line in requirements.txt accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 34282d80-ba8c-4453-b8a1-a72bb311ba41

📥 Commits

Reviewing files that changed from the base of the PR and between fc3bf1a and 1cb8b30.

⛔ Files ignored due to path filters (2)
  • eduaid_web/package-lock.json is excluded by !**/package-lock.json
  • eduaid_web/src/assets/fonts/NotoSans-Regular.ttf is excluded by !**/*.ttf
📒 Files selected for processing (9)
  • .gitignore
  • README.md
  • backend/server.py
  • eduaid_web/package.json
  • eduaid_web/src/pages/Output.jsx
  • eduaid_web/src/pages/QuestionType.jsx
  • eduaid_web/src/pages/TextInput.jsx
  • eduaid_web/src/workers/pdfWorker.js
  • requirements.txt

Comment thread backend/server.py Outdated
print("Starting Flask App...")

- SERVICE_ACCOUNT_FILE = './service_account_key.json'
+ SERVICE_ACCOUNT_FILE = './credentials.json'

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check README references to credential files
rg -n 'credentials\.json|service_account_key\.json' README.md

Repository: AOSSIE-Org/EduAid

Length of output: 172


🏁 Script executed:

sed -n '70,92p' README.md

Repository: AOSSIE-Org/EduAid

Length of output: 831


🏁 Script executed:

cat -n .gitignore | grep -E 'credentials\.json|service_account_key\.json'

Repository: AOSSIE-Org/EduAid

Length of output: 132


🏁 Script executed:

rg -n 'service_account_key|credentials\.json' backend/ --type py -A 2 -B 2

Repository: AOSSIE-Org/EduAid

Length of output: 1092


🏁 Script executed:

rg -n 'forms\.' backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 276


🏁 Script executed:

rg -n 'forms' backend/ --type py

Repository: AOSSIE-Org/EduAid

Length of output: 795


🏁 Script executed:

sed -n '37,45p' backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 330


🏁 Script executed:

sed -n '220,250p' backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 1502


Update README.md to reflect consolidated credentials file usage.

The code now uses credentials.json for both Google Docs and Google Forms APIs (see SERVICE_ACCOUNT_FILE at line 39, with FORMS_SCOPES applied in the same file). However, README.md (lines 78-85) still documents two separate credential files—service_account_key.json for Google Docs and credentials.json for Google Forms. Update the documentation to clarify that both APIs now use a single credentials.json service account file with the necessary scopes for both Google Docs (documents.readonly) and Google Forms (forms.body).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` at line 39, Update README.md to state that both Google
Docs and Google Forms now use a single service account file named
credentials.json (the code's SERVICE_ACCOUNT_FILE) and to show the required
scopes (ensure FORMS_SCOPES / auth configuration includes both
documents.readonly and forms.body). Replace references to separate files
(service_account_key.json + credentials.json) with one credentials.json, update
any setup/installation steps and examples accordingly, and note that the single
service account must be created with both Docs and Forms API access.

Comment thread README.md Outdated
…licks

- Replaced the generic `!event.target.closest('button')` check with a specific `id` check (`pdfTriggerButton`).

- Added an `event.target instanceof Node` safeguard to prevent errors when the event target is not a standard DOM element.
@speedo-345 speedo-345 force-pushed the fix/pdf-unicode-and-dropdown branch from 1cb8b30 to 8eec6cf Compare March 16, 2026 05:37

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
eduaid_web/src/pages/TextInput.jsx (1)

188-192: Optional cleanup: remove legacy commented-out JSX.

These “Original” commented lines are now obsolete and can be removed to keep the component lean.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eduaid_web/src/pages/TextInput.jsx` around lines 188 - 192, Remove the
obsolete commented-out JSX in the TextInput component: delete the legacy "<p
className="text-white text-lg">Choose a file (PDF, MP3 supported)</p>" and the
commented "<input type="file" ref={fileInputRef} onChange={handleFileUpload}
..." lines so only the corrected label and the input with the accept attribute
remain; this cleanup targets the commented legacy markup around the file picker
(references: fileInputRef, handleFileUpload) in TextInput.jsx.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Line 20: Remove the stray merge marker ">>>>>>> Stashed changes" (it's a
syntax error) and properly initialize the imported qa_model so it exists when
endpoints use it; locate the import of qa_model and either assign it a valid
model instance at module startup (or set qa_model = None and lazily load/raise a
clear error before use), and ensure get_mcq_answer and get_answer check qa_model
is initialized (or rely on the startup initialization) to avoid NameError at
runtime.
- Around line 18-19: The endpoints get_mcq_answer and get_answer reference
qa_model but it is never defined at module scope; define and initialize qa_model
(and its tokenizer qa_tokenizer) during module import or lazily before first
use: load AutoModelForQuestionAnswering and AutoTokenizer with the chosen model
name, move the model to the appropriate device (CPU/GPU via torch.device), and
handle load errors so the global variables qa_model and qa_tokenizer are
available to both get_mcq_answer and get_answer; alternatively add a small init
function (e.g., init_qa_model) that sets these globals and call it at module
import or at the start of those endpoint functions if not yet initialized.

---

Nitpick comments:
In `@eduaid_web/src/pages/TextInput.jsx`:
- Around line 188-192: Remove the obsolete commented-out JSX in the TextInput
component: delete the legacy "<p className="text-white text-lg">Choose a file
(PDF, MP3 supported)</p>" and the commented "<input type="file"
ref={fileInputRef} onChange={handleFileUpload} ..." lines so only the corrected
label and the input with the accept attribute remain; this cleanup targets the
commented legacy markup around the file picker (references: fileInputRef,
handleFileUpload) in TextInput.jsx.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e05de6a8-788d-4a44-8472-2aa89513a8a1

📥 Commits

Reviewing files that changed from the base of the PR and between 1cb8b30 and 8eec6cf.

⛔ Files ignored due to path filters (2)
  • eduaid_web/package-lock.json is excluded by !**/package-lock.json
  • eduaid_web/src/assets/fonts/NotoSans-Regular.ttf is excluded by !**/*.ttf
📒 Files selected for processing (7)
  • README.md
  • backend/server.py
  • eduaid_web/package.json
  • eduaid_web/src/App.js
  • eduaid_web/src/pages/Output.jsx
  • eduaid_web/src/pages/TextInput.jsx
  • eduaid_web/src/workers/pdfWorker.js
🚧 Files skipped from review as they are similar to previous changes (2)
  • README.md
  • eduaid_web/src/workers/pdfWorker.js

Comment thread backend/server.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/server.py (1)

19-20: ⚠️ Potential issue | 🔴 Critical

Initialize the QA pipeline before first use (current code crashes at import time).

At line 47, qa_model references undefined symbols (pipeline, qa_model_instance, qa_tokenizer, device). This raises immediately when importing the module, so the server won't start.

💡 Proposed fix
-from transformers import AutoModelForQuestionAnswering, AutoTokenizer
+from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
 import torch
@@
-qa_model = pipeline("question-answering", model=qa_model_instance, tokenizer=qa_tokenizer, device=device)
+QA_MODEL_NAME = "deepset/roberta-base-squad2"
+qa_tokenizer = AutoTokenizer.from_pretrained(QA_MODEL_NAME)
+qa_model_instance = AutoModelForQuestionAnswering.from_pretrained(QA_MODEL_NAME)
+device = 0 if torch.cuda.is_available() else -1
+qa_model = pipeline(
+    "question-answering",
+    model=qa_model_instance,
+    tokenizer=qa_tokenizer,
+    device=device,
+)
#!/bin/bash
# Verify QA pipeline symbols are defined in backend/server.py
rg -nP 'from transformers import .*pipeline' backend/server.py
rg -nP '^\s*(qa_tokenizer|qa_model_instance|device|qa_model)\s*=' backend/server.py -n
rg -nP '\bqa_model\s*\(' backend/server.py -C2

Also applies to: 47-47

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 19 - 20, The module currently references
undefined symbols (pipeline, qa_model_instance, qa_tokenizer, device, qa_model)
at import time; fix by importing pipeline from transformers and initializing the
QA artifacts lazily or at startup: define device (e.g., via torch.device and GPU
index), create qa_tokenizer using AutoTokenizer.from_pretrained, create
qa_model_instance using AutoModelForQuestionAnswering.from_pretrained, then
build qa_model with pipeline('question-answering', model=qa_model_instance,
tokenizer=qa_tokenizer, device=...) — alternatively move this initialization
into a function like get_qa_model() that performs the above on first call and
caches the result to avoid raising during import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@eduaid_web/src/pages/Output.jsx`:
- Around line 418-443: The dropdown with id "pdfDropdown" is placed outside the
relative wrapper that contains the trigger button (id "pdfTriggerButton"), so
its absolute positioning isn't anchored correctly; move the <div
id="pdfDropdown"> so it lives inside the same relative container that wraps the
button (the div with class "relative w-full sm:w-auto") or extend that wrapper
to include both elements, ensuring the dropdown remains a child of the relative
element so its absolute positioning works as intended.

---

Duplicate comments:
In `@backend/server.py`:
- Around line 19-20: The module currently references undefined symbols
(pipeline, qa_model_instance, qa_tokenizer, device, qa_model) at import time;
fix by importing pipeline from transformers and initializing the QA artifacts
lazily or at startup: define device (e.g., via torch.device and GPU index),
create qa_tokenizer using AutoTokenizer.from_pretrained, create
qa_model_instance using AutoModelForQuestionAnswering.from_pretrained, then
build qa_model with pipeline('question-answering', model=qa_model_instance,
tokenizer=qa_tokenizer, device=...) — alternatively move this initialization
into a function like get_qa_model() that performs the above on first call and
caches the result to avoid raising during import.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ad76f502-08a8-4cab-8815-e2b5c653220a

📥 Commits

Reviewing files that changed from the base of the PR and between 8eec6cf and b89d33d.

📒 Files selected for processing (4)
  • README.md
  • backend/server.py
  • eduaid_web/src/pages/Output.jsx
  • requirements.txt
🚧 Files skipped from review as they are similar to previous changes (2)
  • README.md
  • requirements.txt

Comment thread eduaid_web/src/pages/Output.jsx Outdated
Reintroduced LLMQuestionGenerator and configured QA model with tokenizer and device settings.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/server.py (1)

19-20: ⚠️ Potential issue | 🔴 Critical

Import pipeline from transformers before using it at line 57.

The code constructs a QA pipeline at module scope (line 57) using pipeline(...), but pipeline is not imported from transformers on line 19. This causes a NameError during module import and prevents the server from starting.

Suggested fix
-from transformers import AutoModelForQuestionAnswering, AutoTokenizer
+from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 19 - 20, The module currently uses
pipeline(...) at import/module scope (the QA pipeline creation near where the
model/tokenizer are used, e.g., in the code that constructs the QA pipeline) but
never imports pipeline from transformers, causing a NameError; add an import for
pipeline from transformers alongside the existing imports (e.g., include
"pipeline" with AutoModelForQuestionAnswering and AutoTokenizer) so the pipeline
symbol is defined before the QA pipeline creation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 52-62: Wrap the model/tokenizer loading so a failure doesn't abort
import: catch exceptions around AutoTokenizer.from_pretrained(QA_MODEL_NAME) and
AutoModelForQuestionAnswering.from_pretrained(QA_MODEL_NAME) (and the
pipeline(...) call), log the error and set qa_tokenizer, qa_model_instance and
qa_model to None on failure; ensure qa_model is only created when all loads
succeed. Then update the two QA endpoints that use qa_model (the handlers for
/get_mcq_answer and /get_shortq_answer) to check if qa_model is None and return
HTTP 503 Service Unavailable with a clear error message instead of proceeding.
Apply the same defensive pattern (try/except on load and 503 when missing) to
any other routes that rely on optional external models or resources.

---

Duplicate comments:
In `@backend/server.py`:
- Around line 19-20: The module currently uses pipeline(...) at import/module
scope (the QA pipeline creation near where the model/tokenizer are used, e.g.,
in the code that constructs the QA pipeline) but never imports pipeline from
transformers, causing a NameError; add an import for pipeline from transformers
alongside the existing imports (e.g., include "pipeline" with
AutoModelForQuestionAnswering and AutoTokenizer) so the pipeline symbol is
defined before the QA pipeline creation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 330a3ba6-515d-45b7-874d-0b1aa1aa1905

📥 Commits

Reviewing files that changed from the base of the PR and between b89d33d and ce546cd.

📒 Files selected for processing (1)
  • backend/server.py

Comment thread backend/server.py Outdated
Comment on lines +52 to +62
QA_MODEL_NAME = "deepset/roberta-base-squad2"
qa_tokenizer = AutoTokenizer.from_pretrained(QA_MODEL_NAME)
qa_model_instance = AutoModelForQuestionAnswering.from_pretrained(QA_MODEL_NAME)
device = 0 if torch.cuda.is_available() else -1

qa_model = pipeline(
    "question-answering",
    model=qa_model_instance,
    tokenizer=qa_tokenizer,
    device=device
)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Show the import-time QA initialization and its call sites.
sed -n '52,62p' backend/server.py
rg -nP '\bqa_model\s*\(' backend/server.py -C2

Repository: AOSSIE-Org/EduAid

Length of output: 865


🏁 Script executed:

# Check if qa_model initialization is already wrapped in try-except
cat -n backend/server.py | sed -n '50,70p'

Repository: AOSSIE-Org/EduAid

Length of output: 767


🏁 Script executed:

# Get full context around the qa_model call sites
cat -n backend/server.py | sed -n '205,215p'
cat -n backend/server.py | sed -n '230,245p'

Repository: AOSSIE-Org/EduAid

Length of output: 1342


🏁 Script executed:

# Check if there are any route definitions that use qa_model and how they handle errors
rg -nP '@(app|.*\.route)' backend/server.py -B1 -A5 | head -50

Repository: AOSSIE-Org/EduAid

Length of output: 1623


🏁 Script executed:

# Find the function containing line 210
cat -n backend/server.py | sed -n '180,220p'

Repository: AOSSIE-Org/EduAid

Length of output: 2137


🏁 Script executed:

# Check all routes and which ones use qa_model
rg -n 'def get_' backend/server.py | head -20

Repository: AOSSIE-Org/EduAid

Length of output: 450


🏁 Script executed:

# Search for any error handling around qa_model usage
rg -n 'try:|except:|qa_model' backend/server.py | grep -A2 -B2 'qa_model'

Repository: AOSSIE-Org/EduAid

Length of output: 413


🏁 Script executed:

# Check get_mcq_answer function completely
cat -n backend/server.py | sed -n '197,230p'

Repository: AOSSIE-Org/EduAid

Length of output: 1631


🏁 Script executed:

# Check get_answer and get_boolean_answer functions
cat -n backend/server.py | sed -n '231,262p'

Repository: AOSSIE-Org/EduAid

Length of output: 1226


🏁 Script executed:

# Check if there's any try-except or error handling for qa_model anywhere
rg -n 'get_mcq_answer|get_answer|get_boolean_answer' backend/server.py -A15 | head -60

Repository: AOSSIE-Org/EduAid

Length of output: 1850


Guard QA model initialization so one failed model load doesn't take down the whole API.

Lines 53–57 run from_pretrained() during module import without error handling. If the model is missing from cache or the load fails, backend/server.py never finishes importing and the entire server fails to start—even unrelated routes go down. Additionally, the two endpoints that use qa_model (/get_mcq_answer at line 210 and /get_shortq_answer at line 237) have no error handling and will crash if the model is unavailable.

Catch initialization failures and have the QA endpoints return 503 when qa_model is unavailable. For consistency, apply a similar pattern to other error-prone routes.

Suggested direction
 QA_MODEL_NAME = "deepset/roberta-base-squad2"
-qa_tokenizer = AutoTokenizer.from_pretrained(QA_MODEL_NAME)
-qa_model_instance = AutoModelForQuestionAnswering.from_pretrained(QA_MODEL_NAME)
 device = 0 if torch.cuda.is_available() else -1
-
-qa_model = pipeline(
-    "question-answering",
-    model=qa_model_instance,
-    tokenizer=qa_tokenizer,
-    device=device
-)
+qa_model = None
+qa_model_init_error = None
+
+try:
+    qa_tokenizer = AutoTokenizer.from_pretrained(QA_MODEL_NAME)
+    qa_model_instance = AutoModelForQuestionAnswering.from_pretrained(QA_MODEL_NAME)
+    qa_model = pipeline(
+        "question-answering",
+        model=qa_model_instance,
+        tokenizer=qa_tokenizer,
+        device=device,
+    )
+except Exception as exc:
+    qa_model_init_error = exc
+    app.logger.exception("Failed to initialize QA model: %s", exc)

Then have the two QA answer routes check if qa_model is None and return 503 Service Unavailable with an appropriate error message.
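As a sketch of that route-level check (assumptions: Flask's `jsonify` is already imported in server.py, the handler body is a placeholder, and the `RuntimeError` below simulates a failed model load — the real error would come from `from_pretrained`):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Simulate a failed initialization: qa_model stays None and the cause is kept.
qa_model = None
qa_model_init_error = RuntimeError("model files missing from cache")

def _qa_unavailable_response():
    # Surface the stored init error so operators can diagnose the failure.
    details = str(qa_model_init_error) if qa_model_init_error else "QA model unavailable"
    return jsonify({"error": "QA model unavailable", "details": details}), 503

@app.route("/get_mcq_answer", methods=["POST"])
def get_mcq_answer():
    if qa_model is None:
        return _qa_unavailable_response()
    # ...normal QA handling would go here...
    return jsonify({"answer": None})
```

The same two-line guard would be pasted at the top of `/get_shortq_answer`; clients then get a deterministic 503 with a reason instead of a generic 500.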

🧰 Tools
🪛 Ruff (0.15.6)

[error] 57-57: Undefined name pipeline

(F821)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 52 - 62, Wrap the model/tokenizer loading so
a failure doesn't abort import: catch exceptions around
AutoTokenizer.from_pretrained(QA_MODEL_NAME) and
AutoModelForQuestionAnswering.from_pretrained(QA_MODEL_NAME) (and the
pipeline(...) call), log the error and set qa_tokenizer, qa_model_instance and
qa_model to None on failure; ensure qa_model is only created when all loads
succeed. Then update the two QA endpoints that use qa_model (the handlers for
/get_mcq_answer and /get_shortq_answer) to check if qa_model is None and return
HTTP 503 Service Unavailable with a clear error message instead of proceeding.
Apply the same defensive pattern (try/except on load and 503 when missing) to
any other routes that rely on optional external models or resources.

Refactor QA model initialization with error handling.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
eduaid_web/src/pages/Output.jsx (1)

162-182: ⚠️ Potential issue | 🔴 Critical

Guard output access and tighten the MCQ source condition.

At lines 163 and 175, .forEach can run on an undefined qaPairsFromStorage["output"], causing a runtime crash. Also, line 162 currently treats output as MCQ whenever output_mcq exists, which can produce mixed/duplicate datasets in non-MCQ flows.

💡 Proposed fix
-      if (qaPairsFromStorage["output_mcq"] || questionType === "get_mcq") {
-        qaPairsFromStorage["output"].forEach((qaPair) => {
+      if (
+        questionType === "get_mcq" &&
+        Array.isArray(qaPairsFromStorage["output"])
+      ) {
+        qaPairsFromStorage["output"].forEach((qaPair) => {
           combinedQaPairs.push({
             question: qaPair.question_statement,
             question_type: "MCQ",
             options: qaPair.options,
             answer: qaPair.answer,
             context: qaPair.context,
           });
         });
       }

-      if (questionType === "get_boolq") {
-        qaPairsFromStorage["output"].forEach((qaPair) => {
+      if (
+        questionType === "get_boolq" &&
+        Array.isArray(qaPairsFromStorage["output"])
+      ) {
+        qaPairsFromStorage["output"].forEach((qaPair) => {
           combinedQaPairs.push({
             question: qaPair,
             question_type: "Boolean",
           });
         });
-      } else if (qaPairsFromStorage["output"] && questionType !== "get_mcq") {
+      } else if (
+        Array.isArray(qaPairsFromStorage["output"]) &&
+        questionType !== "get_mcq"
+      ) {
         qaPairsFromStorage["output"].forEach((qaPair) => {
           combinedQaPairs.push({
             question:
               qaPair.question || qaPair.question_statement || qaPair.Question,
             options: qaPair.options,
             answer: qaPair.answer || qaPair.Answer,
             context: qaPair.context,
             question_type: "Short",
           });
         });
       }

Based on learnings: In eduaid_web/src/pages/Output.jsx, output_mcq and output must be coordinated as separate MCQ sources only in the intended questionType === "get_mcq" flow to avoid duplicates/order inconsistencies.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eduaid_web/src/pages/Output.jsx` around lines 162 - 182, Guard access to
qaPairsFromStorage["output"] and only treat MCQ data as coming from "output"
when questionType === "get_mcq": check that qaPairsFromStorage &&
Array.isArray(qaPairsFromStorage["output"]) before calling .forEach, and change
the condition that currently uses qaPairsFromStorage["output_mcq"] to require
questionType === "get_mcq" (i.e. only push MCQ items into combinedQaPairs from
qaPairsFromStorage["output"] when questionType === "get_mcq"); similarly wrap
the boolean and fallback branches so they only iterate when
qaPairsFromStorage["output"] is a defined array, ensuring combinedQaPairs,
qaPairsFromStorage, questionType, output_mcq, and output logic stays consistent
and avoids duplicate/undefined iterations.
♻️ Duplicate comments (1)
backend/server.py (1)

51-69: ⚠️ Potential issue | 🟠 Major

QA initialization is guarded, but QA usage is still unguarded.

If model loading fails, qa_model remains None; calls at lines 216 and 243 then crash with a "'NoneType' object is not callable" TypeError (surfacing as a 500) instead of a controlled 503.

🔧 Suggested follow-up guard
@@
 qa_model = None
 qa_model_init_error = None
@@
 except Exception as exc:
     qa_model_init_error = exc
     app.logger.exception("Failed to initialize QA model: %s", exc)
+
+def _qa_unavailable_response():
+    details = str(qa_model_init_error) if qa_model_init_error else "QA model unavailable"
+    return jsonify({"error": "QA model unavailable", "details": details}), 503
@@
 def get_mcq_answer():
+    if qa_model is None:
+        return _qa_unavailable_response()
@@
 def get_answer():
+    if qa_model is None:
+        return _qa_unavailable_response()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 51 - 69, The QA pipeline (qa_model) can be
None if initialization failed (qa_model_init_error set), so before any use of
qa_model in the request handlers replace direct calls with a guard: check
qa_model is not None (or check qa_model_init_error) and if the model is
unavailable return a 503 response with a clear message and log the
qa_model_init_error via app.logger.exception or app.logger.error; update the
handlers that call qa_model (where pipeline(...) is used downstream) to perform
this check and avoid invoking qa_model when None.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 27-28: The /generate_gform route currently uses legacy symbols
(file.Storage, client.flow_from_clientsecrets, tools.run_flow, discovery.build,
Http) that are no longer imported and will raise NameError; update the handler
to use the new imports by replacing
file.Storage()/client.flow_from_clientsecrets()/tools.run_flow() flow with
sa.Credentials.from_service_account_file(...) to create credentials, replace
discovery.build(...) with google_build(...), and pass the resulting credentials
via the credentials parameter (e.g., google_build(service, version,
credentials=creds)) instead of using Http() authorization; modify
functions/variables in the route (look for generate_gform handler and any
variables named creds/credentials, storage, flow, or service) accordingly so the
code uses sa.Credentials.from_service_account_file and google_build with
credentials.

In `@eduaid_web/src/pages/Output.jsx`:
- Line 239: The direct DOM access document.getElementById('pdfDropdown') can be
null if the component unmounted before the async callback runs; guard this call
by first retrieving the element into a variable (e.g., const pdfDropdown =
document.getElementById('pdfDropdown')) and only call
pdfDropdown.classList.add('hidden') when pdfDropdown is non-null (truthy).
Update the code around the existing
document.getElementById('pdfDropdown').classList.add('hidden') invocation in
Output.jsx to perform this null-check to avoid runtime exceptions.

---

Outside diff comments:
In `@eduaid_web/src/pages/Output.jsx`:
- Around line 162-182: Guard access to qaPairsFromStorage["output"] and only
treat MCQ data as coming from "output" when questionType === "get_mcq": check
that qaPairsFromStorage && Array.isArray(qaPairsFromStorage["output"]) before
calling .forEach, and change the condition that currently uses
qaPairsFromStorage["output_mcq"] to require questionType === "get_mcq" (i.e.
only push MCQ items into combinedQaPairs from qaPairsFromStorage["output"] when
questionType === "get_mcq"); similarly wrap the boolean and fallback branches so
they only iterate when qaPairsFromStorage["output"] is a defined array, ensuring
combinedQaPairs, qaPairsFromStorage, questionType, output_mcq, and output logic
stays consistent and avoids duplicate/undefined iterations.

---

Duplicate comments:
In `@backend/server.py`:
- Around line 51-69: The QA pipeline (qa_model) can be None if initialization
failed (qa_model_init_error set), so before any use of qa_model in the request
handlers replace direct calls with a guard: check qa_model is not None (or check
qa_model_init_error) and if the model is unavailable return a 503 response with
a clear message and log the qa_model_init_error via app.logger.exception or
app.logger.error; update the handlers that call qa_model (where pipeline(...) is
used downstream) to perform this check and avoid invoking qa_model when None.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 64d8034b-2822-4acb-bd71-121ccb9c5025

📥 Commits

Reviewing files that changed from the base of the PR and between ce546cd and 24b167f.

📒 Files selected for processing (2)
  • backend/server.py
  • eduaid_web/src/pages/Output.jsx

Comment thread backend/server.py
Comment on lines +27 to +28
from googleapiclient.discovery import build as google_build
from google.oauth2 import service_account as sa

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify import/call-site mismatch for Google Forms auth client symbols.
rg -nP '^\s*from\s+googleapiclient\.discovery\s+import\s+build\s+as\s+google_build|^\s*from\s+google\.oauth2\s+import\s+service_account\s+as\s+sa|^\s*from\s+oauth2client\s+import|^\s*from\s+httplib2\s+import\s+Http' backend/server.py

# Show legacy symbols still used by generate_gform.
rg -nP '\b(file\.Storage|client\.flow_from_clientsecrets|tools\.run_flow|discovery\.build|Http\()' backend/server.py -C2

# Confirm usage of newly imported aliases.
rg -nP '\bgoogle_build\s*\(|\bsa\.' backend/server.py -C2

Repository: AOSSIE-Org/EduAid

Length of output: 699


🏁 Script executed:

# Check if the legacy symbols are imported anywhere in the file
rg -n '^\s*(from\s+(oauth2client|httplib2|googleapiclient)\s+import|import\s+(oauth2client|httplib2|googleapiclient))' backend/server.py

# Get more context around the generate_gform function to understand its scope
rg -n 'def generate_gform|@app\.route.*generate_gform' backend/server.py -A 50 | head -80

Repository: AOSSIE-Org/EduAid

Length of output: 2027


🏁 Script executed:

# Check if file, client, tools, discovery, Http are referenced anywhere else in the file
rg -n '\b(file|client|tools|discovery|Http)\b' backend/server.py

# Verify the file structure and imports at the top
head -50 backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 2566


/generate_gform references undefined OAuth symbols—code will crash at runtime.

Lines 294–305 use legacy oauth2client and httplib2 symbols (file, client, tools, discovery, Http) that are not imported. After switching to the new imports at lines 27–28, these symbols are undefined and will raise NameError when the route is hit.

Update the handler to use the new imports:

  • Replace file.Storage(), client.flow_from_clientsecrets(), and tools.run_flow() with sa.Credentials.from_service_account_file()
  • Replace discovery.build() with google_build()
  • Replace the Http() authorization pattern with the credentials parameter
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 27 - 28, The /generate_gform route currently
uses legacy symbols (file.Storage, client.flow_from_clientsecrets,
tools.run_flow, discovery.build, Http) that are no longer imported and will
raise NameError; update the handler to use the new imports by replacing
file.Storage()/client.flow_from_clientsecrets()/tools.run_flow() flow with
sa.Credentials.from_service_account_file(...) to create credentials, replace
discovery.build(...) with google_build(...), and pass the resulting credentials
via the credentials parameter (e.g., google_build(service, version,
credentials=creds)) instead of using Http() authorization; modify
functions/variables in the route (look for generate_gform handler and any
variables named creds/credentials, storage, flow, or service) accordingly so the
code uses sa.Credentials.from_service_account_file and google_build with
credentials.
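A minimal sketch of that replacement (assumptions: the key file path `service_account.json` is hypothetical, and the Forms API service name/version and scope shown here should be adapted to whatever service generate_gform actually builds; imports are deferred inside the function so the module still loads when the Google client libraries are absent):

```python
SERVICE_ACCOUNT_FILE = "service_account.json"  # hypothetical path to the key file
SCOPES = ["https://www.googleapis.com/auth/forms.body"]

def build_forms_service():
    # Deferred imports: mirror the aliases used at lines 27-28 of server.py.
    from google.oauth2 import service_account as sa
    from googleapiclient.discovery import build as google_build

    # Service-account credentials replace the Storage/flow_from_clientsecrets/
    # run_flow dance entirely; no user interaction is needed.
    creds = sa.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES
    )
    # google_build accepts credentials directly, so no Http() object is required.
    return google_build("forms", "v1", credentials=creds)
```

The handler would then call `build_forms_service()` once and reuse the returned service object for the form-creation requests.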

document.body.removeChild(link);
URL.revokeObjectURL(url);

document.getElementById('pdfDropdown').classList.add('hidden');

⚠️ Potential issue | 🟡 Minor

Null-check #pdfDropdown before classList.add.

In async worker callbacks, the component may already be unmounted; direct access can throw.

💡 Proposed fix
-      document.getElementById('pdfDropdown').classList.add('hidden');
+      const dropdown = document.getElementById("pdfDropdown");
+      if (dropdown) {
+        dropdown.classList.add("hidden");
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eduaid_web/src/pages/Output.jsx` at line 239, The direct DOM access
document.getElementById('pdfDropdown') can be null if the component unmounted
before the async callback runs; guard this call by first retrieving the element
into a variable (e.g., const pdfDropdown =
document.getElementById('pdfDropdown')) and only call
pdfDropdown.classList.add('hidden') when pdfDropdown is non-null (truthy).
Update the code around the existing
document.getElementById('pdfDropdown').classList.add('hidden') invocation in
Output.jsx to perform this null-check to avoid runtime exceptions.
