
Fix: robust timeout handling for ML endpoints with multiprocessing and parallel execution#596

Open
mahek2016 wants to merge 20 commits into AOSSIE-Org:main from mahek2016:fix/proper-timeout-handling

Conversation


@mahek2016 mahek2016 commented Mar 17, 2026

Addressed Issues:

Fixes #592


Screenshots/Recordings:

N/A (Backend improvement – no UI changes)


Additional Notes:

This PR introduces robust timeout handling for ML endpoints to prevent indefinite execution and resource exhaustion.

Key improvements include:

  • Replaced threading with multiprocessing to ensure actual termination of long-running tasks
  • Added execute_with_timeout() wrapper for safe execution
  • Converted /get_problems endpoint to parallel execution using ThreadPoolExecutor
  • Implemented proper HTTP status handling (400, 500, 504)
  • Added input validation for text and question parameters
  • Fixed response structure to avoid returning raw model outputs
  • Added handling for partial failures and edge cases

This ensures better performance, stability, and user experience.
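The parallel conversion of /get_problems described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the gen_mcq/gen_boolq/gen_shortq callables are hypothetical stand-ins for the real generators, and the timeout and error shapes are simplified.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical stand-ins for the real MCQ/BoolQ/ShortQ generators.
def gen_mcq(text):
    return {"questions": [f"mcq from {text[:10]}"]}

def gen_boolq(text):
    return {"questions": [f"boolq from {text[:10]}"]}

def gen_shortq(text):
    return {"questions": [f"shortq from {text[:10]}"]}

def get_problems(text, timeout=60):
    """Run the three generators concurrently and tolerate partial failures."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "mcq": pool.submit(gen_mcq, text),
            "boolq": pool.submit(gen_boolq, text),
            "shortq": pool.submit(gen_shortq, text),
        }
        for name, fut in futures.items():
            try:
                results[name] = fut.result(timeout=timeout)
            except FutureTimeout:
                errors[name] = "timeout"
            except Exception as e:
                errors[name] = str(e)
    return results, errors
```

Because each future is collected independently, one generator timing out still lets the other two return results, which is what "handling for partial failures" implies.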


AI Usage Disclosure:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools:

  • ChatGPT (OpenAI)

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

  • New Features

    • YouTube subtitle retrieval and transcript fetching
    • New endpoints for MCQ, short-answer, and boolean question generation, including hard variants
    • Content endpoint accepts document URLs or IDs
  • Improvements

    • Uniform JSON error responses and clearer error messaging
    • Enhanced input validation and global payload size limits
    • Per-endpoint timeouts, parallelized multi-question generation, and request rate limiting


coderabbitai bot commented Mar 17, 2026

Warning

Rate limit exceeded

@mahek2016 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 20 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c65e4e13-a179-463e-ac22-18ec43ada43a

📥 Commits

Reviewing files that changed from the base of the PR and between 6704c41 and 1dc9c9b.

📒 Files selected for processing (1)
  • backend/server.py
📝 Walkthrough

Walkthrough

Adds a subprocess timeout executor, per-endpoint rate limiting and 2 MB payload cap, centralized JSON error handlers, robust input validation, timeout-wrapped generator calls (with parallelization for /get_problems), YouTube subtitle retrieval via yt-dlp, and new/updated endpoints returning uniform success/error shapes.
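The centralized JSON error handlers and payload cap mentioned in the walkthrough might look roughly like this. This is a sketch, not the PR's actual code: the error-code strings are illustrative, and the Flask-Limiter wiring is omitted.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Assumed cap from the walkthrough: reject request bodies above 2 MB.
app.config["MAX_CONTENT_LENGTH"] = 2 * 1024 * 1024

# Centralized handlers so every failure shares one JSON shape.
@app.errorhandler(404)
def not_found(e):
    return jsonify({"error": "Not found", "code": "not_found"}), 404

@app.errorhandler(413)
def too_large(e):
    return jsonify({"error": "Payload too large", "code": "payload_too_large"}), 413

@app.errorhandler(500)
def server_error(e):
    return jsonify({"error": "Internal server error", "code": "internal_error"}), 500

@app.route("/health")
def health():
    return jsonify({"status": "ok"})
```

With handlers registered this way, clients always receive a parseable JSON body instead of Flask's default HTML error pages.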

Changes

Cohort / File(s) Summary
Core backend
backend/server.py
Adds execute_with_timeout(func, timeout, ...); integrates Flask-Limiter and 2MB request size limit; centralized JSON error handlers; input validation and silent JSON parsing; wraps model/generator/yt-dlp calls with timeouts and uniform error payloads; parallelizes MCQ/BoolQ/ShortQ in /get_problems; adds hard-variant endpoints and form generation.
Dependencies
requirements.txt
Adds Flask-Limiter==3.5.0 and yt-dlp; normalizes google-auth entry.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Flask as Flask App
    participant Limiter as Rate Limiter
    participant Executor as Timeout Executor
    participant Worker as Subprocess Worker
    participant External as ML Model / yt-dlp / External API

    Client->>Flask: POST /get_mcq or /get_problems
    Flask->>Limiter: check per-endpoint limit
    alt limited
        Limiter-->>Client: 429 (JSON error)
    else allowed
        Flask->>Executor: execute_with_timeout(func, timeout, args...)
        Executor->>Worker: spawn subprocess to run func
        Worker->>External: call model / yt-dlp / external API
        alt completes
            External-->>Worker: result
            Worker-->>Executor: return result
            Executor-->>Flask: (result, None)
            Flask-->>Client: 200 (structured JSON)
        else timeout / error
            Worker-->>Executor: error/timeout
            Executor-->>Flask: (None, error)
            Flask-->>Client: 504/400/500 (structured JSON)
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I time my hops with tiny ticking feet,
Spawn subprocesses so no call can sleep.
I fetch subtitles and craft forms with care,
Parallel problems bloom in the spring air.
A rabbit nods—responses neat and fair.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 4.55%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately captures the main change: introducing robust timeout handling for ML endpoints with multiprocessing and parallel execution.
  • Linked Issues Check ✅ Passed: The PR addresses all key requirements from issue #592: adds backend timeout wrappers via execute_with_timeout, implements input validation, adds proper HTTP 504 status handling, and improves response structure.
  • Out of Scope Changes Check ✅ Passed: All changes are aligned with issue #592 objectives. The introduction of Flask-Limiter adds rate limiting (a robustness improvement), yt-dlp enables YouTube subtitle retrieval, and endpoint hardening improvements are in scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/server.py (2)

483-487: ⚠️ Potential issue | 🔴 Critical

OAuth interactive flow will fail in server/production context.

tools.run_flow() opens a browser for interactive OAuth consent. This:

  1. Blocks the request thread indefinitely waiting for user interaction
  2. Fails entirely in headless server environments
  3. Creates race conditions if multiple requests trigger auth simultaneously

For server-to-server auth, use a service account or pre-authorize and store refresh tokens.

🔧 Suggested approach using service account
from google.oauth2 import service_account

SCOPES = ["https://www.googleapis.com/auth/forms.body"]
creds = service_account.Credentials.from_service_account_file(
    "service_account_key.json", scopes=SCOPES
)

form_service = discovery.build("forms", "v1", credentials=creds, ...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 483 - 487, The current code uses the
interactive OAuth flow (client.flow_from_clientsecrets and tools.run_flow) with
file.Storage and creds, which will block/fail in server environments; replace
this with non-interactive server auth by loading service account credentials
(use google.oauth2.service_account.Credentials.from_service_account_file with
the SCOPES) and pass that creds into discovery.build when creating the Forms API
client, removing the tools.run_flow/client.flow_from_clientsecrets logic and
file.Storage usage; alternatively, if using a user account, load a
pre-authorized refresh token from secure storage and refresh it programmatically
instead of calling tools.run_flow.

644-672: ⚠️ Potential issue | 🟠 Major

Missing timeout protection and input validation on /get_shortq_hard (and similar _hard endpoints).

These endpoints call qg.generate() directly without execute_with_timeout, contradicting the PR's objective to add timeout protection to all ML endpoints. They also lack the input length validation (10-50000 chars) applied to other endpoints.

Additionally, num_questions=len(input_questions) will be 0 if input_questions is empty, potentially causing unexpected behavior.

🔧 Proposed fix for /get_shortq_hard (apply similarly to _mcq_hard and _boolq_hard)
 def get_shortq_hard():
     data = request.get_json(silent=True)

     if data is None or not isinstance(data, dict):
         return jsonify({...}), 400

     input_text = data.get("input_text", "")
     use_mediawiki = data.get("use_mediawiki", 0)
+    input_questions = data.get("input_question", [])
+
+    if not isinstance(input_text, str):
+        return jsonify({"error": "input_text must be string"}), 400
+
+    input_text = input_text.strip()
+
+    if len(input_text) < 10:
+        return jsonify({"error": "Input too short"}), 400
+
+    if len(input_text) > 50000:
+        return jsonify({"error": "Input too long"}), 400
+
+    if not input_questions:
+        return jsonify({"error": "input_question is required"}), 400

     input_text = process_input_text(input_text, use_mediawiki)
-    input_questions = data.get("input_question", [])

-    output = qg.generate(...)
+    output, error = execute_with_timeout(
+        qg.generate,
+        60,
+        article=input_text,
+        num_questions=len(input_questions),
+        answer_style="sentences"
+    )
+
+    if error == "timeout":
+        return jsonify({"error": "Request timed out"}), 504
+    if error:
+        return jsonify({"error": error}), 500
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 644 - 672, The /get_shortq_hard handler calls
qg.generate directly and lacks input length checks and protection against zero
num_questions; wrap the qg.generate(...) call in the existing
execute_with_timeout(...) helper (use the same timeout semantics used by other
ML endpoints), validate processed input_text length is between 10 and 50000
characters and return a 400 JSON error if outside that range, and ensure
num_questions is never zero (e.g., num_questions = max(1, len(input_questions))
or derive a sensible default) before passing it to qg.generate; preserve
post-processing with make_question_harder and propagate execute_with_timeout
errors the same way other endpoints do.
🧹 Nitpick comments (4)
backend/server.py (3)

580-588: Simplify redundant key check.

The static analyzer flagged this pattern. Using .get() is cleaner.

♻️ Proposed fix
-            if "options" in qapair and qapair["options"]:
-                options = qapair.get("options") or []
+            options = qapair.get("options")
+            if options:
                 valid_options = [opt for opt in options if opt]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 580 - 588, The current code redundantly
checks "if 'options' in qapair and qapair['options']"; replace that pattern by
always reading options = qapair.get("options") or [] and proceed — compute
valid_options = [opt for opt in options if opt], build choices =
[qapair["answer"], *valid_options[:3]] (or ensure answer-only if no options),
shuffle and create choices_list as before; update the block around qapair,
options, valid_options, choices and choices_list to remove the explicit key
presence check and rely on .get().

456-464: Overly broad exception handling obscures root cause.

Catching bare Exception returns a generic 400 error for all failures, but different errors warrant different responses:

  • Authentication/permission errors → 403
  • Document not found → 404
  • Network/server errors → 500
♻️ Proposed improvement
     try:
         content = docs_service.get_document_content(doc_id)
         return jsonify({"content": content})
-
-    except Exception:
+    except HttpError as e:
+        if e.resp.status == 404:
+            return jsonify({"error": "Document not found", "code": "not_found"}), 404
+        elif e.resp.status == 403:
+            return jsonify({"error": "Permission denied", "code": "forbidden"}), 403
+        return jsonify({"error": "Failed to retrieve document", "code": "document_fetch_error"}), e.resp.status
+    except Exception as e:
         return jsonify({
-            "error": "Failed to retrieve document content",
+            "error": f"Failed to retrieve document content: {type(e).__name__}",
             "code": "document_fetch_error"
-        }), 400
+        }), 500

This requires importing HttpError from googleapiclient.errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 456 - 464, Replace the broad except Exception
around the docs_service.get_document_content call with specific handlers: import
HttpError from googleapiclient.errors and catch HttpError first, inspect the
HTTP status (e.g., e.resp.status or e.status_code) to return 403 for 403
responses, 404 for 404 responses, and 500 for other HttpError statuses; also add
explicit except FileNotFoundError to return 404 and except PermissionError to
return 403, finally keep a generic except Exception that returns a 500 and logs
the exception for unexpected errors so the endpoint
(docs_service.get_document_content) returns appropriate status codes and
preserves root-cause details.

365-378: Consider adding timeout protection to QA model calls.

The /get_mcq_answer, /get_shortq_answer, and /get_boolean_answer endpoints call qa_model() and answer.predict_boolean_answer() without timeout protection. While these are typically faster than generation, they can still hang on pathological inputs.

For consistency with the PR's timeout protection goals, consider wrapping these calls.
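One hedged way to wrap those calls, in the spirit of this comment, is a shared thread pool with future.result(timeout=...). This is a sketch under assumptions: qa_model here is a trivial stand-in for the real model, and note that a timed-out thread is merely abandoned, not killed.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical stand-in for the real qa_model callable.
def qa_model(question, context):
    return {"answer": context.split()[0]}

_executor = ThreadPoolExecutor(max_workers=4)

def answer_with_timeout(question, context, timeout=15):
    """Run the QA model but give up after `timeout` seconds.

    The worker thread keeps running after a timeout; this only
    unblocks the request, it does not terminate the computation.
    """
    future = _executor.submit(qa_model, question, context)
    try:
        return future.result(timeout=timeout), None
    except FutureTimeout:
        return None, "timeout"
    except Exception as e:
        return None, str(e)
```

The (result, error) return shape mirrors execute_with_timeout, so the endpoints can map "timeout" to 504 and other errors to 500 the same way.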

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 365 - 378, Wrap calls to qa_model(...) and
answer.predict_boolean_answer(...) with a timeout so they cannot hang
indefinitely; specifically, in the loop that calls qa_model (the block using
variables input_questions, input_options and computing options_with_answer) and
in the handlers for the /get_mcq_answer, /get_shortq_answer, and
/get_boolean_answer endpoints, execute the model calls via a timeout-enabled
executor (e.g., concurrent.futures with future.result(timeout=...) or
asyncio.wait_for) and handle TimeoutError by logging the timeout and returning a
safe fallback (skip the question, return a default/empty answer, or a 504
response) so the server remains responsive. Ensure you catch and log the timeout
and other exceptions around qa_model(...) and answer.predict_boolean_answer(...)
and keep existing similarity-selection logic (vectorizer, cosine_similarity)
intact when the call succeeds.
requirements.txt (1)

35-36: Pin yt-dlp to a specific version for reproducibility.

yt-dlp is unpinned while Flask-Limiter==3.5.0 is pinned. Since yt-dlp is actively developed with frequent breaking changes, an unpinned version could cause unexpected failures in production. Given that the latest version is 2026.3.13, consider updating to a recent pinned version:

📦 Proposed fix
 Flask-Limiter==3.5.0
-yt-dlp
+yt-dlp==2026.3.13
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@requirements.txt` around lines 35 - 36, The requirements file currently pins
Flask-Limiter but leaves yt-dlp unpinned (the lines containing
"Flask-Limiter==3.5.0" and "yt-dlp"); pin yt-dlp to a specific, recent stable
version (for example yt-dlp==2026.3.13 or another vetted release) to ensure
reproducible installs, update the requirements.txt entry for "yt-dlp" to the
chosen pinned version, and commit the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 323-336: The current logic assumes mcq_out, bool_out, and
short_out are dicts and calls .get() even when execute_with_timeout may return
None (e.g., returns (None, "unknown_error")), causing AttributeError; update the
checks around the response assembly to defensively handle None/empty outputs by
using a safe accessor (e.g., replace mcq_out.get(...) with (mcq_out or
{}).get(...) or explicitly check isinstance(mcq_out, dict)) and do the same for
bool_out and short_out so that when mcq_err/bool_err/short_err is falsy but the
corresponding *_out is None or missing keys you fall back to empty lists instead
of calling .get() on None; reference the variables mcq_out, bool_out, short_out
and the function execute_with_timeout when making the change.
- Around line 76-79: The limiter is using get_remote_address so behind a reverse
proxy all requests will show the proxy IP; update the rate-limiting key to use
the real client IP by either applying werkzeug.middleware.proxy_fix.ProxyFix to
app.wsgi_app (e.g., wrap app with ProxyFix(x_for=1, ...) and mark your proxy as
trusted) or replace the key_func passed to Limiter with a small function that
returns request.headers.get('X-Forwarded-For', request.remote_addr). Ensure you
modify the Limiter instantiation (the limiter variable) to use the new key
function instead of get_remote_address and document/configure trusted proxy
behavior accordingly.
- Around line 42-68: The current execute_with_timeout spawns a new Process with
a bound method (e.g., MCQGen.generate_mcq, BoolQGen.generate_boolq,
ShortQGen.generate_shortq) which forces pickling of heavy model state
per-request; instead refactor to a worker pool pattern so models are initialized
once per worker process and tasks call those existing instances: create a Pool
or long-lived worker processes with an initializer that constructs generator
instances (MCQGen/BoolQGen/ShortQGen) in-process, then submit jobs via
apply_async or a task queue with a timeout on the result; alternatively, if you
must keep in-process concurrency, replace the per-call Process in
execute_with_timeout with thread-based timeouts or use a dedicated worker
process that receives lightweight task payloads (not bound methods) and returns
results, ensuring p.terminate is no longer used to kill model-bearing processes
to avoid GPU/resource leaks.

---

Outside diff comments:
In `@backend/server.py`:
- Around line 483-487: The current code uses the interactive OAuth flow
(client.flow_from_clientsecrets and tools.run_flow) with file.Storage and creds,
which will block/fail in server environments; replace this with non-interactive
server auth by loading service account credentials (use
google.oauth2.service_account.Credentials.from_service_account_file with the
SCOPES) and pass that creds into discovery.build when creating the Forms API
client, removing the tools.run_flow/client.flow_from_clientsecrets logic and
file.Storage usage; alternatively, if using a user account, load a
pre-authorized refresh token from secure storage and refresh it programmatically
instead of calling tools.run_flow.
- Around line 644-672: The /get_shortq_hard handler calls qg.generate directly
and lacks input length checks and protection against zero num_questions; wrap
the qg.generate(...) call in the existing execute_with_timeout(...) helper (use
the same timeout semantics used by other ML endpoints), validate processed
input_text length is between 10 and 50000 characters and return a 400 JSON error
if outside that range, and ensure num_questions is never zero (e.g.,
num_questions = max(1, len(input_questions)) or derive a sensible default)
before passing it to qg.generate; preserve post-processing with
make_question_harder and propagate execute_with_timeout errors the same way
other endpoints do.

---

Nitpick comments:
In `@backend/server.py`:
- Around line 580-588: The current code redundantly checks "if 'options' in
qapair and qapair['options']"; replace that pattern by always reading options =
qapair.get("options") or [] and proceed — compute valid_options = [opt for opt
in options if opt], build choices = [qapair["answer"], *valid_options[:3]] (or
ensure answer-only if no options), shuffle and create choices_list as before;
update the block around qapair, options, valid_options, choices and choices_list
to remove the explicit key presence check and rely on .get().
- Around line 456-464: Replace the broad except Exception around the
docs_service.get_document_content call with specific handlers: import HttpError
from googleapiclient.errors and catch HttpError first, inspect the HTTP status
(e.g., e.resp.status or e.status_code) to return 403 for 403 responses, 404 for
404 responses, and 500 for other HttpError statuses; also add explicit except
FileNotFoundError to return 404 and except PermissionError to return 403,
finally keep a generic except Exception that returns a 500 and logs the
exception for unexpected errors so the endpoint
(docs_service.get_document_content) returns appropriate status codes and
preserves root-cause details.
- Around line 365-378: Wrap calls to qa_model(...) and
answer.predict_boolean_answer(...) with a timeout so they cannot hang
indefinitely; specifically, in the loop that calls qa_model (the block using
variables input_questions, input_options and computing options_with_answer) and
in the handlers for the /get_mcq_answer, /get_shortq_answer, and
/get_boolean_answer endpoints, execute the model calls via a timeout-enabled
executor (e.g., concurrent.futures with future.result(timeout=...) or
asyncio.wait_for) and handle TimeoutError by logging the timeout and returning a
safe fallback (skip the question, return a default/empty answer, or a 504
response) so the server remains responsive. Ensure you catch and log the timeout
and other exceptions around qa_model(...) and answer.predict_boolean_answer(...)
and keep existing similarity-selection logic (vectorizer, cosine_similarity)
intact when the call succeeds.

In `@requirements.txt`:
- Around line 35-36: The requirements file currently pins Flask-Limiter but
leaves yt-dlp unpinned (the lines containing "Flask-Limiter==3.5.0" and
"yt-dlp"); pin yt-dlp to a specific, recent stable version (for example
yt-dlp==2026.3.13 or another vetted release) to ensure reproducible installs,
update the requirements.txt entry for "yt-dlp" to the chosen pinned version, and
commit the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 05c0db9e-a33a-4e5c-9b3a-28c2c3c18e48

📥 Commits

Reviewing files that changed from the base of the PR and between fc3bf1a and 87ab02b.

📒 Files selected for processing (2)
  • backend/server.py
  • requirements.txt

Comment on lines +42 to +68
def execute_with_timeout(func, timeout, *args, **kwargs):
    def wrapper(q, *args, **kwargs):
        try:
            result = func(*args, **kwargs)
            q.put(("success", result))
        except Exception as e:
            q.put(("error", str(e)))

    q = Queue()
    p = Process(target=wrapper, args=(q, *args), kwargs=kwargs)
    p.start()
    p.join(timeout)

    # 🚨 REAL timeout handling
    if p.is_alive():
        p.terminate()
        p.join()
        return None, "timeout"

    if not q.empty():
        status, value = q.get()
        if status == "success":
            return value, None
        else:
            return None, value

    return None, "unknown_error"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the generator classes have unpicklable attributes
ast-grep --pattern $'class MCQGenerator:
    $$$
    def __init__(self):
        $$$
    $$$
}'

Repository: AOSSIE-Org/EduAid

Length of output: 43


🏁 Script executed:

find . -type f -name "*.py" | head -20

Repository: AOSSIE-Org/EduAid

Length of output: 577


🏁 Script executed:

find . -type f -name "*.py" -exec grep -l "MCQGenerator\|MCQGen" {} \;

Repository: AOSSIE-Org/EduAid

Length of output: 140


🏁 Script executed:

cat backend/server.py | head -100

Repository: AOSSIE-Org/EduAid

Length of output: 2560


🏁 Script executed:

grep -n "execute_with_timeout" backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 382


🏁 Script executed:

sed -n '1,20p' backend/Generator/main.py

Repository: AOSSIE-Org/EduAid

Length of output: 915


🏁 Script executed:

wc -l backend/Generator/main.py

Repository: AOSSIE-Org/EduAid

Length of output: 90


🏁 Script executed:

sed -n '20,150p' backend/Generator/main.py

Repository: AOSSIE-Org/EduAid

Length of output: 4991


🏁 Script executed:

grep -A 40 "class MCQGenerator" backend/Generator/main.py

Repository: AOSSIE-Org/EduAid

Length of output: 1680


🏁 Script executed:

sed -n '130,310p' backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 5319


🏁 Script executed:

sed -n '280,330p' backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 1488


🏁 Script executed:

python3 << 'EOF'
# Check if PyTorch models are picklable
import sys
try:
    import torch
    from transformers import T5ForConditionalGeneration, T5Tokenizer
    import pickle
    
    # Try to pickle a T5 tokenizer
    tokenizer = T5Tokenizer.from_pretrained('t5-small')  # Use small model for test
    pickled = pickle.dumps(tokenizer)
    unpickled = pickle.loads(pickled)
    print("✓ T5Tokenizer is picklable")
    
    # Check model size
    import os
    model_size = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
    print(f"Approximate model memory usage would be significant for t5-large")
    
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: AOSSIE-Org/EduAid

Length of output: 91


🏁 Script executed:

grep -n "BoolQGen\|ShortQGen" backend/server.py | head -10

Repository: AOSSIE-Org/EduAid

Length of output: 296


Critical: Bound methods with heavy ML model state cannot be efficiently pickled for multiprocessing.

The execute_with_timeout function passes bound methods (MCQGen.generate_mcq, BoolQGen.generate_boolq, ShortQGen.generate_shortq) to multiprocessing.Process, which requires pickling the entire object closure. Since each generator loads T5 models, spaCy NLP objects, and Sense2Vec models in __init__, these large objects must be serialized and deserialized for every request.

This causes:

  1. Massive per-request overhead: Serializing multi-GB model weights on every call negates any timeout benefit
  2. Potential pickle failures: PyTorch model objects may fail to pickle depending on the backend (standard pickle vs. dill)
  3. Resource leaks: p.terminate() sends SIGTERM but doesn't guarantee cleanup of GPU memory or file handles

Consider using a worker pool pattern where generator instances are initialized once in worker processes, or use threading with signal-based timeout for CPU-bound operations.

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 47-47: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 42 - 68, The current execute_with_timeout
spawns a new Process with a bound method (e.g., MCQGen.generate_mcq,
BoolQGen.generate_boolq, ShortQGen.generate_shortq) which forces pickling of
heavy model state per-request; instead refactor to a worker pool pattern so
models are initialized once per worker process and tasks call those existing
instances: create a Pool or long-lived worker processes with an initializer that
constructs generator instances (MCQGen/BoolQGen/ShortQGen) in-process, then
submit jobs via apply_async or a task queue with a timeout on the result;
alternatively, if you must keep in-process concurrency, replace the per-call
Process in execute_with_timeout with thread-based timeouts or use a dedicated
worker process that receives lightweight task payloads (not bound methods) and
returns results, ensuring p.terminate is no longer used to kill model-bearing
processes to avoid GPU/resource leaks.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/server.py (1)

365-378: ⚠️ Potential issue | 🟡 Minor

Missing timeout protection and input validation for QA endpoints.

The /get_mcq_answer, /get_shortq_answer, and /get_boolean_answer endpoints call qa_model() and answer.predict_boolean_answer() directly without timeout protection or input text length validation. A malicious or oversized input could cause these endpoints to hang.

Consider applying consistent input validation and timeout handling across all ML-backed endpoints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 365 - 378, Add input validation and timeout
protection around all ML-backed endpoints (/get_mcq_answer, /get_shortq_answer,
/get_boolean_answer): validate and reject overly large or empty input_text and
input_questions before calling qa_model or answer.predict_boolean_answer, and
wrap calls to qa_model(...) and answer.predict_boolean_answer(...) in a
bounded-time execution (e.g., asyncio.wait_for or concurrent.futures with a
timeout) so slow or malicious inputs raise a timeout and return a 4xx/5xx
response. Update the handlers that call qa_model and
answer.predict_boolean_answer to catch validation errors and timeout exceptions
and return a clear error response instead of hanging.
♻️ Duplicate comments (1)
backend/server.py (1)

42-68: ⚠️ Potential issue | 🔴 Critical

Critical: Multiprocessing approach cannot efficiently handle bound methods with ML model state.

This execute_with_timeout implementation passes bound methods (e.g., MCQGen.generate_mcq) to multiprocessing.Process, which requires pickling the entire generator instance. The MCQGenerator, BoolQGenerator, and ShortQGenerator classes (from backend/Generator/main.py) all load heavy objects in __init__:

  • T5ForConditionalGeneration models
  • spaCy NLP pipelines
  • Sense2Vec models
  • NLTK FreqDist

This causes:

  1. Multi-GB serialization overhead per request — negating timeout benefits
  2. Potential PicklingError — PyTorch models may fail to pickle
  3. Resource leaks — p.terminate() (Line 57) sends SIGTERM but doesn't guarantee GPU memory or file handle cleanup

Consider a worker pool pattern where generators are initialized once per worker process, or use thread-based timeouts with signals for CPU-bound operations.


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 42 - 68, The current execute_with_timeout
wrapper spawns a new multiprocessing.Process with a bound method (e.g.,
MCQGen.generate_mcq) which forces pickling of heavy model-laden generator
instances (MCQGenerator, BoolQGenerator, ShortQGenerator), causing huge
serialization overhead, PicklingError risks, and unsafe resource cleanup on
terminate; fix by replacing this pattern with a persistent worker pool where
each worker process initializes its generator instances once at startup and
exposes a lightweight task interface (e.g., task IDs or plain data inputs) that
execute_with_timeout (or a new submit_with_timeout) uses to enqueue work, or
alternatively switch to a thread-based timeout/signal approach for CPU-bound
tasks—ensure you stop passing bound instances into Process/wrapper and instead
reference worker-local generators by name, and update
execute_with_timeout/wrapper to accept only serializable data payloads and
return status via IPC (Queue) without pickling model objects.
🧹 Nitpick comments (4)
backend/server.py (4)

460-464: Broad exception catch obscures failure reasons.

Catching bare Exception hides whether the failure was due to authentication, network issues, or an invalid document. Consider logging the actual exception for debugging while returning a generic message to the client:

-    except Exception:
+    except Exception as e:
+        app.logger.error(f"Document fetch failed for doc_id={doc_id}: {e}")
         return jsonify({
             "error": "Failed to retrieve document content",
             "code": "document_fetch_error"
         }), 400
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 460 - 464, The bare except in the document
retrieval handler masks root causes; modify the except Exception block that
returns jsonify({"error":"Failed to retrieve document
content","code":"document_fetch_error"}) to catch/log the real exception: add
logging (e.g., use the existing logger or app.logger) to record the exception
and stacktrace, and optionally narrow catches to known error types
(auth/network/NotFound) while still returning the same generic JSON error to the
client; ensure you reference the same exception handling block around the
document fetch logic so the logged error includes exception info and context
(document id/request params).

47-48: Consider preserving exception type for better diagnostics.

Catching bare Exception loses the exception type and traceback. For debugging production issues, consider capturing more context:

         except Exception as e:
-            q.put(("error", str(e)))
+            q.put(("error", f"{type(e).__name__}: {e}"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 47 - 48, The except block that currently does
`except Exception as e: q.put(("error", str(e)))` should preserve exception type
and traceback for diagnostics: import the traceback module and change the
payload sent to the queue from a single string to include the exception class
name and the formatted traceback (e.g., include type(e).__name__ and
traceback.format_exc()) so callers receiving `q.put(...)` (the error message
producer in backend/server.py) get full context for debugging.

308-316: ThreadPoolExecutor wrapping multiprocessing adds unnecessary overhead.

Each thread just spawns a subprocess via execute_with_timeout. Consider using multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor directly to avoid the extra threading layer. However, this is secondary to the fundamental pickling issue with ML model instances.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 308 - 316, The ThreadPoolExecutor is just
spawning subprocesses via execute_with_timeout, adding unnecessary threading
overhead and exposing pickling problems for ML model instances; replace the
ThreadPoolExecutor block that submits run_mcq/run_boolq/run_shortq with a
process-based executor (e.g., concurrent.futures.ProcessPoolExecutor or
multiprocessing.Pool) and ensure the target call uses execute_with_timeout in
the worker process. Also make the tasks picklable by loading any ML model
instances inside the worker process (move model initialization out of
module/global scope into the body of run_mcq/run_boolq/run_shortq or implement
custom __getstate__/__setstate__ for model objects) so the ProcessPoolExecutor
can serialize the tasks correctly. Ensure you reference/modify run_mcq,
run_boolq, run_shortq, and execute_with_timeout accordingly.
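One way to collapse the two layers, sketched under the assumption that the three generator calls can be reduced to picklable top-level functions (the `run_*` bodies below are stand-ins, not the PR's real generators):

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def run_mcq(text):    return ("mcq", text[:5])     # stand-ins for the
def run_boolq(text):  return ("boolq", text[:5])   # real generator calls
def run_shortq(text): return ("shortq", text[:5])

def generate_all(text, timeout=60):
    """Run all three generators in one process pool with a shared deadline."""
    results, errors = {}, {}
    with ProcessPoolExecutor(max_workers=3) as pool:
        futures = {pool.submit(fn, text): name
                   for name, fn in [("mcq", run_mcq),
                                    ("boolq", run_boolq),
                                    ("shortq", run_shortq)]}
        for fut in as_completed(futures, timeout=timeout):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as e:
                errors[name] = str(e)
    return results, errors
```

This gives partial-failure handling for free: a generator that raises lands in `errors` without losing the other two results.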

581-588: Simplify redundant key check.

The condition already verifies qapair["options"] is truthy, making the subsequent or [] unnecessary:

-            if "options" in qapair and qapair["options"]:
-                options = qapair.get("options") or []
-                valid_options = [opt for opt in options if opt]
+            options = qapair.get("options")
+            if options:
+                valid_options = [opt for opt in options if opt]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 581 - 588, The code redundantly uses
qapair.get("options") or [] after already checking "options" in qapair and
qapair["options"] is truthy; replace that line with a direct access (options =
qapair["options"]) to simplify the logic and keep behavior identical, leaving
the rest of the block (valid_options, choices, choices_list) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 663-667: The qg.generate(...) call in the hard-question endpoints
(/get_shortq_hard, /get_mcq_hard, /get_boolq_hard) lacks timeout protection and
input-length validation; wrap the qg.generate invocation with the same
execute_with_timeout utility used elsewhere (use the same timeout value) and
enforce the same input text length checks (10-50000 chars) before calling it;
update the handler functions that call qg.generate to validate the incoming
article text and then call execute_with_timeout(() => qg.generate(...)) so the
endpoint returns a timeout/error consistently instead of hanging.
- Around line 456-458: The endpoint currently passes a raw doc_id to
get_document_content which expects a full document URL (since
extract_document_id(url) looks for '/document/d/{id}'); construct the full URL
that matches that pattern before calling get_document_content (e.g. build a
string like f"{<docs host>}/document/d/{doc_id}" or use
docs_service.base_url/format helper if available) and pass that URL to
docs_service.get_document_content(document_url) instead of the raw doc_id.

---

Outside diff comments:
In `@backend/server.py`:
- Around line 365-378: Add input validation and timeout protection around all
ML-backed endpoints (/get_mcq_answer, /get_shortq_answer, /get_boolean_answer):
validate and reject overly large or empty input_text and input_questions before
calling qa_model or answer.predict_boolean_answer, and wrap calls to
qa_model(...) and answer.predict_boolean_answer(...) in a bounded-time execution
(e.g., asyncio.wait_for or concurrent.futures with a timeout) so slow or
malicious inputs raise a timeout and return a 4xx/5xx response. Update the
handlers that call qa_model and answer.predict_boolean_answer to catch
validation errors and timeout exceptions and return a clear error response
instead of hanging.

---

Duplicate comments:
In `@backend/server.py`:
- Around line 42-68: The current execute_with_timeout wrapper spawns a new
multiprocessing.Process with a bound method (e.g., MCQGen.generate_mcq) which
forces pickling of heavy model-laden generator instances (MCQGenerator,
BoolQGenerator, ShortQGenerator), causing huge serialization overhead,
PicklingError risks, and unsafe resource cleanup on terminate; fix by replacing
this pattern with a persistent worker pool where each worker process initializes
its generator instances once at startup and exposes a lightweight task interface
(e.g., task IDs or plain data inputs) that execute_with_timeout (or a new
submit_with_timeout) uses to enqueue work, or alternatively switch to a
thread-based timeout/signal approach for CPU-bound tasks—ensure you stop passing
bound instances into Process/wrapper and instead reference worker-local
generators by name, and update execute_with_timeout/wrapper to accept only
serializable data payloads and return status via IPC (Queue) without pickling
model objects.

---

Nitpick comments:
In `@backend/server.py`:
- Around line 460-464: The bare except in the document retrieval handler masks
root causes; modify the except Exception block that returns
jsonify({"error":"Failed to retrieve document
content","code":"document_fetch_error"}) to catch/log the real exception: add
logging (e.g., use the existing logger or app.logger) to record the exception
and stacktrace, and optionally narrow catches to known error types
(auth/network/NotFound) while still returning the same generic JSON error to the
client; ensure you reference the same exception handling block around the
document fetch logic so the logged error includes exception info and context
(document id/request params).
- Around line 47-48: The except block that currently does `except Exception as
e: q.put(("error", str(e)))` should preserve exception type and traceback for
diagnostics: import the traceback module and change the payload sent to the
queue from a single string to include the exception class name and the formatted
traceback (e.g., include type(e).__name__ and traceback.format_exc()) so callers
receiving `q.put(...)` (the error message producer in backend/server.py) get
full context for debugging.
- Around line 308-316: The ThreadPoolExecutor is just spawning subprocesses via
execute_with_timeout, adding unnecessary threading overhead and exposing
pickling problems for ML model instances; replace the ThreadPoolExecutor block
that submits run_mcq/run_boolq/run_shortq with a process-based executor (e.g.,
concurrent.futures.ProcessPoolExecutor or multiprocessing.Pool) and ensure the
target call uses execute_with_timeout in the worker process. Also make the tasks
picklable by loading any ML model instances inside the worker process (move
model initialization out of module/global scope into the body of
run_mcq/run_boolq/run_shortq or implement custom __getstate__/__setstate__ for
model objects) so the ProcessPoolExecutor can serialize the tasks correctly.
Ensure you reference/modify run_mcq, run_boolq, run_shortq, and
execute_with_timeout accordingly.
- Around line 581-588: The code redundantly uses qapair.get("options") or []
after already checking "options" in qapair and qapair["options"] is truthy;
replace that line with a direct access (options = qapair["options"]) to simplify
the logic and keep behavior identical, leaving the rest of the block
(valid_options, choices, choices_list) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9af4dee4-f1dc-430f-8a38-bf6c40d7a8c9

📥 Commits

Reviewing files that changed from the base of the PR and between 87ab02b and 5893fd2.

📒 Files selected for processing (1)
  • backend/server.py

Comment on lines 663 to 667
 output = qg.generate(
-    article=input_text, num_questions=input_questions, answer_style="sentences"
+    article=input_text,
+    num_questions=len(input_questions),
+    answer_style="sentences"
 )

⚠️ Potential issue | 🟠 Major

Missing timeout protection for qg.generate() call.

The /get_shortq_hard, /get_mcq_hard, and /get_boolq_hard endpoints call qg.generate() directly without timeout protection, while other ML endpoints use execute_with_timeout. This inconsistency could allow these endpoints to hang indefinitely.

Additionally, these endpoints lack input text length validation (unlike /get_mcq, /get_boolq, /get_shortq which validate 10-50000 chars).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 663 - 667, The qg.generate(...) call in the
hard-question endpoints (/get_shortq_hard, /get_mcq_hard, /get_boolq_hard) lacks
timeout protection and input-length validation; wrap the qg.generate invocation
with the same execute_with_timeout utility used elsewhere (use the same timeout
value) and enforce the same input text length checks (10-50000 chars) before
calling it; update the handler functions that call qg.generate to validate the
incoming article text and then call execute_with_timeout(lambda: qg.generate(...))
so the endpoint returns a timeout/error consistently instead of hanging.
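The 10-50000 character bound mentioned above reduces to a small validator; the helper name and error strings here are assumptions for illustration.

```python
MIN_CHARS, MAX_CHARS = 10, 50000  # bounds the review cites for /get_mcq etc.

def validate_article_text(text):
    """Return an error message if the input is unusable, else None."""
    if not isinstance(text, str) or not text.strip():
        return "input_text must be a non-empty string"
    if not MIN_CHARS <= len(text) <= MAX_CHARS:
        return f"input_text must be {MIN_CHARS}-{MAX_CHARS} characters"
    return None
```

Running the same check at the top of every ML-backed handler keeps the 400-response behavior consistent across easy and hard endpoints.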


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
backend/server.py (1)

42-52: ⚠️ Potential issue | 🔴 Critical

Per-request multiprocessing with bound model methods is still a high-risk execution model.

execute_with_timeout spins a new Process for each request while passing model-bound callables. This is expensive and can break under spawn-based start methods. Prefer long-lived worker processes (initialized once with models) and submit lightweight jobs with per-task timeouts.

#!/bin/bash
set -euo pipefail

echo "1) Locate timeout executor and process primitives"
rg -n "def execute_with_timeout|Process\(|Queue\(|set_start_method|get_context" backend/server.py

echo
echo "2) Locate generator singletons and timeout callsites"
rg -n "MCQGen =|BoolQGen =|ShortQGen =|qg =|execute_with_timeout\(" backend/server.py

echo
echo "3) AST check: nested function used as Process target"
python - <<'PY'
import ast, pathlib
p = pathlib.Path("backend/server.py")
tree = ast.parse(p.read_text())
for fn in [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef) and n.name == "execute_with_timeout"]:
    nested = [n.name for n in fn.body if isinstance(n, ast.FunctionDef)]
    print("nested defs in execute_with_timeout:", nested)
    for n in ast.walk(fn):
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name) and n.func.id == "Process":
            print("Process() call line:", n.lineno)
PY

Also applies to: 141-145, 186-190, 232-236, 288-306, 677-683, 737-743, 796-802

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 42 - 52, The current execute_with_timeout
creates a new Process per request using a nested wrapper and passes bound model
methods (wrapper, Process, Queue), which is expensive and unsafe with
spawn-based start; refactor to a long-lived worker pool or persistent worker
Process(es) that are initialized once with model instances and accept
lightweight task messages (use a manager Queue or
multiprocessing.Pool/ProcessPoolExecutor with apply_async) instead of spawning
per call, remove the nested wrapper as a Process target and instead send task
descriptors (function name + args) or callables that the worker resolves against
its local model, and update callsites (MCQGen, BoolQGen, ShortQGen, qg and all
execute_with_timeout usages) to submit tasks to the worker pool and enforce
per-task timeouts via the pool’s timeout support or Future.result(timeout=...)
rather than creating new Process objects each request.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 47-48: Replace instances where the code enqueues raw exception
text via q.put(("error", str(e))) with a stable public error indicator (for
example q.put(("error", {"code":"internal_server_error","message":"An internal
error occurred"}))) and log the full exception details server-side using
logging.exception or logger.exception so stack traces stay in server logs only;
locate all occurrences of q.put(("error", str(e))) in server.py (the q variable
and its error enqueue sites) and apply the same mapping so API responses never
contain raw exception strings but the server logs retain the detailed exception
information.
- Around line 461-465: The error handler in the /get_content endpoint currently
catches all exceptions and returns a 400, which hides server errors; update the
exception handling in the function that serves /get_content (the try/except
block around document retrieval) to explicitly catch expected client/validation
errors (e.g., ValueError, werkzeug.exceptions.BadRequest) and return a 400 with
the existing error JSON, and for all other unexpected exceptions log the full
exception (including stack trace) and return a 500 response with code
"internal_server_error" using jsonify; ensure you reference the same response
shape and use the existing logger instead of swallowing the exception.
- Around line 338-343: The handler that returns the "All generators failed"
response currently always uses HTTP 500; update the logic in the /get_problems
response where mcq_err, bool_err, and short_err are checked so that if all three
indicate a timeout (e.g., mcq_err == "timeout" && bool_err == "timeout" &&
short_err == "timeout" or using your TIMEOUT_ERROR constant), return the JSON
error with HTTP 504 instead of 500; otherwise keep the existing 500 return path.
Reference variables: mcq_err, bool_err, short_err and the existing
jsonify({...}) return.
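The status selection described above reduces to a small function; the "timeout" marker string is an assumption about how the error variables are encoded.

```python
def failure_status(mcq_err, bool_err, short_err):
    """504 when every generator timed out, 500 for any other total failure."""
    if all(err == "timeout" for err in (mcq_err, bool_err, short_err)):
        return 504
    return 500
```

This keeps 504 (Gateway Timeout) reserved for the case it actually describes, so clients can distinguish "try again with shorter input" from a genuine server fault.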

---

Duplicate comments:
In `@backend/server.py`:
- Around line 42-52: The current execute_with_timeout creates a new Process per
request using a nested wrapper and passes bound model methods (wrapper, Process,
Queue), which is expensive and unsafe with spawn-based start; refactor to a
long-lived worker pool or persistent worker Process(es) that are initialized
once with model instances and accept lightweight task messages (use a manager
Queue or multiprocessing.Pool/ProcessPoolExecutor with apply_async) instead of
spawning per call, remove the nested wrapper as a Process target and instead
send task descriptors (function name + args) or callables that the worker
resolves against its local model, and update callsites (MCQGen, BoolQGen,
ShortQGen, qg and all execute_with_timeout usages) to submit tasks to the worker
pool and enforce per-task timeouts via the pool’s timeout support or
Future.result(timeout=...) rather than creating new Process objects each
request.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 93b490f6-a47e-4892-86ab-a1224f592603

📥 Commits

Reviewing files that changed from the base of the PR and between 5893fd2 and 6704c41.

📒 Files selected for processing (1)
  • backend/server.py



Development

Successfully merging this pull request may close these issues.

[GOOD FIRST ISSUE]: ML Model Endpoints Lack Timeout Protection

1 participant