Fix: robust timeout handling for ML endpoints with multiprocessing and parallel execution #596
mahek2016 wants to merge 20 commits into AOSSIE-Org:main
Conversation
📝 Walkthrough

Adds a subprocess timeout executor, per-endpoint rate limiting and a 2 MB payload cap, centralized JSON error handlers, robust input validation, timeout-wrapped generator calls (with parallelization for /get_problems), YouTube subtitle retrieval via yt-dlp, and new/updated endpoints returning uniform success/error shapes.
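For reference, a 2 MB payload cap like the one described here usually maps to Flask's `MAX_CONTENT_LENGTH` setting; a minimal sketch, assuming that is how this PR implements it (the error shape is illustrative):

```python
app.config["MAX_CONTENT_LENGTH"] = 2 * 1024 * 1024  # reject request bodies over 2 MB

@app.errorhandler(413)
def payload_too_large(e):
    # Flask raises 413 automatically once the cap is exceeded.
    return jsonify({"error": "Payload too large", "code": "payload_too_large"}), 413
```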
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Flask as Flask App
    participant Limiter as Rate Limiter
    participant Executor as Timeout Executor
    participant Worker as Subprocess Worker
    participant External as ML Model / yt-dlp / External API
    Client->>Flask: POST /get_mcq or /get_problems
    Flask->>Limiter: check per-endpoint limit
    alt limited
        Limiter-->>Client: 429 (JSON error)
    else allowed
        Flask->>Executor: execute_with_timeout(func, timeout, args...)
        Executor->>Worker: spawn subprocess to run func
        Worker->>External: call model / yt-dlp / external API
        alt completes
            External-->>Worker: result
            Worker-->>Executor: return result
            Executor-->>Flask: (result, None)
            Flask-->>Client: 200 (structured JSON)
        else timeout / error
            Worker-->>Executor: error/timeout
            Executor-->>Flask: (None, error)
            Flask-->>Client: 504/400/500 (structured JSON)
        end
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/server.py (2)
483-487: ⚠️ Potential issue | 🔴 Critical
OAuth interactive flow will fail in server/production context.

`tools.run_flow()` opens a browser for interactive OAuth consent. This:
- Blocks the request thread indefinitely waiting for user interaction
- Fails entirely in headless server environments
- Creates race conditions if multiple requests trigger auth simultaneously
For server-to-server auth, use a service account or pre-authorize and store refresh tokens.
🔧 Suggested approach using service account
```python
from google.oauth2 import service_account

SCOPES = ["https://www.googleapis.com/auth/forms.body"]
creds = service_account.Credentials.from_service_account_file(
    "service_account_key.json", scopes=SCOPES
)
form_service = discovery.build("forms", "v1", credentials=creds, ...)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 483 - 487, The current code uses the interactive OAuth flow (client.flow_from_clientsecrets and tools.run_flow) with file.Storage and creds, which will block/fail in server environments; replace this with non-interactive server auth by loading service account credentials (use google.oauth2.service_account.Credentials.from_service_account_file with the SCOPES) and pass that creds into discovery.build when creating the Forms API client, removing the tools.run_flow/client.flow_from_clientsecrets logic and file.Storage usage; alternatively, if using a user account, load a pre-authorized refresh token from secure storage and refresh it programmatically instead of calling tools.run_flow.
644-672: ⚠️ Potential issue | 🟠 Major
Missing timeout protection and input validation on `/get_shortq_hard` (and the similar `_hard` endpoints).

These endpoints call `qg.generate()` directly without `execute_with_timeout`, contradicting the PR's objective to add timeout protection to all ML endpoints. They also lack the input length validation (10-50000 chars) applied to other endpoints.

Additionally, `num_questions=len(input_questions)` will be 0 if `input_questions` is empty, potentially causing unexpected behavior.

🔧 Proposed fix for /get_shortq_hard (apply similarly to _mcq_hard and _boolq_hard)
```diff
 def get_shortq_hard():
     data = request.get_json(silent=True)
     if data is None or not isinstance(data, dict):
         return jsonify({...}), 400

     input_text = data.get("input_text", "")
     use_mediawiki = data.get("use_mediawiki", 0)
+    input_questions = data.get("input_question", [])
+
+    if not isinstance(input_text, str):
+        return jsonify({"error": "input_text must be string"}), 400
+
+    input_text = input_text.strip()
+
+    if len(input_text) < 10:
+        return jsonify({"error": "Input too short"}), 400
+
+    if len(input_text) > 50000:
+        return jsonify({"error": "Input too long"}), 400
+
+    if not input_questions:
+        return jsonify({"error": "input_question is required"}), 400

     input_text = process_input_text(input_text, use_mediawiki)
-    input_questions = data.get("input_question", [])
-    output = qg.generate(...)
+    output, error = execute_with_timeout(
+        qg.generate,
+        60,
+        article=input_text,
+        num_questions=len(input_questions),
+        answer_style="sentences"
+    )
+
+    if error == "timeout":
+        return jsonify({"error": "Request timed out"}), 504
+    if error:
+        return jsonify({"error": error}), 500
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 644 - 672, The /get_shortq_hard handler calls qg.generate directly and lacks input length checks and protection against zero num_questions; wrap the qg.generate(...) call in the existing execute_with_timeout(...) helper (use the same timeout semantics used by other ML endpoints), validate processed input_text length is between 10 and 50000 characters and return a 400 JSON error if outside that range, and ensure num_questions is never zero (e.g., num_questions = max(1, len(input_questions)) or derive a sensible default) before passing it to qg.generate; preserve post-processing with make_question_harder and propagate execute_with_timeout errors the same way other endpoints do.
🧹 Nitpick comments (4)
backend/server.py (3)
580-588: Simplify redundant key check. The static analyzer flagged this pattern; using `.get()` is cleaner.

♻️ Proposed fix
- if "options" in qapair and qapair["options"]: - options = qapair.get("options") or [] + options = qapair.get("options") + if options: valid_options = [opt for opt in options if opt]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 580 - 588, The current code redundantly checks "if 'options' in qapair and qapair['options']"; replace that pattern by always reading options = qapair.get("options") or [] and proceed — compute valid_options = [opt for opt in options if opt], build choices = [qapair["answer"], *valid_options[:3]] (or ensure answer-only if no options), shuffle and create choices_list as before; update the block around qapair, options, valid_options, choices and choices_list to remove the explicit key presence check and rely on .get().
456-464: Overly broad exception handling obscures root cause. Catching bare `Exception` returns a generic 400 error for all failures, but different errors warrant different responses:
- Authentication/permission errors → 403
- Document not found → 404
- Network/server errors → 500
♻️ Proposed improvement
```diff
     try:
         content = docs_service.get_document_content(doc_id)
         return jsonify({"content": content})
-
-    except Exception:
+    except HttpError as e:
+        if e.resp.status == 404:
+            return jsonify({"error": "Document not found", "code": "not_found"}), 404
+        elif e.resp.status == 403:
+            return jsonify({"error": "Permission denied", "code": "forbidden"}), 403
+        return jsonify({"error": "Failed to retrieve document", "code": "document_fetch_error"}), e.resp.status
+    except Exception as e:
         return jsonify({
-            "error": "Failed to retrieve document content",
+            "error": f"Failed to retrieve document content: {type(e).__name__}",
             "code": "document_fetch_error"
-        }), 400
+        }), 500
```

This requires importing `HttpError` from `googleapiclient.errors`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 456 - 464, Replace the broad except Exception around the docs_service.get_document_content call with specific handlers: import HttpError from googleapiclient.errors and catch HttpError first, inspect the HTTP status (e.g., e.resp.status or e.status_code) to return 403 for 403 responses, 404 for 404 responses, and 500 for other HttpError statuses; also add explicit except FileNotFoundError to return 404 and except PermissionError to return 403, finally keep a generic except Exception that returns a 500 and logs the exception for unexpected errors so the endpoint (docs_service.get_document_content) returns appropriate status codes and preserves root-cause details.
365-378: Consider adding timeout protection to QA model calls.

The `/get_mcq_answer`, `/get_shortq_answer`, and `/get_boolean_answer` endpoints call `qa_model()` and `answer.predict_boolean_answer()` without timeout protection. While these are typically faster than generation, they can still hang on pathological inputs.

For consistency with the PR's timeout protection goals, consider wrapping these calls.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 365 - 378, Wrap calls to qa_model(...) and answer.predict_boolean_answer(...) with a timeout so they cannot hang indefinitely; specifically, in the loop that calls qa_model (the block using variables input_questions, input_options and computing options_with_answer) and in the handlers for the /get_mcq_answer, /get_shortq_answer, and /get_boolean_answer endpoints, execute the model calls via a timeout-enabled executor (e.g., concurrent.futures with future.result(timeout=...) or asyncio.wait_for) and handle TimeoutError by logging the timeout and returning a safe fallback (skip the question, return a default/empty answer, or a 504 response) so the server remains responsive. Ensure you catch and log the timeout and other exceptions around qa_model(...) and answer.predict_boolean_answer(...) and keep existing similarity-selection logic (vectorizer, cosine_similarity) intact when the call succeeds.

requirements.txt (1)
35-36: Pin `yt-dlp` to a specific version for reproducibility.

`yt-dlp` is unpinned while `Flask-Limiter==3.5.0` is pinned. Since `yt-dlp` is actively developed with frequent breaking changes, an unpinned version could cause unexpected failures in production. Given that the latest version is 2026.3.13, consider updating to a recent pinned version:

📦 Proposed fix
```diff
 Flask-Limiter==3.5.0
-yt-dlp
+yt-dlp==2026.3.13
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements.txt` around lines 35 - 36, The requirements file currently pins Flask-Limiter but leaves yt-dlp unpinned (the lines containing "Flask-Limiter==3.5.0" and "yt-dlp"); pin yt-dlp to a specific, recent stable version (for example yt-dlp==2026.3.13 or another vetted release) to ensure reproducible installs, update the requirements.txt entry for "yt-dlp" to the chosen pinned version, and commit the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 323-336: The current logic assumes mcq_out, bool_out, and
short_out are dicts and calls .get() even when execute_with_timeout may return
None (e.g., returns (None, "unknown_error")), causing AttributeError; update the
checks around the response assembly to defensively handle None/empty outputs by
using a safe accessor (e.g., replace mcq_out.get(...) with (mcq_out or
{}).get(...) or explicitly check isinstance(mcq_out, dict)) and do the same for
bool_out and short_out so that when mcq_err/bool_err/short_err is falsy but the
corresponding *_out is None or missing keys you fall back to empty lists instead
of calling .get() on None; reference the variables mcq_out, bool_out, short_out
and the function execute_with_timeout when making the change.
- Around line 76-79: The limiter is using get_remote_address so behind a reverse
proxy all requests will show the proxy IP; update the rate-limiting key to use
the real client IP by either applying werkzeug.middleware.proxy_fix.ProxyFix to
app.wsgi_app (e.g., wrap app with ProxyFix(x_for=1, ...) and mark your proxy as
trusted) or replace the key_func passed to Limiter with a small function that
returns request.headers.get('X-Forwarded-For', request.remote_addr). Ensure you
modify the Limiter instantiation (the limiter variable) to use the new key
function instead of get_remote_address and document/configure trusted proxy
behavior accordingly (a sketch follows this list).
- Around line 42-68: The current execute_with_timeout spawns a new Process with
a bound method (e.g., MCQGen.generate_mcq, BoolQGen.generate_boolq,
ShortQGen.generate_shortq) which forces pickling of heavy model state
per-request; instead refactor to a worker pool pattern so models are initialized
once per worker process and tasks call those existing instances: create a Pool
or long-lived worker processes with an initializer that constructs generator
instances (MCQGen/BoolQGen/ShortQGen) in-process, then submit jobs via
apply_async or a task queue with a timeout on the result; alternatively, if you
must keep in-process concurrency, replace the per-call Process in
execute_with_timeout with thread-based timeouts or use a dedicated worker
process that receives lightweight task payloads (not bound methods) and returns
results, ensuring p.terminate is no longer used to kill model-bearing processes
to avoid GPU/resource leaks.
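A sketch of the ProxyFix approach from the rate-limiter item above, assuming a single trusted reverse proxy (the `x_for=1` hop count is an assumption to adjust per deployment):

```python
from flask import Flask
from werkzeug.middleware.proxy_fix import ProxyFix
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
# Trust exactly one proxy hop so request.remote_addr reflects the real client IP.
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1)

# get_remote_address now keys rate limits on the client IP, not the proxy IP.
limiter = Limiter(get_remote_address, app=app, default_limits=["60 per hour"])
```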
---
Outside diff comments:
In `@backend/server.py`:
- Around line 483-487: The current code uses the interactive OAuth flow
(client.flow_from_clientsecrets and tools.run_flow) with file.Storage and creds,
which will block/fail in server environments; replace this with non-interactive
server auth by loading service account credentials (use
google.oauth2.service_account.Credentials.from_service_account_file with the
SCOPES) and pass that creds into discovery.build when creating the Forms API
client, removing the tools.run_flow/client.flow_from_clientsecrets logic and
file.Storage usage; alternatively, if using a user account, load a
pre-authorized refresh token from secure storage and refresh it programmatically
instead of calling tools.run_flow.
- Around line 644-672: The /get_shortq_hard handler calls qg.generate directly
and lacks input length checks and protection against zero num_questions; wrap
the qg.generate(...) call in the existing execute_with_timeout(...) helper (use
the same timeout semantics used by other ML endpoints), validate processed
input_text length is between 10 and 50000 characters and return a 400 JSON error
if outside that range, and ensure num_questions is never zero (e.g.,
num_questions = max(1, len(input_questions)) or derive a sensible default)
before passing it to qg.generate; preserve post-processing with
make_question_harder and propagate execute_with_timeout errors the same way
other endpoints do.
---
Nitpick comments:
In `@backend/server.py`:
- Around line 580-588: The current code redundantly checks "if 'options' in
qapair and qapair['options']"; replace that pattern by always reading options =
qapair.get("options") or [] and proceed — compute valid_options = [opt for opt
in options if opt], build choices = [qapair["answer"], *valid_options[:3]] (or
ensure answer-only if no options), shuffle and create choices_list as before;
update the block around qapair, options, valid_options, choices and choices_list
to remove the explicit key presence check and rely on .get().
- Around line 456-464: Replace the broad except Exception around the
docs_service.get_document_content call with specific handlers: import HttpError
from googleapiclient.errors and catch HttpError first, inspect the HTTP status
(e.g., e.resp.status or e.status_code) to return 403 for 403 responses, 404 for
404 responses, and 500 for other HttpError statuses; also add explicit except
FileNotFoundError to return 404 and except PermissionError to return 403,
finally keep a generic except Exception that returns a 500 and logs the
exception for unexpected errors so the endpoint
(docs_service.get_document_content) returns appropriate status codes and
preserves root-cause details.
- Around line 365-378: Wrap calls to qa_model(...) and
answer.predict_boolean_answer(...) with a timeout so they cannot hang
indefinitely; specifically, in the loop that calls qa_model (the block using
variables input_questions, input_options and computing options_with_answer) and
in the handlers for the /get_mcq_answer, /get_shortq_answer, and
/get_boolean_answer endpoints, execute the model calls via a timeout-enabled
executor (e.g., concurrent.futures with future.result(timeout=...) or
asyncio.wait_for) and handle TimeoutError by logging the timeout and returning a
safe fallback (skip the question, return a default/empty answer, or a 504
response) so the server remains responsive. Ensure you catch and log the timeout
and other exceptions around qa_model(...) and answer.predict_boolean_answer(...)
and keep existing similarity-selection logic (vectorizer, cosine_similarity)
intact when the call succeeds.
In `@requirements.txt`:
- Around line 35-36: The requirements file currently pins Flask-Limiter but
leaves yt-dlp unpinned (the lines containing "Flask-Limiter==3.5.0" and
"yt-dlp"); pin yt-dlp to a specific, recent stable version (for example
yt-dlp==2026.3.13 or another vetted release) to ensure reproducible installs,
update the requirements.txt entry for "yt-dlp" to the chosen pinned version, and
commit the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 05c0db9e-a33a-4e5c-9b3a-28c2c3c18e48
📒 Files selected for processing (2)
backend/server.py, requirements.txt
```python
def execute_with_timeout(func, timeout, *args, **kwargs):
    def wrapper(q, *args, **kwargs):
        try:
            result = func(*args, **kwargs)
            q.put(("success", result))
        except Exception as e:
            q.put(("error", str(e)))

    q = Queue()
    p = Process(target=wrapper, args=(q, *args), kwargs=kwargs)
    p.start()
    p.join(timeout)

    # 🚨 REAL timeout handling
    if p.is_alive():
        p.terminate()
        p.join()
        return None, "timeout"

    if not q.empty():
        status, value = q.get()
        if status == "success":
            return value, None
        else:
            return None, value

    return None, "unknown_error"
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Check if the generator classes have unpicklable attributes
ast-grep --pattern $'class MCQGenerator:
  $$$
  def __init__(self):
    $$$
  $$$
}'
```

Repository: AOSSIE-Org/EduAid
Length of output: 43

🏁 Script executed:

```bash
find . -type f -name "*.py" | head -20
```

Repository: AOSSIE-Org/EduAid
Length of output: 577

🏁 Script executed:

```bash
find . -type f -name "*.py" -exec grep -l "MCQGenerator\|MCQGen" {} \;
```

Repository: AOSSIE-Org/EduAid
Length of output: 140

🏁 Script executed:

```bash
cat backend/server.py | head -100
```

Repository: AOSSIE-Org/EduAid
Length of output: 2560

🏁 Script executed:

```bash
grep -n "execute_with_timeout" backend/server.py
```

Repository: AOSSIE-Org/EduAid
Length of output: 382

🏁 Script executed:

```bash
sed -n '1,20p' backend/Generator/main.py
```

Repository: AOSSIE-Org/EduAid
Length of output: 915

🏁 Script executed:

```bash
wc -l backend/Generator/main.py
```

Repository: AOSSIE-Org/EduAid
Length of output: 90

🏁 Script executed:

```bash
sed -n '20,150p' backend/Generator/main.py
```

Repository: AOSSIE-Org/EduAid
Length of output: 4991

🏁 Script executed:

```bash
grep -A 40 "class MCQGenerator" backend/Generator/main.py
```

Repository: AOSSIE-Org/EduAid
Length of output: 1680

🏁 Script executed:

```bash
sed -n '130,310p' backend/server.py
```

Repository: AOSSIE-Org/EduAid
Length of output: 5319

🏁 Script executed:

```bash
sed -n '280,330p' backend/server.py
```

Repository: AOSSIE-Org/EduAid
Length of output: 1488

🏁 Script executed:

```bash
python3 << 'EOF'
# Check if PyTorch models are picklable
import sys
try:
    import torch
    from transformers import T5ForConditionalGeneration, T5Tokenizer
    import pickle
    # Try to pickle a T5 tokenizer
    tokenizer = T5Tokenizer.from_pretrained('t5-small')  # Use small model for test
    pickled = pickle.dumps(tokenizer)
    unpickled = pickle.loads(pickled)
    print("✓ T5Tokenizer is picklable")
    # Check model size
    import os
    model_size = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
    print(f"Approximate model memory usage would be significant for t5-large")
except Exception as e:
    print(f"Error: {e}")
EOF
```

Repository: AOSSIE-Org/EduAid
Length of output: 91

🏁 Script executed:

```bash
grep -n "BoolQGen\|ShortQGen" backend/server.py | head -10
```

Repository: AOSSIE-Org/EduAid
Length of output: 296
Critical: Bound methods with heavy ML model state cannot be efficiently pickled for multiprocessing.
The execute_with_timeout function passes bound methods (MCQGen.generate_mcq, BoolQGen.generate_boolq, ShortQGen.generate_shortq) to multiprocessing.Process, which requires pickling the entire object closure. Since each generator loads T5 models, spaCy NLP objects, and Sense2Vec models in __init__, these large objects must be serialized and deserialized for every request.
This causes:
- Massive per-request overhead: Serializing multi-GB model weights on every call negates any timeout benefit
- Potential pickle failures: PyTorch model objects may fail to pickle depending on the backend (standard pickle vs. dill)
- Resource leaks: `p.terminate()` sends SIGTERM but doesn't guarantee cleanup of GPU memory or file handles
Consider using a worker pool pattern where generator instances are initialized once in worker processes, or use threading with signal-based timeout for CPU-bound operations.
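A minimal sketch of that worker-pool idea, assuming an `MCQGenerator` class as referenced above (the import path, constructor, and payload shape are assumptions; this is not the PR's implementation):

```python
from concurrent.futures import ProcessPoolExecutor, TimeoutError as FutureTimeout

_worker_gen = None  # one generator instance per worker process

def _init_worker():
    # Runs once per worker process: heavy models load here, not per request.
    global _worker_gen
    from Generator.main import MCQGenerator  # hypothetical import path
    _worker_gen = MCQGenerator()

def _run_task(payload):
    # Workers receive plain, picklable payloads instead of bound methods.
    return _worker_gen.generate_mcq(payload)

pool = ProcessPoolExecutor(max_workers=2, initializer=_init_worker)

def submit_with_timeout(payload, timeout):
    future = pool.submit(_run_task, payload)
    try:
        return future.result(timeout=timeout), None
    except FutureTimeout:
        # Note: a task already running keeps its worker busy; the pool is not torn down.
        future.cancel()
        return None, "timeout"
    except Exception as e:
        return None, f"{type(e).__name__}: {e}"
```

Because each worker loads its models once at startup, per-request pickling is limited to the lightweight payload, and no model-bearing process is ever killed with `terminate()`.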
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 47-47: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 42 - 68, The current execute_with_timeout
spawns a new Process with a bound method (e.g., MCQGen.generate_mcq,
BoolQGen.generate_boolq, ShortQGen.generate_shortq) which forces pickling of
heavy model state per-request; instead refactor to a worker pool pattern so
models are initialized once per worker process and tasks call those existing
instances: create a Pool or long-lived worker processes with an initializer that
constructs generator instances (MCQGen/BoolQGen/ShortQGen) in-process, then
submit jobs via apply_async or a task queue with a timeout on the result;
alternatively, if you must keep in-process concurrency, replace the per-call
Process in execute_with_timeout with thread-based timeouts or use a dedicated
worker process that receives lightweight task payloads (not bound methods) and
returns results, ensuring p.terminate is no longer used to kill model-bearing
processes to avoid GPU/resource leaks.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/server.py (1)
365-378: ⚠️ Potential issue | 🟡 Minor
Missing timeout protection and input validation for QA endpoints.
The `/get_mcq_answer`, `/get_shortq_answer`, and `/get_boolean_answer` endpoints call `qa_model()` and `answer.predict_boolean_answer()` directly without timeout protection or input text length validation. A malicious or oversized input could cause these endpoints to hang.

Consider applying consistent input validation and timeout handling across all ML-backed endpoints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 365 - 378, Add input validation and timeout protection around all ML-backed endpoints (/get_mcq_answer, /get_shortq_answer, /get_boolean_answer): validate and reject overly large or empty input_text and input_questions before calling qa_model or answer.predict_boolean_answer, and wrap calls to qa_model(...) and answer.predict_boolean_answer(...) in a bounded-time execution (e.g., asyncio.wait_for or concurrent.futures with a timeout) so slow or malicious inputs raise a timeout and return a 4xx/5xx response. Update the handlers that call qa_model and answer.predict_boolean_answer to catch validation errors and timeout exceptions and return a clear error response instead of hanging.
♻️ Duplicate comments (1)
backend/server.py (1)
42-68: ⚠️ Potential issue | 🔴 Critical
Multiprocessing approach cannot efficiently handle bound methods with ML model state.
This `execute_with_timeout` implementation passes bound methods (e.g., `MCQGen.generate_mcq`) to `multiprocessing.Process`, which requires pickling the entire generator instance. The `MCQGenerator`, `BoolQGenerator`, and `ShortQGenerator` classes (from `backend/Generator/main.py`) all load heavy objects in `__init__`:
- T5ForConditionalGeneration models
- spaCy NLP pipelines
- Sense2Vec models
- NLTK FreqDist
This causes:
- Multi-GB serialization overhead per request — negating timeout benefits
- Potential
PicklingError— PyTorch models may fail to pickle- Resource leaks —
p.terminate()(Line 57) sends SIGTERM but doesn't guarantee GPU memory or file handle cleanupConsider a worker pool pattern where generators are initialized once per worker process, or use thread-based timeouts with signals for CPU-bound operations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 42 - 68, The current execute_with_timeout wrapper spawns a new multiprocessing.Process with a bound method (e.g., MCQGen.generate_mcq) which forces pickling of heavy model-laden generator instances (MCQGenerator, BoolQGenerator, ShortQGenerator), causing huge serialization overhead, PicklingError risks, and unsafe resource cleanup on terminate; fix by replacing this pattern with a persistent worker pool where each worker process initializes its generator instances once at startup and exposes a lightweight task interface (e.g., task IDs or plain data inputs) that execute_with_timeout (or a new submit_with_timeout) uses to enqueue work, or alternatively switch to a thread-based timeout/signal approach for CPU-bound tasks—ensure you stop passing bound instances into Process/wrapper and instead reference worker-local generators by name, and update execute_with_timeout/wrapper to accept only serializable data payloads and return status via IPC (Queue) without pickling model objects.
🧹 Nitpick comments (4)
backend/server.py (4)
460-464: Broad exception catch obscures failure reasons. Catching bare `Exception` hides whether the failure was due to authentication, network issues, or an invalid document. Consider logging the actual exception for debugging while returning a generic message to the client:

```diff
-    except Exception:
+    except Exception as e:
+        app.logger.error(f"Document fetch failed for doc_id={doc_id}: {e}")
         return jsonify({
             "error": "Failed to retrieve document content",
             "code": "document_fetch_error"
         }), 400
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 460 - 464, The bare except in the document retrieval handler masks root causes; modify the except Exception block that returns jsonify({"error":"Failed to retrieve document content","code":"document_fetch_error"}) to catch/log the real exception: add logging (e.g., use the existing logger or app.logger) to record the exception and stacktrace, and optionally narrow catches to known error types (auth/network/NotFound) while still returning the same generic JSON error to the client; ensure you reference the same exception handling block around the document fetch logic so the logged error includes exception info and context (document id/request params).
47-48: Consider preserving exception type for better diagnostics. Catching bare `Exception` loses the exception type and traceback. For debugging production issues, consider capturing more context:

```diff
 except Exception as e:
-    q.put(("error", str(e)))
+    q.put(("error", f"{type(e).__name__}: {e}"))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 47 - 48, The except block that currently does `except Exception as e: q.put(("error", str(e)))` should preserve exception type and traceback for diagnostics: import the traceback module and change the payload sent to the queue from a single string to include the exception class name and the formatted traceback (e.g., include type(e).__name__ and traceback.format_exc()) so callers receiving `q.put(...)` (the error message producer in backend/server.py) get full context for debugging.
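A hedged sketch of the fuller-context variant this prompt describes, using `traceback`; the queue protocol mirrors the nested `wrapper` from `execute_with_timeout` shown earlier, and `func` is the callable it closes over:

```python
import traceback

def wrapper(q, *args, **kwargs):
    try:
        q.put(("success", func(*args, **kwargs)))
    except Exception as e:
        # Ship the exception class and full traceback for server-side logging,
        # so API responses can stay generic while logs keep the root cause.
        q.put(("error", {
            "type": type(e).__name__,
            "message": str(e),
            "traceback": traceback.format_exc(),
        }))
```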
308-316: ThreadPoolExecutor wrapping multiprocessing adds unnecessary overhead. Each thread just spawns a subprocess via `execute_with_timeout`. Consider using `multiprocessing.Pool` or `concurrent.futures.ProcessPoolExecutor` directly to avoid the extra threading layer. However, this is secondary to the fundamental pickling issue with ML model instances.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 308 - 316, The ThreadPoolExecutor is just spawning subprocesses via execute_with_timeout, adding unnecessary threading overhead and exposing pickling problems for ML model instances; replace the ThreadPoolExecutor block that submits run_mcq/run_boolq/run_shortq with a process-based executor (e.g., concurrent.futures.ProcessPoolExecutor or multiprocessing.Pool) and ensure the target call uses execute_with_timeout in the worker process. Also make the tasks picklable by loading any ML model instances inside the worker process (move model initialization out of module/global scope into the body of run_mcq/run_boolq/run_shortq or implement custom __getstate__/__setstate__ for model objects) so the ProcessPoolExecutor can serialize the tasks correctly. Ensure you reference/modify run_mcq, run_boolq, run_shortq, and execute_with_timeout accordingly.
581-588: Simplify redundant key check. The condition already verifies `qapair["options"]` is truthy, making the subsequent `or []` unnecessary:

```diff
-        if "options" in qapair and qapair["options"]:
-            options = qapair.get("options") or []
+        options = qapair.get("options")
+        if options:
             valid_options = [opt for opt in options if opt]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 581 - 588, The code redundantly uses qapair.get("options") or [] after already checking "options" in qapair and qapair["options"] is truthy; replace that line with a direct access (options = qapair["options"]) to simplify the logic and keep behavior identical, leaving the rest of the block (valid_options, choices, choices_list) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 663-667: The qg.generate(...) call in the hard-question endpoints
(/get_shortq_hard, /get_mcq_hard, /get_boolq_hard) lacks timeout protection and
input-length validation; wrap the qg.generate invocation with the same
execute_with_timeout utility used elsewhere (use the same timeout value) and
enforce the same input text length checks (10-50000 chars) before calling it;
update the handler functions that call qg.generate to validate the incoming
article text and then call execute_with_timeout(() => qg.generate(...)) so the
endpoint returns a timeout/error consistently instead of hanging.
- Around line 456-458: The endpoint currently passes a raw doc_id to
get_document_content which expects a full document URL (since
extract_document_id(url) looks for '/document/d/{id}'); construct the full URL
that matches that pattern before calling get_document_content (e.g. build a
string like f"{<docs host>}/document/d/{doc_id}" or use
docs_service.base_url/format helper if available) and pass that URL to
docs_service.get_document_content(document_url) instead of the raw doc_id.
---
Outside diff comments:
In `@backend/server.py`:
- Around line 365-378: Add input validation and timeout protection around all
ML-backed endpoints (/get_mcq_answer, /get_shortq_answer, /get_boolean_answer):
validate and reject overly large or empty input_text and input_questions before
calling qa_model or answer.predict_boolean_answer, and wrap calls to
qa_model(...) and answer.predict_boolean_answer(...) in a bounded-time execution
(e.g., asyncio.wait_for or concurrent.futures with a timeout) so slow or
malicious inputs raise a timeout and return a 4xx/5xx response. Update the
handlers that call qa_model and answer.predict_boolean_answer to catch
validation errors and timeout exceptions and return a clear error response
instead of hanging.
---
Duplicate comments:
In `@backend/server.py`:
- Around line 42-68: The current execute_with_timeout wrapper spawns a new
multiprocessing.Process with a bound method (e.g., MCQGen.generate_mcq) which
forces pickling of heavy model-laden generator instances (MCQGenerator,
BoolQGenerator, ShortQGenerator), causing huge serialization overhead,
PicklingError risks, and unsafe resource cleanup on terminate; fix by replacing
this pattern with a persistent worker pool where each worker process initializes
its generator instances once at startup and exposes a lightweight task interface
(e.g., task IDs or plain data inputs) that execute_with_timeout (or a new
submit_with_timeout) uses to enqueue work, or alternatively switch to a
thread-based timeout/signal approach for CPU-bound tasks—ensure you stop passing
bound instances into Process/wrapper and instead reference worker-local
generators by name, and update execute_with_timeout/wrapper to accept only
serializable data payloads and return status via IPC (Queue) without pickling
model objects.
---
Nitpick comments:
In `@backend/server.py`:
- Around line 460-464: The bare except in the document retrieval handler masks
root causes; modify the except Exception block that returns
jsonify({"error":"Failed to retrieve document
content","code":"document_fetch_error"}) to catch/log the real exception: add
logging (e.g., use the existing logger or app.logger) to record the exception
and stacktrace, and optionally narrow catches to known error types
(auth/network/NotFound) while still returning the same generic JSON error to the
client; ensure you reference the same exception handling block around the
document fetch logic so the logged error includes exception info and context
(document id/request params).
- Around line 47-48: The except block that currently does `except Exception as
e: q.put(("error", str(e)))` should preserve exception type and traceback for
diagnostics: import the traceback module and change the payload sent to the
queue from a single string to include the exception class name and the formatted
traceback (e.g., include type(e).__name__ and traceback.format_exc()) so callers
receiving `q.put(...)` (the error message producer in backend/server.py) get
full context for debugging.
- Around line 308-316: The ThreadPoolExecutor is just spawning subprocesses via
execute_with_timeout, adding unnecessary threading overhead and exposing
pickling problems for ML model instances; replace the ThreadPoolExecutor block
that submits run_mcq/run_boolq/run_shortq with a process-based executor (e.g.,
concurrent.futures.ProcessPoolExecutor or multiprocessing.Pool) and ensure the
target call uses execute_with_timeout in the worker process. Also make the tasks
picklable by loading any ML model instances inside the worker process (move
model initialization out of module/global scope into the body of
run_mcq/run_boolq/run_shortq or implement custom __getstate__/__setstate__ for
model objects) so the ProcessPoolExecutor can serialize the tasks correctly.
Ensure you reference/modify run_mcq, run_boolq, run_shortq, and
execute_with_timeout accordingly.
- Around line 581-588: The code redundantly uses qapair.get("options") or []
after already checking "options" in qapair and qapair["options"] is truthy;
replace that line with a direct access (options = qapair["options"]) to simplify
the logic and keep behavior identical, leaving the rest of the block
(valid_options, choices, choices_list) unchanged.
backend/server.py (Outdated)
```diff
 output = qg.generate(
-    article=input_text, num_questions=input_questions, answer_style="sentences"
+    article=input_text,
+    num_questions=len(input_questions),
+    answer_style="sentences"
 )
```
Missing timeout protection for qg.generate() call.
The /get_shortq_hard, /get_mcq_hard, and /get_boolq_hard endpoints call qg.generate() directly without timeout protection, while other ML endpoints use execute_with_timeout. This inconsistency could allow these endpoints to hang indefinitely.
Additionally, these endpoints lack input text length validation (unlike /get_mcq, /get_boolq, /get_shortq which validate 10-50000 chars).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 663 - 667, The qg.generate(...) call in the
hard-question endpoints (/get_shortq_hard, /get_mcq_hard, /get_boolq_hard) lacks
timeout protection and input-length validation; wrap the qg.generate invocation
with the same execute_with_timeout utility used elsewhere (use the same timeout
value) and enforce the same input text length checks (10-50000 chars) before
calling it; update the handler functions that call qg.generate to validate the
incoming article text and then call execute_with_timeout(() => qg.generate(...))
so the endpoint returns a timeout/error consistently instead of hanging.
Actionable comments posted: 3
♻️ Duplicate comments (1)
backend/server.py (1)
42-52: ⚠️ Potential issue | 🔴 Critical
Per-request multiprocessing with bound model methods is still a high-risk execution model.

`execute_with_timeout` spins a new `Process` for each request while passing model-bound callables. This is expensive and can break under spawn-based start methods. Prefer long-lived worker processes (initialized once with models) and submit lightweight jobs with per-task timeouts.

```bash
#!/bin/bash
set -euo pipefail

echo "1) Locate timeout executor and process primitives"
rg -n "def execute_with_timeout|Process\(|Queue\(|set_start_method|get_context" backend/server.py
echo

echo "2) Locate generator singletons and timeout callsites"
rg -n "MCQGen =|BoolQGen =|ShortQGen =|qg =|execute_with_timeout\(" backend/server.py
echo

echo "3) AST check: nested function used as Process target"
python - <<'PY'
import ast, pathlib
p = pathlib.Path("backend/server.py")
tree = ast.parse(p.read_text())
for fn in [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef) and n.name == "execute_with_timeout"]:
    nested = [n.name for n in fn.body if isinstance(n, ast.FunctionDef)]
    print("nested defs in execute_with_timeout:", nested)
    for n in ast.walk(fn):
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name) and n.func.id == "Process":
            print("Process() call line:", n.lineno)
PY
```

Also applies to: 141-145, 186-190, 232-236, 288-306, 677-683, 737-743, 796-802
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 42 - 52, The current execute_with_timeout creates a new Process per request using a nested wrapper and passes bound model methods (wrapper, Process, Queue), which is expensive and unsafe with spawn-based start; refactor to a long-lived worker pool or persistent worker Process(es) that are initialized once with model instances and accept lightweight task messages (use a manager Queue or multiprocessing.Pool/ProcessPoolExecutor with apply_async) instead of spawning per call, remove the nested wrapper as a Process target and instead send task descriptors (function name + args) or callables that the worker resolves against its local model, and update callsites (MCQGen, BoolQGen, ShortQGen, qg and all execute_with_timeout usages) to submit tasks to the worker pool and enforce per-task timeouts via the pool’s timeout support or Future.result(timeout=...) rather than creating new Process objects each request.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 47-48: Replace instances where the code enqueues raw exception
text via q.put(("error", str(e))) with a stable public error indicator (for
example q.put(("error", {"code":"internal_server_error","message":"An internal
error occurred"}))) and log the full exception details server-side using
logging.exception or logger.exception so stack traces stay in server logs only;
locate all occurrences of q.put(("error", str(e))) in server.py (the q variable
and its error enqueue sites) and apply the same mapping so API responses never
contain raw exception strings but the server logs retain the detailed exception
information.
- Around line 461-465: The error handler in the /get_content endpoint currently
catches all exceptions and returns a 400, which hides server errors; update the
exception handling in the function that serves /get_content (the try/except
block around document retrieval) to explicitly catch expected client/validation
errors (e.g., ValueError, werkzeug.exceptions.BadRequest) and return a 400 with
the existing error JSON, and for all other unexpected exceptions log the full
exception (including stack trace) and return a 500 response with code
"internal_server_error" using jsonify; ensure you reference the same response
shape and use the existing logger instead of swallowing the exception.
- Around line 338-343: The handler that returns the "All generators failed"
response currently always uses HTTP 500; update the logic in the /get_problems
response where mcq_err, bool_err, and short_err are checked so that if all three
indicate a timeout (e.g., mcq_err == "timeout" && bool_err == "timeout" &&
short_err == "timeout" or using your TIMEOUT_ERROR constant), return the JSON
error with HTTP 504 instead of 500; otherwise keep the existing 500 return path.
Reference variables: mcq_err, bool_err, short_err and the existing
jsonify({...}) return.
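A small sketch of that status-code selection, assuming the error variables hold the string "timeout" as produced by the executor above (variable names are the hypothetical ones from the prompt):

```python
# mcq_err, bool_err, short_err come from the three execute_with_timeout calls.
if mcq_err and bool_err and short_err:
    all_timed_out = mcq_err == bool_err == short_err == "timeout"
    status = 504 if all_timed_out else 500
    return jsonify({"error": "All generators failed"}), status
```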
---
Duplicate comments:
In `@backend/server.py`:
- Around line 42-52: The current execute_with_timeout creates a new Process per
request using a nested wrapper and passes bound model methods (wrapper, Process,
Queue), which is expensive and unsafe with spawn-based start; refactor to a
long-lived worker pool or persistent worker Process(es) that are initialized
once with model instances and accept lightweight task messages (use a manager
Queue or multiprocessing.Pool/ProcessPoolExecutor with apply_async) instead of
spawning per call, remove the nested wrapper as a Process target and instead
send task descriptors (function name + args) or callables that the worker
resolves against its local model, and update callsites (MCQGen, BoolQGen,
ShortQGen, qg and all execute_with_timeout usages) to submit tasks to the worker
pool and enforce per-task timeouts via the pool’s timeout support or
Future.result(timeout=...) rather than creating new Process objects each
request.
Addressed Issues:
Fixes #592
Screenshots/Recordings:
N/A (Backend improvement – no UI changes)
Additional Notes:
This PR introduces robust timeout handling for ML endpoints to prevent indefinite execution and resource exhaustion.
Key improvements include:

- A subprocess-based timeout executor wrapping all ML generator calls
- Per-endpoint rate limiting and a 2 MB payload cap
- Centralized JSON error handlers and robust input validation
- Parallelized generation for /get_problems
- YouTube subtitle retrieval via yt-dlp

This ensures better performance, stability, and user experience.
AI Usage Disclosure:
I have used the following AI models and tools:
Checklist
Summary by CodeRabbit
New Features
Improvements