Implement Distributed Asynchronous Inference Pipeline using Celery and Redis #587
piyush06singhal wants to merge 11 commits into AOSSIE-Org:main from
Conversation
📝 Walkthrough

Adds a Redis-backed Celery async inference pipeline: Celery app and worker, Celery tasks for generation/answering/hardening, Flask async endpoints (enqueue/status/result) and sync wrappers, dev Dockerfiles/compose and start scripts, env template, requirements additions, and README/tooling updates for async vs legacy modes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant Flask as "Flask API"
    participant Redis as "Redis Broker/Backend"
    participant Worker as "Celery Worker"
    participant Models as "AI Generators"
    Client->>Flask: POST /generate_mcq_async (input_text, params)
    Note over Flask: validate input, enqueue task
    Flask->>Redis: Enqueue task (task_id)
    Flask-->>Client: 202 Accepted (task_id)
    rect rgba(100,150,200,0.5)
        Redis->>Worker: Worker fetches task
        Worker->>Models: Lazy-load generators
        Worker->>Models: Run inference (MCQ/BoolQ/ShortQ)
        Worker->>Redis: Store result / mark task success
    end
    Client->>Flask: GET /task_status/{task_id}
    Flask->>Redis: Query task state
    Redis-->>Flask: Return state
    Flask-->>Client: 200 OK (state)
    Client->>Flask: GET /task_result/{task_id}
    Flask->>Redis: Retrieve result
    Redis-->>Flask: Return result
    Flask-->>Client: 200 OK (questions)
```
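The client side of the flow above can be sketched as a small polling helper. This is an illustrative sketch, not code from the PR; the `get_status` callable (which would wrap `GET /task_status/{task_id}`) and the timeout values are assumptions:

```python
import time

def poll_task(get_status, interval=1.0, timeout=30.0):
    """Call get_status() until the task leaves a non-terminal Celery state.

    get_status is assumed to return a Celery state string such as
    "PENDING", "STARTED", "SUCCESS", or "FAILURE".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status()
        # PENDING/STARTED/RETRY are non-terminal; keep polling.
        if state not in ("PENDING", "STARTED", "RETRY"):
            return state
        time.sleep(interval)
    raise TimeoutError("task did not finish within the timeout")
```

Once the returned state is `SUCCESS`, the client would fetch the payload from `GET /task_result/{task_id}`.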
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 50-51: The handlers that currently do logger.error(f"Error
creating MCQ task: {str(e)}", exc_info=True) followed by return
jsonify({"error": str(e)}), 500 should stop returning raw exception text to
clients; keep the detailed str(e) in logs (using logger.error(...,
exc_info=True)) but return a sanitized JSON error (e.g., jsonify({"error":
"Internal server error"}), 500) or a short safe message and optionally a
correlation id. Update all similar branches (the ones around logger.error /
return jsonify at the MCQ/task creation handlers and the other occurrences you
noted) so they log the full exception but return a generic message or an error
code to the client; use the existing logger.error and the handler function names
to locate each spot.
- Around line 34-35: The numeric parameters read via data.get (e.g.,
max_questions at the shown occurrence and the other numeric params at the other
occurrences) must be parsed to integers, validated and clamped before
enqueueing: replace direct uses of data.get("...") with parsing logic that
attempts int(...) and on ValueError returns a 400 error, then enforce a safe
min/max range (e.g., min 1, sensible upper cap) and use the clamped value;
update places where max_questions and the other numeric parameters are used so
they consume the validated/clamped integers (ensure validation lives in the
route handler that prepares the task payload, not inside the worker).
In `@backend/tasks/inference_tasks.py`:
- Line 8: The import uses the wrong top-level module name; replace occurrences
of "from celery_worker import celery_app" with an explicit package import "from
backend.celery_worker import celery_app" so the module resolves from the package
root; update this in backend/tasks/inference_tasks.py (symbol: celery_app) and
the other backend modules mentioned (server.py and async_generation_routes.py)
to consistently use the backend.* namespace.
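The error-sanitization fix suggested in the first inline comment (log the full exception, return a generic message) could be sketched as below. The helper name and the correlation-id field are illustrative assumptions, not code from the PR:

```python
import logging
import uuid

logger = logging.getLogger(__name__)

def safe_error_response(exc):
    """Log the full exception server-side; return only a generic payload for clients."""
    correlation_id = str(uuid.uuid4())
    # Full detail (including str(exc) and the traceback) stays in the server logs.
    logger.error("Task creation failed [%s]: %s", correlation_id, exc, exc_info=True)
    # Clients get a generic message plus an id they can quote in bug reports.
    return {"error": "Internal server error", "correlation_id": correlation_id}, 500
```

In a Flask handler this would replace `return jsonify({"error": str(e)}), 500` with `body, status = safe_error_response(e)` followed by `return jsonify(body), status`.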
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1276eb2e-d8bc-4359-9b49-b60c50096e1b
📒 Files selected for processing (7)
backend/celery_worker.py, backend/routes/__init__.py, backend/routes/async_generation_routes.py, backend/server.py, backend/tasks/__init__.py, backend/tasks/inference_tasks.py, requirements.txt
…es for async endpoints
Actionable comments posted: 2
🧹 Nitpick comments (6)
backend/tasks/inference_tasks.py (3)
97-99: Use explicit conversion flag in f-string. Per Ruff RUF010, `f"{str(e)}"` should use the explicit `!s` conversion flag instead of wrapping with `str()`.

♻️ Suggested fix

```diff
     except Exception as e:
-        logger.error(f"Error in MCQ generation: {str(e)}", exc_info=True)
+        logger.error(f"Error in MCQ generation: {e!s}", exc_info=True)
         raise
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tasks/inference_tasks.py` around lines 97-99, replace the explicit str() conversion inside the f-string in the exception handler with the !s conversion flag; locate the except block around MCQ generation that reads "except Exception as e:" (using the logger variable) and change the log call to use f"{e!s}" instead of f"{str(e)}" so it complies with Ruff RUF010 and preserves the same message and exc_info behavior.
16-20: Remove unused module-level globals. These globals (`MCQGen`, `BoolQGen`, `ShortQGen`, `mediawikiapi`) are declared but never used. The `GeneratorTask` class manages generator instances via instance attributes (`_mcq_gen`, `_boolq_gen`, etc.) with lazy initialization through properties.

♻️ Suggested fix

```diff
 logger = logging.getLogger(__name__)

-# Initialize generators (these will be loaded once per worker)
-MCQGen = None
-BoolQGen = None
-ShortQGen = None
-mediawikiapi = None
-
 class GeneratorTask(Task):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tasks/inference_tasks.py` around lines 16-20, remove the unused module-level globals MCQGen, BoolQGen, ShortQGen, and mediawikiapi from the top of the file and rely on the GeneratorTask instance attributes (_mcq_gen, _boolq_gen, _shortq_gen, etc.) and their lazy-initializing properties for generator access; ensure no other code in this module references the removed globals (update any stray references to use the GeneratorTask properties or instance fields) so all generator management is encapsulated in the GeneratorTask class.
6-6: Unused import: `torch`. The `torch` module is imported but never used in this file. The GPU memory management (`torch.cuda.empty_cache()`) is handled inside the generator classes themselves.

♻️ Suggested fix

```diff
 import logging
-import torch

 from celery import Task
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tasks/inference_tasks.py` at line 6, remove the unused top-level import of torch; specifically delete the statement "import torch" at the top of the file so that the module no longer imports torch unnecessarily (GPU memory management and torch.cuda.empty_cache() remain handled inside the generator classes that manage their own cleanup).

backend/routes/async_generation_routes.py (3)
67-70: Minor: Use explicit conversion flag in f-strings. Multiple lines use the `f"{str(e)}"` pattern (lines 67, 70, 100, 103, 133, 136, 174, 177, 219, 274). Per Ruff RUF010, prefer `f"{e!s}"` for cleaner code.

Example fix:

```diff
-    logger.warning(f"Validation error in MCQ task: {str(e)}")
+    logger.warning(f"Validation error in MCQ task: {e!s}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/async_generation_routes.py` around lines 67-70, replace occurrences of f"{str(e)}" with the explicit string-conversion f"{e!s}" in the exception handling and logging around MCQ task creation and other handlers to satisfy Ruff RUF010; specifically update the logger.warning, logger.error and any jsonify error responses that use f"{str(e)}" so they use f"{e!s}" instead.
26-34: Preserve exception context with `raise ... from`. When re-raising a `ValueError` from within the `except` block, the original exception context is lost. Use `raise ... from` to preserve the exception chain for better debugging.

♻️ Suggested fix

```diff
 try:
     value = int(data.get(key, default))
 except (TypeError, ValueError):
-    raise ValueError(f"{key} must be an integer")
+    raise ValueError(f"{key} must be an integer") from None
```

Using `from None` explicitly suppresses the chained exception (since the user-facing message is intentionally generic), while `from err` would chain it if you want debugging context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/async_generation_routes.py` around lines 26-34, the except block currently swallows the original TypeError/ValueError; modify the try/except that parses int(...) so you capture the caught exception (e.g., except (TypeError, ValueError) as err:) and re-raise the new ValueError using "raise ValueError(f\"{key} must be an integer\") from err" (or use "from None" if you intentionally want to suppress chaining). Update the try/except around int(data.get(key, default)) accordingly so the exception chain is preserved or explicitly suppressed as desired.
47-51: Consider validating the `use_mediawiki` parameter. The `use_mediawiki` parameter is passed directly without validation. While the task safely handles it (via an `== 1` comparison), validating at the API boundary would provide clearer error messages for invalid inputs.

♻️ Suggested pattern

```diff
 input_text = data.get("input_text", "")
-use_mediawiki = data.get("use_mediawiki", 0)
+use_mediawiki = 1 if data.get("use_mediawiki") in (1, True, "1") else 0
 max_questions = _parse_bounded_int(data, "max_questions", 4)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/async_generation_routes.py` around lines 47-51, the handler reads use_mediawiki from the JSON payload without validation; add input validation for use_mediawiki right after request.get_json(silent=True) so the variable is guaranteed to be 0 or 1 (accept true/false booleans and numeric "0"/"1" strings by coercion), and return a 400 BadRequest with a clear message when the value cannot be parsed; update the local variable use_mediawiki and any downstream checks that expect an int so the rest of the function can safely use comparisons like use_mediawiki == 1.
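The coercion in the suggested pattern above can be factored into a small helper. Returning `None` for unparseable values, so the route can answer with a 400, is an assumption layered on top of the comment rather than the PR's actual behavior:

```python
def normalize_flag(value):
    """Coerce common truthy/falsy encodings to 1/0; None signals an invalid value."""
    if value in (1, True, "1", "true", "True"):
        return 1
    if value in (0, False, "0", "false", "False", None):
        return 0
    return None  # caller should return a 400 Bad Request
```

With this helper, `use_mediawiki = normalize_flag(data.get("use_mediawiki"))` guarantees the worker's `use_mediawiki == 1` check sees a consistent integer.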
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 193-195: The runtime imports "from celery_worker import
celery_app" inside get_task_status and get_task_result are inconsistent and will
cause ModuleNotFoundError; move the import to module-level using the correct
module path (import celery_app from backend.celery_worker) and remove the
in-function imports in the get_task_status and get_task_result functions so both
functions reference the module-level celery_app; update any existing top-level
import section to include "from backend.celery_worker import celery_app" and
delete the two runtime import lines.
In `@backend/tasks/inference_tasks.py`:
- Around line 183-216: Remove the unnecessary f-string prefixes on the two
logger.info calls (the one before processing and the one after result creation)
so they are plain strings, and change the error logging to use logging's
explicit formatting or exception helper: replace logger.error(f"Error in
combined generation: {str(e)}", exc_info=True) with either logger.error("Error
in combined generation: %s", e, exc_info=True) or logger.exception("Error in
combined generation") — update the try/except block that calls
self.mcq_gen.generate_mcq, self.boolq_gen.generate_boolq, and
self.shortq_gen.generate_shortq accordingly.
---
Nitpick comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 67-70: Replace occurrences of f"{str(e)}" with the explicit
string-conversion f"{e!s}" in the exception handling and logging around MCQ task
creation and other handlers to satisfy Ruff RUF010; specifically update the
logger.warning, logger.error and any jsonify error responses that use
f"{str(e)}" (e.g., inside the exception blocks in async_generation_routes.py
handling MCQ task creation and similar try/except handlers) so they use f"{e!s}"
instead.
- Around line 26-34: The except block currently swallows the original
TypeError/ValueError; modify the try/except that parses int(...) so you capture
the caught exception (e.g., except (TypeError, ValueError) as err:) and re-raise
the new ValueError using "raise ValueError(f\"{key} must be an integer\") from
err" (or use "from None" if you intentionally want to suppress chaining). Update
the try/except around int(data.get(key, default)) accordingly so the exception
chain is preserved or explicitly suppressed as desired.
- Around line 47-51: The handler reads use_mediawiki from the JSON payload
without validation—add input validation for use_mediawiki in
async_generation_routes.py right after request.get_json(silent=True) so the
variable is guaranteed to be 0 or 1 (accept true/false booleans and numeric
"0"/"1" strings by coercion), and return a 400 BadRequest with a clear message
when the value cannot be parsed; update the local variable use_mediawiki and any
downstream checks that expect an int (the name use_mediawiki is the unique
symbol to change) so the rest of the function can safely use comparisons like
use_mediawiki == 1.
In `@backend/tasks/inference_tasks.py`:
- Around line 97-99: Replace the explicit str() conversion inside the f-string
in the exception handler with the !s conversion flag; locate the except block
around MCQ generation that reads "except Exception as e:" (using the logger
variable) and change the log call to use f"{e!s}" instead of f"{str(e)}" so it
complies with Ruff RUF010 and preserves the same message and exc_info behavior.
- Around line 16-20: Remove the unused module-level globals MCQGen, BoolQGen,
ShortQGen, and mediawikiapi from the top of the file and rely on the
GeneratorTask instance attributes (_mcq_gen, _boolq_gen, _shortq_gen, etc.) and
their lazy-initializing properties for generator access; ensure no other code in
this module references the removed globals (update any stray references to use
the GeneratorTask properties or instance fields) so all generator management is
encapsulated in the GeneratorTask class.
- Line 6: Remove the unused top-level import of torch in inference_tasks.py;
specifically delete the import statement "import torch" at the top of the file
so that the module no longer imports torch unnecessarily (GPU memory management
and torch.cuda.empty_cache() remain handled inside the generator classes that
manage their own cleanup).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bdda85f4-0851-4060-b89e-900855ddb3fe
📒 Files selected for processing (2)
backend/routes/async_generation_routes.py, backend/tasks/inference_tasks.py
Actionable comments posted: 1
🧹 Nitpick comments (5)
backend/tasks/inference_tasks.py (3)
123-125: Use explicit conversion flag for consistency. Line 124 uses `{str(e)}` while other error handlers in this file use `{e!s}`. Apply consistent style.

♻️ Suggested fix

```diff
     except Exception as e:
-        logger.error(f"Error in BoolQ generation: {str(e)}", exc_info=True)
+        logger.error(f"Error in BoolQ generation: {e!s}", exc_info=True)
         raise
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tasks/inference_tasks.py` around lines 123-125, the log call in the BoolQ error handler uses inconsistent exception formatting; update the logger.error invocation in the except block that handles BoolQ generation so it uses the explicit conversion flag `{e!s}` instead of `str(e)` to match other handlers and maintain consistent style across the file.
54-59: Consider adding error handling for MediaWiki API failures. The `mediawiki_instance.summary()` call can fail due to network issues, rate limiting, or invalid page titles. Currently, such failures would cause the entire task to fail. Consider catching MediaWiki-specific exceptions and either falling back to the original input or raising a more descriptive error:

💡 Suggested pattern

```diff
 def process_input_text(input_text, use_mediawiki, mediawiki_instance):
     """Process input text, optionally fetching from MediaWiki."""
     if use_mediawiki == 1:
         logger.info(f"Fetching MediaWiki summary for: {input_text}")
-        input_text = mediawiki_instance.summary(input_text, 8)
+        try:
+            input_text = mediawiki_instance.summary(input_text, 8)
+        except Exception as e:
+            logger.warning(f"MediaWiki fetch failed, using original input: {e!s}")
+            # Optionally re-raise or return original input_text as fallback
     return input_text
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tasks/inference_tasks.py` around lines 54-59, the MediaWiki summary call in process_input_text can raise network/API errors; wrap the mediawiki_instance.summary(input_text, 8) call in a try/except that catches MediaWiki-specific exceptions (and a general Exception fallback), log the error via logger.error with the input_text and exception details, and on failure either return the original input_text (fallback behavior) or re-raise a more descriptive exception; keep the behavior consistent with existing logging/error conventions.
156-158: Use explicit conversion flag for consistency. Same as above; use `{e!s}` instead of `{str(e)}`.

♻️ Suggested fix

```diff
     except Exception as e:
-        logger.error(f"Error in ShortQ generation: {str(e)}", exc_info=True)
+        logger.error(f"Error in ShortQ generation: {e!s}", exc_info=True)
         raise
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tasks/inference_tasks.py` around lines 156-158, replace the explicit str(e) conversion in the exception log with the conversion flag `{e!s}`; locate the except block handling ShortQ generation where logger.error is called and change the formatted string to use `{e!s}` instead of `str(e)` so the message becomes f"Error in ShortQ generation: {e!s}" while keeping exc_info=True and re-raising the exception.

backend/routes/async_generation_routes.py (2)
236-241: Consider clarifying behavior for unknown task IDs. Celery's `AsyncResult` returns the `PENDING` state both for genuinely pending tasks and for unknown/invalid task IDs, so clients cannot distinguish between "task is queued" and "task ID doesn't exist". This is known Celery behavior and may be acceptable, but consider documenting it in the API response or docstring so clients understand the ambiguity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/async_generation_routes.py` around lines 236-241, when task_result.state == 'PENDING' this can mean either a queued task or an unknown/invalid task_id; update the route handler in async_generation_routes (the block that checks task_result.state and returns 202) to make that ambiguity explicit by changing the response message to something like "Task is waiting to be executed or the task_id is unknown/invalid", and also add a brief docstring/comment on the route explaining Celery's PENDING semantics so clients understand the ambiguity.
71: Consider using consistent f-string conversion style. Several logging statements use `{str(e)}` while others use `{e!s}`. For consistency, prefer the explicit conversion flag `{e!s}` throughout. Affected lines: 71, 101, 104, 134, 137, 175, 178, 219, 273.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/async_generation_routes.py` at line 71, replace inconsistent f-string exception conversion by using the explicit conversion flag `{e!s}` wherever logger.error currently interpolates exceptions with `{str(e)}`; locate the logger.error calls that format exception variable e (e.g., the one logging "Error creating MCQ task") and change f"...{str(e)}" to f"...{e!s}" so all logs use the same style while keeping exc_info=True intact.
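One way to surface the PENDING ambiguity flagged above is an explicit state-to-response mapping. This sketch assumes plain tuples of status code and message rather than the PR's actual response shape:

```python
def status_payload(state):
    """Map a Celery task state to an HTTP status code and message.

    Celery reports PENDING both for queued tasks and for unknown task ids,
    so the message must cover both cases.
    """
    if state == "PENDING":
        return 202, "Task is waiting to be executed, or the task_id is unknown"
    if state in ("STARTED", "RETRY"):
        return 202, "Task is in progress"
    if state == "SUCCESS":
        return 200, "Task completed"
    return 500, "Task failed"
```

A route could then do `code, msg = status_payload(task_result.state)` and build its JSON response from the pair.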
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routes/async_generation_routes.py`:
- Line 84: The endpoints BoolQ, ShortQ, and generate_all currently assign
use_mediawiki with data.get("use_mediawiki", 0) which can leave it as True or
"1" causing process_input_text's check if use_mediawiki == 1 to miss them;
normalize use_mediawiki the same way as MCQ (e.g., use_mediawiki = 1 if
data.get("use_mediawiki") in (1, True, "1") else 0) in each handler so
process_input_text sees a consistent integer 0/1 value; update the assignments
in the BoolQ, ShortQ, and generate_all handlers (where use_mediawiki is read) to
match the MCQ normalization and keep process_input_text unchanged.
---
Nitpick comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 236-241: When task_result.state == 'PENDING' this can mean either
a queued task or an unknown/invalid task_id; update the route handler in
async_generation_routes (the block that checks task_result.state and returns
202) to make that ambiguity explicit by changing the response message to
something like "Task is waiting to be executed or the task_id is
unknown/invalid", and also add a brief docstring/comment on the route explaining
Celery's PENDING semantics so clients understand the ambiguity (refer to
task_result and task_id in the comment).
- Line 71: Replace inconsistent f-string exception conversion by using the
explicit conversion flag {e!s} wherever logger.error currently interpolates
exceptions with {str(e)}; locate logger.error calls in
async_generation_routes.py that format exception variable e (e.g., the one
logging "Error creating MCQ task") and change f"...{str(e)}" to f"...{e!s}" so
all logs use the same {e!s} style while keeping exc_info=True intact.
In `@backend/tasks/inference_tasks.py`:
- Around line 123-125: The log call in the BoolQ error handler uses inconsistent
exception formatting; update the logger.error invocation in the except block
that handles BoolQ generation so it uses the same explicit conversion flag
`{e!s}` instead of `str(e)`, e.g., change the f-string in the except Exception
as e block (the logger.error line) to use `{e!s}` to match other handlers and
maintain consistent style across the file.
- Around line 54-59: The MediaWiki summary call in process_input_text can raise
network/API errors; wrap the mediawiki_instance.summary(input_text, 8) call in a
try/except that catches MediaWiki-specific exceptions (and a general Exception
fallback), log the error via logger.error with the input_text and exception
details, and on failure either return the original input_text (fallback
behavior) or re-raise a more descriptive exception; ensure you reference
process_input_text and mediawiki_instance.summary when making the change and
keep the behavior consistent with existing logging/error conventions.
- Around line 156-158: Replace the explicit str(e) conversion in the exception
log with the explicit conversion flag `{e!s}` to ensure consistent
stringification; locate the except block handling "ShortQ generation" where
logger.error is called (the line using logger.error(f"Error in ShortQ
generation: {str(e)}", exc_info=True)), and change the formatted string to use
`{e!s}` instead of `str(e)` so the message becomes f"Error in ShortQ generation:
{e!s}" while keeping exc_info=True and re-raising the exception.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 76670ed4-8fcd-4570-8e99-7071a03a7b56
📒 Files selected for processing (2)
backend/routes/async_generation_routes.py, backend/tasks/inference_tasks.py
@coderabbitai review

✅ Actions performed: Review triggered.

@coderabbitai resume

✅ Actions performed: Reviews resumed.
Actionable comments posted: 2
🧹 Nitpick comments (1)
backend/routes/async_generation_routes.py (1)
71: Replace `{str(e)}` with `{e!s}` (or use logger `%s` formatting) to resolve Ruff RUF010 warnings. All 9 instances in the file use redundant `str()` calls inside f-strings. Using the `{e!s}` conversion flag or switching to logger's `%s` formatting is cleaner and aligns with Python logging best practices. Affected lines: 71, 101, 104, 134, 137, 175, 178, 219, 273.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/async_generation_routes.py` at line 71, Replace redundant str(e) calls inside f-strings used in logger.error across async_generation_routes.py (the logger.error lines that log exceptions such as the "Error creating MCQ task" call and the eight other occurrences) by either using the conversion flag {e!s} inside the f-string or, preferably, switch to logger's %-style formatting (e.g., logger.error("... %s", e, exc_info=True)); update each logger.error invocation (the nine instances flagged) accordingly so no str(e) calls remain and keep exc_info=True as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 49-55: The code assumes input_text is a string and calls .strip()
directly; replace the raw access from request.get_json(...) and manual strip
checks with a call to the existing _parse_input_text(...) helper to validate
type and enforce a payload size cap before enqueueing. Specifically, in
backend/routes/async_generation_routes.py use _parse_input_text(data,
"input_text") (or the function signature used elsewhere) instead of
data.get("input_text", "") in the BoolQ, ShortQ and combined endpoints, ensure
invalid types or oversize payloads return a 400 JSON error, and keep
use_mediawiki and max_questions parsing unchanged.
In `@backend/tasks/inference_tasks.py`:
- Around line 58-62: The code in process_input_text has invalid literal '+'
characters at the start of lines (around the mediawiki_instance.summary call and
the except block) which breaks import; remove those leading '+' characters and
fix indentation so the try/except is a normal Python block (call
mediawiki_instance.summary(input_text, 8) inside the try, catch Exception as e
and log via logger.warning with the error), preserving the fallback behavior of
using the original input_text and not re-raising unless intended; update the
try/except around mediawiki_instance.summary to match surrounding function
indentation and ensure the module can import and Celery tasks register.
---
Nitpick comments:
In `@backend/routes/async_generation_routes.py`:
- Line 71: Replace redundant str(e) calls inside f-strings used in logger.error
across async_generation_routes.py (the logger.error lines that log exceptions
such as the "Error creating MCQ task" call and the eight other occurrences) by
either using the conversion flag {e!s} inside the f-string or, preferably,
switch to logger's %-style formatting (e.g., logger.error("... %s", e,
exc_info=True)); update each logger.error invocation (the nine instances
flagged) accordingly so no str(e) calls remain and keep exc_info=True as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1fc0e47b-29df-443b-947f-0eef6a2fab07
📒 Files selected for processing (2)
backend/routes/async_generation_routes.py, backend/tasks/inference_tasks.py
Hi @piyush06singhal, this is a very ambitious and well-written PR. Moving to an async Celery pipeline is definitely the right long-term play for scaling inference. From a repository health and architectural perspective, I have a few vital questions before we integrate this into main:

Would love to discuss how we can safely stage this massive architectural shift without breaking the local dev environment.
Hi @SxBxcoder, thank you for the detailed review; I really appreciate you taking the time to go through this so thoroughly. Regarding your points:

1. Local Setup Friction

2. VRAM / OOM Concerns

To address this, I see two possible improvements:

Happy to align this with the existing architecture to avoid memory overhead.

3. Frontend Compatibility

The async endpoints are introduced as opt-in, allowing gradual frontend migration without breaking existing functionality. I agree this is a significant architectural shift, and I'm open to staging it more safely.

Let me know how you'd prefer to proceed. Happy to iterate on this. Thanks again for the insightful feedback!
That staged approach sounds good, @piyush06singhal. Adding the minimal docker-compose setup and ensuring the Flask container doesn't duplicate the ML models in memory will make this bulletproof for local devs. Looking forward to seeing the updated iteration; feel free to ping me when it's ready for another look.
…fe rollout toggle
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
208-288: ⚠️ Potential issue | 🔴 Critical: Unresolved merge conflict must be fixed before merging.

The README contains unresolved git merge conflict markers (`<<<<<<< HEAD`, `=======`, `>>>>>>> 8b154d5`). This will render incorrectly and confuse users. Both sections contain valuable content: the LLM-based question generation documentation and the troubleshooting/async endpoints documentation should both be preserved.
🔧 Resolve by keeping both sections

Remove the conflict markers and keep both sections:

````diff
-<<<<<<< HEAD
 If the script fails to run, ensure that you have execution permissions:

 ```bash
 chmod +x script.sh
 ```

 ## LLM-Based Question Generation
 ... (keep this entire section)
-=======
+
+---
+
 ### Issue: "Celery is not available" error
 ... (keep this entire section through the async/sync endpoints docs)
->>>>>>> 8b154d5 (improve async pipeline with memory optimization, docker setup, and safe rollout toggle)
````
</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@README.mdaround lines 208 - 288, The README contains unresolved git merge
conflict markers (<<<<<<< HEAD, =======, >>>>>>> 8b154d5) that must be removed;
resolve by merging the two retained sections so both "LLM-Based Question
Generation" and the "Async vs Sync Endpoints / troubleshooting" content remain
intact, delete the conflict markers and any duplicated lines, ensure the
LLM-Based Question Generation heading and its JSON example and the Async vs Sync
Endpoints block (including the Celery/Redis troubleshooting notes) are present
and formatted consistently under their respective headings.

🧹 Nitpick comments (9)
backend/server.py (1)
284-301: Add `strict=True` to `zip()` for safety.

Using `zip()` without `strict=True` silently truncates if lists have different lengths. While line 276 validates lengths match, an explicit `strict=True` provides defense-in-depth.

♻️ Add strict parameter

```diff
-        for question, options in zip(input_questions, input_options):
+        for question, options in zip(input_questions, input_options, strict=True):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 284 - 301, The loop using zip(input_questions, input_options) should use zip(..., strict=True) to prevent silent truncation if lengths differ; update the loop header to for question, options in zip(input_questions, input_options, strict=True): (the variables input_questions and input_options in the current loop) to provide defense-in-depth alongside the existing length check—ensure the runtime supports Python 3.10+ or provide an explicit length check fallback if not.

Dockerfile.backend (2)
27-33: Sense2Vec conditional check is ineffective during Docker build.

The `if [ ! -d "s2v_old" ]` check will always be true in a fresh build since `COPY backend/` happens just before. The check is harmless but adds no value—consider removing it for clarity, or keep it if you intend to use this Dockerfile with pre-existing volumes.

♻️ Simplify by removing redundant conditional

```diff
 # Download and extract Sense2Vec model if not present
 WORKDIR /app/backend
-RUN if [ ! -d "s2v_old" ]; then \
-    wget https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2015_md.tar.gz && \
+RUN wget https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2015_md.tar.gz && \
     tar -xzf s2v_reddit_2015_md.tar.gz && \
-    rm s2v_reddit_2015_md.tar.gz; \
-    fi
+    rm s2v_reddit_2015_md.tar.gz
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.backend` around lines 27 - 33, The conditional in the RUN step checking for the s2v_old directory is ineffective during image build because COPY backend/ precedes it, so remove the redundant if-check and make the RUN download/extract commands unconditional (update the RUN block that references s2v_old, wget, tar, rm) or, if you need the conditional behavior for runtime-mounted volumes, replace the current conditional with a comment explaining that choice and keep the logic; ensure you modify the RUN line alongside WORKDIR and consider the interaction with the earlier COPY backend/ step.
1-13: Add non-root user and `--no-install-recommends` for security and image size.

Running containers as root is a security risk. Additionally, `--no-install-recommends` reduces image size by avoiding unnecessary packages.

🔒 Proposed security and optimization improvements

```diff
 # Dockerfile for Flask backend server
 FROM python:3.10-slim

 # Set working directory
 WORKDIR /app

+# Create non-root user
+RUN groupadd -r eduaid && useradd -r -g eduaid eduaid
+
 # Install system dependencies
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     gcc \
     g++ \
     git \
     wget \
     && rm -rf /var/lib/apt/lists/*
```

Then before `CMD`, add:

```diff
+# Switch to non-root user
+RUN chown -R eduaid:eduaid /app
+USER eduaid
+
 # Run Flask server
 CMD ["python", "server.py"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.backend` around lines 1 - 13, Update the Dockerfile to avoid running as root and to reduce image size: modify the apt-get install RUN (the line starting with RUN apt-get update && apt-get install -y ...) to add --no-install-recommends and keep the apt-get clean up, create a non-root user/group after WORKDIR /app, chown the /app directory to that user, and switch to that non-root user with USER before the final CMD; preserve the existing FROM python:3.10-slim and current RUN structure but perform root operations only for package install and file ownership, then run the app as the created non-root user.

docker-compose.yml (1)
55-59: `service_started` doesn't guarantee the Celery worker is ready.

The backend depends on `celery-worker: condition: service_started`, but this only waits for the container to start, not for Celery to be ready to accept tasks. Model loading in the worker can take significant time.

Consider adding a healthcheck to the celery-worker service or documenting that initial requests may fail during worker startup.

🩺 Add Celery worker healthcheck

```diff
   celery-worker:
     build:
       context: .
       dockerfile: Dockerfile.worker
     container_name: eduaid-celery-worker
     depends_on:
       redis:
         condition: service_healthy
+    healthcheck:
+      test: ["CMD", "celery", "-A", "celery_worker.celery_app", "inspect", "ping", "-d", "celery@$$HOSTNAME"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
```

Then update the backend dependency:

```diff
     depends_on:
       redis:
         condition: service_healthy
       celery-worker:
-        condition: service_started
+        condition: service_healthy
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 55 - 59, The docker-compose currently uses depends_on with celery-worker: condition: service_started which only waits for container start; add a proper healthcheck to the celery-worker service (e.g., a command that verifies Celery is ready to accept tasks) and then change the backend's depends_on entry to reference celery-worker: condition: service_healthy; alternatively, if you prefer not to add a healthcheck, add documentation/error-handling noting that initial requests may fail until the worker finishes model loading. Ensure you modify the celery-worker service block to include a healthcheck section and update the backend depends_on from service_started to service_healthy.

Dockerfile.worker (1)
1-13: Same security improvements needed as Dockerfile.backend.

Apply the same non-root user and `--no-install-recommends` changes recommended for `Dockerfile.backend` to maintain consistency and security.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.worker` around lines 1 - 13, Update the Dockerfile to mirror the security improvements from Dockerfile.backend: change the apt-get install invocation to use --no-install-recommends in the RUN step that installs gcc, g++, git, wget; create a dedicated non-root user (e.g., appuser), chown the /app working directory to that user, and add a USER appuser directive so the container runs as non-root; keep the existing apt-get update and cleanup (rm -rf /var/lib/apt/lists/*) but ensure the create-user and chown steps occur before switching USER.

backend/inference_service.py (3)
47-54: Missing exception handling for Celery task failures.

`result.get(timeout=600)` can raise various exceptions: `celery.exceptions.TimeoutError` if the task doesn't complete in time, task exceptions propagated from the worker, or connection errors if Redis is unavailable. These will propagate as-is to the caller.

Consider wrapping in a try/except to translate Celery-specific exceptions into application-level errors with clearer messages.

💡 Suggested pattern

```python
from celery.exceptions import TimeoutError as CeleryTimeout

def generate_mcq_sync(input_text: str, max_questions: int = 4, use_mediawiki: int = 0) -> Dict[str, Any]:
    if not CELERY_AVAILABLE:
        raise RuntimeError("Celery is not available. Please ensure Redis and Celery worker are running.")
    try:
        result = generate_mcq_task.apply_async(
            args=[input_text, max_questions, use_mediawiki]
        )
        return result.get(timeout=600)
    except CeleryTimeout:
        raise RuntimeError("Task timed out after 10 minutes") from None
    except Exception as e:
        logger.error(f"Task execution failed: {e!s}", exc_info=True)
        raise RuntimeError("Task execution failed") from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/inference_service.py` around lines 47 - 54, Wrap the synchronous Celery call in generate_mcq_sync (or wherever generate_mcq_task is invoked) with try/except: call generate_mcq_task.apply_async(...) then result.get(timeout=600) inside the try, catch celery.exceptions.TimeoutError to raise a clear application-level RuntimeError like "Task timed out after 10 minutes" (or similar), and catch other exceptions to log via the module logger (include exc_info=True) and re-raise a RuntimeError("Task execution failed") from the original exception; ensure CELERY_AVAILABLE check remains and import celery.exceptions.TimeoutError where used.
11-12: Unused `USE_ASYNC` variable.

`USE_ASYNC` is defined but never used in this module. The file only checks `CELERY_AVAILABLE` to determine functionality. Consider removing it to avoid confusion, or use it to gate whether the sync wrappers should be available.

```diff
-# Check if async mode is enabled
-USE_ASYNC = os.getenv('USE_ASYNC', 'false').lower() in ('true', '1', 'yes')
```
Verify each finding against the current code and only fix it if needed. In `@backend/inference_service.py` around lines 11 - 12, The module defines USE_ASYNC but never uses it, causing confusion; either remove the unused constant or integrate it into the service gating logic so async behavior is actually controlled by it (e.g., combine USE_ASYNC with CELERY_AVAILABLE when deciding whether to expose async wrappers or fall back to sync versions). Update references around CELERY_AVAILABLE and any wrapper export logic so the decision uses both USE_ASYNC and CELERY_AVAILABLE (or delete USE_ASYNC and any related comments) to keep the intent clear.
51-54: Consider extracting hardcoded timeout constant.

The 600-second timeout is repeated across all functions. Extract it to a module-level constant for easier configuration and maintenance.

```python
# At module level
TASK_TIMEOUT_SECONDS = 600  # 10 minutes

# In functions
return result.get(timeout=TASK_TIMEOUT_SECONDS)
```

Also applies to: 72-75, 93-96
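Going one step further (an assumption on my part, not something the PR does), the constant could be read from the environment so deployments can tune it without a code change:

```python
import os

# Hypothetical: allow overriding the Celery result timeout via an env var,
# falling back to the 10-minute default used throughout the module.
TASK_TIMEOUT_SECONDS = int(os.getenv("TASK_TIMEOUT_SECONDS", "600"))
```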
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/inference_service.py` around lines 51 - 54, The repeated hardcoded 600-second timeout used when awaiting Celery task results (e.g., where generate_mcq_task.apply_async is called and result.get(timeout=600) is used) should be pulled into a module-level constant (suggested name TASK_TIMEOUT_SECONDS) and all occurrences updated to use that constant; add the constant at the top of the module and replace every result.get(timeout=600) (including the other similar calls) with result.get(timeout=TASK_TIMEOUT_SECONDS) so the timeout is configurable and maintained in one place.

backend/routes/async_generation_routes.py (1)
295-300: Consider: Celery's PENDING state is ambiguous.

Celery returns `PENDING` for both tasks waiting to execute and for unknown/non-existent task IDs. A client polling for a mistyped or expired task ID will indefinitely receive "pending" status instead of an error.

Consider adding task existence validation (e.g., via Redis key lookup) or documenting this limitation in API responses.
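One hedged way to handle this (a sketch, not part of the PR): record every issued task ID at enqueue time, then treat a `PENDING` state for an unrecorded ID as unknown. The helper below keeps the bookkeeping injectable so the decision logic is testable without Redis; all names are hypothetical:

```python
# Sketch: enqueue-time bookkeeping to disambiguate Celery's PENDING state.
# `known_ids` stands in for persistent storage written at submit time, e.g. a
# Redis key like issued:<task_id> set with a TTL (hypothetical naming).
def classify_status(state: str, task_id: str, known_ids: set) -> tuple:
    """Map a Celery state to an HTTP (body, status_code) pair."""
    if state == "PENDING" and task_id not in known_ids:
        # Celery reports PENDING for task IDs it has never seen.
        return {"error": f"unknown or expired task_id: {task_id}"}, 404
    if state in ("PENDING", "STARTED", "RETRY"):
        return {"status": "pending", "task_id": task_id}, 202
    if state == "SUCCESS":
        return {"status": "completed", "task_id": task_id}, 200
    return {"status": "failed", "task_id": task_id}, 500
```

Note that probing only the `celery-task-meta-<id>` Redis key is not sufficient on its own: with the default Redis result backend that key is written when a task finishes (or starts, with `task_track_started`), so a legitimately queued task also has no key yet.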
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/async_generation_routes.py` around lines 295 - 300, The route currently returns "pending" whenever task_result.state == 'PENDING', but Celery uses PENDING for both queued and unknown task IDs; update the handler that checks task_result (the AsyncResult for task_id) to disambiguate by querying the backend for existence (e.g., use task_result.backend.get_task_meta(task_id) / result_backend.get or check the Redis key for task meta) and if no metadata exists return a 404/errored JSON indicating an unknown/expired task_id, otherwise keep returning 202 pending; update the code paths around task_result and task_id to perform this extra backend lookup and include a clear message when the task ID is not found.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:

In `@backend/server.py`:
- Around line 636-640: Replace the bare except in the CELERY availability check with targeted exception handling: when importing or accessing inference_service.CELERY_AVAILABLE fails, catch ImportError and AttributeError (and optionally ModuleNotFoundError) instead of a bare except, and set health_status["services"]["celery"] = "unavailable"; also consider logging the caught exception for debugging so you can see why inference_service or CELERY_AVAILABLE was unavailable. Use the existing identifiers inference_service, CELERY_AVAILABLE, and health_status to locate the code to change.
- Around line 92-100: The qa_model is being unconditionally instantiated by qa_model = pipeline("question-answering"), which overrides the earlier None set when USE_CELERY_INFERENCE is enabled; move the pipeline instantiation inside the legacy (non-Celery) initialization branch so qa_model remains None when using Celery inference, and adjust any code paths that reference qa_model (notably the /get_mcq_answer handler) to use the Celery path when qa_model is None; specifically locate the qa_model assignments and the LLMQuestionGenerator/MediaWikiAPI initializations and ensure only the legacy mode creates qa_model via pipeline("question-answering"), while the Celery branch leaves qa_model unset and uses the Celery task path.

In `@backend/start-dev.bat`:
- Around line 1-49: The batch script start-dev.bat contains LF-only line endings, which can break Windows batch parsing; convert the file to CRLF line endings and commit that change (either run a tool to replace LF with CRLF for start-dev.bat or add a .gitattributes entry to force eol=crlf for this file), then verify the script (the sections referencing USE_CELERY_INFERENCE, the if /i "%USE_CELERY_INFERENCE%"=="true" block, and the final "python server.py" launch) runs correctly in cmd.exe; optionally remove or replace Unicode emoji characters if you need guaranteed rendering in default Windows consoles.

In `@backend/start-dev.sh`:
- Around line 20-23: Replace the fragile export $(cat .env | grep -v '^#' | xargs) pattern in start-dev.sh with a safe line-by-line loader: when .env exists, read it with while IFS= read -r line; skip blank/comment lines and run export "$line" for each remaining line. This avoids unquoted command-substitution word-splitting, handles values with spaces, and removes the unnecessary cat/grep pipeline; look for the .env loading block and update it accordingly.

In `@backend/tasks/inference_tasks.py`:
- Around line 261-277: The loop using zip(input_questions, input_options) in the method that builds outputs silently truncates when the two lists differ; add explicit length validation before the loop (e.g., check len(input_questions) vs len(input_options) and raise a ValueError with counts) or use zip(..., strict=True) to fail fast; update the block around the for question, options in zip(...) (and any callers that rely on matching lengths) so mismatched inputs are detected and reported rather than silently truncated.
- Around line 444-454: generate_hard_boolq_task currently returns a flat list of strings because it applies make_question_harder to the output of question_gen.generate directly; to match generate_hard_shortq_task and generate_hard_mcq_task you should preserve the dict structure: iterate over the original generated list (from question_gen.generate), replace each item's "question" value with make_question_harder(item["question"]) (or otherwise wrap the hardened string back into the same dict shape), and return the modified list of dicts so consumers get a consistent output shape; reference generate_hard_boolq_task, question_gen.generate, and make_question_harder when making the change.

In `@docker-compose.yml`:
- Around line 42-47: The docker-compose.yml currently uses the deploy.resources memory limits/reservations block (deploy.resources -> limits.memory and deploy.resources -> reservations.memory), which only applies in Docker Swarm; update the compose file to support non-Swarm runs by either adding the legacy keys mem_limit and mem_reservation under the relevant service or document/run instructions to start with docker-compose --compatibility up so memory limits are honored; adjust the service entry that contains deploy.resources to include mem_limit and mem_reservation (or add a README note about using --compatibility) accordingly.

---

Outside diff comments:

In `@README.md`:
- Around line 208-288: The README contains unresolved git merge conflict markers (<<<<<<< HEAD, =======, >>>>>>> 8b154d5) that must be removed; resolve by merging the two retained sections so both "LLM-Based Question Generation" and the "Async vs Sync Endpoints / troubleshooting" content remain intact, delete the conflict markers and any duplicated lines, ensure the LLM-Based Question Generation heading and its JSON example and the Async vs Sync Endpoints block (including the Celery/Redis troubleshooting notes) are present and formatted consistently under their respective headings.

---

Nitpick comments:

In `@backend/inference_service.py`:
- Around line 47-54: Wrap the synchronous Celery call in generate_mcq_sync (or wherever generate_mcq_task is invoked) with try/except: call generate_mcq_task.apply_async(...) then result.get(timeout=600) inside the try, catch celery.exceptions.TimeoutError to raise a clear application-level RuntimeError like "Task timed out after 10 minutes" (or similar), and catch other exceptions to log via the module logger (include exc_info=True) and re-raise a RuntimeError("Task execution failed") from the original exception; ensure the CELERY_AVAILABLE check remains and import celery.exceptions.TimeoutError where used.
- Around line 11-12: The module defines USE_ASYNC but never uses it, causing confusion; either remove the unused constant or integrate it into the service gating logic so async behavior is actually controlled by it (e.g., combine USE_ASYNC with CELERY_AVAILABLE when deciding whether to expose async wrappers or fall back to sync versions). Update references around CELERY_AVAILABLE and any wrapper export logic so the decision uses both USE_ASYNC and CELERY_AVAILABLE (or delete USE_ASYNC and any related comments) to keep the intent clear.
- Around line 51-54: The repeated hardcoded 600-second timeout used when awaiting Celery task results (e.g., where generate_mcq_task.apply_async is called and result.get(timeout=600) is used) should be pulled into a module-level constant (suggested name TASK_TIMEOUT_SECONDS) and all occurrences updated to use that constant; add the constant at the top of the module and replace every result.get(timeout=600) (including the other similar calls) with result.get(timeout=TASK_TIMEOUT_SECONDS) so the timeout is configurable and maintained in one place.

In `@backend/routes/async_generation_routes.py`:
- Around line 295-300: The route currently returns "pending" whenever task_result.state == 'PENDING', but Celery uses PENDING for both queued and unknown task IDs; update the handler that checks task_result (the AsyncResult for task_id) to disambiguate by querying the backend for existence (e.g., use task_result.backend.get_task_meta(task_id) / result_backend.get or check the Redis key for task meta) and if no metadata exists return a 404/errored JSON indicating an unknown/expired task_id, otherwise keep returning 202 pending; update the code paths around task_result and task_id to perform this extra backend lookup and include a clear message when the task ID is not found.

In `@backend/server.py`:
- Around line 284-301: The loop using zip(input_questions, input_options) should use zip(..., strict=True) to prevent silent truncation if lengths differ; update the loop header to for question, options in zip(input_questions, input_options, strict=True): (the variables input_questions and input_options in the current loop) to provide defense-in-depth alongside the existing length check—ensure the runtime supports Python 3.10+ or provide an explicit length check fallback if not.

In `@docker-compose.yml`:
- Around line 55-59: The docker-compose currently uses depends_on with celery-worker: condition: service_started, which only waits for container start; add a proper healthcheck to the celery-worker service (e.g., a command that verifies Celery is ready to accept tasks) and then change the backend's depends_on entry to reference celery-worker: condition: service_healthy; alternatively, if you prefer not to add a healthcheck, add documentation/error-handling noting that initial requests may fail until the worker finishes model loading. Ensure you modify the celery-worker service block to include a healthcheck section and update the backend depends_on from service_started to service_healthy.

In `@Dockerfile.backend`:
- Around line 27-33: The conditional in the RUN step checking for the s2v_old directory is ineffective during image build because COPY backend/ precedes it, so remove the redundant if-check and make the RUN download/extract commands unconditional (update the RUN block that references s2v_old, wget, tar, rm) or, if you need the conditional behavior for runtime-mounted volumes, replace the current conditional with a comment explaining that choice and keep the logic; ensure you modify the RUN line alongside WORKDIR and consider the interaction with the earlier COPY backend/ step.
- Around line 1-13: Update the Dockerfile to avoid running as root and to reduce image size: modify the apt-get install RUN (the line starting with RUN apt-get update && apt-get install -y ...) to add --no-install-recommends and keep the apt-get cleanup, create a non-root user/group after WORKDIR /app, chown the /app directory to that user, and switch to that non-root user with USER before the final CMD; preserve the existing FROM python:3.10-slim and current RUN structure but perform root operations only for package install and file ownership, then run the app as the created non-root user.

In `@Dockerfile.worker`:
- Around line 1-13: Update the Dockerfile to mirror the security improvements from Dockerfile.backend: change the apt-get install invocation to use --no-install-recommends in the RUN step that installs gcc, g++, git, wget; create a dedicated non-root user (e.g., appuser), chown the /app working directory to that user, and add a USER appuser directive so the container runs as non-root; keep the existing apt-get update and cleanup (rm -rf /var/lib/apt/lists/*) but ensure the create-user and chown steps occur before switching USER.

🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
c0fff435-fe07-43e4-868f-cfd093bacfc9

📒 Files selected for processing (13)

- .dockerignore
- Dockerfile.backend
- Dockerfile.worker
- README.md
- backend/.env.example
- backend/inference_service.py
- backend/routes/async_generation_routes.py
- backend/server.py
- backend/start-dev.bat
- backend/start-dev.sh
- backend/tasks/inference_tasks.py
- docker-compose.yml
- requirements.txt

✅ Files skipped from review due to trivial changes (2)
- requirements.txt
- .dockerignore
|
@coderabbitai review |
✅ Actions performed

Review triggered.
|
|
@coderabbitai resume |
Actionable comments posted: 3
♻️ Duplicate comments (2)
backend/routes/async_generation_routes.py (2)
200-214: ⚠️ Potential issue | 🟡 Minor: Normalize `use_mediawiki` in `/generate_all_async` too.

The other submit handlers coerce `True`/`"1"` to integer `1`, but Line 202 passes the raw JSON value through. `backend/tasks/inference_tasks.py` checks `if use_mediawiki == 1`, so `"1"` silently skips MediaWiki fetching only for the combined async endpoint.

🛠️ Minimal fix

```diff
-    use_mediawiki = data.get("use_mediawiki", 0)
+    use_mediawiki = 1 if data.get("use_mediawiki") in (1, True, "1") else 0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/async_generation_routes.py` around lines 200 - 214, The combined async endpoint passes the raw JSON value for use_mediawiki to generate_all_questions_task, causing non-integer truthy values (e.g. "1" or True) to not match the check in backend/tasks/inference_tasks.py (which expects use_mediawiki == 1); coerce/normalize use_mediawiki to an integer like the other handlers before sending it to generate_all_questions_task by converting the value to 1 or 0 (e.g., replicate the same normalization used elsewhere), updating the variable used in the call to generate_all_questions_task so downstream code receives an integer 1 when MediaWiki should be used and 0 otherwise.
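For reference, the coercion all submit handlers could share looks like this (a minimal sketch; the helper name is hypothetical, not from the PR):

```python
def normalize_use_mediawiki(value) -> int:
    """Coerce the JSON `use_mediawiki` field to the integer flag the tasks compare against."""
    # Accept the truthy spellings clients actually send: 1, True, "1".
    return 1 if value in (1, True, "1") else 0
```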
11-18: ⚠️ Potential issue | 🔴 Critical: Pick one import mode and use it everywhere.

This file now imports `backend.tasks...` and `backend.celery_worker`, but `backend/server.py` still loads the blueprint as `from routes.async_generation_routes import async_routes`, and `backend/start-dev.sh` still tells contributors to run `python server.py` / `celery -A celery_worker.celery_app ...`. In that script-based startup mode, the `backend.*` imports here are not resolvable, so Flask can fail before it serves any request. Either convert the whole backend to package-mode startup, or keep these imports aligned with the existing local-module style. The same mismatch also appears in `backend/tasks/inference_tasks.py` Line 7.

```bash
#!/bin/bash
cd backend || exit 1
python - <<'PY'
from pathlib import Path
cwd = Path.cwd()
for rel in (
    "celery_worker.py",
    "tasks/inference_tasks.py",
    "backend/celery_worker.py",
    "backend/tasks/inference_tasks.py",
):
    print(f"{rel}: {'present' if (cwd / rel).exists() else 'missing'}")
PY
echo
sed -n '1,20p' routes/async_generation_routes.py
echo
sed -n '20,35p' server.py
echo
sed -n '51,75p' start-dev.sh
```
Verify each finding against the current code and only fix it if needed. In `@backend/routes/async_generation_routes.py` around lines 11 - 18, Imports in async_generation_routes.py mix package-style (backend.tasks...) with local-module startup elsewhere, causing import errors when running server.py; pick one mode and make imports consistent: either change the imports of generate_mcq_task, generate_boolq_task, generate_shortq_task, generate_all_questions_task and celery_app to local relative/local-module form used by server.py (e.g., from tasks.inference_tasks import ... and from celery_worker import celery_app) or convert the whole project to package-mode and update all call sites (including backend/tasks/inference_tasks.py line 7) to use the package-prefixed imports; update all occurrence of these symbols to match the chosen import style so imports resolve under the current startup script.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 111-113: The Celery branches in server.py are dispatching tasks
after the route already ran process_input_text, causing workers to re-run
MediaWiki preprocessing because use_mediawiki remains set; either stop
preprocessing before queuing to Celery or clear the use_mediawiki flag when
using Celery. Update the code paths that call generate_mcq_sync /
generate_mcq_async and any Celery dispatch logic so that when
USE_CELERY_INFERENCE is True you do not pass a preprocessed MediaWiki
payload—specifically, clear or set use_mediawiki=False before calling the Celery
task dispatch (and likewise for other branches mentioned), or move
process_input_text into the non-Celery (legacy) branch only, ensuring
backend/tasks/inference_tasks.py still receives the original user query, not the
summarized text.
- Around line 77-93: Add the missing import "from transformers import pipeline"
at the module top so pipeline is defined, and remove the duplicate QA model
initialization by deleting the second "if not USE_CELERY_INFERENCE: qa_model =
pipeline('question-answering')" block; ensure qa_model is only created once (the
earlier initialization inside the first USE_CELERY_INFERENCE guard) so symbols
to edit are the pipeline import and the duplicate qa_model assignment in
backend/server.py (alongside MCQGenerator, AnswerPredictor, BoolQGenerator,
ShortQGenerator, QuestionGenerator, and docs_service initializations).
In `@backend/start-dev.sh`:
- Around line 53-54: Update the documented Celery worker startup command so it
explicitly runs in single-process mode to avoid spawning multiple transformer
instances; change the example that references celery -A celery_worker.celery_app
worker --loglevel=info to include --pool=solo --concurrency=1 (and note that
higher concurrency is an explicit opt-in), ensuring the example that mentions
celery_worker.celery_app shows the single-process flags for development.
---
Duplicate comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 200-214: The combined async endpoint passes the raw JSON value for
use_mediawiki to generate_all_questions_task, causing non-integer truthy values
(e.g. "1" or True) to not match the check in backend/tasks/inference_tasks.py
(which expects use_mediawiki == 1); coerce/normalize use_mediawiki to an integer
like the other handlers before sending it to generate_all_questions_task by
converting the value to 1 or 0 (e.g., replicate the same normalization used
elsewhere), updating the variable used in the call to
generate_all_questions_task so downstream code receives an integer 1 when
MediaWiki should be used and 0 otherwise.
- Around line 11-18: Imports in async_generation_routes.py mix package-style
(backend.tasks...) with local-module startup elsewhere, causing import errors
when running server.py; pick one mode and make imports consistent: either change
the imports of generate_mcq_task, generate_boolq_task, generate_shortq_task,
generate_all_questions_task and celery_app to local relative/local-module form
used by server.py (e.g., from tasks.inference_tasks import ... and from
celery_worker import celery_app) or convert the whole project to package-mode
and update all call sites (including backend/tasks/inference_tasks.py line 7) to
use the package-prefixed imports; update all occurrence of these symbols to
match the chosen import style so imports resolve under the current startup
script.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3ce44e88-546c-4932-b67c-58b4fab7875c
📒 Files selected for processing (14)
- .dockerignore
- .gitattributes
- Dockerfile.backend
- Dockerfile.worker
- README.md
- backend/.env.example
- backend/inference_service.py
- backend/routes/async_generation_routes.py
- backend/server.py
- backend/start-dev.bat
- backend/start-dev.sh
- backend/tasks/inference_tasks.py
- docker-compose.yml
- requirements.txt
✅ Files skipped from review due to trivial changes (6)
- .gitattributes
- requirements.txt
- .dockerignore
- Dockerfile.worker
- Dockerfile.backend
- docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (3)
- README.md
- backend/start-dev.bat
- backend/inference_service.py
✅ Actions performed

Reviews resumed. |
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/tasks/inference_tasks.py (1)
267-267: Consider list unpacking for consistency with server.py.

Same minor style suggestion as in `server.py`:

```diff
-            options_with_answer = options + [generated_answer]
+            options_with_answer = [*options, generated_answer]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tasks/inference_tasks.py` at line 267, the code builds options_with_answer using concatenation (options_with_answer = options + [generated_answer]); for consistency with server.py switch to list unpacking by constructing options_with_answer with [*options, generated_answer] — locate the occurrence of options_with_answer in inference_tasks.py (inside the inference/task function that uses options and generated_answer) and replace the concatenation pattern with the unpacking pattern.

backend/server.py (1)
289-292: Consider list unpacking for consistency. The list concatenation works correctly. As a minor style improvement, you could use unpacking:

```diff
- options_with_answer = options + [generated_answer]
+ options_with_answer = [*options, generated_answer]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 289-292, replace the list concatenation with list unpacking for consistency: instead of building options_with_answer by using options + [generated_answer], create it using unpacking (referencing options_with_answer, options, and generated_answer) so the subsequent call to TfidfVectorizer().fit_transform(vectorizer) receives the same sequence but follows the stylistic preference; ensure options is an iterable/list before unpacking to avoid TypeError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/tasks/inference_tasks.py`:
- Around line 331-338: The loop in inference_tasks.py calls
answer_predictor.predict_boolean_answer with a string (question) but
predict_boolean_answer expects a list and returns a list of booleans; change the
call to pass a single-element list (e.g., {"input_text": input_text,
"input_question": [question]}) and then use the first boolean result
(qa_response[0]) when appending "True"/"False" to output so
predict_boolean_answer and its return shape are handled correctly.
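The list-in/list-out contract described above can be sketched with a stub predictor; StubAnswerPredictor and its trivial heuristic are stand-ins for the real boolean-answer model, which is not shown in this excerpt — only the wrapping/unwrapping pattern is the point:

```python
class StubAnswerPredictor:
    """Stand-in for the real model: "input_question" must be a list,
    and the return value is one boolean per question."""

    def predict_boolean_answer(self, payload):
        questions = payload["input_question"]  # expects a list, not a str
        return [q.strip().lower().startswith("is") for q in questions]


def answer_boolean_questions(predictor, input_text, questions):
    output = []
    for question in questions:
        # Pass a single-element list and unwrap the single result,
        # matching the predictor's list-in/list-out contract.
        qa_response = predictor.predict_boolean_answer(
            {"input_text": input_text, "input_question": [question]}
        )
        output.append("True" if qa_response[0] else "False")
    return output
```

Passing the bare string instead would iterate the predictor over characters (or fail outright), which is exactly the bug the review flags.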
---
Nitpick comments:
In `@backend/server.py`:
- Around line 289-292: Replace the list concatenation with list unpacking for
consistency: instead of building options_with_answer by using options +
[generated_answer], create it using unpacking (referencing options_with_answer,
options, and generated_answer) so the subsequent call to
TfidfVectorizer().fit_transform(vectorizer) receives the same sequence but
follows the stylistic preference; ensure options is an iterable/list before
unpacking to avoid TypeError.
In `@backend/tasks/inference_tasks.py`:
- Line 267: The code builds options_with_answer using concatenation
(options_with_answer = options + [generated_answer]); for consistency with
server.py switch to list unpacking by constructing options_with_answer with
[*options, generated_answer] — locate the occurrence of options_with_answer in
inference_tasks.py (inside the inference/task function that uses options and
generated_answer) and replace the concatenation pattern with the unpacking
pattern.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 43c08677-b505-4c44-8866-6c44302fff13
📒 Files selected for processing (5)
- backend/inference_service.py
- backend/routes/async_generation_routes.py
- backend/server.py
- backend/start-dev.sh
- backend/tasks/inference_tasks.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/inference_service.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/tasks/inference_tasks.py`:
- Around line 261-277: The loop that compares QA-generated answers to
multiple-choice options can hit an unhelpful ValueError when options is empty;
inside the for question, options in zip(...) loop (before calling
TfidfVectorizer), add a defensive check for an empty options list (e.g., if not
options:) and handle it deterministically: either append the generated_answer
(from qa_response["answer"]) or a clear sentinel (None or "NO_OPTIONS") to
outputs and continue, or raise a new ValueError with a descriptive message
referencing the question; ensure this check is placed before building
options_with_answer and computing similarities so that similarities.argmax()
never runs on an empty sequence.
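The defensive check can be sketched as follows; a toy token-overlap score stands in for the TF-IDF/cosine-similarity pipeline the actual task uses, since the guard logic, not the scoring, is what the review asks for:

```python
def _overlap_score(a, b):
    """Toy similarity: Jaccard overlap of lowercase tokens (a stand-in
    for the TF-IDF cosine similarity used in inference_tasks.py)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)


def pick_option(question, options, generated_answer):
    # Defensive check: with no options there is nothing to rank, so fall
    # back to the generated answer deterministically instead of letting
    # argmax-style logic fail on an empty sequence.
    if not options:
        return generated_answer
    scores = [_overlap_score(generated_answer, opt) for opt in options]
    best = max(range(len(scores)), key=scores.__getitem__)
    return options[best]
```

Raising a descriptive ValueError naming the offending question is the equally valid alternative the review mentions; either way the empty case is handled before any similarity computation runs.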
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 19c73cab-25a6-4cf7-90fa-34e157e68e17
📒 Files selected for processing (2)
- backend/server.py
- backend/tasks/inference_tasks.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/server.py
Hi @SxBxcoder, I’ve now addressed the key concerns you pointed out:
I’ve kept the changes incremental and aligned with the current architecture to avoid breaking anything.
Addressed Issues:
Fixes #586
Summary
This PR introduces a distributed asynchronous inference pipeline to improve the scalability and responsiveness of EduAid’s AI question generation system.
Previously, question generation (MCQ, BoolQ, ShortQ) executed synchronously inside Flask endpoints, causing requests to remain blocked while transformer models performed inference.
This change moves inference to background Celery workers while the API immediately returns a task ID, allowing clients to retrieve results asynchronously.
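Celery and Redis aside, the enqueue-then-poll flow described above can be illustrated with a stdlib stand-in: a ThreadPoolExecutor plays the worker pool and a dict plays the Redis result backend. All names here are illustrative, not the PR's actual API:

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class TaskQueue:
    """Minimal in-process analogue of a Celery broker + result backend."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._futures = {}

    def enqueue(self, fn, *args):
        task_id = uuid.uuid4().hex
        self._futures[task_id] = self._pool.submit(fn, *args)
        return task_id  # returned immediately, like the API's task_id response

    def status(self, task_id):
        fut = self._futures[task_id]
        return "SUCCESS" if fut.done() else "PENDING"

    def result(self, task_id):
        return self._futures[task_id].result()  # blocks until the task finishes


def fake_generate_mcq(text):
    # Stand-in for the transformer inference the real Celery task performs.
    return {"questions": [f"What is {text}?"]}
```

In the real pipeline the broker outlives the web process and survives restarts, which is the property this in-process sketch deliberately lacks.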
Key Changes
New Components
- backend/celery_worker.py: Celery configuration with Redis broker and worker lifecycle settings.
- backend/tasks/inference_tasks.py: Asynchronous Celery tasks wrapping existing generator classes.
- backend/routes/async_generation_routes.py: Async API endpoints for submitting and retrieving inference tasks.
Updated Files
- backend/server.py
- requirements.txt

New API Endpoints
Async Generation
Returns:
```json
{ "task_id": "123abc", "status": "queued" }
```

Task Monitoring
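On the client side, retrieving a result then reduces to polling the status endpoint until a terminal state is reached. A generic helper might look like the following, where fetch_status is assumed to be whatever callable queries the status endpoint for a task_id and returns a Celery-style state string:

```python
import time


def wait_for_task(fetch_status, task_id, interval=1.0, timeout=60.0):
    """Poll fetch_status(task_id) until SUCCESS or FAILURE, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_status(task_id)
        if state in ("SUCCESS", "FAILURE"):
            return state
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```

A fixed interval keeps the sketch simple; production clients often back off exponentially to reduce load on the status endpoint.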
Implementation Highlights
Running the System Locally
Install dependencies:
Start Redis:
Run the Flask server:
Start Celery worker:
Additional Notes
This enhancement decouples the API layer from heavy ML inference, allowing EduAid to handle long-running model tasks and concurrent requests more efficiently.
AI Usage Disclosure
I have used the following AI tools:
Checklist
AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Summary by CodeRabbit