
Implement Distributed Asynchronous Inference Pipeline using Celery and Redis#587

Open
piyush06singhal wants to merge 11 commits into AOSSIE-Org:main from piyush06singhal:feature/distributed-inference-pipeline

Conversation

@piyush06singhal

@piyush06singhal piyush06singhal commented Mar 16, 2026

Addressed Issues:

Fixes #586


Summary

This PR introduces a distributed asynchronous inference pipeline to improve the scalability and responsiveness of EduAid’s AI question generation system.

Previously, question generation (MCQ, BoolQ, ShortQ) executed synchronously inside Flask endpoints, causing requests to remain blocked while transformer models performed inference.

This change moves inference to background Celery workers while the API immediately returns a task ID, allowing clients to retrieve results asynchronously.
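
For orientation, the enqueue-and-return pattern looks roughly like this (a minimal sketch, not the PR's exact code; the handler and task names follow the new files listed under Key Changes below):

```python
# Sketch of the async flow (illustrative only; exact signatures in the PR may differ).
from flask import Blueprint, jsonify, request

# generate_mcq_task lives in backend/tasks/inference_tasks.py in this PR;
# its exact signature here is an assumption.
from backend.tasks.inference_tasks import generate_mcq_task

async_routes = Blueprint("async_routes", __name__)

@async_routes.route("/generate_mcq_async", methods=["POST"])
def generate_mcq_async():
    data = request.get_json(silent=True) or {}
    input_text = data.get("input_text", "")
    if not input_text.strip():
        return jsonify({"error": "input_text is required"}), 400

    # Enqueue the heavy inference work and return immediately with a task ID.
    task = generate_mcq_task.apply_async(args=[input_text])
    return jsonify({"task_id": task.id, "status": "queued"}), 202
```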


Key Changes

New Components

  • backend/celery_worker.py
    Celery configuration with Redis broker and worker lifecycle settings (a configuration sketch follows this list).

  • backend/tasks/inference_tasks.py
    Asynchronous Celery tasks wrapping existing generator classes.

  • backend/routes/async_generation_routes.py
    Async API endpoints for submitting and retrieving inference tasks.
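
The Celery app configuration could look roughly like this (a sketch based on the settings summarized in the review walkthrough below; concrete values such as time limits and max tasks per child are assumptions):

```python
# backend/celery_worker.py (sketch; the actual settings in the PR may differ)
import os
from celery import Celery

REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_DB = os.getenv("REDIS_DB", "0")
REDIS_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_DB}"

# "eduaid" is a placeholder app name.
celery_app = Celery("eduaid", broker=REDIS_URL, backend=REDIS_URL)

celery_app.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    timezone="UTC",
    enable_utc=True,
    task_time_limit=600,            # hard per-task limit (assumed value)
    worker_prefetch_multiplier=1,   # fetch one task at a time per worker
    worker_max_tasks_per_child=10,  # recycle worker processes to limit memory growth (assumed value)
)
```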


Updated Files

  • backend/server.py

    • Registered async route blueprint.
  • requirements.txt

    • Added:
    celery==5.3.4
    redis==5.0.1
    

New API Endpoints

Async Generation

POST /generate_mcq_async
POST /generate_boolq_async
POST /generate_shortq_async
POST /generate_all_async

Returns:

{
  "task_id": "123abc",
  "status": "queued"
}

Task Monitoring

GET /task_status/<task_id>
GET /task_result/<task_id>
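
A typical client flow against these endpoints would look roughly like this (illustrative Python; everything beyond the endpoint paths and the task_id field is an assumption):

```python
# Sketch of a polling client (status strings and result shape are assumptions).
import time
import requests

BASE = "http://localhost:5000"  # assumed local server address

# Submit a generation job.
resp = requests.post(
    f"{BASE}/generate_mcq_async",
    json={"input_text": "The mitochondria is the powerhouse of the cell.", "max_questions": 4},
)
task_id = resp.json()["task_id"]

# Poll until the task leaves the queued/running states, then fetch the result.
while requests.get(f"{BASE}/task_status/{task_id}").json().get("status") in ("queued", "PENDING", "STARTED"):
    time.sleep(2)

result = requests.get(f"{BASE}/task_result/{task_id}").json()
print(result)
```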

Implementation Highlights

  • Maintains full backward compatibility with existing synchronous endpoints.
  • Uses Celery + Redis for distributed task execution.
  • Implements lazy model initialization in workers to avoid repeated model loading (see the sketch after this list).
  • Preserves existing GPU memory management.
  • Adds structured logging for task creation, execution, and failures.
  • Includes worker lifecycle management to prevent memory leaks.
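
A minimal sketch of that lazy-initialization pattern, assuming a placeholder generator class name (the PR wraps the project's existing generator classes, whose interfaces are not shown here):

```python
# Sketch of per-worker lazy model loading (class name and import path are placeholders).
from celery import Task
from backend.celery_worker import celery_app

class GeneratorTask(Task):
    """Base task class: each worker process loads the heavy models once, on first use."""
    _mcq_gen = None

    @property
    def mcq_gen(self):
        if self._mcq_gen is None:
            # Import and construct the existing generator lazily; the import path
            # below is hypothetical.
            from Generator.main import MCQGenerator
            self._mcq_gen = MCQGenerator()
        return self._mcq_gen


@celery_app.task(bind=True, base=GeneratorTask)
def generate_mcq_task(self, input_text, max_questions=4, use_mediawiki=0):
    # self.mcq_gen triggers the lazy load on first use within this worker.
    # The generate_mcq() call shape is an assumption; MediaWiki handling is omitted.
    return self.mcq_gen.generate_mcq(input_text, max_questions=max_questions)
```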

Running the System Locally

Install dependencies:

pip install -r requirements.txt

Start Redis:

redis-server

Run the Flask server:

python backend/server.py

Start Celery worker:

celery -A backend.celery_worker.celery_app worker --loglevel=info --pool=solo

Additional Notes

This enhancement decouples the API layer from heavy ML inference, allowing EduAid to handle long-running model tasks and concurrent requests more efficiently.


AI Usage Disclosure

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the AI Usage Policy and verified the implementation locally.

I have used the following AI tools:

  • ChatGPT (architecture discussion and review)

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Summary by CodeRabbit

  • New Features
    • Async generation endpoints with task queuing, status/result polling, and a health endpoint showing inference mode; background worker support and synchronous helpers that wait for task results.
  • Documentation
    • Expanded backend setup: Docker Compose, manual Celery workflow, env examples, quick-start scripts, and troubleshooting.
  • Bug Fixes
    • Corrected NLP resource download.
  • Chores
    • Added container configs, worker image, .dockerignore, dev start scripts, and pinned Celery/Redis dependencies.

@coderabbitai

coderabbitai Bot commented Mar 16, 2026

Warning

Rate limit exceeded

@piyush06singhal has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 1 minute and 16 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90fb6560-9699-4f56-9927-f44c94384e77

📥 Commits

Reviewing files that changed from the base of the PR and between fac0ab9 and b0ccc2a.

📒 Files selected for processing (1)
  • backend/tasks/inference_tasks.py
📝 Walkthrough

Walkthrough

Adds a Redis-backed Celery async inference pipeline: Celery app and worker, Celery tasks for generation/answering/hardening, Flask async endpoints (enqueue/status/result) and sync wrappers, dev Dockerfiles/compose and start scripts, env template, requirements additions, and README/tooling updates for async vs legacy modes.

Changes

  • Celery runtime & config (backend/celery_worker.py, requirements.txt): New Celery app celery_app configured via REDIS_HOST/PORT/DB -> REDIS_URL; JSON serializers, UTC, task time limits, prefetch=1, max tasks per child, and exported Redis-related variables; celery==5.3.4 and redis==5.0.1 added.
  • Worker image & compose (Dockerfile.worker, Dockerfile.backend, docker-compose.yml, .dockerignore): New worker and backend Dockerfiles (non-root user, NLTK downloads, optional Sense2Vec extraction); docker-compose defines redis, celery-worker, and backend services with networking, volumes, healthchecks and resource hints; .dockerignore added.
  • Celery tasks implementation (backend/tasks/__init__.py, backend/tasks/inference_tasks.py): Adds GeneratorTask base with lazy per-worker initialization (generators, AnswerPredictor, QuestionGenerator, QA pipeline, MediaWiki client), process_input_text, and many Celery tasks for generate/predict/harden (MCQ/BoolQ/ShortQ/all) with logging and error handling.
  • API integration & sync wrappers (backend/routes/__init__.py, backend/routes/async_generation_routes.py, backend/inference_service.py, backend/server.py): Adds async_routes Blueprint with enqueue endpoints, task status/result endpoints, input validation and 202/400/500 semantics; inference_service provides *_sync blocking wrappers that call Celery tasks (600s timeout; a sketch follows this list); server.py branches on USE_CELERY_INFERENCE, registers async routes, adjusts model initialization, and adds /health.
  • Dev tooling & env (backend/.env.example, backend/start-dev.sh, backend/start-dev.bat): Adds env template for Redis and async flags, plus dev start scripts that ensure .env, check Redis, and show worker commands for Celery vs legacy flows.
  • Docs & repo metadata (README.md, .gitattributes): README expanded with Docker Compose/manual Celery/legacy instructions, troubleshooting, and an Async vs Sync section; .gitattributes enforces eol for .sh/.bat.
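
As referenced in the API integration entry above, the synchronous wrappers reduce to enqueueing a task and blocking on its result; a sketch under those assumptions:

```python
# Sketch of one blocking wrapper in backend/inference_service.py (assumed shape;
# the 600 s timeout matches the walkthrough above).
from backend.tasks.inference_tasks import generate_mcq_task

TASK_TIMEOUT_SECONDS = 600

def generate_mcq_sync(input_text, max_questions=4, use_mediawiki=0):
    """Enqueue the Celery task and block until its result is available."""
    result = generate_mcq_task.apply_async(args=[input_text, max_questions, use_mediawiki])
    return result.get(timeout=TASK_TIMEOUT_SECONDS)
```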

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Flask as "Flask API"
    participant Redis as "Redis Broker/Backend"
    participant Worker as "Celery Worker"
    participant Models as "AI Generators"

    Client->>Flask: POST /generate_mcq_async (input_text, params)
    Note over Flask: validate input, enqueue task
    Flask->>Redis: Enqueue task (task_id)
    Flask-->>Client: 202 Accepted (task_id)

    rect rgba(100,150,200,0.5)
    Redis->>Worker: Worker fetches task
    Worker->>Models: Lazy-load generators
    Worker->>Models: Run inference (MCQ/BoolQ/ShortQ)
    Worker->>Redis: Store result / mark task success
    end

    Client->>Flask: GET /task_status/{task_id}
    Flask->>Redis: Query task state
    Redis-->>Flask: Return state
    Flask-->>Client: 200 OK (state)

    Client->>Flask: GET /task_result/{task_id}
    Flask->>Redis: Retrieve result
    Redis-->>Flask: Return result
    Flask-->>Client: 200 OK (questions)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I queued the questions, soft and bright,
Redis hums while workers bite,
Celery wakes the generator choir,
Tasks hop slow and never tire,
Hop! Async answers saved the night.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 60.38%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title directly and accurately describes the main objective: implementing a distributed asynchronous inference pipeline using Celery and Redis, which is the core focus across all file changes in this PR.
  • Linked Issues check (✅ Passed): The PR fully addresses issue #586 requirements: implements a distributed asynchronous pipeline with Celery/Redis, provides async endpoints for task submission and result retrieval, reuses existing generator classes, preserves backward compatibility, includes worker lifecycle controls, and adds Docker Compose and documentation for setup.
  • Out of Scope Changes check (✅ Passed): All changes are within scope: new async infrastructure (celery_worker.py, inference_tasks.py, async_generation_routes.py), integration into Flask server, Dockerfiles for containerization, environment configuration, and supporting documentation. Only in-scope additions supporting the distributed inference feature.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 50-51: The handlers that currently do logger.error(f"Error
creating MCQ task: {str(e)}", exc_info=True) followed by return
jsonify({"error": str(e)}), 500 should stop returning raw exception text to
clients; keep the detailed str(e) in logs (using logger.error(...,
exc_info=True)) but return a sanitized JSON error (e.g., jsonify({"error":
"Internal server error"}), 500) or a short safe message and optionally a
correlation id. Update all similar branches (the ones around logger.error /
return jsonify at the MCQ/task creation handlers and the other occurrences you
noted) so they log the full exception but return a generic message or an error
code to the client; use the existing logger.error and the handler function names
to locate each spot.
- Around line 34-35: The numeric parameters read via data.get (e.g.,
max_questions at the shown occurrence and the other numeric params at the other
occurrences) must be parsed to integers, validated and clamped before
enqueueing: replace direct uses of data.get("...") with parsing logic that
attempts int(...) and on ValueError returns a 400 error, then enforce a safe
min/max range (e.g., min 1, sensible upper cap) and use the clamped value;
update places where max_questions and the other numeric parameters are used so
they consume the validated/clamped integers (ensure validation lives in the
route handler that prepares the task payload, not inside the worker).

In `@backend/tasks/inference_tasks.py`:
- Line 8: The import uses the wrong top-level module name; replace occurrences
of "from celery_worker import celery_app" with an explicit package import "from
backend.celery_worker import celery_app" so the module resolves from the package
root; update this in backend/tasks/inference_tasks.py (symbol: celery_app) and
the other backend modules mentioned (server.py and async_generation_routes.py)
to consistently use the backend.* namespace.
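
For illustration, the two route-level changes suggested above (bounded integer parsing and sanitized error responses) might look roughly like the following; the helper name, limits, and route shape are hypothetical, not the PR's actual code:

```python
# Hypothetical sketch combining the two suggestions above.
import logging
from flask import Blueprint, jsonify, request
from backend.tasks.inference_tasks import generate_mcq_task

logger = logging.getLogger(__name__)
async_routes = Blueprint("async_routes", __name__)

def _parse_bounded_int(data, key, default, minimum=1, maximum=20):
    """Parse a numeric request field, rejecting non-integers and clamping the range."""
    try:
        value = int(data.get(key, default))
    except (TypeError, ValueError):
        raise ValueError(f"{key} must be an integer") from None
    return max(minimum, min(value, maximum))

@async_routes.route("/generate_mcq_async", methods=["POST"])
def generate_mcq_async():
    data = request.get_json(silent=True) or {}
    try:
        max_questions = _parse_bounded_int(data, "max_questions", 4)
    except ValueError as e:
        return jsonify({"error": str(e)}), 400
    try:
        task = generate_mcq_task.apply_async(args=[data.get("input_text", ""), max_questions])
        return jsonify({"task_id": task.id, "status": "queued"}), 202
    except Exception as e:
        # Keep the detailed error in the logs, but return a sanitized message to clients.
        logger.error(f"Error creating MCQ task: {e!s}", exc_info=True)
        return jsonify({"error": "Internal server error"}), 500
```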

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1276eb2e-d8bc-4359-9b49-b60c50096e1b

📥 Commits

Reviewing files that changed from the base of the PR and between fc3bf1a and 119a4b1.

📒 Files selected for processing (7)
  • backend/celery_worker.py
  • backend/routes/__init__.py
  • backend/routes/async_generation_routes.py
  • backend/server.py
  • backend/tasks/__init__.py
  • backend/tasks/inference_tasks.py
  • requirements.txt

Comment thread backend/routes/async_generation_routes.py Outdated
Comment thread backend/routes/async_generation_routes.py Outdated
Comment thread backend/tasks/inference_tasks.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (6)
backend/tasks/inference_tasks.py (3)

97-99: Use explicit conversion flag in f-string.

Per Ruff RUF010, f"{str(e)}" should use the explicit !s conversion flag instead of wrapping with str().

♻️ Suggested fix
     except Exception as e:
-        logger.error(f"Error in MCQ generation: {str(e)}", exc_info=True)
+        logger.error(f"Error in MCQ generation: {e!s}", exc_info=True)
         raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tasks/inference_tasks.py` around lines 97 - 99, Replace the explicit
str() conversion inside the f-string in the exception handler with the !s
conversion flag; locate the except block around MCQ generation that reads
"except Exception as e:" (using the logger variable) and change the log call to
use f"{e!s}" instead of f"{str(e)}" so it complies with Ruff RUF010 and
preserves the same message and exc_info behavior.

16-20: Remove unused module-level globals.

These globals (MCQGen, BoolQGen, ShortQGen, mediawikiapi) are declared but never used. The GeneratorTask class manages generator instances via instance attributes (_mcq_gen, _boolq_gen, etc.) with lazy initialization through properties.

♻️ Suggested fix
 logger = logging.getLogger(__name__)
 
-# Initialize generators (these will be loaded once per worker)
-MCQGen = None
-BoolQGen = None
-ShortQGen = None
-mediawikiapi = None
-

 class GeneratorTask(Task):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tasks/inference_tasks.py` around lines 16 - 20, Remove the unused
module-level globals MCQGen, BoolQGen, ShortQGen, and mediawikiapi from the top
of the file and rely on the GeneratorTask instance attributes (_mcq_gen,
_boolq_gen, _shortq_gen, etc.) and their lazy-initializing properties for
generator access; ensure no other code in this module references the removed
globals (update any stray references to use the GeneratorTask properties or
instance fields) so all generator management is encapsulated in the
GeneratorTask class.

6-6: Unused import: torch

The torch module is imported but never used in this file. The GPU memory management (torch.cuda.empty_cache()) is handled inside the generator classes themselves.

♻️ Suggested fix
 import logging
-import torch
 from celery import Task
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tasks/inference_tasks.py` at line 6, Remove the unused top-level
import of torch in inference_tasks.py; specifically delete the import statement
"import torch" at the top of the file so that the module no longer imports torch
unnecessarily (GPU memory management and torch.cuda.empty_cache() remain handled
inside the generator classes that manage their own cleanup).
backend/routes/async_generation_routes.py (3)

67-70: Minor: Use explicit conversion flag in f-strings.

Multiple lines use f"{str(e)}" pattern (lines 67, 70, 100, 103, 133, 136, 174, 177, 219, 274). Per Ruff RUF010, prefer f"{e!s}" for cleaner code.

Example fix:

-        logger.warning(f"Validation error in MCQ task: {str(e)}")
+        logger.warning(f"Validation error in MCQ task: {e!s}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/async_generation_routes.py` around lines 67 - 70, Replace
occurrences of f"{str(e)}" with the explicit string-conversion f"{e!s}" in the
exception handling and logging around MCQ task creation and other handlers to
satisfy Ruff RUF010; specifically update the logger.warning, logger.error and
any jsonify error responses that use f"{str(e)}" (e.g., inside the exception
blocks in async_generation_routes.py handling MCQ task creation and similar
try/except handlers) so they use f"{e!s}" instead.

26-34: Preserve exception context with raise ... from.

When re-raising a ValueError from within the except block, the original exception context is lost. Use raise ... from to preserve the exception chain for better debugging.

♻️ Suggested fix
     try:
         value = int(data.get(key, default))
     except (TypeError, ValueError):
-        raise ValueError(f"{key} must be an integer")
+        raise ValueError(f"{key} must be an integer") from None

Using from None explicitly suppresses the chained exception (since the user-facing message is intentionally generic), while from err would chain it if you want debugging context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/async_generation_routes.py` around lines 26 - 34, The except
block currently swallows the original TypeError/ValueError; modify the
try/except that parses int(...) so you capture the caught exception (e.g.,
except (TypeError, ValueError) as err:) and re-raise the new ValueError using
"raise ValueError(f\"{key} must be an integer\") from err" (or use "from None"
if you intentionally want to suppress chaining). Update the try/except around
int(data.get(key, default)) accordingly so the exception chain is preserved or
explicitly suppressed as desired.

47-51: Consider validating use_mediawiki parameter.

The use_mediawiki parameter is passed directly without validation. While the task safely handles it (uses == 1 comparison), validating at the API boundary would provide clearer error messages for invalid inputs.

♻️ Suggested pattern
         input_text = data.get("input_text", "")
-        use_mediawiki = data.get("use_mediawiki", 0)
+        use_mediawiki = 1 if data.get("use_mediawiki") in (1, True, "1") else 0
         max_questions = _parse_bounded_int(data, "max_questions", 4)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/async_generation_routes.py` around lines 47 - 51, The handler
reads use_mediawiki from the JSON payload without validation—add input
validation for use_mediawiki in async_generation_routes.py right after
request.get_json(silent=True) so the variable is guaranteed to be 0 or 1 (accept
true/false booleans and numeric "0"/"1" strings by coercion), and return a 400
BadRequest with a clear message when the value cannot be parsed; update the
local variable use_mediawiki and any downstream checks that expect an int (the
name use_mediawiki is the unique symbol to change) so the rest of the function
can safely use comparisons like use_mediawiki == 1.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 193-195: The runtime imports "from celery_worker import
celery_app" inside get_task_status and get_task_result are inconsistent and will
cause ModuleNotFoundError; move the import to module-level using the correct
module path (import celery_app from backend.celery_worker) and remove the
in-function imports in the get_task_status and get_task_result functions so both
functions reference the module-level celery_app; update any existing top-level
import section to include "from backend.celery_worker import celery_app" and
delete the two runtime import lines.

In `@backend/tasks/inference_tasks.py`:
- Around line 183-216: Remove the unnecessary f-string prefixes on the two
logger.info calls (the one before processing and the one after result creation)
so they are plain strings, and change the error logging to use logging's
explicit formatting or exception helper: replace logger.error(f"Error in
combined generation: {str(e)}", exc_info=True) with either logger.error("Error
in combined generation: %s", e, exc_info=True) or logger.exception("Error in
combined generation") — update the try/except block that calls
self.mcq_gen.generate_mcq, self.boolq_gen.generate_boolq, and
self.shortq_gen.generate_shortq accordingly.

---

Nitpick comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 67-70: Replace occurrences of f"{str(e)}" with the explicit
string-conversion f"{e!s}" in the exception handling and logging around MCQ task
creation and other handlers to satisfy Ruff RUF010; specifically update the
logger.warning, logger.error and any jsonify error responses that use
f"{str(e)}" (e.g., inside the exception blocks in async_generation_routes.py
handling MCQ task creation and similar try/except handlers) so they use f"{e!s}"
instead.
- Around line 26-34: The except block currently swallows the original
TypeError/ValueError; modify the try/except that parses int(...) so you capture
the caught exception (e.g., except (TypeError, ValueError) as err:) and re-raise
the new ValueError using "raise ValueError(f\"{key} must be an integer\") from
err" (or use "from None" if you intentionally want to suppress chaining). Update
the try/except around int(data.get(key, default)) accordingly so the exception
chain is preserved or explicitly suppressed as desired.
- Around line 47-51: The handler reads use_mediawiki from the JSON payload
without validation—add input validation for use_mediawiki in
async_generation_routes.py right after request.get_json(silent=True) so the
variable is guaranteed to be 0 or 1 (accept true/false booleans and numeric
"0"/"1" strings by coercion), and return a 400 BadRequest with a clear message
when the value cannot be parsed; update the local variable use_mediawiki and any
downstream checks that expect an int (the name use_mediawiki is the unique
symbol to change) so the rest of the function can safely use comparisons like
use_mediawiki == 1.

In `@backend/tasks/inference_tasks.py`:
- Around line 97-99: Replace the explicit str() conversion inside the f-string
in the exception handler with the !s conversion flag; locate the except block
around MCQ generation that reads "except Exception as e:" (using the logger
variable) and change the log call to use f"{e!s}" instead of f"{str(e)}" so it
complies with Ruff RUF010 and preserves the same message and exc_info behavior.
- Around line 16-20: Remove the unused module-level globals MCQGen, BoolQGen,
ShortQGen, and mediawikiapi from the top of the file and rely on the
GeneratorTask instance attributes (_mcq_gen, _boolq_gen, _shortq_gen, etc.) and
their lazy-initializing properties for generator access; ensure no other code in
this module references the removed globals (update any stray references to use
the GeneratorTask properties or instance fields) so all generator management is
encapsulated in the GeneratorTask class.
- Line 6: Remove the unused top-level import of torch in inference_tasks.py;
specifically delete the import statement "import torch" at the top of the file
so that the module no longer imports torch unnecessarily (GPU memory management
and torch.cuda.empty_cache() remain handled inside the generator classes that
manage their own cleanup).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bdda85f4-0851-4060-b89e-900855ddb3fe

📥 Commits

Reviewing files that changed from the base of the PR and between 119a4b1 and 4587983.

📒 Files selected for processing (2)
  • backend/routes/async_generation_routes.py
  • backend/tasks/inference_tasks.py

Comment thread backend/routes/async_generation_routes.py
Comment thread backend/tasks/inference_tasks.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
backend/tasks/inference_tasks.py (3)

123-125: Use explicit conversion flag for consistency.

Line 124 uses {str(e)} while other error handlers in this file use {e!s}. Apply consistent style.

♻️ Suggested fix
     except Exception as e:
-        logger.error(f"Error in BoolQ generation: {str(e)}", exc_info=True)
+        logger.error(f"Error in BoolQ generation: {e!s}", exc_info=True)
         raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tasks/inference_tasks.py` around lines 123 - 125, The log call in the
BoolQ error handler uses inconsistent exception formatting; update the
logger.error invocation in the except block that handles BoolQ generation so it
uses the same explicit conversion flag `{e!s}` instead of `str(e)`, e.g., change
the f-string in the except Exception as e block (the logger.error line) to use
`{e!s}` to match other handlers and maintain consistent style across the file.

54-59: Consider adding error handling for MediaWiki API failures.

The mediawiki_instance.summary() call can fail due to network issues, rate limiting, or invalid page titles. Currently, such failures would cause the entire task to fail.

Consider catching MediaWiki-specific exceptions and either falling back to the original input or raising a more descriptive error:

💡 Suggested pattern
 def process_input_text(input_text, use_mediawiki, mediawiki_instance):
     """Process input text, optionally fetching from MediaWiki."""
     if use_mediawiki == 1:
         logger.info(f"Fetching MediaWiki summary for: {input_text}")
-        input_text = mediawiki_instance.summary(input_text, 8)
+        try:
+            input_text = mediawiki_instance.summary(input_text, 8)
+        except Exception as e:
+            logger.warning(f"MediaWiki fetch failed, using original input: {e!s}")
+            # Optionally re-raise or return original input_text as fallback
     return input_text
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tasks/inference_tasks.py` around lines 54 - 59, The MediaWiki summary
call in process_input_text can raise network/API errors; wrap the
mediawiki_instance.summary(input_text, 8) call in a try/except that catches
MediaWiki-specific exceptions (and a general Exception fallback), log the error
via logger.error with the input_text and exception details, and on failure
either return the original input_text (fallback behavior) or re-raise a more
descriptive exception; ensure you reference process_input_text and
mediawiki_instance.summary when making the change and keep the behavior
consistent with existing logging/error conventions.

156-158: Use explicit conversion flag for consistency.

Same as above - use {e!s} instead of {str(e)}.

♻️ Suggested fix
     except Exception as e:
-        logger.error(f"Error in ShortQ generation: {str(e)}", exc_info=True)
+        logger.error(f"Error in ShortQ generation: {e!s}", exc_info=True)
         raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tasks/inference_tasks.py` around lines 156 - 158, Replace the
explicit str(e) conversion in the exception log with the explicit conversion
flag `{e!s}` to ensure consistent stringification; locate the except block
handling "ShortQ generation" where logger.error is called (the line using
logger.error(f"Error in ShortQ generation: {str(e)}", exc_info=True)), and
change the formatted string to use `{e!s}` instead of `str(e)` so the message
becomes f"Error in ShortQ generation: {e!s}" while keeping exc_info=True and
re-raising the exception.
backend/routes/async_generation_routes.py (2)

236-241: Consider clarifying behavior for unknown task IDs.

Celery's AsyncResult returns PENDING state for both genuinely pending tasks AND unknown/invalid task IDs. This means clients cannot distinguish between "task is queued" and "task ID doesn't exist."

This is a known Celery behavior and may be acceptable, but consider documenting this in the API response or docstring so clients understand the ambiguity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/async_generation_routes.py` around lines 236 - 241, When
task_result.state == 'PENDING' this can mean either a queued task or an
unknown/invalid task_id; update the route handler in async_generation_routes
(the block that checks task_result.state and returns 202) to make that ambiguity
explicit by changing the response message to something like "Task is waiting to
be executed or the task_id is unknown/invalid", and also add a brief
docstring/comment on the route explaining Celery's PENDING semantics so clients
understand the ambiguity (refer to task_result and task_id in the comment).

71-71: Consider using consistent f-string conversion style.

Several logging statements use {str(e)} while others use {e!s}. For consistency, prefer the explicit conversion flag {e!s} throughout. Affected lines: 71, 101, 104, 134, 137, 175, 178, 219, 273.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/async_generation_routes.py` at line 71, Replace inconsistent
f-string exception conversion by using the explicit conversion flag {e!s}
wherever logger.error currently interpolates exceptions with {str(e)}; locate
logger.error calls in async_generation_routes.py that format exception variable
e (e.g., the one logging "Error creating MCQ task") and change f"...{str(e)}" to
f"...{e!s}" so all logs use the same {e!s} style while keeping exc_info=True
intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/async_generation_routes.py`:
- Line 84: The endpoints BoolQ, ShortQ, and generate_all currently assign
use_mediawiki with data.get("use_mediawiki", 0) which can leave it as True or
"1" causing process_input_text's check if use_mediawiki == 1 to miss them;
normalize use_mediawiki the same way as MCQ (e.g., use_mediawiki = 1 if
data.get("use_mediawiki") in (1, True, "1") else 0) in each handler so
process_input_text sees a consistent integer 0/1 value; update the assignments
in the BoolQ, ShortQ, and generate_all handlers (where use_mediawiki is read) to
match the MCQ normalization and keep process_input_text unchanged.

---

Nitpick comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 236-241: When task_result.state == 'PENDING' this can mean either
a queued task or an unknown/invalid task_id; update the route handler in
async_generation_routes (the block that checks task_result.state and returns
202) to make that ambiguity explicit by changing the response message to
something like "Task is waiting to be executed or the task_id is
unknown/invalid", and also add a brief docstring/comment on the route explaining
Celery's PENDING semantics so clients understand the ambiguity (refer to
task_result and task_id in the comment).
- Line 71: Replace inconsistent f-string exception conversion by using the
explicit conversion flag {e!s} wherever logger.error currently interpolates
exceptions with {str(e)}; locate logger.error calls in
async_generation_routes.py that format exception variable e (e.g., the one
logging "Error creating MCQ task") and change f"...{str(e)}" to f"...{e!s}" so
all logs use the same {e!s} style while keeping exc_info=True intact.

In `@backend/tasks/inference_tasks.py`:
- Around line 123-125: The log call in the BoolQ error handler uses inconsistent
exception formatting; update the logger.error invocation in the except block
that handles BoolQ generation so it uses the same explicit conversion flag
`{e!s}` instead of `str(e)`, e.g., change the f-string in the except Exception
as e block (the logger.error line) to use `{e!s}` to match other handlers and
maintain consistent style across the file.
- Around line 54-59: The MediaWiki summary call in process_input_text can raise
network/API errors; wrap the mediawiki_instance.summary(input_text, 8) call in a
try/except that catches MediaWiki-specific exceptions (and a general Exception
fallback), log the error via logger.error with the input_text and exception
details, and on failure either return the original input_text (fallback
behavior) or re-raise a more descriptive exception; ensure you reference
process_input_text and mediawiki_instance.summary when making the change and
keep the behavior consistent with existing logging/error conventions.
- Around line 156-158: Replace the explicit str(e) conversion in the exception
log with the explicit conversion flag `{e!s}` to ensure consistent
stringification; locate the except block handling "ShortQ generation" where
logger.error is called (the line using logger.error(f"Error in ShortQ
generation: {str(e)}", exc_info=True)), and change the formatted string to use
`{e!s}` instead of `str(e)` so the message becomes f"Error in ShortQ generation:
{e!s}" while keeping exc_info=True and re-raising the exception.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 76670ed4-8fcd-4570-8e99-7071a03a7b56

📥 Commits

Reviewing files that changed from the base of the PR and between 4587983 and 0ba86b0.

📒 Files selected for processing (2)
  • backend/routes/async_generation_routes.py
  • backend/tasks/inference_tasks.py

Comment thread backend/routes/async_generation_routes.py Outdated
@piyush06singhal
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@piyush06singhal
Author

@coderabbitai resume

@coderabbitai

coderabbitai Bot commented Mar 16, 2026

✅ Actions performed

Reviews resumed.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
backend/routes/async_generation_routes.py (1)

71-71: Replace {str(e)} with {e!s} (or use logger %s formatting) to resolve Ruff RUF010 warnings.

All 9 instances in the file use redundant str() calls inside f-strings. Using the {e!s} conversion flag or switching to logger's %s formatting is cleaner and aligns with Python logging best practices.

Affected lines: 71, 101, 104, 134, 137, 175, 178, 219, 273

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/async_generation_routes.py` at line 71, Replace redundant
str(e) calls inside f-strings used in logger.error across
async_generation_routes.py (the logger.error lines that log exceptions such as
the "Error creating MCQ task" call and the eight other occurrences) by either
using the conversion flag {e!s} inside the f-string or, preferably, switch to
logger's %-style formatting (e.g., logger.error("... %s", e, exc_info=True));
update each logger.error invocation (the nine instances flagged) accordingly so
no str(e) calls remain and keep exc_info=True as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 49-55: The code assumes input_text is a string and calls .strip()
directly; replace the raw access from request.get_json(...) and manual strip
checks with a call to the existing _parse_input_text(...) helper to validate
type and enforce a payload size cap before enqueueing. Specifically, in
backend/routes/async_generation_routes.py use _parse_input_text(data,
"input_text") (or the function signature used elsewhere) instead of
data.get("input_text", "") in the BoolQ, ShortQ and combined endpoints, ensure
invalid types or oversize payloads return a 400 JSON error, and keep
use_mediawiki and max_questions parsing unchanged.

In `@backend/tasks/inference_tasks.py`:
- Around line 58-62: The code in process_input_text has invalid literal '+'
characters at the start of lines (around the mediawiki_instance.summary call and
the except block) which breaks import; remove those leading '+' characters and
fix indentation so the try/except is a normal Python block (call
mediawiki_instance.summary(input_text, 8) inside the try, catch Exception as e
and log via logger.warning with the error), preserving the fallback behavior of
using the original input_text and not re-raising unless intended; update the
try/except around mediawiki_instance.summary to match surrounding function
indentation and ensure the module can import and Celery tasks register.

---

Nitpick comments:
In `@backend/routes/async_generation_routes.py`:
- Line 71: Replace redundant str(e) calls inside f-strings used in logger.error
across async_generation_routes.py (the logger.error lines that log exceptions
such as the "Error creating MCQ task" call and the eight other occurrences) by
either using the conversion flag {e!s} inside the f-string or, preferably,
switch to logger's %-style formatting (e.g., logger.error("... %s", e,
exc_info=True)); update each logger.error invocation (the nine instances
flagged) accordingly so no str(e) calls remain and keep exc_info=True as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1fc0e47b-29df-443b-947f-0eef6a2fab07

📥 Commits

Reviewing files that changed from the base of the PR and between 0ba86b0 and 40b8c95.

📒 Files selected for processing (2)
  • backend/routes/async_generation_routes.py
  • backend/tasks/inference_tasks.py

Comment thread backend/routes/async_generation_routes.py Outdated
Comment thread backend/tasks/inference_tasks.py Outdated
@SxBxcoder
Contributor

Hi @piyush06singhal, this is a very ambitious and well-written PR. Moving to an async Celery pipeline is definitely the right long-term play for scaling inference.

From a repository health and architectural perspective, I have a few vital questions before we integrate this into main:

  1. Local Setup Friction: This makes Redis and Celery hard dependencies. Without providing a docker-compose.yml, forcing new student contributors to natively install and run a Redis broker just to boot the backend might severely block onboarding. Have we updated the README.md setup scripts to handle this?
  2. VRAM/OOM Management: I see GeneratorTask lazy-initializes the ML models inside the Celery worker. If the Flask server AND the Celery worker are running locally, are we duplicating these massive 1.2GB+ transformer models in memory? This will instantly cause OOM crashes on standard 8GB RAM laptops, bypassing the thread-safe Singleton cache I also built in PR #456 ("perf: Implement centralized lazy loading for all ML models to resolve OOM startup crashes").
  3. Frontend Implementation: The new async endpoints return a task_id, but is the React frontend actually equipped to handle this yet? If the UI isn't updated to poll /task_status, merging this might break the quiz generation flow for end-users.

Would love to discuss how we can safely stage this massive architectural shift without breaking the local dev environment.

@piyush06singhal
Author

piyush06singhal commented Mar 17, 2026

Hi @SxBxcoder,

Thank you for the detailed review; I really appreciate you taking the time to go through this so thoroughly.

Regarding your points:

1. Local Setup Friction
You're absolutely right: introducing Redis and Celery adds setup overhead.
Currently, this PR does not include a docker-compose.yml, but I agree that this is important for smoother onboarding. I can add:

  • A minimal docker-compose setup for Redis + worker
  • Updated README with step-by-step instructions

2. VRAM / OOM Concerns
Good catch. Right now, models are lazily initialised inside each Celery worker, which can indeed lead to duplication if both Flask and workers load models separately.

To address this, I see two possible improvements:

Happy to align this with the existing architecture to avoid memory overhead.

3. Frontend Compatibility
Yes, currently the frontend is not yet adapted for async polling (task_id → status → result).
However, this PR keeps all existing synchronous endpoints intact, so current UI flows remain unaffected.

The async endpoints are introduced as opt-in, allowing gradual frontend migration without breaking existing functionality.

I agree this is a significant architectural shift, and I’m open to staging it more safely, for example:

  • Adding Docker + docs first
  • Ensuring memory-safe model handling
  • Then, gradually integrating async usage in the frontend

Let me know how you'd prefer to proceed. Happy to iterate on this.

Thanks again for the insightful feedback!

@SxBxcoder
Contributor

That staged approach sounds good, @piyush06singhal . Adding the minimal docker-compose setup and ensuring the Flask container doesn't duplicate the ML models in memory will make this bulletproof for local devs. Looking forward to seeing the updated iteration—feel free to ping me when it's ready for another look.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

208-288: ⚠️ Potential issue | 🔴 Critical

Unresolved merge conflict must be fixed before merging.

The README contains unresolved git merge conflict markers (<<<<<<< HEAD, =======, >>>>>>> 8b154d5). This will render incorrectly and confuse users.

Both sections contain valuable content—the LLM-based question generation documentation and the troubleshooting/async endpoints documentation should both be preserved.

🔧 Resolve by keeping both sections

Remove the conflict markers and keep both sections:

-<<<<<<< HEAD
 - If the script fails to run, ensure that you have execution permissions:
   ```bash
   chmod +x script.sh
   ```
LLM-Based Question Generation

... (keep this entire section)
-=======
+
+---
+
Issue: "Celery is not available" error
... (keep this entire section through the async/sync endpoints docs)

->>>>>>> 8b154d5 (improve async pipeline with memory optimization, docker setup, and safe rollout toggle)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @README.md around lines 208 - 288, The README contains unresolved git merge
conflict markers (<<<<<<< HEAD, =======, >>>>>>> 8b154d5) that must be removed;
resolve by merging the two retained sections so both "LLM-Based Question
Generation" and the "Async vs Sync Endpoints / troubleshooting" content remain
intact, delete the conflict markers and any duplicated lines, ensure the
LLM-Based Question Generation heading and its JSON example and the Async vs Sync
Endpoints block (including the Celery/Redis troubleshooting notes) are present
and formatted consistently under their respective headings.


🧹 Nitpick comments (9)
backend/server.py (1)

284-301: Add strict=True to zip() for safety.

Using zip() without strict=True silently truncates if lists have different lengths. While line 276 validates lengths match, explicit strict=True provides defense-in-depth.

♻️ Add strict parameter
-        for question, options in zip(input_questions, input_options):
+        for question, options in zip(input_questions, input_options, strict=True):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 284 - 301, The loop using
zip(input_questions, input_options) should use zip(..., strict=True) to prevent
silent truncation if lengths differ; update the loop header to for question,
options in zip(input_questions, input_options, strict=True): (the variables
input_questions and input_options in the current loop) to provide
defense-in-depth alongside the existing length check—ensure the runtime supports
Python 3.10+ or provide an explicit length check fallback if not.
Dockerfile.backend (2)

27-33: Sense2Vec conditional check is ineffective during Docker build.

The if [ ! -d "s2v_old" ] check will always be true in a fresh build since COPY backend/ happens just before. The check is harmless but adds no value—consider removing it for clarity, or keep it if you intend to use this Dockerfile with pre-existing volumes.

♻️ Simplify by removing redundant conditional
 # Download and extract Sense2Vec model if not present
 WORKDIR /app/backend
-RUN if [ ! -d "s2v_old" ]; then \
-    wget https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2015_md.tar.gz && \
+RUN wget https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2015_md.tar.gz && \
     tar -xzf s2v_reddit_2015_md.tar.gz && \
-    rm s2v_reddit_2015_md.tar.gz; \
-    fi
+    rm s2v_reddit_2015_md.tar.gz
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.backend` around lines 27 - 33, The conditional in the RUN step
checking for the s2v_old directory is ineffective during image build because
COPY backend/ precedes it, so remove the redundant if-check and make the RUN
download/extract commands unconditional (update the RUN block that references
s2v_old, wget, tar, rm) or, if you need the conditional behavior for
runtime-mounted volumes, replace the current conditional with a comment
explaining that choice and keep the logic; ensure you modify the RUN line
alongside WORKDIR and consider the interaction with the earlier COPY backend/
step.

1-13: Add non-root user and --no-install-recommends for security and image size.

Running containers as root is a security risk. Additionally, --no-install-recommends reduces image size by avoiding unnecessary packages.

🔒 Proposed security and optimization improvements
 # Dockerfile for Flask backend server
 FROM python:3.10-slim

 # Set working directory
 WORKDIR /app

+# Create non-root user
+RUN groupadd -r eduaid && useradd -r -g eduaid eduaid
+
 # Install system dependencies
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     gcc \
     g++ \
     git \
     wget \
     && rm -rf /var/lib/apt/lists/*

Then before CMD, add:

+# Switch to non-root user
+RUN chown -R eduaid:eduaid /app
+USER eduaid
+
 # Run Flask server
 CMD ["python", "server.py"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.backend` around lines 1 - 13, Update the Dockerfile to avoid
running as root and to reduce image size: modify the apt-get install RUN (the
line starting with RUN apt-get update && apt-get install -y ...) to add
--no-install-recommends and keep the apt-get clean up, create a non-root
user/group (e.g., add_user or user creation block) after WORKDIR /app, chown the
/app directory to that user, and switch to that non-root user with USER before
the final CMD; preserve the existing FROM python:3.10-slim and current RUN
structure but replace root operations with operations performed as root only for
package install and file ownership, then run the app as the created non-root
user.
docker-compose.yml (1)

55-59: service_started doesn't guarantee Celery worker is ready.

The backend depends on celery-worker: condition: service_started, but this only waits for the container to start, not for Celery to be ready to accept tasks. Model loading in the worker can take significant time.

Consider adding a healthcheck to the celery-worker service or documenting that initial requests may fail during worker startup.

🩺 Add Celery worker healthcheck
   celery-worker:
     build:
       context: .
       dockerfile: Dockerfile.worker
     container_name: eduaid-celery-worker
     depends_on:
       redis:
         condition: service_healthy
+    healthcheck:
+      test: ["CMD", "celery", "-A", "celery_worker.celery_app", "inspect", "ping", "-d", "celery@$$HOSTNAME"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s

Then update backend dependency:

     depends_on:
       redis:
         condition: service_healthy
       celery-worker:
-        condition: service_started
+        condition: service_healthy
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` around lines 55 - 59, The docker-compose currently uses
depends_on with celery-worker: condition: service_started which only waits for
container start; add a proper healthcheck to the celery-worker service (e.g., a
command that verifies Celery is ready to accept tasks) and then change the
backend's depends_on entry to reference celery-worker: condition:
service_healthy; alternatively, if you prefer not to add a healthcheck, add
documentation/error-handling noting that initial requests may fail until the
worker finishes model loading. Ensure you modify the celery-worker service block
to include a healthcheck section and update the backend depends_on from
service_started to service_healthy.
Dockerfile.worker (1)

1-13: Same security improvements needed as Dockerfile.backend.

Apply the same non-root user and --no-install-recommends changes recommended for Dockerfile.backend to maintain consistency and security.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.worker` around lines 1 - 13, Update the Dockerfile to mirror the
security improvements from Dockerfile.backend: change the apt-get install
invocation to use --no-install-recommends in the RUN step that installs gcc,
g++, git, wget; create a dedicated non-root user (e.g., appuser), chown the /app
working directory to that user, and add a USER appuser directive so the
container runs as non-root; keep the existing apt-get update and cleanup (rm -rf
/var/lib/apt/lists/*) but ensure the create-user and chown steps occur before
switching USER.
backend/inference_service.py (3)

47-54: Missing exception handling for Celery task failures.

result.get(timeout=600) can raise various exceptions: celery.exceptions.TimeoutError if the task doesn't complete in time, task exceptions propagated from the worker, or connection errors if Redis is unavailable. These will propagate as-is to the caller.

Consider wrapping in a try/except to translate Celery-specific exceptions into application-level errors with clearer messages.

💡 Suggested pattern
from celery.exceptions import TimeoutError as CeleryTimeout

def generate_mcq_sync(input_text: str, max_questions: int = 4, use_mediawiki: int = 0) -> Dict[str, Any]:
    if not CELERY_AVAILABLE:
        raise RuntimeError("Celery is not available. Please ensure Redis and Celery worker are running.")
    
    try:
        result = generate_mcq_task.apply_async(
            args=[input_text, max_questions, use_mediawiki]
        )
        return result.get(timeout=600)
    except CeleryTimeout:
        raise RuntimeError("Task timed out after 10 minutes") from None
    except Exception as e:
        logger.error(f"Task execution failed: {e!s}", exc_info=True)
        raise RuntimeError("Task execution failed") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/inference_service.py` around lines 47 - 54, Wrap the synchronous
Celery call in generate_mcq_sync (or wherever generate_mcq_task is invoked) with
try/except: call generate_mcq_task.apply_async(...) then result.get(timeout=600)
inside the try, catch celery.exceptions.TimeoutError to raise a clear
application-level RuntimeError like "Task timed out after 10 minutes" (or
similar), and catch other exceptions to log via the module logger (include
exc_info=True) and re-raise a RuntimeError("Task execution failed") from the
original exception; ensure CELERY_AVAILABLE check remains and import
celery.exceptions.TimeoutError where used.

11-12: Unused USE_ASYNC variable.

USE_ASYNC is defined but never used in this module. The file only checks CELERY_AVAILABLE to determine functionality. Consider removing it to avoid confusion, or use it to gate whether the sync wrappers should be available.

-# Check if async mode is enabled
-USE_ASYNC = os.getenv('USE_ASYNC', 'false').lower() in ('true', '1', 'yes')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/inference_service.py` around lines 11 - 12, The module defines
USE_ASYNC but never uses it, causing confusion; either remove the unused
constant or integrate it into the service gating logic so async behavior is
actually controlled by it (e.g., combine USE_ASYNC with CELERY_AVAILABLE when
deciding whether to expose async wrappers or fall back to sync versions). Update
references around CELERY_AVAILABLE and any wrapper export logic so the decision
uses both USE_ASYNC and CELERY_AVAILABLE (or delete USE_ASYNC and any related
comments) to keep the intent clear.

51-54: Consider extracting hardcoded timeout constant.

The 600-second timeout is repeated across all functions. Extract to a module-level constant for easier configuration and maintenance.

# At module level
TASK_TIMEOUT_SECONDS = 600  # 10 minutes

# In functions
return result.get(timeout=TASK_TIMEOUT_SECONDS)

Also applies to: 72-75, 93-96

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/inference_service.py` around lines 51 - 54, The repeated hardcoded
600-second timeout used when awaiting Celery task results (e.g., where
generate_mcq_task.apply_async is called and result.get(timeout=600) is used)
should be pulled into a module-level constant (suggested name
TASK_TIMEOUT_SECONDS) and all occurrences updated to use that constant; add the
constant at the top of the module and replace every result.get(timeout=600)
(including the other similar calls) with
result.get(timeout=TASK_TIMEOUT_SECONDS) so the timeout is configurable and
maintained in one place.
backend/routes/async_generation_routes.py (1)

295-300: Consider: Celery's PENDING state is ambiguous.

Celery returns PENDING for both tasks waiting to execute and for unknown/non-existent task IDs. A client polling for a mistyped or expired task ID will indefinitely receive "pending" status instead of an error.

Consider adding task existence validation (e.g., via Redis key lookup) or documenting this limitation in API responses.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/async_generation_routes.py` around lines 295 - 300, The route
currently returns "pending" whenever task_result.state == 'PENDING', but Celery
uses PENDING for both queued and unknown task IDs; update the handler that
checks task_result (the AsyncResult for task_id) to disambiguate by querying the
backend for existence (e.g., use task_result.backend.get_task_meta(task_id) /
result_backend.get or check Redis key for task meta) and if no metadata exists
return a 404/errored JSON indicating unknown/expired task_id, otherwise keep
returning 202 pending; update the code paths around task_result and task_id to
perform this extra backend lookup and include a clear message when the task ID
is not found.
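A rough sketch of such an existence check, assuming the default Redis result-backend key layout (celery-task-meta-<id>) and a hypothetical set of submitted task IDs written by the submit endpoints; neither detail is part of the PR as reviewed:

import redis

r = redis.Redis.from_url("redis://localhost:6379/0")  # assumed backend URL

def task_is_known(task_id: str) -> bool:
    """Best-effort check that a task ID was actually issued by this service."""
    # The result-backend key only appears once Celery stores a state/result,
    # so a queued-but-unstarted task still looks unknown here...
    if r.exists(f"celery-task-meta-{task_id}"):
        return True
    # ...which is why recording IDs at submit time (hypothetical
    # "submitted_task_ids" set) is what really separates queued from mistyped/expired.
    return bool(r.sismember("submitted_task_ids", task_id))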
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 636-640: Replace the bare except in the CELERY availability check
with targeted exception handling: when importing or accessing
inference_service.CELERY_AVAILABLE fails, catch ImportError and AttributeError
(and optionally ModuleNotFoundError) instead of a bare except, and set
health_status["services"]["celery"] = "unavailable"; also consider logging the
caught exception for debugging so you can see why inference_service or
CELERY_AVAILABLE was unavailable. Use the existing identifiers
inference_service, CELERY_AVAILABLE, and health_status to locate the code to
change.
- Around line 92-100: The qa_model is being unconditionally instantiated by
qa_model = pipeline("question-answering") which overrides the earlier None set
when USE_CELERY_INFERENCE is enabled; move the pipeline instantiation inside the
legacy (non-Celery) initialization branch so qa_model remains None when using
Celery inference, and adjust any code paths that reference qa_model (notably the
/get_mcq_answer handler) to use the Celery path when qa_model is None;
specifically locate the qa_model assignments and the
LLMQuestionGenerator/MediaWikiAPI initializations and ensure only the legacy
mode creates qa_model via pipeline("question-answering"), while the Celery
branch leaves qa_model unset and uses the Celery task path.

In `@backend/start-dev.bat`:
- Around line 1-49: The batch script start-dev.bat contains LF-only line endings
which can break Windows batch parsing; convert the file to CRLF line endings and
commit that change (either run a tool to replace LF with CRLF for start-dev.bat
or add a .gitattributes entry to force eol=crlf for this file), then verify the
script (the sections referencing USE_CELERY_INFERENCE, the if /i
"%USE_CELERY_INFERENCE%"=="true" block, and the final "python server.py" launch)
runs correctly in cmd.exe; optionally remove or replace Unicode emoji characters
if you need guaranteed rendering in default Windows consoles.

In `@backend/start-dev.sh`:
- Around line 20-23: Replace the fragile export $(cat .env | grep -v '^#' |
xargs) pattern in start-dev.sh with a safe line-by-line loader: when .env
exists, read it with while IFS= read -r line; skip blank/comment lines and run
export "$line" for each remaining line. This avoids unquoted
command-substitution word-splitting, handles values with spaces, and removes the
unnecessary cat/grep pipeline; look for the .env loading block and update it
accordingly.

In `@backend/tasks/inference_tasks.py`:
- Around line 261-277: The loop using zip(input_questions, input_options) in the
method that builds outputs silently truncates when the two lists differ; add
explicit length validation before the loop (e.g., check len(input_questions) vs
len(input_options) and raise a ValueError with counts) or use zip(...,
strict=True) to fail fast; update the block around the for question, options in
zip(...) (and any callers that rely on matching lengths) so mismatched inputs
are detected and reported rather than silently truncated.
- Around line 444-454: generate_hard_boolq_task currently returns a flat list of
strings because it applies make_question_harder to the output of
question_gen.generate directly; to match generate_hard_shortq_task and
generate_hard_mcq_task you should preserve the dict structure: iterate over the
original generated list (from question_gen.generate), replace each item's
"question" value with make_question_harder(item["question"]) (or otherwise wrap
the hardened string back into the same dict shape), and return the modified list
of dicts so consumers get a consistent output shape; reference
generate_hard_boolq_task, question_gen.generate, and make_question_harder when
making the change.
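A shape-preserving sketch of that fix, assuming question_gen.generate returns a list of dicts with a "question" key (names are taken from the comment, not verified against the file):

def harden_boolq(generated: list[dict], make_question_harder) -> list[dict]:
    """Rewrite each item's question while keeping the dict shape of the other tasks."""
    return [
        {**item, "question": make_question_harder(item["question"])}
        for item in generated
    ]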

In `@docker-compose.yml`:
- Around line 42-47: The docker-compose.yml currently uses the deploy.resources
memory limits/reservations block (deploy.resources -> limits.memory and
deploy.resources -> reservations.memory) which only applies in Docker
Swarm; update the compose file to support non‑Swarm runs by either adding the
legacy keys mem_limit and mem_reservation under the relevant service or
document/run instructions to start with docker-compose --compatibility up so
memory limits are honored; adjust the service entry that contains
deploy.resources to include mem_limit and mem_reservation (or add a README note
about using --compatibility) accordingly.

---

Outside diff comments:
In `@README.md`:
- Around line 208-288: The README contains unresolved git merge conflict markers
(<<<<<<< HEAD, =======, >>>>>>> 8b154d5) that must be removed; resolve by
merging the two retained sections so both "LLM-Based Question Generation" and
the "Async vs Sync Endpoints / troubleshooting" content remain intact, delete
the conflict markers and any duplicated lines, ensure the LLM-Based Question
Generation heading and its JSON example and the Async vs Sync Endpoints block
(including the Celery/Redis troubleshooting notes) are present and formatted
consistently under their respective headings.

---

Nitpick comments:
In `@backend/inference_service.py`:
- Around line 47-54: Wrap the synchronous Celery call in generate_mcq_sync (or
wherever generate_mcq_task is invoked) with try/except: call
generate_mcq_task.apply_async(...) then result.get(timeout=600) inside the try,
catch celery.exceptions.TimeoutError to raise a clear application-level
RuntimeError like "Task timed out after 10 minutes" (or similar), and catch
other exceptions to log via the module logger (include exc_info=True) and
re-raise a RuntimeError("Task execution failed") from the original exception;
ensure CELERY_AVAILABLE check remains and import celery.exceptions.TimeoutError
where used.
- Around line 11-12: The module defines USE_ASYNC but never uses it, causing
confusion; either remove the unused constant or integrate it into the service
gating logic so async behavior is actually controlled by it (e.g., combine
USE_ASYNC with CELERY_AVAILABLE when deciding whether to expose async wrappers
or fall back to sync versions). Update references around CELERY_AVAILABLE and
any wrapper export logic so the decision uses both USE_ASYNC and
CELERY_AVAILABLE (or delete USE_ASYNC and any related comments) to keep the
intent clear.
- Around line 51-54: The repeated hardcoded 600-second timeout used when
awaiting Celery task results (e.g., where generate_mcq_task.apply_async is
called and result.get(timeout=600) is used) should be pulled into a module-level
constant (suggested name TASK_TIMEOUT_SECONDS) and all occurrences updated to
use that constant; add the constant at the top of the module and replace every
result.get(timeout=600) (including the other similar calls) with
result.get(timeout=TASK_TIMEOUT_SECONDS) so the timeout is configurable and
maintained in one place.

In `@backend/routes/async_generation_routes.py`:
- Around line 295-300: The route currently returns "pending" whenever
task_result.state == 'PENDING', but Celery uses PENDING for both queued and
unknown task IDs; update the handler that checks task_result (the AsyncResult
for task_id) to disambiguate by querying the backend for existence (e.g., use
task_result.backend.get_task_meta(task_id) / result_backend.get or check Redis
key for task meta) and if no metadata exists return a 404/errored JSON
indicating unknown/expired task_id, otherwise keep returning 202 pending; update
the code paths around task_result and task_id to perform this extra backend
lookup and include a clear message when the task ID is not found.

In `@backend/server.py`:
- Around line 284-301: The loop using zip(input_questions, input_options) should
use zip(..., strict=True) to prevent silent truncation if lengths differ; update
the loop header to for question, options in zip(input_questions, input_options,
strict=True): (the variables input_questions and input_options in the current
loop) to provide defense-in-depth alongside the existing length check—ensure the
runtime supports Python 3.10+ or provide an explicit length check fallback if
not.

In `@docker-compose.yml`:
- Around line 55-59: The docker-compose currently uses depends_on with
celery-worker: condition: service_started which only waits for container start;
add a proper healthcheck to the celery-worker service (e.g., a command that
verifies Celery is ready to accept tasks) and then change the backend's
depends_on entry to reference celery-worker: condition: service_healthy;
alternatively, if you prefer not to add a healthcheck, add
documentation/error-handling noting that initial requests may fail until the
worker finishes model loading. Ensure you modify the celery-worker service block
to include a healthcheck section and update the backend depends_on from
service_started to service_healthy.

In `@Dockerfile.backend`:
- Around line 27-33: The conditional in the RUN step checking for the s2v_old
directory is ineffective during image build because COPY backend/ precedes it,
so remove the redundant if-check and make the RUN download/extract commands
unconditional (update the RUN block that references s2v_old, wget, tar, rm) or,
if you need the conditional behavior for runtime-mounted volumes, replace the
current conditional with a comment explaining that choice and keep the logic;
ensure you modify the RUN line alongside WORKDIR and consider the interaction
with the earlier COPY backend/ step.
- Around line 1-13: Update the Dockerfile to avoid running as root and to reduce
image size: modify the apt-get install RUN (the line starting with RUN apt-get
update && apt-get install -y ...) to add --no-install-recommends and keep the
apt-get clean up, create a non-root user/group (e.g., add_user or user creation
block) after WORKDIR /app, chown the /app directory to that user, and switch to
that non-root user with USER before the final CMD; preserve the existing FROM
python:3.10-slim and current RUN structure but replace root operations with
operations performed as root only for package install and file ownership, then
run the app as the created non-root user.

In `@Dockerfile.worker`:
- Around line 1-13: Update the Dockerfile to mirror the security improvements
from Dockerfile.backend: change the apt-get install invocation to use
--no-install-recommends in the RUN step that installs gcc, g++, git, wget;
create a dedicated non-root user (e.g., appuser), chown the /app working
directory to that user, and add a USER appuser directive so the container runs
as non-root; keep the existing apt-get update and cleanup (rm -rf
/var/lib/apt/lists/*) but ensure the create-user and chown steps occur before
switching USER.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c0fff435-fe07-43e4-868f-cfd093bacfc9

📥 Commits

Reviewing files that changed from the base of the PR and between 40b8c95 and 681fe51.

📒 Files selected for processing (13)
  • .dockerignore
  • Dockerfile.backend
  • Dockerfile.worker
  • README.md
  • backend/.env.example
  • backend/inference_service.py
  • backend/routes/async_generation_routes.py
  • backend/server.py
  • backend/start-dev.bat
  • backend/start-dev.sh
  • backend/tasks/inference_tasks.py
  • docker-compose.yml
  • requirements.txt
✅ Files skipped from review due to trivial changes (2)
  • requirements.txt
  • .dockerignore

@piyush06singhal
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 26, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@piyush06singhal
Author

@coderabbitai resume


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
backend/routes/async_generation_routes.py (2)

200-214: ⚠️ Potential issue | 🟡 Minor

Normalize use_mediawiki in /generate_all_async too.

The other submit handlers coerce True/"1" to integer 1, but Line 202 passes the raw JSON value through. backend/tasks/inference_tasks.py checks if use_mediawiki == 1, so "1" silently skips MediaWiki fetching only for the combined async endpoint.

🛠️ Minimal fix
-        use_mediawiki = data.get("use_mediawiki", 0)
+        use_mediawiki = 1 if data.get("use_mediawiki") in (1, True, "1") else 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/async_generation_routes.py` around lines 200 - 214, The
combined async endpoint passes the raw JSON value for use_mediawiki to
generate_all_questions_task, causing non-integer truthy values (e.g. "1" or
True) to not match the check in backend/tasks/inference_tasks.py (which expects
use_mediawiki == 1); coerce/normalize use_mediawiki to an integer like the other
handlers before sending it to generate_all_questions_task by converting the
value to 1 or 0 (e.g., replicate the same normalization used elsewhere),
updating the variable used in the call to generate_all_questions_task so
downstream code receives an integer 1 when MediaWiki should be used and 0
otherwise.

11-18: ⚠️ Potential issue | 🔴 Critical

Pick one import mode and use it everywhere.

This file now imports backend.tasks... and backend.celery_worker, but backend/server.py still loads the blueprint as from routes.async_generation_routes import async_routes, and backend/start-dev.sh still tells contributors to run python server.py / celery -A celery_worker.celery_app .... In that script-based startup mode, the backend.* imports here are not resolvable, so Flask can fail before it serves any request. Either convert the whole backend to package-mode startup, or keep these imports aligned with the existing local-module style. The same mismatch also appears in backend/tasks/inference_tasks.py Line 7.

#!/bin/bash
cd backend || exit 1

python - <<'PY'
from pathlib import Path
cwd = Path.cwd()
for rel in (
    "celery_worker.py",
    "tasks/inference_tasks.py",
    "backend/celery_worker.py",
    "backend/tasks/inference_tasks.py",
):
    print(f"{rel}: {'present' if (cwd / rel).exists() else 'missing'}")
PY

echo
sed -n '1,20p' routes/async_generation_routes.py
echo
sed -n '20,35p' server.py
echo
sed -n '51,75p' start-dev.sh
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/async_generation_routes.py` around lines 11 - 18, Imports in
async_generation_routes.py mix package-style (backend.tasks...) with
local-module startup elsewhere, causing import errors when running server.py;
pick one mode and make imports consistent: either change the imports of
generate_mcq_task, generate_boolq_task, generate_shortq_task,
generate_all_questions_task and celery_app to local relative/local-module form
used by server.py (e.g., from tasks.inference_tasks import ... and from
celery_worker import celery_app) or convert the whole project to package-mode
and update all call sites (including backend/tasks/inference_tasks.py line 7) to
use the package-prefixed imports; update all occurrences of these symbols to
match the chosen import style so imports resolve under the current startup
script.
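For reference, a sketch of what the local-module option would look like at the top of async_generation_routes.py, assuming the script-based startup (running from inside backend/) is kept; the task and app names are taken from the comment above and are not verified against the branch:

# Local-module style, matching how server.py is launched (python server.py from backend/).
from tasks.inference_tasks import (
    generate_mcq_task,
    generate_boolq_task,
    generate_shortq_task,
    generate_all_questions_task,
)
from celery_worker import celery_app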
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 111-113: The Celery branches in server.py are dispatching tasks
after the route already ran process_input_text, causing workers to re-run
MediaWiki preprocessing because use_mediawiki remains set; either stop
preprocessing before queuing to Celery or clear the use_mediawiki flag when
using Celery. Update the code paths that call generate_mcq_sync /
generate_mcq_async and any Celery dispatch logic so that when
USE_CELERY_INFERENCE is True you do not pass a preprocessed MediaWiki
payload—specifically, clear or set use_mediawiki=False before calling the Celery
task dispatch (and likewise for other branches mentioned), or move
process_input_text into the non-Celery (legacy) branch only, ensuring
backend/tasks/inference_tasks.py still receives the original user query, not the
summarized text.
- Around line 77-93: Add the missing import "from transformers import pipeline"
at the module top so pipeline is defined, and remove the duplicate QA model
initialization by deleting the second "if not USE_CELERY_INFERENCE: qa_model =
pipeline('question-answering')" block; ensure qa_model is only created once (the
earlier initialization inside the first USE_CELERY_INFERENCE guard) so symbols
to edit are the pipeline import and the duplicate qa_model assignment in
backend/server.py (alongside MCQGenerator, AnswerPredictor, BoolQGenerator,
ShortQGenerator, QuestionGenerator, and docs_service initializations).
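A compact sketch of the initialization split both server.py comments describe: the QA pipeline is built only in legacy mode, and the Celery branch keeps the raw user text so the worker applies MediaWiki preprocessing itself. The flag name and behaviour are taken from the comments, not from the file:

import os
from transformers import pipeline

USE_CELERY_INFERENCE = os.getenv("USE_CELERY_INFERENCE", "false").lower() == "true"

if USE_CELERY_INFERENCE:
    qa_model = None  # models live in the Celery worker; keep the Flask process light
else:
    qa_model = pipeline("question-answering")  # legacy in-process inference only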

In `@backend/start-dev.sh`:
- Around line 53-54: Update the documented Celery worker startup command so it
explicitly runs in single-process mode to avoid spawning multiple transformer
instances; change the example that references celery -A celery_worker.celery_app
worker --loglevel=info to include --pool=solo --concurrency=1 (and note that
higher concurrency is an explicit opt-in), ensuring the example that mentions
celery_worker.celery_app shows the single-process flags for development.

---

Duplicate comments:
In `@backend/routes/async_generation_routes.py`:
- Around line 200-214: The combined async endpoint passes the raw JSON value for
use_mediawiki to generate_all_questions_task, causing non-integer truthy values
(e.g. "1" or True) to not match the check in backend/tasks/inference_tasks.py
(which expects use_mediawiki == 1); coerce/normalize use_mediawiki to an integer
like the other handlers before sending it to generate_all_questions_task by
converting the value to 1 or 0 (e.g., replicate the same normalization used
elsewhere), updating the variable used in the call to
generate_all_questions_task so downstream code receives an integer 1 when
MediaWiki should be used and 0 otherwise.
- Around line 11-18: Imports in async_generation_routes.py mix package-style
(backend.tasks...) with local-module startup elsewhere, causing import errors
when running server.py; pick one mode and make imports consistent: either change
the imports of generate_mcq_task, generate_boolq_task, generate_shortq_task,
generate_all_questions_task and celery_app to local relative/local-module form
used by server.py (e.g., from tasks.inference_tasks import ... and from
celery_worker import celery_app) or convert the whole project to package-mode
and update all call sites (including backend/tasks/inference_tasks.py line 7) to
use the package-prefixed imports; update all occurrences of these symbols to
match the chosen import style so imports resolve under the current startup
script.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3ce44e88-546c-4932-b67c-58b4fab7875c

📥 Commits

Reviewing files that changed from the base of the PR and between 40b8c95 and 7488d09.

📒 Files selected for processing (14)
  • .dockerignore
  • .gitattributes
  • Dockerfile.backend
  • Dockerfile.worker
  • README.md
  • backend/.env.example
  • backend/inference_service.py
  • backend/routes/async_generation_routes.py
  • backend/server.py
  • backend/start-dev.bat
  • backend/start-dev.sh
  • backend/tasks/inference_tasks.py
  • docker-compose.yml
  • requirements.txt
✅ Files skipped from review due to trivial changes (6)
  • .gitattributes
  • requirements.txt
  • .dockerignore
  • Dockerfile.worker
  • Dockerfile.backend
  • docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (3)
  • README.md
  • backend/start-dev.bat
  • backend/inference_service.py

@coderabbitai

coderabbitai Bot commented Mar 26, 2026

✅ Actions performed

Reviews resumed.


@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/tasks/inference_tasks.py (1)

267-267: Consider list unpacking for consistency with server.py.

Same minor style suggestion as in server.py:

-            options_with_answer = options + [generated_answer]
+            options_with_answer = [*options, generated_answer]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tasks/inference_tasks.py` at line 267, The code builds
options_with_answer using concatenation (options_with_answer = options +
[generated_answer]); for consistency with server.py switch to list unpacking by
constructing options_with_answer with [*options, generated_answer] — locate the
occurrence of options_with_answer in inference_tasks.py (inside the
inference/task function that uses options and generated_answer) and replace the
concatenation pattern with the unpacking pattern.
backend/server.py (1)

289-292: Consider list unpacking for consistency.

The list concatenation works correctly. As a minor style improvement, you could use unpacking:

-            options_with_answer = options + [generated_answer]
+            options_with_answer = [*options, generated_answer]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 289 - 292, Replace the list concatenation
with list unpacking for consistency: instead of building options_with_answer by
using options + [generated_answer], create it using unpacking (referencing
options_with_answer, options, and generated_answer) so the subsequent call to
TfidfVectorizer().fit_transform(vectorizer) receives the same sequence but
follows the stylistic preference; ensure options is an iterable/list before
unpacking to avoid TypeError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/tasks/inference_tasks.py`:
- Around line 331-338: The loop in inference_tasks.py calls
answer_predictor.predict_boolean_answer with a string (question) but
predict_boolean_answer expects a list and returns a list of booleans; change the
call to pass a single-element list (e.g., {"input_text": input_text,
"input_question": [question]}) and then use the first boolean result
(qa_response[0]) when appending "True"/"False" to output so
predict_boolean_answer and its return shape are handled correctly.
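A small sketch of the corrected call shape, assuming the payload keys and the list-in/list-out behaviour of predict_boolean_answer described in the comment above:

def boolean_answer_for(answer_predictor, input_text: str, question: str) -> str:
    """Wrap a single question in a list and read back the first boolean result."""
    qa_response = answer_predictor.predict_boolean_answer(
        {"input_text": input_text, "input_question": [question]}
    )
    return "True" if qa_response[0] else "False"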

---

Nitpick comments:
In `@backend/server.py`:
- Around line 289-292: Replace the list concatenation with list unpacking for
consistency: instead of building options_with_answer by using options +
[generated_answer], create it using unpacking (referencing options_with_answer,
options, and generated_answer) so the subsequent call to
TfidfVectorizer().fit_transform(vectorizer) receives the same sequence but
follows the stylistic preference; ensure options is an iterable/list before
unpacking to avoid TypeError.

In `@backend/tasks/inference_tasks.py`:
- Line 267: The code builds options_with_answer using concatenation
(options_with_answer = options + [generated_answer]); for consistency with
server.py switch to list unpacking by constructing options_with_answer with
[*options, generated_answer] — locate the occurrence of options_with_answer in
inference_tasks.py (inside the inference/task function that uses options and
generated_answer) and replace the concatenation pattern with the unpacking
pattern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 43c08677-b505-4c44-8866-6c44302fff13

📥 Commits

Reviewing files that changed from the base of the PR and between 7488d09 and d4eecd5.

📒 Files selected for processing (5)
  • backend/inference_service.py
  • backend/routes/async_generation_routes.py
  • backend/server.py
  • backend/start-dev.sh
  • backend/tasks/inference_tasks.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/inference_service.py


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/tasks/inference_tasks.py`:
- Around line 261-277: The loop that compares QA-generated answers to
multiple-choice options can hit an unhelpful ValueError when options is empty;
inside the for question, options in zip(...) loop (before calling
TfidfVectorizer), add a defensive check for an empty options list (e.g., if not
options:) and handle it deterministically: either append the generated_answer
(from qa_response["answer"]) or a clear sentinel (None or "NO_OPTIONS") to
outputs and continue, or raise a new ValueError with a descriptive message
referencing the question; ensure this check is placed before building
options_with_answer and computing similarities so that similarities.argmax()
never runs on an empty sequence.
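A hedged sketch of the guard, assuming the TF-IDF option-matching loop the comment describes (variable names follow the comment; the real task code may differ):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def pick_option(options: list[str], generated_answer: str) -> str:
    """Choose the option most similar to the QA answer, or fall back when there are none."""
    if not options:
        # Nothing to rank; return the generated answer instead of letting argmax() fail.
        return generated_answer
    options_with_answer = [*options, generated_answer]
    matrix = TfidfVectorizer().fit_transform(options_with_answer)
    similarities = cosine_similarity(matrix[-1], matrix[:-1]).flatten()
    return options[similarities.argmax()]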

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 19c73cab-25a6-4cf7-90fa-34e157e68e17

📥 Commits

Reviewing files that changed from the base of the PR and between d4eecd5 and fac0ab9.

📒 Files selected for processing (2)
  • backend/server.py
  • backend/tasks/inference_tasks.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/server.py

@piyush06singhal
Author

Hi @SxBxcoder,
Thanks again for the detailed feedback earlier; it really helped in refining the approach.

I’ve now addressed the key concerns you pointed out:

  1. Reduced local setup friction by adding a minimal Docker Compose setup for Redis + worker
  2. Ensured ML models are no longer duplicated between Flask and Celery (moved to worker-side lazy loading)
  3. Introduced a safe async rollout using a configurable flag, so existing synchronous flows remain unaffected

I’ve kept the changes incremental and aligned with the current architecture to avoid breaking anything.
Please take another look when you have a moment and share any further suggestions.
Thanks again for the guidance!


Successfully merging this pull request may close these issues.

[FEATURE]: Introduce Distributed Asynchronous Inference Pipeline for AI Question Generation
