
feat: Add question quality scoring and improve input validation #598

Open
piyush06singhal wants to merge 8 commits into AOSSIE-Org:main from piyush06singhal:feature/question-quality-scoring

Conversation

@piyush06singhal

@piyush06singhal piyush06singhal commented Mar 17, 2026

Addressed Issues:

Fixes #597


Summary

This PR introduces a Question Quality Scoring & Ranking system using the existing QAEvaluator, along with input validation improvements to enhance API robustness and output quality.


Key Changes

1. Question Quality Scoring & Ranking

  • Integrated QAEvaluator into main endpoints:
    • /get_mcq
    • /get_boolq
    • /get_shortq
  • Added optional flag: "use_scoring": true
  • Updated pipeline:
    Generate → Score → Rank → Slice → Return
  • When enabled, generates a larger pool (2×) and returns the top-N best questions
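
A minimal sketch of this flow is shown below; the helper and evaluator method names (score_and_rank_questions, encode_qa_pairs, get_scores) follow the CodeRabbit walkthrough further down, and generate_questions/evaluator are stand-ins for the project's own generator and QAEvaluator instance. The exact code in backend/server.py may differ.

import logging

logger = logging.getLogger(__name__)


def score_and_rank_questions(questions, evaluator):
    """Return questions sorted best-first by evaluator score.

    Small sets (fewer than 3 items) and any scoring failure fall back to the
    original order, matching the safe-fallback behaviour described above.
    """
    if len(questions) < 3:
        return questions
    try:
        question_texts = [q.get("question", "") for q in questions]
        answer_texts = [q.get("answer", "") for q in questions]
        encoded = evaluator.encode_qa_pairs(question_texts, answer_texts)
        scores = evaluator.get_scores(encoded)
        ranked = sorted(zip(scores, questions), key=lambda pair: pair[0], reverse=True)
        return [q for _, q in ranked]
    except Exception:
        logger.exception("Scoring failed; returning questions in original order")
        return questions


def top_n_questions(generate_questions, evaluator, text, max_questions, use_scoring):
    # Over-generate a 2x pool when scoring is enabled, rank, then slice to top-N.
    pool_size = max_questions * 2 if use_scoring else max_questions
    pool = generate_questions(text, pool_size)
    if use_scoring:
        pool = score_and_rank_questions(pool, evaluator)
    return pool[:max_questions]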

2. Input Validation & Safety

  • Added safe JSON parsing using request.get_json(silent=True) or {}
  • Validates input_text:
    • must be a string
    • must not be empty
  • Returns proper HTTP 400 for invalid inputs
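
A hedged sketch of this validation pattern with Flask, using one endpoint for illustration; the real handlers in backend/server.py may differ in detail.

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/get_mcq", methods=["POST"])
def get_mcq():
    # silent=True returns None instead of raising on a missing or non-JSON body
    data = request.get_json(silent=True) or {}
    input_text = data.get("input_text")

    # input_text must be a non-empty string, otherwise respond with HTTP 400
    if not isinstance(input_text, str) or not input_text.strip():
        return jsonify({"error": "input_text must be a non-empty string"}), 400

    use_scoring = bool(data.get("use_scoring", False))
    # ... generate (and optionally score and rank) questions here ...
    return jsonify({"output": []})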

3. Code Quality Improvements

  • Added structured logging for better debugging
  • Implemented reusable helper: score_and_rank_questions(...)
  • Ensured safe fallback if scoring fails
  • Handled edge cases (e.g., small question sets)

Benefits

  • Improved quality and relevance of generated questions
  • More consistent output across different inputs
  • Safer API handling (no crashes on invalid requests)
  • Clean and maintainable backend logic

Backward Compatibility

  • Default behaviour unchanged (use_scoring=False)
  • No breaking changes to API responses
  • Existing endpoints continue to work as before
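
As an illustration of the opt-in flag, the same endpoint called with and without scoring; the host/port and sample text are assumptions, and the output key matches the existing response shape.

import requests

BASE = "http://localhost:5000"  # assumed local Flask dev server
payload = {"input_text": "Photosynthesis converts light energy into chemical energy.",
           "max_questions": 5}

# Existing behaviour: no flag, default code path, response unchanged.
default_resp = requests.post(f"{BASE}/get_mcq", json=payload).json()

# Opt-in scoring: same request plus use_scoring=True, returning the top-N ranked questions.
scored_resp = requests.post(f"{BASE}/get_mcq", json={**payload, "use_scoring": True}).json()

print(len(default_resp.get("output", [])), len(scored_resp.get("output", [])))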

Testing

  • Verified all endpoints with and without scoring
  • Tested invalid input handling and edge cases
  • No new warnings or runtime errors observed

Additional Notes

This feature leverages the already available QAEvaluator and integrates it into the main generation pipeline without introducing new dependencies or altering existing behaviour.


AI Usage Disclosure

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the AI Usage Policy and tested the code locally.

I have used the following AI models and tools: ChatGPT


Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Summary by CodeRabbit

  • New Features

    • Optional question-quality scoring and ranking across question types (MCQ, boolean, short); when enabled the service generates extras and returns the top-ranked results.
  • Bug Fixes

    • Stricter input validation with clear 400 responses for missing/invalid fields.
    • Centralized logging and graceful fallbacks: scoring errors now log and return original results.
  • Tests

    • Comprehensive tests covering scoring, ranking, validation, endpoints, and failure cases.
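
As an illustration of the mocked-evaluator tests, a sketch along these lines; the score_and_rank_questions signature, the get_qa_evaluator patch point, and the import path are assumptions based on the walkthrough and review comments below.

from unittest.mock import MagicMock

import server  # assumed import path for backend/server.py


def test_ranking_orders_by_score(monkeypatch):
    evaluator = MagicMock()
    evaluator.encode_qa_pairs.return_value = "encoded"
    evaluator.get_scores.return_value = [0.2, 0.9, 0.5]
    monkeypatch.setattr(server, "get_qa_evaluator", lambda: evaluator)

    questions = [{"question": "q1"}, {"question": "q2"}, {"question": "q3"}]
    ranked = server.score_and_rank_questions(questions, "some context")

    # Assuming higher scores rank first, q2 (score 0.9) should lead.
    assert ranked[0]["question"] == "q2"


def test_evaluator_failure_falls_back(monkeypatch):
    evaluator = MagicMock()
    evaluator.encode_qa_pairs.side_effect = RuntimeError("model failed")
    monkeypatch.setattr(server, "get_qa_evaluator", lambda: evaluator)

    questions = [{"question": "q1"}, {"question": "q2"}, {"question": "q3"}]
    # On any evaluator error the helper should return the original list unchanged.
    assert server.score_and_rank_questions(questions, "ctx") == questions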

@coderabbitai

coderabbitai Bot commented Mar 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds an optional question-quality scoring and ranking layer to MCQ/BoolQ/ShortQ endpoints, plus logging, a thread-safe lazy QAEvaluator singleton, input validation helpers, scoring/ranking logic with safe fallbacks, and tests covering scoring behavior and edge cases.

Changes

  • Server core & helpers (backend/server.py): Adds logger setup; thread-safe lazy qa_evaluator with get_qa_evaluator(); parse_max_questions() input validation; score_and_rank_questions() to encode, score, rank, and return reordered questions with exception-safe fallback; replaces startup print with logger.info.
  • Endpoints: scoring integration & validation (backend/server.py): Updates /get_mcq, /get_boolq, /get_shortq to use request.get_json(silent=True) or {}, validate input_text, parse max_questions, accept the use_scoring flag, request up to max_questions * 2 when scoring, apply scoring when ≥3 items, and slice to the original max_questions; BoolQ adapts string items to dicts for scoring and unwraps them.
  • Endpoint tests: scoring mode invocation (backend/test_server.py): Adds test_get_mcq_with_scoring, test_get_boolq_with_scoring, test_get_shortq_with_scoring and calls them in __main__ to exercise scoring-enabled endpoints.
  • Scoring unit/integration tests (backend/test_scoring_feature.py): New pytest suite validating score_and_rank_questions, parse_max_questions, mocked QAEvaluator behaviors (scores, exceptions, out-of-range), endpoint validation (good/bad payloads), and end-to-end scoring-enabled flows returning top-N ranked items.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Endpoint as /get_mcq|/get_boolq|/get_shortq
    participant Generator as QuestionGenerator
    participant Scorer as score_and_rank_questions
    participant QAEval as QAEvaluator
    participant Response

    Client->>Endpoint: POST (input_text, max_questions, use_scoring:true)
    Endpoint->>Endpoint: validate input_text, parse_max_questions
    alt invalid input_text or max_questions
        Endpoint->>Response: 400 Error
    else valid
        Endpoint->>Generator: generate questions (limit = max_questions × 2)
        Generator->>Endpoint: generated questions
        alt use_scoring == true and >= 3 questions
            Endpoint->>Scorer: score_and_rank_questions(questions)
            Scorer->>QAEval: encode_qa_pairs
            QAEval->>Scorer: encoded vectors
            Scorer->>QAEval: get_scores
            QAEval->>Scorer: scores / indices
            Scorer->>Scorer: sort by score, handle exceptions/fallback
            Scorer->>Endpoint: ranked questions
        end
        Endpoint->>Response: return top-N questions (JSON)
    end
    Response->>Client: JSON payload

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through lines to find the best,

I scored each prompt, then shelved the rest.
I ranked them tidy, neat, and bright —
A clever rabbit, pruning for insight. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and concisely summarizes the two main changes: adding question quality scoring and improving input validation, which are the primary focuses of the PR.
  • Linked Issues check: ✅ Passed. The PR fully implements all core coding requirements from issue #597: integrates QAEvaluator for scoring, ranks questions by quality, makes scoring optional via the 'use_scoring' flag, generates larger candidate pools (2×), and includes fallback behavior for edge cases.
  • Out of Scope Changes check: ✅ Passed. All changes are directly aligned with the objectives from issue #597; no out-of-scope modifications detected. Logging setup and test coverage are supporting changes that enhance the core feature.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 80.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (2)
backend/server.py (1)

89-91: Simplify answer extraction using dict.get.

This removes redundant membership checks and matches Ruff’s RUF019 hint.

Proposed fix
-            if 'answer' in q and q['answer']:
-                answer_texts.append(q['answer'])
-            elif 'options' in q and q['options']:
+            if q.get("answer"):
+                answer_texts.append(q.get("answer"))
+            elif q.get("options"):
                 # fallback only if answer missing
-                answer_texts.append(q['options'][0])
+                answer_texts.append(q.get("options")[0])
             else:
                 answer_texts.append('')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 89 - 91, Replace the membership checks on q
with dict.get to simplify extraction: in the block that appends to answer_texts
(referencing q and answer_texts) use q.get('answer') to fetch the answer and
fall back to q.get('options') for options instead of testing 'answer' in q /
'options' in q; ensure you keep the same truthiness checks (i.e., only append
when the retrieved value is truthy) and preserve the existing control flow that
handles answers vs options.
backend/test_server.py (1)

36-46: Strengthen scoring tests with behavioral assertions.

These tests currently only check presence of output, so ranking/slicing regressions can pass unnoticed. Assert response status, output type, and len(output) <= max_questions.

Proposed test enhancement
 def test_get_mcq_with_scoring():
@@
     response = make_post_request(endpoint, data)
     print(f'/get_mcq with scoring Response: {response}')
     assert 'output' in response
+    assert isinstance(response['output'], list)
+    assert len(response['output']) <= data['max_questions']

 def test_get_boolq_with_scoring():
@@
     response = make_post_request(endpoint, data)
     print(f'/get_boolq with scoring Response: {response}')
     assert 'output' in response
+    assert isinstance(response['output'], list)
+    assert len(response['output']) <= data['max_questions']

 def test_get_shortq_with_scoring():
@@
     response = make_post_request(endpoint, data)
     print(f'/get_shortq with scoring Response: {response}')
     assert 'output' in response
+    assert isinstance(response['output'], list)
+    assert len(response['output']) <= data['max_questions']

Also applies to: 57-67, 78-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_server.py` around lines 36 - 46, The test_get_mcq_with_scoring
currently only checks that 'output' exists; update it (and the similar tests at
the other ranges) to assert the HTTP success status, output type, and length
constraints: after calling make_post_request(endpoint, data) assert the response
indicates success (e.g., response.get('status_code') == 200 or
response.get('status') == 'ok'), assert isinstance(response['output'], list),
and assert len(response['output']) <= data['max_questions']; keep references to
test_get_mcq_with_scoring, endpoint '/get_mcq', make_post_request and the
response['output'] field so the assertions are added in the same test and
mirrored in the other tests mentioned.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 139-144: The current handlers only slice questions to
max_questions inside the scoring branch, so when use_scoring=True but
len(questions)<3 the result can exceed max_questions; update the logic in the
affected endpoints (the /get_boolq and /get_shortq handlers and the block using
use_scoring, questions, max_questions, and score_and_rank_questions) to always
enforce questions = questions[:max_questions] after any ranking/processing step
(i.e., perform the slicing outside or after the conditional that calls
score_and_rank_questions) so the returned list never exceeds max_questions even
when scoring is skipped.
- Around line 125-132: Validate and normalize max_questions before using it in
arithmetic: parse the incoming data["max_questions"] into an integer (or reject
if it cannot be parsed), ensure it is a positive int (>0), and if not return a
400 response with a clear error message; then use the validated variable (e.g.,
validated_max_questions) when computing generation_limit =
validated_max_questions * 2 if use_scoring else validated_max_questions. Apply
this change wherever generation_limit is computed (the blocks that reference
max_questions, use_scoring, and generation_limit) and ensure
process_input_text(input_text, use_mediawiki) remains unchanged.
- Around line 86-88: The loop that builds scoring inputs uses q.get('question',
'') which misses generator outputs that populate 'question_statement'; update
the logic in the for q in questions loop that appends to question_texts to
prefer q.get('question_statement') and fall back to q.get('question', '') (or
empty string) so scoring uses the actual question text produced by the
MCQ/ShortQ generators; locate the append to question_texts in backend/server.py
and replace the key access accordingly.
- Around line 170-174: The request handler for /get_boolq currently normalizes
boolean_questions into dicts unconditionally (the boolean_questions list
comprehension), which changes the response shape even when scoring is disabled;
change the logic so the original boolean_questions variable is preserved for
non-scoring responses and only create a normalized scoring_payload (e.g.,
scoring_payload = [{"question": q} if isinstance(q, str) else q for q in
boolean_questions]) inside the scoring branch where scoring is performed (the
code that calls the scorer), leaving the top-level boolean_questions list
unchanged and returning the original raw list when scoring is not requested.
- Line 53: The file currently creates qa_evaluator = main.QAEvaluator() at
import time which forces heavy model load even when use_scoring is false; change
this to lazy initialization by setting qa_evaluator = None at module scope and
instantiating main.QAEvaluator() on first scoring request (inside the scoring
handler function that checks use_scoring) while wrapping the construction in a
try/except to catch model load failures, set qa_evaluator back to None on error,
and return a clean fallback response or error message when use_scoring is
disabled or the model failed to load.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e354f48-9e0a-4e02-acdc-ef7ecf1c992f

📥 Commits

Reviewing files that changed from the base of the PR and between fc3bf1a and dc55dae.

📒 Files selected for processing (2)
  • backend/server.py
  • backend/test_server.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
backend/server.py (2)

88-99: Unused parameter: context is documented but never used.

The context parameter is defined in the signature and documented in the docstring, but it's not referenced anywhere in the function body.

♻️ Either remove the parameter or use it for enhanced scoring

If the parameter is intended for future use, consider removing it for now and adding it when needed:

-def score_and_rank_questions(questions, context):
+def score_and_rank_questions(questions):
     """
     Use QAEvaluator to score and rank questions based on quality.
     Returns sorted list with best questions first.
     
     Args:
         questions: List of question dictionaries
-        context: The input text context
         
     Returns:
         List of questions sorted by quality score (best first)
     """

Then update call sites to remove the input_text argument.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 88 - 99, The function
score_and_rank_questions currently declares and documents the parameter context
but never uses it; either remove the unused parameter from the signature and
docstring in score_and_rank_questions and update all call sites that pass
input_text, or incorporate context into the scoring logic (e.g., pass context
into QAEvaluator calls inside score_and_rank_questions and update the docstring
to reflect its use); ensure the function signature, docstring, and any callers
are kept in sync to avoid lint/type errors.

55-59: Race condition in lazy initialization.

Two concurrent requests could both see qa_evaluator is None and instantiate QAEvaluator() simultaneously. Given the heavy model loading (BERT from HuggingFace), this wastes resources and could cause memory issues.

♻️ Thread-safe lazy initialization
+import threading
+
 qa_evaluator = None
+_qa_evaluator_lock = threading.Lock()

 def get_qa_evaluator():
     global qa_evaluator
-    if qa_evaluator is None:
-        qa_evaluator = main.QAEvaluator()
+    if qa_evaluator is None:
+        with _qa_evaluator_lock:
+            if qa_evaluator is None:
+                qa_evaluator = main.QAEvaluator()
     return qa_evaluator
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 55 - 59, The lazy initializer
get_qa_evaluator has a race where multiple threads can instantiate
main.QAEvaluator simultaneously; fix by introducing a module-level
threading.Lock (e.g., qa_evaluator_lock) and use double-checked locking inside
get_qa_evaluator: first check qa_evaluator is None, acquire the lock, check
again, and only then set qa_evaluator = main.QAEvaluator(); release the lock and
return qa_evaluator so only one heavy model load occurs. Ensure the lock is
initialized at module import alongside the qa_evaluator variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 112-122: The for-loop starting with "for q in questions:" is
mis-indented and causing a Python IndentationError; adjust the indentation so
the "for q in questions:" line aligns with its surrounding block (the enclosing
try/except or function scope) and not indented one extra space, ensuring
subsequent lines that append to question_texts and answer_texts remain
consistently indented under that for loop; locate the block by searching for the
symbols "for q in questions:", "question_texts", and "answer_texts" and fix
indentation to match the surrounding code structure.
- Around line 209-217: boolean_questions is a list of plain strings (from
beam_search_decoding) but score_and_rank_questions expects dicts with keys like
"question" or "question_statement", so convert each string in boolean_questions
into a dict (e.g., {"question": <string>}) before calling
score_and_rank_questions; update the block that calls score_and_rank_questions
to map/transform boolean_questions into the expected dict shape (refer to how
MCQ formatting is done) and then pass the transformed list into
score_and_rank_questions(boolean_questions, input_text) while keeping the final
slicing boolean_questions = boolean_questions[:max_questions].

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c4766a5e-072d-4cb0-bb94-a6823f3af48b

📥 Commits

Reviewing files that changed from the base of the PR and between dc55dae and bfe0652.

📒 Files selected for processing (1)
  • backend/server.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 218-225: The current code mutates boolean_questions into dicts for
scoring which changes the /get_boolq response shape; instead, create a separate
variable (e.g., boolean_questions_for_scoring) that normalizes strings to
{"question": q} only for passing into
score_and_rank_questions(boolean_questions_for_scoring, input_text), then merge
the returned scores/ranks back onto the original boolean_questions items
(preserving strings as strings and original dict shapes) before returning;
update the return path that currently uses boolean_questions to use the merged
result so the endpoint output schema remains unchanged when scoring is enabled.
- Line 92: The function score_and_rank_questions currently declared as
score_and_rank_questions(questions) has a call-signature mismatch with its
callers (they pass two arguments); fix by updating the function signature to
accept the second parameter (e.g., def score_and_rank_questions(questions,
context=None)) and adjust the implementation to use the new context parameter
(or ignore if None), ensuring all internal calls to the scorer (and any use of
the second arg) match the updated signature; specifically update the function
definition and any internal references inside score_and_rank_questions to accept
and handle the extra argument so the calls from the locations that pass two args
succeed without TypeError.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0648e080-81cd-452f-bf31-6ebc310d8306

📥 Commits

Reviewing files that changed from the base of the PR and between bfe0652 and d9965fe.

📒 Files selected for processing (1)
  • backend/server.py

@Aditya30ag
Contributor

Needs test!!!!


@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/test_scoring_feature.py`:
- Line 65: Tests are calling score_and_rank_questions with three positional args
but the function defined in server.py (score_and_rank_questions(questions,
_input_text=None)) only accepts two; update each call in
backend/test_scoring_feature.py to pass at most the questions and the optional
_input_text (or omit the second arg) and remove the extra third argument, and
also correct any lowercase false to Python's False so calls use the proper
boolean literal.
- Line 93: The test calls score_and_rank_questions with three positional
arguments and uses the undefined lowercase false; update the assertion to call
score_and_rank_questions with only the supported parameters and use Python
booleans if needed (e.g., replace the three-argument call with
score_and_rank_questions([], 0) or, if a boolean flag was intended, pass
True/False using the proper capitalized name). Ensure the change targets the
test assertion in test_scoring_feature.py referencing score_and_rank_questions
so it no longer triggers NameError or TypeError.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 084faca6-6833-4cc6-a24c-02298ff66074

📥 Commits

Reviewing files that changed from the base of the PR and between d9965fe and ce31579.

📒 Files selected for processing (3)
  • backend/server.py
  • backend/test_scoring_feature.py
  • backend/test_server.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/test_server.py
  • backend/server.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/test_scoring_feature.py (1)

212-231: Add non-JSON request coverage for safe JSON parsing.

The PR objective includes request.get_json(silent=True) or {} hardening; current API tests only send JSON payloads.

Suggested test case
 class TestAPIValidation:
@@
     def test_invalid_input_text(self, client, payload):
         """Invalid input_text returns 400."""
         response = client.post('/get_mcq', json=payload)
         assert response.status_code == 400
+
+    def test_non_json_body_returns_400(self, client):
+        """Non-JSON body should be handled safely and return 400."""
+        response = client.post('/get_mcq', data="plain text", content_type="text/plain")
+        assert response.status_code == 400
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_scoring_feature.py` around lines 212 - 231, Tests only cover
JSON requests but the code now uses request.get_json(silent=True) or {} so we
must add a non-JSON request test to ensure the endpoint handles empty/invalid
JSON safely; add a new test (e.g., test_non_json_body_returns_400) that posts
plain text (Content-Type: text/plain or no JSON) to the /get_mcq endpoint and
asserts a 400 response, referencing the existing test patterns
(test_invalid_input_text and test_invalid_max_questions) to match style and
reuse the client fixture; ensure the request content is non-JSON (e.g.,
data="not json", headers={"Content-Type":"text/plain"}) so the silent JSON
parsing branch is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/test_scoring_feature.py`:
- Around line 175-187: Extend the invalid-types tests to assert that booleans
are rejected: update the test function test_invalid_types_rejected (and the
similar assertions around the other block mentioned) to include cases like
parse_max_questions({"max_questions": True}) and
parse_max_questions({"max_questions": False}) wrapped in
pytest.raises(ValueError); this targets the parse_max_questions function which
currently relies on isinstance(value, int) and ensures True/False do not pass as
valid integers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a1326a0d-38ea-4c82-bd10-1d0410935064

📥 Commits

Reviewing files that changed from the base of the PR and between ce31579 and afb6fb2.

📒 Files selected for processing (1)
  • backend/test_scoring_feature.py

@piyush06singhal
Author

Hi @Aditya30ag

I’ve added comprehensive pytest test coverage for the feature as suggested.

This includes:

  • Unit tests for question scoring & ranking logic
  • Edge case handling (empty/small inputs)
  • Failure scenarios (evaluator exceptions & fallback behaviour)
  • API input validation for all endpoints
  • Integration test for scoring with top_k selection

All tests are properly mocked and designed to ensure stability without external dependencies.

Please let me know if you’d like any additional test cases or improvements. Happy to iterate further!

@Aditya30ag
Contributor

This looks too much like AI-generated code!


Development

Successfully merging this pull request may close these issues.

[FEATURE] Add Question Quality Scoring and Ranking Layer to Improve Output Quality

2 participants