feat: Add question quality scoring and improve input validation #598
piyush06singhal wants to merge 8 commits into AOSSIE-Org:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Adds an optional question-quality scoring and ranking layer to the MCQ/BoolQ/ShortQ endpoints, plus logging, a thread-safe lazy QAEvaluator singleton, input validation helpers, scoring/ranking logic with safe fallbacks, and tests covering scoring behavior and edge cases.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Endpoint as /get_mcq|/get_boolq|/get_shortq
    participant Generator as QuestionGenerator
    participant Scorer as score_and_rank_questions
    participant QAEval as QAEvaluator
    participant Response

    Client->>Endpoint: POST (input_text, max_questions, use_scoring:true)
    Endpoint->>Endpoint: validate input_text, parse_max_questions
    alt invalid input_text or max_questions
        Endpoint->>Response: 400 Error
    else valid
        Endpoint->>Generator: generate questions (limit = max_questions × 2)
        Generator->>Endpoint: generated questions
        alt use_scoring == true and >= 3 questions
            Endpoint->>Scorer: score_and_rank_questions(questions)
            Scorer->>QAEval: encode_qa_pairs
            QAEval->>Scorer: encoded vectors
            Scorer->>QAEval: get_scores
            QAEval->>Scorer: scores / indices
            Scorer->>Scorer: sort by score, handle exceptions/fallback
            Scorer->>Endpoint: ranked questions
        end
        Endpoint->>Response: return top-N questions (JSON)
    end
    Response->>Client: JSON payload
```
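For readers who prefer code to diagrams, a minimal Flask sketch of the same flow follows. It is illustrative only: the route name matches the PR, but `generate_questions`, the inline validation, and the pass-through `score_and_rank_questions` are stand-ins for the project's QuestionGenerator, helpers, and QAEvaluator-backed scorer.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_questions(text, limit):
    # placeholder for the project's question generators
    return [{"question": f"Question {i} about the passage?"} for i in range(limit)]

def score_and_rank_questions(questions, _context=None):
    # placeholder scorer; the real one ranks with QAEvaluator, best first
    return questions

@app.route("/get_mcq", methods=["POST"])
def get_mcq():
    data = request.get_json(silent=True) or {}

    input_text = data.get("input_text", "")
    if not isinstance(input_text, str) or not input_text.strip():
        return jsonify({"error": "input_text must be a non-empty string"}), 400

    max_questions = data.get("max_questions", 4)
    if isinstance(max_questions, bool) or not isinstance(max_questions, int) or max_questions <= 0:
        return jsonify({"error": "max_questions must be a positive integer"}), 400

    use_scoring = bool(data.get("use_scoring", False))
    # over-generate when scoring so ranking has something to discard
    limit = max_questions * 2 if use_scoring else max_questions
    questions = generate_questions(input_text, limit)

    if use_scoring and len(questions) >= 3:
        questions = score_and_rank_questions(questions, input_text)

    # slice outside the scoring branch so the cap always holds
    return jsonify({"output": questions[:max_questions]})
```

Note how the final slice sits outside the scoring branch; that detail is the one the review comments below keep returning to.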
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 5
🧹 Nitpick comments (2)
backend/server.py (1)
89-91: Simplify answer extraction using `dict.get`. This removes redundant membership checks and matches Ruff's RUF019 hint.
Proposed fix
```diff
-            if 'answer' in q and q['answer']:
-                answer_texts.append(q['answer'])
-            elif 'options' in q and q['options']:
+            if q.get("answer"):
+                answer_texts.append(q.get("answer"))
+            elif q.get("options"):  # fallback only if answer missing
-                answer_texts.append(q['options'][0])
+                answer_texts.append(q.get("options")[0])
             else:
                 answer_texts.append('')
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 89 - 91, Replace the membership checks on q with dict.get to simplify extraction: in the block that appends to answer_texts (referencing q and answer_texts) use q.get('answer') to fetch the answer and fall back to q.get('options') for options instead of testing 'answer' in q / 'options' in q; ensure you keep the same truthiness checks (i.e., only append when the retrieved value is truthy) and preserve the existing control flow that handles answers vs options.

backend/test_server.py (1)
36-46: Strengthen scoring tests with behavioral assertions. These tests currently only check presence of `output`, so ranking/slicing regressions can pass unnoticed. Assert response status, output type, and `len(output) <= max_questions`.

Proposed test enhancement
```diff
 def test_get_mcq_with_scoring():
 @@
     response = make_post_request(endpoint, data)
     print(f'/get_mcq with scoring Response: {response}')
     assert 'output' in response
+    assert isinstance(response['output'], list)
+    assert len(response['output']) <= data['max_questions']

 def test_get_boolq_with_scoring():
 @@
     response = make_post_request(endpoint, data)
     print(f'/get_boolq with scoring Response: {response}')
     assert 'output' in response
+    assert isinstance(response['output'], list)
+    assert len(response['output']) <= data['max_questions']

 def test_get_shortq_with_scoring():
 @@
     response = make_post_request(endpoint, data)
     print(f'/get_shortq with scoring Response: {response}')
     assert 'output' in response
+    assert isinstance(response['output'], list)
+    assert len(response['output']) <= data['max_questions']
```

Also applies to: 57-67, 78-88
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/test_server.py` around lines 36 - 46, The test_get_mcq_with_scoring currently only checks that 'output' exists; update it (and the similar tests at the other ranges) to assert the HTTP success status, output type, and length constraints: after calling make_post_request(endpoint, data) assert the response indicates success (e.g., response.get('status_code') == 200 or response.get('status') == 'ok'), assert isinstance(response['output'], list), and assert len(response['output']) <= data['max_questions']; keep references to test_get_mcq_with_scoring, endpoint '/get_mcq', make_post_request and the response['output'] field so the assertions are added in the same test and mirrored in the other tests mentioned.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 139-144: The current handlers only slice questions to
max_questions inside the scoring branch, so when use_scoring=True but
len(questions)<3 the result can exceed max_questions; update the logic in the
affected endpoints (the /get_boolq and /get_shortq handlers and the block using
use_scoring, questions, max_questions, and score_and_rank_questions) to always
enforce questions = questions[:max_questions] after any ranking/processing step
(i.e., perform the slicing outside or after the conditional that calls
score_and_rank_questions) so the returned list never exceeds max_questions even
when scoring is skipped.
- Around line 125-132: Validate and normalize max_questions before using it in
arithmetic: parse the incoming data["max_questions"] into an integer (or reject
if it cannot be parsed), ensure it is a positive int (>0), and if not return a
400 response with a clear error message; then use the validated variable (e.g.,
validated_max_questions) when computing generation_limit =
validated_max_questions * 2 if use_scoring else validated_max_questions. Apply
this change wherever generation_limit is computed (the blocks that reference
max_questions, use_scoring, and generation_limit) and ensure
process_input_text(input_text, use_mediawiki) remains unchanged.
- Around line 86-88: The loop that builds scoring inputs uses q.get('question',
'') which misses generator outputs that populate 'question_statement'; update
the logic in the for q in questions loop that appends to question_texts to
prefer q.get('question_statement') and fall back to q.get('question', '') (or
empty string) so scoring uses the actual question text produced by the
MCQ/ShortQ generators; locate the append to question_texts in backend/server.py
and replace the key access accordingly.
- Around line 170-174: The request handler for /get_boolq currently normalizes
boolean_questions into dicts unconditionally (the boolean_questions list
comprehension), which changes the response shape even when scoring is disabled;
change the logic so the original boolean_questions variable is preserved for
non-scoring responses and only create a normalized scoring_payload (e.g.,
scoring_payload = [{"question": q} if isinstance(q, str) else q for q in
boolean_questions]) inside the scoring branch where scoring is performed (the
code that calls the scorer), leaving the top-level boolean_questions list
unchanged and returning the original raw list when scoring is not requested.
- Line 53: The file currently creates qa_evaluator = main.QAEvaluator() at
import time which forces heavy model load even when use_scoring is false; change
this to lazy initialization by setting qa_evaluator = None at module scope and
instantiating main.QAEvaluator() on first scoring request (inside the scoring
handler function that checks use_scoring) while wrapping the construction in a
try/except to catch model load failures, set qa_evaluator back to None on error,
and return a clean fallback response or error message when use_scoring is
disabled or the model failed to load.
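Taken together, the validation and lazy-loading comments above point at two small helpers. The sketch below uses the names from the review (`parse_max_questions`, `get_qa_evaluator`, `main.QAEvaluator`); the bodies are assumptions, not the PR's actual code.

```python
import threading

qa_evaluator = None
_qa_evaluator_lock = threading.Lock()

def parse_max_questions(data, default=4):
    """Return a validated positive integer or raise ValueError (mapped to a 400)."""
    value = data.get("max_questions", default)
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError("max_questions must be a positive integer")
    return value

def get_qa_evaluator():
    """Lazily construct the heavy QAEvaluator once; return None if loading fails."""
    global qa_evaluator
    if qa_evaluator is None:
        with _qa_evaluator_lock:
            if qa_evaluator is None:
                try:
                    import main  # assumed module that exposes QAEvaluator
                    qa_evaluator = main.QAEvaluator()
                except Exception:
                    qa_evaluator = None  # caller falls back to unscored questions
    return qa_evaluator
```

Handlers could then catch the ValueError and return a 400, and skip scoring whenever get_qa_evaluator() returns None.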
---
Nitpick comments:
In `@backend/server.py`:
- Around line 89-91: Replace the membership checks on q with dict.get to
simplify extraction: in the block that appends to answer_texts (referencing q
and answer_texts) use q.get('answer') to fetch the answer and fall back to
q.get('options') for options instead of testing 'answer' in q / 'options' in q;
ensure you keep the same truthiness checks (i.e., only append when the retrieved
value is truthy) and preserve the existing control flow that handles answers vs
options.
In `@backend/test_server.py`:
- Around line 36-46: The test_get_mcq_with_scoring currently only checks that
'output' exists; update it (and the similar tests at the other ranges) to assert
the HTTP success status, output type, and length constraints: after calling
make_post_request(endpoint, data) assert the response indicates success (e.g.,
response.get('status_code') == 200 or response.get('status') == 'ok'), assert
isinstance(response['output'], list), and assert len(response['output']) <=
data['max_questions']; keep references to test_get_mcq_with_scoring, endpoint
'/get_mcq', make_post_request and the response['output'] field so the assertions
are added in the same test and mirrored in the other tests mentioned.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1e354f48-9e0a-4e02-acdc-ef7ecf1c992f
📒 Files selected for processing (2)
- backend/server.py
- backend/test_server.py
Actionable comments posted: 2
🧹 Nitpick comments (2)
backend/server.py (2)
88-99: Unused parameter: `context` is documented but never used. The `context` parameter is defined in the signature and documented in the docstring, but it's not referenced anywhere in the function body.

♻️ Either remove the parameter or use it for enhanced scoring
If the parameter is intended for future use, consider removing it for now and adding it when needed:
```diff
-def score_and_rank_questions(questions, context):
+def score_and_rank_questions(questions):
     """
     Use QAEvaluator to score and rank questions based on quality.
     Returns sorted list with best questions first.

     Args:
         questions: List of question dictionaries
-        context: The input text context

     Returns:
         List of questions sorted by quality score (best first)
     """
```

Then update call sites to remove the `input_text` argument.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 88 - 99, The function score_and_rank_questions currently declares and documents the parameter context but never uses it; either remove the unused parameter from the signature and docstring in score_and_rank_questions and update all call sites that pass input_text, or incorporate context into the scoring logic (e.g., pass context into QAEvaluator calls inside score_and_rank_questions and update the docstring to reflect its use); ensure the function signature, docstring, and any callers are kept in sync to avoid lint/type errors.
55-59: Race condition in lazy initialization. Two concurrent requests could both see `qa_evaluator is None` and instantiate `QAEvaluator()` simultaneously. Given the heavy model loading (BERT from HuggingFace), this wastes resources and could cause memory issues.

♻️ Thread-safe lazy initialization
```diff
+import threading
+
 qa_evaluator = None
+_qa_evaluator_lock = threading.Lock()

 def get_qa_evaluator():
     global qa_evaluator
-    if qa_evaluator is None:
-        qa_evaluator = main.QAEvaluator()
+    if qa_evaluator is None:
+        with _qa_evaluator_lock:
+            if qa_evaluator is None:
+                qa_evaluator = main.QAEvaluator()
     return qa_evaluator
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 55 - 59, The lazy initializer get_qa_evaluator has a race where multiple threads can instantiate main.QAEvaluator simultaneously; fix by introducing a module-level threading.Lock (e.g., qa_evaluator_lock) and use double-checked locking inside get_qa_evaluator: first check qa_evaluator is None, acquire the lock, check again, and only then set qa_evaluator = main.QAEvaluator(); release the lock and return qa_evaluator so only one heavy model load occurs. Ensure the lock is initialized at module import alongside the qa_evaluator variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 112-122: The for-loop starting with "for q in questions:" is
mis-indented and causing a Python IndentationError; adjust the indentation so
the "for q in questions:" line aligns with its surrounding block (the enclosing
try/except or function scope) and not indented one extra space, ensuring
subsequent lines that append to question_texts and answer_texts remain
consistently indented under that for loop; locate the block by searching for the
symbols "for q in questions:", "question_texts", and "answer_texts" and fix
indentation to match the surrounding code structure.
- Around line 209-217: boolean_questions is a list of plain strings (from
beam_search_decoding) but score_and_rank_questions expects dicts with keys like
"question" or "question_statement", so convert each string in boolean_questions
into a dict (e.g., {"question": <string>}) before calling
score_and_rank_questions; update the block that calls score_and_rank_questions
to map/transform boolean_questions into the expected dict shape (refer to how
MCQ formatting is done) and then pass the transformed list into
score_and_rank_questions(boolean_questions, input_text) while keeping the final
slicing boolean_questions = boolean_questions[:max_questions].
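A minimal sketch of the string-to-dict normalization this comment asks for, with a pass-through function standing in for the real `score_and_rank_questions`:

```python
def score_and_rank_questions(questions, _context=None):
    # stand-in: the real scorer in backend/server.py ranks with QAEvaluator
    return questions

boolean_questions = ["Is water a liquid?", "Do plants need light?", "Is iron a gas?"]
input_text = "Some source passage..."
max_questions = 2

# beam_search_decoding yields plain strings, so wrap each one as a dict for scoring
payload = [{"question": q} if isinstance(q, str) else q for q in boolean_questions]
ranked = score_and_rank_questions(payload, input_text)
boolean_questions = [item["question"] for item in ranked][:max_questions]
print(boolean_questions)
```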
---
Nitpick comments:
In `@backend/server.py`:
- Around line 88-99: The function score_and_rank_questions currently declares
and documents the parameter context but never uses it; either remove the unused
parameter from the signature and docstring in score_and_rank_questions and
update all call sites that pass input_text, or incorporate context into the
scoring logic (e.g., pass context into QAEvaluator calls inside
score_and_rank_questions and update the docstring to reflect its use); ensure
the function signature, docstring, and any callers are kept in sync to avoid
lint/type errors.
- Around line 55-59: The lazy initializer get_qa_evaluator has a race where
multiple threads can instantiate main.QAEvaluator simultaneously; fix by
introducing a module-level threading.Lock (e.g., qa_evaluator_lock) and use
double-checked locking inside get_qa_evaluator: first check qa_evaluator is
None, acquire the lock, check again, and only then set qa_evaluator =
main.QAEvaluator(); release the lock and return qa_evaluator so only one heavy
model load occurs. Ensure the lock is initialized at module import alongside the
qa_evaluator variable.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 218-225: The current code mutates boolean_questions into dicts for
scoring which changes the /get_boolq response shape; instead, create a separate
variable (e.g., boolean_questions_for_scoring) that normalizes strings to
{"question": q} only for passing into
score_and_rank_questions(boolean_questions_for_scoring, input_text), then merge
the returned scores/ranks back onto the original boolean_questions items
(preserving strings as strings and original dict shapes) before returning;
update the return path that currently uses boolean_questions to use the merged
result so the endpoint output schema remains unchanged when scoring is enabled.
- Line 92: The function score_and_rank_questions currently declared as
score_and_rank_questions(questions) has a call-signature mismatch with its
callers (they pass two arguments); fix by updating the function signature to
accept the second parameter (e.g., def score_and_rank_questions(questions,
context=None)) and adjust the implementation to use the new context parameter
(or ignore if None), ensuring all internal calls to the scorer (and any use of
the second arg) match the updated signature; specifically update the function
definition and any internal references inside score_and_rank_questions to accept
and handle the extra argument so the calls from the locations that pass two args
succeed without TypeError.
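One way to satisfy both comments, sketched with placeholder names (`rank_boolq_output` is hypothetical, and the pass-through scorer stands in for the QAEvaluator-based one): normalize a copy for scoring, keep the raw strings for the response, and give `score_and_rank_questions` an optional `context` so existing two-argument calls keep working.

```python
def score_and_rank_questions(questions, context=None):
    # optional context keeps existing two-argument call sites working;
    # placeholder ranking stands in for the QAEvaluator-based scorer
    return questions

def rank_boolq_output(boolean_questions, input_text, max_questions, use_scoring):
    # hypothetical helper: /get_boolq returns plain strings, so only a copy is
    # normalized to dicts for scoring and the result is mapped back to strings
    if use_scoring and len(boolean_questions) >= 3:
        scoring_payload = [{"question": q} for q in boolean_questions]
        ranked = score_and_rank_questions(scoring_payload, input_text)
        boolean_questions = [item["question"] for item in ranked]
    return boolean_questions[:max_questions]

print(rank_boolq_output(["Q1?", "Q2?", "Q3?"], "some context", 2, True))
```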
Needs test!!!!
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/test_scoring_feature.py`:
- Line 65: Tests are calling score_and_rank_questions with three positional args
but the function defined in server.py (score_and_rank_questions(questions,
_input_text=None)) only accepts two; update each call in
backend/test_scoring_feature.py to pass at most the questions and the optional
_input_text (or omit the second arg) and remove the extra third argument, and
also correct any lowercase false to Python's False so calls use the proper
boolean literal.
- Line 93: The test calls score_and_rank_questions with three positional
arguments and uses the undefined lowercase false; update the assertion to call
score_and_rank_questions with only the supported parameters and use Python
booleans if needed (e.g., replace the three-argument call with
score_and_rank_questions([], 0) or, if a boolean flag was intended, pass
True/False using the proper capitalized name). Ensure the change targets the
test assertion in test_scoring_feature.py referencing score_and_rank_questions
so it no longer triggers NameError or TypeError.
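Corrected calls could look like the sketch below, assuming `score_and_rank_questions` is importable from `server.py` and returns an empty list unchanged for empty input (the fallback behavior described in the walkthrough):

```python
from server import score_and_rank_questions  # assumed import path used by the tests

def test_empty_question_list_returns_empty():
    # at most two positional arguments: the questions and an optional context
    assert score_and_rank_questions([], "some context") == []

def test_scoring_flag_is_python_boolean():
    use_scoring = False  # lowercase `false` would raise NameError
    assert use_scoring is False
```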
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 084faca6-6833-4cc6-a24c-02298ff66074
📒 Files selected for processing (3)
- backend/server.py
- backend/test_scoring_feature.py
- backend/test_server.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/test_server.py
- backend/server.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/test_scoring_feature.py (1)
212-231: Add non-JSON request coverage for safe JSON parsing. The PR objective includes `request.get_json(silent=True) or {}` hardening; current API tests only send JSON payloads.

Suggested test case
```diff
 class TestAPIValidation:
 @@
     def test_invalid_input_text(self, client, payload):
         """Invalid input_text returns 400."""
         response = client.post('/get_mcq', json=payload)
         assert response.status_code == 400
+
+    def test_non_json_body_returns_400(self, client):
+        """Non-JSON body should be handled safely and return 400."""
+        response = client.post('/get_mcq', data="plain text", content_type="text/plain")
+        assert response.status_code == 400
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/test_scoring_feature.py` around lines 212 - 231, Tests only cover JSON requests but the code now uses request.get_json(silent=True) or {} so we must add a non-JSON request test to ensure the endpoint handles empty/invalid JSON safely; add a new test (e.g., test_non_json_body_returns_400) that posts plain text (Content-Type: text/plain or no JSON) to the /get_mcq endpoint and asserts a 400 response, referencing the existing test patterns (test_invalid_input_text and test_invalid_max_questions) to match style and reuse the client fixture; ensure the request content is non-JSON (e.g., data="not json", headers={"Content-Type":"text/plain"}) so the silent JSON parsing branch is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/test_scoring_feature.py`:
- Around line 175-187: Extend the invalid-types tests to assert that booleans
are rejected: update the test function test_invalid_types_rejected (and the
similar assertions around the other block mentioned) to include cases like
parse_max_questions({"max_questions": True}) and
parse_max_questions({"max_questions": False}) wrapped in
pytest.raises(ValueError); this targets the parse_max_questions function which
currently relies on isinstance(value, int) and ensures True/False do not pass as
valid integers.
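A sketch of the requested assertions, assuming `parse_max_questions` is importable from `server.py` and raises ValueError on bad input:

```python
import pytest
from server import parse_max_questions  # assumed import path

def test_booleans_rejected():
    # bool is a subclass of int, so True/False must be rejected explicitly
    for value in (True, False):
        with pytest.raises(ValueError):
            parse_max_questions({"max_questions": value})
```

On the implementation side, an explicit `isinstance(value, bool)` check before the integer check is enough, since `True` and `False` would otherwise pass `isinstance(value, int)`.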
---
Nitpick comments:
In `@backend/test_scoring_feature.py`:
- Around line 212-231: Tests only cover JSON requests but the code now uses
request.get_json(silent=True) or {} so we must add a non-JSON request test to
ensure the endpoint handles empty/invalid JSON safely; add a new test (e.g.,
test_non_json_body_returns_400) that posts plain text (Content-Type: text/plain
or no JSON) to the /get_mcq endpoint and asserts a 400 response, referencing the
existing test patterns (test_invalid_input_text and test_invalid_max_questions)
to match style and reuse the client fixture; ensure the request content is
non-JSON (e.g., data="not json", headers={"Content-Type":"text/plain"}) so the
silent JSON parsing branch is exercised.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a1326a0d-38ea-4c82-bd10-1d0410935064
📒 Files selected for processing (1)
- backend/test_scoring_feature.py
Hi @Aditya30ag I’ve added comprehensive pytest test coverage for the feature as suggested. This includes:
All tests are properly mocked and designed to ensure stability without external dependencies. Please let me know if you'd like any additional test cases or improvements. Happy to iterate further!
It looks like too much AI-generated code!
Addressed Issues:
Fixes #597
Summary
This PR introduces a Question Quality Scoring & Ranking system using the existing QAEvaluator, along with input validation improvements to enhance API robustness and output quality.
Key Changes
1. Question Quality Scoring & Ranking
Generate → Score → Rank → Slice → Return
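As a toy illustration of this flow (the generator and scorer below are stand-ins, not the project's QuestionGenerator or QAEvaluator):

```python
def generate(text, limit):
    # stand-in for the project's question generators
    return [{"question": f"Question {i}?", "answer": f"Answer {i}"} for i in range(limit)]

def score(question):
    # stand-in for QAEvaluator scoring; longer answers score higher here
    return len(question["answer"])

def pipeline(text, max_questions, use_scoring=True):
    limit = max_questions * 2 if use_scoring else max_questions
    questions = generate(text, limit)                            # Generate
    if use_scoring and len(questions) >= 3:
        questions = sorted(questions, key=score, reverse=True)   # Score + Rank
    return questions[:max_questions]                             # Slice -> Return

print(pipeline("Some study passage", max_questions=3))
```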
2. Input Validation & Safety
3. Code Quality Improvements
Benefits
Backward Compatibility
Testing
Additional Notes
This feature leverages the already available QAEvaluator and integrates it into the main generation pipeline without introducing new dependencies or altering existing behaviour.
AI Usage Disclosure
I have used the following AI models and tools: ChatGPT
Checklist
AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Summary by CodeRabbit
New Features
Bug Fixes
Tests