
Medical-RAG-Assistant-Prototype

RAG-based medical assistant prototype that retrieves evidence from the Merck Manuals PDF to generate grounded clinical answers with evaluable outputs.

This project demonstrates retrieval-augmented generation (RAG) by grounding LLM responses in a large medical reference corpus (Merck Manuals) to reduce information overload and support faster clinical decision-making.

Business Problem

Healthcare professionals must make time-sensitive decisions while navigating an overwhelming volume of medical information. Reliably locating relevant, up-to-date clinical guidance is difficult under pressure, especially when the knowledge is spread across large manuals and research references.

This project builds a RAG-based AI assistant that enables clinicians to ask questions in natural language and receive answers grounded in authoritative medical content. The intent is decision support: improving access to information and standardizing references used during diagnostic and treatment planning.


Forensic Research & Key Findings

  • The system generally produces medically relevant, context-grounded responses when retrieval succeeds.
  • Output quality variability is driven more by generation limits and evaluation instability than by fundamental retrieval failure.
  • Truncated or incomplete answers point to the need for a larger max_tokens budget and careful tuning of the retrieved-context size.
  • Automated self-scoring of groundedness/relevance was noisy, suggesting improvements are needed in prompt formatting, parsing, and evaluation criteria.

System Overview

The solution follows a standard RAG pipeline:

  1. Document ingestion from a large PDF corpus (Merck Manuals).
  2. Chunking + embedding to create a searchable knowledge index.
  3. Retrieval (top-k) of the most relevant chunks for a given user query.
  4. Generation using an LLM constrained to the retrieved context.
  5. Evaluation via automated scoring plus qualitative review.
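The pipeline above can be sketched end-to-end in a few functions. This is a minimal illustration only: it uses a toy bag-of-words similarity in place of a real embedding model and vector index, and all function names, chunk sizes, and the sample text are assumptions for demonstration, not the notebook's actual implementation.

```python
# Minimal RAG pipeline sketch: chunk -> embed -> retrieve -> build prompt.
# The "embedding" here is a toy bag-of-words vector; a real system would use
# a sentence-transformer model and a vector store (both assumptions).
from collections import Counter
import math

def chunk(text, size=40, overlap=10):
    """Split text into overlapping word-window chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy embedding: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=4):
    """Return the top-k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Illustrative corpus standing in for the ingested Merck Manuals text.
corpus = "Aspirin inhibits platelet aggregation. Warfarin is an oral anticoagulant."
chunks = chunk(corpus, size=8, overlap=2)
context = retrieve("anticoagulant therapy", chunks, k=2)
prompt = "Answer ONLY from the context below.\n\nContext:\n" + "\n".join(context)
```

The final prompt constrains the LLM to the retrieved chunks, which is what makes the answers "grounded" rather than free generation.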

Recommendations

  • Stabilize evaluation by standardizing prompts, output parsing, and scoring rules before scaling.
  • Introduce human-in-the-loop review for clinical validation and risk control.
  • Improve completeness and consistency by tuning context size (k) and generation limits.
  • Consider a separate evaluator model to reduce bias in groundedness/relevance scoring.
  • Explore improved encoders/models for better retrieval precision and robustness.
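The first and fourth recommendations (standardized prompts and a separate evaluator) can be combined into one scoring step. Below is a hedged sketch of how the output-parsing side might be stabilized; the rubric wording and the parsing rules are assumptions, and the evaluator-model call itself is deliberately left out since its API is project-specific.

```python
# Sketch of a standardized groundedness-scoring step with robust output
# parsing. A separate evaluator model would be prompted with RUBRIC plus the
# question/context/answer; only the parsing of its reply is shown here.
import re

RUBRIC = (
    "Rate how well the ANSWER is grounded in the CONTEXT on a 1-5 scale.\n"
    "Respond with exactly one line: SCORE: <integer 1-5>"
)

def parse_score(raw, lo=1, hi=5):
    """Extract 'SCORE: n'; fall back to the first integer; None if unparseable
    or out of range, so noisy replies are rejected instead of mis-scored."""
    m = re.search(r"SCORE:\s*(\d+)", raw) or re.search(r"\b(\d+)\b", raw)
    if not m:
        return None
    score = int(m.group(1))
    return score if lo <= score <= hi else None
```

Pinning the evaluator to a fixed one-line format and rejecting anything off-format addresses the scoring noise noted in the key findings: unparseable replies surface as `None` for human review rather than contaminating the averages.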

Production-Oriented Parameters (Current Best)

The most stable decoding configuration tested:

  • k = 4
  • max_tokens = 1024
  • temperature = 0.1
  • top_p = 0.9
  • top_k = 40
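For reference, these settings could be captured as a single configuration object. The key names below follow common LLM-inference conventions and are an assumption; the actual notebook may use a different interface or parameter spelling.

```python
# Decoding configuration reported as most stable for this prototype.
GENERATION_CONFIG = {
    "max_tokens": 1024,   # larger budget reduces truncated answers
    "temperature": 0.1,   # near-deterministic decoding for clinical text
    "top_p": 0.9,         # nucleus-sampling probability cutoff
    "top_k": 40,          # sample only from the 40 most likely tokens
}
RETRIEVAL_K = 4           # number of retrieved chunks passed as context
```

Keeping retrieval depth and decoding limits in one place makes the tuning recommended above (k vs. max_tokens trade-offs) easier to sweep systematically.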

Caveats

This prototype is intended for decision support and information retrieval, not autonomous clinical diagnosis. Outputs require clinical judgment and verification, and the system should be deployed with appropriate safeguards.


Repository Structure

  • Qwen_Full_Code_NLP_RAG_Project_Notebook.ipynb
    End-to-end implementation of the RAG pipeline, including PDF ingestion, text chunking, embedding generation, vector-based retrieval, LLM response generation, and automated groundedness/relevance evaluation.

Tools & Technologies

  • Python
  • NLP / Information Retrieval
  • Embeddings + Vector Search
  • LLM prompting (RAG)
  • PDF ingestion and chunking pipeline
