Skip to content

fahadsid1770/Automated-Chest-X-Ray-Report-Generation-Using-Vision-Language-Models

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Automated Chest X-Ray Report Generation Using Vision-Language Models

A research project exploring automated generation of chest X-ray diagnostic reports using various Vision-Language Models (VLMs) and CNN backbones.

Project Structure

.
├── notebooks/                    # Jupyter notebooks
│   ├── finetuning/              # Fine-tuning experiments
│   ├── inference/               # Inference scripts
│   └── data_generation/         # Dataset generation
├── models/                       # Trained CNN models
│   ├── ResNet50 - Chest XRay_
│   ├── EfficientNet - Chest XRay_
│   └── ... (other CNN models)
├── data/                         # Datasets
│   ├── raw/                      # Raw data files
│   └── processed/               # Processed datasets
├── src/                          # Source code
│   └── augmentation.py          # Data augmentation utilities
├── docs/                         # Documentation
│   └── paper_draft/             # Research paper
└── README.md

Approaches Tested

CNN Backbones

  • ResNet50 - 50-layer residual network
  • EfficientNet - Efficient architecture
  • VGG16 - Classic 16-layer network
  • InceptionV3 - Inception module architecture
  • MobileNet - Lightweight mobile-optimized network

Vision-Language Models

  • LLaMa 3.2 (11B) - Meta's large language model
  • Qwen3-VL-8B-Instruct - Alibaba's vision-language model
  • Ministral-3-3B-Instruct - Mistral AI's compact VLM

Getting Started

Prerequisites

  • Python 3.8+
  • CUDA-capable GPU (recommended)
  • Google Colab (for notebook execution)

Usage

  1. Fine-tuning a model: Open notebooks in notebooks/finetuning/ and run cells sequentially

  2. Running inference: Use notebooks in notebooks/inference/ with your trained models

Datasets

  • Fine-tuning dataset: 1300+ chest X-ray images with corresponding reports
  • Augmented dataset for improved generalization

Results

This project compares different backbone-VLM combinations to determine optimal architectures for medical report generation. Detailed results are available in the docs folder.

License

For academic/research purposes.

About

A research project exploring automated generation of chest X-ray diagnostic reports using various Vision-Language Models (VLMs) and CNN backbones.

Topics

Resources

Stars

Watchers

Forks

Contributors