Medical Reasoning LLM Fine-Tuning with Unsloth & LoRA

This project demonstrates fine-tuning a large language model (LLM) for medical clinical reasoning using Unsloth, LoRA, and Supervised Fine-Tuning (SFT).
The base model used is DeepSeek-R1-Distill-Llama-8B, optimized with 4-bit quantization for efficient training on limited GPU resources.

🚀 Project Overview

The goal of this project is to:

Perform inference on a pretrained medical reasoning model
Fine-tune the model using chain-of-thought (CoT) medical datasets
Apply LoRA-based parameter-efficient fine-tuning
Validate performance before and after fine-tuning

🧠 Model & Dataset

Base Model

Model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
Quantization: 4-bit
Context Length: 2048 tokens

Dataset

Source: FreedomIntelligence/medical-o1-reasoning-SFT
Language: English
Subset Used: First 500 training samples
Includes:
- Medical questions
- Complex Chain-of-Thought reasoning
- Final clinical answers

🛠️ Tech Stack

Python
Unsloth
Hugging Face Transformers
TRL (SFTTrainer)
LoRA (PEFT)
PyTorch
Datasets
Weights & Biases (wandb)

📦 Installation

pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
pip install torch transformers datasets trl wandb huggingface_hub

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
FineTuning.ipynb		FineTuning.ipynb
Readme.md		Readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Medical Reasoning LLM Fine-Tuning with Unsloth & LoRA

🚀 Project Overview

🧠 Model & Dataset

Base Model

Dataset

🛠️ Tech Stack

📦 Installation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Medical Reasoning LLM Fine-Tuning with Unsloth & LoRA

🚀 Project Overview

🧠 Model & Dataset

Base Model

Dataset

🛠️ Tech Stack

📦 Installation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages