This project demonstrates fine-tuning a large language model (LLM) for medical clinical reasoning using Unsloth, LoRA, and Supervised Fine-Tuning (SFT).
The base model used is DeepSeek-R1-Distill-Llama-8B, optimized with 4-bit quantization for efficient training on limited GPU resources.
The goal of this project is to:
- Perform inference on a pretrained medical reasoning model
- Fine-tune the model using chain-of-thought (CoT) medical datasets
- Apply LoRA-based parameter-efficient fine-tuning
- Validate performance before and after fine-tuning
- Model:
deepseek-ai/DeepSeek-R1-Distill-Llama-8B - Quantization: 4-bit
- Context Length: 2048 tokens
- Source:
FreedomIntelligence/medical-o1-reasoning-SFT - Language: English
- Subset Used: First 500 training samples
- Includes:
- Medical questions
- Complex Chain-of-Thought reasoning
- Final clinical answers
- Python
- Unsloth
- Hugging Face Transformers
- TRL (SFTTrainer)
- LoRA (PEFT)
- PyTorch
- Datasets
- Weights & Biases (wandb)
pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
pip install torch transformers datasets trl wandb huggingface_hub