Skip to content

abhishekCS0024/Finetuning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

3 Commits
ย 
ย 
ย 
ย 

Repository files navigation

Medical Reasoning LLM Fine-Tuning with Unsloth & LoRA

This project demonstrates fine-tuning a large language model (LLM) for medical clinical reasoning using Unsloth, LoRA, and Supervised Fine-Tuning (SFT).
The base model used is DeepSeek-R1-Distill-Llama-8B, optimized with 4-bit quantization for efficient training on limited GPU resources.


๐Ÿš€ Project Overview

The goal of this project is to:

  • Perform inference on a pretrained medical reasoning model
  • Fine-tune the model using chain-of-thought (CoT) medical datasets
  • Apply LoRA-based parameter-efficient fine-tuning
  • Validate performance before and after fine-tuning

๐Ÿง  Model & Dataset

Base Model

  • Model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  • Quantization: 4-bit
  • Context Length: 2048 tokens

Dataset

  • Source: FreedomIntelligence/medical-o1-reasoning-SFT
  • Language: English
  • Subset Used: First 500 training samples
  • Includes:
    • Medical questions
    • Complex Chain-of-Thought reasoning
    • Final clinical answers

๐Ÿ› ๏ธ Tech Stack

  • Python
  • Unsloth
  • Hugging Face Transformers
  • TRL (SFTTrainer)
  • LoRA (PEFT)
  • PyTorch
  • Datasets
  • Weights & Biases (wandb)

๐Ÿ“ฆ Installation

pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
pip install torch transformers datasets trl wandb huggingface_hub

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors