Reinforcement Learning with Human Feedback (RLHF): End-to-End RLHF Pipeline — from Pretraining to PPO


Technologies & Key Concepts

  • Model stack: minGPT, GPT-2, custom transformer architectures
  • Datasets: TinyStories, OpenAI Summarize TL;DR, CarperAI/openai_summarize_comparisons
  • Training pipeline: pretraining, supervised fine-tuning (SFT), RL fine-tuning
  • RL methods: vanilla policy gradient, PPO, KL-divergence penalty, GAE (gamma=1, lambda=0.95)
  • Reward modeling: learned reward model, scalar reward head, Bradley-Terry pairwise ranking loss
  • Deployment: Flask web interface, React visualization front-end
  • Concepts: reward modeling, preference learning, KL-regularized RL, text summarization, sentiment steering
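The Bradley-Terry pairwise ranking loss listed above can be sketched in a few lines. This is a minimal scalar illustration (the function name and toy values are ours, not taken from the repo): the loss is -log sigmoid(r_chosen - r_rejected), so it shrinks as the reward model scores the preferred summary above the rejected one.

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    # -log sigmoid(r_chosen - r_rejected): small when the chosen
    # completion outscores the rejected one, large when the ranking flips.
    margin = r_chosen - r_rejected
    return math.log(1.0 + math.exp(-margin))

# Toy scalar rewards: a correct ranking gives a lower loss than an inverted one.
good = bradley_terry_loss(1.5, 0.2)   # chosen clearly preferred
bad = bradley_terry_loss(0.2, 1.5)    # ranking inverted
```

In the actual pipeline the two scalars would come from the reward head applied to a preferred/rejected pair, averaged over a batch.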

This repository contains an RLHF project by meghanaNanuvala. It includes two main parts:

  • RLHF-Part1: core RLHF model training and inference workflows.
  • RLHF-Visualizer: a React-based visualization app to explain and inspect the RLHF process.

What this repository does

This repository brings together a complete RLHF research project and a visualization interface:

  • RLHF-Part1 implements a full three-stage RLHF pipeline from scratch: pretraining, supervised fine-tuning, and RL fine-tuning.
  • RLHF-Visualizer provides a user-friendly view of how the RLHF components connect and how the training pipeline behaves.

The goal is to maintain this as a personal RLHF project by meghanaNanuvala, with clean branding and clear documentation.


RLHF-Part1

RLHF-Part1 is the core model project. It includes:

  • mingpt/: minimal GPT model implementation used for training and inference.
  • happy_gpt/: a Flask web app that loads both a pre-trained model and an RL fine-tuned model.
  • chargpt/: character-level GPT training scripts and related experiments.
  • summarize_rlhf/: tools for summarization and reward model evaluation.

Key behavior:

  • Implements a complete 3-stage RLHF pipeline on TinyStories: pretraining → supervised fine-tuning → RL fine-tuning.
  • Uses vanilla policy gradient with a KL-divergence penalty against a frozen reference model to steer generation toward positive sentiment, as measured by the VADER compound score.
  • Builds a PPO optimization loop for text summarization on the OpenAI Summarize TL;DR dataset, with a GPT-2 architecture (12-layer, 768-dim), clipped surrogate objective, GAE advantage estimation (gamma=1, lambda=0.95), and a separate value function.
  • Trains a learned reward model from human preference comparisons (CarperAI/openai_summarize_comparisons) by fine-tuning a GPT-2 transformer with a scalar reward head and Bradley-Terry ranking loss.
  • Uses the learned reward model to drive PPO optimization with KL-regularized rewards.
  • Serves a Flask web app from happy_gpt where users can compare output from pre-trained and RL fine-tuned story models.
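The core RL pieces named above — the KL-regularized reward, GAE with gamma=1 and lambda=0.95, and PPO's clipped surrogate — can be sketched for a single sequence as follows. This is a minimal illustration, not the repo's implementation: the function names and the beta coefficient are assumptions.

```python
import math

def kl_shaped_reward(task_reward, logp_policy, logp_ref, beta=0.1):
    # Reward shaped with a KL penalty against the frozen reference model;
    # beta is an illustrative coefficient, not taken from the repo.
    return task_reward - beta * (logp_policy - logp_ref)

def gae(rewards, values, gamma=1.0, lam=0.95):
    # Generalized Advantage Estimation over a finished episode
    # (terminal value 0), matching the gamma=1, lambda=0.95 setting above.
    advantages, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    # PPO clipped objective for one action (to be maximized): the
    # probability ratio is clamped to [1 - eps, 1 + eps].
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)
```

In the summarization setup, per-token rewards would first be shaped by `kl_shaped_reward`, advantages estimated by `gae` using the separate value function, and the policy updated by maximizing the clipped surrogate.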

RLHF-Visualizer

RLHF-Visualizer is a React front-end app that visualizes the RLHF pipeline.

It is built with:

  • React and react-router-dom
  • Tailwind CSS
  • react-scripts

This app is intended to show the RLHF process in a clearer way, making the training and reward pipeline easier to understand.


How to use this repo

RLHF-Part1

  1. Navigate to RLHF-Part1.
  2. Install the Python dependencies.
  3. Run the Flask app in happy_gpt.

Example:

cd /Users/mnanuva/Documents/RLHF/RLHF-Part1
pip install -r requirements.txt
cd happy_gpt
python app.py

If model files are missing, the app warns about missing files and will not load the corresponding model.
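A guard with that behavior might look like the sketch below. This is a hypothetical illustration of the described warning, not the app's actual code; the function name is ours, and the real loading step (e.g. a `torch.load` call) is replaced by a placeholder.

```python
import os

def load_model_if_present(path: str):
    """Return a loaded model, or None with a warning if the file is missing."""
    if not os.path.exists(path):
        print(f"Warning: model file {path} is missing; this model will not be loaded.")
        return None
    # The real app would deserialize the checkpoint here (e.g. torch.load(path));
    # a placeholder stands in for the loaded model object.
    return path
```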

RLHF-Visualizer

  1. Navigate to RLHF/RLHF-Visualizer.
  2. Install the Node dependencies.
  3. Start the React app.

Example:

cd /Users/mnanuva/Documents/RLHF/RLHF-Visualizer
npm install
npm start

Repository structure

RLHF/
  README.md                # This file
  RLHF-Part1/              # Core RLHF model and runtime
    README.txt            # Existing quick instructions
    happy_gpt/            # Flask web app for story generation
    mingpt/               # Minimal GPT implementation
    summarize_rlhf/       # RLHF summarization utilities
  RLHF-Visualizer/         # React app for pipeline visualization
    public/
    src/
    package.json

Personal branding note

This repo is now a personal project for meghanaNanuvala. All visible documentation and branding should reflect that intent.

About

An end-to-end RLHF pipeline with pretraining, SFT, and PPO, using minGPT and reward modeling.
