This repository contains a complete pipeline to train reinforcement learning agents for text chunk selection, apply large language models (LLMs) for response generation, and evaluate the quality of those responses using various metrics.
🧠 This project is model-agnostic — it works with any LLM to improve its results through smarter input selection.

    .
    ├── 1-dataset-download.py
    ├── 2-dataset-cleaning.ipynb
    ├── 3-collecting-sample-dataset.ipynb
    ├── 4-rl-model-parallel-train.py
    ├── 5-rl-training-charts.ipynb
    ├── 6-generate-response-for-query-with-selected-chunks.py
    ├── 7-calculating-metrics-ragas-bert-bleu-rouge-cosine.ipynb
    ├── llmFunctions.py
    ├── RL_environment.py
    └── README.md
Downloads the CRAG dataset (split into parts), merges and extracts it, then converts all .json/.jsonl files to .parquet. It also combines them into a single file for easier processing.
Cleans and filters the raw dataset into a more usable format, preparing it for reinforcement learning and LLM response tasks.
Samples a subset of the dataset for per-query training (e.g., selecting 50 queries per domain). Produces a sampled .parquet file used in the RL training.
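The per-domain sampling amounts to a stratified draw; a sketch with pandas (the toy column names and the 50-per-domain figure mirror the description above, not the actual notebook code):

```python
import pandas as pd

# Toy frame standing in for the cleaned CRAG data.
df = pd.DataFrame({
    "domain": ["finance"] * 60 + ["sports"] * 60,
    "query": [f"q{i}" for i in range(120)],
})

N_PER_DOMAIN = 50
# Draw N queries from each domain, reproducibly.
sampled = df.groupby("domain", group_keys=False).sample(n=N_PER_DOMAIN, random_state=42)
# sampled.to_parquet("sampled.parquet", index=False)  # what the notebook would persist
```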
Trains reinforcement learning models (PPO, Recurrent PPO, DDPG, SAC) in parallel using multiple GPUs. Each model learns to select the most relevant chunks for a given query using a custom Gym environment (FixedChunkEnvGranularReward).
- Tracks emissions with CodeCarbon
- Uses `torch.multiprocessing` for parallelism
- Saves TensorBoard logs and trained model checkpoints
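The parallel layout can be pictured as a round-robin assignment of algorithms to GPUs, with one training process per algorithm (a sketch of the scheduling logic only; the actual script launches the processes via `torch.multiprocessing`):

```python
ALGOS = ["PPO", "RecurrentPPO", "DDPG", "SAC"]


def assign_devices(algos, n_gpus):
    """Round-robin each algorithm onto a GPU index; in the real script,
    torch.multiprocessing would spawn one training process per entry."""
    return {algo: f"cuda:{i % n_gpus}" for i, algo in enumerate(algos)}


plan = assign_devices(ALGOS, 2)
```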
Visualizes training metrics and performance (e.g., episode rewards, loss curves) for RL models using the saved logs.
Applies LLMs (Gemini or OpenAI-compatible) to generate a response for each query from the selected chunks (chosen via BM25, FAISS, RL, or random sampling). The selection strategy and model can be set via parameters.
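As a stand-in for the retrieval strategies, a minimal lexical selector can rank chunks by TF-IDF cosine similarity (scikit-learn is already a dependency; BM25, FAISS, or the RL policy would slot into the same interface):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def select_chunks(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query under TF-IDF cosine."""
    matrix = TfidfVectorizer().fit_transform(chunks + [query])
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    top = sorted(sims.argsort()[::-1][:k])  # keep document order among the top-k
    return [chunks[i] for i in top]
```

The selected chunks would then be concatenated into the LLM prompt for that query.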
Evaluates the quality of generated LLM responses using:
- RAGAs
- BERTScore
- BLEU
- ROUGE
- Cosine similarity
Also computes token usage.
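For intuition, ROUGE-1 reduces to unigram-overlap precision and recall; a hand-rolled F1 looks like the following (the notebook itself uses the `evaluate`-based implementations):

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a candidate and a reference string."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```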
Defines wrapper functions for generating and cleaning LLM responses using:
- OpenAI-compatible models (e.g., DeepSeek)
- Google Vertex AI Gemini
Includes:
- `get_response_from_llm`
- `get_response_from_llm_gemini`
- `clean_response`
- `clean_llm_response`
Defines the custom Gym environment FixedChunkEnvGranularReward, which:
- Splits documents into chunks
- Rewards chunk selections based on similarity to the query
- Supports both discrete and continuous action spaces
- Implements a granular reward function
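The environment's core loop can be sketched without the Gym scaffolding; the observation shape and the similarity-based reward are assumptions about `FixedChunkEnvGranularReward`, not its actual code:

```python
import numpy as np


class TinyChunkEnv:
    """Minimal stand-in: a discrete action picks one chunk, and the granular
    reward is the cosine similarity between that chunk and the query vector."""

    def __init__(self, query_vec, chunk_vecs, steps_per_episode=2):
        self.query = np.asarray(query_vec, dtype=float)
        self.chunks = np.asarray(chunk_vecs, dtype=float)
        self.steps_per_episode = steps_per_episode

    def reset(self):
        self.t = 0
        return self.query  # observation: the query embedding

    def step(self, action):
        chosen = self.chunks[action]
        denom = np.linalg.norm(self.query) * np.linalg.norm(chosen) + 1e-9
        reward = float(self.query @ chosen / denom)  # granular, per-selection reward
        self.t += 1
        done = self.t >= self.steps_per_episode
        return self.query, reward, done, {}


env = TinyChunkEnv([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
env.reset()
_, r_good, done1, _ = env.step(0)  # relevant chunk
_, r_bad, done2, _ = env.step(1)   # irrelevant chunk
```

A per-selection (rather than per-episode) reward gives the agent denser feedback, which is the point of the "granular" design.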
1. Run `python 1-dataset-download.py`
2. Open and run: `2-dataset-cleaning.ipynb`
3. Open and run: `3-collecting-sample-dataset.ipynb`
4. Run `python 4-rl-model-parallel-train.py`
5. Open and run: `5-rl-training-charts.ipynb`
6. Edit and run: `python 6-generate-response-for-query-with-selected-chunks.py`
7. Open and run: `7-calculating-metrics-ragas-bert-bleu-rouge-cosine.ipynb`
`pandas`, `tqdm`, `torch`, `stable-baselines3`, `sb3-contrib`, `gymnasium`, `spacy`, `codecarbon`, `pyarrow`, `openai`, `vertexai`, `google-auth`, `scikit-learn`, `evaluate`, `transformers`, etc.
Use a requirements file or environment manager (like conda) for reproducibility.
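A possible starting point for a `requirements.txt` (unpinned; the package name for the Vertex AI SDK is an assumption, and versions should be pinned for true reproducibility):

```text
pandas
tqdm
torch
stable-baselines3
sb3-contrib
gymnasium
spacy
codecarbon
pyarrow
openai
google-cloud-aiplatform  # provides the vertexai SDK (assumption)
google-auth
scikit-learn
evaluate
transformers
```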
- Trained RL models (`.zip`)
- TensorBoard logs
- Generated `.parquet` files with LLM responses
- Metric evaluation reports
- You must configure your OpenAI or DeepSeek API keys and GCP credentials before running the LLM code.
- Make sure the spaCy model `en_core_web_md` is installed: `python -m spacy download en_core_web_md`
