A reinforcement learning environment for optimizing Retrieval-Augmented
A reinforcement learning environment for optimizing Retrieval-Augmented
Generation (RAG) parameters. Agents learn to select optimal chunk_size and
top_k values to maximize retrieval performance.
This environment simulates a RAG parameter optimization task where agents must discover the optimal configuration:
- Optimal chunk_size: 300
- Optimal top_k: 5
The environment scores actions based on how close the selected parameters are to these optimal values.
- RL-style environment for parameter optimization
- Deterministic reward function based on distance from optimal configuration
- Multi-task setup with increasing difficulty
- LLM proxy integration for OpenEnv validation
- FastAPI + OpenEnv server for scalable interaction
- Docker-ready and deployable on Hugging Face Spaces
We define RAG tuning as:
- State : Current retrieval configuration
- Action : Adjust
(chunk_size, top_k) - Reward : Retrieval quality score
- Goal : Maximize reward to reach task-specific thresholds
Three tasks of varying difficulty:
| Task | Target Score | Description |
|---|---|---|
baseline_retrieval |
0.5 | Easy - suboptimal parameters work |
parameter_tuning |
0.7 | Medium - requires good parameters |
optimal_rag |
0.85 | Hard - requires near-optimal params |
chunk_size: int # Size of document chunks
top_k: int # Number of retrieved documentsOptimal configuration:
- chunk_size = 300
- top_k = 5
size_err = abs(chunk_size - 300) / 700.0
k_err = abs(top_k - 5) / 5.0
raw_score = 1.0 - (size_err + k_err) / 2.0Scores are clamped to (0.01, 0.99) to satisfy strict validator constraints.
- Python 3.11+
- Docker (for containerized deployment)
- UV package manager (recommended)
### Clone repository
git clone https://github.com/AyushGupta16/RAG_Optimizer.git
cd RAG_Optimizer
### Install dependencies
uv sync#### Or with pip
pip install -r requirements.txt
#### Set up environment variables
cp .env.example .env
#### Edit .env with your API credentials#### Required for LLM proxy (validator checks this)
API_BASE_URL=https://your-llm-proxy.com/v1
API_KEY=your_api_key_here
MODEL_NAME=gpt-4o-mini
# Local development
ENV_BASE=http://127.0.0.1:8000# Terminal 1: Start environment server
python server/app.py --host 127.0.0.1 --port 8000
# Terminal 2: Run inference
```bash
python inference.py[DEBUG] LLM proxy call succeeded for task=baseline_retrieval
[START] task=baseline_retrieval env=rag_optimizer model=gpt-4o-mini
[STEP] step=1 action={"chunk_size": 500, "top_k": 3} reward=0.657143 done=true error=none
[END] success=true steps=1 score=0.657143 rewards=0.657143
[DEBUG] LLM proxy call succeeded for task=parameter_tuning
[START] task=parameter_tuning env=rag_optimizer model=gpt-4o-mini
[STEP] step=1 action={"chunk_size": 350, "top_k": 4} reward=0.864286 done=true error=none
[END] success=true steps=1 score=0.864286 rewards=0.864286
[DEBUG] LLM proxy call succeeded for task=optimal_rag
[START] task=optimal_rag env=rag_optimizer model=gpt-4o-mini
[STEP] step=1 action={"chunk_size": 300, "top_k": 5} reward=0.990000 done=true error=none
[END] success=true steps=1 score=0.990000 rewards=0.990000Run local validation before submitting:
# Check environment setup
openenv validate
# Expected output:
# [OK] RAG_Optimizer: Ready for multi-mode deploymentinference.py
|
| 1. LLM proxy call using API_BASE_URL + API_KEY
| 2. POST /reset and POST /step
v
server/app.py
|
| OpenEnv create_app(...)
v
server/rag_optimizer_environment.py
|
| - restore persistent state
| - evaluate (chunk_size, top_k)
| - call LLM proxy for validator compliance
| - compute score and done flag
v
models.py
|
| - RagOptimizerAction
| - RagOptimizerObservation
| - RagOptimizerState
v
Reward + Observation returned to inference.py
RAG_Optimizer/
├── server/
│ ├── __init__.py
│ ├── app.py
│ ├── Dockerfile
│ ├── rag_optimizer_environment.py
│ └── requirements.txt
├── outputs/
├── .venv/
├── __init__.py
├── .env
├── .gitignore
├── client.py
├── inference.py
├── models.py
├── openenv.yaml
├── pyproject.toml
├── README.md
└── uv.lock
server/app.pycreates the FastAPI/OpenEnv server.server/rag_optimizer_environment.pycontains the environment logic, task targets, scoring, and state persistence.server/Dockerfilepackages the environment for Hugging Face Spaces and validator execution.server/requirements.txtlists runtime dependencies for server builds.inference.pyruns the benchmark tasks, calls the LLM proxy, and prints validator-compatible logs.models.pydefines action, observation, and state schemas.client.pyprovides a reusable client wrapper for interacting with the environment.openenv.yamlregisters the environment and task metadata.pyproject.tomldefines package metadata and dependencies.uv.locklocks dependency versions for reproducible builds..envis for local development only and should not be committed with secrets.outputs/stores local run artifacts if generated.
The environment uses class-level variables to persist state across HTTP requests:
class RagOptimizerEnvironment(Environment):
_current_target: float = 0.85
_current_episode_id: Optional[str] = None
_current_step_count: int = 0This ensures that episode state survives server framework instance recreation
between /reset and /step.
RAG_Optimizer/
├── server/
│ ├── app.py
│ ├── __init__.py
│ └── rag_optimizer_environment.py
├── client.py
├── inference.py
├── models.py
├── openenv.yaml
├── pyproject.toml
├── README.md
└── server/requirements.txt
The environment computes scores based on normalized distance from optimal parameters:
size_error = abs(chunk_size - 300) / 700
k_error = abs(top_k - 5) / 5
raw_score = 1.0 - (size_error + k_error) / 2Scores are clamped to (0.01, 0.99) to satisfy validator requirements
(strictly between 0 and 1).
Reset environment for a new episode.
Request:
{
"task_id": "optimal_rag",
"episode_id": "optional-uuid"
}Response:
{
"observation": {
"retrieval_score": 0.01,
"message": "Environment reset. Task: optimal_rag | target=0.85"
},
"reward": 0.01,
"done": false
}Execute an action in the environment.
Request:
{
"action": {
"chunk_size": 300,
"top_k": 5
}
}Response:
{
"observation": {
"retrieval_score": 0.99,
"message": "Step 1: Score 0.990000"
},
"reward": 0.99,
"done": true
}The project uses the injected validator credentials:
API_BASE_URLAPI_KEYMODEL_NAME
The proxy is called from:
inference.pyonce per taskserver/rag_optimizer_environment.pyduringstep()
This satisfies the LLM criteria checks while keeping the environment deterministic.
# Lint code
ruff check .
# Format code
ruff format .
# Type check
mypy .To add a new difficulty level:
-
Update
TASK_TARGETSinrag_optimizer_environment.py:TASK_TARGETS = { "baseline_retrieval": 0.5, "parameter_tuning": 0.7, "optimal_rag": 0.85, "expert_rag": 0.95, # New task }
-
Update
TASK_TARGETSininference.py -
Add test case to
inference.py:tasks = [ # ... existing tasks ... ("expert_rag", {"chunk_size": 300, "top_k": 5}), ]
- Check
API_BASE_URLandAPI_KEYin.env - Verify proxy endpoint is accessible
- Check API quota/rate limits
- Ensure environment clamps scores to
(0.01, 0.99) - Check
grader_safe_score()function - Verify no exact 0.0 or 1.0 values in logs
- Verify class variables are used (not instance variables)
- Check
_current_*variables are updated inreset() - Ensure
step()restores state from class variables
- Python
- FastAPI
- OpenEnv
- OpenAI Python SDK
- Docker
- Hugging Face Spaces
- ML Systems + RL environment design
- Backend engineering with FastAPI + OpenEnv
- Validator-compliant LLM proxy integration
- Stateful environment handling across HTTP requests
- Reproducible deployment through Docker and Hugging Face Spaces
- Let the LLM suggest actions dynamically instead of fixed task actions
- Integrate real vector databases (FAISS, Pinecone)
- Replace synthetic reward with retrieval metrics (Recall@k, MRR)
- Multi-step RL optimization (instead of single-step)
- Learnable policies via PPO / DQN
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
MIT License - see LICENSE file for details
- Built with OpenEnv
- Part of the OpenEnv Hackathon 2026
- Environment design inspired by RAG optimization research
- Author: Ayush Gupta
- GitHub: @AyushGupta16
- Email: ayushgupta0616@gmail.com
If you use this environment in your research, please cite:
@misc{rag-optimizer-2026,
author = {Gupta, Ayush},
title = {RAG Optimizer Environment},
year = {2026},
publisher = {GitHub},
url = {<https://github.com/AyushGupta16/RAG_Optimizer>}
}