AlphaApollo is an agentic reasoning framework that orchestrates multiple models and tools to enable iterative, verifiable, and self-evolving reasoning. It supports a broad range of paradigms, including tool-integrated reasoning, agentic post-training (e.g., multi-turn supervised fine-tuning and reinforcement learning), and agentic self-evolution. The framework offers extensible environments and toolsets for easy customization, extension, and scalable deployment of agentic reasoning workflows.
- [2026.01] We are excited to release AlphaApollo, an agentic LLM system for advanced reasoning.
- [2025.10] Our technical report is released; see here for details.
```shell
conda create -n alphaapollo python==3.12 -y
conda activate alphaapollo
git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo
bash installation.sh
```

- Tool-integrated reasoning rollout with seamless environment interaction
- Dynamic memory updates for multi-turn reasoning
- Multi-turn supervised fine-tuning (SFT)
- Reinforcement learning algorithms: GRPO, PPO, DAPO, and more
- Multi-round, multi-model solution refinement with shared state
- Iterative improvement via feedback and executable checks
- Python interpreter
- Retrieval-Augmented Generation (RAG)
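The features above center on tool-integrated reasoning: the model alternates between generating text and calling tools, with tool output fed back into the context. The sketch below illustrates that general pattern only; it is not AlphaApollo's actual API (`rollout`, `run_python`, and `stub_model` are made-up names), and the real rollout logic lives in `alphaapollo/core/generation/`.

```python
# Minimal, illustrative tool-integrated reasoning loop (NOT AlphaApollo's
# real generation/environment classes): fenced python blocks in the model's
# reply are executed and their stdout is appended as a tool observation.
import contextlib
import io
import re

CODE_BLOCK = re.compile(r"```python\n(.*?)```", re.DOTALL)

def run_python(code: str) -> str:
    """Execute one snippet and capture its stdout as the tool observation."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def rollout(model, question: str, max_steps: int = 4) -> list[str]:
    """Alternate generation and tool execution until the model stops
    emitting code blocks or max_steps is exhausted."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = model(history)
        history.append(reply)
        blocks = CODE_BLOCK.findall(reply)
        if not blocks:
            break  # no tool call: treat the reply as the final answer
        for code in blocks:
            history.append(f"Tool output: {run_python(code)}")
    return history

# Stub "model": first turn asks the interpreter, second turn reads it back.
def stub_model(history):
    for line in reversed(history):
        if line.startswith("Tool output: "):
            return "The answer is " + line.removeprefix("Tool output: ")
    return "Let me compute.\n```python\nimport math\nprint(math.factorial(10))\n```"

print(rollout(stub_model, "What is 10 factorial?")[-1])  # → The answer is 3628800
```

The `--env.max_steps` option in the commands below bounds exactly this kind of generate/execute loop.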
Detailed quick-start commands (including script entrypoints) are documented in quick-start.md.
Note: Before using the local RAG module, please follow RAG Service Setup.
```shell
# no-tool reasoning
python3 -m alphaapollo.workflows.test \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.data_source=math-ai/aime24

# tool-integrated reasoning
python3 -m alphaapollo.workflows.test \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.data_source=math-ai/aime24 \
    --env.informal_math.enable_python_code=true \
    --env.informal_math.enable_local_rag=false \
    --env.max_steps=4
```

Single-question evaluation:
```shell
# Select specific dataset samples (e.g., the 0th AIME test question) and test
python3 -m alphaapollo.workflows.test \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.module=alphaapollo.data_preprocess.prepare_custom_data \
    --preprocess.data_source=math-ai/aime24 \
    --preprocess.splits=test \
    --preprocess.sample_indices=0 \
    --data.path=~/data/custom_data/test.parquet

# Directly evaluate a plain-text question (not from a dataset)
python3 -m alphaapollo.workflows.test \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.module=alphaapollo.data_preprocess.prepare_single_question \
    --preprocess.question_text="What is the sum of integers from 1 to 1000?" \
    --preprocess.ground_truth="500500" \
    --data.path=~/data/single_question/test.parquet
```
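As a quick sanity check on the ground truth used in the single-question example, the sum of the first n integers has the closed form n(n+1)/2:

```python
# Verify the example's ground truth: sum of 1..1000 via Gauss's formula.
n = 1000
closed_form = n * (n + 1) // 2   # n(n+1)/2
assert closed_form == sum(range(1, n + 1))
print(closed_form)  # → 500500
```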
```shell
# multi-turn SFT
python3 -m alphaapollo.workflows.sft \
    --model.partial_pretrain=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.data_source=AI-MO/NuminaMath-TIR
```
```shell
# multi-turn RL
python3 -m alphaapollo.workflows.rl \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.data_source=HuggingFaceH4/MATH-500 \
    --algorithm.adv_estimator=grpo
```

Before running the self-evolution scripts, make sure to serve the corresponding number of models.
```shell
python alphaapollo/utils/ray_serve_llm.py \
    --model_path Qwen/Qwen3-4B-Instruct-2507 \
    --gpus "0,1" --port 8000 --model_id "qwen3_4b_inst"
```

```shell
# single-model evolution
python3 -m alphaapollo.workflows.evo \
    --preprocess.data_source=math-ai/aime24 \
    --run.dataset_name=aime24 \
    --policy_model_cfg.model_name=qwen3_4b_inst \
    --policy_model_cfg.base_url=http://localhost:8000/v1 \
    --verifier_cfg.model_name=qwen3_4b_inst \
    --verifier_cfg.base_url=http://localhost:8000/v1
```

```text
+------------------------------------------------------------------+
|                   alphaapollo/data_preprocess                    |
|                  (dataset preparation scripts)                   |
+------------------------------------------------------------------+
                                 |
                                 V
+------------------------------------------------------------------+
|                         alphaapollo/core                         |
|                           (core code)                            |
|                                                                  |
|   +----------------------+        +----------------------+       |
|   |     generation/      |        |        tools/        |       |
|   |                      | <----> |    - python_code     |       |
|   |                      |        |    - rag/            |       |
|   +----------------------+        +----------------------+       |
|              ^                                                   |
|              |                                                   |
|              V                                                   |
|  +------------------------------------------------------------+  |
|  |                       environments/                        |  |
|  |   - informal_math_training/                                |  |
|  |   - informal_math_evolving/                                |  |
|  |   - memory/                                                |  |
|  |   - prompts/                                               |  |
|  +------------------------------------------------------------+  |
+------------------------------------------------------------------+
```
- Training environment
  - Environment package: alphaapollo/core/environments/informal_math_training/
  - Prompts: alphaapollo/core/environments/prompts/informal_math_training.py
- Evolving environment
  - Environment package: alphaapollo/core/environments/informal_math_evolving/
  - Prompts: alphaapollo/core/environments/prompts/informal_math_evolving.py
- Tools
  - Python code implementation: alphaapollo/core/tools/python_code.py
  - RAG implementation: alphaapollo/core/tools/rag/
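At its core, a tool alongside python_code and rag/ is a callable that maps a query string to an observation string. The sketch below is hypothetical — the `Tool` protocol, `EchoTool`, and `registry` are invented for illustration; check alphaapollo/core/tools/ for the real base classes before extending.

```python
# Hypothetical tool plug-in shape; the actual interface lives in
# alphaapollo/core/tools/ and may differ.
from typing import Protocol

class Tool(Protocol):
    name: str
    def __call__(self, query: str) -> str: ...

class EchoTool:
    """Toy stand-in for a real backend such as python_code or rag."""
    name = "echo"

    def __call__(self, query: str) -> str:
        return f"[{self.name}] {query}"

# Environments could look tools up by name from a simple registry.
registry: dict[str, Tool] = {tool.name: tool for tool in (EchoTool(),)}
print(registry["echo"]("2 + 2"))  # → [echo] 2 + 2
```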
AlphaApollo is built upon the open-source projects verl, verl-agent, vllm, and sglang. We sincerely thank the contributors of these projects for their valuable work and support.
If you find AlphaApollo useful in your research, please consider citing our work:
```bibtex
@article{zhou2025alphaapollo,
  title={AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning},
  author={Zhou, Zhanke and Cao, Chentao and Feng, Xiao and Li, Xuan and Li, Zongze and Lu, Xiangyu and Yao, Jiangchao and Huang, Weikai and Xu, Linrui and Cheng, Tian and others},
  journal={arXiv preprint arXiv:2510.06261},
  year={2025}
}
```