AlphaApollo is an agentic reasoning framework that orchestrates multiple models and tools to enable iterative, verifiable, and self-evolving reasoning. It supports a broad range of paradigms, including tool-integrated reasoning, agentic post-training (e.g., multi-turn supervised fine-tuning and reinforcement learning), and agentic self-evolution. The framework offers extensible environments and toolsets for easy customization, extension, and scalable deployment of agentic reasoning workflows.
- [2026.01] We are excited to release AlphaApollo, an agentic LLM system for advanced reasoning.
- [2025.10] Our technical report is released; see here for details.
```shell
conda create -n alphaapollo python==3.12 -y
conda activate alphaapollo
git clone https://github.com/tmlr-group/AlphaApollo.git
cd AlphaApollo
bash installation.sh
```

- Tool-integrated reasoning rollout with seamless environment interaction
- Dynamic memory updates for multi-turn reasoning
- Multi-turn supervised fine-tuning (SFT)
- Reinforcement learning algorithms: GRPO, PPO, DAPO, and more
- Multi-round, multi-model solution refinement with shared state
- Iterative improvement via feedback and executable checks
- Python interpreter
- Retrieval-Augmented Generation (RAG)
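The features above center on tool-integrated reasoning: the model alternates between generating text and calling tools, with tool output fed back into the context. The sketch below illustrates that general pattern only; it is not AlphaApollo's actual API (`rollout`, `run_python`, and `stub_model` are made-up names), and the real rollout logic lives in `alphaapollo/core/generation/`.

```python
# Minimal, illustrative tool-integrated reasoning loop (NOT AlphaApollo's
# real generation/environment classes): fenced python blocks in the model's
# reply are executed and their stdout is appended as a tool observation.
import contextlib
import io
import re

CODE_BLOCK = re.compile(r"```python\n(.*?)```", re.DOTALL)

def run_python(code: str) -> str:
    """Execute one snippet and capture its stdout as the tool observation."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def rollout(model, question: str, max_steps: int = 4) -> list[str]:
    """Alternate generation and tool execution until the model stops
    emitting code blocks or max_steps is exhausted."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = model(history)
        history.append(reply)
        blocks = CODE_BLOCK.findall(reply)
        if not blocks:
            break  # no tool call: treat the reply as the final answer
        for code in blocks:
            history.append(f"Tool output: {run_python(code)}")
    return history

# Stub "model": first turn asks the interpreter, second turn reads it back.
def stub_model(history):
    for line in reversed(history):
        if line.startswith("Tool output: "):
            return "The answer is " + line.removeprefix("Tool output: ")
    return "Let me compute.\n```python\nimport math\nprint(math.factorial(10))\n```"

print(rollout(stub_model, "What is 10 factorial?")[-1])  # → The answer is 3628800
```

The `--env.max_steps` option in the commands below bounds exactly this kind of generate/execute loop.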
Detailed quick-start commands (including script entrypoints) are documented in quick-start.md.
Note: Before using the local RAG module, please follow RAG Service Setup.
```shell
# no-tool reasoning
python3 -m alphaapollo.workflows.test \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.data_source=math-ai/aime24

# tool-integrated reasoning
python3 -m alphaapollo.workflows.test \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.data_source=math-ai/aime24 \
    --env.informal_math.enable_python_code=true \
    --env.informal_math.enable_local_rag=false \
    --env.max_steps=4
```

Single-question evaluation:
```shell
# Select specific dataset samples (e.g., the 0th AIME test question) and test
python3 -m alphaapollo.workflows.test \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.module=alphaapollo.data_preprocess.prepare_custom_data \
    --preprocess.data_source=math-ai/aime24 \
    --preprocess.splits=test \
    --preprocess.sample_indices=0 \
    --data.path=~/data/custom_data/test.parquet

# Directly evaluate a plain-text question (not from a dataset)
python3 -m alphaapollo.workflows.test \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.module=alphaapollo.data_preprocess.prepare_single_question \
    --preprocess.question_text="What is the sum of integers from 1 to 1000?" \
    --preprocess.ground_truth="500500" \
    --data.path=~/data/single_question/test.parquet
```
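As a quick sanity check on the ground truth used in the single-question example, the sum of the first n integers has the closed form n(n+1)/2:

```python
# Verify the example's ground truth: sum of 1..1000 via Gauss's formula.
n = 1000
closed_form = n * (n + 1) // 2   # n(n+1)/2
assert closed_form == sum(range(1, n + 1))
print(closed_form)  # → 500500
```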
```shell
# multi-turn SFT
python3 -m alphaapollo.workflows.sft \
    --model.partial_pretrain=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.data_source=AI-MO/NuminaMath-TIR
```
```shell
# multi-turn RL
python3 -m alphaapollo.workflows.rl \
    --model.path=Qwen/Qwen2.5-3B-Instruct \
    --preprocess.data_source=HuggingFaceH4/MATH-500 \
    --algorithm.adv_estimator=grpo
```

Before running the self-evolution scripts, make sure to serve the corresponding number of models.
```shell
python alphaapollo/utils/ray_serve_llm.py \
    --model_path Qwen/Qwen3-4B-Instruct-2507 \
    --gpus "0,1" --port 8000 --model_id "qwen3_4b_inst"
```

```shell
# single-model evolution
python3 -m alphaapollo.workflows.evo \
    --preprocess.data_source=math-ai/aime24 \
    --run.dataset_name=aime24 \
    --policy_model_cfg.model_name=qwen3_4b_inst \
    --policy_model_cfg.base_url=http://localhost:8000/v1 \
    --verifier_cfg.model_name=qwen3_4b_inst \
    --verifier_cfg.base_url=http://localhost:8000/v1
```

```text
+------------------------------------------------------------------+
|                   alphaapollo/data_preprocess                    |
|                  (dataset preparation scripts)                   |
+------------------------------------------------------------------+
                                 |
                                 V
+------------------------------------------------------------------+
|                         alphaapollo/core                         |
|                           (core code)                            |
|                                                                  |
|   +----------------------+        +----------------------+       |
|   |     generation/      |        |        tools/        |       |
|   |                      | <----> |    - python_code     |       |
|   |                      |        |    - rag/            |       |
|   +----------------------+        +----------------------+       |
|              ^                                                   |
|              |                                                   |
|              V                                                   |
|  +------------------------------------------------------------+  |
|  |                       environments/                        |  |
|  |   - informal_math_training/                                |  |
|  |   - informal_math_evolving/                                |  |
|  |   - memory/                                                |  |
|  |   - prompts/                                               |  |
|  +------------------------------------------------------------+  |
+------------------------------------------------------------------+
```
- Training environment
  - Environment package: alphaapollo/core/environments/informal_math_training/
  - Prompts: alphaapollo/core/environments/prompts/informal_math_training.py
- Evolving environment
  - Environment package: alphaapollo/core/environments/informal_math_evolving/
  - Prompts: alphaapollo/core/environments/prompts/informal_math_evolving.py
- Tools
  - Python code implementation: alphaapollo/core/tools/python_code.py
  - RAG implementation: alphaapollo/core/tools/rag/
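At its core, a tool alongside python_code and rag/ is a callable that maps a query string to an observation string. The sketch below is hypothetical — the `Tool` protocol, `EchoTool`, and `registry` are invented for illustration; check alphaapollo/core/tools/ for the real base classes before extending.

```python
# Hypothetical tool plug-in shape; the actual interface lives in
# alphaapollo/core/tools/ and may differ.
from typing import Protocol

class Tool(Protocol):
    name: str
    def __call__(self, query: str) -> str: ...

class EchoTool:
    """Toy stand-in for a real backend such as python_code or rag."""
    name = "echo"

    def __call__(self, query: str) -> str:
        return f"[{self.name}] {query}"

# Environments could look tools up by name from a simple registry.
registry: dict[str, Tool] = {tool.name: tool for tool in (EchoTool(),)}
print(registry["echo"]("2 + 2"))  # → [echo] 2 + 2
```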
AlphaApollo is built upon the open-source projects verl, verl-agent, vllm, and sglang. We sincerely thank the contributors of these projects for their valuable work and support.
If you find AlphaApollo useful in your research, please consider citing our work:
```bibtex
@article{zhou2025alphaapollo,
  title={AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning},
  author={Zhou, Zhanke and Cao, Chentao and Feng, Xiao and Li, Xuan and Li, Zongze and Lu, Xiangyu and Yao, Jiangchao and Huang, Weikai and Xu, Linrui and Cheng, Tian and others},
  journal={arXiv preprint arXiv:2510.06261},
  year={2025}
}
```