FlagScale leverages Hydra for configuration management. The configurations are organized into two levels: an outer experiment-level YAML file and an inner task-level YAML file.
- The experiment-level YAML file defines the experiment directory, backend engine, task type, and other environment-related configuration.
- The task-level YAML file specifies the model, dataset, and parameters for specific tasks such as training or inference.
All valid configurations in the task-level YAML file correspond to the arguments used in backend engines such as Megatron-LM and vLLM, with hyphens (-) replaced by underscores (_). For a complete list of available configurations, please refer to the documentation of the corresponding backend engine.
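For example, Megatron-LM's command-line argument `--micro-batch-size` would appear in the task-level YAML as shown below (an illustrative snippet; the enclosing section depends on which example file you start from):

```yaml
# Megatron-LM: --micro-batch-size 1 becomes the task-level key:
micro_batch_size: 1
```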
You can simply copy and modify the existing YAML files in the examples folder to get started.
Install backends

Inference/Serving backend

We recommend using the latest release of the flagscale-inference image.

```bash
docker pull harbor.baai.ac.cn/flagscale/flagscale-inference:dev-cu128-py3.12-20260302102033
docker run -itd --privileged --gpus all --net=host --ipc=host --device=/dev/infiniband --shm-size 512g --ulimit memlock=-1 --name <name> harbor.baai.ac.cn/flagscale/flagscale-inference:dev-cu128-py3.12-20260302102033
docker exec -it <name> /bin/bash
conda activate flagscale-inference
```
vLLM:

```bash
pip install vllm==0.13.0
```
vLLM-plugin-FL:

```bash
pip install vllm-plugin-fl==0.1.0+vllm0.13.0 --extra-index-url https://resource.flagos.net/repository/flagos-pypi-hosted/simple
```

See vLLM-plugin-FL for more details.
FlagGems:

```bash
pip install -U scikit-build-core==0.11 pybind11 ninja cmake
git clone https://github.com/flagos-ai/FlagGems
cd FlagGems
pip install --no-build-isolation .
```

See FlagGems for more details.
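If the build succeeded, the package should import cleanly; a quick sanity check (assuming `flag_gems` is the import name, as used in the FlagGems repository):

```bash
python -c "import flag_gems"
```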
Training backend

We recommend using the latest release of the flagscale-train image.

```bash
docker pull harbor.baai.ac.cn/flagscale/flagscale-train:dev-cu128-py3.12-20260319182856
docker run -itd --gpus all --shm-size=500g --name <name> harbor.baai.ac.cn/flagscale/flagscale-train:dev-cu128-py3.12-20260319182856 /bin/bash
docker exec -it <name> /bin/bash
conda activate flagscale-train
```
Megatron-LM-FL:

```bash
pip install megatron_core==0.1.0+megatron0.15.0rc7 --extra-index-url https://resource.flagos.net/repository/flagos-pypi-hosted/simple
```

See Megatron-LM-FL for more details.
TransformerEngine-FL:

```bash
pip install transformer_engine==0.1.0+te2.9.0 --extra-index-url https://resource.flagos.net/repository/flagos-pypi-hosted/simple
```

See TransformerEngine-FL for more details.
RL backend

We recommend using the latest release of the flagscale-train image.

```bash
docker pull harbor.baai.ac.cn/flagscale/flagscale-train:dev-cu128-py3.12-20260319182856
docker run -itd --gpus all --shm-size=500g --name <name> harbor.baai.ac.cn/flagscale/flagscale-train:dev-cu128-py3.12-20260319182856 /bin/bash
docker exec -it <name> /bin/bash
conda activate flagscale-train
```
verl-FL:

```bash
pip install verl==0.1.0+verl0.7.0 --extra-index-url https://resource.flagos.net/repository/flagos-pypi-hosted/simple
```

See verl-FL for full installation instructions.
Install FlagScale

Option 1: Install via pip

```bash
pip install flagscale --extra-index-url https://resource.flagos.net/repository/flagos-pypi-hosted/simple
```
Option 2: Install from source

```bash
git clone https://github.com/flagos-ai/FlagScale.git
cd FlagScale
pip install .
```
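Whichever option you choose, the `flagscale` command should then be on your PATH. Assuming the CLI follows the usual convention, `--help` lists the available subcommands (train, inference, serve, rl):

```bash
flagscale --help
```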
FlagScale provides a unified runner for various tasks, including training, inference, and serving. Simply specify the configuration file to run a task with a single command; the runner automatically loads the configuration and executes the task. The following sections demonstrate how to run distributed training, inference, serving, and RL tasks.
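As the walkthroughs below illustrate, every task follows the same invocation shape:

```bash
flagscale <task> <model> --config <path/to/experiment-config.yaml>   # start
flagscale <task> <model> --stop                                      # stop (where supported)
```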
Requires the Megatron-LM-FL environment.
- Prepare the dataset demo and tokenizer:

Dataset: we provide a small processed dataset (bin and idx files) taken from the Pile dataset.

```bash
mkdir -p ./data && cd ./data
wget https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/datasets/enron_emails_demo_text_document_qwen/enron_emails_demo_text_document_qwen.idx
wget https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/datasets/enron_emails_demo_text_document_qwen/enron_emails_demo_text_document_qwen.bin
```
Tokenizer:

```bash
mkdir -p ./qwentokenizer && cd ./qwentokenizer
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/tokenizer_config.json" -O tokenizer_config.json
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/qwen.tiktoken" -O qwen.tiktoken
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/qwen_generation_utils.py" -O qwen_generation_utils.py
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/tokenization_qwen.py" -O tokenization_qwen.py
```
- Edit config:

Modify the data_path and tokenizer_path in ./examples/qwen3/conf/train/0_6b.yaml:
```yaml
data:
  data_path: ./data/enron_emails_demo_text_document_qwen # modify data_path here
  split: 1
  no_mmap_bin_files: true
  tokenizer:
    legacy_tokenizer: true
    tokenizer_type: QwenTokenizerFS
    tokenizer_path: ./qwentokenizer # modify tokenizer_path here
    vocab_size: 151936
    make_vocab_size_divisible_by: 64
```
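Note that `data_path` is the shared prefix of the `.bin` and `.idx` files downloaded above rather than a full filename, following Megatron-LM's dataset convention.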
Modify the config in ./examples/qwen3/conf/train.yaml:

```yaml
defaults:
  - _self_
  - train: 0_6b # modify: the train value must match the corresponding config file name
```
- Start the distributed training job:

```bash
flagscale train qwen3 --config ./examples/qwen3/conf/train.yaml
# or
flagscale train qwen3 -c ./examples/qwen3/conf/train.yaml
```
- Stop the distributed training job:

```bash
flagscale train qwen3 --stop
```
Requires the vLLM-plugin-FL environment.
- Prepare the model:

```bash
modelscope download --model Qwen/Qwen3-4B --local_dir ./Qwen3-4B
```
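The same weights are also published on Hugging Face if you prefer that hub (using huggingface_hub's CLI):

```bash
huggingface-cli download Qwen/Qwen3-4B --local-dir ./Qwen3-4B
```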
- Edit config:

Modify the model path in ./examples/qwen3/conf/inference/4b.yaml:

```yaml
llm:
  model: ./Qwen3-4B # modify: Set model directory
  trust_remote_code: true
  tensor_parallel_size: 1
  pipeline_parallel_size: 1
  gpu_memory_utilization: 0.9
  seed: 1234
```
Modify the config in ./examples/qwen3/conf/inference_fl.yaml:

```yaml
defaults:
  - _self_
  - inference: 4b # modify: the inference value must match the corresponding config file name
```
- Start inference:

```bash
flagscale inference qwen3 --config ./examples/qwen3/conf/inference_fl.yaml
# or
flagscale inference qwen3 -c ./examples/qwen3/conf/inference_fl.yaml
```
- Prepare the model:

```bash
modelscope download --model Qwen/Qwen3-0.6B --local_dir ./Qwen3-0.6B
```
- Edit config:

Modify the model path in ./examples/qwen3/conf/serve/0_6b.yaml:

```yaml
- serve_id: vllm_model
  engine_args:
    model: ./Qwen3-0.6B # modify: Set model directory
    host: 0.0.0.0
    max_model_len: 4096
    max_num_seqs: 4
    uvicorn_log_level: warning
    port: 30000 # a port available in your env, for example 30000
```
Modify the config in ./examples/qwen3/conf/serve.yaml:

```yaml
defaults:
  - _self_
  - serve: 0_6b # modify: the serve value must match the corresponding config file name

experiment:
  exp_name: qwen3-0.6b # modify as needed for test clarity
  exp_dir: outputs/${experiment.exp_name}
  task:
    type: serve
    backend: vllm
  runner:
    hostfile: null
    deploy:
      use_fs_serve: false
  envs:
    CUDA_VISIBLE_DEVICES: 0
    CUDA_DEVICE_MAX_CONNECTIONS: 1
```
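Here `CUDA_VISIBLE_DEVICES: 0` pins the server to a single GPU; list additional comma-separated device indices to expose more GPUs to the deployment.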
- Start the server:

```bash
flagscale serve qwen3 --config ./examples/qwen3/conf/serve.yaml
# or
flagscale serve qwen3 -c ./examples/qwen3/conf/serve.yaml
```
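Once the server is up, you can smoke-test it; this assumes the deployment exposes vLLM's OpenAI-compatible API on the port configured above, with the served model name defaulting to the path passed as model:

```bash
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "./Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello"}]}'
```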
- Stop the server:

```bash
flagscale serve qwen3 --stop
```
Requires the verl-FL environment.
- Prepare the model:

```bash
modelscope download --model Qwen/Qwen3-0.6B --local_dir ./Qwen3-0.6B
```
- Prepare the dataset:

```bash
mkdir gsm8k && cd gsm8k
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/rl/datasets/gsm8k/train.parquet"
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/rl/datasets/gsm8k/test.parquet"
```
- Edit config:

Modify the dataset paths in ./examples/qwen3/conf/rl/0_6b.yaml:

```yaml
data:
  train_files: /workspace/data/gsm8k/train.parquet # modify: Set your train dataset
  val_files: /workspace/data/gsm8k/test.parquet # modify: Set your test dataset
  train_batch_size: 1024
  max_prompt_length: 512
  max_response_length: 1024
  filter_overlong_prompts: true
  truncation: "error"
```
Modify the model path in ./examples/qwen3/conf/rl/0_6b.yaml:

```yaml
actor_rollout_ref:
  model:
    path: /workspace/data/ckpt/Qwen3-0.6B # modify: Set your model checkpoint directory
    use_remove_padding: true
    enable_gradient_checkpointing: true
    trust_remote_code: true
```
Modify the experiment config in ./examples/qwen3/conf/rl.yaml:

```yaml
experiment:
  exp_name: 0_6b
  exp_dir: /workspace/qwen3-rl/ # modify: Set your experiment directory
  runner:
    runtime_env: /path/to/verl-FL/verl/trainer/runtime_env.yaml # modify: Set your runtime_env.yaml
```
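The runtime_env.yaml referenced here is a Ray runtime environment spec (verl ships one under verl/trainer/); Ray uses it to propagate environment variables and dependencies to the workers that run the rollout and training actors.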
- Start the RL job:

```bash
flagscale rl qwen3 --config ./examples/qwen3/conf/rl.yaml
# or
flagscale rl qwen3 -c ./examples/qwen3/conf/rl.yaml
```
You can check the output in your experiment directory.
- Stop the RL job:

```bash
flagscale rl qwen3 --stop
```

Or force-stop the Ray cluster:

```bash
ray stop
```
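Note that `ray stop` shuts down every Ray process on the local node, not just this job, so use it only when nothing else on the machine depends on Ray.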