We recommend using the latest release of NGC's PyTorch container for setup.
- Clone the repository:

  ```shell
  git clone https://github.com/FlagOpen/FlagScale.git
  ```

- Install the requirements:

  ```shell
  cd FlagScale/install
  ./install-requirements.sh --env train
  ./install-requirements.sh --env inference
  ```

The above instructions create two conda environments, `flagscale-train` and `flagscale-inference`, which contain the dependency environments for training and inference, respectively.
- Install the packages with customized extensions:

  ```shell
  cd vllm
  pip install .
  pip install -e ./megatron-energon
  cp -r megatron-energon/src/megatron/energon megatron/megatron
  ```

FlagScale provides a unified runner for various tasks, including training, inference, and serving. Simply specify the configuration file to run a task with a single command.
Start the distributed training job:

```shell
python run.py --config-path=examples/deepseek_v3/conf --config-name=config_deepseek_v3.yaml action=run
```

Stop the distributed training job:

```shell
python run.py --config-path=examples/deepseek_v3/conf --config-name=config_deepseek_v3.yaml action=stop
```

YAML Configuration

FlagScale leverages Hydra for configuration management, which is organized into two levels: an outer experiment-level YAML file and an inner task-level YAML file.
In the OpenSeek project, we have open-sourced a DeepSeek model with a total parameter count of 16B and an activated parameter count of 2.4B. This model has been thoroughly validated on real-world datasets, and the loss curve will be released shortly.
- Experiment-level YAML: The experiment-level YAML file defines the experiment directory, backend engine, task type, and other related environmental configurations (`config_deepseek_v3.yaml`).
- Task-level YAML: The task-level YAML file specifies the model, dataset, and parameters for specific tasks such as training or inference (`train_deepseek_v3.yaml`).
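The relationship between the two levels can be sketched as follows. The field names below are illustrative only, loosely following Hydra's defaults-list convention; consult the actual `config_deepseek_v3.yaml` in the repository for the authoritative schema:

```yaml
# Experiment-level YAML (illustrative sketch, not the exact schema):
# selects the task-level file via Hydra's defaults list and sets
# experiment-wide options such as the experiment directory and backend.
defaults:
  - train: train_deepseek_v3   # pulls in the task-level YAML
experiment:
  exp_name: deepseek_v3
  exp_dir: ./outputs           # experiment directory
  task:
    type: train
    backend: megatron          # backend engine
action: run
```

Hydra also allows overriding any nested key from the command line, which is how `action=run` and `action=stop` work in the commands above.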
- FlagScale supports converting open-source models and checkpoints (CKPT) from Hugging Face to the Megatron format. Once the conversion is complete, the checkpoint can be loaded and distributed training can be initiated with FlagScale.
- For instance, the DeepSeek-V2-Lite 16B model, which is openly available on Hugging Face, can be converted into a checkpoint format supported by FlagScale. The model can then be warm-started directly by configuring the `load` option in `config_deepseek_v3.yaml`.
- The conversion supports tensor model parallelism, expert model parallelism, and pipeline model parallelism, with both even and uneven partitioning of pipeline stages.
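To make the uneven pipeline partitioning concrete: DeepSeek-V2-Lite has 27 decoder layers, which cannot be split evenly across two pipeline stages, so the conversion command below pins 13 layers to the first stage via `--target-decoder-first-pipeline-num-layers 13`. A small sketch of the arithmetic (the helper is hypothetical, not a FlagScale API):

```python
def split_layers(total_layers, pp_size, first_stage_layers=None):
    """Distribute decoder layers across pipeline stages.

    When a fixed first-stage share is given (uneven partitioning), the
    remaining layers are split evenly over the other stages; otherwise
    every stage receives total_layers / pp_size.
    """
    if first_stage_layers is None:
        assert total_layers % pp_size == 0, "even split requires divisibility"
        return [total_layers // pp_size] * pp_size
    remaining = total_layers - first_stage_layers
    assert remaining % (pp_size - 1) == 0, "remaining stages must split evenly"
    return [first_stage_layers] + [remaining // (pp_size - 1)] * (pp_size - 1)

# DeepSeek-V2-Lite: 27 decoder layers, pp=2 -> stage 0 gets 13, stage 1 gets 14.
print(split_layers(27, 2, first_stage_layers=13))  # [13, 14]
```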
```shell
cd FlagScale/tools
python convert.py \
    --model-type deepseek_v3 \
    --loader transformers \
    --saver mcore \
    --load-dir DeepSeek-V2-Lite \
    --save-dir converted_mcore_bf16_model \
    --target-tensor-parallel-size 1 \
    --target-pipeline-parallel-size 2 \
    --target-decoder-first-pipeline-num-layers 13 \
    --target-expert-parallel-size 4 \
    --target-params-dtype bf16 \
    --true-vocab-size 151851
```

Set the `load` field in the YAML file to the path of the converted checkpoint:
```yaml
system:
  tensor_model_parallel_size: 2
  pipeline_model_parallel_size: 2
  expert_model_parallel_size: 2
  context_parallel_size: 1
  sequence_parallel: true
  use_distributed_optimizer: true
  ...
  checkpoint:
    save_interval: 10000
    load: converted_mcore_bf16_model # the save_dir after conversion
    ckpt_format: torch
```

Then start the training job:

```shell
python run.py --config-path=examples/deepseek_v3/conf --config-name=config_deepseek_v3.yaml action=run
```

FlagScale also supports converting model checkpoints (CKPT) trained with FlagScale into the Hugging Face format, facilitating model release and evaluation.
```shell
python convert.py \
    --model-type deepseek_v3 \
    --loader mcore \
    --saver transformers \
    --load-dir bf16_model \
    --save-dir converted_huggingface_model \
    --target-tensor-parallel-size 1 \
    --target-pipeline-parallel-size 1 \
    --target-expert-parallel-size 1 \
    --target-params-dtype bf16 \
    --true-vocab-size 151851
```

Currently, we have preliminarily reproduced the DeepSeek V3 pre-training code with the following features:
- Support for the MLA and MTP structures, including shared embeddings.
- Support for the DeepSeekMoE structure: shared experts, loss-free load balancing, etc.
- Support for checkpoint conversion to and from the Hugging Face format.
- Support for hybrid parallelism of TP (Tensor Parallelism), PP (Pipeline Parallelism), DP (Data Parallelism), and EP (Expert Parallelism).
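The "loss-free" balancing above refers to DeepSeek's auxiliary-loss-free load balancing, which steers routing with a per-expert bias term instead of an auxiliary loss: the bias is added to the router scores for top-k expert selection only, and is nudged after each step toward underloaded experts. A minimal sketch of the bias-update rule (illustrative only, not FlagScale's implementation):

```python
def update_router_bias(bias, tokens_per_expert, gamma=0.001):
    """Auxiliary-loss-free balancing sketch: raise the routing bias of
    underloaded experts and lower that of overloaded ones by a fixed
    step gamma, based on each expert's load relative to the mean."""
    mean_load = sum(tokens_per_expert) / len(tokens_per_expert)

    def sign(x):
        return (x > 0) - (x < 0)

    return [b + gamma * sign(mean_load - t)
            for b, t in zip(bias, tokens_per_expert)]

# Expert 0 is overloaded and expert 3 idle: expert 0's bias falls,
# expert 3's rises, so future top-k selection shifts load toward expert 3.
print(update_router_bias([0.0] * 4, [10.0, 5.0, 5.0, 0.0]))
```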
The framework side still has the following open tasks; everyone is welcome to participate and contribute. See the FlagAI-Open OpenSeek repository for the full list of proposed features.
- Enhance the distributed training documentation
- Improve the installation and usage
- Checkpoint conversion between FlagScale and Hugging Face parameter formats
- Research and design a solution that can be easily implemented in FlagScale
- Implement a distributed log consolidation mechanism
- Improve the monitoring system of distributed training
- Performance analysis of current long sequence handling
- Performance analysis of the current DeepSeekMoE distributed training implementation
- Support for DeepSeek NSA, Kimi MoBA, etc.
- Integration of the FlagGems Triton operator library and corresponding training accuracy validation
- Implementation of the FP8 operators required in DeepSeek V3, with support for validation during the training process
- Implementation of a distributed reinforcement learning system to support efficient DeepSeek R1-style training
- Develop tools for detecting slow nodes, faulty nodes, and NCCL errors in large-scale clusters
- Visualization of the communication flows and scheduling relationships in complex large-scale distributed clusters
- Support for DualPipe pipeline parallelism
- Achieve more efficient pipeline parallelism
- Improve communication algorithms to achieve more efficient MoE parallelism optimization
- Collaborate with algorithm teams to achieve more efficient long sequence optimization
- Implement customized parallel and optimization strategies for specific chips
- Implement more innovative FP8 training solutions
We warmly welcome contributions to the FlagScale project! If you would like to contribute, please follow these steps:
- Fork FlagScale to your own GitHub account: this creates a copy of the FlagScale repo under your account, at a URL like https://github.com/your-own-id/FlagScale
- Clone the forked repository to your local machine and navigate into the local FlagScale directory:

  ```shell
  git clone https://github.com/your-own-id/FlagScale.git
  cd FlagScale
  git config --global user.name XXX
  git config --global user.email XXX
  pre-commit install
  ```

- Add the upstream repository to keep your fork updated with changes from the original FlagScale repository:

  ```shell
  git remote add upstream https://github.com/FlagOpen/FlagScale.git
  ```
- Sync updates from the upstream FlagScale repository:

  ```shell
  git pull upstream main:main
  ```
- Create a new branch and start your development:

  ```shell
  git checkout -b feature/my-new-feature
  ```
- Commit your changes:

  ```shell
  git add .
  git commit -m "Add my new feature"
  ```
- Push your new branch to your GitHub repository:

  ```shell
  git push origin feature/my-new-feature
  ```
- Create a pull request (PR) for FlagScale
  - Open your GitHub repository page (https://github.com/your-own-id/FlagScale)
  - You will see a prompt with a Compare & pull request button for your newly pushed branch
  - Click this button to proceed to the Pull Request page
  - Provide a title and a description for your pull request that succinctly describe the modifications you have made
- Wait for review and merge