
⚖️ Legal Disclaimer

Compliance with AI regulations is the user's legal obligation.

Under applicable laws and regulations (including but not limited to China's "Interim Measures for the Management of Generative Artificial Intelligence Services", the EU AI Act, the US NIST AI Risk Management Framework, etc.), users are responsible for fulfilling their compliance obligations. Non-compliant use may result in service termination, administrative penalties, or legal liability. Users assume all related risks.

This project is licensed under Apache 2.0, permitting commercial use.


PiscesL1

English | 简体中文

Security | Contributing | Code of Conduct

BiliBili X Gitee GitHub Hugging Face ModelScope

A high-performance multimodal Mixture-of-Experts (MoE) model featuring the Yv Architecture, supporting text, image, audio, video, document, and agent understanding. PiscesL1 (PiscesLx series, Dunimd Team) is designed for both research and practical applications, capable of running on a single RTX 4090 GPU, with an architecture that scales up to 1T parameters.

Yv Architecture

🧠 YvUnifiedReasoner - Unified Reasoning System

YvUnifiedReasoner implements an intelligent routing framework that dynamically switches between Chain-of-Thought (CoT) and Multi-Path reasoning engines:

  • YvCoTMemoryReasoner: Memory-augmented chain-of-thought reasoner with adaptive depth control (1-3 layers), early stopping mechanism, and error analysis with self-correction
  • YvMultiPathReasoningEngine: Multi-path reasoning engine supporting up to 8 parallel hypothesis streams with dynamic fact verification and metacognitive uncertainty scoring
  • Intelligent Routing: Automatic selection of optimal reasoning path based on problem complexity and sequence length
  • Control Tokens: <|start_hypothesis|>, <|start_evidence|>, <|start_conclusion|>, <|hypothesis_split|>, <|hypothesis_merge|> enable external tools to precisely track the model's thinking path
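Because the control tokens are emitted as plain text, an external tool can segment a decoded reasoning trace with ordinary string processing. A minimal sketch (the function name and output format are illustrative assumptions, not the repository's API; `<|hypothesis_split|>`/`<|hypothesis_merge|>` markers would remain inside segment content):

```python
import re

# Segment-opening control tokens from the Yv reasoning trace format (see list above).
SEGMENT_TOKENS = ["<|start_hypothesis|>", "<|start_evidence|>", "<|start_conclusion|>"]

def parse_reasoning_trace(text: str) -> list[tuple[str, str]]:
    """Split a decoded trace into (segment_kind, content) pairs.

    Text before the first control token is labelled 'preamble'.
    """
    pattern = "(" + "|".join(re.escape(t) for t in SEGMENT_TOKENS) + ")"
    parts = re.split(pattern, text)  # capturing group keeps the tokens
    segments, kind = [], "preamble"
    for part in parts:
        if part in SEGMENT_TOKENS:
            # "<|start_hypothesis|>" -> "hypothesis"
            kind = part.strip("<|>").removeprefix("start_")
        elif part:
            segments.append((kind, part.strip()))
    return segments
```

This keeps the tracking logic entirely outside the model: any tool that sees the raw token stream can reconstruct the hypothesis/evidence/conclusion structure.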

🔧 Yv MoE Scaling - Mixture-of-Experts

Mixture-of-Experts implementation:

  • YvStableMoEGate: Stable gating with LSTM load predictor, supporting Top-K routing for 6-64 experts
  • Fine-grained Expert Segmentation: Each "expert" is a combination of multiple sub-experts for more flexible routing
  • Shared Expert Isolation: Shared experts that are always activated to process all tokens
  • Auxiliary Loss-free Load Balancing: Load balancing without traditional auxiliary losses that affect model quality
  • UltraMem TDQKR Optimization: Tucker Decomposed Query-Key Retrieval optimization, reducing routing complexity from O(N) to O(√N)
  • Dynamic Device Migration: Dynamic expert migration for efficient memory management of large expert pools
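To make the Top-K routing and shared-expert isolation concrete, here is a minimal per-token gating sketch in plain Python (function names, the renormalization scheme, and the shared-expert weighting are illustrative assumptions, not the `YvStableMoEGate` implementation):

```python
import math

def topk_route(logits: list[float], k: int, shared: int = 0) -> dict[int, float]:
    """Top-K routing sketch: pick k experts by gate logit, renormalize
    their softmax probabilities to sum to 1, and always include the first
    `shared` experts with weight 1.0 (shared-expert isolation: shared
    experts bypass the gate and see every token).
    """
    m = max(logits)
    exp = [math.exp(x - m) for x in logits]          # stable softmax
    probs = [e / sum(exp) for e in exp]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    routed = {i: probs[i] / norm for i in top}       # renormalized routed weights
    for i in range(shared):
        routed[i] = routed.get(i, 0.0) + 1.0         # shared experts always active
    return routed
```

In a real layer the returned weights multiply each selected expert's output before summation; fine-grained segmentation simply means each index above stands for a group of sub-experts.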

🌐 Multimodal Perception Stack

Six-modality unified perception architecture:

  • YvVisionEncoder: NaViT-style patch encoding with native resolution support (up to 2048px) and patch packing
  • YvVideoEncoder: Frame-level attention encoding with 3D RoPE spatio-temporal position encoding
  • YvAudioEncoder: Audio spectrum encoding with streaming audio processing support
  • YvDocEncoder: LayoutLMv3-style document encoding with layout-aware structural reasoning
  • YvAgenticEncoder: Agent state encoding with action space and state representation
  • YvCrossModalAttention: Cross-modal attention for deep inter-modal interaction
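NaViT-style patch packing means each image keeps its native resolution and contributes a variable number of patches to one packed sequence. A minimal sketch of the bookkeeping (the patch size and side cap are illustrative defaults, not `YvVisionEncoder`'s actual values):

```python
import math

def pack_patches(image_sizes: list[tuple[int, int]], patch: int = 14,
                 max_side: int = 2048) -> list[int]:
    """Return one image-id per packed patch.

    Each (H, W) image is capped at max_side per side and contributes
    ceil(H/patch) * ceil(W/patch) patches; the per-patch image ids let
    attention be masked so patches only attend within their own image.
    """
    ids = []
    for idx, (h, w) in enumerate(image_sizes):
        h, w = min(h, max_side), min(w, max_side)
        n = math.ceil(h / patch) * math.ceil(w / patch)
        ids.extend([idx] * n)
    return ids
```

Packing avoids resizing every image to a fixed square, so aspect ratios and fine detail survive up to the 2048px cap.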

⚛️ YvDynamicModalFusion - Dynamic Modal Fusion

Token-level multimodal fusion system:

  • Cross-Modal Attention: Cross-modal attention for inter-modal information exchange
  • Modality-Aware Position Embeddings: Modality-aware position embeddings
  • Quality-Weighted Gating: Quality-weighted gating that dynamically adjusts weights based on fusion quality
  • YvEnhancedModalFusion: Enhanced fusion module with contrastive cross-modal alignment and online adaptive weights
  • Multiple Fusion Strategies: Support for inserting fusion tokens before text sequences, concatenating 3D features, or outputting compressed summaries
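The quality-weighted gating idea can be sketched as a softmax over per-modality quality scores that weights each modality's feature vector (names and the scoring source are illustrative assumptions; the real module computes these scores from the fusion state):

```python
import math

def quality_weighted_fuse(features: list[list[float]],
                          quality_logits: list[float]) -> list[float]:
    """Weight each modality's feature vector by softmax(quality) and sum.

    `features[i]` is modality i's vector; a higher quality logit pulls
    the fused representation toward that modality.
    """
    m = max(quality_logits)
    w = [math.exp(q - m) for q in quality_logits]   # stable softmax
    w = [x / sum(w) for x in w]
    dim = len(features[0])
    return [sum(w[i] * features[i][d] for i in range(len(features)))
            for d in range(dim)]
```

Dynamically re-scoring quality each step is what lets the gate down-weight a noisy modality (e.g. corrupted audio) without retraining.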

📏 Ultra-Long Context Fabric

Industry-leading 10M+ token context support:

  • YaRN RoPE + Dynamic NTK Scaling: YaRN position encoding with dynamic NTK scaling for 10M+ token extrapolation
  • H2O Heavy-Hitter Oracle Attention: Heavy-Hitter Oracle attention that retains important tokens for ultra-long context
  • Streaming Attention: Streaming attention for infinite-length generation
  • Sliding Window Attention: Sliding window attention combining local attention with global tokens
  • Linear Attention: O(n) complexity linear attention with ELU/Performer/Softmax feature mappings
  • Paged Attention: Paged attention for efficient KV cache management and sharing
  • Ring Attention: Ring attention for distributed ultra-long context processing
  • Attention Sinks: Attention sinks ensuring streaming inference stability
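Several of these mechanisms compose at the mask level: sliding-window attention bounds the lookback, and attention sinks pin the first few tokens so streaming stays stable. A toy mask builder showing the combination (window/sink sizes are illustrative; real kernels never materialize this matrix):

```python
def stream_mask(seq_len: int, window: int, sinks: int) -> list[list[bool]]:
    """Causal mask combining a sliding window with attention sinks.

    Position i may attend to position j iff j <= i (causality) and
    j is either one of the first `sinks` tokens or within the last
    `window` positions. mask[i][j] == True means attention is allowed.
    """
    return [
        [j <= i and (j < sinks or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

With this shape the KV cache stays O(window + sinks) regardless of sequence length, which is the memory property that makes infinite-length generation practical.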

🔥 Hybrid Attention-SSM

Industry-frontier hybrid architecture implementation:

  • Mamba-3 Integration: Complete Mamba-3 SSM integration with trapezoidal discretization, complex states, and MIMO structure
  • YvSelectiveSSM: Selective State Space Model with input-dependent state transitions
  • Progressive Gating: Progressive gating for smooth transition from pure attention to hybrid mode, ensuring training stability
  • Adaptive Routing: Adaptive routing that dynamically selects attention or SSM based on sequence features
  • Jamba-style Interleaved Architecture: Jamba-style interleaved architecture with alternating attention and SSM layers

🎯 Advanced Attention Mechanisms

Complete attention mechanism implementations:

  • Flash Attention 2/3: GPU-optimized efficient attention supporting Ampere+ and Hopper+ architectures
  • Multi-Head Latent Attention (MLA): Low-rank KV compression for significantly reduced KV cache
  • Grouped Query Attention (GQA): Grouped query attention balancing quality and efficiency
  • ALiBi Position Encoding: Attention with Linear Biases position encoding without position embeddings
  • QK Normalization: Query-Key normalization for improved large model training stability

🚀 Training Envelope & Optimization

Complete training optimization suite:

  • GaLore Optimization: Low-rank gradient projection optimization with adaptive rank adjustment and multimodal module optimization
  • K-FAC Enhanced Gradient Clipping: K-FAC enhanced gradient clipping with layer coordination
  • Multi-bit Quantization (2/4/8-bit): Multi-bit quantization support for extreme memory savings
  • LoRA/QLoRA: Low-rank adaptation fine-tuning supporting all linear layers
  • Speculative Decoding: Speculative decoding for 2-3x inference acceleration
  • Multi-Token Prediction (MTP): Multi-token prediction for improved generation quality
  • Smart Gradient Accumulation: Smart gradient accumulation with adaptive memory management
  • Multi-task Learning: Multi-task learning support with adaptive task weights

Reference Configuration

Core components are located in model/ and model/multimodal/, with default hyperparameters stored in configs/model/*.json.

| Model Size | Layers | Hidden | Heads | KV Heads | MoE Experts | Top-K | Context | MLA Rank |
|-----------:|-------:|-------:|------:|---------:|------------:|------:|--------:|---------:|
| 0.5B | 16 | 640 | 10 | 5 | 6 | 2 | 256K | 256 |
| 1.5B | 16 | 896 | 14 | 7 | 6 | 2 | 256K | 256 |
| 7B | 28 | 3584 | 32 | 8 | 8 | 2 | 1M | 512 |
| 32B | 64 | 5120 | 40 | 8 | 8 | 2 | 1M | 512 |
| 64B | 80 | 6656 | 52 | 8 | 8 | 2 | 10M | 1024 |
| 70B | 80 | 8192 | 64 | 8 | 8 | 2 | 10M | 1024 |
| 128B | 120 | 10240 | 80 | 8 | 8 | 2 | 10M | 1536 |
| 314B | 160 | 12288 | 96 | 12 | 16 | 4 | 10M | 2048 |
| 671B | 200 | 16384 | 128 | 16 | 32 | 6 | 10M | 2048 |
| 1T | 240 | 20480 | 160 | 20 | 64 | 8 | 10M | 2560 |

Note: Default quantization values are inherited from the respective config files and can be overridden directly in training commands via --force_quant --quant_bits {2,4,8} and --force_lora.

# 2-bit quantization (experimental, extreme memory saving)
python manage.py train --model_size 1.5B --dataset Chinese2 --force_quant --quant_bits 2 --force_lora

# 4-bit quantization (balanced)
python manage.py train --model_size 1.5B --dataset Chinese2 --force_quant --quant_bits 4 --force_lora

# 8-bit quantization (stable)
python manage.py train --model_size 1.5B --dataset Chinese2 --force_quant --quant_bits 8 --force_lora

🛠️ Installation & Environment

  • Python: Recommended 3.11+
  • CUDA: 11.8+ (for GPU training and inference)
  • Dependencies: See requirements.txt

Quick Setup

git clone https://gitee.com/dunimd/piscesl1.git
# or
git clone https://github.com/mf2023/piscesl1.git
cd piscesl1
python manage.py setup

⚡ Quick Start

Basic Environment Setup

# 1. Clone repository
git clone https://gitee.com/dunimd/piscesl1.git
# or
git clone https://github.com/mf2023/piscesl1.git
cd piscesl1

# 2. Environment setup
python manage.py setup

# 3. Download default dataset
python manage.py download

Core Commands

All commands are invoked through:

python manage.py <command>

View help:

python manage.py help
| Command | Description |
|---|---|
| setup | Environment setup and dependency installation |
| train | Train a model (supports quantization / LoRA / RLHF / GaLore) |
| serve | Start the OpenAI-compatible backend inference service |
| test | Project health check (8-stage validation) |
| monitor | System monitoring (GPU/CPU/memory) |
| download | Download datasets |
| benchmark | Model evaluation and benchmarking |
| mcp | MCP tool management (status / warmup / refresh-cache) |
| watermark | Watermark detection (text/file/image/audio/video/model weights) |
| action | Background process management (submit/status/control) |
| dev | Developer mode for training (vim-style command interface) |
| cache | Cache management for the .pisceslx directory |
| publish | Package and publish models as Docker images |
| help | Show help information |

Quick Experience

# Train 0.5B model
python manage.py train --model_size 0.5B

# Start backend service
python manage.py serve --model_size 7B --port 8000

API Usage Examples

# Chat Completion
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "pisceslx-7b", "messages": [{"role": "user", "content": "Hello, introduce yourself"}]}'

# Streaming Response
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "pisceslx-7b", "messages": [...], "stream": true}'

# Embedding Generation
curl http://localhost:8000/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model": "pisceslx-7b", "input": "Hello world"}'
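The same OpenAI-compatible endpoints can be called from Python using only the standard library. A minimal sketch (the helper name is illustrative; model name, port, and payload fields mirror the curl examples above, and actually sending the request requires the backend started by `manage.py serve`):

```python
import json
import urllib.request

def chat_request(prompt: str, model: str = "pisceslx-7b",
                 base_url: str = "http://localhost:8000",
                 stream: bool = False):
    """Build an OpenAI-compatible chat-completion request for the
    manage.py serve backend; returns (payload, prepared Request)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )  # POST is implied because `data` is set
    return payload, req

# To send (server must be running):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```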

Common Examples

# Dataset management
python manage.py download --max_samples 50000

# Training examples
python manage.py train --model_size 0.5B --dataset Chinese2
python manage.py train --model_size 1B --dataset Chinese2 --resume_ckpt runs/last.pt --reset_lr
python manage.py train --model_size 7B --dataset Chinese2 --force_quant --quant_bits 4 --force_lora
python manage.py train --model_size 7B --dataset Chinese2 --rlhf --rlhf_dataset dunimd/human_feedback --rlhf_lr 1e-5

# Backend service
python manage.py serve --model_size 7B --port 8000
python manage.py serve --model_size 14B --host 0.0.0.0 --port 8080 --workers 4
python manage.py serve --model_size 72B

# Benchmark examples
python manage.py benchmark --list
python manage.py benchmark --info mmlu
python manage.py benchmark --benchmark mmlu --config configs/0.5B.json --seq_len 4096 --model ckpt/model.pt
python manage.py benchmark --perf --config configs/0.5B.json --selftest

# MCP tools
python manage.py mcp --mcp_action status
python manage.py mcp --mcp_action warmup
python manage.py mcp --mcp_action refresh-cache

# Watermark detection
python manage.py watermark --text "Detect text watermark"
python manage.py watermark --file document.txt
python manage.py watermark --image-file image.png
python manage.py watermark --audio-file audio.wav
python manage.py watermark --video-file video.mp4
python manage.py watermark --model-file model.pt
python manage.py watermark --weights-verify --ckpt model.pt

# Background process management
python manage.py action submit train configs/train.json
python manage.py action submit train configs/train.json --gpu_count 2 --priority high
python manage.py action submit serve configs/serve.json
python manage.py action status
python manage.py action logs <run_id>
python manage.py action control <run_id> pause
python manage.py action control <run_id> resume
python manage.py action control <run_id> stop
python manage.py action list
python manage.py action list --running

# GPU resource management
python manage.py action gpu list
python manage.py action gpu status
python manage.py action gpu status --gpu_id 0
python manage.py action gpu release --task_id <run_id>

# Task queue management
python manage.py action queue list
python manage.py action queue stats
python manage.py action queue clear --priority low

# System resources
python manage.py action resources status
python manage.py action resources utilization

# Task recovery
python manage.py action recover <run_id>
python manage.py action recover <run_id> --checkpoint runs/<run_id>/ckpt.pt

# Developer mode (vim-style command interface for training)
python manage.py dev enable    # Enable developer mode
python manage.py dev disable   # Disable developer mode
python manage.py dev status    # Check developer mode status

# Available commands during training:
#   Memory: :mem, :mem-gpu, :mem-cpu
#   Model: :layer, :layers, :grad, :grad-norm
#   Training: :pause, :resume, :save, :lr, :batch
#   Config: :config, :config-model, :config-data
#   Monitoring: :watch, :watch-clear, :profile
#   Intervention: :inject, :freeze, :unfreeze, :nan-check
#   Other: :help, :q

# Cache management for .pisceslx directory
python manage.py cache         # Show cache status
python manage.py cache clean   # Clean all cache (settings/ protected)

# Publish models as Docker images with inference engine
python manage.py publish --publish_action full --publish_model_size 7B --publish_registry docker.io
python manage.py publish --publish_action full --publish_model_size 7B --publish_model_path ./ckpt/7B.pt
python manage.py publish --publish_action export --publish_model_size 7B --publish_output_dir ./export/
python manage.py publish --publish_action build --publish_model_size 7B --publish_template gpu
python manage.py publish --publish_action push --publish_registry ghcr.io --publish_registry_namespace myuser
python manage.py publish --publish_action validate --publish_model_size 7B
python manage.py publish --publish_action info --publish_model_size 7B
python manage.py publish --publish_action list

📦 Dataset

Datasets are configured in configs/dataset.yaml and downloaded with:

python manage.py download

  • Default download priority: ModelScope → HuggingFace (automatic mirror fallback when one source is inaccessible)

  • See configs/dataset.yaml for the complete list


❓ Frequently Asked Questions (FAQ)

  • How do I view available commands? Run python manage.py help.
  • How do I add a new dataset? Edit configs/dataset.yaml and run python manage.py download. For custom datasets, JSONL (text) or Parquet (input_ids/labels) is recommended.
  • Insufficient GPU memory? Use a smaller model, reduce the sequence length, or enable 4-bit quantization (--force_quant --quant_bits 4, usually together with --force_lora).
  • How do I resume training? Pass --resume_ckpt path/to/ckpt.pt (optionally with --reset_lr).
  • CPU only? Use --device cpu (significantly slower).
  • How do I run evaluation? python manage.py benchmark ..., with --config, --seq_len, --model, and other parameters.

🌏 Community & Citation


📚 Academic References

This project implements algorithms from the following academic papers. We sincerely thank the authors for their contributions.

Attention Mechanisms

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| ALiBi | Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | Press et al. | ICLR | 2022 | attention.py |
| Attention Sink | Efficient Streaming Language Models with Attention Sinks | Xiao et al. | ICLR | 2024 | attention.py |
| QK Normalization | Query-Key Normalization for Transformers | Henry et al. | ICLR | 2020 | attention.py |
| Linear Attention | Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention | Katharopoulos et al. | ICML | 2020 | attention.py |
| S4 | Efficiently Modeling Long Sequences with Structured State Spaces | Gu et al. | ICLR | 2022 | attention.py |
| Longformer | Longformer: The Long-Document Transformer | Beltagy et al. | - | 2020 | attention.py |
| BigBird | Big Bird: Transformers for Longer Sequences | Zaheer et al. | NeurIPS | 2020 | attention.py |
| Ring Attention | Ring Attention with Blockwise Transformers for Near-Infinite Context | Liu et al. | ICLR | 2024 | attention.py |
| MQA | Fast Transformer Decoding: One Write-Head is All You Need | Shazeer | - | 2019 | attention.py |
| H2O | H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | Zhang et al. | ICLR | 2024 | attention.py |
| LongRoPE | LongRoPE: Extending LLM Context Window Beyond 2M Tokens | Ding et al. | ICML | 2024 | attention.py |
| PagedAttention | Efficient Memory Management for Large Language Model Serving with PagedAttention | Kwon et al. | SOSP | 2023 | attention.py |
| Flash Attention | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Dao et al. | NeurIPS | 2022 | attention.py |
| Flash Attention 2 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Dao | - | 2023 | attention.py |
| Flash Attention 3 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Blockwise Parallelism | Dao et al. | - | 2024 | flash_attention.py |
| CoPE | Context-aware Position Encoding for Better Length Extrapolation | Yang et al. | arXiv | 2024 | attention.py |

Position Encoding

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| Sinusoidal PE | Attention Is All You Need | Vaswani et al. | NeurIPS | 2017 | embedding.py |
| RoPE | RoFormer: Enhanced Transformer with Rotary Position Embedding | Su et al. | - | 2021 | norms.py |
| YaRN | YaRN: Efficient Context Window Extension of Large Language Models | Peng et al. | - | 2023 | norms.py |

Normalization & Activation

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| RMSNorm | Root Mean Square Layer Normalization | Zhang & Sennrich | NeurIPS | 2019 | norms.py |
| Adaptive LayerNorm | Scalable Diffusion Models with Transformers (DiT) | Peebles & Xie | ICCV | 2023 | norms.py |
| LayerScale | Going deeper with Image Transformers | Touvron et al. | ICCV | 2021 | blocks.py |
| SwiGLU | GLU Variants Improve Transformer | Shazeer | - | 2020 | blocks.py |
| GeGLU | GLU Variants Improve Transformer | Shazeer | - | 2020 | blocks.py |
| Group Normalization | Group Normalization | Wu & He | ECCV | 2018 | norms.py |

State Space Models

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| Mamba | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Gu & Dao | arXiv | 2023 | blocks.py |
| Mamba-2 | Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | Dao & Gu | ICML | 2024 | blocks.py |

Mixture of Experts

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| UltraMem TDQKR | UltraMem | ByteDance | ICLR | 2025 | layer.py |
| DeepSeekMoE | DeepSeek-V3 Technical Report | DeepSeek Team | - | 2024 | expert.py, layer.py |

Inference Optimization

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| Speculative Decoding | Fast Inference from Transformers via Speculative Decoding | Leviathan et al. | ICML | 2023 | cache.py |
| BLIP-2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Li et al. | ICML | 2023 | cache.py |

Training Optimization

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| K-FAC | Optimizing Neural Networks with Kronecker-factored Approximate Curvature | Martens & Grosse | ICML | 2015 | kfac.py |
| K-FAC for Conv | A Kronecker-factored Approximate Fisher Matrix for Convolution Layers | Grosse & Martens | ICML | 2016 | kfac.py |
| Multi-Task Uncertainty | Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics | Kendall et al. | CVPR | 2018 | multitask_uncertainty.py |
| SGDR | SGDR: Stochastic Gradient Descent with Warm Restarts | Loshchilov & Hutter | arXiv | 2016 | modality_scheduler.py |
| Chinchilla Scaling | Training Compute-Optimal Large Language Models | Hoffmann et al. | - | 2022 | scaling/init.py |

Alignment & Reinforcement Learning

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| DPO | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Rafailov et al. | NeurIPS | 2023 | dpo.py |
| GRPO | DeepSeek R1 Technical Report | DeepSeek Team | arXiv | 2024 | grpo.py |
| RLVR | DeepSeek R1 Technical Report / OpenAI o1 | DeepSeek / OpenAI | arXiv | 2024/2025 | rlvr.py |
| TPO | Test-Time Preference Optimization: On-the-fly Alignment via Iterative Textual Feedback | - | arXiv | 2025 | tpo.py |

Reasoning & Agentic

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| ReAct | ReAct: Synergizing Reasoning and Acting in Language Models | Yao et al. | ICLR | 2023 | react_agentic.py |
| Chain-of-Thought | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Wei et al. | NeurIPS | 2022 | react_agentic.py |

Optimizers

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| GaLore | GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | Zhao et al. | arXiv | 2024 | galore.py |
| ROOT | ROOT: Robust Orthogonalized Optimizer for Neural Network Training | Huawei Noah's Ark Lab | arXiv | 2024 | root.py |
| FP4 Training | Optimizing Large Language Model Training Using FP4 Quantization | - | arXiv | 2025 | fp4.py |

Quantization

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| GPTQ | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Frantar et al. | ICLR | 2023 | orchestrator.py |

Citation

If you use this project in your research, please cite:

@misc{piscesl1,
  author = {Wenze Wei and Dunimd Team},
  title = {PiscesL1: A High-Performance Multimodal Mixture-of-Experts Model},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/mf2023/piscesl1}
}

📄 License & Open Source Agreements

🏛️ Project License

Apache License 2.0

This project is released under the Apache License 2.0; see the LICENSE file.

📋 Dependency Package Open Source Agreements

Open source packages used by this project and their license information:

| 📦 Package | 📜 License | 📦 Package | 📜 License |
|---|---|---|---|
| torch | BSD-style | torchvision | BSD-style |
| torchaudio | BSD-style | torch-directml | MIT |
| transformers | Apache 2.0 | tokenizers | Apache 2.0 |
| huggingface-hub | Apache 2.0 | modelscope | Apache 2.0 |
| numpy | BSD 3-Clause | scipy | BSD 3-Clause |
| scikit-learn | BSD 3-Clause | addict | MIT |
| accelerate | Apache 2.0 | einops | MIT |
| timm | Apache 2.0 | pytorch-lightning | Apache 2.0 |
| pillow | HPND | PyMuPDF | AGPL 3.0 |
| python-docx | MIT | python-pptx | MIT |
| pdfplumber | MIT | pdf2image | MIT |
| ocrmypdf | MPL 2.0 | bitsandbytes | MIT |
| peft | Apache 2.0 | wheel | MIT |
| xformers | BSD 3-Clause | trl | Apache 2.0 |
| nvidia-ml-py3 | BSD 3-Clause | fastapi | MIT |
| uvicorn | BSD 3-Clause | python-multipart | Apache 2.0 |
| pydantic | MIT | httpx | BSD 3-Clause |
| pandas | BSD 3-Clause | gradio | Apache 2.0 |
| ijson | BSD 3-Clause | pyarrow | Apache 2.0 |
| tqdm | MIT | jsonlines | MIT |
| windows-curses | BSD 3-Clause | psutil | BSD 3-Clause |
| streamlit | Apache 2.0 | PyYAML | MIT |
| GitPython | BSD 3-Clause | opencv-python | MIT |
| av | BSD 3-Clause | decord | Apache 2.0 |
| imageio | BSD 3-Clause | imageio-ffmpeg | BSD 3-Clause |
| openai | Apache 2.0 | requests | Apache 2.0 |
| beautifulsoup4 | MIT | pytz | MIT |
| pywin32 | PSF | duckduckgo-search | MIT |
| plotly | MIT | evalscope | Apache 2.0 |
| safetensors | Apache 2.0 | deepspeed | Apache 2.0 |
| aiofiles | Apache 2.0 | pathlib2 | MIT |
| textual | MIT | dmsc | Apache 2.0 |
| datasets | Apache 2.0 | rich | MIT |
| omegaconf | BSD 3-Clause | hydra-core | MIT |
| wandb | MIT | tensorboard | Apache 2.0 |
| mlflow | Apache 2.0 | lm-eval | MIT |
| rouge-score | Apache 2.0 | sacrebleu | Apache 2.0 |
| bert-score | MIT | librosa | ISC |
| soundfile | BSD 3-Clause | audioread | MIT |
| pydub | MIT | flash-attn | BSD 3-Clause |
| triton | MIT | mamba-ssm | Apache 2.0 |
| causal-conv1d | Apache 2.0 | docker | Apache 2.0 |

