
⚖️ Legal Disclaimer

Compliance with AI regulations is the user's legal obligation.

Under applicable laws and regulations (including but not limited to China's "Interim Measures for the Management of Generative Artificial Intelligence Services", the EU AI Act, the US NIST AI Risk Management Framework, etc.), users are responsible for fulfilling their compliance obligations. Non-compliant use may result in service termination, administrative penalties, or legal liability. Users assume all related risks.

This project is licensed under Apache 2.0, permitting commercial use.


PiscesL1

English | 简体中文

Security | Contributing | Code of Conduct

BiliBili X Gitee GitHub Hugging Face ModelScope

A high-performance multimodal Mixture-of-Experts (MoE) model featuring the Yv Architecture, supporting text, image, audio, video, document, and agent understanding. PiscesL1 (PiscesLx series, Dunimd Team) is designed for both research and practical applications, capable of running on a single RTX 4090 GPU, with an architecture that scales up to 1T parameters.

Yv Architecture

🧠 YvUnifiedReasoner - Unified Reasoning System

YvUnifiedReasoner implements an intelligent routing framework that dynamically switches between Chain-of-Thought (CoT) and Multi-Path reasoning engines:

  • YvCoTMemoryReasoner: Memory-augmented chain-of-thought reasoner with adaptive depth control (1-3 layers), early stopping mechanism, and error analysis with self-correction
  • YvMultiPathReasoningEngine: Multi-path reasoning engine supporting up to 8 parallel hypothesis streams with dynamic fact verification and metacognitive uncertainty scoring
  • Intelligent Routing: Automatic selection of optimal reasoning path based on problem complexity and sequence length
  • Control Tokens: <|start_hypothesis|>, <|start_evidence|>, <|start_conclusion|>, <|hypothesis_split|>, <|hypothesis_merge|> enable external tools to precisely track the model's thinking path
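Because the control tokens are emitted as plain text, an external tool can segment a decoded reasoning trace with ordinary string processing. A minimal sketch (the function name and output format are illustrative assumptions, not the repository's API; `<|hypothesis_split|>`/`<|hypothesis_merge|>` markers would remain inside segment content):

```python
import re

# Segment-opening control tokens from the Yv reasoning trace format (see list above).
SEGMENT_TOKENS = ["<|start_hypothesis|>", "<|start_evidence|>", "<|start_conclusion|>"]

def parse_reasoning_trace(text: str) -> list[tuple[str, str]]:
    """Split a decoded trace into (segment_kind, content) pairs.

    Text before the first control token is labelled 'preamble'.
    """
    pattern = "(" + "|".join(re.escape(t) for t in SEGMENT_TOKENS) + ")"
    parts = re.split(pattern, text)  # capturing group keeps the tokens
    segments, kind = [], "preamble"
    for part in parts:
        if part in SEGMENT_TOKENS:
            # "<|start_hypothesis|>" -> "hypothesis"
            kind = part.strip("<|>").removeprefix("start_")
        elif part:
            segments.append((kind, part.strip()))
    return segments
```

This keeps the tracking logic entirely outside the model: any tool that sees the raw token stream can reconstruct the hypothesis/evidence/conclusion structure.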

🔧 Yv MoE Scaling - Mixture-of-Experts

Mixture-of-Experts implementation:

  • YvStableMoEGate: Stable gating with LSTM load predictor, supporting Top-K routing for 6-64 experts
  • Fine-grained Expert Segmentation: Each "expert" is a combination of multiple sub-experts for more flexible routing
  • Shared Expert Isolation: Shared experts that are always activated to process all tokens
  • Auxiliary Loss-free Load Balancing: Load balancing without traditional auxiliary losses that affect model quality
  • UltraMem TDQKR Optimization: Tucker Decomposed Query-Key Retrieval optimization, reducing routing complexity from O(N) to O(√N)
  • Dynamic Device Migration: Dynamic expert migration for efficient memory management of large expert pools
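To make the Top-K routing and shared-expert isolation concrete, here is a minimal per-token gating sketch in plain Python (function names, the renormalization scheme, and the shared-expert weighting are illustrative assumptions, not the `YvStableMoEGate` implementation):

```python
import math

def topk_route(logits: list[float], k: int, shared: int = 0) -> dict[int, float]:
    """Top-K routing sketch: pick k experts by gate logit, renormalize
    their softmax probabilities to sum to 1, and always include the first
    `shared` experts with weight 1.0 (shared-expert isolation: shared
    experts bypass the gate and see every token).
    """
    m = max(logits)
    exp = [math.exp(x - m) for x in logits]          # stable softmax
    probs = [e / sum(exp) for e in exp]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    routed = {i: probs[i] / norm for i in top}       # renormalized routed weights
    for i in range(shared):
        routed[i] = routed.get(i, 0.0) + 1.0         # shared experts always active
    return routed
```

In a real layer the returned weights multiply each selected expert's output before summation; fine-grained segmentation simply means each index above stands for a group of sub-experts.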

🌐 Multimodal Perception Stack

Six-modality unified perception architecture:

  • YvVisionEncoder: NaViT-style patch encoding with native resolution support (up to 2048px) and patch packing
  • YvVideoEncoder: Frame-level attention encoding with 3D RoPE spatio-temporal position encoding
  • YvAudioEncoder: Audio spectrum encoding with streaming audio processing support
  • YvDocEncoder: LayoutLMv3-style document encoding with layout-aware structural reasoning
  • YvAgenticEncoder: Agent state encoding with action space and state representation
  • YvCrossModalAttention: Cross-modal attention for deep inter-modal interaction
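NaViT-style patch packing means each image keeps its native resolution and contributes a variable number of patches to one packed sequence. A minimal sketch of the bookkeeping (the patch size and side cap are illustrative defaults, not `YvVisionEncoder`'s actual values):

```python
import math

def pack_patches(image_sizes: list[tuple[int, int]], patch: int = 14,
                 max_side: int = 2048) -> list[int]:
    """Return one image-id per packed patch.

    Each (H, W) image is capped at max_side per side and contributes
    ceil(H/patch) * ceil(W/patch) patches; the per-patch image ids let
    attention be masked so patches only attend within their own image.
    """
    ids = []
    for idx, (h, w) in enumerate(image_sizes):
        h, w = min(h, max_side), min(w, max_side)
        n = math.ceil(h / patch) * math.ceil(w / patch)
        ids.extend([idx] * n)
    return ids
```

Packing avoids resizing every image to a fixed square, so aspect ratios and fine detail survive up to the 2048px cap.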

⚛️ YvDynamicModalFusion - Dynamic Modal Fusion

Token-level multimodal fusion system:

  • Cross-Modal Attention: Cross-modal attention for inter-modal information exchange
  • Modality-Aware Position Embeddings: Modality-aware position embeddings
  • Quality-Weighted Gating: Quality-weighted gating that dynamically adjusts weights based on fusion quality
  • YvEnhancedModalFusion: Enhanced fusion module with contrastive cross-modal alignment and online adaptive weights
  • Multiple Fusion Strategies: Support for inserting fusion tokens before text sequences, concatenating 3D features, or outputting compressed summaries
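The quality-weighted gating idea can be sketched as a softmax over per-modality quality scores that weights each modality's feature vector (names and the scoring source are illustrative assumptions; the real module computes these scores from the fusion state):

```python
import math

def quality_weighted_fuse(features: list[list[float]],
                          quality_logits: list[float]) -> list[float]:
    """Weight each modality's feature vector by softmax(quality) and sum.

    `features[i]` is modality i's vector; a higher quality logit pulls
    the fused representation toward that modality.
    """
    m = max(quality_logits)
    w = [math.exp(q - m) for q in quality_logits]   # stable softmax
    w = [x / sum(w) for x in w]
    dim = len(features[0])
    return [sum(w[i] * features[i][d] for i in range(len(features)))
            for d in range(dim)]
```

Dynamically re-scoring quality each step is what lets the gate down-weight a noisy modality (e.g. corrupted audio) without retraining.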

📏 Ultra-Long Context Fabric

Industry-leading 10M+ token context support:

  • YaRN RoPE + Dynamic NTK Scaling: YaRN position encoding with dynamic NTK scaling for 10M+ token extrapolation
  • H2O Heavy-Hitter Oracle Attention: Heavy-Hitter Oracle attention that retains important tokens for ultra-long context
  • Streaming Attention: Streaming attention for infinite-length generation
  • Sliding Window Attention: Sliding window attention combining local attention with global tokens
  • Linear Attention: O(n) complexity linear attention with ELU/Performer/Softmax feature mappings
  • Paged Attention: Paged attention for efficient KV cache management and sharing
  • Ring Attention: Ring attention for distributed ultra-long context processing
  • Attention Sinks: Attention sinks ensuring streaming inference stability
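Several of these mechanisms compose at the mask level: sliding-window attention bounds the lookback, and attention sinks pin the first few tokens so streaming stays stable. A toy mask builder showing the combination (window/sink sizes are illustrative; real kernels never materialize this matrix):

```python
def stream_mask(seq_len: int, window: int, sinks: int) -> list[list[bool]]:
    """Causal mask combining a sliding window with attention sinks.

    Position i may attend to position j iff j <= i (causality) and
    j is either one of the first `sinks` tokens or within the last
    `window` positions. mask[i][j] == True means attention is allowed.
    """
    return [
        [j <= i and (j < sinks or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

With this shape the KV cache stays O(window + sinks) regardless of sequence length, which is the memory property that makes infinite-length generation practical.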

🔥 Hybrid Attention-SSM

Industry-frontier hybrid architecture implementation:

  • Mamba-3 Integration: Complete Mamba-3 SSM integration with trapezoidal discretization, complex states, and MIMO structure
  • YvSelectiveSSM: Selective State Space Model with input-dependent state transitions
  • Progressive Gating: Progressive gating for smooth transition from pure attention to hybrid mode, ensuring training stability
  • Adaptive Routing: Adaptive routing that dynamically selects attention or SSM based on sequence features
  • Jamba-style Interleaved Architecture: Jamba-style interleaved architecture with alternating attention and SSM layers

🎯 Advanced Attention Mechanisms

Complete attention mechanism implementations:

  • Flash Attention 2/3: GPU-optimized efficient attention supporting Ampere+ and Hopper+ architectures
  • Multi-Head Latent Attention (MLA): Low-rank KV compression for significantly reduced KV cache
  • Grouped Query Attention (GQA): Grouped query attention balancing quality and efficiency
  • ALiBi Position Encoding: Attention with Linear Biases position encoding without position embeddings
  • QK Normalization: Query-Key normalization for improved large model training stability

🚀 Training Envelope & Optimization

Complete training optimization suite:

  • GaLore Optimization: Low-rank gradient projection optimization with adaptive rank adjustment and multimodal module optimization
  • K-FAC Enhanced Gradient Clipping: K-FAC enhanced gradient clipping with layer coordination
  • Multi-bit Quantization (2/4/8-bit): Multi-bit quantization support for extreme memory savings
  • LoRA/QLoRA: Low-rank adaptation fine-tuning supporting all linear layers
  • Speculative Decoding: Speculative decoding for 2-3x inference acceleration
  • Multi-Token Prediction (MTP): Multi-token prediction for improved generation quality
  • Smart Gradient Accumulation: Smart gradient accumulation with adaptive memory management
  • Multi-task Learning: Multi-task learning support with adaptive task weights

Reference Configuration

Core components are located in model/ and model/multimodal/, with default hyperparameters stored in configs/model/*.json.

| Model Size | Layers | Hidden | Heads | KV Heads | MoE Experts | Top-K | Context | MLA Rank |
|-----------:|-------:|-------:|------:|---------:|------------:|------:|--------:|---------:|
| 0.5B | 16 | 640 | 10 | 5 | 6 | 2 | 256K | 256 |
| 1.5B | 16 | 896 | 14 | 7 | 6 | 2 | 256K | 256 |
| 7B | 28 | 3584 | 32 | 8 | 8 | 2 | 1M | 512 |
| 32B | 64 | 5120 | 40 | 8 | 8 | 2 | 1M | 512 |
| 64B | 80 | 6656 | 52 | 8 | 8 | 2 | 10M | 1024 |
| 70B | 80 | 8192 | 64 | 8 | 8 | 2 | 10M | 1024 |
| 128B | 120 | 10240 | 80 | 8 | 8 | 2 | 10M | 1536 |
| 314B | 160 | 12288 | 96 | 12 | 16 | 4 | 10M | 2048 |
| 671B | 200 | 16384 | 128 | 16 | 32 | 6 | 10M | 2048 |
| 1T | 240 | 20480 | 160 | 20 | 64 | 8 | 10M | 2560 |

Note: Default quantization values are inherited from the respective config files and can be overridden directly in training commands via --force_quant --quant_bits {2,4,8} and --force_lora.

# 2-bit quantization (experimental, extreme memory saving)
python manage.py train --model_size 1.5B --dataset Chinese2 --force_quant --quant_bits 2 --force_lora

# 4-bit quantization (balanced)
python manage.py train --model_size 1.5B --dataset Chinese2 --force_quant --quant_bits 4 --force_lora

# 8-bit quantization (stable)
python manage.py train --model_size 1.5B --dataset Chinese2 --force_quant --quant_bits 8 --force_lora

🛠️ Installation & Environment

  • Python: Recommended 3.11+
  • CUDA: 11.8+ (for GPU training and inference)
  • Dependencies: See requirements.txt

Quick Setup

git clone https://gitee.com/dunimd/piscesl1.git
# or
git clone https://github.com/mf2023/piscesl1.git
cd piscesl1
python manage.py setup

⚡ Quick Start

Basic Environment Setup

# 1. Clone repository
git clone https://gitee.com/dunimd/piscesl1.git
# or
git clone https://github.com/mf2023/piscesl1.git
cd piscesl1

# 2. Environment setup
python manage.py setup

# 3. Download default dataset
python manage.py download

Core Commands

All commands are invoked through:

python manage.py <command>

View help:

python manage.py help
| Command | Description |
|---|---|
| setup | Environment setup and dependency installation |
| train | Train a model (supports quantization / LoRA / RLHF / GaLore) |
| serve | Start the OpenAI-compatible backend inference service |
| test | Project health check (8-stage validation) |
| monitor | System monitoring (GPU/CPU/memory) |
| download | Download datasets |
| benchmark | Model evaluation and benchmarking |
| mcp | MCP tool management (status / warmup / refresh-cache) |
| watermark | Watermark detection (text/file/image/audio/video/model weights) |
| action | Background process management (submit/status/control) |
| dev | Developer mode for training (vim-style command interface) |
| cache | Cache management for the .pisceslx directory |
| publish | Package and publish models as Docker images |
| help | Show help information |

Quick Experience

# Train 0.5B model
python manage.py train --model_size 0.5B

# Start backend service
python manage.py serve --model_size 7B --port 8000

API Usage Examples

# Chat Completion
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "pisceslx-7b", "messages": [{"role": "user", "content": "Hello, introduce yourself"}]}'

# Streaming Response
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "pisceslx-7b", "messages": [...], "stream": true}'

# Embedding Generation
curl http://localhost:8000/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model": "pisceslx-7b", "input": "Hello world"}'
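The same OpenAI-compatible endpoints can be called from Python using only the standard library. A minimal sketch (the helper name is illustrative; model name, port, and payload fields mirror the curl examples above, and actually sending the request requires the backend started by `manage.py serve`):

```python
import json
import urllib.request

def chat_request(prompt: str, model: str = "pisceslx-7b",
                 base_url: str = "http://localhost:8000",
                 stream: bool = False):
    """Build an OpenAI-compatible chat-completion request for the
    manage.py serve backend; returns (payload, prepared Request)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )  # POST is implied because `data` is set
    return payload, req

# To send (server must be running):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```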

Common Examples

# Dataset management
python manage.py download --max_samples 50000

# Training examples
python manage.py train --model_size 0.5B --dataset Chinese2
python manage.py train --model_size 1B --dataset Chinese2 --resume_ckpt runs/last.pt --reset_lr
python manage.py train --model_size 7B --dataset Chinese2 --force_quant --quant_bits 4 --force_lora
python manage.py train --model_size 7B --dataset Chinese2 --rlhf --rlhf_dataset dunimd/human_feedback --rlhf_lr 1e-5

# Backend service
python manage.py serve --model_size 7B --port 8000
python manage.py serve --model_size 14B --host 0.0.0.0 --port 8080 --workers 4
python manage.py serve --model_size 72B

# Benchmark examples
python manage.py benchmark --list
python manage.py benchmark --info mmlu
python manage.py benchmark --benchmark mmlu --config configs/0.5B.json --seq_len 4096 --model ckpt/model.pt
python manage.py benchmark --perf --config configs/0.5B.json --selftest

# MCP tools
python manage.py mcp --mcp_action status
python manage.py mcp --mcp_action warmup
python manage.py mcp --mcp_action refresh-cache

# Watermark detection
python manage.py watermark --text "Detect text watermark"
python manage.py watermark --file document.txt
python manage.py watermark --image-file image.png
python manage.py watermark --audio-file audio.wav
python manage.py watermark --video-file video.mp4
python manage.py watermark --model-file model.pt
python manage.py watermark --weights-verify --ckpt model.pt

# Background process management
python manage.py action submit train configs/train.json
python manage.py action submit train configs/train.json --gpu_count 2 --priority high
python manage.py action submit serve configs/serve.json
python manage.py action status
python manage.py action logs <run_id>
python manage.py action control <run_id> pause
python manage.py action control <run_id> resume
python manage.py action control <run_id> stop
python manage.py action list
python manage.py action list --running

# GPU resource management
python manage.py action gpu list
python manage.py action gpu status
python manage.py action gpu status --gpu_id 0
python manage.py action gpu release --task_id <run_id>

# Task queue management
python manage.py action queue list
python manage.py action queue stats
python manage.py action queue clear --priority low

# System resources
python manage.py action resources status
python manage.py action resources utilization

# Task recovery
python manage.py action recover <run_id>
python manage.py action recover <run_id> --checkpoint runs/<run_id>/ckpt.pt

# Developer mode (vim-style command interface for training)
python manage.py dev enable    # Enable developer mode
python manage.py dev disable   # Disable developer mode
python manage.py dev status    # Check developer mode status

# Available commands during training:
#   Memory: :mem, :mem-gpu, :mem-cpu
#   Model: :layer, :layers, :grad, :grad-norm
#   Training: :pause, :resume, :save, :lr, :batch
#   Config: :config, :config-model, :config-data
#   Monitoring: :watch, :watch-clear, :profile
#   Intervention: :inject, :freeze, :unfreeze, :nan-check
#   Other: :help, :q

# Cache management for .pisceslx directory
python manage.py cache         # Show cache status
python manage.py cache clean   # Clean all cache (settings/ protected)

# Publish models as Docker images with inference engine
python manage.py publish --publish_action full --publish_model_size 7B --publish_registry docker.io
python manage.py publish --publish_action full --publish_model_size 7B --publish_model_path ./ckpt/7B.pt
python manage.py publish --publish_action export --publish_model_size 7B --publish_output_dir ./export/
python manage.py publish --publish_action build --publish_model_size 7B --publish_template gpu
python manage.py publish --publish_action push --publish_registry ghcr.io --publish_registry_namespace myuser
python manage.py publish --publish_action validate --publish_model_size 7B
python manage.py publish --publish_action info --publish_model_size 7B
python manage.py publish --publish_action list

📦 Dataset

Datasets are configured in configs/dataset.yaml and downloaded with:

python manage.py download

  • Default download priority: ModelScope → HuggingFace (automatic mirror fallback when one source is inaccessible)

  • See configs/dataset.yaml for the complete list


❓ Frequently Asked Questions (FAQ)

  • How do I view available commands? Run python manage.py help.
  • How do I add a new dataset? Edit configs/dataset.yaml and run python manage.py download. For custom datasets, JSONL (text) or Parquet (input_ids/labels) is recommended.
  • Insufficient GPU memory? Use a smaller model, reduce the sequence length, or enable 4-bit quantization (--force_quant --quant_bits 4, usually together with --force_lora).
  • How do I resume training? Pass --resume_ckpt path/to/ckpt.pt (optionally with --reset_lr).
  • CPU only? Use --device cpu (significantly slower).
  • How do I run evaluation? python manage.py benchmark ..., with --config, --seq_len, --model, and other parameters.

🌏 Community & Citation


📚 Academic References

This project implements algorithms from the following academic papers. We sincerely thank the authors for their contributions.

Attention Mechanisms

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| ALiBi | Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | Press et al. | ICLR | 2022 | attention.py |
| Attention Sink | Efficient Streaming Language Models with Attention Sinks | Xiao et al. | ICLR | 2024 | attention.py |
| QK Normalization | Query-Key Normalization for Transformers | Henry et al. | ICLR | 2020 | attention.py |
| Linear Attention | Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention | Katharopoulos et al. | ICML | 2020 | attention.py |
| S4 | Efficiently Modeling Long Sequences with Structured State Spaces | Gu et al. | ICLR | 2022 | attention.py |
| Longformer | Longformer: The Long-Document Transformer | Beltagy et al. | - | 2020 | attention.py |
| BigBird | Big Bird: Transformers for Longer Sequences | Zaheer et al. | NeurIPS | 2020 | attention.py |
| Ring Attention | Ring Attention with Blockwise Transformers for Near-Infinite Context | Liu et al. | ICLR | 2024 | attention.py |
| MQA | Fast Transformer Decoding: One Write-Head is All You Need | Shazeer | - | 2019 | attention.py |
| H2O | H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | Zhang et al. | ICLR | 2024 | attention.py |
| LongRoPE | LongRoPE: Extending LLM Context Window Beyond 2M Tokens | Ding et al. | ICML | 2024 | attention.py |
| PagedAttention | Efficient Memory Management for Large Language Model Serving with PagedAttention | Kwon et al. | SOSP | 2023 | attention.py |
| Flash Attention | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Dao et al. | NeurIPS | 2022 | attention.py |
| Flash Attention 2 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Dao | - | 2023 | attention.py |
| Flash Attention 3 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Blockwise Parallelism | Dao et al. | - | 2024 | flash_attention.py |
| CoPE | Context-aware Position Encoding for Better Length Extrapolation | Yang et al. | arXiv | 2024 | attention.py |

Position Encoding

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| Sinusoidal PE | Attention Is All You Need | Vaswani et al. | NeurIPS | 2017 | embedding.py |
| RoPE | RoFormer: Enhanced Transformer with Rotary Position Embedding | Su et al. | - | 2021 | norms.py |
| YaRN | YaRN: Efficient Context Window Extension of Large Language Models | Peng et al. | - | 2023 | norms.py |

Normalization & Activation

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| RMSNorm | Root Mean Square Layer Normalization | Zhang & Sennrich | NeurIPS | 2019 | norms.py |
| Adaptive LayerNorm | Scalable Diffusion Models with Transformers (DiT) | Peebles & Xie | ICCV | 2023 | norms.py |
| LayerScale | Going deeper with Image Transformers | Touvron et al. | ICCV | 2021 | blocks.py |
| SwiGLU | GLU Variants Improve Transformer | Shazeer | - | 2020 | blocks.py |
| GeGLU | GLU Variants Improve Transformer | Shazeer | - | 2020 | blocks.py |
| Group Normalization | Group Normalization | Wu & He | ECCV | 2018 | norms.py |

State Space Models

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| Mamba | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Gu & Dao | arXiv | 2023 | blocks.py |
| Mamba-2 | Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | Dao & Gu | ICML | 2024 | blocks.py |

Mixture of Experts

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| UltraMem TDQKR | UltraMem | ByteDance | ICLR | 2025 | layer.py |
| DeepSeekMoE | DeepSeek-V3 Technical Report | DeepSeek Team | - | 2024 | expert.py, layer.py |

Inference Optimization

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| Speculative Decoding | Fast Inference from Transformers via Speculative Decoding | Leviathan et al. | ICML | 2023 | cache.py |
| BLIP-2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Li et al. | ICML | 2023 | cache.py |

Training Optimization

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| K-FAC | Optimizing Neural Networks with Kronecker-factored Approximate Curvature | Martens & Grosse | ICML | 2015 | kfac.py |
| K-FAC for Conv | A Kronecker-factored Approximate Fisher Matrix for Convolution Layers | Grosse & Martens | ICML | 2016 | kfac.py |
| Multi-Task Uncertainty | Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics | Kendall et al. | CVPR | 2018 | multitask_uncertainty.py |
| SGDR | SGDR: Stochastic Gradient Descent with Warm Restarts | Loshchilov & Hutter | arXiv | 2016 | modality_scheduler.py |
| Chinchilla Scaling | Training Compute-Optimal Large Language Models | Hoffmann et al. | - | 2022 | scaling/init.py |

Alignment & Reinforcement Learning

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| DPO | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Rafailov et al. | NeurIPS | 2023 | dpo.py |
| GRPO | DeepSeek R1 Technical Report | DeepSeek Team | arXiv | 2024 | grpo.py |
| RLVR | DeepSeek R1 Technical Report / OpenAI o1 | DeepSeek / OpenAI | arXiv | 2024/2025 | rlvr.py |
| TPO | Test-Time Preference Optimization: On-the-fly Alignment via Iterative Textual Feedback | - | arXiv | 2025 | tpo.py |

Reasoning & Agentic

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| ReAct | ReAct: Synergizing Reasoning and Acting in Language Models | Yao et al. | ICLR | 2023 | react_agentic.py |
| Chain-of-Thought | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Wei et al. | NeurIPS | 2022 | react_agentic.py |

Optimizers

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| GaLore | GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | Zhao et al. | arXiv | 2024 | galore.py |
| ROOT | ROOT: Robust Orthogonalized Optimizer for Neural Network Training | Huawei Noah's Ark Lab | arXiv | 2024 | root.py |
| FP4 Training | Optimizing Large Language Model Training Using FP4 Quantization | - | arXiv | 2025 | fp4.py |

Quantization

| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| GPTQ | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Frantar et al. | ICLR | 2023 | orchestrator.py |

Citation

If you use this project in your research, please cite:

@misc{piscesl1,
  author = {Wenze Wei and Dunimd Team},
  title = {PiscesL1: A High-Performance Multimodal Mixture-of-Experts Model},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/mf2023/piscesl1}
}

📄 License & Open Source Agreements

🏛️ Project License

Apache License 2.0

This project is released under the Apache License 2.0; see the LICENSE file.

📋 Dependency Package Open Source Agreements

Open source packages used by this project and their license information:

| 📦 Package | 📜 License | 📦 Package | 📜 License |
|---|---|---|---|
| torch | BSD-style | torchvision | BSD-style |
| torchaudio | BSD-style | torch-directml | MIT |
| transformers | Apache 2.0 | tokenizers | Apache 2.0 |
| huggingface-hub | Apache 2.0 | modelscope | Apache 2.0 |
| numpy | BSD 3-Clause | scipy | BSD 3-Clause |
| scikit-learn | BSD 3-Clause | addict | MIT |
| accelerate | Apache 2.0 | einops | MIT |
| timm | Apache 2.0 | pytorch-lightning | Apache 2.0 |
| pillow | HPND | PyMuPDF | AGPL 3.0 |
| python-docx | MIT | python-pptx | MIT |
| pdfplumber | MIT | pdf2image | MIT |
| ocrmypdf | MPL 2.0 | bitsandbytes | MIT |
| peft | Apache 2.0 | wheel | MIT |
| xformers | BSD 3-Clause | trl | Apache 2.0 |
| nvidia-ml-py3 | BSD 3-Clause | fastapi | MIT |
| uvicorn | BSD 3-Clause | python-multipart | Apache 2.0 |
| pydantic | MIT | httpx | BSD 3-Clause |
| pandas | BSD 3-Clause | gradio | Apache 2.0 |
| ijson | BSD 3-Clause | pyarrow | Apache 2.0 |
| tqdm | MIT | jsonlines | MIT |
| windows-curses | BSD 3-Clause | psutil | BSD 3-Clause |
| streamlit | Apache 2.0 | PyYAML | MIT |
| GitPython | BSD 3-Clause | opencv-python | MIT |
| av | BSD 3-Clause | decord | Apache 2.0 |
| imageio | BSD 3-Clause | imageio-ffmpeg | BSD 3-Clause |
| openai | Apache 2.0 | requests | Apache 2.0 |
| beautifulsoup4 | MIT | pytz | MIT |
| pywin32 | PSF | duckduckgo-search | MIT |
| plotly | MIT | evalscope | Apache 2.0 |
| safetensors | Apache 2.0 | deepspeed | Apache 2.0 |
| aiofiles | Apache 2.0 | pathlib2 | MIT |
| textual | MIT | dmsc | Apache 2.0 |
| datasets | Apache 2.0 | rich | MIT |
| omegaconf | BSD 3-Clause | hydra-core | MIT |
| wandb | MIT | tensorboard | Apache 2.0 |
| mlflow | Apache 2.0 | lm-eval | MIT |
| rouge-score | Apache 2.0 | sacrebleu | Apache 2.0 |
| bert-score | MIT | librosa | ISC |
| soundfile | BSD 3-Clause | audioread | MIT |
| pydub | MIT | flash-attn | BSD 3-Clause |
| triton | MIT | mamba-ssm | Apache 2.0 |
| causal-conv1d | Apache 2.0 | docker | Apache 2.0 |

