Compliance with AI regulations is the user's legal obligation.
Under applicable laws and regulations (including but not limited to China's Interim Measures for the Administration of Generative Artificial Intelligence Services, the EU AI Act, and the US NIST AI Risk Management Framework), users are responsible for fulfilling their compliance obligations. Non-compliant use may result in service termination, administrative penalties, or legal liability; users assume all related risks.
This project is licensed under Apache 2.0, permitting commercial use.
English | 简体中文
Security | Contributing | Code of Conduct
A high-performance multimodal Mixture-of-Experts (MoE) model featuring the Yv Architecture, supporting text, image, audio, video, document, and agent understanding. PiscesL1 (part of the PiscesLx series, by the Dunimd Team) is designed for both research and practical applications: it can run on a single RTX 4090 GPU, and its architecture scales up to 1T parameters.
YvUnifiedReasoner implements an intelligent routing framework that dynamically switches between Chain-of-Thought (CoT) and Multi-Path reasoning engines:
- YvCoTMemoryReasoner: Memory-augmented chain-of-thought reasoner with adaptive depth control (1-3 layers), early stopping mechanism, and error analysis with self-correction
- YvMultiPathReasoningEngine: Multi-path reasoning engine supporting up to 8 parallel hypothesis streams with dynamic fact verification and metacognitive uncertainty scoring
- Intelligent Routing: Automatic selection of optimal reasoning path based on problem complexity and sequence length
- Control Tokens: `<|start_hypothesis|>`, `<|start_evidence|>`, `<|start_conclusion|>`, `<|hypothesis_split|>`, and `<|hypothesis_merge|>` enable external tools to precisely track the model's thinking path (a parsing sketch follows this list)
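For illustration, here is a minimal sketch of how an external tool might parse these control tokens from decoded output. The regex and the `trace_reasoning` helper are hypothetical, not part of the project's API:

```python
import re

# Hypothetical parser: split decoded output into (span_type, content) pairs
# using the <|start_...|> control tokens. Handling of <|hypothesis_split|>
# and <|hypothesis_merge|> is omitted for brevity.
CONTROL = re.compile(
    r"<\|start_(hypothesis|evidence|conclusion)\|>(.*?)(?=<\|start_|$)",
    re.DOTALL,
)

def trace_reasoning(decoded: str) -> list[tuple[str, str]]:
    return [(m.group(1), m.group(2).strip()) for m in CONTROL.finditer(decoded)]

sample = ("<|start_hypothesis|>x is even<|start_evidence|>x = 2k"
          "<|start_conclusion|>x mod 2 == 0")
print(trace_reasoning(sample))
# [('hypothesis', 'x is even'), ('evidence', 'x = 2k'), ('conclusion', 'x mod 2 == 0')]
```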
Mixture-of-Experts implementation:
- YvStableMoEGate: Stable gating with an LSTM load predictor, supporting Top-K routing for 6-64 experts (a minimal routing sketch follows this list)
- Fine-grained Expert Segmentation: Each "expert" is a combination of multiple sub-experts for more flexible routing
- Shared Expert Isolation: Shared experts that are always activated to process all tokens
- Auxiliary Loss-free Load Balancing: Load balancing without traditional auxiliary losses that affect model quality
- UltraMem TDQKR Optimization: Tucker Decomposed Query-Key Retrieval optimization, reducing routing complexity from O(N) to O(√N)
- Dynamic Device Migration: Dynamic expert migration for efficient memory management of large expert pools
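A minimal sketch of the Top-K routing and shared-expert isolation described above. This is illustrative only; the real `YvStableMoEGate` adds the LSTM load predictor and auxiliary-loss-free balancing, both omitted here:

```python
import torch
import torch.nn as nn

class TopKMoESketch(nn.Module):
    """Illustrative Top-K router with an always-on shared expert."""

    def __init__(self, dim: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.shared_expert = nn.Linear(dim, dim)  # shared-expert isolation
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = self.shared_expert(x)  # always processes every token
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] = out[mask] + weights[mask, k, None] * expert(x[mask])
        return out

y = TopKMoESketch(dim=64)(torch.randn(10, 64))  # each token routed to 2 of 8 experts
```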
Six-modality unified perception architecture:
- YvVisionEncoder: NaViT-style patch encoding with native resolution support (up to 2048px) and patch packing
- YvVideoEncoder: Frame-level attention encoding with 3D RoPE spatio-temporal position encoding
- YvAudioEncoder: Audio spectrum encoding with streaming audio processing support
- YvDocEncoder: LayoutLMv3-style document encoding with layout-aware structural reasoning
- YvAgenticEncoder: Agent state encoding with action space and state representation
- YvCrossModalAttention: Cross-modal attention for deep inter-modal interaction (a minimal sketch follows this list)
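A rough sketch of the cross-modal attention pattern, using PyTorch's stock `nn.MultiheadAttention` as a stand-in; `YvCrossModalAttention`'s actual interface is defined in model/multimodal/:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

text = torch.randn(1, 32, 512)    # 32 text token embeddings
image = torch.randn(1, 196, 512)  # 196 image patch embeddings

# Text tokens query the image patches: each text position gathers visual context.
fused, attn_weights = attn(query=text, key=image, value=image)
print(fused.shape, attn_weights.shape)  # (1, 32, 512) (1, 32, 196)
```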
Token-level multimodal fusion system:
- Cross-Modal Attention: token-level information exchange across modalities
- Modality-Aware Position Embeddings: position embeddings that encode each token's source modality
- Quality-Weighted Gating: dynamically adjusts per-modality weights based on fusion quality (sketched after this list)
- YvEnhancedModalFusion: Enhanced fusion module with contrastive cross-modal alignment and online adaptive weights
- Multiple Fusion Strategies: Support for inserting fusion tokens before text sequences, concatenating 3D features, or outputting compressed summaries
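A hedged sketch of the quality-weighted gating idea; the scoring head and its placement are assumptions for illustration, not the project's implementation:

```python
import torch
import torch.nn as nn

class QualityGateSketch(nn.Module):
    """Predict a scalar quality score per modality, then mix features
    with softmax-normalized weights (illustrative only)."""

    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        stacked = torch.stack(feats, dim=1)            # (batch, n_modalities, dim)
        weights = self.scorer(stacked).softmax(dim=1)  # (batch, n_modalities, 1)
        return (weights * stacked).sum(dim=1)          # quality-weighted fusion

gate = QualityGateSketch(dim=512)
fused = gate([torch.randn(2, 512), torch.randn(2, 512), torch.randn(2, 512)])
```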
Industry-leading 10M+ token context support:
- YaRN RoPE + Dynamic NTK Scaling: YaRN position encoding with dynamic NTK scaling for 10M+ token extrapolation
- H2O Heavy-Hitter Oracle Attention: Heavy-Hitter Oracle attention that retains important tokens for ultra-long context
- Streaming Attention: Streaming attention for infinite-length generation
- Sliding Window Attention: Sliding window attention combining local attention with global tokens
- Linear Attention: O(n) complexity linear attention with ELU/Performer/Softmax feature mappings
- Paged Attention: Paged attention for efficient KV cache management and sharing
- Ring Attention: Ring attention for distributed ultra-long context processing
- Attention Sinks: Attention sinks ensuring stable streaming inference (a cache-eviction sketch follows this list)
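To make the attention-sink policy concrete, here is a minimal StreamingLLM-style cache-eviction sketch: keep the first few "sink" tokens plus a recent window. The cache layout and function name are assumptions:

```python
import torch

def evict_kv(keys: torch.Tensor, values: torch.Tensor,
             n_sink: int = 4, window: int = 1024):
    """Keep the first n_sink tokens (attention sinks) plus the most recent
    `window` tokens; assumes a (batch, seq, heads, head_dim) layout."""
    seq_len = keys.size(1)
    if seq_len <= n_sink + window:
        return keys, values
    keep = torch.cat([torch.arange(n_sink),
                      torch.arange(seq_len - window, seq_len)])
    return keys[:, keep], values[:, keep]

k = torch.randn(1, 5000, 8, 64)
v = torch.randn(1, 5000, 8, 64)
k, v = evict_kv(k, v)  # cache stays at 4 + 1024 entries during streaming
```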
Industry-frontier hybrid architecture implementation:
- Mamba-3 Integration: Complete Mamba-3 SSM integration with trapezoidal discretization, complex states, and MIMO structure
- YvSelectiveSSM: Selective State Space Model with input-dependent state transitions
- Progressive Gating: Progressive gating for a smooth transition from pure attention to hybrid mode, ensuring training stability (sketched after this list)
- Adaptive Routing: Adaptive routing that dynamically selects attention or SSM based on sequence features
- Jamba-style Interleaved Architecture: Jamba-style interleaved architecture with alternating attention and SSM layers
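A minimal sketch of progressive gating between the two branches. The linear stand-ins and the gate initialization are assumptions; the real attention and SSM blocks are far richer:

```python
import torch
import torch.nn as nn

class ProgressiveHybridSketch(nn.Module):
    """A learnable scalar gate blends an attention branch with an SSM branch.
    Initialized near 0 so training starts as (almost) pure attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.attn_branch = nn.Linear(dim, dim)  # stand-in for an attention block
        self.ssm_branch = nn.Linear(dim, dim)   # stand-in for a selective SSM block
        self.gate_logit = nn.Parameter(torch.tensor(-4.0))  # sigmoid(-4) ~ 0.018

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate_logit)
        return (1 - g) * self.attn_branch(x) + g * self.ssm_branch(x)
```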
Complete attention mechanism implementations:
- Flash Attention 2/3: GPU-optimized efficient attention supporting Ampere+ and Hopper+ architectures
- Multi-Head Latent Attention (MLA): Low-rank KV compression for significantly reduced KV cache
- Grouped Query Attention (GQA): Grouped query attention balancing quality and efficiency
- ALiBi Position Encoding: Attention with Linear Biases position encoding without position embeddings
- QK Normalization: Query-Key normalization for improved training stability in large models (combined with GQA in the sketch after this list)
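A compact sketch combining GQA's KV-head sharing with QK normalization. Shapes follow the 7B row of the model table below (32 query heads over 8 KV heads); the function is illustrative, not the project's kernel:

```python
import torch
import torch.nn.functional as F

def gqa_qknorm_attention(q, k, v, n_groups: int):
    """q: (B, n_heads, S, d); k, v: (B, n_kv_heads, S, d) with
    n_heads = n_kv_heads * n_groups. Illustrative sketch only."""
    q = F.normalize(q, dim=-1)  # QK normalization stabilizes logit scale
    k = F.normalize(k, dim=-1)
    k = k.repeat_interleave(n_groups, dim=1)  # each KV head serves n_groups query heads
    v = v.repeat_interleave(n_groups, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

q = torch.randn(1, 32, 128, 64)  # 32 query heads (7B config)
k = torch.randn(1, 8, 128, 64)   # 8 KV heads
out = gqa_qknorm_attention(q, k, torch.randn(1, 8, 128, 64), n_groups=4)
```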
Complete training optimization suite:
- GaLore Optimization: Low-rank gradient projection optimization with adaptive rank adjustment and multimodal module optimization (a projection sketch follows this list)
- K-FAC Enhanced Gradient Clipping: K-FAC enhanced gradient clipping with layer coordination
- Multi-bit Quantization (2/4/8-bit): Multi-bit quantization support for extreme memory savings
- LoRA/QLoRA: Low-rank adaptation fine-tuning supporting all linear layers
- Speculative Decoding: Speculative decoding for 2-3x inference acceleration
- Multi-Token Prediction (MTP): Multi-token prediction for improved generation quality
- Smart Gradient Accumulation: Smart gradient accumulation with adaptive memory management
- Multi-task Learning: Multi-task learning support with adaptive task weights
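To show the GaLore idea concretely, here is a minimal low-rank gradient projection sketch. The periodic subspace refresh and optimizer-state plumbing of the real method are omitted, and the function names are illustrative:

```python
import torch

def galore_project(grad: torch.Tensor, rank: int = 64):
    """Project a 2-D gradient onto its top-`rank` left singular subspace,
    so the optimizer state lives at the reduced size."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]             # (out_dim, rank) projector
    return P, P.T @ grad        # low-rank gradient: (rank, in_dim)

def galore_unproject(P: torch.Tensor, update: torch.Tensor) -> torch.Tensor:
    return P @ update           # back to the full parameter shape

g = torch.randn(1024, 1024)
P, g_lr = galore_project(g, rank=64)  # 1024x64 + 64x1024 instead of 1024x1024
full_update = galore_unproject(P, g_lr)
```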
Core components are located in model/ and model/multimodal/, with default hyperparameters stored in configs/model/*.json.
| Model Size | Layers | Hidden | Heads | KV Heads | MoE Experts | Top-K | Context | MLA Rank |
|---|---|---|---|---|---|---|---|---|
| 0.5B | 16 | 640 | 10 | 5 | 6 | 2 | 256K | 256 |
| 1.5B | 16 | 896 | 14 | 7 | 6 | 2 | 256K | 256 |
| 7B | 28 | 3584 | 32 | 8 | 8 | 2 | 1M | 512 |
| 32B | 64 | 5120 | 40 | 8 | 8 | 2 | 1M | 512 |
| 64B | 80 | 6656 | 52 | 8 | 8 | 2 | 10M | 1024 |
| 70B | 80 | 8192 | 64 | 8 | 8 | 2 | 10M | 1024 |
| 128B | 120 | 10240 | 80 | 8 | 8 | 2 | 10M | 1536 |
| 314B | 160 | 12288 | 96 | 12 | 16 | 4 | 10M | 2048 |
| 671B | 200 | 16384 | 128 | 16 | 32 | 6 | 10M | 2048 |
| 1T | 240 | 20480 | 160 | 20 | 64 | 8 | 10M | 2560 |
Note: Default quantization values are inherited from the respective config files and can be overridden directly in training commands via `--force_quant --quant_bits {2,4,8}` and `--force_lora`.
# 2-bit quantization (experimental, extreme memory saving)
python manage.py train --model_size 1.5B --dataset Chinese2 --force_quant --quant_bits 2 --force_lora
# 4-bit quantization (balanced)
python manage.py train --model_size 1.5B --dataset Chinese2 --force_quant --quant_bits 4 --force_lora
# 8-bit quantization (stable)
python manage.py train --model_size 1.5B --dataset Chinese2 --force_quant --quant_bits 8 --force_lora
- Python: Recommended 3.11+
- CUDA: 11.8+ (for GPU training and inference)
- Dependencies: see `requirements.txt`
git clone https://gitee.com/dunimd/piscesl1.git
# or
git clone https://github.com/mf2023/piscesl1.git
cd piscesl1
python manage.py setup
# 1. Clone repository
git clone https://gitee.com/dunimd/piscesl1.git
# or
git clone https://github.com/mf2023/piscesl1.git
cd piscesl1
# 2. Environment setup
python manage.py setup
# 3. Download default dataset
python manage.py download
All commands are run through:
python manage.py <command>
View help:
python manage.py help
| Command | Description |
|---|---|
| setup | Environment setup and dependency installation |
| train | Train a model (supports quantization / LoRA / RLHF / GaLore) |
| serve | Start OpenAI-compatible backend inference service |
| test | Project health check (8-stage validation) |
| monitor | System monitoring (GPU/CPU/memory) |
| download | Download dataset |
| benchmark | Model evaluation and benchmarking |
| mcp | MCP tool management (status / warmup / refresh-cache) |
| watermark | Watermark detection (text/file/image/audio/video/model weights) |
| action | Background process management (submit/status/control) |
| dev | Developer mode for training (vim-style command interface) |
| cache | Cache management for .pisceslx directory |
| publish | Package and publish models as Docker images |
| help | Show help information |
# Train 0.5B model
python manage.py train --model_size 0.5B
# Start backend service
python manage.py serve --model_size 7B --port 8000
# Chat Completion
curl http://localhost:8000/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "pisceslx-7b", "messages": [{"role": "user", "content": "Hello, introduce yourself"}]}'
# Streaming Response
curl http://localhost:8000/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "pisceslx-7b", "messages": [...], "stream": true}'
# Embedding Generation
curl http://localhost:8000/v1/embeddings \
-H 'Content-Type: application/json' \
-d '{"model": "pisceslx-7b", "input": "Hello world"}'# Dataset management
# Dataset management
python manage.py download --max_samples 50000
# Training examples
python manage.py train --model_size 0.5B --dataset Chinese2
python manage.py train --model_size 1B --dataset Chinese2 --resume_ckpt runs/last.pt --reset_lr
python manage.py train --model_size 7B --dataset Chinese2 --force_quant --quant_bits 4 --force_lora
python manage.py train --model_size 7B --dataset Chinese2 --rlhf --rlhf_dataset dunimd/human_feedback --rlhf_lr 1e-5
# Backend service
python manage.py serve --model_size 7B --port 8000
python manage.py serve --model_size 14B --host 0.0.0.0 --port 8080 --workers 4
python manage.py serve --model_size 72B
# Benchmark examples
python manage.py benchmark --list
python manage.py benchmark --info mmlu
python manage.py benchmark --benchmark mmlu --config configs/0.5B.json --seq_len 4096 --model ckpt/model.pt
python manage.py benchmark --perf --config configs/0.5B.json --selftest
# MCP tools
python manage.py mcp --mcp_action status
python manage.py mcp --mcp_action warmup
python manage.py mcp --mcp_action refresh-cache
# Watermark detection
python manage.py watermark --text "Detect text watermark"
python manage.py watermark --file document.txt
python manage.py watermark --image-file image.png
python manage.py watermark --audio-file audio.wav
python manage.py watermark --video-file video.mp4
python manage.py watermark --model-file model.pt
python manage.py watermark --weights-verify --ckpt model.pt
# Background process management
python manage.py action submit train configs/train.json
python manage.py action submit train configs/train.json --gpu_count 2 --priority high
python manage.py action submit serve configs/serve.json
python manage.py action status
python manage.py action logs <run_id>
python manage.py action control <run_id> pause
python manage.py action control <run_id> resume
python manage.py action control <run_id> stop
python manage.py action list
python manage.py action list --running
# GPU resource management
python manage.py action gpu list
python manage.py action gpu status
python manage.py action gpu status --gpu_id 0
python manage.py action gpu release --task_id <run_id>
# Task queue management
python manage.py action queue list
python manage.py action queue stats
python manage.py action queue clear --priority low
# System resources
python manage.py action resources status
python manage.py action resources utilization
# Task recovery
python manage.py action recover <run_id>
python manage.py action recover <run_id> --checkpoint runs/<run_id>/ckpt.pt
# Developer mode (vim-style command interface for training)
python manage.py dev enable # Enable developer mode
python manage.py dev disable # Disable developer mode
python manage.py dev status # Check developer mode status
# Available commands during training:
# Memory: :mem, :mem-gpu, :mem-cpu
# Model: :layer, :layers, :grad, :grad-norm
# Training: :pause, :resume, :save, :lr, :batch
# Config: :config, :config-model, :config-data
# Monitoring: :watch, :watch-clear, :profile
# Intervention: :inject, :freeze, :unfreeze, :nan-check
# Other: :help, :q
# Cache management for .pisceslx directory
python manage.py cache # Show cache status
python manage.py cache clean # Clean all cache (settings/ protected)
# Publish models as Docker images with inference engine
python manage.py publish --publish_action full --publish_model_size 7B --publish_registry docker.io
python manage.py publish --publish_action full --publish_model_size 7B --publish_model_path ./ckpt/7B.pt
python manage.py publish --publish_action export --publish_model_size 7B --publish_output_dir ./export/
python manage.py publish --publish_action build --publish_model_size 7B --publish_template gpu
python manage.py publish --publish_action push --publish_registry ghcr.io --publish_registry_namespace myuser
python manage.py publish --publish_action validate --publish_model_size 7B
python manage.py publish --publish_action info --publish_model_size 7B
python manage.py publish --publish_action list
Datasets are configured in `configs/dataset.yaml` and downloaded with:
python manage.py download
- Default download priority: ModelScope → HuggingFace (automatic fallback when one source is inaccessible)
- For the complete list, see `configs/dataset.yaml`
- How do I view available commands? Run `python manage.py help`.
- How do I add a new dataset? Edit `configs/dataset.yaml` and run `python manage.py download`. For custom datasets, JSONL (text) or Parquet (input_ids/labels) is recommended.
- Insufficient GPU memory? Use a smaller model, reduce the sequence length, or enable 4-bit quantization (`--force_quant --quant_bits 4`, usually together with `--force_lora`).
- How do I resume training? Pass `--resume_ckpt path/to/ckpt.pt` (optionally with `--reset_lr`).
- CPU only? Use `--device cpu` (slower performance).
- How do I run an evaluation? `python manage.py benchmark ...` with `--config`, `--seq_len`, `--model`, and other parameters.
- Issues and PRs are welcome!
- Gitee: https://gitee.com/dunimd/piscesl1.git
- GitHub: https://github.com/mf2023/piscesl1.git
- ModelScope: https://www.modelscope.cn/models/mfchina2024/PiscesL1
This project implements algorithms from the following academic papers. We sincerely thank the authors for their contributions.
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| ALiBi | Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | Press et al. | ICLR | 2022 | attention.py |
| Attention Sink | Efficient Streaming Language Models with Attention Sinks | Xiao et al. | ICLR | 2024 | attention.py |
| QK Normalization | Query-Key Normalization for Transformers | Henry et al. | EMNLP (Findings) | 2020 | attention.py |
| Linear Attention | Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention | Katharopoulos et al. | ICML | 2020 | attention.py |
| S4 | Efficiently Modeling Long Sequences with Structured State Spaces | Gu et al. | ICLR | 2022 | attention.py |
| Longformer | Longformer: The Long-Document Transformer | Beltagy et al. | - | 2020 | attention.py |
| BigBird | Big Bird: Transformers for Longer Sequences | Zaheer et al. | NeurIPS | 2020 | attention.py |
| Ring Attention | Ring Attention with Blockwise Transformers for Near-Infinite Context | Liu et al. | ICLR | 2024 | attention.py |
| MQA | Fast Transformer Decoding: One Write-Head is All You Need | Shazeer | - | 2019 | attention.py |
| H2O | H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | Zhang et al. | ICLR | 2024 | attention.py |
| LongRoPE | LongRoPE: Extending LLM Context Window Beyond 2M Tokens | Ding et al. | ICML | 2024 | attention.py |
| PagedAttention | Efficient Memory Management for Large Language Model Serving with PagedAttention | Kwon et al. | SOSP | 2023 | attention.py |
| Flash Attention | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Dao et al. | NeurIPS | 2022 | attention.py |
| Flash Attention 2 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Dao | - | 2023 | attention.py |
| Flash Attention 3 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Blockwise Parallelism | Dao et al. | - | 2024 | flash_attention.py |
| CoPE | Context-aware Position Encoding for Better Length Extrapolation | Yang et al. | arXiv | 2024 | attention.py |
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| Sinusoidal PE | Attention Is All You Need | Vaswani et al. | NeurIPS | 2017 | embedding.py |
| RoPE | RoFormer: Enhanced Transformer with Rotary Position Embedding | Su et al. | - | 2021 | norms.py |
| YaRN | YaRN: Efficient Context Window Extension of Large Language Models | Peng et al. | - | 2023 | norms.py |
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| RMSNorm | Root Mean Square Layer Normalization | Zhang & Sennrich | NeurIPS | 2019 | norms.py |
| Adaptive LayerNorm | Scalable Diffusion Models with Transformers (DiT) | Peebles & Xie | ICCV | 2023 | norms.py |
| LayerScale | Going deeper with Image Transformers | Touvron et al. | ICCV | 2021 | blocks.py |
| SwiGLU | GLU Variants Improve Transformer | Shazeer | - | 2020 | blocks.py |
| GeGLU | GLU Variants Improve Transformer | Shazeer | - | 2020 | blocks.py |
| Group Normalization | Group Normalization | Wu & He | ECCV | 2018 | norms.py |
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| Mamba | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Gu & Dao | arXiv | 2023 | blocks.py |
| Mamba-2 | Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | Dao & Gu | ICML | 2024 | blocks.py |
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| UltraMem TDQKR | Ultra-Sparse Memory Network | ByteDance | ICLR | 2025 | layer.py |
| DeepSeekMoE | DeepSeek-V3 Technical Report | DeepSeek Team | - | 2024 | expert.py, layer.py |
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| Speculative Decoding | Fast Inference from Transformers via Speculative Decoding | Leviathan et al. | ICML | 2023 | cache.py |
| BLIP-2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Li et al. | ICML | 2023 | cache.py |
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| K-FAC | Optimizing Neural Networks with Kronecker-factored Approximate Curvature | Martens & Grosse | ICML | 2015 | kfac.py |
| K-FAC for Conv | A Kronecker-factored Approximate Fisher Matrix for Convolution Layers | Grosse & Martens | ICML | 2016 | kfac.py |
| Multi-Task Uncertainty | Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics | Kendall et al. | CVPR | 2018 | multitask_uncertainty.py |
| SGDR | SGDR: Stochastic Gradient Descent with Warm Restarts | Loshchilov & Hutter | arXiv | 2016 | modality_scheduler.py |
| Chinchilla Scaling | Training Compute-Optimal Large Language Models | Hoffmann et al. | NeurIPS | 2022 | scaling/__init__.py |
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| DPO | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Rafailov et al. | NeurIPS | 2023 | dpo.py |
| GRPO | DeepSeek R1 Technical Report | DeepSeek Team | arXiv | 2024 | grpo.py |
| RLVR | DeepSeek R1 Technical Report / OpenAI o1 | DeepSeek / OpenAI | arXiv | 2024/2025 | rlvr.py |
| TPO | Test-Time Preference Optimization: On-the-fly Alignment via Iterative Textual Feedback | - | arXiv | 2025 | tpo.py |
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| ReAct | ReAct: Synergizing Reasoning and Acting in Language Models | Yao et al. | ICLR | 2023 | react_agentic.py |
| Chain-of-Thought | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Wei et al. | NeurIPS | 2022 | react_agentic.py |
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| GaLore | GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | Zhao et al. | arXiv | 2024 | galore.py |
| ROOT | ROOT: Robust Orthogonalized Optimizer for Neural Network Training | Huawei Noah's Ark Lab | arXiv | 2024 | root.py |
| FP4 Training | Optimizing Large Language Model Training Using FP4 Quantization | - | arXiv | 2025 | fp4.py |
| Algorithm | Paper | Authors | Venue | Year | Code |
|---|---|---|---|---|---|
| GPTQ | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Frantar et al. | ICLR | 2023 | orchestrator.py |
If you use this project in your research, please cite:
@misc{piscesl1,
author = {Wenze Wei and Dunimd Team},
title = {PiscesL1: A High-Performance Multimodal Mixture-of-Experts Model},
year = {2026},
publisher = {GitHub},
url = {https://github.com/mf2023/piscesl1}
}
This project is licensed under the Apache License 2.0; see the LICENSE file.
Open-source packages used by this project and their license information:
| 📦 Package | 📜 License | 📦 Package | 📜 License |
|---|---|---|---|
| torch | BSD-style | torchvision | BSD-style |
| torchaudio | BSD-style | torch-directml | MIT |
| transformers | Apache 2.0 | tokenizers | Apache 2.0 |
| huggingface-hub | Apache 2.0 | modelscope | Apache 2.0 |
| numpy | BSD 3-Clause | scipy | BSD 3-Clause |
| scikit-learn | BSD 3-Clause | addict | MIT |
| accelerate | Apache 2.0 | einops | MIT |
| timm | Apache 2.0 | pytorch-lightning | Apache 2.0 |
| pillow | HPND | PyMuPDF | AGPL 3.0 |
| python-docx | MIT | python-pptx | MIT |
| pdfplumber | MIT | pdf2image | MIT |
| ocrmypdf | MPL 2.0 | bitsandbytes | MIT |
| peft | Apache 2.0 | wheel | MIT |
| xformers | BSD 3-Clause | trl | Apache 2.0 |
| nvidia-ml-py3 | BSD 3-Clause | fastapi | MIT |
| uvicorn | BSD 3-Clause | python-multipart | Apache 2.0 |
| pydantic | MIT | httpx | BSD 3-Clause |
| pandas | BSD 3-Clause | gradio | Apache 2.0 |
| ijson | BSD 3-Clause | pyarrow | Apache 2.0 |
| tqdm | MIT | jsonlines | MIT |
| windows-curses | BSD 3-Clause | psutil | BSD 3-Clause |
| streamlit | Apache 2.0 | PyYAML | MIT |
| GitPython | BSD 3-Clause | opencv-python | MIT |
| av | BSD 3-Clause | decord | Apache 2.0 |
| imageio | BSD 3-Clause | imageio-ffmpeg | BSD 3-Clause |
| openai | Apache 2.0 | requests | Apache 2.0 |
| beautifulsoup4 | MIT | pytz | MIT |
| pywin32 | PSF | duckduckgo-search | MIT |
| plotly | MIT | | |
| evalscope | Apache 2.0 | safetensors | Apache 2.0 |
| deepspeed | Apache 2.0 | aiofiles | Apache 2.0 |
| pathlib2 | MIT | textual | MIT |
| dmsc | Apache 2.0 | datasets | Apache 2.0 |
| rich | MIT | omegaconf | BSD 3-Clause |
| hydra-core | MIT | wandb | MIT |
| tensorboard | Apache 2.0 | mlflow | Apache 2.0 |
| lm-eval | MIT | rouge-score | Apache 2.0 |
| sacrebleu | Apache 2.0 | bert-score | MIT |
| librosa | ISC | soundfile | BSD 3-Clause |
| audioread | MIT | pydub | MIT |
| flash-attn | BSD 3-Clause | triton | MIT |
| mamba-ssm | Apache 2.0 | causal-conv1d | Apache 2.0 |
| docker | Apache 2.0 | | |