
# Large Models Learning

## Model Key Points

- **CLIP** (vision-language model): contrastive pre-training, zero-shot transfer, dual image/text encoders (see the loss sketch after this list)
- **SigLIP** (vision-language model): sigmoid pairwise loss for improved training efficiency over CLIP (also covered in the loss sketch below)
- **Gemma 3** (vision-language model on a decoder-only LLM): SigLIP vision encoder + GQA + 5:1 local/global attention interleaving
- **Gemma 4** (multimodal open model family): hybrid local/global attention + Per-Layer Embeddings (PLE) + variable-resolution vision token budgets + dense/MoE deployment scaling
- **DeepSeek-VL** (vision-language model on a decoder-only LLM): hybrid vision encoder (SigLIP for semantics + SAM-B for high-res detail) → fixed-token high-res processing; gradual, modality-balanced pretraining preserves language strength
- **DeepSeek-VL2** (vision-language model on an MoE decoder-only LLM): single SigLIP encoder with dynamic tiling (global thumbnail + local tiles) → arbitrary high resolutions and aspect ratios at a controlled token count; DeepSeekMoE backbone with MLA
- **DeepSeek-V2** (decoder-only Transformer): MLA + DeepSeekMoE (see the MLA sketch after this list)
- **DeepSeek-V3** (decoder-only Transformer): MLA + DeepSeekMoE with auxiliary-loss-free load balancing + multi-token prediction (MTP)
- **DeepSeek-V3.2** (decoder-only Transformer, long context + agentic RL): DeepSeek Sparse Attention (DSA: Lightning Indexer → top-k KV selection, O(L·k) core attention at 128K) + MLA with MQA-mode integration for efficient sparse KV sharing; scaled post-training RL via GRPO (>10% of pretraining compute); large-scale agent/tool-use task synthesis in verified environments
- **DeepSeek-R1** (reasoning MoE on DeepSeek-V3-Base): R1-Zero shows pure RL can induce long-CoT reasoning; R1 adds cold-start SFT + multi-stage RL to improve readability, language consistency, and general assistant behavior
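
The contrast between the first two bullets comes down to the training objective. Below is a minimal PyTorch sketch of both losses; the tensor names (`img_emb`, `txt_emb`) and the default temperature/bias values are illustrative assumptions, not the exact training setup of either model.

```python
# Sketch of CLIP's symmetric contrastive loss vs. SigLIP's pairwise
# sigmoid loss. `img_emb` and `txt_emb` are assumed to be L2-normalized
# [batch, dim] embeddings from the two encoders.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # Similarity matrix: logits[i, j] = sim(image_i, text_j) / T.
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: match images to texts and texts to images.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    # Every (i, j) pair is an independent binary classification:
    # label +1 on the diagonal (matched pair), -1 elsewhere.
    logits = img_emb @ txt_emb.t() * t + b
    labels = 2 * torch.eye(logits.size(0), device=logits.device) - 1
    # Log-sigmoid of signed logits; no batch-wide softmax normalization.
    return -F.logsigmoid(labels * logits).mean()
```

Because the sigmoid loss treats each pair independently, it avoids CLIP's batch-wide softmax normalization, which is what makes SigLIP cheaper to shard and more stable at large batch sizes.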
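
For the DeepSeek-V2/V3 bullets, the key MLA idea is that the KV cache stores a small shared latent rather than full per-head keys and values. The sketch below shows only that compression path, with made-up dimensions; the decoupled RoPE key branch and query compression used by the actual models are omitted.

```python
# Minimal sketch of Multi-head Latent Attention (MLA) KV compression.
# Dimensions are hypothetical, chosen only to make the shapes concrete.
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 4096, 512, 32, 128

W_dkv = nn.Linear(d_model, d_latent, bias=False)          # down-projection
W_uk = nn.Linear(d_latent, n_heads * d_head, bias=False)  # key up-projection
W_uv = nn.Linear(d_latent, n_heads * d_head, bias=False)  # value up-projection

def cache_step(h):
    # h: [batch, seq, d_model]. Only the latent is stored in the KV cache,
    # so per-token cache cost is d_latent instead of 2 * n_heads * d_head.
    return W_dkv(h)

def attend_step(c_kv):
    # Reconstruct full per-head keys/values from the cached latent on demand.
    b, s, _ = c_kv.shape
    k = W_uk(c_kv).view(b, s, n_heads, d_head)
    v = W_uv(c_kv).view(b, s, n_heads, d_head)
    return k, v
```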

## LLM Knowledge System

A topic-based map of this repo. This section is organized by knowledge domains rather than learning phases.

### Visual Map

```mermaid
flowchart TD
    A[LLM Knowledge System]

    A --> B[Foundations]
    A --> C[Architecture and Scaling]
    A --> D[Adaptation and Alignment]
    A --> E[Inference and Serving]
    A --> G[Model Case Studies]

    B --> B1[SVD / dtypes / AdamW]
    B --> B2[Attention: MHA / MQA / GQA]
    B --> B3[RoPE / SwiGLU]

    C --> C1[FlashAttention / MLA]
    C --> C2[DeepSeekMoE]
    C --> C3[TP / PP / EP]

    D --> D1[LoRA / QLoRA / DoRA]
    D --> D2[Specialized LoRA Variants]
    D --> D3[SFT / RLHF / DPO / PPO / GRPO]

    E --> E1[Speculative Decoding]
    E --> E2[Continuous Batching / PagedAttention]
    E --> E3[AWQ / GPTQ / TensorRT-LLM]
    E --> E4[Hallucination Mitigation]

    G --> G1[DeepSeek-V2]
    G --> G2[DeepSeek-V3]
    G --> G3[DeepSeek-V3.2]

    G1 -. combines .-> C1
    G1 -. combines .-> C2
    G2 -. combines .-> C1
    G2 -. combines .-> C2
```

The diagram gives a high-level overview; the sections below act as the detailed index.

| Domain | Focus | Core Topics |
| --- | --- | --- |
| Foundations | Math, optimization, losses, normalization, and Transformer building blocks | SVD, dtypes, AdamW, Sigmoid, GELU, LayerNorm, RMSNorm, BatchNorm, MHA/MQA/GQA, RoPE, SwiGLU |
| Architecture & Scaling | Efficient training and large-scale model design | FlashAttention, MLA, DeepSeekMoE, TP/PP/EP |
| Adaptation & Alignment | Task adaptation and preference learning | LoRA family, SFT, RLHF, DPO, PPO, GRPO |
| Agent Systems | Retrieval, memory, tool use, API interfaces, and task orchestration | Agent Basics, Memory Systems, RAG Systems, OpenAI API Interfaces |
| Inference & Serving | Latency, memory, and deployment efficiency | Speculative Decoding, Continuous Batching, Quantization, TensorRT-LLM, Hallucination Mitigation |
| VLA & Robotics | Vision-language-action policies and embodied control | RT-1 |
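
As a worked example for the dtype and memory topics in the Foundations and Inference & Serving rows, here is a back-of-the-envelope KV-cache size estimate; the model configuration is hypothetical.

```python
# Rough KV-cache sizing: 2x for keys and values, dtype_bytes=2 assumes
# fp16/bf16 storage.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# e.g. a 7B-class model: 32 layers, 8 KV heads (GQA), head_dim 128,
# 4096-token context, batch 1 -> 512 MiB.
print(kv_cache_bytes(32, 8, 128, 4096, 1) / 2**20, "MiB")
```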
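
GRPO, listed under Adaptation & Alignment, replaces PPO's learned value model with group-relative reward normalization. Below is a minimal sketch of that advantage computation; the surrounding policy-gradient loop is omitted.

```python
# Group-relative advantages: sample several completions per prompt,
# score them, and standardize rewards within each group so no critic
# network is needed.
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: [num_prompts, group_size], one row of scalar rewards per prompt.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # Each completion's advantage is its reward relative to its own group.
    return (rewards - mean) / (std + 1e-6)
```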

### 1. Foundations

### 2. Architecture & Scaling

### 3. Adaptation & Alignment

### 4. Inference & Serving

### 5. Agent Systems

### 6. VLA & Robotics

- Vision-language-action policies: RT-1

## File Structure of /docs

```text
docs/
|-- Agent_Systems/
|   |-- Agent_Basics.md
|   |-- MCP_Protocol.md
|   |-- Memory_Systems.md
|   |-- OpenAI_API_Interface_Format.md
|   |-- RAG_Systems.md
|   |-- Skill_Systems.md
|   `-- Tool_Registry_and_Function_Calling.md
|-- Activation_Layers/
|   |-- GELU.md
|   |-- Sigmoid.md
|   `-- SwiGLU.md
|-- Attention_Machanisms/
|   |-- FlashAttention.md
|   |-- GQA.md
|   |-- MHA.md
|   |-- MLA.md
|   |-- MQA.md
|   `-- SVD_Attention.md
|-- Inference_Optimization/
|   |-- continuous_batching.md
|   |-- hallucination_mitigation.md
|   |-- quantization_inference.md
|   |-- speculative_decoding.md
|   `-- tensorrt_multilora.md
|-- Large_Models/
|   |-- CLIP.md
|   |-- DeepSeek_R1.md
|   |-- DeepSeek_V2.md
|   |-- DeepSeek_V3.md
|   |-- DeepSeek_V32.md
|   |-- DeepSeek_VL.md
|   |-- DeepSeek_VL2.md
|   |-- Gemma_3.md
|   |-- Gemma_4.md
|   `-- SigLIP.md
|-- Math/
|   |-- Memory_Estimation.md
|   |-- SVD.md
|   `-- dtypes.md
|-- MoE/
|   `-- DeepSeekMoE.md
|-- Norm/
|   |-- BatchNorm.md
|   |-- RMSNorm.md
|   `-- LayerNorm.md
|-- Optimizer/
|   `-- AdamW.md
|-- PEFT/
|   |-- DoRA.md
|   |-- LoRA.md
|   |-- QLoRA.md
|   `-- Specialized_LoRA.md
|-- Parallelism/
|   |-- EP.md
|   |-- PP.md
|   `-- TP.md
|-- Position_Embeding/
|   `-- RoPE.md
|-- Preference_Alignment/
|   |-- DPO.md
|   |-- GRPO.md
|   |-- PPO.md
|   |-- RLHF.md
|   `-- SFT.md
|-- Loss/
|-- VLAs/
|   `-- RT_1.md
`-- Resource/
    |-- Text_Color_Table.md
    `-- pics/
        `-- ...
```

## Learning Resource Recommendation

## About

Preparation materials for large model application engineers.
