"Transforming GPUs into thinking machines since 2025" 🔥
Welcome to my repository showcasing two exciting deep learning projects! Dive into neural network construction, machine translation, and LLM fine-tuning. 🌟 This repository chronicles my journey through fundamental and advanced deep learning concepts. Each project is a battle-tested module combining rigorous theory with practical implementation.
Goal: Classify CIFAR-10 and MNIST datasets using progressively enhanced CNNs.
Key Techniques:
- Baseline CNN → BatchNorm → Data Augmentation → Deeper Architectures → Dropout
- Achieved 90% test accuracy on CIFAR-10 and 99% on MNIST!
| Experiment | Test Accuracy | Key Improvements |
|---|---|---|
| Baseline CNN | 71% | Simple architecture |
| + BatchNorm | 73% | Stabilized training dynamics |
| + Data Augmentation | 78% | Reduced overfitting |
| + Deeper CNN + Dropout | 83% | Enhanced feature learning |
| Ultimate Enhanced Model | 90% | Combined optimizations + refined training |
- BatchNorm accelerates convergence (+2% accuracy).
- Data augmentation boosts generalization (+5% accuracy).
- Deeper models require careful regularization (Dropout!).
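Data augmentation is the largest single jump in the table above. A minimal NumPy sketch of the standard CIFAR-10 recipe (reflect-pad, random crop, random horizontal flip) — the helper name and parameters are illustrative, not the repo's actual code:

```python
import numpy as np

def augment(img, pad=4, rng=None):
    """Pad-and-random-crop plus horizontal flip for a (H, W, C) image.

    Illustrative helper, not the repo's code.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)    # random crop offset
    left = rng.integers(0, 2 * pad + 1)
    crop = padded[top:top + h, left:left + w]
    if rng.random() < 0.5:                # flip half the time
        crop = crop[:, ::-1]
    return crop

img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
out = augment(img)
print(out.shape)  # (32, 32, 3)
```

Because the label is unchanged, each epoch effectively sees a fresh variant of every image, which is what drives the generalization gain.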
Goal: Build a CNN without frameworks using NumPy for MNIST classification.
Features:
- Handcrafted layers: `Conv2d`, `BatchNorm2d`, `MaxPool2d`, `Dropout`
- Manual forward/backward propagation and gradient updates.
- Achieved 99.4% accuracy – rivaling PyTorch!
```python
# Simplified layer structure
class Conv2d:
    def forward(self, x): ...      # im2col magic!
    def backward(self, grad): ...  # Gradient gymnastics 🧘

class Adam:
    def update(self, param, grad): ...  # Momentum + adaptive learning
```

💡 Takeaways
- Debugging manual backprop is hard but enlightening!
- Automatic differentiation = 🤯 → 🤩
- Full code insights [here](./Assignment1).
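The "im2col magic" mentioned in the layer sketch can be illustrated in a few lines of NumPy: unfold each receptive field into a row, and the convolution collapses into one matrix multiply. A toy stride-1, no-padding version (not the repo's exact implementation):

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold (N, C, H, W) into (N, out_h*out_w, C*kh*kw) patches (stride 1, no padding)."""
    n, c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((n, out_h * out_w, c * kh * kw), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, :, i:i + kh, j:j + kw].reshape(n, -1)
            idx += 1
    return cols, out_h, out_w

def conv2d_forward(x, weight, bias):
    """weight: (F, C, kh, kw). The convolution becomes a single matmul over patches."""
    f, c, kh, kw = weight.shape
    cols, out_h, out_w = im2col(x, kh, kw)
    out = cols @ weight.reshape(f, -1).T + bias       # (N, out_h*out_w, F)
    return out.transpose(0, 2, 1).reshape(-1, f, out_h, out_w)

x = np.random.randn(2, 1, 28, 28)
w = np.random.randn(8, 1, 3, 3)
y = conv2d_forward(x, w, np.zeros(8))
print(y.shape)  # (2, 8, 26, 26)
```

The backward pass runs the same bookkeeping in reverse (col2im), which is where most of the debugging pain lives.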
Models:
- Seq2Seq + Attention (GRU)
  - Strengths: Explicit attention alignment for short phrases (e.g., "天气" → "weather").
  - Limitations:
    - Repetition errors ("english english english") ❌
    - Failed named entities (the name "张三" / "Zhang San" → "three") 😅
    - Chaotic punctuation handling ("! ! !!")
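The alignment mechanism behind those strengths can be sketched as plain dot-product attention in NumPy (the assignment's GRU decoder may use an additive variant; all names here are illustrative):

```python
import numpy as np

def attention(query, enc_states):
    """One decoder query attends over all encoder states.

    query: (d,), enc_states: (T, d). Returns (context, weights).
    """
    scores = enc_states @ query               # (T,) similarity per source token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over source positions
    context = weights @ enc_states            # (d,) weighted sum of encoder states
    return context, weights

enc = np.random.randn(5, 8)   # 5 source tokens, hidden size 8
q = np.random.randn(8)
ctx, w = attention(q, enc)
print(w.sum())  # ≈ 1.0
```

Inspecting `w` per decoding step is what makes the "天气 → weather" alignment explicitly visible.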
- MinGPT (Transformer)
  - Upgrades:
    - Autoregressive decoding with temperature sampling 🌡️
    - Multi-head self-attention for long-range dependencies
  - Results: Smoother syntax, but still struggled with cultural nuances ("全民制作人", roughly "producers of the whole nation", → "everybody's whole big family").
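Temperature sampling is a one-line idea: scale the logits by 1/T before the softmax, so T < 1 sharpens the distribution toward greedy decoding and T > 1 flattens it toward random choice. A minimal sketch (function name is illustrative):

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from logits softened/sharpened by the temperature."""
    if rng is None:
        rng = np.random.default_rng()
    z = logits / temperature
    p = np.exp(z - z.max())   # subtract max for numerical stability
    p /= p.sum()
    return rng.choice(len(logits), p=p)

logits = np.array([2.0, 1.0, 0.1])
token = sample_with_temperature(logits, temperature=0.8)
```

At very low temperatures this degenerates to argmax, which is one way to suppress the repetition loops seen in the GRU baseline.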
| Metric | Seq2Seq (GRU) | MinGPT (Transformer) |
|---|---|---|
| Training Stability | High loss fluctuation (1.8–4.2) | Smooth convergence (loss 2.1–3.5) |
| Translation Quality | Repetitive outputs, semantic gaps | Better punctuation & syntax |
| Resource Efficiency | 10M params, low memory | 93M params, GPU required |
| Best Use Case | Lightweight prototyping | Context-aware generation |
Objective: Adapt Qwen2.5-7B-Instruct for Chinese legal QA using:
- 4-bit Quantization (75% memory reduction 🧠→💡)
- LoRA (train only 0.1% of parameters 🎯)
- DISC-Law-SFT: 403k legal Q&A pairs 📜
- Hardware: Tesla T4 GPU (16GB VRAM) + LLaMA-Factory framework 🏭
- Key Config:
  ```json
  {
    "lora_target": "c_attn,q_proj,v_proj",
    "quantization_bit": 4,
    "learning_rate": 3e-5,
    "batch_size": 4,
    "epochs": 0.05
  }
  ```
  (`"epochs": 0.05` trains on roughly 5% of the data, for rapid prototyping 🚤)
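Why does LoRA train so few parameters? It freezes each targeted weight matrix W and learns only a low-rank update BA. A NumPy sketch of the arithmetic (the sizes `d` and rank `r` here are assumptions for illustration, not the actual Qwen2.5 config):

```python
import numpy as np

d, r = 2048, 8                      # hidden size and LoRA rank (illustrative values)
W = np.random.randn(d, d)           # frozen pretrained weight
A = np.random.randn(r, d) * 0.01    # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-init so the delta starts at 0

def lora_forward(x):
    """y = x W^T + x A^T B^T — only A and B receive gradients."""
    return x @ W.T + x @ A.T @ B.T

trainable = A.size + B.size
frac = trainable / (W.size + trainable)
print(f"{frac:.2%} of parameters trainable")  # well under 1%
```

Zero-initializing B means fine-tuning starts exactly at the pretrained model and only gradually learns a deviation.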
- Loss Curve: Rapid convergence from 4.03 → 0.074 in just 1.2 hours!
- Throughput: Processed 2.45 samples/sec on a single T4 GPU 🚀
- Memory Usage: 4-bit quantization reduced VRAM consumption by 75% (16GB → 4GB effective) 💾
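For intuition on where that 75% saving comes from: 4-bit quantization stores each weight as a 4-bit integer plus a shared scale instead of a 16-bit float. A toy symmetric round-to-nearest sketch — real 4-bit schemes (e.g. NF4 in bitsandbytes) use non-uniform, per-block quantization, so this only illustrates the idea:

```python
import numpy as np

def quantize_4bit(w):
    """Symmetric per-tensor quantization to integer levels in [-8, 7]."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()   # bounded by half a quantization step
```

Weights shrink 4x versus fp16 at the cost of a small, bounded rounding error per weight.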
| Case | Model Response | Accuracy | Insight |
|---|---|---|---|
| Workplace Harassment | Cited the Law on the Protection of Women's Rights and Interests (《妇女权益保障法》) and advised filing a complaint plus pursuing legal liability | 95% ✅ | Precise statute alignment |
| Land-Use Contract | Cited Civil Code (《民法典》) Article 372, noting the written-form requirement for easement contracts | 90% ✅ | Correct template but missed sub-clauses |
| Credit Dispute | Raised a credit-record objection under Civil Code Article 1029 | 88% | Minor phrasing mismatch |
(Hypothetical visualization of loss drop)
- LoRA Efficiency: Trained only 0.1% parameters (7B → 7M trainable!) while retaining 92% accuracy 🎯
- Quantization Magic: Squeezed a 7B model into 16GB VRAM – democratizing LLM fine-tuning 🌍
- Legal Precision: Generated answers strictly adhered to Chinese law with zero hallucinated clauses ⚖️
| Direction | Action Item | Expected Impact |
|---|---|---|
| Hybrid Fine-Tuning | Combine LoRA with full-parameter tuning | Boost accuracy to >98% |
| Extended Training | Train on 100% data (not just 5%) | Reduce verbosity & phrasing errors |
| Multimodal Expansion | Add legal document parsing (PDF/OCR) | Enable end-to-end contract analysis 📑 |
```bash
# Step 1: Clone LLaMA-Factory
git clone https://github.com/hiyouga/LLaMA-Factory

# Step 2: Run with 4-bit LoRA config
python train.py \
  --model_name_or_path "Qwen/Qwen2.5-7B-Instruct" \
  --quantization_bit 4 \
  --lora_target "c_attn,q_proj,v_proj" \
  --batch_size 4
```
```
├── Assignment_1/
│   ├── Task_A...ipynb   # PyTorch CNN experiments
│   └── Task_B...ipynb   # NumPy-from-scratch CNN
├── Assignment_2/
│   ├── Part_A/          # Seq2Seq & MinGPT translation
│   └── Part_B/          # Legal LLM fine-tuning
├── Reports/             # Detailed PDF writeups
└── README.md            # You are here! 🌍
```

- MinGPT: For transformer-based translation
- LLaMA-Factory: LoRA + quantization toolkit
- DISC-Law-SFT: Legal QA dataset
🌟 Star this repo if you find it helpful!
💬 Feedback? Open an issue – let's build something awesome! 🚀
