uv run python -m analysis.generate_tables
uv run python -m analysis.generate_figures
```

### Results Directory Structure

Experiments are organized into two directories:

- **`results/raw/`**: 72 standard training experiments (FP32 baselines, BitNet baselines, ablations)
- **`results/raw_kd/`**: 63 knowledge distillation experiments (FP32+KD, BitNet+KD, recipe variants)

See [`results/README.md`](results/README.md) for the detailed structure and naming conventions.

**Quick analysis**:

```bash
# Aggregate all 135 experiments into a DataFrame
uv run python -m analysis.aggregate_results

# Generate paper tables and figures
uv run python -m analysis.generate_tables
uv run python -m analysis.generate_figures
```

## Supported Models

| Model | timm name |
uv run ruff check .
uv run mypy .
```

## Documentation

Root-level documentation files for reviewers and reproducibility:

- **[README.md](README.md)** - This file; main project documentation
- **[reproduce.sh](reproduce.sh)** - Quick validation script (10 minutes)
- **[EXPERIMENTS_REFERENCE.sh](EXPERIMENTS_REFERENCE.sh)** - Full reproduction commands (920 GPU-hours)
- **[METHODOLOGY.md](METHODOLOGY.md)** - Experimental design and phase structure
- **[PROJECT_SETUP.md](PROJECT_SETUP.md)** - Quick start guide and project structure
- **[REPRODUCE.md](REPRODUCE.md)** - Detailed reproduction guide
- **[TTQ_VERIFICATION.md](TTQ_VERIFICATION.md)** - Technical verification of the TTQ comparison

## Reproducibility

This work follows strict reproducibility standards, with the full code, data, and analysis pipeline publicly available.

### Quick Validation (10 minutes)

Regenerate all paper artifacts from pre-computed results:

```bash
./reproduce.sh
```

This will:

1. Set up the environment (`uv sync`)
2. Aggregate 153 experiment results (`analysis/aggregate_results.py`)
3. Generate 12 LaTeX tables (`analysis/generate_tables.py`)
4. Generate 6 PDF figures (`analysis/generate_figures.py`)
5. Compile the paper PDF (`paper/main.pdf`)

**Verification:**

- `results/processed/aggregated.csv` should match the committed version (bit-exact)
- `paper/main.pdf` should compile to 28 pages, ~550 KB
- All tables and figures should match the paper exactly

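One way to check the bit-exact claim for regenerated artifacts is to compare file checksums before and after running the pipeline. A minimal sketch, assuming nothing beyond the Python standard library (the helper name `sha256sum` is hypothetical, not part of this repository):

```python
import hashlib
import tempfile
from pathlib import Path

def sha256sum(path: Path) -> str:
    """Hex digest of a file's raw bytes, for bit-exact comparisons."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large CSVs/PDFs do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo: two byte-identical files hash identically (illustrative data only).
with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "a.csv"), Path(tmp, "b.csv")
    a.write_bytes(b"seed,top1\n42,91.2\n")
    b.write_bytes(b"seed,top1\n42,91.2\n")
    assert sha256sum(a) == sha256sum(b)
```

Comparing `sha256sum(Path("results/processed/aggregated.csv"))` against the committed file's digest is stricter than a visual diff, since it catches whitespace or float-formatting drift.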
### Full Experimental Reproduction (920 GPU-hours)

To re-run all 153 experiments from scratch, see `EXPERIMENTS_REFERENCE.sh` for the exact commands.

**Requirements:**

- 2× RTX 4090 or A100 GPUs
- 50 GB disk space (checkpoints + TensorBoard logs)
- 2-3 weeks wall-clock time on consumer GPUs

**Phases:**

1. FP32 Baselines (18 experiments)
2. FP32+KD Control (9 experiments)
3. BitNet Baselines (18 experiments)
4. BitNet + Recipe (18 experiments)
5. Statistical Power (14 experiments, n=10 seeds)
6. TTQ Comparison (18 experiments)

### Results Directory Structure

```
results/
├── raw/                          # 72 standard training experiments
│   └── {dataset}/{model}/{version}_s{seed}/
│       ├── config.json           # Training hyperparameters
│       └── results.json          # Final metrics
├── raw_kd/                       # 63 knowledge distillation experiments
│   └── [same structure]
└── processed/
    └── aggregated.csv            # All 153 experiments aggregated
```

**Note:** Pre-computed results include only `config.json` and `results.json` per experiment. Full checkpoints (`best_model.pth`) and TensorBoard logs are not included due to size (8.9 GB total).

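Because every run directory holds exactly one `config.json` and one `results.json`, aggregation reduces to globbing for result files and merging each pair into a flat record. A minimal sketch of that idea (the `aggregate` helper and the demo field names `seed`, `lr`, `top1` are illustrative assumptions, not the actual schema of `analysis/aggregate_results.py`):

```python
import json
import tempfile
from pathlib import Path

def aggregate(root: Path) -> list[dict]:
    """Merge each run's config.json and results.json into one flat record.
    Assumes the layout {root}/{raw,raw_kd}/{dataset}/{model}/{version}_s{seed}/."""
    rows = []
    for results_file in sorted(root.glob("raw*/**/results.json")):
        run_dir = results_file.parent
        row = json.loads((run_dir / "config.json").read_text())   # hyperparameters
        row.update(json.loads(results_file.read_text()))          # final metrics
        row["run"] = str(run_dir.relative_to(root))               # provenance
        rows.append(row)
    return rows

# Demo on a fake single-experiment tree (values are made up).
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp) / "raw" / "cifar10" / "resnet18" / "v1_s42"
    run.mkdir(parents=True)
    (run / "config.json").write_text(json.dumps({"seed": 42, "lr": 0.1}))
    (run / "results.json").write_text(json.dumps({"top1": 91.2}))
    rows = aggregate(Path(tmp))
    print(rows[0]["run"], rows[0]["seed"], rows[0]["top1"])
```

Flat records like these can then be handed to `pandas.DataFrame(rows)` and written out as `aggregated.csv`.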
### Deterministic Training

All experiments use fixed seeds with deterministic settings:

- Seeds: 42, 123, 456 (main experiments)
- Additional seeds: 789, 1011, 1213, 1415, 1617, 1819, 2021 (statistical power)
- PyTorch: `torch.manual_seed(seed)`, `cudnn.deterministic=True`
- NumPy: `np.random.seed(seed)`

Re-running experiments with the same seed produces bit-exact checkpoint MD5 hashes.
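The seeding settings above can be collected into one helper called at the start of each run. A minimal sketch, assuming a hypothetical `set_seed` helper (this repository's actual seeding code may differ); the PyTorch lines are guarded so the sketch also runs where torch is not installed:

```python
import random

import numpy as np

def set_seed(seed: int) -> None:
    """Fix all RNG sources so repeated runs draw identical random streams."""
    random.seed(seed)          # Python's built-in RNG
    np.random.seed(seed)       # NumPy (data shuffling, augmentation params)
    try:
        import torch
        torch.manual_seed(seed)                    # CPU and per-device CUDA RNGs
        torch.cuda.manual_seed_all(seed)           # all visible GPUs
        torch.backends.cudnn.deterministic = True  # pick deterministic kernels
        torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
    except ImportError:
        pass  # torch unavailable here; Python/NumPy seeding still applies

# Same seed -> identical draws.
set_seed(42)
a = np.random.rand(3)
set_seed(42)
b = np.random.rand(3)
assert (a == b).all()
```

Note that `cudnn.deterministic=True` constrains kernel selection; for the strongest guarantee PyTorch also offers `torch.use_deterministic_algorithms(True)`, which errors out on any remaining non-deterministic op.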

## Experiment Plan
