Questions about reproducing training results with default configuration

**Reproduction Environment**
- GPU: 4 × H20
- Configuration: Codebase default settings
- Dataset: First 15k SAT samples (as per default config)

**My Results**

Figure 1. My training curve
![Image](https://github.com/user-attachments/assets/d0de4bf1-db18-4b76-ab93-163ad2598ed6)

Figure 2. My test curve 
![Image](https://github.com/user-attachments/assets/ebd63e67-43a4-489a-99db-f04ab09e3c51)

- My reproduction (base+GRPO): 58.4 (step 1000)
- Qwen2-VL instruct model: 61.6

**Results from [report](https://turningpointai.notion.site/the-multimodal-aha-moment-on-2b-model)**

Figure 3. Test curve from report
![Image](https://github.com/user-attachments/assets/ac5860db-db55-433c-bef3-cf420e9a725b)

**Key Questions**
1. There is a performance gap between my reproduction (58.4) and the reported result (~59.5).
2. The Qwen2-VL instruct model scores inconsistently: 61.6 in my local evaluation vs. ~56 in the report.
3. The reproduced SFT and GRPO curves show an abnormal trend (Figure 2).
4. Why does the default config use only the first 15k SAT samples instead of the full dataset?
5. In your experience, what are the possible reasons for these abnormal reproduction results?
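
For reference, the gaps behind questions 1 and 2 can be tallied with a small sketch. The numbers are the ones quoted above (the report values are approximate), and the dictionary keys are just labels I chose for illustration, not names from the codebase.

```python
# Scores quoted in this issue; the report values are approximate ("~").
scores = {
    "base+GRPO": {"mine": 58.4, "report": 59.5},
    "Qwen2-VL instruct": {"mine": 61.6, "report": 56.0},
}

# Gap = my local score minus the reported score, rounded to one decimal.
gaps = {name: round(s["mine"] - s["report"], 1) for name, s in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} points vs. report")
```

The two gaps point in opposite directions: my GRPO run is slightly below the report, while my instruct-model evaluation is well above it.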
