| Benchmark | Optimizer | Hyperparameter | Constraint |
|---|---|---|---|
| gpt_oss_20b | adamw | opt_end_learning_rate | opt_base_learning_rate * 0.1 |
:::MLLOG {"namespace": "", "time_ms": 1773021880374, "event_type": "POINT_IN_TIME", "key": "opt_adamw_epsilon", "value": 1e-05, "metadata": {"file": "/opt/venv/lib/python3.10/site-packages/primus_mllog/mlperf_pre_training.py", "lineno": 65}}
:::MLLOG {"namespace": "", "time_ms": 1773021880374, "event_type": "POINT_IN_TIME", "key": "opt_adamw_weight_decay", "value": 0.1, "metadata": {"file": "/opt/venv/lib/python3.10/site-packages/primus_mllog/mlperf_pre_training.py", "lineno": 65}}
:::MLLOG {"namespace": "", "time_ms": 1773021880374, "event_type": "POINT_IN_TIME", "key": "opt_gradient_clip_norm", "value": 1.0, "metadata": {"file": "/opt/venv/lib/python3.10/site-packages/primus_mllog/mlperf_pre_training.py", "lineno": 65}}
:::MLLOG {"namespace": "", "time_ms": 1773021880374, "event_type": "POINT_IN_TIME", "key": opt_end_learning_rate", "value": 4e-05, "metadata": {"file": "/opt/venv/lib/python3.10/site-packages/primus_mllog/mlperf_pre_training.py", "lineno": 65}}
While collecting submission logs and comparing them against the RCPs, we found that the GBS 64 RCPs use a base LR of 1e-05 together with a MIN_LR of 4e-05. Given the constraint opt_end_learning_rate = opt_base_learning_rate * 0.1, a base LR of 1e-05 should yield an end LR of 1e-06, not 4e-05 (which is above the base LR itself). This violates the hyperparameter rules for this benchmark:
https://github.com/mlcommons/training_policies/blob/master/training_rules.adoc#91-hyperparameters
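For illustration, here is a minimal sketch of the consistency check that surfaces this mismatch: it parses the `:::MLLOG` lines from a log file and compares `opt_end_learning_rate` against `opt_base_learning_rate * 0.1`. The file name `run_0.log`, the helper names, and the tolerance are assumptions for the sketch; this is not the official RCP checker.

```python
# Minimal sketch (not the official RCP checker): scan an MLLOG file and
# verify the gpt_oss_20b constraint
#   opt_end_learning_rate == opt_base_learning_rate * 0.1
import json
import math

MLLOG_PREFIX = ":::MLLOG "

def read_mllog_values(path):
    """Collect the last logged value for each MLLOG key in the file."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line.startswith(MLLOG_PREFIX):
                continue
            record = json.loads(line[len(MLLOG_PREFIX):])
            values[record["key"]] = record.get("value")
    return values

def check_end_lr_constraint(values, ratio=0.1):
    """Check end LR = base LR * ratio; returns True if the rule holds."""
    base = values.get("opt_base_learning_rate")
    end = values.get("opt_end_learning_rate")
    if base is None or end is None:
        raise KeyError("opt_base_learning_rate / opt_end_learning_rate not logged")
    expected = base * ratio
    ok = math.isclose(end, expected, rel_tol=1e-6)
    print(f"base={base} end={end} expected_end={expected} -> "
          f"{'OK' if ok else 'VIOLATION'}")
    return ok

if __name__ == "__main__":
    # With base=1e-05 and end=4e-05 as in the logs above, this prints
    # VIOLATION, since the expected end LR would be 1e-06.
    check_end_lr_constraint(read_mllog_values("run_0.log"))
```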
Logs from reference RCPs:
https://github.com/mlcommons/training/blob/master/small_llm_moe_pretraining/primus/rcp_logs/gbs64/run_0.log#L30
Log snippet: