Characterize model scaling with additional training data #87

@forklady42

Overview

Understand how model performance scales as the amount of training data increases. This will inform data collection priorities and set expectations for future training runs.

Tasks

  • Define evaluation metric(s) to track (e.g., validation loss, MAE on charge density)
  • Train models on increasing subsets of available data (e.g., 10%, 25%, 50%, 75%, 100%)
  • Plot learning curves as a function of dataset size
  • Identify whether the model is data-limited or compute-limited at current scale
  • Summarize findings and recommend next steps for data acquisition if needed
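
One possible way to build the data subsets for the training runs above is sketched below. It assumes the dataset can be addressed by integer sample index. The helper name `nested_subsets` and the fixed seed are hypothetical and are not taken from the project's code. The subsets are nested (every smaller subset is contained in every larger one), so differences between runs reflect added data rather than a different random draw.

```python
import random


def nested_subsets(n_samples, fractions, seed=0):
    """Map each fraction to a sorted list of sample indices.

    A single shuffled ordering is sliced at increasing lengths, so the
    10% subset is contained in the 25% subset, and so on.
    """
    rng = random.Random(seed)           # fixed seed for reproducibility
    order = list(range(n_samples))
    rng.shuffle(order)
    subsets = {}
    for frac in sorted(fractions):
        k = max(1, round(frac * n_samples))  # keep at least one sample
        subsets[frac] = sorted(order[:k])
    return subsets


# Illustrative dataset size only; substitute the real sample count.
subsets = nested_subsets(1000, [0.10, 0.25, 0.50, 0.75, 1.00])
```

Evaluating every run on the same held-out validation split keeps the resulting points comparable on a single learning curve.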

Acceptance Criteria

  • Scaling curves produced and documented
  • Clear conclusion on whether more data is expected to yield meaningful gains
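
To make the conclusion about expected gains quantitative, one option is to fit a power law, loss ≈ a · N^(−b), to the measured (dataset size, metric) points and read off the exponent b. The sketch below is an assumption about how this could be done, not the project's method: it uses an ordinary least-squares fit in log-log space, ignores any irreducible loss floor, and runs on synthetic placeholder numbers generated from a known power law rather than real results.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(N); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def relative_gain_from_doubling(b):
    """Fractional loss reduction predicted by the fit when N doubles."""
    return 1.0 - 2.0 ** (-b)


# Synthetic placeholder points (NOT measured results): loss = 2.0 * N^-0.3.
sizes = [100, 250, 500, 750, 1000]
losses = [2.0 * n ** -0.3 for n in sizes]

a, b = fit_power_law(sizes, losses)
gain = relative_gain_from_doubling(b)
```

If the measured points bend away from a straight line on a log-log plot, the pure power law is a poor description, which may indicate that the metric is approaching a floor or that model capacity or compute has become the limiting factor.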
