jakehemmerle/evolutionary-mnist

evolutionary-mnist

Hyperparameter tuning via LLM for a two-layer CNN on the MNIST dataset. Also an example of something that can be done, but probably shouldn't be.


Example run

Validation accuracy:

[plot: accuracy per generation]

Reminder: no regression is fit here. The improvements come strictly from the LLM's reasoning and analysis of the previous runs:

[plot: accuracy vs learning rate]

Example LLM reasoning (which leaves a lot of room for improvement):

[screenshot of LLM reasoning]
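Since the LLM sees only the history of previous runs, the core of the approach is prompt construction. A minimal sketch of what that might look like — all field names and prompt wording here are hypothetical, not the repository's actual prompt:

```python
import json

def build_prompt(history):
    """Build a prompt asking the LLM to propose new hyperparameters.

    `history` is a list of dicts such as
    {"lr": 1e-3, "batch_size": 64, "val_accuracy": 0.97} from prior runs.
    These keys are illustrative assumptions.
    """
    lines = [
        "You are tuning a two-layer CNN on MNIST.",
        "Previous runs (hyperparameters -> validation accuracy):",
    ]
    for run in history:
        lines.append(json.dumps(run))
    lines.append(
        "Reason about these results, then propose the next set of "
        "hyperparameters as a JSON object with keys 'lr' and 'batch_size'."
    )
    return "\n".join(lines)

history = [
    {"lr": 1e-2, "batch_size": 128, "val_accuracy": 0.91},
    {"lr": 1e-3, "batch_size": 64, "val_accuracy": 0.97},
]
prompt = build_prompt(history)
```

Because there is no regression model, the quality of the next candidates depends entirely on how well the LLM interprets this history.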

Setup

uv sync

The MNIST dataset is downloaded from Hugging Face: https://huggingface.co/datasets/ylecun/mnist

uvx --from huggingface_hub hf download ylecun/mnist --repo-type dataset --local-dir data

Use scripts/prepare_data.py to split the dataset into train and validation sets.
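The split itself is conceptually simple. A minimal stand-in sketch, assuming a shuffled fractional split — the real scripts/prepare_data.py reads the Hugging Face parquet files downloaded above and may split differently:

```python
import random

def split_train_val(samples, val_fraction=0.1, seed=0):
    """Shuffle samples deterministically and split off a validation set.

    Illustrative only: the fraction, seed, and in-memory list input are
    assumptions, not the repository's actual behavior.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# MNIST's training split has 60,000 examples.
train, val = split_train_val(list(range(60000)))
```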

Running Experiments

This example runs for 5 generations with 4 training runs per generation.

uv run evolutionary-mnist experiments/evo-mini-v3.toml
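The generation structure described above (5 generations, 4 runs each) can be sketched as an outer loop: each generation, the LLM proposes a batch of candidates from the accumulated history, and each candidate is trained and scored. Function names here are illustrative, not the package's actual API:

```python
def evolve(propose_fn, train_fn, generations=5, runs_per_generation=4):
    """Run the LLM-driven tuning loop.

    propose_fn(history, n) returns n candidate hyperparameter dicts
    (in the real project, by querying an LLM with the run history);
    train_fn(params) trains one model and returns validation accuracy.
    """
    history = []
    for gen in range(generations):
        candidates = propose_fn(history, n=runs_per_generation)
        for params in candidates:
            acc = train_fn(params)
            history.append({"generation": gen, **params, "val_accuracy": acc})
    return history
```

With 5 generations of 4 runs, the loop produces 20 training runs total, each one informed by every run before it.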

Future work:

  • Improve system prompt.
  • Neural Architecture Search (NAS) for on-the-fly architecture exploration.
  • Keep training time constant per run within each generation.
