Agentic Science Worker

An autonomous AI researcher for computational materials science. Not a tool that executes commands—an independent lab member that takes ownership of research problems.

Current Status: 81/86 benchmarks passing (94% pass rate) | View Dashboard

Vision

Today: Handle defined workflows and research questions autonomously.

Tomorrow: Receive a group meeting transcript, work independently for hours or days, return with research contributions—hypotheses tested, literature synthesized, calculations completed.

The goal is not to help humans do science faster. It's to build a system that does science: forms hypotheses, tests them computationally, identifies what's interesting, and generates knowledge that didn't exist before.

Recent Highlights

Novel Material Discovery: Autonomously discovered 9 novel Li-ion cathode materials, including Li₂Ni(PO₄)(SO₄) with a computed voltage of 5.1 V (T10-001 score: 75)
Cross-Modal Reasoning: Determined a crystal structure from an XRD pattern using computational methods (T10-002 score: 72)
Cloud GPU Integration: Full VAST.ai lifecycle management, covering provisioning, execution, and cleanup (T17 scores: 97/91/92)
Publication-Ready Analysis: MSD/diffusion calculations and thermodynamic parsing (T18 scores: 92/92)

See real examples: Showcases | Novel Cathode Discovery | XRD Structure Determination

Overview

The Agentic Science Worker can:

Run molecular dynamics simulations (LAMMPS) with literature-sourced parameters
Perform DFT calculations (Quantum ESPRESSO) for electronic structure
Use ML interatomic potentials (MACE, CHGNet, M3GNet) for fast screening
Provision cloud GPUs (VAST.ai) for overflow compute
Search scientific literature and extract methodology/parameters
Query materials databases (Materials Project) for structures and properties
Analyze results with publication-quality figures and error propagation

Given a scientific question, it researches the methodology, finds parameters, runs simulations, verifies results against literature, and iterates until achieving physically reasonable results—like a competent graduate student working independently.
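
The verify-and-iterate step described above can be pictured as a simple loop. What follows is a minimal sketch under stated assumptions, not the project's implementation: `iterate_until_reasonable`, `fake_run`, and the 2% tolerance are hypothetical names and values chosen for illustration, and the toy "simulation" only returns canned numbers.

```python
from typing import Callable, Iterable, Optional, Tuple


def relative_error(computed: float, reference: float) -> float:
    """Relative deviation of a computed value from a literature reference."""
    return abs(computed - reference) / abs(reference)


def iterate_until_reasonable(
    run: Callable[[dict], float],
    candidate_settings: Iterable[dict],
    reference: float,
    rel_tol: float = 0.02,  # hypothetical acceptance threshold
) -> Optional[Tuple[dict, float]]:
    """Try settings in order; return the first (settings, value) within tolerance."""
    for settings in candidate_settings:
        value = run(settings)
        if relative_error(value, reference) <= rel_tol:
            return settings, value
    return None  # nothing acceptable: escalate rather than report a bad number


# Toy stand-in for a simulation (canned values, not real output).
def fake_run(settings: dict) -> float:
    return {2: 3.80, 4: 3.66, 8: 3.62}[settings["cells"]]


result = iterate_until_reasonable(
    fake_run,
    [{"cells": 2}, {"cells": 4}, {"cells": 8}],
    reference=3.615,  # approximate experimental lattice constant of Cu, in angstroms
)
print(result)
```

Returning `None` when no candidate passes keeps the failure explicit, so a caller can widen the search or flag the problem instead of accepting an unphysical value.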

Supported Coding Agents

Works with multiple coding agents (Claude Code, Aider, Cursor). Each reads AGENTS.md as primary context, gaining the knowledge and principles to work as an autonomous researcher.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Coding Agent                            │
│    Claude Code │ Aider │ OpenAI Codex │ Cursor              │
│     (AGENTS.md defines researcher behavior and methodology) │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                        Skills                               │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐       │
│  │ LAMMPS   │ │ QE/DFT   │ │ VAST.ai  │ │ MLIP     │       │
│  │ Sim      │ │ Calc     │ │ Cloud    │ │Potentials│       │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘       │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐       │
│  │Literature│ │Materials │ │ Data     │ │ Resource │       │
│  │ Search   │ │ Database │ │ Analysis │ │ Acquire  │       │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    External Tools                           │
│  LAMMPS │ Quantum ESPRESSO │ MACE/CHGNet │ VAST.ai │ Web   │
└─────────────────────────────────────────────────────────────┘

Supported Agents

Agent	Status	Configuration
Claude Code	Full Support	`AGENTS.md`, `.claude/`
Aider	Full Support	`AGENTS.md`, `configs/aider/`
OpenAI Codex	Planned	`AGENTS.md`
Cursor	Full Support	`AGENTS.md`, `.cursorrules`

All agents read AGENTS.md (the industry standard) as their primary context file.

Quick Start

Prerequisites

A supported coding agent:
- Claude Code CLI with subscription, OR
- Aider with API key, OR
- Cursor
Python 3.10+
LAMMPS (with GPU support recommended)
Quantum ESPRESSO (optional, for DFT)
Materials Project API key

Installation

Clone the repository:

git clone https://github.com/yourusername/agentic-science-worker.git
cd agentic-science-worker

Copy and configure settings:

cp config.example.yaml config.yaml
cp .claude/settings.json.example .claude/settings.json
cp .mcp.json.example .mcp.json

Edit configuration files with your paths and API keys:

# config.yaml
binaries:
  lammps: "/path/to/your/lammps/bin/lmp"
  qe_cpu: "/path/to/qe/bin"

api_keys:
  materials_project: "YOUR_MP_API_KEY"

Verify infrastructure:

cd benchmarks/evaluation
python harness.py --verify

Running the Agent

With Claude Code:

cd /path/to/agentic-science-worker
claude

With Aider:

cd /path/to/agentic-science-worker
aider --read AGENTS.md

With Cursor:

cursor .
# Uses AGENTS.md and .cursorrules automatically

Example prompts (any agent):

Calculate the self-diffusion coefficient of liquid argon at 94K
Find the lattice constant of copper using the Mishin EAM potential
Calculate the band structure of silicon

Benchmark Suite

The project includes a tiered benchmark suite (T1 through T18) to evaluate agent capabilities.

Current Results (81/86 passing, 94% pass rate)

Tier	Category	Benchmarks (passed/total)	Status
T1-T4	Foundation	21/21	100% ✅
T7	Research Campaigns	1/3	33% (2 need HPC)
T8	ML/MLIP	6/7	86% ✅
T9	Autonomous Research	3/5	60% (2 need DFT data)
T10	Frontier DFT	4/4	100% ✅ Novel discovery, XRD, phonon research
T13-T16	Quality & Cognition	43/43	100% ✅
T17	Cloud GPU (VAST.ai)	3/3	100% ✅ Scores: 97, 91, 92
T18	Data Analysis	2/2	100% ✅ Scores: 92, 92

T5-T6, T11 archived (HPC deferred). T12 blocked on Theorizer MCP.

Running Benchmarks

cd benchmarks/evaluation

# List available benchmarks
python harness.py --list

# List available agent backends
python harness.py --list-backends

# Run a single benchmark
python harness.py BENCH-T1-001

# Run with a specific backend
python harness.py BENCH-T1-001 --backend claude
python harness.py BENCH-T1-001 --backend aider  # when implemented

# Run all benchmarks in a tier
python harness.py --tier 1

# Run with HPC tiers
python harness.py --all --include-hpc

Benchmark Results

Results are saved to benchmarks/results/runs/ with:

result.json - Scores, grading details, agent output
benchmark.json - Original benchmark definition
agent_output.txt - Full agent transcript
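
To get a quick overview of saved runs, something like the following could be used. It is a sketch that assumes one subdirectory per run under benchmarks/results/runs/, each holding a result.json whose top level is a JSON object; `summarize_runs` is a hypothetical helper, and it deliberately reports only the top-level keys because the exact schema is not documented here.

```python
import json
from pathlib import Path


def summarize_runs(runs_dir):
    """Return {run_name: sorted top-level keys of result.json} for each run."""
    summary = {}
    for result_file in sorted(Path(runs_dir).glob("*/result.json")):
        with result_file.open() as fh:
            data = json.load(fh)
        summary[result_file.parent.name] = sorted(data)
    return summary


if __name__ == "__main__":
    # Prints nothing if the directory does not exist or holds no runs.
    for run, keys in summarize_runs("benchmarks/results/runs").items():
        print(run, keys)
```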

Configuration

Environment Variables

Variable	Description
`LMP` / `LAMMPS_PATH`	Path to LAMMPS binary
`QE_CPU` / `QE_PATH`	Path to QE binaries directory
`QE_GPU`	Path to GPU-enabled QE (optional)
`MP_API_KEY`	Materials Project API key
`HPC_USER`	HPC cluster username
`HPC_HOST`	HPC login node hostname
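
The table lists two variable names for the LAMMPS binary. One way such a lookup might work is sketched below; `resolve_lammps` is a hypothetical helper, and both the precedence (`LMP` before `LAMMPS_PATH`) and the fallback to an `lmp` executable on `PATH` are assumptions, not documented behavior.

```python
import os
import shutil


def resolve_lammps(env=None):
    """Return the LAMMPS binary path from LMP or LAMMPS_PATH, else search PATH."""
    env = os.environ if env is None else env
    for name in ("LMP", "LAMMPS_PATH"):  # precedence order is an assumption
        value = env.get(name)
        if value:
            return value
    return shutil.which("lmp")  # None if no `lmp` executable is found


print(resolve_lammps({"LAMMPS_PATH": "/opt/lammps/bin/lmp"}))
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.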

Files

File	Purpose
`AGENTS.md`	Primary agent context (industry standard)
`CLAUDE.md`	Claude Code-specific wrapper
`skills/`	Skill definitions (LAMMPS, QE, HPC, etc.)
`configs/`	Agent-specific configurations
`.claude/settings.json`	Claude Code permissions
`.mcp.json`	MCP server configuration
`config.yaml`	User-specific configuration

Project Structure

agentic-science-worker/
├── AGENTS.md                 # Primary agent context (industry standard)
├── CLAUDE.md                 # Claude Code-specific wrapper
├── config.example.yaml       # Configuration template
├── skills/                   # Skill definitions (portable)
│   ├── lammps-simulation/    # Molecular dynamics
│   ├── quantum-espresso/     # DFT calculations
│   ├── vast-cloud/           # VAST.ai cloud GPU management
│   ├── mlip-simulation/      # ML potentials (MACE, CHGNet)
│   ├── literature-search/    # Paper search and extraction
│   ├── materials-database/   # Materials Project queries
│   ├── data-analysis/        # Property calculations, plotting
│   ├── resource-acquisition/ # Finding parameters, structures
│   └── archive/              # Archived skills (HPC, etc.)
├── benchmarks/
│   ├── CURRENT_STATUS.md     # Live dashboard
│   ├── tasks/                # Benchmark definitions (YAML)
│   │   ├── tier1_basic/      # Foundation tasks
│   │   ├── tier10_frontier/  # Novel discovery, XRD reasoning
│   │   ├── tier17_cloud_gpu/ # VAST.ai lifecycle tests
│   │   ├── tier18_data_analysis/  # MSD, plotting, errors
│   │   └── ...
│   ├── evaluation/           # Harness and graders
│   │   ├── harness.py
│   │   ├── grader.py
│   │   ├── llm_grader.py
│   │   └── vast_safety.py    # Cloud instance cleanup
│   └── results/              # Benchmark results
├── examples/                 # Canonical workflow examples
├── research/                 # Research methodology docs
└── workspaces/               # Agent work directories (gitignored)

Cloud GPU Integration (VAST.ai)

The agent can provision and use cloud GPUs via VAST.ai for overflow compute.

Install the VAST.ai CLI and authenticate:

pip install vastai
vastai set api-key YOUR_API_KEY

The agent can then autonomously:

Search for cost-effective GPUs (vastai search offers)
Provision instances with appropriate images
Transfer files via SCP and run calculations
Monitor job progress remotely
Clean up every time by destroying instances after completion

Built-in safety:

Instance labeling for tracking (BENCH-* prefix)
Post-benchmark orphan detection
Cost tracking and limits
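
The labeling and orphan-detection ideas above can be illustrated with a small filter. This is a sketch, not the logic in vast_safety.py: `find_orphans` is a hypothetical function, the `id` and `label` field names are assumptions about the instance records, and the sample data is hand-written.

```python
def find_orphans(instances, prefix="BENCH-", active_labels=()):
    """Return instances labeled with the prefix that no active run claims."""
    active = set(active_labels)
    return [
        inst
        for inst in instances
        if (inst.get("label") or "").startswith(prefix)
        and inst["label"] not in active
    ]


# Hand-written sample records; real input would come from the VAST.ai API or CLI.
instances = [
    {"id": 101, "label": "BENCH-T17-001"},
    {"id": 102, "label": "BENCH-T17-002"},
    {"id": 103, "label": None},            # unlabeled: not ours, left alone
    {"id": 104, "label": "personal-dev"},  # different prefix: left alone
]
orphans = find_orphans(instances, active_labels=["BENCH-T17-002"])
print([inst["id"] for inst in orphans])
```

Filtering on the label prefix means instances the user started for other purposes are never candidates for cleanup.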

Example workflow the agent handles:

Local: Prepare inputs → Cloud: Run GPU job → Local: Analyze results

HPC cluster support is archived; see skills/archive/hpc-cluster-curc/ if needed.

ML Potentials (Optional)

For ML-accelerated simulations, install additional packages:

pip install mace-torch matgl chgnet ase phonopy

The agent can then use universal ML interatomic potentials (MACE, M3GNet, CHGNet) for:

Fast property screening
Large-scale MD simulations
Phonon calculations

Contributing

Contributions welcome! Areas of interest:

New benchmark tasks
Additional skills (VASP, CP2K, etc.)
New agent backends (OpenAI Codex, etc.)
Improved grading rubrics
Documentation

See CONTRIBUTING.md for developer tips and ROADMAP.md for planned features.

License

MIT License - see LICENSE file.

Acknowledgments

Built on Claude Code by Anthropic
Uses Materials Project for structures
Literature search via Semantic Scholar
