TMC-Llama is a language model fine-tuned from Meta's open-source Llama 3 (Llama-3.2-1B-Instruct) to generate transition metal complexes (TMCs) in SMILES notation. It uses TMC-SMILES (Rasmussen et al.), a format designed for RDKit-compatible metal–organic structures. Given target chemical properties in the supervised fine-tuning (SFT) prompts, TMC-Llama generates TMCs in the targeted regions of chemical space, supporting discovery and screening workflows.
The accompanying paper analyzes unparsable SMILES (see Notebook 2) and describes failure modes of generated TMCs. We link these failure modes to molecular features and use them to improve SFT protocols and post-generation corrections. These insights can support future tools for generating chemically valid TMCs.
TMC-Llama inference requires PyTorch, Transformers, and RDKit:
- PyTorch: torch. TMC-Llama uses CUDA (version 11.8) to run PyTorch.
- Transformers: Hugging Face transformers. You can set custom CACHE directories if needed.
- RDKit: RDKit
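Before opening the notebooks, it can help to confirm that the three prerequisites are importable and to point the Hugging Face cache at a custom location. The following is a minimal sketch, not part of this repository: the helper names `missing_packages` and `set_hf_cache` are hypothetical, and using the `HF_HOME` environment variable for the cache location is an assumption about how you may want to configure it.

```python
import importlib.util
import os

# Top-level import names of the prerequisites listed above.
REQUIRED = ("torch", "transformers", "rdkit")


def missing_packages(names=REQUIRED):
    """Return the package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def set_hf_cache(path):
    """Point the Hugging Face cache at `path` (assumed mechanism: HF_HOME).

    Call this before importing transformers so the setting takes effect.
    """
    os.environ["HF_HOME"] = str(path)
    return os.environ["HF_HOME"]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

If anything is reported missing, install it in the same virtual environment that runs the notebooks.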
Inference utilities are in libllama/, adapted from the SmileyLlama project. The virtual environment setup matches SmileyLlama.
The notebooks rely on code in libTMC/ and libllama/. Make sure RDKit and the other prerequisites above are installed. libTMC/ provides Python utilities for:
- Detecting transition metal centers
- Extracting ligands
- Fixing redundant dative bonds
- Correcting improper valences and unclosed rings
- Parsing TMC-SMILES, redirecting I/O streams, and identifying errors
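To illustrate the first utility in the list, the sketch below finds transition metal centers directly in a SMILES string by scanning bracket atoms. It is an illustrative stand-in, not the libTMC implementation, which is not shown here and may work on RDKit molecule objects instead. The function name `find_metal_centers` and the metal set (d-block elements of periods 4 to 6) are assumptions.

```python
import re

# Assumed definition of "transition metal": d-block elements, periods 4 to 6.
TRANSITION_METALS = frozenset(
    "Sc Ti V Cr Mn Fe Co Ni Cu Zn "
    "Y Zr Nb Mo Tc Ru Rh Pd Ag Cd "
    "Hf Ta W Re Os Ir Pt Au Hg".split()
)

# A bracket atom starts with "[", an optional isotope number, then the element
# symbol. Aromatic (lowercase) symbols such as [nH] are deliberately skipped.
_BRACKET_ATOM = re.compile(r"\[\d*([A-Z][a-z]?)")


def find_metal_centers(smiles):
    """Return the transition metal symbols found in bracket atoms, in order."""
    symbols = (match.group(1) for match in _BRACKET_ATOM.finditer(smiles))
    return [symbol for symbol in symbols if symbol in TRANSITION_METALS]


if __name__ == "__main__":
    # Cisplatin written with dative bonds from the ammine ligands.
    print(find_metal_centers("Cl[Pt](Cl)(<-[NH3])<-[NH3]"))
```

A string-level scan like this is cheap enough to run on unparsable outputs, which is useful when the parser itself rejects the structure.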
Example datasets and outputs (.csv files) are in the data/ directory.
The inference notebook (Notebook 4) generates SMILES strings in the example text format (such as example.txt in txt/). Cleaned TMC-SMILES (with duplicate strings removed) and parsability errors are stored in the E_*.csv and B_*.csv files in par/.
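The cleaning step, removing identical strings, can be sketched as an order-preserving exact-match filter. This is an assumed approach for illustration, not the repository's code: the function name `dedupe_smiles` is hypothetical, and the sketch compares raw strings without canonicalizing them, so two different spellings of the same complex are both kept.

```python
def dedupe_smiles(lines):
    """Drop blank lines and exact duplicate strings, keeping first occurrences."""
    seen = set()
    unique = []
    for line in lines:
        smiles = line.strip()
        if not smiles or smiles in seen:
            continue
        seen.add(smiles)
        unique.append(smiles)
    return unique


if __name__ == "__main__":
    generated = [
        "Cl[Pt](Cl)(<-[NH3])<-[NH3]\n",
        "\n",
        "Cl[Pt](Cl)(<-[NH3])<-[NH3]\n",
        "[Fe+2].[Cl-].[Cl-]\n",
    ]
    for smiles in dedupe_smiles(generated):
        print(smiles)
```

Any iterable of strings works as input, so an open text file can be passed directly.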
TMC-Llama is built on SmileyLlama. Install axolotl following the Installation guide. The fine-tuning dataset and SFT prompts are available on FigShare.
To run inference:
- Download the trained models from FigShare
- Follow the instructions in Notebook 4 (inference guideline)
See the LICENSE file for details.
We thank all contributors who developed TMC-Llama and built this project. Related Llama3 applications for chemistry are available in SmileyLlama and SynLlama.
If you use this code in your research, please cite:
@misc{tmc_llama_2025,
title = {Exploring Transition Metal Complexes with Large Language Models},
url = {https://chemrxiv.org/engage/chemrxiv/article-details/69136d39a10c9f5ca1c14847},
doi = {10.26434/chemrxiv-2025-hm3zb},
publisher = {ChemRxiv},
author = {Liu, Yunsheng and Cavanagh, Joseph and Sun, Kunyang and Toney, Jacob and Yuan, Chung-Yueh and Smith, Andrew and St Michel II, Roland and Graggs, Paul and Toste, F Dean and Kulik, Heather and Head-Gordon, Teresa},
month = nov,
year = {2025}}