THGLab/TMC-Llama

TMC-Llama: Exploring Transition Metal Complexes with Large Language Models

License
DOI
ChemRxiv

📖 Introduction

TMC-Llama is a language model fine-tuned from Meta's open-source Llama3 (Llama-3.2-1b-Instruct) for generating transition metal complexes (TMCs) in SMILES notation. It uses TMC-SMILES (Rasmussen et al.), a format designed for RDKit-compatible metal–organic structures. Given target chemical properties in the supervised fine-tuning (SFT) prompts, TMC-Llama generates TMCs in desired chemical regions, supporting discovery and screening workflows.

The accompanying paper analyzes unparsable SMILES (see Notebook 2) and describes failure modes of generated TMCs. We link these failure modes to molecular features and use them to improve SFT protocols and post-generation corrections. These insights can support future tools for generating chemically valid TMCs.

🔍 How to Use

📕 Llama3 Environment

TMC-Llama inference requires PyTorch, Transformers, and RDKit.
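A minimal environment sketch (package names and the environment name are illustrative; follow the SmileyLlama environment setup described below for the exact versions):

```shell
# Create an isolated environment and install the core inference dependencies.
# Versions are not pinned here; mirror the SmileyLlama setup for exact pins.
python -m venv tmc-llama-env
source tmc-llama-env/bin/activate
pip install torch transformers rdkit
```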

Inference utilities are in libllama/, adapted from the SmileyLlama project. The virtual environment setup matches SmileyLlama.

📗 Running Jupyter Notebook Demonstrations

The notebooks rely on code in libTMC/ and libllama/. Make sure RDKit and the other prerequisites above are installed. libTMC/ provides Python utilities for:

  • Detecting transition metal centers
  • Extracting ligands
  • Fixing redundant dative bonds
  • Correcting improper valences and unclosed rings
  • Parsing TMC-SMILES, redirecting I/O streams, and identifying errors
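As a rough illustration of the first utility, a bracketed transition-metal atom can be located with a plain string scan. This is a standalone sketch, not libTMC's actual API: `find_metal_centers` and the metal set are hypothetical, and libTMC operates on RDKit molecule objects rather than raw strings.

```python
import re

# Illustrative subset of TMC-relevant metals; libTMC may cover the full d-block.
TRANSITION_METALS = {
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Ru", "Rh", "Pd", "Ag", "Ir", "Pt", "Au",
}

def find_metal_centers(smiles: str):
    """Return (string offset, element symbol) for bracketed transition-metal atoms."""
    hits = []
    # In SMILES, metal atoms appear in brackets, e.g. [Fe+2] or [Pt].
    for m in re.finditer(r"\[([A-Z][a-z]?)", smiles):
        if m.group(1) in TRANSITION_METALS:
            hits.append((m.start(), m.group(1)))
    return hits

# Iron(II) acetate written as disconnected SMILES fragments:
print(find_metal_centers("CC(=O)[O-].[Fe+2].[O-]C(C)=O"))  # → [(11, 'Fe')]
```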

Example datasets and outputs (.csv files) are in the data/ directory. Inference Notebook 4 generates SMILES strings in a plain-text format (such as example.txt in txt/). Cleaned TMC-SMILES (with identical strings removed) and parsability errors are stored in the E_*.csv and B_*.csv files in par/.
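The deduplication part of that cleaning step can be sketched in a few lines (a hypothetical stand-in, not the script that produced the par/ files):

```python
def dedupe_smiles(lines):
    """Drop blank lines and repeated SMILES strings, keeping first-seen order."""
    seen = set()
    unique = []
    for s in (line.strip() for line in lines):
        if s and s not in seen:
            seen.add(s)
            unique.append(s)
    return unique

# Generated output often repeats strings; only the first occurrence is kept.
raw = ["CCO", "CCO", "", "[Pt](Cl)(Cl)", "CCO"]
print(dedupe_smiles(raw))  # → ['CCO', '[Pt](Cl)(Cl)']
```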

📘 Fine-Tuning TMC-Llama

TMC-Llama is built on SmileyLlama. Install axolotl following the Installation guide. The fine-tuning dataset and SFT prompts are available on FigShare.
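For orientation, an axolotl SFT config has roughly this shape. All values below are illustrative placeholders rather than the paper's hyperparameters, and the dataset path is a stand-in for the FigShare files:

```yaml
# Illustrative axolotl config; not the settings used in the paper.
base_model: meta-llama/Llama-3.2-1B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

datasets:
  - path: data/tmc_sft_prompts.jsonl   # placeholder path for the FigShare SFT data
    type: alpaca

sequence_len: 2048
micro_batch_size: 4
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2.0e-5
optimizer: adamw_torch
output_dir: ./tmc-llama-out
```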

📙 Inference

To run inference:

  1. Download the trained models from FigShare
  2. Follow the instructions in Notebook 4 (the inference guide)

📄 License

See the LICENSE file for details.

🙏 Acknowledgments

We thank everyone who contributed to developing TMC-Llama and building this project. Related Llama3 applications for chemistry are available in SmileyLlama and SynLlama.

📝 Citation

If you use this code in your research, please cite:

@misc{tmc_llama_2025,
    title = {Exploring Transition Metal Complexes with Large Language Models},  
    url = {https://chemrxiv.org/engage/chemrxiv/article-details/69136d39a10c9f5ca1c14847},
    doi = {10.26434/chemrxiv-2025-hm3zb},
    publisher = {ChemRxiv},
    author = {Liu, Yunsheng and Cavanagh, Joseph and Sun, Kunyang and Toney, Jacob and Yuan, Chung-Yueh and Smith, Andrew and St Michel II, Roland and Graggs, Paul and Toste, F Dean and Kulik, Heather and Head-Gordon, Teresa},
    month = nov,
    year = {2025}}
