This repo contains LMF, a language model framework for designing, organizing, and running LLM experiments with Python and LangChain. It has a modular structure for building just about any type of chatbot or generative LLM task supported by LangChain.
We use LMF for applied linguistics research; see our eLex 2025 conference article.
The repo is managed with a Makefile and the uv Python package manager. The Makefile has a few commands for installing dependencies, running tests, and getting started with an example project and configurations.
So far, running models with Ollama and HuggingFace (locally) and OpenAI (paid API) is implemented.
The main command lmf is available once the .venv virtual environment is activated and the repo's dependencies are installed.
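For example, one way to set this up with uv directly (the Makefile may wrap equivalent steps):
uv sync
source .venv/bin/activate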
See lmf --help for overall usage and lmf <command> --help for individual commands.
lmf has a few primary commands:
- prepare: to prepare final prompts, from no-frills system prompts to more advanced techniques like few-shot semantic similarity example selection with a separate embeddings model and vector store
- query: to send prompts to models, configurable to allow for multiple runs, hyperparameters, chat model types, and other options
- clear: to delete data generated by prepare and query
We designed LMF to conduct applied linguistics research, with all its specific needs and quirks. Hopefully it's easy to use and modify, but it should not be considered a stable dependency. Forking the repo and reviewing new commits would be prudent.
Any LLM experiment/job/set of input data is referred to as a project. Projects are located in the project/ directory and generally require three files:
- examples.yml, with examples for compiling few-shot tasks (left empty if none)
- inputs.yml, with the human prompt(s) to send to the LLM
- system.yml, with a system prompt
Projects are independent of other configuration to allow for easily swapping LLMs, changing settings, and testing performance. Each project directory is a self-contained set of data.
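As an illustration, a project directory might look like this (the project name is hypothetical; output/ is written by prepare and query):
project/my-project/
├── examples.yml    # few-shot examples (can be left empty)
├── inputs.yml      # human prompt(s)
├── system.yml      # system prompt
└── output/         # written by prepare and query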
Here is what the wizard-of-math project looks like. It's adapted from LangChain's dynamic example selector documentation.
This project is intended for use with a semantic similarity selector, with the goal of showing an LLM how to do math with the + symbol replaced with a bird emoji and to respond to a question about horses.
An embeddings model (separate from the chat model) is used to select the most relevant examples for each input. With the lmf prepare command, the final prompts are compiled, including the system prompt and the dynamically selected examples for each input. Then lmf query is executed, sending the final prompts to the desired chat model.
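Under the hood, prepare corresponds roughly to LangChain's dynamic few-shot example selection, sketched below. This is an illustration, not LMF's actual code; the imports, vector store, and model names are assumptions based on LMF's defaults and current LangChain packaging. The project files themselves are shown next.

# Sketch of what `lmf prepare` does for semantic similarity example selection.
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings  # assumed embeddings backend

examples = [
    {"input": "2 🦜 2", "output": "4"},
    {"input": "2 🦜 3", "output": "5"},
    {"input": "Tell me about horses.", "output": "Horses are mammals."},
]

# Embed the examples once, then pick the k most similar ones for each input.
selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    HuggingFaceEmbeddings(model_name="Qwen/Qwen3-Embedding-4B"),
    InMemoryVectorStore,
    k=2,
)

few_shot = FewShotChatMessagePromptTemplate(
    input_variables=["input"],
    example_selector=selector,
    example_prompt=ChatPromptTemplate.from_messages(
        [("human", "{input}"), ("ai", "{output}")]
    ),
)

# A "final prompt": system prompt + dynamically selected examples + the input.
final_prompt = ChatPromptTemplate.from_messages(
    [("system", "You are a wondrous wizard of math."), few_shot, ("human", "{input}")]
)

print(final_prompt.invoke({"input": "What's 3 🦜 3?"}).to_messages())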
# examples.yml
- input: 2 🦜 2
output: "4"
- input: 2 🦜 3
output: "5"
- input: 2 🦜 4
output: "6"
- input: What did the cow say to the moon?
output: Nothing at all.
- input: Write me a poem about the moon.
output: One for the moon, and one for me, who are we to talk about the moon?
- input: Tell me about horses.
output: Horses are mammals.
# inputs.yml
- input: About horses...
- input: What's 3 🦜 3?
# system.yml
You are a wondrous wizard of math.
To run a task, first download the required models. LMF's default embeddings model is from HuggingFace and the chat model is from Ollama.
ollama pull qwen3:1.7b
huggingface-cli download Qwen/Qwen3-Embedding-4B
The task can then be executed in a single line:
lmf -r 3 -p wizard-of-math -f temperature-0 prepare query --temperature 0.0
Configuration:
- -r 3 defines how many runs (repeated executions) should be completed
- -p wizard-of-math defines the current project directory
- -f temperature-0 sets a filename prefix for the current command
- prepare generates the final prompts
- query --temperature 0.0 runs the task with the default model with a temperature of 0
Modified versions of the task using different models or other parameters can also be run. Outputs are saved to project/wizard-of-math/output/. For example:
lmf -p wizard-of-math -f gemma3-temp0.5 prepare query --model gemma3:12b --temperature 0.5
Each run of each version of the executed task is saved separately: just make sure filenames are set to be unique, as existing files get overwritten.
To do a more systematic evaluation of how LLMs complete a task, run a series of commands, where each command tests one configuration. For example, the commands below generate final prompts with lmf prepare using several embeddings models. We can then inspect the generated prompts to determine which embeddings model achieves the best dynamic example selection results.
lmf -p wizard-of-math clear
lmf -p wizard-of-math -f 1-qwen3-e-0.6B prepare
lmf -p wizard-of-math -f 2-nomic-embed-text prepare --embeddings Ollama --model nomic-embed-text:latest
lmf -p wizard-of-math -f 3-ollama-qwen3-1.7b prepare --embeddings Ollama --model qwen3:1.7b
The example project gets us started and doesn't require writing any code or changing underlying components of LMF. For more in-depth modifications, run lmf COMMAND --help to see what can be defined by each command. A few components are available as-is, but adding new ones to the Python modules is straightforward.
For example, query accepts different chat model providers (Ollama, OpenAI), which must be set in order to access each provider's models. By default, outputs are unstructured (a typical chatbot conversation), but a structured output can be set to return data as Python objects/JSON. The SemanticRelationTriple structured output, for instance, could be used for entity-relation extraction tasks.
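As an illustration (the project name is a placeholder, the prompts are assumed to have been prepared already, and OPENAI_API_KEY must be set), a structured-output run against OpenAI could look like:
lmf -p my-project -f openai-triples query --chat-model OpenAI --model gpt-4o-mini --output-structure SemanticRelationTriple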
More likely, you'll need to define your own component classes. New components can be added to the respective Python module, such as schema.py for structured outputs. Append your own modifications above the line ### add new classes above this line ###, using the default classes as a reference, and your new component will automatically be available in the CLI, e.g., by executing lmf ... query --output-structure MyNewStructuredOutput.
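As a sketch, a new structured output in schema.py might look like the class below. This assumes the existing schema classes are Pydantic models (the usual pattern for LangChain structured outputs); mirror the default classes such as SemanticRelationTriple for the exact convention.

# schema.py (sketch): a hypothetical structured output class.
# Assumes structured outputs are Pydantic models, as is typical with LangChain.
from pydantic import BaseModel, Field

class MyNewStructuredOutput(BaseModel):
    """A term and its definition extracted from the response."""
    term: str = Field(description="The term being defined")
    definition: str = Field(description="A one-sentence definition of the term")

### add new classes above this line ###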
Usage: lmf query [OPTIONS]
Executes LLM final prompts with a model, model provider and output
structure.
Options:
-m, --model TEXT Name of model (download models beforehand)
[default: qwen3:1.7b]
--chat-model CHAT_MODEL.PY A chat model class from chat.py
[default: Ollama]
--chat-model-param TEXT A parameter to pass to the chat model in the
format 'key=value'
-o, --output-structure SCHEMA.PY
A structured output class from schema.py
[default: Unstructured]
--sample INTEGER Sample size (run first N prompts in a file;
0 == all) [default: 0]
--random / --no-random Toggle sample randomization [default: no-
random]
--temperature FLOAT RANGE Model temperature (0.0 = more deterministic
/ 1.0 = more variable) [default: 0.0;
0.0<=x<=1.0]
--timeout INTEGER Response timeout (for cloud providers)
[default: 300]
--max-tokens INTEGER Model maximum tokens per response [default:
10000]
--think / --no-think Toggle model thinking [default: no-think]
--rate-limiter RATE_LIMITER.PY A rate limiter class from rate_limiter.py
[default: NoRateLimiter]
--help Show this message and exit.
RECIPES *case insensitive*
Chat_models:
- Ollama
- OpenAI
Output_structures:
- Unstructured
- UnstructuredThink
- Hypernym
- Entity
- EntityList
- SemanticRelationTriple
- EntityRelationExtractor
Rate_limiters:
- NoRateLimiter
- Memory
Setting environment variables may be necessary, like the example below.
# use huggingface offline
HF_HUB_OFFLINE=1
# pytorch settings
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# API keys for external providers
OPENAI_API_KEY=
Please cite this paper:
@inproceedings{isaacs_inductive_2025,
address = {Bled, Slovenia},
title = {Inductive {Categorization} for {Conceptual} {Analysis} with {LLMs}: {A} {Case} {Study} from the {Humanitarian} {Encyclopedia}},
url = {https://elex.link/elex2025/wp-content/uploads/eLex2025-50-Isaacs_etal.pdf},
booktitle = {Electronic lexicography in the 21st century ({eLex} 2025): {Intelligent} lexicography. {Proceedings} of the {eLex} 2025 conference},
publisher = {Lexical Computing},
author = {Isaacs, Loryn and Chambó, Santiago and León-Araúz, Pilar},
editor = {Kosem, Iztok and Jakubíček, Miloš and Medveď, Marek and Zgaga, Karolina and Arhar Holdt, Špela and Munda, Tina and Salgado, Ana},
year = {2025},
pages = {866--887},
}