
engisalor/lmf


LMF: a CLI for generative language model experiments with LangChain

This repo is for designing, organizing and running LLM experiments with Python and LangChain (a language model framework, LMF). It has a modular structure for building just about any type of chatbot or generative LLM task supported by LangChain.

We use LMF for applied linguistics research; see our eLex 2025 conference article.

Introduction

The repo is managed with a Makefile and the uv Python package manager. The Makefile has a few commands for defining dependencies, running tests, and getting started with an example project and configurations.

So far, Ollama and HuggingFace (run locally) and OpenAI (paid API) are supported as model providers.

CLI basics

The main command, lmf, is available once the .venv virtual environment is activated and the repo's dependencies are installed.

See lmf --help for overall usage and lmf <command> --help for individual commands.

lmf has a few primary commands:

  • prepare to prepare final prompts: from no-frills system prompts to more advanced techniques like few-shot semantic similarity example selection with a separate embeddings model and vector store
  • query to send prompts to models: configurable to allow for multiple runs, hyperparameters, chat model types, and other options
  • clear to delete data generated by prepare and query

Design and stability

We designed LMF to conduct applied linguistics research, with all its specific needs and quirks. Hopefully it's easy to use and modify, but it should not be considered a stable dependency. Forking the repo and reviewing new commits would be prudent.

Understanding projects

Any LLM experiment/job/set of input data is referred to as a project. Projects are located in the project/ directory and generally require three files:

  • examples.yml with examples for compiling few-shot tasks (left empty if none)
  • inputs.yml with human prompt(s) to send to the LLM
  • system.yml with a system prompt

Projects are independent of other configurations to allow for easily swapping LLMs, changing configurations, and testing performance. Each project directory is a self-contained set of data.

Example project

Here is what the wizard-of-math project looks like. It's adapted from LangChain's dynamic example selector documentation.

This project is intended for use with a semantic similarity selector: the goal is to show an LLM how to do math with the + symbol replaced by a parrot emoji, and to respond to a question about horses.

An embeddings model (separate from the chat model) is used to select the most relevant examples for each input. The lmf prepare command compiles the final prompts, including the system prompt and the dynamically selected examples for each input. Then lmf query sends the final prompts to the desired chat model.
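As a toy illustration of the selection step (this sketch is not LMF's implementation — LMF uses LangChain selectors backed by a real embeddings model and vector store), ranking examples by similarity to an input might look like:

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. A real run uses a
    # neural embeddings model (e.g. Qwen/Qwen3-Embedding-4B).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_examples(examples: list[dict], query: str, k: int = 2) -> list[dict]:
    # Rank few-shot examples by similarity to the query and keep the top k.
    ranked = sorted(
        examples,
        key=lambda e: cosine(embed(e["input"]), embed(query)),
        reverse=True,
    )
    return ranked[:k]

examples = [
    {"input": "2 🦜 2", "output": "4"},
    {"input": "2 🦜 3", "output": "5"},
    {"input": "Tell me about horses.", "output": "Horses are mammals."},
]
# For a math-style query, the parrot-math examples rank highest.
print(select_examples(examples, "What's 3 🦜 3?", k=2))
```

The real selector works the same way in spirit: each input is embedded, compared against the stored example embeddings, and only the closest examples are compiled into that input's final prompt.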

Initial project data

# examples.yml
- input: 2 🦜 2
  output: "4"
- input: 2 🦜 3
  output: "5"
- input: 2 🦜 4
  output: "6"
- input: What did the cow say to the moon?
  output: Nothing at all.
- input: Write me a poem about the moon.
  output: One for the moon, and one for me, who are we to talk about the moon?
- input: Tell me about horses.
  output: Horses are mammals.

# inputs.yml
- input: About horses...
- input: What's 3 🦜 3?

# system.yml
You are a wondrous wizard of math.

Task execution

To run a task, first download the required models. LMF's default embeddings model is from HuggingFace and the chat model is from Ollama.

ollama pull qwen3:1.7b
huggingface-cli download Qwen/Qwen3-Embedding-4B

The task can then be executed in a single line:

lmf -r 3 -p wizard-of-math -f temperature-0 prepare query --temperature 0.0

Configuration:

  • -r 3 defines how many runs (repeated executions) should be completed
  • -p wizard-of-math defines the current project directory
  • -f temperature-0 sets a filename prefix for the current command
  • prepare generates the final prompts
  • query --temperature 0.0 runs the task with the default model with a temperature of 0

Modified versions of the task using different models or other parameters can also be run. Outputs are saved to project/wizard-of-math/output/. For example:

lmf -p wizard-of-math -f gemma3-temp0.5 prepare query --model gemma3:12b --temperature 0.5

Each run of each version of the executed task is saved separately; just make sure filename prefixes are unique, as existing files get overwritten.

To evaluate more systematically how LLMs complete a task, run a series of commands, each testing one configuration. For example, the commands below generate final prompts with lmf prepare using several embeddings models. The generated prompts can then be inspected to determine which embeddings model achieves the best dynamic example selection results.

lmf -p wizard-of-math clear
lmf -p wizard-of-math -f 1-qwen3-e-0.6B prepare
lmf -p wizard-of-math -f 2-nomic-embed-text prepare --embeddings Ollama --model nomic-embed-text:latest
lmf -p wizard-of-math -f 3-ollama-qwen3-1.7b prepare --embeddings Ollama --model qwen3:1.7b

Components and recipes

The example project gets us started without writing any code or changing LMF's underlying components. For more in-depth modifications, run lmf COMMAND --help to see what each command can define. A few components are available as-is, but adding new ones to the Python modules is straightforward.

For example, query accepts different chat model providers (Ollama, OpenAI), which must be set to access each provider's models. Also, default outputs are unstructured (a typical chatbot conversation), but structured outputs can be set to return data as Python objects/JSON. For example, the SemanticRelationTriple structured output could be used for entity-relation extraction tasks.

More likely, you'll need to define your own component classes. New components can be added to the respective Python module, such as schema.py for structured outputs. Add your own classes above the line marked ### add new classes above this line ###, using the default classes as a reference, and your new component will automatically be available in the CLI, e.g., by executing lmf ... query --output-structure MyNewStructuredOutput.
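For instance, a new structured output might look like the sketch below. Hyponym is a hypothetical class, and the exact base-class convention should be copied from the defaults in schema.py — an Annotated TypedDict like this is one of the schema forms LangChain's structured-output machinery accepts, alongside Pydantic models:

```python
from typing import Annotated, TypedDict

class Hyponym(TypedDict):
    """Hypothetical structured output: a term paired with one of its hyponyms."""

    # The Annotated metadata (default, description) can be read by
    # LangChain's structured-output tooling to build the response schema.
    term: Annotated[str, ..., "The broader, more general term"]
    hyponym: Annotated[str, ..., "A narrower term subsumed by the broader term"]
```

Appended above the marker line, such a class would then be selectable with lmf ... query --output-structure Hyponym (again, a hypothetical name).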

query arguments and underlying components

Usage: lmf query [OPTIONS]

  Executes LLM final prompts with a model, model provider and output
  structure.

Options:
  -m, --model TEXT                Name of model (download models beforehand)
                                  [default: qwen3:1.7b]
  --chat-model CHAT_MODEL.PY      A chat model class from chat.py
                                  [default: Ollama]
  --chat-model-param TEXT         A parameter to pass to the chat model in the
                                  format 'key=value'
  -o, --output-structure SCHEMA.PY
                                  A structured output class from schema.py
                                  [default: Unstructured]
  --sample INTEGER                Sample size (run first N prompts in a file;
                                  0 == all)  [default: 0]
  --random / --no-random          Toggle sample randomization  [default: no-
                                  random]
  --temperature FLOAT RANGE       Model temperature (0.0 = more deterministic
                                  / 1.0 = more variable)  [default: 0.0;
                                  0.0<=x<=1.0]
  --timeout INTEGER               Response timeout (for cloud providers)
                                  [default: 300]
  --max-tokens INTEGER            Model maximum tokens per response  [default:
                                  10000]
  --think / --no-think            Toggle model thinking  [default: no-think]
  --rate-limiter RATE_LIMITER.PY  A rate limiter class from rate_limiter.py
                                  [default: NoRateLimiter]
  --help                          Show this message and exit.

  RECIPES *case insensitive*
  Chat_models:
  - Ollama
  - OpenAI
  Output_structures:
  - Unstructured
  - UnstructuredThink
  - Hypernym
  - Entity
  - EntityList
  - SemanticRelationTriple
  - EntityRelationExtractor
  Rate_limiters:
  - NoRateLimiter
  - Memory

Environment variables

Setting environment variables may be necessary, as in the example below.

# use huggingface offline
HF_HUB_OFFLINE=1
# pytorch settings
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# API keys for external providers
OPENAI_API_KEY=

Citing

Please cite this paper:

@inproceedings{isaacs_inductive_2025,
	address = {Bled, Slovenia},
	title = {Inductive {Categorization} for {Conceptual} {Analysis} with {LLMs}: {A} {Case} {Study} from the {Humanitarian} {Encyclopedia}},
	url = {https://elex.link/elex2025/wp-content/uploads/eLex2025-50-Isaacs_etal.pdf},
	booktitle = {Electronic lexicography in the 21st century ({eLex} 2025): {Intelligent} lexicography. {Proceedings} of the {eLex} 2025 conference},
	publisher = {Lexical Computing},
	author = {Isaacs, Loryn and Chambó, Santiago and León-Araúz, Pilar},
	editor = {Kosem, Iztok and Jakubíček, Miloš and Medveď, Marek and Zgaga, Karolina and Arhar Holdt, Špela and Munda, Tina and Salgado, Ana},
	year = {2025},
	pages = {866--887},
}
