Reference implementation of Cost-Effective Communication: Auction-based Language Agent Interaction (Fan et al., 2025).
Multi-agent LLM systems tend to over-communicate: every agent talks every round, every message costs tokens, and the bill grows super-linearly with the number of agents. DALA reframes inter-agent communication as an economic resource-allocation problem:
- Each agent drafts a candidate message and submits a bid proportional to the estimated informational utility of that message.
- A central auctioneer runs a (first-price / second-price / softmax) auction and picks the winning speaker(s) for the round.
- Losing messages are suppressed — their completion tokens are never charged to the global token budget, only the prompt tokens needed to generate the bid. The winner pays their completion tokens.
- The loop repeats until the budget is exhausted or max_rounds elapses.
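The winner-selection step in the list above can be sketched as follows. This is a minimal standalone illustration, not the code in `src/dala/auction/mechanism.py`: the function name `select_winner`, the `temperature` parameter, the alphabetical tie-break, the zero clearing price for a lone second-price bidder, and the choice that a softmax winner pays its own bid are all assumptions made for this example.

```python
import math
import random
from typing import Dict, Optional, Tuple


def select_winner(bids: Dict[str, float], mechanism: str = "second_price",
                  temperature: float = 1.0,
                  rng: Optional[random.Random] = None) -> Tuple[str, float]:
    """Return (winner, clearing_price) for one single-speaker round."""
    if not bids:
        raise ValueError("at least one bid is required")
    # Highest bid first; ties are broken by agent name so results are stable.
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    if mechanism == "first_price":
        # The winner pays its own bid.
        return ranked[0][0], ranked[0][1]
    if mechanism == "second_price":
        # The winner pays the runner-up's bid (assumed 0.0 if it bids alone).
        price = ranked[1][1] if len(ranked) > 1 else 0.0
        return ranked[0][0], price
    if mechanism == "softmax":
        # Sample a winner with probability proportional to exp(bid / T).
        # Subtracting the top bid keeps the exponentials numerically stable.
        rng = rng or random.Random(0)
        top = ranked[0][1]
        names = [name for name, _ in ranked]
        weights = [math.exp((bid - top) / temperature) for _, bid in ranked]
        winner = rng.choices(names, weights=weights, k=1)[0]
        return winner, bids[winner]
    raise ValueError(f"unknown mechanism: {mechanism}")


bids = {"generalist": 0.4, "math": 0.9, "code": 0.7}
print(select_winner(bids, "first_price"))   # ('math', 0.9)
print(select_winner(bids, "second_price"))  # ('math', 0.7)
```

Passing a seeded `random.Random` makes the softmax branch reproducible, which is one way to keep a stochastic mechanism testable.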
A small learnable head (one bid-aggressiveness coefficient α per agent) is
tuned via a REINFORCE-style signal on the validation reward
accuracy − λ · tokens/budget, matching the ablation in §4.3 of the paper.
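To make the reward concrete, here is a small sketch of the reward formula and one possible REINFORCE-style update for a single α. Only the reward expression comes from the text above. The Gaussian exploration around α, the mean-reward baseline, and the names `validation_reward`, `reinforce_step`, `lam`, and `sigma` are assumptions for illustration; the repository's training loop may differ.

```python
from typing import Sequence


def validation_reward(accuracy: float, tokens: float, budget: float,
                      lam: float = 0.5) -> float:
    """Reward from the text: accuracy - lambda * tokens / budget."""
    return accuracy - lam * tokens / budget


def reinforce_step(alpha: float, sampled_alphas: Sequence[float],
                   rewards: Sequence[float], lr: float = 0.01,
                   sigma: float = 0.1) -> float:
    """One score-function update of alpha (assumed Gaussian exploration).

    Each sampled alpha is assumed to be drawn from N(alpha, sigma^2), so the
    gradient of its log-probability w.r.t. alpha is (sample - alpha) / sigma^2.
    """
    if not rewards or len(sampled_alphas) != len(rewards):
        raise ValueError("need one reward per sampled alpha")
    # Subtracting the mean reward as a baseline reduces variance.
    baseline = sum(rewards) / len(rewards)
    grad = sum((r - baseline) * (a - alpha) / sigma ** 2
               for a, r in zip(sampled_alphas, rewards)) / len(rewards)
    # Gradient ascent: move alpha toward samples that earned above-baseline reward.
    return alpha + lr * grad


# Two rollouts: the more aggressive bidder (alpha = 1.1) earned the higher reward.
rewards = [validation_reward(0.70, 1024, 2048), validation_reward(0.80, 1024, 2048)]
new_alpha = reinforce_step(1.0, [0.9, 1.1], rewards)
print(new_alpha > 1.0)  # True: alpha moves toward the better-rewarded sample
```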
DALA/
├── src/dala/
│ ├── agents/ # bidder agent + auctioneer dialogue driver
│ │ ├── base.py
│ │ ├── bidder.py
│ │ └── auctioneer.py
│ ├── auction/ # mechanisms, budget tracker, utility estimators
│ │ ├── mechanism.py # FirstPrice / SecondPrice / Softmax
│ │ ├── budget.py # BudgetTracker + BudgetExhausted
│ │ └── utility.py # Logprob / Entropy / Heuristic
│ ├── llm/ # pluggable backends
│ │ ├── base.py
│ │ ├── mock_backend.py # deterministic, for CI / smoke runs
│ │ ├── openai_backend.py # OpenAI chat-completions
│ │ ├── openrouter_backend.py # OpenRouter (any upstream model)
│ │ ├── hf_backend.py # transformers + apply_chat_template
│ │ └── registry.py
│ ├── data/ # Lightning DataModules for MMLU, GSM8K, HumanEval
│ ├── models/ # DALALightningModule
│ ├── metrics/ # accuracy + token-cost + cost-efficiency
│ ├── utils/ # seeding, logging, answer parsing
│ └── config.py # YAML → DataModule + LightningModule
├── configs/ # default.yaml, mmlu.yaml, gsm8k.yaml, humaneval.yaml,
│ # budget_sweep.yaml, openrouter.yaml
├── scripts/ # train.py, eval.py, run_experiment.py, budget_sweep.py
├── tests/ # 33 unit + end-to-end tests
└── experiments/ # smoke.sh
pip install -e .[dev]
pytest -q # 33 tests, all green
bash experiments/smoke.sh # 4-example smoke run
python scripts/run_experiment.py --config configs/default.yaml

Everything above runs fully offline on the built-in MockBackend, with no
API keys required. That's what makes CI deterministic and lets you iterate
on the auction logic without spending a cent.
python scripts/train.py --config configs/mmlu.yaml

This fits the per-agent α coefficients on the training split and reports
val/accuracy, val/mean_tokens, and val/cost_efficiency.
python scripts/budget_sweep.py --config configs/budget_sweep.yaml \
    --output outputs/budget_sweep.json

This runs DALA at budgets {256, 512, 1024, 1536, 2048} tokens/example and dumps
the accuracy-cost curve to JSON.
| Backend | YAML key | Install | Auth |
|---|---|---|---|
| Mock | mock | (built-in) | none |
| OpenAI | openai | pip install dala[openai] | OPENAI_API_KEY |
| OpenRouter | openrouter | pip install dala[openai] | OPENROUTER_API_KEY |
| HuggingFace | hf | pip install dala[hf] | none |
OpenRouter is a single OpenAI-compatible endpoint that proxies dozens of upstream providers (Anthropic, OpenAI, Mistral, Together, Groq, DeepSeek, …). One API key, one line of YAML, any model:
model:
  backend: openrouter
  backend_kwargs:
    model: anthropic/claude-3.5-sonnet  # or openai/gpt-4o-mini, meta-llama/llama-3.1-70b-instruct, ...
    http_referer: https://github.com/waltstephen/Cost-Effective-Communication
    app_title: DALA

export OPENROUTER_API_KEY=sk-or-...
python scripts/run_experiment.py --config configs/openrouter.yaml

The backend gracefully falls back to a word-count token estimate when an upstream provider doesn't return usage metadata, so budget accounting stays honest.
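The fallback described above might look roughly like the sketch below. The helper name `count_tokens` and its signature are hypothetical and not taken from `openrouter_backend.py`. The `completion_tokens` key follows the usage object that OpenAI-compatible chat-completion responses return, and the fallback is a plain whitespace word count.

```python
from typing import Mapping, Optional


def count_tokens(text: str, usage: Optional[Mapping[str, int]] = None,
                 key: str = "completion_tokens") -> int:
    """Prefer provider-reported usage; otherwise estimate by word count."""
    if usage is not None and usage.get(key) is not None:
        # The provider reported usage, so trust its number.
        return int(usage[key])
    # No usage metadata: approximate tokens by whitespace-separated words.
    return len(text.split())


print(count_tokens("bid on this message", {"completion_tokens": 7}))  # 7
print(count_tokens("bid on this message"))                            # 4
```

A word count usually undercounts subword tokens, so a fallback like this is a rough estimate rather than an exact charge.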
from dala.llm import get_backend, list_backends
print(list_backends()) # ['hf', 'mock', 'openai', 'openrouter']
b = get_backend("openrouter", model="openai/gpt-4o-mini")

Every experiment is a single YAML file composed of three sections:
data:
  name: mmlu                # mmlu | gsm8k | humaneval
  max_examples: 64
  factory:
    use_hf: true            # flip to false for offline fallback
model:
  n_agents: 3
  mechanism: second_price   # first_price | second_price | softmax
  max_rounds: 4
  top_k: 1                  # speakers per round
  budget: 2048              # global token budget per example
  backend: mock             # mock | openai | openrouter | hf
  specialties: [generalist, math, code]
  alpha_init: 1.0
  lr: 0.01
trainer:                    # forwarded to pl.Trainer
  max_epochs: 3
  accelerator: cpu

DALA tracks three things per run:
- Accuracy: task-specific (MCQ letter match for MMLU, numeric match for GSM8K, return-presence heuristic for HumanEval smoke runs).
- Token cost: total prompt+completion tokens charged to the BudgetTracker, broken down per agent.
- Cost efficiency: the headline number, accuracy / (mean_tokens / 1000).
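As a worked example of the headline metric, the formula above can be computed directly. The function name `cost_efficiency` and the guard against non-positive token counts are assumptions for this sketch, not the code in `src/dala/metrics/`.

```python
def cost_efficiency(accuracy: float, mean_tokens: float) -> float:
    """Accuracy per thousand tokens: accuracy / (mean_tokens / 1000)."""
    if mean_tokens <= 0:
        raise ValueError("mean_tokens must be positive")
    return accuracy / (mean_tokens / 1000.0)


# 75% accuracy at a mean of 1500 tokens per example.
print(cost_efficiency(0.75, 1500))  # 0.5
```

Under this definition, halving the mean token count at equal accuracy doubles the score.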
pytest -q # 33 tests, ~1s on CPU
pytest --cov=dala --cov-report=term-missing

Tests cover the auction mechanisms (winner selection, clearing price, reserve price, top-k, softmax determinism), the budget tracker (charging, exhaustion, reset), utility estimators, the mock backend's determinism, the answer parsers, each dataset's offline fallback, the backend registry, and an end-to-end auctioneer → Lightning-module smoke path.
@article{fan2025dala,
title = {Cost-Effective Communication: Auction-based Language Agent Interaction},
author = {Fan, Yijia and Zhang, Jusheng and Cai, Kaitong and Yang, Jing
and Tang, Chengpei and Wang, Jian and Wang, Keze},
journal = {arXiv preprint arXiv:2511.13193},
year = {2025}
}

Licensed under MIT; see LICENSE.