
waltstephen/Cost-Effective-Communication


DALA — Dynamic Auction-based Language Agent

Reference implementation of Cost-Effective Communication: Auction-based Language Agent Interaction (Fan et al., 2025).

Multi-agent LLM systems tend to over-communicate: every agent talks every round, every message costs tokens, and the bill grows super-linearly with the number of agents. DALA reframes inter-agent communication as an economic resource-allocation problem:

  1. Each agent drafts a candidate message and submits a bid proportional to the estimated informational utility of that message.
  2. A central auctioneer runs a (first-price / second-price / softmax) auction and picks the winning speaker(s) for the round.
  3. Losing messages are suppressed: their completion tokens are never charged to the global token budget; only the prompt tokens needed to generate the bid are. The winner pays for its own completion tokens.
  4. The loop repeats until the budget is exhausted or max_rounds rounds have elapsed.
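The four steps can be sketched in plain Python. This is a hedged illustration, not the code in src/dala/auction/: the BudgetTracker and BudgetExhausted names mirror the repository layout, but their fields, the hypothetical run_auction and play_round helpers, the (bid, prompt_tokens, completion_tokens, text) draft tuple, and the uniform (k+1)-th-price rule used for second-price auctions with top_k > 1 are all assumptions.

```python
import math
import random
from dataclasses import dataclass


class BudgetExhausted(Exception):
    """Raised when a charge would exceed the global token budget."""


@dataclass
class BudgetTracker:
    budget: int
    spent: int = 0

    def charge(self, tokens: int) -> None:
        if self.spent + tokens > self.budget:
            raise BudgetExhausted(f"{self.spent + tokens} > {self.budget}")
        self.spent += tokens


def run_auction(bids, mechanism="second_price", top_k=1, temperature=1.0, rng=None):
    """Hypothetical helper: return (winner indices, clearing prices) for one round."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    if mechanism == "softmax":
        rng = rng or random.Random(0)  # fixed seed keeps the sketch deterministic
        pool, winners = list(range(len(bids))), []
        for _ in range(min(top_k, len(bids))):
            m = max(bids[i] for i in pool)  # subtract max for numerical stability
            weights = [math.exp((bids[i] - m) / temperature) for i in pool]
            pick = rng.choices(pool, weights=weights, k=1)[0]
            winners.append(pick)
            pool.remove(pick)
        return winners, [bids[i] for i in winners]
    winners = order[:top_k]
    if mechanism == "first_price":
        return winners, [bids[i] for i in winners]  # winners pay their own bids
    if mechanism == "second_price":
        # Assumed uniform price: every winner pays the highest losing bid (0 if none).
        price = bids[order[top_k]] if len(bids) > top_k else 0.0
        return winners, [price] * len(winners)
    raise ValueError(f"unknown mechanism: {mechanism}")


def play_round(drafts, tracker, mechanism="second_price", top_k=1):
    """drafts: list of (bid, prompt_tokens, completion_tokens, text) tuples."""
    for _, prompt_tokens, _, _ in drafts:
        tracker.charge(prompt_tokens)  # every bidder pays for its bid prompt
    winners, _ = run_auction([d[0] for d in drafts], mechanism, top_k)
    for i in winners:
        tracker.charge(drafts[i][2])  # only winners pay completion tokens
    return [drafts[i][3] for i in winners]
```

With second-price clearing, a winner's payment does not depend on its own bid, which is the standard argument for truthful bidding in a single-item Vickrey auction. In this sketch the clearing price is returned but not converted into tokens; how DALA maps prices to budget charges is not specified here.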

A small learnable head (one bid-aggressiveness coefficient α per agent) is tuned with a REINFORCE-style policy-gradient signal on the reward accuracy − λ · tokens/budget, matching the ablation in §4.3 of the paper.
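A minimal REINFORCE sketch for the per-agent α follows. Several pieces are assumptions, not details from the source: each agent's bid is sampled from a Gaussian with mean α·u (u being the estimated utility) and fixed σ, λ defaults to 0.5, and the baseline is an exponential moving average. The alpha_init=1.0 and lr=0.01 defaults echo the example config; the AlphaPolicy class itself is hypothetical.

```python
import random


def reward(accuracy: float, tokens: int, budget: int, lam: float = 0.5) -> float:
    """R = accuracy - lambda * tokens / budget (lam value is an assumption)."""
    return accuracy - lam * tokens / budget


class AlphaPolicy:
    """Hypothetical per-agent alpha with assumed Gaussian exploration noise."""

    def __init__(self, n_agents, alpha_init=1.0, sigma=0.1, lr=0.01, seed=0):
        self.alpha = [alpha_init] * n_agents
        self.sigma = sigma
        self.lr = lr
        self.baseline = 0.0
        self.rng = random.Random(seed)

    def sample_bids(self, utilities):
        """Sample bid_i ~ N(alpha_i * u_i, sigma^2); return bids and their means."""
        means = [a * u for a, u in zip(self.alpha, utilities)]
        bids = [self.rng.gauss(m, self.sigma) for m in means]
        return bids, means

    def update(self, utilities, bids, means, r):
        """One REINFORCE step: alpha += lr * (R - baseline) * d log p / d alpha."""
        advantage = r - self.baseline
        for i, (u, b, m) in enumerate(zip(utilities, bids, means)):
            # d/d alpha of log N(b; alpha*u, sigma^2) = (b - alpha*u) * u / sigma^2
            grad_logp = (b - m) * u / self.sigma ** 2
            self.alpha[i] += self.lr * advantage * grad_logp
        self.baseline = 0.9 * self.baseline + 0.1 * r  # EMA baseline reduces variance
```

In use, one episode calls sample_bids, runs the auction with those bids, scores the result with reward, and passes the same bids and means back to update. Bids that beat expectations on a high-reward episode push α up; the same bids on a below-baseline episode push it down.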


📐 Repository layout

DALA/
├── src/dala/
│   ├── agents/        # bidder agent + auctioneer dialogue driver
│   │   ├── base.py
│   │   ├── bidder.py
│   │   └── auctioneer.py
│   ├── auction/       # mechanisms, budget tracker, utility estimators
│   │   ├── mechanism.py   # FirstPrice / SecondPrice / Softmax
│   │   ├── budget.py      # BudgetTracker + BudgetExhausted
│   │   └── utility.py     # Logprob / Entropy / Heuristic
│   ├── llm/           # pluggable backends
│   │   ├── base.py
│   │   ├── mock_backend.py        # deterministic, for CI / smoke runs
│   │   ├── openai_backend.py      # OpenAI chat-completions
│   │   ├── openrouter_backend.py  # OpenRouter (any upstream model)
│   │   ├── hf_backend.py          # transformers + apply_chat_template
│   │   └── registry.py
│   ├── data/          # Lightning DataModules for MMLU, GSM8K, HumanEval
│   ├── models/        # DALALightningModule
│   ├── metrics/       # accuracy + token-cost + cost-efficiency
│   ├── utils/         # seeding, logging, answer parsing
│   └── config.py      # YAML → DataModule + LightningModule
├── configs/           # default.yaml, mmlu.yaml, gsm8k.yaml, humaneval.yaml,
│                      # budget_sweep.yaml, openrouter.yaml
├── scripts/           # train.py, eval.py, run_experiment.py, budget_sweep.py
├── tests/             # 33 unit + end-to-end tests
└── experiments/       # smoke.sh

🚀 Quick start

pip install -e ".[dev]"                                  # quotes stop zsh from globbing [dev]
pytest -q                                                # 33 tests, all green
bash experiments/smoke.sh                                # 4-example smoke run
python scripts/run_experiment.py --config configs/default.yaml

Everything above runs fully offline on the built-in MockBackend — no API keys required. That's what makes CI deterministic and lets you iterate on the auction logic without spending a cent.
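To illustrate why an offline backend can make runs reproducible, here is a hypothetical stand-in; it is not dala's MockBackend, whose interface isn't shown in this README. The key design choice is deriving the reply from a stable digest such as SHA-256 rather than Python's built-in hash(), which is salted per process for strings.

```python
import hashlib


class ToyMockBackend:
    """Hypothetical offline backend: the same prompt always yields the same reply."""

    def __init__(self, choices: str = "ABCD") -> None:
        self.choices = choices

    def generate(self, prompt: str) -> str:
        # sha256 is stable across processes; built-in hash() on str is salted per run.
        digest = hashlib.sha256(prompt.encode("utf-8")).digest()
        letter = self.choices[digest[0] % len(self.choices)]
        return f"The answer is ({letter})."


backend = ToyMockBackend()
reply = backend.generate("Q: 2 + 2 = ? (A) 3 (B) 4 (C) 5 (D) 6")
```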

Train the bidding policy

python scripts/train.py --config configs/mmlu.yaml

This fits the per-agent α coefficients on the training split and reports val/accuracy, val/mean_tokens, and val/cost_efficiency.

Budget-vs-accuracy sweep (Figure 3)

python scripts/budget_sweep.py --config configs/budget_sweep.yaml \
                               --output outputs/budget_sweep.json

Runs DALA at budgets {256, 512, 1024, 1536, 2048} tokens/example and dumps the accuracy-cost curve to JSON.


🔌 LLM backends

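| Backend     | YAML key   | Install                    | Auth               |
|-------------|------------|----------------------------|--------------------|
| Mock        | mock       | (built-in)                 | -                  |
| OpenAI      | openai     | pip install "dala[openai]" | OPENAI_API_KEY     |
| OpenRouter  | openrouter | pip install "dala[openai]" | OPENROUTER_API_KEY |
| HuggingFace | hf         | pip install "dala[hf]"     | -                  |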

OpenRouter

OpenRouter is a single OpenAI-compatible endpoint that proxies dozens of upstream providers (Anthropic, OpenAI, Mistral, Together, Groq, DeepSeek, …). One API key, one line of YAML, any model:

model:
  backend: openrouter
  backend_kwargs:
    model: anthropic/claude-3.5-sonnet      # or openai/gpt-4o-mini, meta-llama/llama-3.1-70b-instruct, ...
    http_referer: https://github.com/waltstephen/Cost-Effective-Communication
    app_title: DALA

export OPENROUTER_API_KEY=sk-or-...
python scripts/run_experiment.py --config configs/openrouter.yaml

When an upstream provider doesn't return usage metadata, the backend falls back to a word-count token estimate, so every call is still charged against the budget.
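A minimal sketch of that fallback, assuming the response has been decoded to a dict shaped like an OpenAI chat-completions payload (OpenRouter's API is OpenAI-compatible), whose optional usage object carries prompt_tokens and completion_tokens. The helper names are hypothetical. Note that a whitespace word count usually undercounts subword tokens for English text, so the estimate is a lower-bound-ish approximation.

```python
def estimate_tokens(text: str) -> int:
    """Crude fallback: one token per whitespace-separated word."""
    return len(text.split())


def charged_tokens(response: dict, prompt: str, completion: str) -> tuple:
    """Prefer provider-reported usage; fall back to word counts field by field."""
    usage = response.get("usage") or {}  # usage may be absent or null
    prompt_tokens = usage.get("prompt_tokens")
    completion_tokens = usage.get("completion_tokens")
    if prompt_tokens is None:
        prompt_tokens = estimate_tokens(prompt)
    if completion_tokens is None:
        completion_tokens = estimate_tokens(completion)
    return prompt_tokens, completion_tokens
```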

Switching backends from code

from dala.llm import get_backend, list_backends

print(list_backends())            # ['hf', 'mock', 'openai', 'openrouter']
b = get_backend("openrouter", model="openai/gpt-4o-mini")
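For readers wondering how a name-to-backend lookup like this is typically wired, here is a hedged sketch of a decorator-based registry. It is not the contents of dala/llm/registry.py: register_backend, build_backend, available_backends, and EchoBackend are hypothetical names chosen so they don't shadow the package's real get_backend and list_backends.

```python
from typing import Callable, Dict, List

_BACKENDS: Dict[str, Callable[..., object]] = {}


def register_backend(name: str) -> Callable:
    """Class decorator that records a backend factory under a YAML key."""
    def decorator(factory):
        if name in _BACKENDS:
            raise ValueError(f"backend {name!r} already registered")
        _BACKENDS[name] = factory
        return factory
    return decorator


def build_backend(name: str, **kwargs) -> object:
    """Instantiate a registered backend, forwarding kwargs (e.g. backend_kwargs)."""
    if name not in _BACKENDS:
        raise KeyError(f"unknown backend {name!r}; choose from {available_backends()}")
    return _BACKENDS[name](**kwargs)


def available_backends() -> List[str]:
    return sorted(_BACKENDS)  # sorted, so listings are stable across runs


@register_backend("echo")
class EchoBackend:
    """Toy backend used only to exercise the registry."""

    def __init__(self, model: str = "echo") -> None:
        self.model = model

    def generate(self, prompt: str) -> str:
        return prompt
```

Registering at import time keeps the YAML backend key and the class in one place; optional backends (openai, hf) can then be registered only when their extras import successfully.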

🧩 Configuration

Every experiment is a single YAML file composed of three sections:

data:
  name: mmlu                       # mmlu | gsm8k | humaneval
  max_examples: 64
  factory:
    use_hf: true                   # flip to false for offline fallback

model:
  n_agents: 3
  mechanism: second_price          # first_price | second_price | softmax
  max_rounds: 4
  top_k: 1                         # speakers per round
  budget: 2048                     # global token budget per example
  backend: mock                    # mock | openai | openrouter | hf
  specialties: [generalist, math, code]
  alpha_init: 1.0
  lr: 0.01

trainer:                           # forwarded to pl.Trainer
  max_epochs: 3
  accelerator: cpu
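Before handing a config to the DataModule and LightningModule builders, it can help to sanity-check the fields shown above. The sketch below operates on the dict that a YAML loader such as PyYAML's yaml.safe_load would produce. The specific rules (top_k between 1 and n_agents, one specialty per agent, a positive budget) are assumptions about what makes sense, not checks known to be enforced by dala's config.py.

```python
MECHANISMS = {"first_price", "second_price", "softmax"}
BACKENDS = {"mock", "openai", "openrouter", "hf"}
DATASETS = {"mmlu", "gsm8k", "humaneval"}


def validate_config(cfg: dict) -> list:
    """Return human-readable problems; an empty list means the config looks sane."""
    errors = []
    missing = {"data", "model", "trainer"} - set(cfg)
    if missing:
        errors.append(f"missing sections: {sorted(missing)}")
    data = cfg.get("data", {})
    model = cfg.get("model", {})
    if data.get("name") not in DATASETS:
        errors.append(f"unknown dataset: {data.get('name')!r}")
    if model.get("mechanism") not in MECHANISMS:
        errors.append(f"unknown mechanism: {model.get('mechanism')!r}")
    if model.get("backend", "mock") not in BACKENDS:
        errors.append(f"unknown backend: {model.get('backend')!r}")
    n_agents = model.get("n_agents", 0)
    if not 1 <= model.get("top_k", 1) <= n_agents:
        errors.append("top_k must be between 1 and n_agents")
    specialties = model.get("specialties")
    if specialties is not None and len(specialties) != n_agents:
        errors.append("expected one specialty per agent")
    if model.get("budget", 0) <= 0:
        errors.append("budget must be a positive token count")
    return errors
```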

📊 Metrics

DALA tracks three things per run:

  • Accuracy: task-specific (MCQ letter match for MMLU, numeric match for GSM8K, return-presence heuristic for HumanEval smoke runs).
  • Token cost: total prompt + completion tokens charged to the BudgetTracker, broken down per agent.
  • Cost efficiency: the headline number, accuracy / (mean_tokens / 1000), i.e. accuracy per 1,000 tokens spent.
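The sketch below shows how these three metrics could be computed from per-example records. The parsers are deliberately simple guesses (last standalone A-D letter, last number with thousands separators stripped) rather than the ones in dala's utils/, and the record layout with correct and tokens_by_agent keys is a hypothetical format.

```python
import re
from collections import Counter
from typing import Iterable, Optional


def parse_mcq(text: str) -> Optional[str]:
    """Last standalone capital A-D in the text, e.g. 'answer is (C)' -> 'C'."""
    letters = re.findall(r"\b([ABCD])\b", text)
    return letters[-1] if letters else None


def parse_number(text: str) -> Optional[float]:
    """Last number in the text, ignoring thousands separators."""
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return float(numbers[-1].replace(",", "")) if numbers else None


def numeric_match(prediction: str, reference: str, tol: float = 1e-6) -> bool:
    pred, ref = parse_number(prediction), parse_number(reference)
    return pred is not None and ref is not None and abs(pred - ref) <= tol


def summarize(records: Iterable[dict]) -> dict:
    """records: dicts with 'correct' (bool) and 'tokens_by_agent' (agent -> tokens)."""
    records = list(records)
    accuracy = sum(r["correct"] for r in records) / len(records)
    per_agent = Counter()
    for r in records:
        per_agent.update(r["tokens_by_agent"])  # Counter.update adds counts
    mean_tokens = sum(per_agent.values()) / len(records)  # assumes > 0 tokens charged
    return {
        "accuracy": accuracy,
        "mean_tokens": mean_tokens,
        "cost_efficiency": accuracy / (mean_tokens / 1000),  # accuracy per 1k tokens
        "tokens_by_agent": dict(per_agent),
    }
```

For example, two examples with one correct answer and 1,000 tokens charged in total give accuracy 0.5, mean_tokens 500, and cost efficiency 0.5 / 0.5 = 1.0.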

🧪 Testing

pytest -q                          # 33 tests, ~1s on CPU
pytest --cov=dala --cov-report=term-missing

Tests cover the auction mechanisms (winner selection, clearing price, reserve price, top-k, softmax determinism), the budget tracker (charging, exhaustion, reset), utility estimators, the mock backend's determinism, the answer parsers, each dataset's offline fallback, the backend registry, and an end-to-end auctioneer → Lightning-module smoke path.


📄 Citation

@article{fan2025dala,
  title   = {Cost-Effective Communication: Auction-based Language Agent Interaction},
  author  = {Fan, Yijia and Zhang, Jusheng and Cai, Kaitong and Yang, Jing
             and Tang, Chengpei and Wang, Jian and Wang, Keze},
  journal = {arXiv preprint arXiv:2511.13193},
  year    = {2025}
}

📜 License

MIT — see LICENSE.
