InteractiveBench

The official repository for the paper Interactive Benchmarks (https://huggingface.co/papers/2603.04737).

Repository Overview

  • src/situation_puzzle/: Situation-based reasoning.
  • src/math/: Interactive math evaluation pipeline that compares naive solving with Interactive-Proof-style solving, using pass@k evaluation as a baseline.
  • src/trust_game/: Trust Game tournament (baseline + LLM agents).
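For readers unfamiliar with the pass@k baseline mentioned for src/math/, the sketch below shows the commonly used unbiased estimator (n samples per problem, c of them correct). It is only an illustration under that assumption: the function name pass_at_k is hypothetical, and the code in src/math/ may compute the metric differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c are correct.

    Hypothetical helper, not taken from this repository. It uses the
    standard combinatorial estimator 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k incorrect samples, every size-k subset
    # contains at least one correct sample.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Example: 10 samples, 5 correct, estimate for k = 1.
    print(pass_at_k(10, 5, 1))
```

The estimate for one problem is averaged over all problems to obtain the benchmark-level score.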

Quick Start

Requirements

  • Python 3.10+
  • A valid model endpoint. Most scripts in this repository default to the OpenRouter OpenAI-compatible API.

Unified Environment Variables (Recommended)

Most scripts read the following environment variables (you may define them in a .env file inside each subdirectory, or export them directly):

  • OPENROUTER_API_KEY: Required
  • OPENROUTER_BASE_URL: Optional (default: https://openrouter.ai/api/v1)

Example:

export OPENROUTER_API_KEY="sk-..."
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
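As a rough illustration of how a script might consume these two variables, here is a minimal Python sketch. The helper name load_openrouter_config and the returned dictionary layout are assumptions made for this example, not code from the repository; only the variable names and the default URL come from the list above.

```python
import os

# Default taken from the variable list above.
DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"


def load_openrouter_config(env=None):
    """Read the endpoint settings from the environment.

    Hypothetical helper for illustration; the scripts in this
    repository may read their configuration differently.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        # The key is required, so fail early with a clear message.
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    # The base URL is optional and falls back to the documented default.
    base_url = env.get("OPENROUTER_BASE_URL") or DEFAULT_BASE_URL
    return {"api_key": api_key, "base_url": base_url}


if __name__ == "__main__":
    try:
        config = load_openrouter_config()
        print("Using endpoint:", config["base_url"])
    except RuntimeError as exc:
        print("Configuration error:", exc)
```

Passing a plain dictionary as env makes the helper easy to exercise without touching the real environment.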

Installing Dependencies

pip install -r requirements.txt

Note: Different tasks require only subsets of dependencies. Please refer to each subdirectory’s README for details.

Directory Structure

InteractiveBench/
  README.md
  LICENSE
  src/
    trust_game/
    situation_puzzle/
    math/
    poker/

Results and Reproducibility

  • Result Outputs: Most scripts write results to a results/ directory (or a specified output path) within their respective folders, and include reproducibility metadata whenever possible (e.g., model name, hyperparameters).
  • Resume Support: Most scripts support resume functionality (i.e., skipping completed samples/matches if output files already exist). See each subdirectory’s README for specifics.
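The resume behavior described above can be sketched as follows. This is a minimal illustration that assumes results are stored as one JSON object per line with an "id" field; the function names, the file layout, and the record schema are all hypothetical, and each subdirectory's scripts may implement resuming differently.

```python
import json
from pathlib import Path


def load_completed_ids(output_path):
    """Collect the ids of samples already present in a JSON Lines file."""
    path = Path(output_path)
    done = set()
    if not path.exists():
        return done
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # Ignore a partially written line from an interrupted run.
                continue
            if isinstance(record, dict) and "id" in record:
                done.add(record["id"])
    return done


def run_with_resume(samples, output_path, evaluate):
    """Evaluate only the samples missing from output_path.

    `samples` is an iterable of dicts with an "id" key and `evaluate`
    is any callable returning a JSON-serializable result (both are
    assumptions of this sketch). Returns the number of newly
    processed samples.
    """
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    done = load_completed_ids(path)
    processed = 0
    with path.open("a", encoding="utf-8") as f:
        for sample in samples:
            if sample["id"] in done:
                continue  # Completed in an earlier run.
            result = evaluate(sample)
            f.write(json.dumps({"id": sample["id"], "result": result}) + "\n")
            f.flush()  # Persist progress after every sample.
            processed += 1
    return processed
```

Appending one record per sample and flushing after each write means an interrupted run loses at most the sample in progress.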

Contributing

  • Contribution guidelines are provided in CONTRIBUTING.md (including requirements for adding new benchmark subdirectories, result formats, README standards, etc.).

Citation / License

  • License: MIT (see LICENSE)
  • If you find our work useful in your research, please consider citing our paper!
@article{yue2026interactive,
  title={Interactive Benchmarks},
  author={Yue, Baoqing and Zhu, Zihan and Han, Yutong and Sun, Qian and Feng, Jichen and Yang, Hufei and Zhang, Yifan and Wang, Mengdi},
  journal={arXiv preprint arXiv:2603.04737},
  year={2026}
}