Toward Precise and Consistent Agent Behaviors across Models Anchored by Validated Social Science Knowledge

🌐 Project Page: cobra.clawder.ai | 📄 Paper: arXiv 2509.13588

If you find CoBRA useful, please star ⭐ this repo to help others discover it!

💡 What is Cognitive Bias?

Cognitive biases are systematic deviations from rational judgment in human cognition and decision-making. One example is the Framing Effect: "90% survival rate" and "10% mortality rate" are logically identical, yet people make different choices depending on how the information is framed.

Reproducibility and controllability are fundamental to scientific research. Yet implicit natural language descriptions, the dominant approach for specifying social agent behaviors in nearly all LLM-based social simulations, often fail to yield consistent behavior across models or to capture the nuances of the descriptions.

CoBRA (Cognitive Bias Regulator for Social Agents) is a novel toolkit that lets researchers explicitly specify desired nuances in LLM-based agents and obtain consistent behavior across models.

Through CoBRA, we show how to operationalize validated social science knowledge as reusable "gym" environments for AI — an approach that generalizes to richer social and affective simulations.

The problem and our solution: from inconsistent agent behaviors under implicit specifications to explicit, quantitative control.

At the heart of CoBRA is a novel closed-loop system with two core components:

- Cognitive Bias Index: measures the cognitive bias of a social agent by quantifying its reactions in validated classic social science experiments.
- Behavioral Regulation Engine: aligns the agent's behavior to exhibit controlled cognitive bias via three control methods:
  - Prompt Engineering (input space control)
  - Representation Engineering (activation space control)
  - Fine-tuning (parameter space control)

Example: A researcher specifies a target bias level → CoBRA measures it via classic experiments → iteratively adjusts the agent until it reliably exhibits the desired bias.
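
The measure-then-adjust loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not CoBRA's API: `regulate_bias`, `toy_agent`, the proportional update rule, and the `gain` and `tolerance` parameters are all hypothetical. In the real toolkit, the measurement step would run the classic experiments and the adjustment step would use one of the three control methods.

```python
from typing import Callable


def regulate_bias(
    measure: Callable[[float], float],
    target: float,
    strength: float = 0.0,
    gain: float = 0.5,
    tolerance: float = 0.05,
    max_steps: int = 50,
) -> tuple[float, float, int]:
    """Adjust a control strength until the measured bias is near the target.

    `measure(strength)` is a hypothetical stand-in for running the classic
    experiments and returning a bias score. It is not CoBRA's API.
    """
    score = measure(strength)
    for step in range(max_steps):
        error = target - score
        if abs(error) <= tolerance:
            return strength, score, step
        # Proportional update: move the control strength toward the target.
        strength += gain * error
        score = measure(strength)
    return strength, score, max_steps


def toy_agent(strength: float) -> float:
    """Toy agent whose bias grows linearly with strength, clipped to 0-4."""
    return min(4.0, max(0.0, 1.0 + 0.8 * strength))


final_strength, final_score, steps = regulate_bias(toy_agent, target=3.0)
print(f"strength={final_strength:.2f} score={final_score:.2f} steps={steps}")
```

A proportional update is only one possible adjustment rule, chosen here because it is easy to follow. It converges for this toy agent because the measured score responds monotonically to the control strength.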

Quick Start (3 Steps)

# 1. Install dependencies
pip install -r requirements.txt

# 2. Navigate to the unified bias control module
cd examples/unified_bias

# 3. Run a bias experiment
python pipelines.py --bias authority --method repe-linear --model Mistral-7B

That's it. The system will measure and control the agent's Authority Effect bias.
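
To run the same experiment for several biases, the command can be generated programmatically. The sketch below only builds and prints the command lines. The flag names and the `authority`, `repe-linear`, and `Mistral-7B` values come from the command above. The other bias names are assumptions inferred from the example directory names, so check `examples/unified_bias/README.md` for the values the pipeline actually accepts.

```python
import shlex

# "authority" is taken from the Quick Start command. The remaining names
# are assumptions based on the directories under examples/.
BIASES = ["authority", "bandwagon", "confirmation", "framing"]


def build_command(
    bias: str, method: str = "repe-linear", model: str = "Mistral-7B"
) -> list[str]:
    """Return the argument list for one pipeline run (hypothetical helper)."""
    return [
        "python", "pipelines.py",
        "--bias", bias,
        "--method", method,
        "--model", model,
    ]


commands = [build_command(bias) for bias in BIASES]
for cmd in commands:
    # Dry run: print only. To launch a run, call
    # subprocess.run(cmd, check=True) from examples/unified_bias.
    print(shlex.join(cmd))
```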

Repository Structure

CoBRA/
├── control/                    # Core bias control engine
├── examples/
│   ├── unified_bias/           # Main entry point (START HERE)
│   │   ├── pipelines.py        # Unified experiment runner
│   │   ├── run_pipelines.py    # CLI interface
│   │   ├── ablation/           # Ablation studies
│   │   └── README.md           # Full usage guide
│   ├── authority/              # Authority Effect utils
│   ├── bandwagon/              # Bandwagon Effect utils
│   ├── confirmation/           # Confirmation Bias utils
│   └── framing/                # Framing Effect utils
├── generator/                  # Data generation utilities
├── data_generated/             # Generated experimental data
├── webdemo/                    # Web demonstration interface
└── requirements.txt            # Python dependencies

Key Components

| Component | Description | Documentation |
| --- | --- | --- |
| Cognitive Bias Index | Measures bias strength via classic experiments | `data/data_README.md` |
| Behavioral Regulation Engine | Three control methods (Prompt/RepE/Finetune) | `control/control_README.md` |
| Unified Pipeline | Run full experiments with one command | `examples/unified_bias/README.md` |
| Ablation Studies | Test model/persona/temperature sensitivity | `examples/unified_bias/ablation/README.md` |
| Data Generator | Create custom bias scenarios and responses | `generator/README.md` |

Supported Biases & Experiments

| Bias Type | Paradigms | Data Directory | Control Range |
| --- | --- | --- | --- |
| Authority Effect | Milgram Obedience, Stanford Prison | `data/authority/` | 0-4 scale |
| Bandwagon Effect | Asch's Line, Hotel Towel | `data/bandwagon/` | 0-4 scale |
| Confirmation Bias | Wason Selection, Biased Information | `data/confirmation/` | 0-4 scale |
| Framing Effect | Asian Disease, Investment/Insurance | `data/framing/` | 0-4 scale |
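
When scripting experiments, it can help to validate a requested bias and target level before launching a run. The sketch below copies the bias types, paradigms, and the 0-4 range from the table above. The dictionary layout, the short bias keys, and the `validate_target` helper are illustrative assumptions rather than part of CoBRA, and the README does not say whether levels must be integers, so the check accepts any number in range.

```python
# Paradigms copied from the table above. The short keys and this layout
# are assumptions for illustration, not CoBRA's data model.
SUPPORTED_BIASES = {
    "authority": ["Milgram Obedience", "Stanford Prison"],
    "bandwagon": ["Asch's Line", "Hotel Towel"],
    "confirmation": ["Wason Selection", "Biased Information"],
    "framing": ["Asian Disease", "Investment/Insurance"],
}
CONTROL_MIN, CONTROL_MAX = 0.0, 4.0


def validate_target(bias: str, level: float) -> float:
    """Check that a bias name is known and the level lies on the 0-4 scale."""
    if bias not in SUPPORTED_BIASES:
        known = ", ".join(sorted(SUPPORTED_BIASES))
        raise ValueError(f"unknown bias {bias!r}; expected one of: {known}")
    if not CONTROL_MIN <= level <= CONTROL_MAX:
        raise ValueError(
            f"target level {level} is outside {CONTROL_MIN}-{CONTROL_MAX}"
        )
    return float(level)


print(validate_target("framing", 2))
```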

Citation

If you use CoBRA in your research, please cite our paper:

@article{liu2025cobra,
  title={CoBRA: Programming Cognitive Bias in Social Agents Using Classic Social Science Experiments},
  author={Liu, Xuan and Shang, Haoyang and Jin, Haojian},
  journal={arXiv preprint arXiv:2509.13588},
  year={2025}
}

Paper Link: https://arxiv.org/abs/2509.13588

License

MIT License - see LICENSE for details

Contact

For questions, please contact the corresponding author Xuan Liu at xul049@ucsd.edu, or file a GitHub Issue to report bugs and request features.

Need help? Check examples/unified_bias/README.md for detailed walkthroughs. The finetuning code is in the finetuning branch.
