ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention

Xinyan Wang¹, Xiaogeng Liu², Chaowei Xiao²

¹University of Wisconsin-Madison ²Johns Hopkins University

Abstract

Large Reasoning Models (LRMs) often reach a correct solution before their long Chain-of-Thought trace ends, yet continue with redundant verification, repeated attempts, or unnecessary exploration that wastes computation and can even overturn the correct answer. We frame this behavior as a latent productive-to-redundant transition and show that it is directly reflected in hidden states: around first-correct-solution (FCS) boundaries, late-layer representations separate efficient from overthinking tokens, while boundary-permutation and position-control baselines collapse. Based on this signal, we propose ROM, a model-agnostic streaming intervention framework that monitors frozen LRMs with a lightweight hidden-state detector and intervenes at well-formed reasoning boundaries. Counterfactual Self-Correction (CSC) augments supervision with balanced wrong→correct trajectories, preserving useful pre-FCS correction while labeling only post-FCS continuation as redundant. Across MATH500, GSM8K, AIME25, and MMLU-Pro, ROM improves the overall tradeoff on both Qwen3-8B and DeepSeek-R1-Distill-Qwen-32B (DS-32B): on Qwen3-8B, it raises accuracy from 74.47% to 74.78% and reduces response length from 4262 to 3107 tokens; on DS-32B, it raises accuracy from 68.60% to 68.72% and reduces response length from 3062 to 2319 tokens. The same FCS-derived supervision transfers across scale and training origin, suggesting a shared long-CoT boundary rather than a backbone-specific artifact. ROM is compatible with L1, removing another 20.9–21.6% tokens at zero accuracy loss. ROM also generalizes to open-ended MMLU-Pro (+1.56 pp, 35.4% shorter) and reduces wall-clock latency by 46.5%.
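The streaming idea above can be illustrated with a small, self-contained sketch. It is not the code in rom/models.py: the `Step` record, the exponential moving average, the 0.5 threshold, and the use of a blank-line token as the "well-formed reasoning boundary" are all assumptions made for illustration. The sketch only shows the control flow of scoring each generated token and intervening at the first boundary where the smoothed score is high.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Step:
    token: str          # decoded token text
    p_redundant: float  # hypothetical detector score in [0, 1]


def first_stop_index(
    steps: Iterable[Step],
    threshold: float = 0.5,                  # assumed value, not from the paper
    alpha: float = 0.5,                      # assumed EMA smoothing factor
    boundaries: Tuple[str, ...] = ("\n\n",)  # assumed boundary tokens
) -> Optional[int]:
    """Return the index of the first boundary token at which the
    smoothed redundancy score exceeds the threshold, or None."""
    ema = None
    for i, step in enumerate(steps):
        # Smooth the per-token scores so one noisy token cannot trigger a stop.
        if ema is None:
            ema = step.p_redundant
        else:
            ema = alpha * step.p_redundant + (1 - alpha) * ema
        # Intervene only at a boundary, never in the middle of a reasoning step.
        if ema > threshold and step.token in boundaries:
            return i
    return None
```

In a real deployment the scores would come from a detector reading the frozen model's hidden states during decoding; here they are plain floats so the loop can be read and run on its own.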

Project Structure

ROM/
├── rom/                        # Core package
│   ├── models.py               # StreamingHead, Qwen3WithHead
│   ├── dataset.py              # Dataset loading & embedding cache
│   ├── train.py                # Training pipeline
│   ├── eval.py                 # Offline evaluation (vLLM)
│   ├── env.py                  # Environment setup
│   └── utils/
│       ├── math.py             # Answer extraction & correctness checking
│       └── eval_helpers.py     # Metrics, probability computation
├── configs/
│   ├── train.yaml              # Training defaults
│   └── eval.yaml               # Evaluation defaults
├── requirements.txt
├── LICENSE
└── README.md

Quick Start

Installation

pip install -r requirements.txt

Requires Python 3.11+, PyTorch >= 2.9.0, and a CUDA-capable GPU.

Data

Training data is hosted on Hugging Face: xinyan-wang/ROM.

Download and place under data/:

# Using huggingface-cli
huggingface-cli download xinyan-wang/ROM --repo-type dataset --local-dir data

Training

All parameters are in configs/train.yaml. Run with defaults:

python -m rom.train

Override via CLI:

python -m rom.train --lr 1e-4 --num_train_epochs 30
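For readers curious how file defaults and CLI overrides can coexist, here is a minimal sketch of one common pattern. It is not taken from rom/train.py: the `DEFAULTS` dictionary stands in for values that would be read from configs/train.yaml, its numbers are made up, and `parse_overrides` is a hypothetical helper.

```python
import argparse

# Stand-in for configs/train.yaml; these values are hypothetical placeholders.
DEFAULTS = {"lr": 5e-5, "num_train_epochs": 10}


def parse_overrides(argv, defaults):
    """Build one CLI flag per default key; flags that are passed win."""
    parser = argparse.ArgumentParser()
    for key, value in defaults.items():
        # Reuse the default's type (float, int, str) to parse the CLI string.
        # Boolean options would need action="store_true" instead.
        parser.add_argument(f"--{key}", type=type(value), default=value)
    return vars(parser.parse_args(argv))


if __name__ == "__main__":
    print(parse_overrides(["--lr", "1e-4", "--num_train_epochs", "30"], DEFAULTS))
```

Any key not passed on the command line keeps its file default, so a single config file can describe the full run while experiments change one or two values.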

W&B logging is enabled by default. Disable with --no_wandb.

Evaluation

We evaluate on MATH500, GSM8K, AIME25, and MMLU-Pro. Models are served via vLLM 0.11 on a single A100 (80 GB) with temperature 0.6, top-p 0.95, top-k 20, n=3, and seed 46.

python -m rom.eval

Override as needed:

python -m rom.eval --ckpt_path checkpoints/my_model.pt --test_data data/test_data/math500.jsonl
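The results reported for ROM pair accuracy with response length, and evaluation draws several samples per question. A minimal sketch of how such numbers could be aggregated is shown below. It assumes each sample is reduced to a `(is_correct, num_tokens)` tuple and that both metrics are averaged over all samples; how rom/eval.py actually aggregates is not stated here, and `summarize` is a hypothetical helper.

```python
from statistics import mean


def summarize(samples):
    """samples: one list per question, each holding (is_correct, num_tokens)
    tuples, one tuple per sampled response.
    Returns (accuracy in percent, mean response length in tokens)."""
    flat = [sample for question in samples for sample in question]
    accuracy = 100.0 * mean(1.0 if ok else 0.0 for ok, _ in flat)
    avg_len = mean(num_tokens for _, num_tokens in flat)
    return accuracy, avg_len


if __name__ == "__main__":
    demo = [
        [(True, 100), (True, 200), (False, 300)],   # question 1, three samples
        [(False, 400), (True, 500), (True, 600)],   # question 2, three samples
    ]
    print(summarize(demo))
```

Averaging over every sample rather than taking a per-question majority vote keeps accuracy and length on the same footing, since both are then per-response quantities.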

Citation

If you find ROM useful, please cite our paper 📝 and give us a ⭐!

@misc{wang2026romrealtimeoverthinkingmitigation,
      title={ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention}, 
      author={Xinyan Wang and Xiaogeng Liu and Chaowei Xiao},
      year={2026},
      eprint={2603.22016},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.22016}, 
}

License

This project is licensed under the MIT License.
