Official implementation for the paper "LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction".
LLM-CAS is a framework that formulates real-time hallucination correction as a hierarchical reinforcement learning (HRL) problem. Unlike static editing methods that permanently modify model weights, LLM-CAS trains an agent to learn a policy that dynamically selects optimal, temporary neuron perturbations during inference based on the immediate context. This allows the model to correct erroneous outputs without permanently damaging its integrity or general capabilities.
The framework utilizes a "locate-then-edit" paradigm but applies it dynamically. It identifies critical activation patterns using neuron-level causal tracing and applies perturbations only when necessary.
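Conceptually, a "temporary" perturbation modifies activations only for the current forward pass and never touches the weights. The minimal NumPy sketch below illustrates that idea; the actual agent, hook machinery, and mask format live in this repo's code, and all names here are hypothetical.

```python
import numpy as np

def apply_temporary_perturbation(activations, neuron_mask, scale=0.5):
    """Return a perturbed *copy* of the activations for the current
    forward pass; model weights and the original tensor are untouched."""
    perturbed = activations.copy()
    perturbed[..., neuron_mask] *= scale  # dampen only the masked neurons
    return perturbed

acts = np.random.randn(1, 4, 8)       # (batch, seq_len, hidden_dim)
mask = np.zeros(8, dtype=bool)
mask[[1, 5]] = True                   # hypothetical "critical" neurons
out = apply_temporary_perturbation(acts, mask)

# Non-masked neurons are bit-identical; nothing is edited permanently.
assert np.array_equal(out[..., ~mask], acts[..., ~mask])
```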
## Installation

We recommend using Conda to manage the environment.

```bash
# Create conda environment
conda create -n llm-cas python=3.10
conda activate llm-cas

# Install dependencies
pip install -r requirements.txt
```

## Usage

The following steps outline the standard pipeline for running LLM-CAS, from neuron localization to dynamic inference.
### Step 1: Neuron Localization

First, identify the causally relevant neurons for the target functional network at a specific percentage. This step generates a mask file that will be used for perturbation.

```bash
# Example: Localize top 0.3% of multiple-demand neurons
python localize.py \
    --model-name meta-llama/Llama-2-7b-chat-hf \
    --percentage 0.3 \
    --network multiple-demand \
    --localize-range 100-100 \
    --pooling last-token

# Move and rename the generated mask for the next steps
mkdir -p exp
cp "cache/Llama-2-7b-chat-hf_network=multiple-demand_pooling=last-token_range=100-100_perc=0.3_nunits=None_pretrained=True.npy" \
   "exp/md_0_3.npy"
```

### Step 2: Path Generation

Generate the interaction trajectories and evaluation metrics. This step utilizes a Judge Model to assess the target model's performance on a specific dataset using the masks generated in Step 1.
```bash
# Create output directory
mkdir -p "exp/results_0_3"

# Run path generation
cd exp
python get_path.py \
    --judge_model 'Qwen' \
    --dataset_name storycloze \
    --target_model "meta-llama/Llama-2-7b-chat-hf" \
    --language_mask_path ./lng.npy \
    --multidemand_mask_path "./md_0_3.npy" \
    --output_path "results_0_3/output.json" \
    --max_timesteps 7200
cd ..
```

### Step 3: Dynamic Inference

Execute the final inference using the trained PPO agent. This script applies the dynamic masking strategy to generate the final submission results and logs.
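The PPO agent used here is trained on the Step 2 trajectories, where each perturbation step is paired with a judge score. Conceptually, those scores can be shaped into a reward as in the sketch below; the actual reward definition lives in the repo's code, and the function name and penalty term are assumptions for illustration only.

```python
# Hypothetical reward shaping: credit the judge's scores but charge a
# small cost per perturbation step, so the agent avoids unnecessary edits.
def trajectory_reward(judge_scores, step_penalty=0.01):
    return sum(judge_scores) - step_penalty * len(judge_scores)

print(trajectory_reward([0.0, 0.0, 1.0]))  # 0.97
```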
```bash
cd exp
python3 cogsearch.py \
    --model "meta-llama/Llama-2-7b-chat-hf" \
    --generate_submission \
    --ppo_model_path results_0_3/adaptive_mask_ppo_agent_final.pth \
    --language_mask_path lng.npy \
    --multidemand_mask_path md_0_3.npy \
    --output_dir results_0_3 \
    > "../output_0_3.log"
cd ..
```

## Citation

If you use this code or findings in your research, please cite our paper:
```bibtex
@misc{zhang2025llmcasdynamicneuronperturbation,
      title={LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction},
      author={Jusheng Zhang and Ningyuan Liu and Yijia Fan and Zihao Huang and Qinglin Zeng and Kaitong Cai and Jian Wang and Keze Wang},
      year={2025},
      eprint={2512.18623},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.18623},
}
```