microsoft · QianhuiWu · Oct 28, 2025 · Oct 27, 2025 · Oct 27, 2025
diff --git a/README.md b/README.md
@@ -48,6 +48,11 @@ LLMLingua-2, a small-size yet powerful prompt compression method trained via dat
 - [LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://aclanthology.org/2024.findings-acl.57/) (ACL 2024 Findings)<br>
   _Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang_
 
+SecurityLingua is a safety guardrail model that uses the security-aware prompt compression to reveal the malicious intentions behind jailbreak attacks, enabling LLMs to detect attacks and generate safe responses. Due to the highly efficient prompt compression, the defense involves negligible overhead and 100x less token costs compared to state-of-the-art LLM guardrail approaches.
+
+- [SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression](https://openreview.net/forum?id=tybbSo6wba) (CoLM 2025)<br>
+  _Yucheng Li, Surin Ahn, Huiqiang Jiang, Amir H. Abdi, Yuqing Yang and Lili Qiu_
+
 ## 🎥 Overview
 
 ![Background](./images/LLMLingua_motivation.png)
@@ -133,6 +138,16 @@ If you find this repo helpful, please cite the following papers:
 }
 ```
 
+```bibtex
+@inproceedings{li2025securitylingua,
+  title={{S}ecurity{L}ingua: Efficient Defense of {LLM} Jailbreak Attacks via Security-Aware Prompt Compression},
+  author={Yucheng Li and Surin Ahn and Huiqiang Jiang and Amir H. Abdi and Yuqing Yang and Lili Qiu},
+  booktitle={Second Conference on Language Modeling},
+  year={2025},
+  url={https://openreview.net/forum?id=tybbSo6wba}
+}
+```
+
 ## 🎯 Quick Start
 
 #### 1. **Installing LLMLingua:**
@@ -205,6 +220,20 @@ llm_lingua = PromptCompressor(
 )
 ```
 
+To try **SecurityLingua** in your scenarios, you can use
+
+```python
+from llmlingua import PromptCompressor
+
+securitylingua = PromptCompressor(
+    model_name="SecurityLingua/securitylingua-xlm-s2s",
+    use_slingua=True
+)
+intention = securitylingua.compress_prompt(malicious_prompt)
+```
+
+For more details about SecurityLingua, please refer to [securitylingua readme](./experiments/securitylingua/readme.md).
+
 #### 3. **Advanced usage - Structured Prompt Compression:**
 
 Split text into sections, decide on whether to compress and its rate. Use `<llmlingua></llmlingua>` tags for context segmentation, with optional rate and compress parameters.

diff --git a/experiments/securitylingua/env_setup.sh b/experiments/securitylingua/env_setup.sh
@@ -0,0 +1,13 @@
+conda create -n llmlingua python=3.10 -y && conda activate llmlingua
+pip install -e .
+pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
+pip install accelerate wandb
+pip install openai==0.28
+
+pip install spacy
+python -m spacy download en_core_web_sm
+pip install scikit-learn
+pip install tensorboard
+pip install datasets hf_transfer
+
+unset WANDB_RUN_ID WANDB_RUN_GROUP WANDB_PROJECT WANDB_NOTES WANDB_NAME
diff --git a/experiments/securitylingua/filter.py b/experiments/securitylingua/filter.py
@@ -0,0 +1,102 @@
+# Copyright (c) 2023 Microsoft
+# Licensed under The MIT License [see LICENSE for details]
+
+import argparse
+from collections import defaultdict
+from typing import Dict, List, Tuple, DefaultDict
+import numpy as np
+import torch
+
+def parse_arguments() -> argparse.Namespace:
+    """Parse command line arguments"""
+    parser = argparse.ArgumentParser(description="Filter compressed prompts based on metrics.")
+    parser.add_argument(
+        "--load_path",
+        help="path to load data",
+        default="../../../results/meetingbank/gpt-4-32k_comp/annotation_cs512_meetingbank_train_formated.pt",
+    )
+    parser.add_argument(
+        "--save_path",
+        help="path to save filtered data",
+        default="../../../results/meetingbank/gpt-4-32k_comp/annotation_kept_cs512_meetingbank_train_formated.pt",
+    )
+    parser.add_argument(
+        "--percentile",
+        help="percentile threshold for filtering",
+        default=90,
+        type=int
+    )
+    return parser.parse_args()
+
+def filter_by_metric(
+    data: DefaultDict[str, List], 
+    metric_name: str, 
+    percentile: float
+) -> Tuple[DefaultDict[str, List], DefaultDict[str, List]]:
+    """
+    Filter data based on a specific metric and percentile threshold
+
+    Args:
+        data: Dictionary containing all data points and their metrics
+        metric_name: Name of the metric to filter by
+        percentile: Percentile threshold for filtering
+
+    Returns:
+        Tuple of (kept_data, filtered_data)
+    """
+    metric_list = data[metric_name]
+    threshold = np.percentile(metric_list, percentile)
+
+    kept = defaultdict(list)
+    filtered = defaultdict(list)
+
+    # List of all metrics to transfer
+    metrics = [
+        "labels", "origin", "comp", "retrieval", "comp_rate",
+        "variation_rate", "hitting_rate", "matching_rate", "alignment_gap"
+    ]
+
+    for values in zip(*(data[metric] for metric in metrics)):
+        # Create a dictionary of current values
+        current = dict(zip(metrics, values))
+
+        # Determine which container to use based on the metric threshold
+        target = filtered if current[metric_name] >= threshold else kept
+
+        # Add values to appropriate container
+        for metric, value in current.items():
+            target[metric].append(value)
+
+    return kept, filtered
+
+def main():
+    """Main function to run the filtering process"""
+    args = parse_arguments()
+
+    # Load data
+    res_pt = torch.load(args.load_path, weights_only=False)
+    print(f"Initial sample count: {len(res_pt['variation_rate'])}")
+
+    # First filtering stage: variation rate
+    kept, filtered = filter_by_metric(
+        data=res_pt,
+        metric_name="variation_rate",
+        percentile=args.percentile
+    )
+
+    # Second filtering stage: alignment gap
+    final_kept, additional_filtered = filter_by_metric(
+        data=kept,
+        metric_name="alignment_gap",
+        percentile=args.percentile
+    )
+
+    # Save filtered results
+    torch.save(final_kept, args.save_path)
+
+    # Print statistics
+    print(f"Samples after first filter: {len(kept['variation_rate'])}")
+    print(f"Final kept samples: {len(final_kept['variation_rate'])}")
+
+if __name__ == "__main__":
+    main()