Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,11 @@ LLMLingua-2, a small-size yet powerful prompt compression method trained via dat
- [LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://aclanthology.org/2024.findings-acl.57/) (ACL 2024 Findings)<br>
_Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang_

SecurityLingua is a safety guardrail model that uses the security-aware prompt compression to reveal the malicious intentions behind jailbreak attacks, enabling LLMs to detect attacks and generate safe responses. Due to the highly efficient prompt compression, the defense involves negligible overhead and 100x less token costs compared to state-of-the-art LLM guardrail approaches.

- [SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression](https://openreview.net/forum?id=tybbSo6wba) (CoLM 2025)<br>
_Yucheng Li, Surin Ahn, Huiqiang Jiang, Amir H. Abdi, Yuqing Yang and Lili Qiu_

## 🎥 Overview

![Background](./images/LLMLingua_motivation.png)
Expand Down Expand Up @@ -133,6 +138,16 @@ If you find this repo helpful, please cite the following papers:
}
```

```bibtex
@inproceedings{li2025securitylingua,
title={{S}ecurity{L}ingua: Efficient Defense of {LLM} Jailbreak Attacks via Security-Aware Prompt Compression},
author={Yucheng Li and Surin Ahn and Huiqiang Jiang and Amir H. Abdi and Yuqing Yang and Lili Qiu},
booktitle={Second Conference on Language Modeling},
year={2025},
url={https://openreview.net/forum?id=tybbSo6wba}
}
```

## 🎯 Quick Start

#### 1. **Installing LLMLingua:**
Expand Down Expand Up @@ -205,6 +220,20 @@ llm_lingua = PromptCompressor(
)
```

To try **SecurityLingua** in your scenarios, you can use

```python
from llmlingua import PromptCompressor

securitylingua = PromptCompressor(
model_name="SecurityLingua/securitylingua-xlm-s2s",
use_slingua=True
)
intention = securitylingua.compress_prompt(malicious_prompt)
```

For more details about SecurityLingua, please refer to [securitylingua readme](./experiments/securitylingua/readme.md).

#### 3. **Advanced usage - Structured Prompt Compression:**

Split text into sections, decide on whether to compress and its rate. Use `<llmlingua></llmlingua>` tags for context segmentation, with optional rate and compress parameters.
Expand Down
13 changes: 13 additions & 0 deletions experiments/securitylingua/env_setup.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
conda create -n llmlingua python=3.10 -y && conda activate llmlingua
pip install -e .
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install accelerate wandb
pip install openai==0.28

pip install spacy
python -m spacy download en_core_web_sm
pip install scikit-learn
pip install tensorboard
pip install datasets hf_transfer

unset WANDB_RUN_ID WANDB_RUN_GROUP WANDB_PROJECT WANDB_NOTES WANDB_NAME
102 changes: 102 additions & 0 deletions experiments/securitylingua/filter.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
# Copyright (c) 2023 Microsoft
# Licensed under The MIT License [see LICENSE for details]

import argparse
from collections import defaultdict
from typing import Dict, List, Tuple, DefaultDict
import numpy as np
import torch

def parse_arguments() -> argparse.Namespace:
"""Parse command line arguments"""
parser = argparse.ArgumentParser(description="Filter compressed prompts based on metrics.")
parser.add_argument(
"--load_path",
help="path to load data",
default="../../../results/meetingbank/gpt-4-32k_comp/annotation_cs512_meetingbank_train_formated.pt",
)
parser.add_argument(
"--save_path",
help="path to save filtered data",
default="../../../results/meetingbank/gpt-4-32k_comp/annotation_kept_cs512_meetingbank_train_formated.pt",
)
parser.add_argument(
"--percentile",
help="percentile threshold for filtering",
default=90,
type=int
)
return parser.parse_args()

def filter_by_metric(
data: DefaultDict[str, List],
metric_name: str,
percentile: float
) -> Tuple[DefaultDict[str, List], DefaultDict[str, List]]:
"""
Filter data based on a specific metric and percentile threshold

Args:
data: Dictionary containing all data points and their metrics
metric_name: Name of the metric to filter by
percentile: Percentile threshold for filtering

Returns:
Tuple of (kept_data, filtered_data)
"""
metric_list = data[metric_name]
threshold = np.percentile(metric_list, percentile)

kept = defaultdict(list)
filtered = defaultdict(list)

# List of all metrics to transfer
metrics = [
"labels", "origin", "comp", "retrieval", "comp_rate",
"variation_rate", "hitting_rate", "matching_rate", "alignment_gap"
]

for values in zip(*(data[metric] for metric in metrics)):
# Create a dictionary of current values
current = dict(zip(metrics, values))

# Determine which container to use based on the metric threshold
target = filtered if current[metric_name] >= threshold else kept

# Add values to appropriate container
for metric, value in current.items():
target[metric].append(value)

return kept, filtered

def main():
"""Main function to run the filtering process"""
args = parse_arguments()

# Load data
res_pt = torch.load(args.load_path, weights_only=False)
print(f"Initial sample count: {len(res_pt['variation_rate'])}")

# First filtering stage: variation rate
kept, filtered = filter_by_metric(
data=res_pt,
metric_name="variation_rate",
percentile=args.percentile
)

# Second filtering stage: alignment gap
final_kept, additional_filtered = filter_by_metric(
data=kept,
metric_name="alignment_gap",
percentile=args.percentile
)

# Save filtered results
torch.save(final_kept, args.save_path)

# Print statistics
print(f"Samples after first filter: {len(kept['variation_rate'])}")
print(f"Final kept samples: {len(final_kept['variation_rate'])}")

if __name__ == "__main__":
main()
Loading
Loading