A multithreaded PostgreSQL database anonymizer that applies configurable transformation rules to sensitive data.
- Rule-based anonymization: Define which tables/columns to anonymize in config.py
- Multiple transformers: Email, Phone, Name, Hash, UUID, PGP Encrypt/Decrypt, Lorem Ipsum, Random String, and more
- Parallel processing: Configurable worker threads for fast anonymization
- Dry-run mode: Preview transformations without writing to database
- Batch processing: Configurable batch sizes for efficient memory usage
- PostgreSQL
- Python 3.8+
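
The batch, worker, and dry-run features listed above can be pictured with a small sketch. This is not the project's actual implementation: `batches`, `mask_batch`, `anonymize_in_parallel`, and the in-memory `sink` list are hypothetical names chosen for illustration, and a real run would read from and write to PostgreSQL instead of Python lists.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List


def batches(rows: List[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def mask_batch(batch: List[Dict]) -> List[Dict]:
    """Hypothetical transformation: return copies with the name column masked."""
    return [{**row, "name": "REDACTED"} for row in batch]


def anonymize_in_parallel(
    rows: List[Dict],
    sink: List[Dict],
    batch_size: int = 2000,
    max_workers: int = 8,
    dry_run: bool = False,
) -> List[Dict]:
    """Transform rows batch by batch on a thread pool.

    In dry-run mode the transformed rows are returned for preview only;
    otherwise they are also appended to sink, which stands in for the database.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input batches.
        results = list(pool.map(mask_batch, batches(rows, batch_size)))
    transformed = [row for batch in results for row in batch]
    if not dry_run:
        sink.extend(transformed)
    return transformed
```

Calling `anonymize_in_parallel(rows, sink, dry_run=True)` returns the transformed rows and leaves `sink` untouched, which mirrors the idea behind `--dry-run`. Smaller batches keep fewer rows in memory at once, at the cost of more scheduling overhead.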
Create a local .env file from .env.example before running the anonymizer.
# Install dependencies
pip install -r requirements.txt
# Run with defaults
python main.py
# Dry-run (preview transformations)
python main.py --dry-run
# Anonymize specific table
python main.py --table chat_message
# Custom workers & batch size
python main.py --max-workers 8 --batch-size 2000

Define anonymization rules in anonymizer/config.py:
ANONYMIZATION_RULES = {
    "table_name": {
        "column_name": {
            "transformers": [
                TransformerClass(params),
            ],
            "condition": NotNull,  # optional
        }
    }
}

Available transformers:
- NameTransformer: Random name
- EmailTransformer: Random email
- PhoneTransformer: Random phone
- HashTransformer: SHA256 hash
- UuidTransformer: UUID v4
- RandomStringTransformer: Random alphanumeric
- LoremIpsumTransformer: Lorem ipsum text
- PgpEncryptTransformer: PGP encryption
- PgpDecryptTransformer: PGP decryption
- NullishTransformer: NULL value
- HtmlWrapTransformer: HTML wrapper
- PassThroughTransformer: No change
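
As a rough illustration of how a rule set shaped like `ANONYMIZATION_RULES` might be applied to a single row, here is a minimal sketch. The classes below are simplified stand-ins written for this example, not the project's real transformers; the `transform` method, the `not_null` condition function, the `users` table, and `anonymize_row` are all assumptions.

```python
import hashlib
from typing import Any, Dict


class HashTransformer:
    """Hypothetical stand-in: replaces a value with its SHA-256 hex digest."""

    def transform(self, value: Any) -> str:
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


class NullishTransformer:
    """Hypothetical stand-in: replaces any value with None (SQL NULL)."""

    def transform(self, value: Any) -> None:
        return None


def not_null(value: Any) -> bool:
    """Hypothetical condition: only transform values that are not NULL."""
    return value is not None


# Example rules for an assumed "users" table, following the shape shown above.
RULES = {
    "users": {
        "email": {"transformers": [HashTransformer()], "condition": not_null},
        "notes": {"transformers": [NullishTransformer()]},
    }
}


def anonymize_row(table: str, row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row with each matching column rule applied in order."""
    result = dict(row)
    for column, rule in RULES.get(table, {}).items():
        if column not in result:
            continue
        condition = rule.get("condition")
        # Skip the column when an optional condition is present and not met.
        if condition is not None and not condition(result[column]):
            continue
        # Transformers are chained: each one receives the previous output.
        for transformer in rule["transformers"]:
            result[column] = transformer.transform(result[column])
    return result
```

Under these assumptions, `anonymize_row("users", {"id": 1, "email": "a@example.com", "notes": "secret"})` leaves `id` unchanged, replaces `email` with a hex digest, and sets `notes` to `None`. Tables without rules pass through unchanged.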
dropzone/ is the working folder for SQL dump files and helper scripts used during local anonymization runs.
Use these script pairs depending on your shell:
- restore_database.ps1 / restore_database.sh: Restore an input dump into PostgreSQL before running anonymization.
- export_anonymized_dump.ps1 / export_anonymized_dump.sh: Export the anonymized database back to a dump file.