waldeck-dev/db-anonymizer

db-anonymizer

A multithreaded PostgreSQL database anonymizer that applies configurable transformation rules to sensitive data.

Features

  • Rule-based anonymization: Define which tables and columns to anonymize in anonymizer/config.py
  • Multiple transformers: Email, Phone, Name, Hash, UUID, PGP Encrypt/Decrypt, Lorem Ipsum, Random String, and more
  • Parallel processing: Configurable worker threads for fast anonymization
  • Dry-run mode: Preview transformations without writing to the database
  • Batch processing: Configurable batch sizes for efficient memory usage
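
The combination of worker threads and batches can be pictured with a small standard-library sketch. This is an illustration only, not the project's implementation: `iter_batches`, `anonymize_batch`, and `anonymize_rows` are hypothetical names, the rows are plain in-memory strings rather than database rows, and the default values are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def iter_batches(rows: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def anonymize_batch(batch: Sequence[str], transform: Callable[[str], str]) -> List[str]:
    """Apply one transformation to every value in a batch."""
    return [transform(value) for value in batch]


def anonymize_rows(
    rows: Sequence[str],
    transform: Callable[[str], str],
    max_workers: int = 4,    # placeholder default, not taken from the project
    batch_size: int = 1000,  # placeholder default, not taken from the project
) -> List[str]:
    """Process batches on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order the batches were submitted.
        results = pool.map(
            lambda batch: anonymize_batch(batch, transform),
            iter_batches(rows, batch_size),
        )
        return [value for batch in results for value in batch]


masked = anonymize_rows(["alice", "bob", "carol"], str.upper, max_workers=2, batch_size=2)
```

Smaller batches keep fewer rows in memory at once, while more workers let several batches be in flight at the same time.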

Quick Start

Prerequisites

  • PostgreSQL
  • Python 3.8+

Create a local .env file from .env.example before running the anonymizer.

Usage

# Install dependencies
pip install -r requirements.txt

# Run with defaults
python main.py

# Dry-run (preview transformations)
python main.py --dry-run

# Anonymize specific table
python main.py --table chat_message  # table name is an example

# Custom workers & batch size
python main.py --max-workers 8 --batch-size 2000
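
For readers who want to see how flags like these are commonly wired up, here is a minimal `argparse` sketch. Only the flag names come from the commands above. The default values, help texts, and the `build_parser` helper are assumptions for illustration and may differ from what main.py actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview transformations without writing")
    parser.add_argument("--table", default=None,
                        help="restrict the run to one table")
    # The defaults below are assumed values, not the project's real defaults.
    parser.add_argument("--max-workers", type=int, default=4)
    parser.add_argument("--batch-size", type=int, default=1000)
    return parser


# Parse an explicit argument list instead of sys.argv to keep the sketch testable.
args = build_parser().parse_args(
    ["--dry-run", "--table", "chat_message", "--max-workers", "8"]
)
print(args.dry_run, args.table, args.max_workers, args.batch_size)
```

`argparse` converts dashes in flag names to underscores, so `--max-workers` is read as `args.max_workers`.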

Configuration

Define anonymization rules in anonymizer/config.py:

ANONYMIZATION_RULES = {
    "table_name": {
        "column_name": {
            "transformers": [
                TransformerClass(params),
            ],
            "condition": NotNull,  # optional
        }
    }
}
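
One plausible way to read this structure is that the transformers in the list run in order, each receiving the previous output, and that the optional condition decides whether a value is touched at all. The sketch below shows that reading with stand-in classes. `UpperCaseTransformer`, `SuffixTransformer`, `not_null`, the `transform` method name, and `apply_rule` are all hypothetical and are not the project's API.

```python
from typing import Any, Dict


class UpperCaseTransformer:
    """Hypothetical stand-in transformer: upper-cases the value."""

    def transform(self, value: str) -> str:
        return value.upper()


class SuffixTransformer:
    """Hypothetical stand-in transformer: appends a fixed suffix."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def transform(self, value: str) -> str:
        return value + self.suffix


def not_null(value: Any) -> bool:
    """Assumed meaning of a NotNull-style condition: skip NULL values."""
    return value is not None


# Same shape as ANONYMIZATION_RULES, using made-up table and column names.
RULES: Dict[str, Dict[str, Dict[str, Any]]] = {
    "users": {
        "display_name": {
            "transformers": [UpperCaseTransformer(), SuffixTransformer("-anon")],
            "condition": not_null,  # optional
        }
    }
}


def apply_rule(rule: Dict[str, Any], value: Any) -> Any:
    """Run the transformer chain if the optional condition accepts the value."""
    condition = rule.get("condition")
    if condition is not None and not condition(value):
        return value  # condition rejected the value: leave it unchanged
    for transformer in rule["transformers"]:
        value = transformer.transform(value)
    return value


rule = RULES["users"]["display_name"]
print(apply_rule(rule, "alice"))
print(apply_rule(rule, None))
```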

Transformers

  • NameTransformer — Random name
  • EmailTransformer — Random email
  • PhoneTransformer — Random phone
  • HashTransformer — SHA256 hash
  • UuidTransformer — UUID v4
  • RandomStringTransformer — Random alphanumeric
  • LoremIpsumTransformer — Lorem ipsum text
  • PgpEncryptTransformer — PGP encryption
  • PgpDecryptTransformer — PGP decryption
  • NullishTransformer — NULL value
  • HtmlWrapTransformer — HTML wrapper
  • PassThroughTransformer — No change
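
To make a few of these descriptions concrete, the operations behind the SHA256, UUID v4, and random alphanumeric entries can be approximated with the Python standard library. This is a sketch under assumptions: the function names are hypothetical, and the real transformers may differ in details such as salting, encoding, or output length.

```python
import hashlib
import secrets
import string
import uuid


def sha256_hex(value: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 encoded string (64 hex characters)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def random_uuid4() -> str:
    """Return a random version 4 UUID in its canonical string form."""
    return str(uuid.uuid4())


def random_alphanumeric(length: int) -> str:
    """Return a random string of ASCII letters and digits of the given length."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(sha256_hex("alice@example.com"))
print(random_uuid4())
print(random_alphanumeric(12))
```

An unsalted hash is deterministic, so equal inputs map to equal outputs. That preserves joins between columns but also means low-entropy values such as phone numbers can be recovered by brute force, which is worth keeping in mind when choosing between a hash and a random replacement.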

Dropzone

dropzone/ is the working folder for SQL dump files and helper scripts used during local anonymization runs.

Use these script pairs depending on your shell:

  • restore_database.ps1 / restore_database.sh — Restore an input dump into PostgreSQL before running anonymization.
  • export_anonymized_dump.ps1 / export_anonymized_dump.sh — Export the anonymized database back to a dump file.
