A CLI tool that prevents dirty lead data from entering your CRM.

Part of the Lead Entry Guard ecosystem.

Lead CSV Sanitizer is a local-first Python tool for cleaning, validating, and preparing lead CSV files before CRM import.
It is designed for RevOps, Sales Ops, CRM admins, and automation builders who want safer, cleaner, and more reliable lead imports.

Lead spreadsheets exported from marketing tools, scraping tools, or manual lists often contain:

- invalid emails
- broken phone numbers
- inconsistent column names
- duplicate contacts
- missing required fields

Importing these directly into CRM systems often results in:

- failed imports
- duplicate contacts
- broken reporting
- messy CRM data

Lead CSV Sanitizer runs a deterministic pipeline that:

- Cleans the dataset
- Validates key fields
- Detects duplicates
- Flags data issues
- Generates CRM-ready output

All processing happens locally, with no external API calls.

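The stages above can be pictured as a fixed, ordered chain of functions over rows, which is what makes the output deterministic: the same input file always produces the same result. The sketch below is a minimal illustration using only the standard library `csv` module. The `COLUMN_ALIASES` map, the `is_duplicate` flag name, and the choice of lowercased email as the duplicate key are all hypothetical assumptions, not the actual logic of `column_mapper.py`, `cleaner.py`, or `dedupe.py`.

```python
import csv
import io

# Hypothetical alias map: raw header variants -> canonical column names.
COLUMN_ALIASES = {
    "e-mail": "email",
    "email address": "email",
    "phone number": "phone",
    "tel": "phone",
    "first name": "first_name",
    "last name": "last_name",
}


def normalize_header(name: str) -> str:
    """Trim, lowercase and collapse inner spaces, then map known aliases."""
    key = " ".join(name.strip().lower().split())
    return COLUMN_ALIASES.get(key, key.replace(" ", "_"))


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text (assumed to start with a header row) into trimmed dicts."""
    reader = csv.reader(io.StringIO(text))
    header = [normalize_header(h) for h in next(reader)]
    return [
        {col: value.strip() for col, value in zip(header, row)}
        for row in reader
    ]


def flag_duplicates(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Flag every row after the first one sharing a lowercased email (hypothetical key)."""
    seen: set[str] = set()
    for row in rows:
        key = row.get("email", "").lower()
        row["is_duplicate"] = "true" if key and key in seen else "false"
        if key:
            seen.add(key)
    return rows


def run_pipeline(text: str) -> list[dict[str, str]]:
    # Stages always run in the same order, so identical input gives identical output.
    rows = load_rows(text)
    return flag_duplicates(rows)
```

Flagging duplicates instead of silently dropping them keeps every input row visible in an audit file, so a reviewer can decide which copy to keep.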

- ✔ Column normalization
- ✔ Whitespace cleanup
- ✔ Email validation
- ✔ Phone normalization (E.164 format)
- ✔ Duplicate detection
- ✔ Issue flags per row
- ✔ Missing field audit
- ✔ CRM-ready export
- ✔ JSON summary report
- ✔ Data health score
- ✔ Dataset size guard
- ✔ Processing time metrics
- ✔ Local-first processing (privacy friendly)

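Email validation and E.164 phone normalization can be approximated as below. This is a deliberately simplified, standard-library sketch: the regex is a pragmatic shape check rather than a full RFC 5322 parser, and the `default_country_code="1"` parameter and the trunk-prefix rule are hypothetical assumptions. A production tool might instead rely on a dedicated library such as `phonenumbers`; the README does not say which approach Lead CSV Sanitizer takes.

```python
import re
from typing import Optional

# Pragmatic shape check (local@domain.tld, no spaces); NOT a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def is_valid_email(value: str) -> bool:
    """Return True if the trimmed value looks like an email address."""
    return EMAIL_RE.fullmatch(value.strip()) is not None


def to_e164(raw: str, default_country_code: str = "1") -> Optional[str]:
    """Normalize a phone string to E.164 (+<digits>), or return None.

    Hypothetical rules for illustration:
    - a leading '+' or international '00' prefix means the country code is present;
    - otherwise leading zeros (a national trunk prefix) are dropped and
      default_country_code is prepended;
    - E.164 caps numbers at 15 digits; the 8-digit floor is an arbitrary sanity check.
    """
    value = raw.strip()
    if value.startswith("00"):
        value = "+" + value[2:]
    has_country_code = value.startswith("+")
    digits = re.sub(r"\D", "", value)  # drop spaces, dashes, parentheses and '+'
    if not has_country_code:
        digits = default_country_code + digits.lstrip("0")
    if not 8 <= len(digits) <= 15:
        return None
    return "+" + digits
```

Returning `None` for an unparseable number, rather than the raw string, makes it easy to set a per-row issue flag downstream.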
The tool calculates a Data Health Score based on:

- valid emails
- valid phone numbers
- duplicate rate
- missing required fields

Example output:

```
Data health score: 73%
```

This provides a quick overview of CRM data quality.
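The README does not state how the four inputs are weighted. The sketch below assumes an equal-weight average of four ratios (valid emails, valid phones, non-duplicate rows, rows with all required fields); both the weighting and the `health_score` name are hypothetical, and the formula is not expected to reproduce the 73% shown above.

```python
def health_score(total: int, valid_emails: int, valid_phones: int,
                 duplicates: int, rows_missing_required: int) -> int:
    """Hypothetical equal-weight Data Health Score, as a whole percentage."""
    if total <= 0:
        return 0
    ratios = [
        valid_emails / total,
        valid_phones / total,
        1 - duplicates / total,             # share of rows that are not duplicates
        1 - rows_missing_required / total,  # share of rows with all required fields
    ]
    # Clamp each ratio to [0, 1] so inconsistent counts cannot push the score out of range.
    clamped = [min(max(r, 0.0), 1.0) for r in ratios]
    return round(100 * sum(clamped) / len(clamped))
```

An equal-weight average is the simplest defensible choice; a real scoring scheme might weight email validity more heavily if email is the primary CRM matching key.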

Project structure:

```
lead-csv-sanitizer/
│
├─ src/
│  └─ lead_csv_sanitizer/
│     ├─ cli.py
│     ├─ loader.py
│     ├─ column_mapper.py
│     ├─ cleaner.py
│     ├─ validators.py
│     ├─ dedupe.py
│     ├─ report.py
│     └─ writer.py
│
├─ tests/
├─ samples/
├─ output/
├─ pyproject.toml
└─ README.md
```

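Create and activate a virtual environment (the activation command shown is for Windows PowerShell; on macOS or Linux, run `source .venv/bin/activate` instead):

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```
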
Install dependencies:

```bash
python -m pip install --upgrade pip
python -m pip install -e .
```

Run the sanitizer on a CSV file:

```bash
python -m lead_csv_sanitizer.cli samples/messy_leads.csv
```

Example output:

```
Lead CSV Sanitizer
Rows loaded: 4
Valid emails: 3/4
Valid phones: 3/4
Duplicates detected: 2
Data health score: 73%
Processing time: 2.42s
```

The tool generates several files inside the `output/` folder:

| File | Purpose |
|-----|-----|
| cleaned_leads.csv | Full dataset with audit columns |
| crm_import_ready.csv | Simplified dataset ready for CRM import |
| rejected_rows.csv | Rows rejected due to missing identifiers |
| sanitizer_report.json | Data quality metrics |
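
The table says rows are rejected for missing identifiers but does not define which identifiers count. The sketch below assumes a row is importable when it has a non-empty `email` or `phone`, and serializes each group with `csv.DictWriter`. The `IDENTIFIER_COLUMNS` rule and the helper names are hypothetical; the real `writer.py` may behave differently.

```python
import csv
import io

IDENTIFIER_COLUMNS = ("email", "phone")  # assumption: either one is enough to import


def split_rows(rows: list[dict[str, str]]) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Partition rows into (importable, rejected) by presence of an identifier."""
    importable, rejected = [], []
    for row in rows:
        has_identifier = any(row.get(col, "").strip() for col in IDENTIFIER_COLUMNS)
        (importable if has_identifier else rejected).append(row)
    return importable, rejected


def to_csv(rows: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Serialize rows to CSV text; keys not in fieldnames (e.g. audit flags) are skipped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Passing a narrower `fieldnames` list is one way to derive a slim CRM import file from the full audited dataset. When writing to disk instead of a string, open the file with `newline=""`, as the Python `csv` documentation recommends.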

Run the test suite:

```bash
pytest
```

This tool is built with privacy-first principles.

- ✔ Local processing only
- ✔ No external APIs
- ✔ No data uploads
- ✔ Deterministic outputs

Best practices:

- never commit real customer data
- never commit output files with PII
- use sample data in the repository

Typical users include:

- RevOps engineers
- Sales Operations teams
- CRM administrators
- Automation consultants
- Data quality engineers

Common scenarios:

- preparing CSV imports for CRM
- auditing marketing lead lists
- cleaning scraped datasets
- preventing duplicate CRM contacts

Potential future improvements:

- batch processing for multiple CSV files
- richer data quality reports
- configurable required fields
- CRM-specific export formats
- desktop packaging for non-technical users

Built by Jiří Šach

Automation & Data Workflow Builder

If you find this project useful, consider starring the repository.
