This document explains how to configure, run, and optimize the generate_channel_txn_multithreaded_progress.py script to generate and insert large-scale test data (e.g., 17 million rows) into MySQL efficiently.
- Overview
- Prerequisites
- Folder Setup
- Environment Variables
- Installing Dependencies
- Database Preparation
- Script Execution
- Example Logs
- Performance Tuning Tips
- Optional Enhancements
The script creates synthetic JSON-based records and inserts them into MySQL using multithreading for speed.
It is built for performance testing, ETL simulation, and database benchmarking.
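Record generation can be sketched with the standard library alone (the real script uses Faker internally; the field names below follow the example `channel_txn` schema shown later in this document, and the exact payload shape is illustrative):

```python
import json
import random
import time
import uuid

def make_record(channel_id):
    """Build one synthetic row matching the channel_txn example schema."""
    return {
        "channel_id": channel_id,
        "unique_id": uuid.uuid4().hex,                       # 32 hex chars, fits VARCHAR(64)
        "loc_detail": f"LOC-{random.randint(0, 9999):04d}",  # fits VARCHAR(16)
        "ts": int(time.time() * 1000),                       # epoch millis for the BIGINT column
        "msg": json.dumps({"event": "txn", "amount": round(random.uniform(1, 500), 2)}),
        "action": random.randint(0, 1),
    }

row = make_record(channel_id=42)
```

Rows like this are then batched into `executemany()`-style inserts by each worker thread.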
Each thread:
- Uses its own MySQL connection.
- Generates and inserts data in configurable batches.
- Logs real-time progress: inserted count, remaining, % complete, rows/sec, and ETA.
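A simplified sketch of this worker pattern (the database insert is stubbed out, and the actual script's internals may differ; in the real run each worker would open its own `mysql.connector` connection and insert `BATCH_SIZE` rows per iteration):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

TOTAL_RECORDS = 1_000_000   # illustrative; the real run reads TOTAL_RECORDS from the env
BATCH_SIZE = 50_000
THREADS = 4

inserted = 0
lock = threading.Lock()     # guards the shared progress counter
start = time.monotonic()

def insert_batch(rows):
    """Stub for the real executemany() insert over a per-thread connection."""
    pass

def worker(worker_id, quota):
    global inserted
    done = 0
    while done < quota:
        batch = min(BATCH_SIZE, quota - done)
        insert_batch(range(batch))   # real code builds and inserts `batch` rows here
        done += batch
        with lock:
            inserted += batch
            elapsed = time.monotonic() - start
            speed = inserted / elapsed if elapsed > 0 else 0.0
            left = TOTAL_RECORDS - inserted
            eta_min = (left / speed / 60) if speed else float("inf")
            print(f"Worker {worker_id} | Inserted: {inserted:,}/{TOTAL_RECORDS:,} "
                  f"({inserted / TOTAL_RECORDS:.2%}) | Left: {left:,} | ETA: {eta_min:.1f} min")
    return done

# Split the target evenly; the last worker picks up any remainder.
quota, rem = divmod(TOTAL_RECORDS, THREADS)
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    futures = [pool.submit(worker, i + 1, quota + (rem if i == THREADS - 1 else 0))
               for i in range(THREADS)]
    total = sum(f.result() for f in futures)
```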
| Requirement | Minimum Version | Notes |
|---|---|---|
| Python | 3.9+ | Required for concurrent.futures and modern MySQL connector |
| MySQL Server | 8.0+ | Should allow local or external connections |
| pip | latest | For dependency management |
| Privileges | CREATE / INSERT / DROP | Required for table creation and inserts |
Recommended project layout:
```
/path/to/project/
├── generate_channel_txn_multithreaded_progress.py
├── .env
└── requirements.txt
```
You can configure these using either a .env file or direct export commands.
```
MYSQL_HOST=127.0.0.1
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=rootpassword
MYSQL_DB=guardian
SOURCE_TABLE=channel_txn
TARGET_TABLE=channel_txn_temp
TOTAL_RECORDS=17000000
BATCH_SIZE=50000
THREADS=6
```

Or export the variables directly:

```
export MYSQL_HOST=127.0.0.1
export MYSQL_USER=root
export MYSQL_PASSWORD=rootpassword
export MYSQL_DB=guardian
export SOURCE_TABLE=channel_txn
export TARGET_TABLE=channel_txn_temp
export TOTAL_RECORDS=17000000
export BATCH_SIZE=50000
export THREADS=6
```

Create and activate a virtual environment:

```
cd /path/to/project
python3 -m venv venv
source venv/bin/activate
```

Example `requirements.txt`:
```
mysql-connector-python
Faker
```

Install system packages and Python dependencies (Debian/Ubuntu shown):

```
sudo apt update
sudo apt install -y python3 python3-pip
pip3 install mysql-connector-python faker
```

- Start MySQL and ensure connection access.
- Create a source table `channel_txn`. The script will automatically clone it to `channel_txn_temp`.
Example schema:
```sql
CREATE DATABASE guardian;
USE guardian;

CREATE TABLE channel_txn (
  channel_id INT NOT NULL,
  unique_id VARCHAR(64) NOT NULL,
  loc_detail VARCHAR(16) NOT NULL,
  ts BIGINT NOT NULL,
  msg TEXT NOT NULL,
  action TINYINT NOT NULL DEFAULT 0,
  txn_id INT DEFAULT NULL,
  nil_action TINYINT NOT NULL DEFAULT 0,
  nil_id INT DEFAULT NULL,
  PRIMARY KEY (channel_id, unique_id),
  KEY action_nil (nil_action, channel_id, ts),
  KEY action_txn (action, channel_id, ts),
  KEY txn_id (txn_id),
  KEY nil_id (nil_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```

Run the script after activating your virtual environment:
```
python3 generate_channel_txn_multithreaded_progress.py
```

Sample output:

```
Connecting to MySQL 127.0.0.1:3306 ...
Preparing session for high-speed insert ...
Creating copy table `channel_txn_temp` (if not exists) ...
Starting multithreaded load: 17,000,000 records using 6 threads (batch=50,000)
Worker 1 | Inserted: 200,000/17,000,000 (1.18%) | Left: 16,800,000 | Speed: 95,000 rows/sec | ETA: 2.9 min
Worker 2 | Inserted: 400,000/17,000,000 (2.35%) | Left: 16,600,000 | Speed: 108,000 rows/sec | ETA: 2.4 min
...
Worker 6 completed 2,833,333 rows.
All threads done: 17,000,000 rows in 4.8 min (~59,000 rows/sec)
```
| Field | Description |
|---|---|
| Inserted | Total rows inserted so far (across all threads) |
| Left | Remaining records to reach target |
| % Complete | Percentage progress |
| Speed | Average insert rate (rows/sec) |
| ETA | Estimated time remaining (minutes) |
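These fields follow from simple bookkeeping on each worker's counters. A minimal helper (hypothetical names, mirroring the log format above) might compute them as:

```python
def progress_stats(inserted, total, elapsed_sec):
    """Return (% complete, rows/sec, ETA in minutes) from raw progress counters."""
    pct = 100.0 * inserted / total
    speed = inserted / elapsed_sec        # average insert rate so far
    left = total - inserted
    eta_min = left / speed / 60           # time to finish at the current rate
    return pct, speed, eta_min

# Reproduces the Worker 1 log line: 200,000 rows at ~95,000 rows/sec
pct, speed, eta = progress_stats(200_000, 17_000_000, 200_000 / 95_000)
```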
Run inside MySQL before loading:
```sql
SET GLOBAL max_allowed_packet = 536870912;  -- 512 MB; SET GLOBAL does not accept the "512M" suffix
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SET GLOBAL sync_binlog = 0;
SET GLOBAL local_infile = 1;
```

For faster inserts, drop the secondary indexes before loading:

```sql
ALTER TABLE channel_txn_temp
  DROP INDEX action_nil, DROP INDEX action_txn, DROP INDEX txn_id, DROP INDEX nil_id;
```

After the inserts, recreate them:

```sql
ALTER TABLE channel_txn_temp
  ADD KEY action_nil (nil_action, channel_id, ts),
  ADD KEY action_txn (action, channel_id, ts),
  ADD KEY txn_id (txn_id),
  ADD KEY nil_id (nil_id);
```

| Threads | Recommended for |
|---|---|
| 2–4 | Laptops or dual-core systems |
| 6–8 | Servers (8-core CPU) |
| 10+ | Only if MySQL I/O can handle the concurrency |
| Feature | Description |
|---|---|
| File-based Logging | Add Python's logging module to persist progress in a log file. |
| Multiprocessing | Replace ThreadPoolExecutor with ProcessPoolExecutor for CPU-heavy workloads. |
| Connection Pooling | Use mysql.connector.pooling.MySQLConnectionPool for efficient reuse. |
| Dockerized Run | Use a container linked to your MySQL instance. |
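The connection-pooling idea can be sketched generically with a stdlib queue; in practice you would hand out real connections from `mysql.connector.pooling.MySQLConnectionPool` rather than the placeholder objects used here:

```python
import queue
from contextlib import contextmanager

class SimplePool:
    """Minimal checkout/check-in illustration; not a substitute for the real pool."""
    def __init__(self, factory, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(factory())   # pre-create all connections up front

    @contextmanager
    def connection(self):
        conn = self._q.get()         # block until a connection is free
        try:
            yield conn
        finally:
            self._q.put(conn)        # return it for reuse instead of closing

# Placeholder factory; swap in mysql.connector.connect(...) for real use.
pool = SimplePool(factory=lambda: object(), size=3)
with pool.connection() as conn:
    first = conn                     # borrowed, then automatically returned
with pool.connection() as conn:
    reused = conn
```

The point is that each worker borrows and returns a connection instead of paying connect/teardown cost per batch.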
| Scale | Threads | Duration | Throughput |
|---|---|---|---|
| 1 M rows | 4 | ~30 s | ~33 k rows/s |
| 10 M rows | 6 | ~3–4 min | ~55–60 k rows/s |
| 17 M rows | 8 | ~4β5 min | ~60β70 k rows/s |
Key Benefits
- Environment-driven configuration
- Fully automated data generation
- Real-time progress logging
- Multithreaded inserts for speed
- Session-level optimization
Ideal Use Cases
- Load testing
- ETL and ingestion simulation
- Data warehouse stress tests
- Application performance benchmarking
This script provides a repeatable, tunable framework for testing MySQL performance at scale.
Easily adaptable for PostgreSQL or MongoDB pipelines.
End of Document