Changelog

All notable changes to SeedStream will be documented in this file.

The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.

[0.4.0] - 2026-03-20

Release: Database destination complete, biometric data support, security hardening.

Added

Database Destination (Stage 2)

Nested object auto-decomposition: Automatically flattens nested structures into relational columns
JMH benchmarks for database write throughput
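As a rough illustration of nested object auto-decomposition, the sketch below flattens a nested record (modeled as nested `Map`s) into one flat row whose column names join parent and child keys. The `NestedFlattener` class and the `parent_child` naming convention are assumptions made for this example. SeedStream's actual decomposition rules and column names may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: flatten nested maps into relational-style columns.
// The "parent_child" column naming convention is an assumption, not SeedStream's scheme.
final class NestedFlattener {
    private NestedFlattener() {}

    /** Flattens nested maps into a single row; column order follows insertion order. */
    static Map<String, Object> flatten(Map<String, ?> record) {
        Map<String, Object> row = new LinkedHashMap<>();
        flattenInto("", record, row);
        return row;
    }

    private static void flattenInto(String prefix, Map<String, ?> source, Map<String, Object> row) {
        for (Map.Entry<String, ?> entry : source.entrySet()) {
            String column = prefix.isEmpty() ? entry.getKey() : prefix + "_" + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Nested object: recurse, extending the column-name prefix
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                flattenInto(column, child, row);
            } else {
                row.put(column, value); // scalar value: becomes one column
            }
        }
    }
}
```

With `{id: 1, address: {city: "Rome", zip: "00100"}}` as input, the row has the columns `id`, `address_city`, and `address_zip`. Collisions between a flattened name and an existing top-level key are not handled in this sketch.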

Biometric Support

Biometric structure definitions: Face and fingerprint data schemas (config/structures/)
Biometric job definitions: Pre-built jobs for face/fingerprint test data generation
BiometricValidator: Validates biometric field constraints (dimensions, quality scores, ISO/IEC 19794 ranges)
validate CLI subcommand: Validates YAML configurations without executing a job

Formats

CBEFF format (CbeffSerializer): CBEFF-like JSON envelope format for biometric payloads
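To give a rough idea of what a "CBEFF-like JSON envelope" means, the sketch below wraps a raw biometric data block (BDB) in Base64 and places it under a small header carrying format metadata. This loosely mirrors CBEFF's split between header and data block. Every JSON field name here, and the `CbeffEnvelopeSketch` class itself, is hypothetical. The sketch does not describe `CbeffSerializer`'s real output.

```java
import java.util.Base64;

// Hypothetical sketch of a CBEFF-like JSON envelope: a small header with format
// metadata wrapping a Base64-encoded biometric data block (BDB).
// All JSON field names are assumptions, not CbeffSerializer's actual output.
final class CbeffEnvelopeSketch {
    private CbeffEnvelopeSketch() {}

    static String wrap(String biometricType, int formatOwner, int formatType, byte[] bdb) {
        String payload = Base64.getEncoder().encodeToString(bdb); // binary-safe inside JSON
        return "{\"header\":{"
            + "\"biometricType\":\"" + escape(biometricType) + "\","
            + "\"formatOwner\":" + formatOwner + ","
            + "\"formatType\":" + formatType + "},"
            + "\"bdb\":\"" + payload + "\"}";
    }

    // Minimal JSON string escaping: handles quotes and backslashes only (no control characters)
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```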

Distribution

extras/ directory: Shipped with the distribution as the drop-in location for JDBC drivers and custom Datafaker providers; its contents are automatically added to the classpath at startup
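One way to load drop-in JARs from an `extras/` directory is to collect them into a child `URLClassLoader`. The sketch below does that. The `ExtrasClassLoader` class is hypothetical, and SeedStream's launcher may add `extras/` to the classpath differently (for example, with a classpath wildcard in a start script).

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: gather every *.jar in an extras/ directory into a class loader.
// SeedStream's real startup mechanism may differ.
final class ExtrasClassLoader {
    private ExtrasClassLoader() {}

    static List<URL> jarUrls(Path extrasDir) throws IOException {
        List<URL> urls = new ArrayList<>();
        if (!Files.isDirectory(extrasDir)) {
            return urls; // no extras/ directory: nothing to add
        }
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(extrasDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        urls.sort(Comparator.comparing(URL::toString)); // deterministic load order
        return urls;
    }

    static URLClassLoader create(Path extrasDir, ClassLoader parent) throws IOException {
        return new URLClassLoader(jarUrls(extrasDir).toArray(new URL[0]), parent);
    }
}
```

Sorting the URLs keeps class resolution stable when two JARs happen to contain the same class.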

Security

File permission validation: Startup check verifies config files are not world-writable (fails fast on unsafe permissions)
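A fail-fast world-writable check like the one described can be written with the JDK's POSIX file attribute API. The sketch below is a minimal version. The class and method names are hypothetical. It treats non-POSIX file systems (such as Windows) as "not applicable", which is an assumption about how SeedStream behaves there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

// Hypothetical sketch of a startup check that refuses world-writable config files.
final class ConfigPermissionCheck {
    private ConfigPermissionCheck() {}

    static void requireNotWorldWritable(Path config) throws IOException {
        if (Files.getFileAttributeView(config, PosixFileAttributeView.class) == null) {
            return; // non-POSIX file system: this sketch skips the check
        }
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(config);
        if (perms.contains(PosixFilePermission.OTHERS_WRITE)) {
            throw new IllegalStateException("Refusing to start: " + config + " is world-writable");
        }
    }
}
```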

Build

Gradle configuration cache: Enabled for faster incremental builds
CI consolidation: SpotBugs moved into the security workflow alongside OWASP Dependency-Check

Changed

Database destination benchmark suite now supports a filter flag to run a subset of scenarios

[0.3.0] - 2026-03-08

Release: Registry-based type system, extended Datafaker coverage, database destination (Stage 1).

Added

Data Generators

20 new Datafaker semantic types (Phase 1): expanded coverage for realistic data (passport, finance, biometric-adjacent fields, etc.)
Thread-local Faker cache: Eliminates repeated Faker instantiation in hot paths; significant throughput improvement for Datafaker-heavy jobs
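The idea behind a thread-local Faker cache is that each thread creates at most one instance per locale and then reuses it. The sketch below shows that pattern generically. In SeedStream the cached type would presumably be Datafaker's `Faker`; here a generic `T` and a factory function stand in so the example compiles without Datafaker. The `ThreadLocalCache` class is hypothetical. The `clear()` method shows one way to avoid the kind of ThreadLocal pollution between test cases that the 0.3.0 FakerCacheTest fix addresses.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a thread-local, per-locale instance cache.
// A generic T replaces Datafaker's Faker so this compiles without extra dependencies.
final class ThreadLocalCache<T> {
    private final Function<Locale, T> factory;
    // Each thread gets its own map, so lookups need no synchronization
    private final ThreadLocal<Map<Locale, T>> perThread = ThreadLocal.withInitial(HashMap::new);

    ThreadLocalCache(Function<Locale, T> factory) {
        this.factory = factory;
    }

    /** Returns this thread's instance for the locale, creating it on first use only. */
    T get(Locale locale) {
        return perThread.get().computeIfAbsent(locale, factory);
    }

    /** Drops the current thread's cached instances (e.g. between test cases). */
    void clear() {
        perThread.remove();
    }
}
```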

Database Destination (Stage 1)

JDBC destination: Write generated records to any JDBC-compatible database
HikariCP connection pooling: Configurable pool size, timeout, and keepalive
Batch SQL insert: Configurable batch size to amortize round-trip latency
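Batched inserts in plain JDBC usually follow the pattern `addBatch()` per row and `executeBatch()` every N rows. That pattern is what lets a configurable batch size amortize round trips. The sketch below uses only `java.sql`. The `BatchInsert` class is hypothetical, transaction handling (auto-commit, rollback) is deliberately left out, and whether a batch really becomes a single round trip depends on the JDBC driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical sketch of batched inserts with plain JDBC.
// Transaction handling is omitted for brevity.
final class BatchInsert {
    private BatchInsert() {}

    /** Inserts all rows and returns how many times executeBatch() was called. */
    static int insert(Connection conn, String sql, List<Object[]> rows, int batchSize) throws SQLException {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        int flushes = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) { // full batch: send it
                    ps.executeBatch();
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the final partial batch
                ps.executeBatch();
                flushes++;
            }
        }
        return flushes;
    }
}
```

With 2,500 rows and a batch size of 1,000, this issues three `executeBatch()` calls (1,000 + 1,000 + 500).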

Changed

Registry-based type system (DatafakerRegistry): Replaces the enum-based PrimitiveType.Kind with a ConcurrentHashMap-backed registry
- Removed 42 semantic enum values (NAME, EMAIL, ADDRESS, etc.)
- 48+ built-in types registered at startup; 20+ aliases (lat/latitude, swift/bic, etc.)
- CustomDatafakerType replaces enum-based semantic types
- TypeParser simplified (~150 lines removed); DatafakerGenerator simplified (~220 lines removed)
- Total reduction: ~350 lines of duplicated switch logic eliminated
- Foundation for future plugin architecture (runtime type registration without recompilation)
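The core of a registry-based type system can be sketched as two concurrent maps: canonical name to generator, and alias to canonical name. This is what allows new types to be registered at runtime without touching an enum. The `TypeRegistry` class, its method names, the `Supplier<String>` generator type, and the case-insensitive lookup are all assumptions for illustration. `DatafakerRegistry`'s real API may look quite different.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of a ConcurrentHashMap-backed type registry with aliases.
// Names and signatures are illustrative; DatafakerRegistry's real API may differ.
final class TypeRegistry {
    private final Map<String, Supplier<String>> generators = new ConcurrentHashMap<>();
    private final Map<String, String> aliases = new ConcurrentHashMap<>();

    /** Registers a type at runtime; no enum constant or recompilation needed. */
    void register(String name, Supplier<String> generator) {
        if (generators.putIfAbsent(normalize(name), generator) != null) {
            throw new IllegalArgumentException("Type already registered: " + name);
        }
    }

    /** Maps an alternative spelling (e.g. "lat") onto a canonical type (e.g. "latitude"). */
    void alias(String alias, String canonical) {
        aliases.put(normalize(alias), normalize(canonical));
    }

    /** Resolves an alias if present, then looks up the generator. */
    Optional<Supplier<String>> lookup(String name) {
        String key = normalize(name);
        return Optional.ofNullable(generators.get(aliases.getOrDefault(key, key)));
    }

    private static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }
}
```

Compared with a `switch` over enum constants, adding a type becomes a single `register` call, which is what makes later plugin-style registration possible.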

Fixed

Removed deprecated static factory methods from generator classes
Fixed FakerCacheTest ThreadLocal pollution between test cases

[0.2.0] - 2026-02-01

Major Release: Core features complete, production-ready for file and Kafka destinations.

Added

Core Functionality

Multi-threading engine with deterministic seeding and backpressure handling
Seed resolution from multiple sources (embedded, file, environment variable, remote API)
Deterministic random provider with thread-local RNG and logical worker IDs
Type system supporting primitives (int, char, decimal, boolean, date, timestamp, enum)
Nested structures with object[structure_name] syntax
Variable-length arrays with array[type, min..max] syntax
Geolocation-aware data using Datafaker (62+ locales supported)
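The composite type syntax above (`object[structure_name]` and `array[type, min..max]`) can be recognized with two regular expressions. The sketch below does this. The `CompositeTypeSyntax` class and its record types are hypothetical, and the whitespace tolerance and the rejection of `min > max` are assumptions rather than documented SeedStream rules. The lazy element-type group lets nested types such as `array[object[address], 1..3]` parse correctly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of parsing object[structure_name] and array[type, min..max].
// Record names and whitespace rules are assumptions, not SeedStream's TypeParser.
final class CompositeTypeSyntax {
    record ObjectType(String structure) {}
    record ArrayType(String elementType, int min, int max) {}

    private static final Pattern OBJECT = Pattern.compile("object\\[\\s*(\\w+)\\s*\\]");
    // Lazy (.+?) so the element type may itself contain brackets and commas
    private static final Pattern ARRAY =
        Pattern.compile("array\\[\\s*(.+?)\\s*,\\s*(\\d+)\\.\\.(\\d+)\\s*\\]");

    private CompositeTypeSyntax() {}

    static Object parse(String spec) {
        String s = spec.trim();
        Matcher m = OBJECT.matcher(s);
        if (m.matches()) return new ObjectType(m.group(1));
        m = ARRAY.matcher(s);
        if (m.matches()) {
            int min = Integer.parseInt(m.group(2));
            int max = Integer.parseInt(m.group(3));
            if (min > max) throw new IllegalArgumentException("min > max in " + spec);
            return new ArrayType(m.group(1), min, max);
        }
        throw new IllegalArgumentException("Not a composite type: " + spec);
    }
}
```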

Data Generators

Primitive generators: int, char, decimal, boolean, date, timestamp, enum (12-258M ops/s)
Datafaker integration: 28 semantic types (name, email, address, phone, company, etc.)
Composite generators: nested objects, variable-length arrays, circular reference detection
Performance: 6,923 realistic records/sec validated (100K customer records benchmark)
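Circular reference detection among nested structures is a cycle search in a directed graph: each structure points to the structures it embeds. The sketch below uses depth-first search and tracks the nodes on the current path, so a back edge yields the offending cycle. The `CycleDetector` class and its map-of-lists input are assumptions for illustration, not SeedStream's implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: detect cycles among structure definitions, where each key
// lists the structures it embeds via object[...] fields. Uses depth-first search.
final class CycleDetector {
    private CycleDetector() {}

    /** Returns the first cycle found as a path such as [a, b, a], or empty if acyclic. */
    static Optional<List<String>> findCycle(Map<String, List<String>> references) {
        Set<String> done = new HashSet<>(); // nodes fully explored without finding a cycle
        for (String start : references.keySet()) {
            Optional<List<String>> cycle = visit(start, references, done, new ArrayList<>());
            if (cycle.isPresent()) return cycle;
        }
        return Optional.empty();
    }

    private static Optional<List<String>> visit(String node, Map<String, List<String>> refs,
                                                Set<String> done, List<String> path) {
        int onPath = path.indexOf(node); // linear scan; fine for small schemas
        if (onPath >= 0) { // back edge to a node on the current path: cycle found
            List<String> cycle = new ArrayList<>(path.subList(onPath, path.size()));
            cycle.add(node);
            return Optional.of(cycle);
        }
        if (done.contains(node)) return Optional.empty();
        path.add(node);
        for (String child : refs.getOrDefault(node, List.of())) {
            Optional<List<String>> cycle = visit(child, refs, done, path);
            if (cycle.isPresent()) return cycle;
        }
        path.remove(path.size() - 1);
        done.add(node); // mark only after all descendants are explored
        return Optional.empty();
    }
}
```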

Formats & Destinations

JSON format: Newline-delimited JSON (RFC 7159 compliant)
CSV format: RFC 4180 compliant with configurable delimiters
Protobuf format: Protocol Buffers binary serialization with dynamic schema generation (50-70% smaller than JSON)
File destination: NIO-based with compression (gzip), append mode, buffering (600-800 MB/s)
Kafka destination: Async/sync modes, SASL/SSL authentication, compression, batching
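RFC 4180 quoting comes down to one rule applied per field: wrap the field in double quotes if it contains the delimiter, a double quote, CR, or LF, and double any embedded quotes. The sketch below applies that rule with a configurable delimiter. The `CsvLine` class is hypothetical. Mapping `null` to an empty field is an assumption. The CRLF record terminator follows RFC 4180, although SeedStream's writer may use a different line ending.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of RFC 4180-style field quoting with a configurable delimiter.
final class CsvLine {
    private CsvLine() {}

    /** Formats one record; null fields become empty strings (an assumption). */
    static String format(List<String> fields, char delimiter) {
        return fields.stream()
            .map(f -> quoteIfNeeded(f == null ? "" : f, delimiter))
            .collect(Collectors.joining(String.valueOf(delimiter))) + "\r\n"; // CRLF per RFC 4180
    }

    static String quoteIfNeeded(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0 || field.indexOf('"') >= 0
            || field.indexOf('\r') >= 0 || field.indexOf('\n') >= 0;
        // Embedded double quotes are escaped by doubling them
        return needsQuotes ? '"' + field.replace("\"", "\"\"") + '"' : field;
    }
}
```

For example, the fields `42`, `Doe, Jane`, and `says "hi"` with a comma delimiter produce `42,"Doe, Jane","says ""hi"""`.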

CLI

Execute command: Run job definitions with format/count/seed/threads overrides
Validate command: Validate YAML configurations without execution
Progress logging: Real-time throughput metrics (records/sec)
Verbose mode: Detailed worker activity and destination telemetry
Debug mode: --debug flag with intelligent trace sampling (--trace-sample-rate 1-100%, default 10%)

Testing & Quality

Unit tests: 276+ tests across all modules (70%+ coverage)
Integration tests: 43 tests using Testcontainers (Kafka, file I/O)
Benchmarks: JMH-based performance validation (NFR-1 compliance)
Code quality: Spotless (Google Java Style), SpotBugs, JaCoCo, OWASP Dependency-Check
CI/CD: GitHub Actions with automated testing and Codecov integration

Documentation

README: Complete quickstart, type system reference, configuration guide
DESIGN.md: Architecture decisions, multi-threading model, extensibility patterns
PERFORMANCE.md: Comprehensive benchmarks, tuning guide, hardware recommendations
CONTRIBUTING.md: Contributor workflow, code standards, PR process, style guide
QUALITY.md: Code quality tools setup and troubleshooting
benchmarks/README.md: Benchmark execution guide and result interpretation

Changed

Module structure: 6 modules (core, schema, generators, formats, destinations, cli)
File I/O optimization: 64KB buffer (up from 8KB), batching (1000 records/batch)
Smart threading: Automatic single-threaded mode for small jobs (< 1000 records)
Seed warning: Logs a warning when the default seed (0) is in use, so users are aware of its effect on reproducibility
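The buffered file destination combines a 64 KB buffer, optional gzip compression, append mode, and per-batch flushing. The sketch below stacks these with standard `java.io`/`java.util.zip` classes. The `BufferedFileSink` class is hypothetical; only the 64 KB and 1,000-record figures come from this changelog. Newline-terminated records match the newline-delimited JSON format. Note that in append mode with gzip, each run adds a new gzip member, which `GZIPInputStream` reads back as one continuous stream.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of a buffered, optionally gzip-compressed file sink.
final class BufferedFileSink implements AutoCloseable {
    static final int BUFFER_SIZE = 64 * 1024; // 64 KB, as in the changelog
    static final int BATCH_SIZE = 1000;       // records per flush, as in the changelog

    private final Writer writer;
    private int pending;

    BufferedFileSink(Path file, boolean gzip, boolean append) throws IOException {
        OutputStream out = append
            ? Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)
            : Files.newOutputStream(file); // defaults: CREATE, TRUNCATE_EXISTING, WRITE
        try {
            if (gzip) out = new GZIPOutputStream(out, BUFFER_SIZE);
        } catch (IOException e) {
            out.close(); // don't leak the file handle if gzip setup fails
            throw e;
        }
        this.writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), BUFFER_SIZE);
    }

    void write(String record) throws IOException {
        writer.write(record);
        writer.write('\n');
        if (++pending == BATCH_SIZE) { // push buffered data down once per batch, not per record
            writer.flush();
            pending = 0;
        }
    }

    @Override
    public void close() throws IOException {
        writer.close(); // flushes remaining data and, for gzip, writes the trailer
    }
}
```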

Fixed

Thread-safety: Logical worker IDs instead of JVM thread IDs for deterministic seeding
Memory leaks: Proper resource cleanup (connection pools, file handles, Kafka producers)
Backpressure: Bounded queue prevents OOM on fast generation + slow destination
Circular references: Detection and prevention in nested object structures
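The thread-safety fix above (logical worker IDs instead of JVM thread IDs) is what makes output independent of thread scheduling. In the sketch below, each logical worker derives its RNG seed from the master seed and its own ID, and results are merged in worker-ID order. The `DeterministicWorkers` class is hypothetical. The SplitMix64-style mixing function is one common way to spread adjacent IDs into unrelated seeds, not necessarily the one SeedStream uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: per-worker seeds derived from (masterSeed, logical worker ID),
// so the output does not depend on which pool thread runs which worker.
final class DeterministicWorkers {
    private DeterministicWorkers() {}

    // SplitMix64-style mixing so adjacent worker IDs yield well-separated seeds
    static long workerSeed(long masterSeed, int workerId) {
        long z = masterSeed + (workerId + 1) * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Runs `workers` logical workers on `poolThreads` threads; merges results by worker ID. */
    static List<Integer> generate(long masterSeed, int workers, int recordsPerWorker, int poolThreads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (int id = 0; id < workers; id++) {
                final int workerId = id;
                futures.add(pool.submit(() -> {
                    Random rng = new Random(workerSeed(masterSeed, workerId)); // confined to this task
                    List<Integer> out = new ArrayList<>();
                    for (int i = 0; i < recordsPerWorker; i++) out.add(rng.nextInt(1_000));
                    return out;
                }));
            }
            List<Integer> all = new ArrayList<>();
            for (Future<List<Integer>> f : futures) all.addAll(f.get()); // worker-ID order
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

Running the same master seed on one pool thread or on four produces an identical list. Seeding from `Thread.currentThread().getId()` would not.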

Performance

Primitive generation: 12-258M ops/s (Boolean: 258M, Integer: 57M, Char: 12M)
Datafaker generation: 13-154K ops/s (Company: 154K, Phone: 13K)
File I/O: 600-800 MB/s with optimized buffering
Real-world: 100K customer records in 14.4 seconds (6,923 rec/sec, 10 threads)
Scaling: 3.7× speedup with 4 workers (92% efficiency)
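The figures above can be cross-checked. The check below takes parallel efficiency as speedup S divided by worker count p (the usual definition, assumed here). It also recomputes the wall-clock time from the reported throughput:

$$
E = \frac{S}{p} = \frac{3.7}{4} \approx 0.925, \qquad
t = \frac{100{,}000\ \text{records}}{6{,}923\ \text{records/s}} \approx 14.44\ \text{s}
$$

Both values agree with the reported 92% efficiency and 14.4 seconds after rounding.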

Security

License: Apache 2.0 (permissive, enterprise-friendly)
Dependencies: OWASP Dependency-Check integrated; no known CVEs at release time
Secrets: Support for environment variables and remote auth (bearer, basic, API key)
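Building the authentication header for a remote API (such as the remote seed source) from environment variables could look like the sketch below, which covers the three schemes listed. The environment variable names (`SEED_API_TOKEN`, `SEED_API_USER`, `SEED_API_PASSWORD`, `SEED_API_KEY`), the `X-API-Key` header name, and the `RemoteAuthHeaders` class are all hypothetical. SeedStream's actual configuration keys are not documented here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;

// Hypothetical sketch: build an auth header from secrets supplied via environment variables.
// Variable names and the X-API-Key header are assumptions, not SeedStream's configuration.
final class RemoteAuthHeaders {
    enum Scheme { BEARER, BASIC, API_KEY }

    private RemoteAuthHeaders() {}

    /** Returns {headerName, headerValue}; pass System.getenv() in real use. */
    static String[] header(Scheme scheme, Map<String, String> env) {
        switch (scheme) {
            case BEARER:
                return new String[] {"Authorization", "Bearer " + require(env, "SEED_API_TOKEN")};
            case BASIC: {
                // RFC 7617: Base64 of "user:password"
                String credentials = require(env, "SEED_API_USER") + ":" + require(env, "SEED_API_PASSWORD");
                String encoded = Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
                return new String[] {"Authorization", "Basic " + encoded};
            }
            case API_KEY:
                return new String[] {"X-API-Key", require(env, "SEED_API_KEY")};
            default:
                throw new IllegalArgumentException("Unknown scheme: " + scheme);
        }
    }

    // Fail fast with a clear message instead of sending an empty credential
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }
}
```

Taking the environment as a `Map` parameter keeps the logic testable without modifying the process environment.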

Breaking Changes

None (first public release).

[0.1.0] - 2025-12-01

Initial Development Release: Internal testing and validation.

Added

Basic YAML configuration parsing
Primitive data generators (int, string)
File destination with JSON format
Single-threaded generation engine
CLI scaffolding with Picocli

Notes

Not publicly released. Internal prototype for architecture validation.

Roadmap

v0.5.0 (Planned - Q2 2026)

Reference generator for foreign keys (ref[structure.field])
Statistical distributions (normal, Zipfian, exponential)
Advanced Datafaker correlations and constraints
Memory profiling tooling

v0.6.0 (Planned - Q3 2026)

REST API module
gRPC API module
Docker image and Kubernetes deployment
Helm chart

v1.0.0 (Planned - 2027)

Plugin architecture (ServiceLoader-based extensibility)
Schema registry integration (Confluent, AWS Glue)
Data masking and anonymization
Metrics and monitoring (Prometheus, Grafana)
Web UI for configuration management

Contributing

See CONTRIBUTING.md for how to contribute to SeedStream.

For detailed internal planning, see docs/internal/BACKLOG.md.

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.
