A high-performance platform for tracking and analyzing pricing trends across the Bulgarian retail and pharmaceutical sectors. PriceStat aggregates open data from KZP (the Bulgarian Commission for Consumer Protection) and processes large datasets to extract insights on market dynamics.
PriceStat processes 35GB+ of historical pricing data collected daily from major Bulgarian retailers and pharmaceutical vendors, including Lidl, Kaufland, Sopharma, Billa, and others with substantial market presence (10M+ BGN annual reported profit). The platform combines the computational efficiency of Rust with the orchestration flexibility of Java to deliver a scalable, production-ready data pipeline.
- High-Volume Data Ingestion: Processes 250MB+ daily data exports from KZP in multiple inconsistent CSV formats
- Intelligent Data Cleaning: Handles malformed quotation marks and format inconsistencies across vendor datasets
- Direct Database Integration: Leverages tokio and tokio-postgres for zero-copy data streaming to PostgreSQL
- FFI-Based Orchestration: Java 25 coordinates Rust processing tasks through its Foreign Function & Memory API (FFI)
- Production-Grade Build System: Custom Gradle configuration that uses cbindgen to generate C headers from Rust and jextract to generate Java bindings from those headers
- Async-First Architecture: Built on Tokio runtime for concurrent data processing and I/O operations
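The quotation-mark cleaning mentioned above can be illustrated with a minimal, std-only sketch. The exact malformations in the KZP exports are not described here, so this assumes one common case: unescaped double quotes inside an otherwise quoted field. The function name `split_lenient` and the delimiter parameter are hypothetical and are not taken from the PriceStat code. A quote is treated as closing only when it is followed by the delimiter or the end of the line, so a stray quote that happens to sit right before a delimiter would still be misread.

```rust
/// Splits one CSV line into fields while tolerating stray double quotes.
/// Hypothetical helper for illustration; not the project's actual parser.
fn split_lenient(line: &str, delim: char) -> Vec<String> {
    let chars: Vec<char> = line.chars().collect();
    let mut fields = Vec::new();
    let mut cur = String::new();
    let mut in_quotes = false;
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let next = chars.get(i + 1).copied();
        if in_quotes {
            if c == '"' {
                if next == Some('"') {
                    // Properly escaped quote ("") inside a quoted field.
                    cur.push('"');
                    i += 1; // skip the second quote of the pair
                } else if next.is_none() || next == Some(delim) {
                    // Real closing quote: followed by delimiter or end of line.
                    in_quotes = false;
                } else {
                    // Stray quote in the middle of a field: keep it as data.
                    cur.push('"');
                }
            } else {
                cur.push(c);
            }
        } else if c == '"' && cur.is_empty() {
            // A quote at the very start of a field opens a quoted field.
            in_quotes = true;
        } else if c == delim {
            // End of field: move the buffer into the result.
            fields.push(std::mem::take(&mut cur));
        } else {
            cur.push(c);
        }
        i += 1;
    }
    fields.push(cur);
    fields
}
```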
| Component | Technology | Purpose |
|---|---|---|
| Orchestration | Java 25 + FFI | Coordinate data pipelines and system workflows |
| Data Processing | Rust | High-performance CSV parsing and transformation |
| Async Runtime | Tokio | Non-blocking I/O and concurrent task execution |
| Database Access | tokio-postgres | Async PostgreSQL driver for efficient data ingestion |
| Build System | Gradle (Custom Config) | Automated C header generation (cbindgen) and Java binding generation (jextract) |
| Database | PostgreSQL | Primary data storage and analytical backend |
| Infrastructure | Docker | Containerized database and deployment environment |
- Dataset Size: 35GB+
- Languages: Rust (61.6%), Java (36%), Dockerfile (2.4%) (these percentages are not indicative)
- Data Sources: 5+ major retailers plus additional vendors (221 in total)
- Daily Processing: ~250MB per daily export
- Java 25+
- Rust 1.70+
- PostgreSQL 13+
- Docker
- Gradle 8.0+
- Build Rust Binaries
./gradlew build
The custom Gradle configuration automatically:
- Compiles Rust code with optimal flags
- Generates C headers using cbindgen
- Generates Java bindings from the C headers with jextract
- Creates platform-specific native libraries
- Start PostgreSQL Database
docker build -f Dockerfile -t pricestat-db .
docker run -d -p 5432:5432 pricestat-db
- Run the Pipeline
./gradlew run
KZP Open Data
↓
Java HTTP client (queries and saves the ZIP files)
↓
Java FFI class (passes the path of the ZIP file to process, allocated in an Arena, to Rust)
↓
Rust Ingestor (Tokio)
↓
CSV Validation & Cleaning
↓
Format Normalization
↓
tokio-postgres
↓
PostgreSQL Data Lake (35GB+)
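To make the hand-off between the Java FFI class and the Rust ingestor more concrete, here is a hedged sketch of what an exported Rust entry point could look like. The function name `pricestat_ingest_zip`, the integer status codes, and the placeholder `ingest_zip` are all assumptions for illustration and do not come from the PriceStat source. A Java string allocated in an `Arena` (for instance with `arena.allocateFrom(path)`) arrives on the Rust side as a NUL-terminated C string, which is what the sketch expects.

```rust
use std::ffi::{c_char, CStr};
use std::path::Path;

/// Placeholder for the real ingestion step (unzip, clean, insert).
/// Hypothetical: here it only checks that the path ends in `.zip`.
fn ingest_zip(path: &Path) -> Result<(), String> {
    match path.extension().and_then(|e| e.to_str()) {
        Some(ext) if ext.eq_ignore_ascii_case("zip") => Ok(()),
        _ => Err(format!("not a ZIP path: {}", path.display())),
    }
}

/// Hypothetical C ABI entry point that Java could call through FFI.
/// Returns 0 on success, -1 for a null pointer, -2 for a path that is
/// not valid UTF-8, and -3 when ingestion fails (codes are assumptions).
///
/// On toolchains older than Rust 1.82, write `#[no_mangle]` instead.
///
/// # Safety
/// `path` must be null or point to a NUL-terminated string that stays
/// valid for the duration of the call.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn pricestat_ingest_zip(path: *const c_char) -> i32 {
    if path.is_null() {
        return -1;
    }
    // SAFETY: the caller guarantees a valid NUL-terminated string.
    let c_path = unsafe { CStr::from_ptr(path) };
    let Ok(utf8) = c_path.to_str() else {
        return -2;
    };
    match ingest_zip(Path::new(utf8)) {
        Ok(()) => 0,
        Err(_) => -3,
    }
}
```

Returning a plain status code keeps the C header that cbindgen would generate for such a function trivial, at the cost of losing the error message on the Java side.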
Rust Processing Layer
- Asynchronous CSV ingestion using Tokio
- Intelligent quotation mark and format error recovery
- Streaming data transformation pipeline
- Direct PostgreSQL insertion via tokio-postgres
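The streaming idea behind this layer can be sketched without any external crates. The project itself uses Tokio and tokio-postgres; the synchronous, std-only version below only illustrates how rows can be handed to a sink in bounded batches so that memory use does not grow with file size. The function name `stream_in_batches`, the comma delimiter, and the naive field split (no quoted fields) are assumptions made for this example.

```rust
use std::io::{self, BufRead};

/// Reads `reader` line by line and passes rows to `sink` in batches of
/// at most `batch_size` rows. Returns the total number of rows seen.
/// Hypothetical helper; the real pipeline is async and inserts into PostgreSQL.
fn stream_in_batches<R, F>(reader: R, batch_size: usize, mut sink: F) -> io::Result<usize>
where
    R: BufRead,
    F: FnMut(&[Vec<String>]),
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batch: Vec<Vec<String>> = Vec::with_capacity(batch_size);
    let mut total = 0;
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // skip blank lines
        }
        // Naive split: assumes a comma delimiter and no quoted fields.
        batch.push(line.split(',').map(|f| f.trim().to_string()).collect());
        if batch.len() == batch_size {
            sink(&batch);
            total += batch.len();
            batch.clear(); // reuse the allocation for the next batch
        }
    }
    // Flush whatever is left after the last full batch.
    if !batch.is_empty() {
        sink(&batch);
        total += batch.len();
    }
    Ok(total)
}
```

In a real ingestor the `sink` closure would be the place to issue one bulk insert per batch; the batch size then bounds how many rows are held in memory at once.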
Java Orchestration
- FFI-based coordination of Rust tasks
- Workflow management and error handling
- Scheduling and monitoring
- ⚠️ Test4.java causes system instability in certain environments (investigation ongoing). If you experience RAM issues, I recommend adjusting the Semaphore limit set in that class.
- Performance considerations noted for 35GB+ dataset processing
- CSV format inconsistencies across vendor exports require custom parsing logic
[MIT]
For questions or collaboration inquiries, please open an issue or contact me (OmegaSleepy).
Prices are growing and the trends are there, but this project is not meant to be the next "app to check the price of bread last week". My goal with it was, and still is, to experiment, learn, and connect Rust to Java through FFI, and I learned a lot. Do not expect this project to be updated or maintained; reproducibility may vary.


