This is my Go implementation of the One Billion Row Challenge, a performance challenge to process a large dataset of weather measurements as quickly as possible.

- Clone the challenge repository:
  git clone git@github.com:gunnarmorling/1brc.git
- Make sure to have Java installed and then run:
  ./mvnw clean verify
  ./create_measurements.sh 1000000000
- Run the program:
  go run main.go

- Map-reduce approach with parallel workers
- Efficient chunk-based file reading: instead of reading line by line, the file is read in 4 MB chunks
- Process the data as bytes instead of strings to avoid string allocations
- Custom line splitting instead of using strings.Split
- Custom temperature parsing function instead of standard string conversion
- Optimized city name deduplication using integer indices instead of string keys
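
As a rough illustration of the byte-level techniques listed above, the following Go sketch splits a line at the `;` separator, parses the temperature into integer tenths of a degree, and maps station names to integer indices. The names `splitLine`, `parseTemp`, and `stationIndex` are hypothetical and not taken from `main.go`. The sketch assumes the challenge's input format (`Station;-12.3`, with exactly one fractional digit), and the map-based lookup is only one possible way to obtain the indices.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitLine cuts "Station;12.3" at the separator and returns both halves as
// sub-slices of the input, so no bytes are copied.
func splitLine(line []byte) (name, temp []byte) {
	i := bytes.IndexByte(line, ';')
	if i < 0 {
		return line, nil // malformed line: no separator found
	}
	return line[:i], line[i+1:]
}

// parseTemp converts a measurement such as "-12.3" into tenths of a degree
// (-123) without strconv. Assumption: optional minus sign, digits, a dot,
// and exactly one fractional digit.
func parseTemp(b []byte) int {
	neg := false
	if len(b) > 0 && b[0] == '-' {
		neg = true
		b = b[1:]
	}
	n := 0
	for _, c := range b {
		if c == '.' {
			continue // skip the decimal point, keep accumulating digits
		}
		n = n*10 + int(c-'0')
	}
	if neg {
		return -n
	}
	return n
}

// stationIndex (hypothetical) hands out a small integer per distinct station,
// so per-station statistics can live in a slice instead of a string-keyed map.
type stationIndex struct {
	ids   map[string]int
	names []string
}

func (s *stationIndex) id(name []byte) int {
	// A map lookup keyed by string(name) does not allocate in Go.
	if id, ok := s.ids[string(name)]; ok {
		return id
	}
	key := string(name) // allocate once, the first time a station is seen
	id := len(s.names)
	s.names = append(s.names, key)
	s.ids[key] = id
	return id
}

func main() {
	idx := &stationIndex{ids: map[string]int{}}
	var sums []int // sum of temperatures in tenths, indexed by station id
	lines := [][]byte{[]byte("Hamburg;12.0"), []byte("Oslo;-3.4"), []byte("Hamburg;8.9")}
	for _, line := range lines {
		name, temp := splitLine(line)
		id := idx.id(name)
		if id == len(sums) {
			sums = append(sums, 0)
		}
		sums[id] += parseTemp(temp)
	}
	for id, name := range idx.names {
		fmt.Printf("%s=%.1f\n", name, float64(sums[id])/10)
	}
	// prints:
	// Hamburg=20.9
	// Oslo=-3.4
}
```

Keeping temperatures as integer tenths avoids floating-point work in the hot loop; the division by 10 happens only once, when results are printed.
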
┌─────────────┐
│ Monitor │
│ Goroutine │
└─────────────┘
│
▼
┌─────────────┐    ┌───────────┐    ┌─────────┐    ┌─────────────────┐
│ File Reader │───▶│ publishCh │───▶│ Workers │───▶│ Results Channel │
│  Goroutine  │    └───────────┘    │  (Map)  │    └─────────────────┘
└─────────────┘                     └─────────┘             │
                                                            ▼
                                                   ┌─────────────────┐
                                                   │  Reduce Phase   │
                                                   │  (Main Thread)  │
                                                   └─────────────────┘
                                                            │
                                                            ▼
                                                   ┌─────────────────┐
                                                   │  Save Results   │
                                                   └─────────────────┘