Read this in other languages: 한국어
A Producer-Consumer Pattern queue implemented based on std::vector. It shows faster performance than boost::lockfree::queue, which you can verify in the benchmark results below. It only uses the c++ 11 standard library, with a simple codebase (signal+blockingvector is about 500 lines, excluding thread code).
Since lockfree-queue uses CAS-based competition, it may not always be the optimal choice. This implementation can serve as an alternative for producer-consumer patterns. When choosing an implementation, consider trade-offs between performance, debugging ease, implementation complexity, dependencies, and constraints for your specific project needs.
This source is available under MIT license. While you can use it as-is, its simple implementation allows you to adapt it to better suit your project's requirements.
Is Lock-free Queue Always the Best Choice?
- Simple implementation of about 500 lines using only the C++ standard library
- Queue container that wraps std::vector and adds blocking functionality
- Easy debugging with a simple code structure
- No restrictions on storage types (same as std::vector storage type restrictions)
- Free memory allocation based on std::vector
- Optimized for a single producer-consumer (1:1) pattern (multiple producer-consumer (1:N, N:N) patterns require a separate management object (intermediate management class) implementation)
- 0% CPU usage when waiting
- Compiler that supports C++11.
- Compiler that supports C++11
- boost lockfree-queue libraries
- Test Environment:
- AMD 7945HX
- i9-9900K CPU @ 3.60GHz
- Test Configuration:
- Both boost::lockfree::queue and BlockingVector have 100,000 spare buffers
- 1:1 exchange between Producer and Consumer
- 10,000,000 transmissions with varying block sizes: 8, 32, 64, 128, 512, 1024, 2048, 4096, 8192, 10240, 20480
- Note: Test block size 20480 requires 32GB of memory
- Implementation details:
- Source code available in src/main.cpp
- Benchmark source and table created using Claude (sonnet 3.5)
- Results shown are from the 6th run after 5 warm-up runs
| Block Size | Execution Time (seconds) | Throughput (ops/sec) | % Faster | ||
|---|---|---|---|---|---|
| BoostLockFree | BlockingVector | BoostLockFree | BlockingVector | ||
| 8 bytes | 1.500 | 0.402 | 6.66M | 24.89M | 273.41% |
| 32 bytes | 1.501 | 0.402 | 6.66M | 24.85M | 272.95% |
| 64 bytes | 1.547 | 0.417 | 6.47M | 24.00M | 271.14% |
| 128 bytes | 1.557 | 0.452 | 6.42M | 22.12M | 244.48% |
| 512 bytes | 1.767 | 0.945 | 5.66M | 10.58M | 87.02% |
| 1024 bytes | 2.049 | 1.123 | 4.88M | 8.91M | 82.51% |
| 2048 bytes | 2.418 | 1.898 | 4.13M | 5.27M | 27.43% |
| 4096 bytes | 3.195 | 2.630 | 3.13M | 3.80M | 21.45% |
| 8192 bytes | 5.519 | 3.499 | 1.81M | 2.86M | 57.73% |
| 10240 bytes | 6.463 | 3.744 | 1.55M | 2.67M | 72.64% |
| 20480 bytes | 14.012 | 7.395 | 0.71M | 1.35M | 89.48% |
| Block Size | Execution Time (seconds) | Throughput (ops/sec) | % Faster | ||
|---|---|---|---|---|---|
| BoostLockFree | BlockingVector | BoostLockFree | BlockingVector | ||
| 8 bytes | 3.071 | 0.218 | 3.26M | 45.90M | 1309.45% |
| 32 bytes | 3.045 | 0.235 | 3.28M | 42.52M | 1194.73% |
| 64 bytes | 3.105 | 0.256 | 3.22M | 39.06M | 1112.90% |
| 128 bytes | 3.112 | 0.268 | 3.21M | 37.34M | 1062.03% |
| 512 bytes | 3.509 | 0.660 | 2.85M | 15.14M | 431.41% |
| 1024 bytes | 3.800 | 0.733 | 2.63M | 13.65M | 418.74% |
| 2048 bytes | 5.165 | 1.174 | 1.94M | 8.51M | 339.73% |
| 4096 bytes | 8.417 | 1.567 | 1.19M | 6.38M | 437.21% |
| 8192 bytes | 12.478 | 2.607 | 0.80M | 3.84M | 378.64% |
| 10240 bytes | 13.921 | 3.408 | 0.72M | 2.93M | 308.49% |
| 20480 bytes | 22.029 | 5.672 | 0.45M | 1.76M | 288.36% |
There is a notable difference in performance between the i9-9900K and AMD 7945HX. It is a simple comparison, but if you look at the lock-free, there is a difference in CPU, and you can see that Intel is about 3 times faster than AMD.