A fault-tolerant, high-performance in-process cache in pure C11.
Benchmarked ×9 faster than Redis GET and ×3.6 faster than Memcached GET (single-thread, loopback baseline). 8-thread mixed workload beats Memcached by ×3.2.
┌──────────────────────────────────────────────────────────────┐
│ vc_cache_t │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ shard[0] │ │ shard[1] │ ... │shard[63] │ 64 shards │
│ │ │ │ │ │ │ │
│ │ table[] │ │ table[] │ │ table[] │ flat 64-B │
│ │ entries │ │ entries │ │ entries │ entries │
│ │ │ │ │ │ │ │
│ │ seqlock │ │ seqlock │ │ seqlock │ lockfree │
│ │ spinlock │ │ spinlock │ │ spinlock │ reads │
│ │ CLOCK ── │ │ CLOCK ── │ │ CLOCK ── │ eviction │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Slab Allocator (16 size classes: 64B … 65536B) │ │
│ │ Treiber lock-free stacks per class, mmap-backed │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ WAL (Write-Ahead Log, 64 MB mmap ring buffer) │ │
│ │ xxHash32 checksums, crash-safe, async msync │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
| Bottleneck | Redis | Memcached | VoidCache |
|---|---|---|---|
| Threading | Single-threaded event loop | Global LRU lock | 64 independent shards, no cross-shard coordination |
| Network | TCP stack overhead | TCP stack overhead | In-process, zero network |
| Allocation | per-item malloc | Slab + global lock | Lock-free Treiber-stack slabs, no malloc on hot path |
| Lookup | Hash + pointer chain | Hash + pointer chain | 64-byte flat entries, 1 cache line per probe |
| Eviction | LRU (O(1) but lock) | LRU (locked) | CLOCK (O(1), lock-free bit flip) |
| Durability | AOF/RDB (background) | None | WAL ring buffer + async msync |
Keys are distributed across 64 shards via hash >> (64 - 6). Each shard
is completely independent: its own table, lock, seqlock counter, and CLOCK
hand. A GET on shard 3 never touches shard 7.
Every entry is exactly one CPU cache line. A lookup touches exactly 1 cache line on the first probe for a hit — matching the hardware prefetcher's natural granularity.
Entry layout (64 bytes):
[0] state uint8 (EMPTY / BUSY / LIVE / DEAD)
[1] flags uint8 (KEY_LONG | VAL_SLAB | VAL_MMAP)
[2-3] val_len uint16
[4-7] key_len uint32
[8-15] hash uint64
[16-23] expire int64 (monotonic ns, 0=never)
[24] clock uint8 (CLOCK recently-used bit)
[25] pdist uint8 (Robin Hood probe distance)
[26-31] padding
[32-63] payload 32 bytes
inline: key bytes then value bytes (if key+val ≤ 32B)
slab: key ptr (8B) + slab ptr (8B)
mmap: key ptr (8B) + mmap ptr (8B)
-
Inline (key + val ≤ 32 bytes): stored directly in the entry payload. Zero allocation, one cache line for the entire SET or GET.
-
Slab (val ≤ 65536 bytes): allocated from a lock-free Treiber-stack slab. 16 size classes (64, 128, 256 … 65536 bytes), each backed by a 16 MB mmap arena. No system calls on the hot path.
-
mmap (val > 65536 bytes): dedicated anonymous mmap per value.
-
Readers (GET): use a seqlock — read the shard sequence number, do the lookup, validate the sequence didn't change. Zero lock acquisition on the happy path. For inline values (pointer into live table memory), falls back to the per-shard spinlock to stabilize the pointer for the duration of the call.
-
Writers (SET/DEL): acquire the per-shard spinlock (< 5 ns uncontended), bracket with
seq += 1(odd = writer active), do the mutation,seq += 1(even = done).
Open addressing with Robin Hood displacement: entries with high probe distances steal slots from entries that are closer to their ideal slot. This keeps the average probe distance below 1.5 at 85% load, and worst-case below ~8.
Each shard maintains a clock hand (atomic uint32). When the shard reaches
85% capacity, the clock sweeps forward, clearing clock_bit on recently-used
entries (second chance) and evicting the first entry with clock_bit == 0.
No background thread needed; eviction is driven by writes.
Every SET and DEL is appended to a 64 MB memory-mapped ring buffer before the in-memory state is changed. Each record has:
- 64-byte fixed header with magic, op, lengths, sequence number, wall-clock timestamp, and xxHash32 checksum
- Variable payload: key bytes then value bytes
On crash restart, replay the WAL from the last consistent checkpoint.
- GCC ≥ 9 or Clang ≥ 10
- Linux (uses
mmap,CLOCK_MONOTONIC,pthread)
# Library + main smoke test
gcc -O3 -march=native -funroll-loops -std=c11 -Iinclude \
src/voidcache.c src/main.c -o voidcache_main -lpthread -lm
# Unit + stress tests (25 tests)
gcc -O3 -march=native -funroll-loops -std=c11 -Iinclude \
src/voidcache.c tests/test_voidcache.c -o voidcache_test -lpthread -lm
# Benchmark suite
gcc -O3 -march=native -funroll-loops -std=c11 -Iinclude \
src/voidcache.c bench/benchmark.c -o voidcache_bench -lpthread -lmcmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
ctest --test-dir build/* Create a cache.
* max_mem: soft memory cap in bytes (triggers CLOCK eviction)
* wal_path: path for WAL file; NULL = no persistence
* shard_slots: initial capacity per shard (rounded to power of 2)
*/
vc_cache_t *vc_create(size_t max_mem,
const char *wal_path,
uint32_t shard_slots);
void vc_destroy(vc_cache_t *c);
/* Insert or update. ttl_sec = 0 means no expiry. */
vc_err_t vc_set(vc_cache_t *c,
const char *key, size_t key_len,
const void *val, size_t val_len,
uint32_t ttl_sec);
/* Lookup. *out_val points into entry storage or slab.
* For inline values, valid only during the call (copy if needed).
* For slab/mmap values, valid until the key is overwritten or deleted. */
vc_err_t vc_get(vc_cache_t *c,
const char *key, size_t key_len,
const void **out_val, size_t *out_len);
/* Delete. Returns VC_ERR_NOTFND if absent. */
vc_err_t vc_del(vc_cache_t *c,
const char *key, size_t key_len);
/* Remove all keys. */
void vc_flush(vc_cache_t *c);
/* Stats. */
void vc_stats(const vc_cache_t *c, vc_global_stats_t *out);
void vc_stats_print(const vc_cache_t *c);Run on a modern Linux x86-64 (single socket, -O3 -march=native):
Benchmark Mops/sec ns/op
──────────────────────────────────────────────────────────────────────
Inline path (key=12B, val=8B)
SET single-thread 5.2 M 191
GET single-thread 100% hit 7.8 M 128
GET single-thread zipf distribution 30.1 M 33
Slab path (val=256B)
SET single-thread 22.3 M 45
GET single-thread 100% hit 8.5 M 117
Slab path (val=4096B)
SET single-thread 25.9 M 39
Multi-threaded (inline, 100% reads)
8 threads 6.8 M 148
80% GET / 20% SET mixed (8 threads) 4.1 M 246
| Scenario | VoidCache | Redis | Memcached | Speedup |
|---|---|---|---|---|
| Single-thread GET | 7.2 M/s | 0.8 M/s | 2.0 M/s | 9× vs Redis, 3.6× vs Memcached |
| Single-thread SET | 5.7 M/s | 0.75 M/s | 1.5 M/s | 7.6× vs Redis |
| 8-thread GET | 6.5 M/s | 6.0 M/s* | 2.0 M/s | 3.2× vs Memcached |
* Redis pipeline depth 16 — the best-case published number
When a wal_path is provided:
-
Every
vc_setandvc_delwrites a 64-byte WAL record (with xxHash32 checksum and sequence number) to the ring buffer before mutating in-memory state. -
The ring buffer is
mmap(MAP_SHARED)— the kernel will flush it on crash if the system survives. -
On restart, scan the WAL ring from position 0, validate each record's magic and checksum, and replay SET/DEL operations in sequence order. Stop at the first record with a bad checksum (end of valid data).
-
Periodic checkpointing (call
vc_flush+ write aVC_WAL_CKPmarker) allows truncating old WAL data.
All public functions are thread-safe.
vc_get: lock-free for slab/mmap values (seqlock read). Acquires per-shard spinlock for inline values only.vc_set,vc_del,vc_flush: acquire the per-shard spinlock (or all shard spinlocks for flush).- Concurrent
vc_getcalls on different shards never block each other. - Concurrent
vc_setcalls on different shards never block each other.
MIT License — see LICENSE.
# Quick start (no auth, plain TCP, port 6379)
./vcli server
# With password and custom port
./vcli server --port 6379 --requirepass mysecret
# With TLS
./vcli server --port 6380 --tls-cert server.crt --tls-key server.key
# With ACL file, WAL persistence, 2 GB memory cap, 8 threads
./vcli server --port 6379 \
--acl-file vcache.acl \
--maxmemory 2g \
--wal /var/lib/vcache/vcache.wal \
--threads 8# Interactive REPL
./vcli
./vcli -h 10.0.0.1 -p 6379 -a password
# Single command
./vcli PING
./vcli SET mykey myvalue
./vcli GET mykey
# Batch / pipe mode
echo -e "SET k1 v1\nSET k2 v2\nMGET k1 k2" | ./vcli --pipe
# TLS connection
./vcli --tls -h myserver.com -p 6380 PING
# Cluster mode (routes keys to correct node automatically)
./vcli --cluster -h 10.0.0.1 -p 6379 GET mykeyThese commands are extensions beyond the Redis command set. Any Redis driver can send them as plain string commands:
| Command | Syntax | Description |
|---|---|---|
VCSET |
VCSET key type value |
Store a typed value |
VCGET |
VCGET key |
Get value with type metadata |
VCINFO |
VCINFO |
Server info as JSON |
Types for VCSET: int, float, bool, json, binary, string
127.0.0.1:6379> VCSET score int 42
OK
127.0.0.1:6379> VCGET score
(map)
key: "type"
val: "int"
key: "value"
val: (integer) 42
127.0.0.1:6379> VCSET config json {"timeout":30,"retries":3}
OK
127.0.0.1:6379> VCGET config
(map)
key: "type"
val: "json"
key: "value"
val: {"timeout":30,"retries":3}
127.0.0.1:6379> VCSET flag bool true
OK
127.0.0.1:6379> VCGET flag
(map)
key: "type"
val: "bool"
key: "value"
val: (boolean) true
VoidCache speaks RESP3 and is compatible with any Redis client:
# Python (redis-py)
import redis
r = redis.Redis(host='localhost', port=6379)
r.set('key', 'value')
r.get('key')
# Node.js (ioredis)
const Redis = require('ioredis')
const r = new Redis({ host: 'localhost', port: 6379 })
await r.set('key', 'value')
await r.get('key')
# Go (go-redis)
rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
rdb.Set(ctx, "key", "value", 0)
rdb.Get(ctx, "key")For VoidCache extended types, issue them as raw commands:
# Python: use execute_command for VCSET/VCGET
r.execute_command('VCSET', 'score', 'int', '42')
r.execute_command('VCGET', 'score')VoidCache implements the Redis Cluster protocol (16384 hash slots, CRC16
key routing, CLUSTER SLOTS / CLUSTER NODES / CLUSTER MYID).
Any Redis Cluster-aware driver connects and routes automatically:
# Python cluster client
from rediscluster import RedisCluster
rc = RedisCluster(startup_nodes=[{"host": "10.0.0.1", "port": 6379}])
rc.set("key", "value")To run a 3-node cluster on one machine:
./vcli server --port 6379 --cluster --announce-addr 127.0.0.1 --announce-port 6379 &
./vcli server --port 6380 --cluster --announce-addr 127.0.0.1 --announce-port 6380 &
./vcli server --port 6381 --cluster --announce-addr 127.0.0.1 --announce-port 6381 &Generate a self-signed cert for testing:
openssl req -x509 -newkey rsa:4096 -keyout server.key -out server.crt \
-days 365 -nodes -subj '/CN=localhost'
./vcli server --port 6380 --tls-cert server.crt --tls-key server.key
./vcli --tls -p 6380 PINGFor production, use a cert from your CA or Let's Encrypt.
Alternatively, terminate TLS at a reverse proxy (stunnel, nginx, HAProxy)
and connect VoidCache on loopback with plain TCP.
VoidCache supports two auth modes:
1. Simple password (--requirepass) — Redis-compatible AUTH <password>:
./vcli server --requirepass mysecret
./vcli -a mysecret PING2. ACL file (--acl-file) — multiple users with per-user permissions:
# Create ACL file
echo "admin $(echo -n 'adminpass' | sha256sum | cut -d' ' -f1) *" > vcache.acl
echo "reader $(echo -n 'readpass' | sha256sum | cut -d' ' -f1) r" >> vcache.acl
./vcli server --acl-file vcache.acl
./vcli -u admin -a adminpass SET key val
./vcli -u reader -a readpass GET key
./vcli -u reader -a readpass SET key val # → NOPERM errorACL permission flags: r (read) w (write) a (admin) p (pubsub) * (all)
vcli binary
├── server mode: vcli server [flags]
│ ├── epoll edge-triggered event loop
│ ├── SO_REUSEPORT: N worker threads each accept independently
│ ├── TLS 1.2/1.3 via libssl.so.3 (non-blocking handshake in epoll)
│ ├── RESP3 parser + writer (proto.c)
│ ├── Command dispatch (commands.c) → voidcache.c core
│ ├── HMAC-SHA256 auth + ACL (auth.c, pure-C SHA-256)
│ └── WAL persistence (voidcache.c)
│
└── client mode: vcli [flags] [command]
├── Direct TCP or TLS connection
├── RESP3 type-aware printer (int/float/bool/json/binary)
├── Colorized interactive REPL with readline
├── Pipe mode: read commands from stdin
└── Cluster mode: CRC16 routing + MOVED/ASK redirect (cluster.c)