In-Memory LRU Caching
High-performance, concurrent LRU caches for C and C++ applications.
LwMQ provides two powerful in-memory LRU caching solutions:
Regular Cache — Excellent for low-to-moderate contention workloads.
Segmented Cache — Designed for extreme concurrency and massive scale (hundreds of millions of operations per second on server hardware).
Both support zero-allocation steady-state operation, TTL eviction, safe read-through with tombstone protection, optional compression (LZ4 or Intel QAT), and AES-256-GCM encryption with per-entry entropy.
When to Use Each Cache
Use Case |
Recommended Cache |
Why |
|---|---|---|
Low-to-moderate threading (≤8-16 threads) |
Regular Cache |
Simpler, lower overhead |
Heavy multithreading + high contention |
Segmented Cache |
Direct bit partitioning eliminates global lock contention |
Millions to hundreds of millions ops/sec |
Segmented Cache |
Scales with cores and memory bandwidth |
Predictable memory usage / embedded / real-time |
Both |
Custom huge-block allocator + inline small data |
AI inference, session stores, hot data layers |
Both (with read-through) |
Safe single backfill + tombstone protection |
Key Concepts
Direct Bit-Based Partitioning (Segmented Cache)
You control partitioning by designating 1–10 bits from your 128-bit keys. This provides direct, zero-overhead indexing into 2–1024 independent segments. Each segment maintains its own LRU list, dramatically reducing contention under heavy concurrency.
Requirement: Supply well-distributed keys (e.g., RFC 4122 GUIDs or high-quality hashes). Poor distribution leads to hot segments.
Read-Through with Tombstone Protection
On a cache miss when using a callback:
The cache inserts a tombstone (placeholder) protected by a per-entry lock.
Exactly one thread performs the backfill.
Other threads wait efficiently on the same key.
The entry is protected from eviction while the backfill is in progress.
This eliminates the thundering herd problem without locking the entire cache or segment.
Memory Management
Allocates one or more large virtual memory block(s) upfront.
Uses a custom fixed-size allocator inside the block(s) → zero runtime allocations in steady state (especially when storing small values inline).
Supports runtime extension via additional extents.
Capable of handling terabytes of cached data.
Security & Efficiency Features
Per-entry AES-GCM encryption with additional entropy.
LZ4 (fast), Deflate, or hardware-accelerated Deflate (Intel QAT) compression.
TTL-based eviction (checked on access).
Quickstart Example
#include <api-lwmq-cache.h>
LMQ_KEY Key{};
SIZE_T DataSize{};
LMQ_CACHE Cache{};
//
// Prepare cache parameters for 1 million slots.
//
constexpr LMQ_CACHEPARAMETERS Parameters
{
sizeof(LMQ_CACHEPARAMETERS),
LMQ_CACHETYPE_DATA_CACHE,
LMQ_CACHEFLAGS_NONE,
1'000'000
};
//
// Create one key.
//
LmqMakeRfc4122Key(&Key);
//
// Create the cache.
//
LmqCreateCache(&Parameters,
&Cache);
//
// Add one entry.
//
LmqAddCacheEntry(Cache,
&Key,
Data,
Length,
LMQ_CACHEENTRY_NO_ADDITIONAL_ENTROPY,
LMQ_CACHEENTRY_FLAGS_NONE,
0.0f);
//
// Existence test: lookup one entry.
//
LmqLookupCacheEntry(Cache,
&Key,
&DataSize);
//
// Retrieve one entry.
//
LmqRetrieveCacheEntry(Cache,
&Key,
Data,
&DataSize,
LMQ_CACHEENTRY_NO_ADDITIONAL_ENTROPY,
nullptr);
//
// Delete an entry.
//
LmqRemoveCacheEntry(Cache,
&Key);
//
// Destroy the cache.
//
LmqDestroyCache(&Cache);
Best Practices
Key Selection
Prefer GUIDs created with
LmqMakeRfc4122Key()— no hashing required at lookup time.For strings or binary data, use
LmqStringToKey()orLmqBytesToKey().For Segmented Cache, choose partition bits from high-entropy positions in the key.
Choosing Segment Count (Segmented Cache)
Start with 256 or 512 segments on high-core-count servers.
More segments = better scalability, slightly higher memory overhead.
Fewer segments = more contention.
Read-Through Callbacks
Keep callbacks reasonably fast.
The cache protects the entry (via tombstone) during the callback.
Performance Tips
Use small inline data when possible.
Enable Hardware Lock Elision (HLE) where supported by the CPU.
Pre-size the cache appropriately to minimize extent allocations.
Monitor performance with
LmqGetCacheMetrics().
Performance Characteristics
Regular Cache: Multi-million ops/sec even under moderate contention.
Segmented Cache: Tens of millions ops/sec on laptops; hundreds of millions on multi-socket servers with sufficient memory bandwidth.
Near-linear scaling with cores when keys are well-distributed.
Predictable low latency (no background threads or global locks).
Limitations & Considerations
Keys are 128-bit values. Collision risk is extremely low but present if using weak hashing.
Segmented Cache requires well-distributed keys for optimal load balancing.
Encryption and compression add CPU overhead (valuable trade-off for memory/IO-bound workloads).
The cache is passive: eviction and TTL checks occur on access.
Next Steps
Full API Reference → LwMQ Caching API
Advanced topics: memory extents, metrics, custom allocators (see API documentation)