In-Memory LRU Caching

High-performance, concurrent LRU caches for C and C++ applications.

LwMQ provides two powerful in-memory LRU caching solutions:

  • Regular Cache — Excellent for low-to-moderate contention workloads.

  • Segmented Cache — Designed for extreme concurrency and massive scale (hundreds of millions of operations per second on server hardware).

Both support zero-allocation steady-state operation, TTL eviction, safe read-through with tombstone protection, optional compression (LZ4 or Intel QAT), and AES-256-GCM encryption with per-entry entropy.

When to Use Each Cache

Use Case

Recommended Cache

Why

Low-to-moderate threading (≤8-16 threads)

Regular Cache

Simpler, lower overhead

Heavy multithreading + high contention

Segmented Cache

Direct bit partitioning eliminates global lock contention

Millions to hundreds of millions ops/sec

Segmented Cache

Scales with cores and memory bandwidth

Predictable memory usage / embedded / real-time

Both

Custom huge-block allocator + inline small data

AI inference, session stores, hot data layers

Both (with read-through)

Safe single backfill + tombstone protection

Key Concepts

Direct Bit-Based Partitioning (Segmented Cache)

You control partitioning by designating 1–10 bits from your 128-bit keys. This provides direct, zero-overhead indexing into 2–1024 independent segments. Each segment maintains its own LRU list, dramatically reducing contention under heavy concurrency.

Requirement: Supply well-distributed keys (e.g., RFC 4122 GUIDs or high-quality hashes). Poor distribution leads to hot segments.

Read-Through with Tombstone Protection

On a cache miss when using a callback:

  • The cache inserts a tombstone (placeholder) protected by a per-entry lock.

  • Exactly one thread performs the backfill.

  • Other threads wait efficiently on the same key.

  • The entry is protected from eviction while the backfill is in progress.

This eliminates the thundering herd problem without locking the entire cache or segment.

Memory Management

  • Allocates one or more large virtual memory block(s) upfront.

  • Uses a custom fixed-size allocator inside the block(s) → zero runtime allocations in steady state (especially when storing small values inline).

  • Supports runtime extension via additional extents.

  • Capable of handling terabytes of cached data.

Security & Efficiency Features

  • Per-entry AES-GCM encryption with additional entropy.

  • LZ4 (fast), Deflate, or hardware-accelerated Deflate (Intel QAT) compression.

  • TTL-based eviction (checked on access).

Quickstart Example

#include <api-lwmq-cache.h>

LMQ_KEY Key{};
SIZE_T DataSize{};
LMQ_CACHE Cache{};

//
// Prepare cache parameters for 1 million slots.
//

constexpr LMQ_CACHEPARAMETERS Parameters
{
    sizeof(LMQ_CACHEPARAMETERS),
    LMQ_CACHETYPE_DATA_CACHE,
    LMQ_CACHEFLAGS_NONE,
    1'000'000
};

//
// Create one key.
//

LmqMakeRfc4122Key(&Key);

//
// Create the cache.
//

LmqCreateCache(&Parameters,
               &Cache);

//
// Add one entry.
//

LmqAddCacheEntry(Cache,
                 &Key,
                 Data,
                 Length,
                 LMQ_CACHEENTRY_NO_ADDITIONAL_ENTROPY,
                 LMQ_CACHEENTRY_FLAGS_NONE,
                 0.0f);

//
// Existence test: lookup one entry.
//

LmqLookupCacheEntry(Cache,
                    &Key,
                    &DataSize);

//
// Retrieve one entry.
//

LmqRetrieveCacheEntry(Cache,
                      &Key,
                      Data,
                      &DataSize,
                      LMQ_CACHEENTRY_NO_ADDITIONAL_ENTROPY,
                      nullptr);

//
// Delete an entry.
//

LmqRemoveCacheEntry(Cache,
                    &Key);

//
// Destroy the cache.
//

LmqDestroyCache(&Cache);

Best Practices

Key Selection

  • Prefer GUIDs created with LmqMakeRfc4122Key() — no hashing required at lookup time.

  • For strings or binary data, use LmqStringToKey() or LmqBytesToKey().

  • For Segmented Cache, choose partition bits from high-entropy positions in the key.

Choosing Segment Count (Segmented Cache)

  • Start with 256 or 512 segments on high-core-count servers.

  • More segments = better scalability, slightly higher memory overhead.

  • Fewer segments = more contention.

Read-Through Callbacks

  • Keep callbacks reasonably fast.

  • The cache protects the entry (via tombstone) during the callback.

Performance Tips

  • Use small inline data when possible.

  • Enable Hardware Lock Elision (HLE) where supported by the CPU.

  • Pre-size the cache appropriately to minimize extent allocations.

  • Monitor performance with LmqGetCacheMetrics().

Performance Characteristics

  • Regular Cache: Multi-million ops/sec even under moderate contention.

  • Segmented Cache: Tens of millions ops/sec on laptops; hundreds of millions on multi-socket servers with sufficient memory bandwidth.

  • Near-linear scaling with cores when keys are well-distributed.

  • Predictable low latency (no background threads or global locks).

Limitations & Considerations

  • Keys are 128-bit values. Collision risk is extremely low but present if using weak hashing.

  • Segmented Cache requires well-distributed keys for optimal load balancing.

  • Encryption and compression add CPU overhead (valuable trade-off for memory/IO-bound workloads).

  • The cache is passive: eviction and TTL checks occur on access.

Next Steps

  • Full API ReferenceLwMQ Caching API

  • Advanced topics: memory extents, metrics, custom allocators (see API documentation)