Process-to-Process Messaging

Messaging is a core feature of LwMQ. It enables processes to exchange messages, which are structured entities composed of one or more data frames and which are sent atomically.
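To make the notion of a multi-frame message concrete, here is a minimal Python sketch. The Message class and its length-prefixed layout are assumptions made for illustration only; they are not LwMQ's API or wire format. The sketch packs all frames into one contiguous buffer, so a receiver either decodes the whole message or rejects it.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    """Hypothetical multi-frame message: an ordered list of byte frames."""
    frames: List[bytes]

    def encode(self) -> bytes:
        # Assumed layout: frame count, then (length, payload) for each frame.
        if not self.frames:
            raise ValueError("a message needs at least one frame")
        parts = [struct.pack("<I", len(self.frames))]
        for frame in self.frames:
            parts.append(struct.pack("<I", len(frame)))
            parts.append(frame)
        return b"".join(parts)

    @staticmethod
    def decode(buf: bytes) -> "Message":
        (count,) = struct.unpack_from("<I", buf, 0)
        offset = 4
        frames = []
        for _ in range(count):
            (length,) = struct.unpack_from("<I", buf, offset)
            offset += 4
            if offset + length > len(buf):
                # All or nothing: a partial message is never returned.
                raise ValueError("truncated message")
            frames.append(buf[offset:offset + length])
            offset += length
        return Message(frames)
```

Round-tripping Message([b"header", b"", b"body"]) through encode() and decode() yields an equal message, including the empty middle frame.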

At one end of the spectrum of existing messaging systems, one finds sophisticated message brokers that are reachable over a network and require complex deployment, configuration and maintenance, often with companies specializing in deploying and supporting them. These systems often offer strong guarantees such as exactly-once delivery, persistence, message replay, and more.

At the other end of the spectrum, one finds libraries providing message-oriented communication, often patterned after well-known abstractions such as BSD sockets, that enable local and remote communication between processes without an intermediate broker.

These systems are lightweight in the sense that they require less (or no) administration, but they also offer fewer guarantees, often in exchange for more speed.

LwMQ falls in the second category, providing direct peer-to-peer inter-process communication without configuration or an intermediary.

Its key differentiators include the use of transport mechanisms that are not commonly available in other message-oriented communication systems and that are often reserved for server or datacenter environments.

Most competing solutions leverage Unix Domain Sockets (UDS) for local communication, as this transport is a natural extension of the network-based paradigms those systems are built upon.

In contrast, LwMQ provides its own shared-memory-based physical layer for local communication, as well as its own data link and network layers on top of it.
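As a rough illustration of why shared memory avoids the socket path, the sketch below passes one length-prefixed message through a shared segment using Python's standard multiprocessing.shared_memory module. It is not LwMQ's physical layer: the 4-byte header, the function names, and the use of two handles inside a single process are assumptions made to keep the example short, and the synchronization that a real cross-process design needs is deliberately left out.

```python
import struct
from multiprocessing import shared_memory

HEADER = struct.Struct("<I")  # assumed 4-byte little-endian payload length


def write_message(shm: shared_memory.SharedMemory, payload: bytes) -> None:
    if HEADER.size + len(payload) > shm.size:
        raise ValueError("payload does not fit in the segment")
    # Copy the payload into the segment, then record its length.
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    HEADER.pack_into(shm.buf, 0, len(payload))


def read_message(shm: shared_memory.SharedMemory) -> bytes:
    (length,) = HEADER.unpack_from(shm.buf, 0)
    return bytes(shm.buf[HEADER.size:HEADER.size + length])


def demo() -> bytes:
    # The producer creates the segment; the consumer attaches to it by name.
    producer = shared_memory.SharedMemory(create=True, size=4096)
    try:
        write_message(producer, b"hello over shared memory")
        consumer = shared_memory.SharedMemory(name=producer.name)
        try:
            return read_message(consumer)
        finally:
            consumer.close()
    finally:
        producer.close()
        producer.unlink()
```

The consumer reads the bytes directly from the mapped segment, with no socket in between. In a real two-process setup the segment name would be shared out of band, and a flag or event would tell the reader when a message is complete.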

The result is much higher throughput, easily reaching multiple millions of messages per second on common hardware for small payloads and multiple GB per second for large payloads. Latency is also much lower: down to the low-microsecond range or better for single one-way messages, and even sub-microsecond in some scenarios.

LwMQ messaging is entirely asynchronous. Messages are posted to prioritized queues without waiting, and an application can create multiple queues to avoid interlocking its own threads, or use synchronized queues for worry-free multithreading.
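The idea of posting to a prioritized, synchronized queue without waiting for a consumer can be sketched as follows. The PriorityMessageQueue class, its method names, and the convention that a lower number means a higher priority are all hypothetical; LwMQ's actual queue API may look quite different.

```python
import heapq
import itertools
import threading
from typing import List, Optional, Tuple


class PriorityMessageQueue:
    """Hypothetical thread-safe queue; a lower number means higher priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, bytes]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority
        self._lock = threading.Lock()

    def post(self, payload: bytes, priority: int = 1) -> None:
        # Enqueue and return; the caller never waits for a consumer.
        with self._lock:
            heapq.heappush(self._heap, (priority, next(self._seq), payload))

    def take(self) -> Optional[bytes]:
        # Return None rather than waiting when the queue is empty.
        with self._lock:
            if not self._heap:
                return None
            return heapq.heappop(self._heap)[2]
```

Posting two bulk messages at priority 2 around one urgent message at priority 0 makes take() return the urgent message first, then the bulk messages in the order they were posted.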

There is no request-and-reply concept in LwMQ. What you get is independent channels in one direction, the other, or both, where the traffic is asynchronous and entirely independent.

One way to look at it is through the lens of a multi-lane interstate superhighway: both directions are high-bandwidth, asynchronous, and independent of each other.

The northbound (say) traffic does not need to be synchronized in any way with the southbound traffic, and both directions can be used simultaneously at their maximum throughput.

Similarly, you can have any number of lanes in each direction, each independent of the others. Just as in real life, you can have a truck lane, multiple regular lanes, and a passing lane with independent traffic, in one or both directions and in any imaginable configuration.
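The highway picture can be mimicked in a few lines of Python. This is only an analogy built on the standard queue and threading modules, with hypothetical helper names, and it says nothing about how LwMQ implements its channels. Each direction has its own lane, sender, and receiver, and neither direction ever waits for the other.

```python
import queue
import threading
from typing import Iterable, List, Tuple

END = None  # end-of-stream marker used only by this sketch


def pump(items: Iterable, lane: queue.Queue) -> None:
    # Sender: pushes everything it has, then signals the end of the stream.
    for item in items:
        lane.put(item)
    lane.put(END)


def drain(lane: queue.Queue, out: List) -> None:
    # Receiver: consumes its own lane only, whatever the other lane is doing.
    while True:
        item = lane.get()
        if item is END:
            return
        out.append(item)


def demo() -> Tuple[List, List]:
    northbound: queue.Queue = queue.Queue()
    southbound: queue.Queue = queue.Queue()
    north_seen: List = []
    south_seen: List = []
    workers = [
        threading.Thread(target=pump, args=(range(3), northbound)),
        threading.Thread(target=drain, args=(northbound, north_seen)),
        threading.Thread(target=pump, args=("ab", southbound)),
        threading.Thread(target=drain, args=(southbound, south_seen)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return north_seen, south_seen
```

Order is preserved within each lane, while the relative timing of the two lanes is left entirely unconstrained, which is the point of the analogy.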

LwMQ leverages Remote Direct Memory Access (RDMA) for remote peer-to-peer communication. RDMA is a datacenter technology that bypasses the network stack almost entirely (a technique known as “kernel bypass”) and achieves higher throughput and lower latency than what can be reached through typical network stacks on most operating systems.

LwMQ supports three flavors of RDMA through NetworkDirect v2 providers:

InfiniBand: A high-bandwidth, low-latency datacenter networking technology designed for RDMA.

RoCE (RDMA over Converged Ethernet): Enables RDMA over Ethernet networks, allowing RDMA to function in traditional network infrastructures.

iWARP (Internet Wide Area RDMA Protocol): Allows RDMA over TCP/IP, which is more scalable over long distances.

LwMQ brings unprecedented local IPC performance without special hardware or software requirements. It also brings datacenter-level remote IPC performance, with great ease of use, to regular applications running on regular workstations, provided those workstations are equipped with an RDMA-capable network adapter and suitable drivers.

The same applies to server applications and to the RDMA-capable virtual machines now available on Google Cloud, AWS and Azure, provided they are equipped with suitable NetworkDirect v2 (NDv2) drivers.

Workstation adapters capable of RDMA operation over iWARP or RoCE (both atop Ethernet) can be readily obtained, for example from Broadcom, NVIDIA (Mellanox) or Intel, while InfiniBand adapters are now common on server hardware and some high-end workstations.

LwMQ is not concerned with the details of the underlying physical transport, provided the vendor supplies the appropriate drivers. Furthermore, LwMQ users are entirely shielded from the underlying transport details and can leverage any of the supported transports with virtually no change to their code.
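One common way to achieve this kind of transport independence is to select the transport from an endpoint string while the application code stays the same. The sketch below illustrates that general pattern only. The scheme names "shm" and "rdma", the open_channel and publish functions, and the RecordingTransport stand-in are all hypothetical and are not taken from LwMQ.

```python
from typing import Callable, Dict, List, Protocol


class Transport(Protocol):
    def send(self, payload: bytes) -> None: ...


class RecordingTransport:
    """Stand-in transport that only records what was sent, tagged by kind."""

    def __init__(self, kind: str, target: str) -> None:
        self.kind = kind
        self.target = target
        self.sent: List[bytes] = []

    def send(self, payload: bytes) -> None:
        self.sent.append(payload)


# Hypothetical scheme names; a real system would map them to real transports.
FACTORIES: Dict[str, Callable[[str], RecordingTransport]] = {
    "shm": lambda target: RecordingTransport("shared-memory", target),
    "rdma": lambda target: RecordingTransport("rdma", target),
}


def open_channel(endpoint: str) -> RecordingTransport:
    scheme, sep, target = endpoint.partition("://")
    if not sep or scheme not in FACTORIES:
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    return FACTORIES[scheme](target)


def publish(endpoint: str, payloads: List[bytes]) -> RecordingTransport:
    # Application code: identical whichever transport the endpoint selects.
    channel = open_channel(endpoint)
    for payload in payloads:
        channel.send(payload)
    return channel
```

Calling publish("shm://orders", ...) and publish("rdma://host-b/orders", ...) runs exactly the same application code; only the endpoint string differs.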

For special cases, LwMQ also provides “raw” channels with direct access to the underlying transport buffers, bypassing the messaging and queuing layers as well as message encoding.

This allows users to implement custom protocols on top of LwMQ’s transport layer and caters to special use cases such as high-performance data replication, BLOB transfer, or even remote memory access (RMA). In these scenarios the application is directly responsible for managing the remote memory and synchronizing the data transfer, while still benefiting from LwMQ’s efficient transport mechanisms and its APIs for connection management, buffer registration, and event handling.

Finally, LwMQ supports Hyper-V-specific transports that enable communication between the host OS, guest VMs, and containers without using any network stack at all. Communication is achieved at high speed between the host operating system and potentially headless VMs and containers that have no virtual network hardware attached.

In the future, LwMQ will enable cross-VM and cross-container communication through direct memory access when the needed support is exposed by the underlying hypervisor and OS.

Together, these features enable new scenarios for locally distributed and networked applications communicating without borders at the highest throughput allowed by the hardware.

Nothing prevents LwMQ from also supporting classic network protocols such as regular TCP/IP or UDP. However, LwMQ’s design revolves around a DMA-first architecture that is optimized from the ground up for high-speed shared-memory IPC and RDMA. It is fundamentally a peer-to-peer (1:1) communication system in which the application establishes peer connections through LwMQ’s communication channels.

In the future, LwMQ may support one-to-many (1:N) cardinality through reliable multicast transports where message replication is offloaded to the network hardware. For now, v1 focuses on the 1:1 use case, bearing in mind that there is no intrinsic limit on the number of channels an application can create and use concurrently.

Ultrafast IPC and in-memory caching enable many scenarios, including real-time financial data dissemination, locally distributed agents, faster AI training, reinforcement learning and inference, and the delegation of sensitive work to a separate process, container or virtual machine with minimal impact on performance, for example fast batching of commands to a separately hosted numeric solver, graphics or HTML renderer, risk engine, or simulation component.

In these scenarios, LwMQ performs many times faster than existing solutions, even more so when the current solution involves HTTP/2 over a local socket connection, such as gRPC or a local REST API. In addition, LwMQ provides elastic queues, queue priorities, and other unique features.