A few days ago, the Google AI team introduced Snap, a microkernel-inspired approach to host networking at the 27th ACM Symposium on Operating Systems Principles. Snap is a userspace networking system with flexible modules that implement a range of network functions, including edge packet switching, virtualization for our cloud platform, traffic shaping policy enforcement, and a high-performance reliable messaging and RDMA-like service.
The Google AI team says, “Snap has been running in production for over three years, supporting the extensible communication needs of several large and critical systems.”
Prior to Snap, Google AI team says they were limited in their ability to develop and deploy new network functionality and performance optimizations in several ways. This is because developing kernel code was slow and drew on a smaller pool of software engineers. Second, feature release through the kernel module reloads covered only a subset of functionality and often required disconnecting applications, while the more common case of requiring a machine reboot necessitated draining the machine of running applications.
Unlike prior microkernel systems, Snap benefits from multi-core hardware for fast IPC and does not require the entire system to adopt the approach wholesale, as it runs as a userspace process alongside our standard Linux distribution and kernel.
Source: Snap Research paper
Using Snap, the Google researchers also created a new communication stack called Pony Express that implements a custom reliable transport and communications API. Pony Express provides significant communication efficiency and latency advantages to Google applications, supporting use cases ranging from web search to storage.
Features of the Snap userspace networking system
Snap’s architecture comprises of recent ideas in userspace networking, in-service upgrades, centralized resource accounting, programmable packet processing, kernel-bypass RDMA functionality, and optimized co-design of transport, congestion control, and routing. With these, Snap:
- Enables a high rate of feature development with a microkernel-inspired approach of developing in userspace with transparent software upgrades. It also retains the benefits of centralized resource allocation and management capabilities of monolithic kernels and also improves upon accounting gaps with existing Linux-based systems.
- Implements a custom kernel packet injection driver and a custom CPU scheduler that enables interoperability without requiring the adoption of new application runtimes and while maintaining high performance across use cases that simultaneously require packet processing through both Snap and the Linux kernel networking stack.
- Encapsulates packet processing functions into composable units called “engines”, which enables both modular CPU scheduling as well as incremental and minimally disruptive state transfer during upgrades.
- Through Pony Express, it provides support for OSI layer 4 and 5 functionality through an interface similar to an RDMA-capable “smart” NIC. This enables transparently leveraging offload capabilities in emerging hardware NICs as a means to further improve server efficiency and throughput.
- Supports 3x better transport processing efficiency than the baseline Linux kernel and supporting RDMA-like functionality at speeds of 5M ops/sec/core.
MicroQuanta: Snap’s new lightweight kernel scheduling class
To dynamically scale CPU resources, Snap works in conjunction with a new lightweight kernel scheduling class called MicroQuanta. It provides a flexible way to share cores between latency-sensitive Snap engine tasks and other tasks, limiting the CPU share of latency-sensitive tasks and maintaining low scheduling latency at the same time.
A MicroQuanta thread runs for a configurable runtime out of every period time units, with the remaining CPU time available to other CFS-scheduled tasks using a variation of a fair queuing algorithm for high and low priority tasks (rather than more traditional fixed time slots).
MicroQuanta is a robust way for Snap to get priority on cores runnable by CFS tasks that avoid starvation of critical per-core kernel threads. While other Linux real-time scheduling classes use both per-CPU tick-based and global high-resolution timers for bandwidth control, MicroQuanta uses only per-CPU highresolution timers. This allows scalable time-slicing at microsecond granularity.
Snap is being received positively by many in the community.
Snap: a Microkernel Approach to Host Networking
Another cracking paper from SOSP. We know the Linux kernel's networking stack is … meh.
Google's user land networking system achieves "over 3x Gbps/core improvement for RPC workloads, RDMA-like perf of up to 5M IOPS/- core." pic.twitter.com/Cij7fiZ6uf
— Cindy Sridharan (@copyconstruct) October 27, 2019
To know more about Snap in detail, you can read it’s complete research paper.