GitHub, open sourced the GitHub Load Balancer (GLB) Director on August 8, 2018. GLB Director is a Layer 4 load balancer which scales a single IP address across a large number of physical machines. It also minimizes connection disruption during any change in servers.
Apart from open sourcing the GLB Director, GitHub has also shared details on the Load balancer design.
GitHub had first released its GLB on September 22, 2016. The GLB is GitHub’s scalable load balancing solution for bare metal data centers. It powers a majority of GitHub’s public web and Git traffic, and GitHub’s critical internal systems such as its highly available MySQL clusters.
How GitHub Load Balancer Director works
GLB Director is designed for use in data center environments where multiple servers can announce the same IP address via BGP. Further, the network routers shard traffic amongst those servers using ECMP routing.
The ECMP shards connections per-flow using consistent hashing and by addition or removal of nodes. This will cause some disruption to traffic as the state isn’t stored for each flow.
A split L4/L7 design is typically used to allow the L4 servers to redistribute these flows back to a consistent server in a flow-aware manner. GLB Director implements the L4 (director) tier of a split L4/L7 load balancer design.
The GLB Director does not replace services like haproxy and nginx, but rather is a layer in front of these services (or any TCP service) that allows them to scale across multiple physical machines without requiring each machine to have unique IP addresses.
GLB Director only processes packets on ingress. It then encapsulates them inside an extended Generic UDP Encapsulation packet. Egress packets from proxy layer servers are sent directly to clients using Direct Server Return.
Read more about the GLB Director in detail on the GitHub Engineering blog post.