Yesterday, Facebook open-sourced LogDevice, a distributed data store designed specifically for logs. This initial release includes the toolset that forms the core of LogDevice's operational infrastructure. In the future, the company plans to open-source the automation tools it has built to manage LogDevice clusters.
LogDevice is currently supported only on Ubuntu 18.04 (Bionic Beaver). However, it should be possible to build it on other Linux distributions without significant difficulty.
What is LogDevice
LogDevice, as the name suggests, is a log system which promises to be scalable and fault tolerant. Unlike filesystems, which store and serve data as organized files, LogDevice stores and delivers data as logs. A log is a record-oriented, append-only, trimmable file.
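To make the three properties in that definition concrete, here is a minimal in-memory sketch in Python. It is not LogDevice's API: the `Log` class and its `append`, `trim`, and `read_from` methods are hypothetical names chosen for illustration, and a real log would be replicated and persisted rather than held in a dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Log:
    """Toy log (hypothetical, not the LogDevice API)."""

    records: dict[int, bytes] = field(default_factory=dict)
    next_lsn: int = 1    # sequence number the next appended record will get
    trim_point: int = 0  # every record with LSN <= trim_point is gone

    def append(self, payload: bytes) -> int:
        """Append-only: records are added at the tail and never modified."""
        lsn = self.next_lsn
        self.records[lsn] = payload
        self.next_lsn += 1
        return lsn

    def trim(self, up_to_lsn: int) -> None:
        """Trimmable: discard the oldest records, up to and including up_to_lsn."""
        for lsn in [l for l in self.records if l <= up_to_lsn]:
            del self.records[lsn]
        self.trim_point = max(self.trim_point, up_to_lsn)

    def read_from(self, start_lsn: int) -> list[tuple[int, bytes]]:
        """Record-oriented: readers get whole records, in sequence order."""
        return [
            (lsn, self.records[lsn])
            for lsn in sorted(self.records)
            if lsn >= start_lsn
        ]
```

Appending three records, trimming the first, and then reading from the start would return only the second and third records. Sequence numbers keep increasing after a trim, so trimmed numbers are never reused.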
How it works
LogDevice uses a record placement and delivery scheme that favors write availability and copes well with spiky write workloads:
1. Separating sequencing and storage: First, the ordering of records is decoupled from the actual storage of record copies. For every log in a LogDevice cluster, a sequencer object runs and issues monotonically increasing sequence numbers as records are appended to that log.
2. Placement: After a record is stamped with a sequence number, its copies can potentially be stored on any storage node in the cluster.
3. Reading: A client that wants to read a particular log contacts all storage nodes that are permitted to store records of that log. These nodes are collectively called the node set of the log, which is usually kept smaller than the total number of nodes in the cluster. The contacted nodes deliver record copies to the client by pushing them into the TCP connections as fast as they can.
4. Metadata history: The node set is part of the replication policy of the log. It can be changed at any time, with an appropriate note in the log’s metadata history. Readers can then consult that history to find the storage nodes to connect to.
5. Reordering and de-duplication: The LogDevice client library reorders records and occasionally de-duplicates them, so that the reader application receives them in the order of their log sequence number (LSN).
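The write and read paths described in the steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not LogDevice's implementation or client API: the `Sequencer` and `StorageNode` classes, the `append` and `read` functions, the replication factor of 2, and the uniformly random choice of nodes are all hypothetical. The metadata history from step 4 is left out, so the node set is fixed.

```python
import heapq
import random


class Sequencer:
    """Issues monotonically increasing LSNs for one log (step 1)."""

    def __init__(self):
        self._next = 1

    def next_lsn(self):
        lsn = self._next
        self._next += 1
        return lsn


class StorageNode:
    """Holds record copies keyed by LSN; knows nothing about ordering."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def store(self, lsn, payload):
        self.records[lsn] = payload

    def stream(self):
        # Each node delivers the copies it holds, sorted by LSN.
        return sorted(self.records.items())


def append(sequencer, node_set, payload, replication=2, rng=random):
    """Stamp the record first, then place copies on nodes of the node set (step 2).

    Random placement and replication=2 are assumptions made for this sketch.
    """
    lsn = sequencer.next_lsn()
    for node in rng.sample(node_set, replication):
        node.store(lsn, payload)
    return lsn


def read(node_set):
    """Contact every node in the node set, merge the streams by LSN,
    and drop duplicate copies (steps 3 and 5)."""
    last_delivered = 0
    streams = (node.stream() for node in node_set)
    for lsn, payload in heapq.merge(*streams, key=lambda rec: rec[0]):
        if lsn > last_delivered:  # a later copy of the same record is skipped
            yield lsn, payload
            last_delivered = lsn
```

Because the sequencer assigns the LSN before any copy is written, `append` is free to pick whichever nodes are available, which is where the write availability comes from. The cost is paid on the read side: `read` has to merge the streams from all nodes in the node set and discard the extra copies before handing records to the application.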
What are its common use cases
Facebook uses LogDevice for a variety of workloads, including:
- Write-ahead logging for durability
- Transaction logging in a distributed database
- Event logging
- Journals of deferred work items
- Distribution of index updates in large distributed databases
- Machine learning pipelines
- Replication pipelines
- Durable, reliable task queues
- Stream processing pipelines