Facebook open sources LogDevice, a distributed data store for logs

September 24, 2018 - 8:12 am

Two weeks ago, Facebook made its distributed log system, LogDevice, available as an open source project. It is a highly scalable and fault-tolerant distributed data store for sequential data. LogDevice comes with features such as high write availability, consistency guarantees, non-deterministic record placement, and a local log store.

LogDevice is widely used at Facebook. Existing use cases of LogDevice include stream processing pipelines, distribution of index updates in large distributed databases, machine learning pipelines, replication pipelines, and durable reliable task queues.

LogDevice is designed from the ground up to serve many different logs with high reliability and efficiency at scale. It is also highly tunable, so each of the use cases mentioned above can be optimized for the right set of trade-offs in the durability-efficiency and consistency-availability space.

Let’s have a look at the key features of LogDevice.

High write availability

LogDevice offers high write availability, which is uncommon among Facebook's existing logging applications. It separates record sequencing from record storage and uses non-deterministic placement of records. This improves write availability and helps the system better tolerate temporary load imbalances caused by spikes in the write load on individual logs.

Consistency guarantees

The consistency guarantees provided by a LogDevice log are similar to the ones provided by a record-oriented file system. It comes with built-in data loss detection and reporting. If data loss occurs, the log sequence numbers (LSNs) of all lost records are reported to every reader that attempts to read the affected log and range of LSNs.

However, no ordering guarantees are provided for records of different logs.
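
To make the loss reporting described above concrete, here is a minimal Python sketch of how a reader could turn the LSNs it actually received into a list of missing ranges. It is not LogDevice's implementation or API: the function name `find_gaps` is hypothetical, and the sketch assumes LSNs are plain contiguous integers, which is a simplification.

```python
def find_gaps(delivered_lsns, first_lsn, last_lsn):
    """Return the missing LSN ranges in [first_lsn, last_lsn] as (start, end) pairs.

    Assumption: LSNs are contiguous integers, so any LSN in the requested
    range that was not delivered counts as lost.
    """
    present = set(delivered_lsns)
    gaps = []
    gap_start = None
    for lsn in range(first_lsn, last_lsn + 1):
        if lsn not in present:
            # Open a new gap if we are not already inside one.
            if gap_start is None:
                gap_start = lsn
        elif gap_start is not None:
            # A delivered record closes the current gap.
            gaps.append((gap_start, lsn - 1))
            gap_start = None
    if gap_start is not None:
        # The requested range ended while a gap was still open.
        gaps.append((gap_start, last_lsn))
    return gaps


if __name__ == "__main__":
    for start, end in find_gaps([1, 2, 5, 6, 9], first_lsn=1, last_lsn=10):
        print(f"data loss: LSNs {start}..{end}")
```

Every reader that runs the same check over the same range sees the same gaps, which is the property the article describes: loss is reported to each reader of the affected range rather than silently skipped.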

Non-deterministic record placement

LogDevice takes a different approach to record placement. First, the ordering of records in a log is decoupled from the actual storage of record copies. For each log in a LogDevice cluster, LogDevice runs a sequencer object whose only job is to issue monotonically increasing sequence numbers as records are appended to that log. The sequencer runs either on a storage node or on a node reserved for sequencing.

After a record has been stamped with a sequence number, the copies of that record may be stored on any storage nodes in the cluster. The LogDevice client library performs the reordering and occasional de-duplication of records, which ensures that records are delivered to the reader application in the order of their LSNs.
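
The flow above (stamp a record with an LSN, store copies on arbitrary nodes, then reorder and de-duplicate on read) can be illustrated with a small in-memory Python sketch. Everything here is an assumption made for illustration: `Sequencer`, `append`, and `read_in_order` are hypothetical names, storage nodes are plain lists, and copies are placed by uniform random choice, which is not necessarily how LogDevice selects nodes.

```python
import heapq
import random


class Sequencer:
    """Issues monotonically increasing sequence numbers for one log."""

    def __init__(self):
        self._next_lsn = 1

    def next_lsn(self):
        lsn = self._next_lsn
        self._next_lsn += 1
        return lsn


def append(payload, sequencer, nodes, replication, rng):
    """Stamp the record with an LSN, then store copies on randomly chosen nodes."""
    lsn = sequencer.next_lsn()
    for node in rng.sample(nodes, replication):
        node.append((lsn, payload))
    return lsn


def read_in_order(nodes):
    """Merge the per-node streams by LSN and drop duplicate copies."""
    last_lsn = None
    # Each node's list is already sorted by LSN because appends arrive in LSN order.
    for lsn, payload in heapq.merge(*nodes, key=lambda record: record[0]):
        if lsn != last_lsn:
            yield lsn, payload
            last_lsn = lsn


if __name__ == "__main__":
    rng = random.Random(0)
    sequencer = Sequencer()
    nodes = [[] for _ in range(5)]  # five hypothetical storage nodes
    for i in range(1, 11):
        append(f"record-{i}", sequencer, nodes, replication=3, rng=rng)
    for lsn, payload in read_in_order(nodes):
        print(lsn, payload)
```

Because the writer is free to pick any nodes, a slow or overloaded node can simply be skipped for a given record. The cost is that the reader has to merge several streams, which is the work the article attributes to the client library.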

Local log store

The local log store of LogDevice is called LogsDB. LogsDB is a write-optimized data store designed to keep the number of disk seeks small and controlled, so the write and read IO patterns on the storage device are mostly sequential. Write-optimized data stores offer great performance when writing data, even if it belongs to multiple files or logs.

Apart from that, LogsDB is quite efficient for log tailing workloads, which is a common pattern of log access where records are delivered to readers soon after they are written. These records are never read again except in rare cases such as massive backfills.

LogsDB is built on top of RocksDB, which is an ordered, durable key-value data store based on LSM trees. LogsDB acts as a time-ordered collection of RocksDB column families. Each column family in this collection is called a LogsDB partition.
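
As a rough mental model of a "time-ordered collection of partitions", the following Python sketch keeps one sorted list per time window. It does not use RocksDB and is not how LogsDB is implemented: the class `LogsDBSketch`, the fixed partition width, and the `(log_id, lsn)` key layout are all assumptions chosen for illustration, and the sketch further assumes that timestamps increase with LSNs within a log.

```python
import bisect


class LogsDBSketch:
    """Toy model: records are grouped into time-window partitions,
    and each partition keeps its rows ordered by (log_id, lsn)."""

    def __init__(self, partition_seconds):
        self.partition_seconds = partition_seconds
        # partition start time -> sorted list of ((log_id, lsn), payload)
        self.partitions = {}

    def write(self, log_id, lsn, timestamp, payload):
        # Pick the partition whose time window contains this timestamp.
        start = timestamp - timestamp % self.partition_seconds
        rows = self.partitions.setdefault(start, [])
        # Keep rows ordered by key, like an ordered key-value store would.
        bisect.insort(rows, ((log_id, lsn), payload))

    def read(self, log_id, from_lsn):
        # Walk partitions oldest to newest; within each, rows are key-ordered.
        for start in sorted(self.partitions):
            for (row_log_id, lsn), payload in self.partitions[start]:
                if row_log_id == log_id and lsn >= from_lsn:
                    yield lsn, payload


if __name__ == "__main__":
    db = LogsDBSketch(partition_seconds=60)
    db.write(log_id=1, lsn=1, timestamp=5, payload="a")
    db.write(log_id=2, lsn=1, timestamp=10, payload="x")
    db.write(log_id=1, lsn=2, timestamp=70, payload="b")
    for lsn, payload in db.read(log_id=1, from_lsn=1):
        print(lsn, payload)
```

In this model a tailing reader mostly touches the newest partition, which matches the access pattern the article describes for log tailing workloads.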

For more information, visit the official LogDevice website.
