Uber open sources its large-scale metrics platform, M3 for Prometheus

Yesterday, Uber open-sourced M3, its robust and scalable metrics infrastructure, as a remote storage backend for Prometheus, a popular monitoring and alerting solution. Uber has long used M3 to access metrics on its backend systems. By open sourcing it, Uber wants the broader community to benefit from its metrics platform.

Prior to releasing M3, Uber released M3DB, the scalable storage backend for M3. M3DB is a distributed time series database that can be used for storing real-time metrics with long retention.

Along with M3, Uber also open sourced M3 Coordinator, a bridge that users can deploy to combine the benefits of M3DB and Prometheus. The M3 Coordinator performs downsampling, ad hoc retention, and aggregation of metrics using retention and rollup rules. This makes it possible to apply specific retention periods and aggregations to subsets of metrics on the fly. The rules are stored in etcd, which runs embedded in the binary of an M3DB seed node.
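To make the idea of retention and rollup rules concrete, here is a minimal sketch of rule-driven downsampling. It is not M3 Coordinator's implementation or rule syntax: the `Rule` class, the glob-style `pattern`, and the choice of averaging as the aggregation are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: which metrics it matches and how they are kept."""
    pattern: str       # glob matched against the metric name (assumed syntax)
    resolution_s: int  # width of each downsampled bucket, in seconds
    retention_s: int   # how long downsampled points are kept, in seconds


def downsample(points, resolution_s):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % resolution_s].append(value)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


def apply_rules(name, points, rules, now):
    """Return {rule: downsampled points} for every rule matching the metric."""
    out = {}
    for rule in rules:
        if fnmatch(name, rule.pattern):
            # Drop points that have aged out of this rule's retention window.
            kept = [(ts, v) for ts, v in points if now - ts <= rule.retention_s]
            out[rule] = downsample(kept, rule.resolution_s)
    return out
```

With a rule such as `Rule("http.*", 60, 3600)`, a metric named `http.requests` would be averaged into one-minute buckets and kept for an hour, while metrics that match no rule are left untouched.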

M3 for Prometheus

Although Prometheus is a popular monitoring and alerting solution, its scalability and durability are limited to what a single node can provide. The M3 metrics platform provides a turnkey, scalable, and configurable multi-tenant store for Prometheus metrics.

[Image. Source: Uber Engineering]

Before using M3, Uber emitted metrics to a Graphite stack, which stored them using the Whisper file format in a sharded Carbon cluster. Uber then used Grafana for dashboarding and Nagios for alerting, issuing Graphite threshold checks via source-controlled scripts. However, expanding the Carbon cluster required a manual resharding process and, due to the lack of replication, any single node’s disk failure caused permanent loss of its associated metrics. As Uber kept growing, this setup was no longer sustainable.

This led them to build M3, a system that provides fault-tolerant metrics ingestion, storage, and querying as a managed platform. Released in 2015, M3 now houses over 6.6 billion time series.

Features of M3 include:

It optimizes every part of the metrics pipeline. This gives engineers more storage for less hardware spend.

M3 keeps data as highly compressed as possible to reduce its hardware footprint. It does this with M3TSZ compression, an optimized version of Gorilla’s TSZ compression for float64 values.

It maintains a lean memory footprint for storage so that memory does not become a bottleneck, since a significant portion of each data point can be “write once, read never.”

To speed up access time, it keeps a Bloom filter and an index summary per shard time window block in mmap’d memory. This allows ad hoc queries of up to 100,000 unique time series in a single query over long retention periods (in some cases, spanning years of retention).

M3 avoids compactions where possible, including in the downsampling path. This leaves more host resources available for concurrent writes and keeps write and read latency steady.

It uses a native design for time series storage that does not require vigilant operational attention to run with a high write volume.
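The compression feature above builds on Gorilla's TSZ scheme, whose core idea for float64 values is to XOR each value with the previous one so that similar neighbors produce mostly zero bits. The sketch below illustrates only that XOR step and its inverse. It is not M3TSZ itself, it does no bit packing, and the function names are hypothetical.

```python
import struct


def float_bits(x):
    """Reinterpret a float64 as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_deltas(values):
    """XOR each value's bit pattern with the previous value's bit pattern."""
    prev = 0
    out = []
    for v in values:
        bits = float_bits(v)
        out.append(bits ^ prev)
        prev = bits
    return out


def meaningful_bits(delta):
    """Count the bits between the leading and trailing zeros of a delta.

    A real encoder would store only these bits plus a small header.
    """
    if delta == 0:
        return 0
    trailing_zeros = (delta & -delta).bit_length() - 1
    return delta.bit_length() - trailing_zeros


def decode(deltas):
    """Invert xor_deltas and recover the original float64 values."""
    prev = 0
    out = []
    for d in deltas:
        prev ^= d
        out.append(struct.unpack(">d", struct.pack(">Q", prev))[0])
    return out
```

For a series such as `[12.0, 12.0, 24.0]`, the repeated value yields a delta of zero and the step from 12.0 to 24.0 differs in a single bit, which is why slowly changing metrics compress well under this kind of scheme.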

The M3 architecture

[Figure: The M3 architecture]

M3 architecture includes a single global view of all metrics

With such a global view, upstream consumers do not have to handle routing themselves, which makes metrics simpler to discover. For workloads that fail over between regions, or workloads sharded across regions, the single global view makes it much easier to sum and query metrics across all regions in a single query.

This lets users see all operations of a specific type globally, and look at a longer retention to view historical trends in a single place.

How can one achieve the single global view?

To achieve this single pane view, metrics are written in M3 to local regional M3DB instances. In this setup, replication is local to a region and can be configured to be isolated by availability zone or rack. Queries fan out to both the local region’s M3DB instances and coordinators in remote regions where metrics are stored, returning compressed M3TSZ blocks for matched time series wherever possible.
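The fan-out described above can be sketched as a parallel query to every region followed by a merge. This is a simplified illustration, not M3's query engine: each region is modeled as a hypothetical callable that returns already decoded `(timestamp, value)` pairs, whereas M3 returns compressed M3TSZ blocks where possible.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(regions, query):
    """Run the query against every region in parallel.

    `regions` maps a region name to a callable taking the query string
    (a stand-in for a local M3DB instance or a remote coordinator).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        futures = {name: pool.submit(fetch, query) for name, fetch in regions.items()}
        return {name: future.result() for name, future in futures.items()}


def sum_across_regions(per_region):
    """Sum values that share a timestamp across all regional results."""
    totals = {}
    for series in per_region.values():
        for ts, value in series:
            totals[ts] = totals.get(ts, 0.0) + value
    return sorted(totals.items())
```

A caller would pass the result of `fan_out` straight to `sum_across_regions` to get one global series, which is the kind of cross-region sum the single global view is meant to make easy.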

Uber engineers plan to further upgrade M3 so that query aggregations are pushed to remote regions, and to the local M3DB storage nodes wherever possible, and executed there before results are returned.

Read more about M3 in detail in the official Uber Engineering blog post.
