Datadog, the monitoring and analytics platform, has released DDSketch (Distributed Distribution Sketch), a fully-mergeable, relative-error quantile sketching algorithm with formal guarantees. It was presented at VLDB 2019 in August.
DDSketch is a fully-mergeable, relative-error quantile sketching algorithm
Per Wikipedia, quantiles are cut points dividing the range of a probability distribution into contiguous intervals with equal probabilities, or dividing the observations in a sample in the same way. Quantiles are useful measures because they are less susceptible than means to long-tailed distributions and outliers.
Calculating exact quantiles can be expensive in both storage and network bandwidth, so most monitoring systems compress the data into sketches and compute approximate quantiles. However, work on quantile sketches has primarily focused on bounding the rank error of the sketch while using little memory. Unfortunately, for data sets with heavy tails, rank-error guarantees can return values with large relative errors.
Quantile sketches should also be mergeable, meaning that several combined sketches must be as accurate as a single sketch of the same data. DDSketch addresses both problems: it comes with formal guarantees, is fully mergeable, and provides relative-error accuracy. The sketch is extremely fast as well as accurate and is currently used in production at Datadog.
How DDSketch works
As mentioned earlier, DDSketch has relative error guarantees. This means it computes quantiles with a controlled relative error. For example, for a DDSketch with a relative accuracy guarantee set to 1% and expected quantile value set to 100, the computed quantile value is guaranteed to be between 99 and 101. If the expected quantile value is 1000, the computed quantile value is guaranteed to be between 990 and 1010.
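The guarantee can be stated as a one-line predicate. The following is a minimal illustration, not library code; `within_guarantee` is a hypothetical helper named here for clarity:

```python
def within_guarantee(estimate, true_quantile, alpha):
    # DDSketch's relative-error guarantee: |estimate - q| <= alpha * q,
    # where alpha is the configured relative accuracy (e.g. 0.01 for 1%)
    return abs(estimate - true_quantile) <= alpha * true_quantile

# the article's example: 1% relative accuracy around a true quantile of 100
print(within_guarantee(99.0, 100.0, 0.01))   # True: 99 is within [99, 101]
print(within_guarantee(98.0, 100.0, 0.01))   # False: outside the 1% band
```

Note that the error band scales with the quantile value itself, which is exactly what makes the guarantee useful for heavy-tailed data such as latencies.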
DDSketch works by mapping floating-point input values to bins and counting the number of values for each bin. The mapping to bins is handled by IndexMapping, while the underlying structure that keeps track of bin counts is Store.
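The core idea can be sketched in a few dozen lines. The following is a simplified, illustrative version under stated assumptions (positive inputs only, a plain dict-backed store, a logarithmic mapping); it is not Datadog's implementation, but it shows how a logarithmic bin mapping yields the relative-error guarantee and why merging is lossless:

```python
import math
from collections import Counter

class MiniDDSketch:
    """Illustrative DDSketch-style sketch for positive values only."""

    def __init__(self, relative_accuracy=0.01):
        self.alpha = relative_accuracy
        # gamma is chosen so every value in a bin is within alpha
        # of the bin's midpoint
        self.gamma = (1 + self.alpha) / (1 - self.alpha)
        self.bins = Counter()  # bin index -> count (the store)
        self.count = 0

    def _index(self, value):
        # logarithmic mapping: bin i covers (gamma^(i-1), gamma^i]
        return math.ceil(math.log(value, self.gamma))

    def add(self, value):
        self.bins[self._index(value)] += 1
        self.count += 1

    def merge(self, other):
        # merging is lossless: bin counts simply add up
        self.bins.update(other.bins)
        self.count += other.count

    def quantile(self, q):
        # walk the bins in index order until the target rank is reached,
        # then return the bin midpoint, which is within alpha of the truth
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.bins):
            seen += self.bins[index]
            if seen > rank:
                return 2 * self.gamma ** index / (self.gamma + 1)
```

For example, inserting the values 1 through 1000 with 1% relative accuracy and querying the median returns an estimate within 1% of the true median, and merging two sketches built over halves of the data gives the same answer as one sketch over all of it.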
The memory size of the sketch depends on the range that is covered by the input values; the larger the range, the more bins are needed to keep track of the input values. As a rough estimate, when working on durations using standard parameters (mapping and store) with a relative accuracy of 2%, about 2.2kB (297 bins) are needed to cover values between 1 millisecond and 1 minute, and about 6.8kB (867 bins) to cover values between 1 nanosecond and 1 day.
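A back-of-the-envelope version of that estimate follows from the mapping's geometry: each bin covers a gamma-factor range, so covering [lo, hi] takes roughly log(hi/lo)/log(gamma) bins. This is an approximation, not the library's accounting; Datadog's standard mapping and store carry some overhead, which is why the figures quoted above (297 and 867 bins) are somewhat higher:

```python
import math

def bins_needed(lo, hi, alpha=0.02):
    # each bin spans a factor of gamma = (1 + alpha) / (1 - alpha),
    # so the bin count grows with the log of the covered range
    gamma = (1 + alpha) / (1 - alpha)
    return math.ceil(math.log(hi / lo) / math.log(gamma))

# durations in seconds, 2% relative accuracy
print(bins_needed(1e-3, 60))     # roughly 280 bins for 1 ms to 1 min
print(bins_needed(1e-9, 86400))  # roughly 800 bins for 1 ns to 1 day
```

The key takeaway is that memory grows only logarithmically with the covered range, which is why six extra orders of magnitude of range cost only a few hundred extra bins.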
DDSketch implementations and comparisons
Datadog has provided implementations of DDSketch in Java, Go, and Python. The Java implementation provides multiple versions of DDSketch. They have also compared DDSketch against the Java implementation of HDR Histogram, the Java implementation of the GKArray variant of the GK sketch, and the Java implementation of the Moments sketch.
HDR Histogram is the only other relative-error sketch in the literature. It has extremely fast insertion times (requiring only low-level binary operations), as its bucket sizes are optimized for insertion speed rather than size, and it is fully mergeable (though merging is very slow). The main downside, the researchers say, is that it can only handle a bounded (though very large) range, which might not be suitable for certain data sets. It also has no published guarantees, though the researchers note that much of the analysis presented for DDSketch can be made to apply to a version of HDR Histogram that more closely resembles DDSketch, with a slightly worse guarantee.
The Moments sketch takes an entirely different approach by estimating the moments of the underlying distribution. It has notably fast merging times and is fully mergeable. The guaranteed accuracy, however, is only for the average rank error, unlike other sketches, which have guarantees for the worst-case error (whether rank or relative).
Compared to GK, the relative accuracy of DDSketch is comparable for dense data sets, while for heavy-tailed data sets the improvement in accuracy can be measured in orders of magnitude. The rank error is also comparable to, if not better than, that of GK. Additionally, DDSketch is much faster at both insertion and merging.
Note: All images are taken from the research paper.
For more technical coverage, please read the research paper.
In other related news, in late August Datadog announced that it had filed a registration statement on Form S-1 with the U.S. Securities and Exchange Commission relating to a proposed initial public offering of its Class A common stock. The firm listed a $100 million raise in its prospectus, a provisional figure that will change when the company sets a price range for its equity.