Three years ago, the PipelineDB team published the first release of PipelineDB as a fork of PostgreSQL. It received enormous support and feedback from thousands of organizations worldwide, including several Fortune 100 companies, and many users asked for it to be released as a PostgreSQL extension rather than a fork. Yesterday, the team released PipelineDB 1.0.0 as a PostgreSQL extension under the liberal Apache 2.0 license.
What is PipelineDB?
PipelineDB is designed for workloads that ingest huge amounts of time-series data that need to be continuously aggregated. It stores only the compact output of these continuous queries as incrementally updated table rows, which can be read with minimal query latency.
It is suited to analytics use cases that only require summary data, such as real-time reporting dashboards. PipelineDB is especially beneficial in scenarios where queries are known in advance. These queries can be run continuously, making the data infrastructure that powers these real-time analytics applications simpler, faster, and cheaper than the traditional “store first, query later” data processing model.
How does PipelineDB work?
With PipelineDB, time-series events are written with SQL to streams, which are themselves structured as tables. A continuous view is then used to perform an aggregation over a stream.
For example, if a continuous view computes a distinct count grouped by hour, then even if billions of rows are written to the stream, only one physical row per hour is actually persisted in the database.
Once the continuous view has read new incoming events and updated the distinct count to reflect them, the raw events are discarded and not stored in PipelineDB. This enables it to achieve:
- Enormous levels of raw event throughput on modest hardware footprints
- Extremely low read query latencies
- A break in the traditional dependence between data volumes ingested and data volumes stored
Together, these properties allow the system to sustain high performance indefinitely.
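The behaviour described above can be illustrated with a small conceptual model. This is only a sketch in Python, not PipelineDB's implementation or API: the class name `HourlyDistinctView` and the sample events are hypothetical, and an exact set is used for the distinct count purely for simplicity, where a production system would more likely use a compact probabilistic structure.

```python
from collections import defaultdict
from datetime import datetime


class HourlyDistinctView:
    """Toy model of a continuous view: one row of state per hour (hypothetical)."""

    def __init__(self):
        # Maps an hour bucket to the set of user ids seen in that hour.
        self._seen = defaultdict(set)

    def insert(self, ts, user_id):
        # Fold the event into the aggregate for its hour. The raw event
        # itself is never stored; only the per-hour state is updated.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        self._seen[hour].add(user_id)

    def rows(self):
        # One (hour, distinct_count) row per hour, however many events arrived.
        return [(hour, len(users)) for hour, users in sorted(self._seen.items())]


view = HourlyDistinctView()
events = [
    (datetime(2018, 10, 1, 9, 5), "alice"),
    (datetime(2018, 10, 1, 9, 30), "bob"),
    (datetime(2018, 10, 1, 9, 45), "alice"),
    (datetime(2018, 10, 1, 10, 2), "alice"),
]
for ts, user in events:
    view.insert(ts, user)

for hour, distinct_users in view.rows():
    print(hour, distinct_users)
```

Four events go in, but the view holds only two rows: two distinct users for the 09:00 hour and one for the 10:00 hour. Reading the result never touches raw events, which is why read latency stays low regardless of ingest volume.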
PipelineDB also supports another type of continuous query, called ‘continuous transforms’. Continuous transforms are stateless: they apply a transformation to a stream and write the result out to another stream.
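As a rough illustration of the idea, a stateless transform can be modelled as a function applied to each event independently, with the results forming a new stream. The sketch below is a conceptual Python model, not PipelineDB syntax; `continuous_transform`, `to_celsius`, and the sensor readings are all hypothetical names and data chosen for the example.

```python
def continuous_transform(stream, fn):
    """Apply fn to each event and yield the results as a new stream.

    No state is carried from one event to the next. Events for which
    fn returns None are dropped from the output stream.
    """
    for event in stream:
        result = fn(event)
        if result is not None:
            yield result


def to_celsius(event):
    # Drop malformed readings; convert the rest from Fahrenheit to Celsius.
    if "temp_f" not in event:
        return None
    return {"sensor": event["sensor"], "temp_c": (event["temp_f"] - 32) * 5 / 9}


readings = [
    {"sensor": "a", "temp_f": 212.0},
    {"sensor": "b"},  # malformed: no temperature
    {"sensor": "c", "temp_f": 32.0},
]
celsius_stream = list(continuous_transform(readings, to_celsius))
print(celsius_stream)
```

Because the transform keeps no state, each output event depends only on its own input event, and the output can be fed to any downstream reader as an ordinary stream.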
Features of PipelineDB
PipelineDB 1.0.0 introduces several changes relative to version 0.9.7. The main highlights are as follows.
- Non-standard syntax has been removed.
- Configuration parameters are now qualified by pipelinedb.
- PostgreSQL pg_dump, pg_restore, and pg_upgrade tooling is now used instead of the PipelineDB variants.
- Certain functions and aggregates have been renamed to describe the problem they solve for users.
- “Top-K” now refers to Filtered-Space-Saving
- “Distributions” now refer to T-Digests
- “Frequency” now refers to Count-Min-Sketch
- Bloom filters introduced for set membership analysis
- Distributions and percentiles analysis is now possible
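To make one of the structures in this list concrete, here is a minimal Bloom filter sketch for set membership. It is an illustrative Python model only and says nothing about how PipelineDB implements or exposes Bloom filters; the class name, the default sizes, and the SHA-256-based hashing scheme are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: compact set membership with possible false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per position, packed into bytes (rounded up).
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means probably present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen_users = BloomFilter()
for user in ["alice", "bob"]:
    seen_users.add(user)

print(seen_users.might_contain("alice"))
```

The filter never reports an added item as absent, and its memory use is fixed by `num_bits` rather than by the number of items added, which is what makes this kind of structure a good fit for summarising unbounded streams.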
What’s more, continuous queries can be chained together into arbitrarily complex topologies of continuous computation. Each continuous query produces its own output stream of incremental updates, which can be consumed by another continuous query like any other stream.
The team plans to follow up with automated partitioning for continuous views in an upcoming release. You can head over to the PipelineDB blog for more insights on this news.
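Chaining can be pictured as each query publishing its incremental updates to an output stream that downstream queries subscribe to. The following is a conceptual Python sketch under that assumption, not PipelineDB's actual mechanism or API; `Stream`, `CountView`, the `where` filter, and the click events are hypothetical.

```python
from collections import defaultdict


class Stream:
    """A stream that pushes each written event to every subscribed reader."""

    def __init__(self):
        self._readers = []

    def subscribe(self, reader):
        self._readers.append(reader)

    def write(self, event):
        for reader in self._readers:
            reader(event)


class CountView:
    """Counts events per key and publishes every incremental update."""

    def __init__(self, source, key, where=None):
        self.counts = defaultdict(int)
        self.output = Stream()  # output stream of (old, new) updates
        self._key = key
        self._where = where
        source.subscribe(self._on_event)

    def _on_event(self, event):
        if self._where is not None and not self._where(event):
            return
        k = self._key(event)
        old = self.counts[k]
        self.counts[k] = old + 1
        self.output.write({"key": k, "old": old, "new": old + 1})


clicks = Stream()
# First query: clicks per page.
per_page = CountView(clicks, key=lambda e: e["page"])
# Second query reads the first query's output stream and counts
# how many pages have received their first click.
new_pages = CountView(
    per_page.output,
    key=lambda u: "pages_seen",
    where=lambda u: u["old"] == 0,
)

for page in ["/home", "/docs", "/home", "/pricing", "/home"]:
    clicks.write({"page": page})

print(dict(per_page.counts), dict(new_pages.counts))
```

The second view never sees the raw clicks. It consumes only the update events emitted by the first view, which is the sense in which one query's output acts as just another stream.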