On September 24, 2019, Cloudera launched CDP Public Cloud (CDP-PC) as the first step in delivering the industry’s first Enterprise Data Cloud.
That Was Then
In the beginning, CDP ran only on AWS with a set of services that supported a handful of use cases and workload types:
- CDP Data Warehouse: a Kubernetes-based service that allows business analysts to deploy data warehouses with secure, self-service access to enterprise data.
- CDP Machine Learning: a Kubernetes-based service that allows data scientists to deploy collaborative workspaces with secure, self-service access to enterprise data.
- CDP Data Hub: a VM/Instance-based service that allows IT and developers to build custom business applications for a diverse set of use cases with secure, self-service access to enterprise data.
At the heart of CDP is SDX, a unified context layer for governance and security, that makes it easy to create a secure data lake and run workloads that address all stages of your data lifecycle (collect, enrich, report, serve and predict).
This Is Now
With CDP-PC just a bit over a year old, we thought now would be a good time to reflect on how far we have come. Over the past year, we have not only added Azure as a supported cloud platform but also improved the original services while growing the CDP-PC family significantly:
- Data Warehouse – in addition to several performance optimizations, DW has added new features for better scalability, monitoring and reliability to enable self-service access with security and performance.
- Machine Learning – has grown from a collaborative workbench to an end-to-end Production ML platform that enables data scientists to deploy a model or an application to production in minutes with production-level monitoring, governance and performance tracking.
- Data Hub – has expanded to support all stages of the data lifecycle:
  - Collect – Flow Management (Apache NiFi), Streams Management (Apache Kafka) and Streaming Analytics (Apache Flink)
  - Enrich – Data Engineering (Apache Spark and Apache Hive)
  - Report – Data Engineering (Apache Hive 3), Data Mart (Apache Impala) and Real-Time Data Mart (Apache Impala with Apache Kudu)
  - Serve – Operational Database (Apache HBase) and Data Exploration (Apache Solr)
  - Predict – Data Engineering (Apache Spark)
- CDP Data Engineering (1) – a service purpose-built for data engineers focused on deploying and orchestrating data transformation using Spark at scale. Behind the scenes, CDE leverages Kubernetes to provide isolation and autoscaling, and it offers a comprehensive toolset to streamline ETL processes, including orchestration automation, pipeline monitoring and visual troubleshooting.
- CDP Operational Database (2) – an autonomous, multimodal, autoscaling database environment supporting both NoSQL and SQL. Under the covers, Operational Database leverages Apache HBase and allows end users to create databases without having to worry about infrastructure requirements.
- Data Visualization (3) – an insight and visualization tool, pre-integrated with Data Warehouse and Machine Learning, that simplifies sharing analytics and information among data teams.
- Replication Manager – makes it easy to copy or migrate unstructured (HDFS) or structured (Hive) data from on-premises clusters to CDP environments running in the Public Cloud.
- Workload Manager – provides in-depth insights into workloads that can be used for troubleshooting failed jobs and optimizing slow workloads.
- Data Catalog – enables data stewards to organize and curate data assets globally, understand where relevant data is located, and audit how it is created, modified, secured and protected.
Each of the above is integrated with SDX, ensuring a consistent mechanism for authentication, authorization, governance and management of data, regardless of where you access your data from and how you consume it.
Behind these new features is a supporting cast of resolved issues, tweaks and improvements, contributed by hundreds of people to improve the performance, scalability, reliability, usability and security of CDP Public Cloud.
And We Are Not Done
And that was just the first 12 months. Our roadmap includes a number of exciting new features and enhancements to build on our vision of helping you:
- Do Cloud Better: Deliver cloud-native analytics to the business in a secure, cost-efficient, and scalable manner.
- Enable Cloud Everywhere: Accelerate adoption of cloud-native data services for public clouds.
- Optimize the Data Lifecycle: Collect, enrich, report, serve, and model enterprise data for any business use case in any cloud.
Learn More, Keep in Touch
Keep up with what’s new in CDP-PC by following our monthly release summaries.
(1) Currently available on AWS only
(2) Technical Preview on AWS and Azure
(3) Technical Preview on AWS and Azure