Microsoft recently announced an exciting combination of the Apache Spark analytics platform and the Azure cloud at Microsoft Connect();.
Presenting Azure Databricks!
Azure Databricks is a close collaboration between Microsoft and Databricks that aims to bring about benefits not available on any other cloud platform.
Azure Databricks: The trinity effect
This is the very first time that an Apache Spark platform provider, Databricks, has partnered with a cloud provider, Microsoft Azure, to deliver a highly optimized platform for data analytics workloads.
Data management in the cloud has opened up pathways in artificial intelligence, predictive analytics, and real-time applications. Apache Spark has become a favorite platform for implementing these cutting-edge analytics applications, thanks to its vast community and worldwide enterprise network. Its ability to run powerful analytics algorithms at scale allows businesses to derive real-time insights with ease. However, managing and deploying Spark for enterprise use cases, which involve a large number of users and strong security requirements, has been challenging. Azure Databricks addresses this by giving business users a platform on which to work effectively with data professionals (data scientists and data engineers).
Benefits of Azure Databricks: A sneak-peek
- Highly optimized for cost efficiency and improved performance in the cloud, delivered as an end-to-end, managed Apache Spark platform.
- Includes features such as one-click deployment, autoscaling, and an optimized Databricks Runtime that can improve the performance of Spark jobs in the cloud by 10-100x.
- Simple, cost-efficient implementation of large-scale Spark workloads.
- Includes an interactive notebook environment, monitoring tools, and security controls that make it easy to leverage Spark in enterprises with a large number of users.
- Optimized connectors to Azure storage platforms (e.g. Data Lake and Blob Storage) for fast data access.
- One-click management directly from the Azure console.
- Includes common analytics libraries, such as the Python and R data science stacks, pre-installed so that they can be used with Spark to derive insights.
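To make the autoscaling feature in the list above more concrete, here is a minimal Python sketch of what an autoscaling cluster definition might look like. The field names (`cluster_name`, `spark_version`, `node_type_id`, `autoscale`) follow the Databricks Clusters REST API as we understand it, so check them against the current documentation. The helper name `autoscaling_cluster_spec`, the cluster name, the VM size, and the runtime placeholder are assumptions made purely for illustration.

```python
import json


def autoscaling_cluster_spec(name, min_workers, max_workers,
                             spark_version="<runtime-version>",
                             node_type_id="Standard_D3_v2"):
    """Build a cluster definition that lets the worker count float
    between min_workers and max_workers (hypothetical helper)."""
    if not 1 <= min_workers <= max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        # Placeholder: replace with a runtime version offered in your workspace.
        "spark_version": spark_version,
        # Assumed VM size, used here only as an example value.
        "node_type_id": node_type_id,
        # The autoscale block replaces a fixed worker count.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }


spec = autoscaling_cluster_spec("analytics-demo", 2, 8)
print(json.dumps(spec, indent=2))
```

A definition like this would typically be submitted through the workspace UI or REST API; the sketch only builds and prints the payload, so it runs without any Azure account.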
The partnership architecture
Azure Databricks has an architecture that lets customers easily connect it to any storage resource in their account, for instance an existing Blob storage or Data Lake subscription.
Databricks is also centrally managed through the Azure control center, so it requires no additional setup.
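As a rough illustration of connecting to existing storage, the Python sketch below builds the kind of URIs Spark commonly uses to address Azure Blob storage (`wasbs://`) and Azure Data Lake Store (`adl://`), plus the Hadoop configuration key that holds a Blob storage account key. The helper names and the account, container, and path values are hypothetical; the exact connection method depends on your workspace and storage setup.

```python
def blob_uri(account, container, path=""):
    """Return a wasbs:// URI for a path inside a Blob storage container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def blob_account_key_conf(account):
    """Return the Spark/Hadoop config key under which the account key is set."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


def data_lake_uri(account, path=""):
    """Return an adl:// URI for a path inside an Azure Data Lake Store account."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


# Hypothetical account and container names, for illustration only.
events_uri = blob_uri("mystorageacct", "data", "/events/2017")
lake_uri = data_lake_uri("mydatalake", "raw/clicks")
print(events_uri)
print(lake_uri)

# In a notebook, where a SparkSession named `spark` is already available,
# these values could then be used along these lines:
#   spark.conf.set(blob_account_key_conf("mystorageacct"), "<access-key>")
#   df = spark.read.parquet(events_uri)
```

The Spark calls are left as comments so that the sketch runs anywhere; only the string helpers execute.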
Fully integrated Azure features
Azure Databricks is integrated with the best of Microsoft Azure's features. Some of them are listed below:
- Secure and private data control, where ownership rights remain with the customer alone
- Flexibility to meet diverse network infrastructure needs
- Integration with Azure Storage and Azure Data Lake
- Azure Active Directory integration, which controls access to the resources used
- Integration with Azure SQL Data Warehouse, Azure SQL DB, and Azure Cosmos DB
- The latest generation of Azure hardware (Dv3 VMs), with NVMe SSDs offering 100 µs I/O latency, which makes Databricks I/O performance much better
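To give a feel for the database integrations listed above, here is a minimal Python sketch that assembles options for Spark's generic JDBC data source when reading from Azure SQL DB. This is one generic way to reach such a database, not necessarily the optimized connector the service provides. The helper name `azure_sql_jdbc_options` and the server, database, table, and credential values are all hypothetical.

```python
def azure_sql_jdbc_options(server, database, table, user, password):
    """Build an options dict for Spark's JDBC data source (hypothetical helper)."""
    url = (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;loginTimeout=30"
    )
    return {"url": url, "dbtable": table, "user": user, "password": password}


# Placeholder values; in practice, read credentials from a secret store
# rather than hard-coding them.
opts = azure_sql_jdbc_options("myserver", "salesdb", "dbo.orders",
                              "reader", "<password>")
print(opts["url"])

# In a notebook, where a SparkSession named `spark` is already available:
#   orders = spark.read.format("jdbc").options(**opts).load()
```

Keeping the options in a plain dictionary makes it easy to reuse the same connection settings for both reads and writes.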
Azure has many other features that have been integrated into Azure Databricks. For a more detailed overview of Azure Databricks, visit the official page.