Qubole has announced the availability of a working implementation of Apache Spark on AWS Lambda. The big data-as-a-service company said the prototype has successfully scanned 1 TB of data and sorted 100 GB of data stored in Amazon Simple Storage Service (S3).
Qubole said the ability to run Spark on Lambda, a serverless compute service that lets users pay only for the compute power they use without provisioning servers, makes the platform more elastic and more efficient in its resource usage.
Until now, running Spark on AWS Lambda has been a challenge, mainly because Spark could not communicate directly with Lambda, something it needs to do in order to run its executors. In addition, Lambda’s limited runtime resources (a maximum execution duration of five minutes, 1,536 MB of memory and 512 MB of disk space) make it extremely difficult for a memory-hungry platform like Spark to run.
The Spark on Lambda service overcomes both of these limitations. Qubole said it performed some technical wizardry to ensure the service runs its executors from within an AWS Lambda invocation, thereby sidestepping the communication issues. It dealt with Lambda’s limited runtime resources by using external storage to avoid the local disk size limit.
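To make the resource constraints concrete, here is a minimal sketch that checks a task's estimated footprint against the Lambda limits quoted above. It is not Qubole's code: the `TaskEstimate` type, the `violated_limits` function and the example figures are hypothetical, chosen only to illustrate why moving intermediate data to external storage matters.

```python
from dataclasses import dataclass

# Limits as quoted in the article (AWS Lambda at the time of the announcement).
MAX_DURATION_S = 5 * 60   # five minutes
MAX_MEMORY_MB = 1536
MAX_DISK_MB = 512


@dataclass
class TaskEstimate:
    """Hypothetical estimate of what one executor task needs."""
    duration_s: float
    memory_mb: float
    local_disk_mb: float


def violated_limits(task: TaskEstimate) -> list:
    """Return the names of the Lambda limits the task would exceed."""
    violations = []
    if task.duration_s > MAX_DURATION_S:
        violations.append("duration")
    if task.memory_mb > MAX_MEMORY_MB:
        violations.append("memory")
    if task.local_disk_mb > MAX_DISK_MB:
        violations.append("disk")
    return violations


# Illustrative numbers only: a task that spills 2 GB of intermediate
# data to local disk does not fit in a single invocation.
spilling_task = TaskEstimate(duration_s=120, memory_mb=1024, local_disk_mb=2048)

# The same task with intermediate data written to external storage
# instead of local disk (the approach the article describes).
external_task = TaskEstimate(duration_s=120, memory_mb=1024, local_disk_mb=0)

print(violated_limits(spilling_task))
print(violated_limits(external_task))
```

Under these assumed numbers, the first task exceeds only the disk limit, and the second fits once its intermediate data no longer lands on local disk.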
Spark on Lambda’s elasticity works perfectly for a number of use cases, including:
“Qubole customers run some of the largest Spark clusters in the world. We wanted to show that a complex technology like Spark can be implemented on a serverless compute infrastructure like Lambda and scale efficiently,” Qubole CEO Ashish Thusoo said. “Spark on Lambda can eliminate most of the operational complexities of running Spark clusters, handle bursty workloads more effectively and be more cost efficient.”
Qubole said Spark on Lambda is currently available as a technology preview, and the company will demonstrate its capabilities during the AWS re:Invent 2017 conference in Las Vegas at Sands Expo booth 834 and Aria booth 201. The code is available on GitHub at https://github.com/qubole/spark-on-lambda.