Apache Spark

Announcing Databricks Runtime 4.2!

Databricks announces Databricks Runtime 4.2 with numerous updates and added components on Spark internals, Databricks Delta, and improvements to its…

6 years ago

Project Hydrogen: Making Apache Spark play nice with other distributed machine learning frameworks

The Apache Spark team revealed a new venture called Project Hydrogen during a keynote at Spark AI Summit. This new…

6 years ago

Why is Hadoop dying?

Hadoop has been the definitive big data platform for some time. The name has practically been synonymous with the field.…

6 years ago

Data Science News Daily Roundup – 21st March 2018

Microsoft SQL Server Management Studio 17.6, IBM’s Deep Learning as a Service program, Intel’s nGraph, and more in today’s top…

6 years ago

Apache Ignite 2.4 rolls out with Machine Learning and Spark DataFrames capabilities

The Apache Ignite community has announced the latest version of Apache Ignite, its open-source distributed database. Apache Ignite 2.4 features…

6 years ago

Pandas on Ray: Make Pandas faster by replacing one line of your code

Pandas on Ray is the latest development in the Ray framework. It is a DataFrame library that wraps Pandas and…

6 years ago

How to win Kaggle competition with Apache SparkML

This article is an excerpt from the book Mastering Apache Spark 2.x - Second Edition…

6 years ago

Working with Spark’s graph processing library, GraphFrames

This article is an excerpt from a book by Rajanarayanan Thottuvaikkatumana titled Apache Spark 2 for…

6 years ago

Getting started with Spark 2.0

This article is an excerpt from a book by Muhammad Asif Abbasi titled Learning Apache Spark…

6 years ago

Basics of Spark SQL and its components

Below is an excerpt from the book Learning Spark SQL by Aurobindo Sarkar. Spark SQL…

6 years ago