
How to run Spark in Mesos

January 31, 2018 - 12:00 am



This article is an excerpt from a book written by Muhammad Asif Abbasi titled Learning Apache Spark 2. In this book, you will learn how to perform big data analytics using Spark Streaming, machine learning techniques, and more.

In this article, you will learn how to run Spark on the Mesos cluster manager.

What is Mesos?

Mesos is an open source cluster manager that started as a UC Berkeley research project in 2008 and is widely used by a number of organizations. Spark supports Mesos, and Matei Zaharia gave a keynote at MesosCon in June 2016. Here is a link to the YouTube video of the keynote.

Before you start

If you haven't installed Mesos yet, the getting started page on the Apache website gives a good walkthrough of installing Mesos on Windows, macOS, and Linux: https://mesos.apache.org/getting-started/.

Once Mesos is installed, start it up on your cluster:
Start the Mesos master: ./bin/mesos-master.sh --ip=[MasterIP] --work_dir=/var/lib/mesos
Start Mesos agents on all your worker nodes: ./bin/mesos-agent.sh --master=[MasterIP]:5050 --work_dir=/var/lib/mesos
Make sure Mesos is up and running with all your relevant worker nodes registered by opening the web UI at http://[MasterIP]:5050

Make sure that the Spark binary package is available and accessible to Mesos. It can be placed on a Hadoop-accessible URI, for example:

HTTP via http://
S3 via s3n://
HDFS via hdfs://

Alternatively, you can install Spark in the same location on all the Mesos agents and set spark.mesos.executor.home to point to that location.

Running in Mesos

A Mesos cluster can have a single master or multiple masters, so the master URL you use when submitting an application from Spark differs accordingly:

Single master
mesos://sparkmaster:5050
Multiple masters (using ZooKeeper)
mesos://zk://master1:2181,master2:2181/mesos
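The two URL forms can be captured in a small Scala helper, which makes the difference easy to see: a single master is addressed directly on its port (5050 by default), while a ZooKeeper-coordinated setup lists the ZooKeeper hosts, comma-separated with no spaces, followed by a path. This is a minimal sketch; MesosMasterUrl is a hypothetical name used only for illustration, and the host names and the /mesos path are taken from the examples above.

```scala
// Hypothetical helper (not part of Spark): builds the value you would pass
// to --master or SparkConf.setMaster() for a Mesos cluster.
object MesosMasterUrl {

  // Single master: mesos://host:port (5050 is the Mesos master's default port)
  def single(host: String, port: Int = 5050): String =
    s"mesos://$host:$port"

  // Multiple masters coordinated by ZooKeeper:
  // mesos://zk://host1:2181,host2:2181/mesos
  // Hosts are joined with commas and no spaces; path is the ZooKeeper path.
  def zookeeper(zkHosts: Seq[String], path: String = "/mesos"): String = {
    require(zkHosts.nonEmpty, "at least one ZooKeeper host is required")
    s"mesos://zk://${zkHosts.mkString(",")}$path"
  }
}
```

For example, MesosMasterUrl.zookeeper(Seq("master1:2181", "master2:2181")) yields mesos://zk://master1:2181,master2:2181/mesos, which matches the multiple-master form above.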

Modes of operation in Mesos

Spark on Mesos supports both the client and cluster modes of operation:

Client mode

Before running in client mode, you need to set a couple of configuration options:

In conf/spark-env.sh:
export MESOS_NATIVE_JAVA_LIBRARY=<path to libmesos.so [Linux]> or <path to libmesos.dylib [macOS]>
export SPARK_EXECUTOR_URI=<URI of the compressed Spark package uploaded to an accessible location, e.g. HTTP, HDFS, S3>
Also set spark.executor.uri to the URI of the same compressed Spark package.

Batch Applications

For batch applications, pass the Mesos URL as the master when creating the Spark context in your application program. For example:

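import org.apache.spark.{SparkConf, SparkContext}

// Use the Mesos master as the Spark master and tell executors where to fetch Spark
val sparkConf = new SparkConf()
  .setMaster("mesos://mesosmaster:5050")
  .setAppName("Batch Application")
  .set("spark.executor.uri", "<location of Spark binaries (HTTP, S3, or HDFS)>")

val sc = new SparkContext(sparkConf)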

If you are using spark-submit, you can instead set spark.executor.uri in the conf/spark-defaults.conf file.
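To make the spark-defaults.conf route concrete, here is a simplified Scala sketch that parses text in that file's format (one whitespace-separated key and value per line, with # for comments) and then overlays settings made in code. The SparkDefaults object is hypothetical and not part of Spark, and Spark's own loader accepts more syntax than this; the overlay reflects the documented rule that properties set directly on SparkConf take precedence over spark-defaults.conf (spark-submit flags, which sit between the two, are not modeled here).

```scala
// Hypothetical, simplified reader for spark-defaults.conf-style text.
object SparkDefaults {

  def parse(contents: String): Map[String, String] =
    contents
      .split("\\r?\\n")
      .iterator
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#")) // skip blanks and comments
      .map(_.split("\\s+", 2))                                 // split on the first whitespace run only
      .collect { case Array(key, value) => key -> value }      // lines without a value are ignored
      .toMap

  // Values set explicitly in the application (e.g. on SparkConf) win over the file
  def effective(fileDefaults: Map[String, String],
                explicit: Map[String, String]): Map[String, String] =
    fileDefaults ++ explicit
}
```

A file containing the line spark.executor.uri  hdfs://namenode:8020/spark/spark.tgz (a hypothetical HDFS location) would parse to a map whose "spark.executor.uri" entry is that URI, and executors launched by Mesos would fetch Spark from there.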

Interactive applications

When you are running one of the provided Spark shells for interactive querying, you can pass the master argument, e.g.:

./bin/spark-shell --master mesos://mesosmaster:5050

Cluster mode

As with YARN, you can run Spark on Mesos in cluster mode. In this mode the driver is launched inside the cluster, so the client can disconnect after submitting the application and check the results from the Mesos web UI.

Steps to use cluster mode

Start the MesosClusterDispatcher in your cluster: ./sbin/start-mesos-dispatcher.sh --master mesos://mesosmaster:5050. By default, the dispatcher listens on port 7077.
From the client, submit a job to the Mesos cluster with spark-submit, specifying the dispatcher URL as the master.

Example:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://dispatcher:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 2G \
  --total-executor-cores 10 \
  s3n://path/to/examples.jar
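If you generate submissions from code, the spark-submit invocation above can be assembled as an argument list and run with scala.sys.process. This is a minimal sketch under stated assumptions: ClusterSubmit and its field names are hypothetical, the defaults mirror the example values above, and run() only works on a machine with a Spark installation under sparkHome. The key point it encodes is that in cluster mode the master is the dispatcher URL, not the Mesos master, and the application jar must be reachable from inside the cluster.

```scala
import scala.sys.process._

// Hypothetical helper (not a Spark API) that builds a cluster-mode spark-submit command.
final case class ClusterSubmit(
    mainClass: String,
    dispatcherUrl: String,          // the MesosClusterDispatcher, e.g. mesos://dispatcher:7077
    appJar: String,                 // must be reachable from the cluster, e.g. s3n:// or hdfs://
    executorMemory: String = "2G",
    totalExecutorCores: Int = 10,
    supervise: Boolean = true,      // restart the driver if it fails
    sparkHome: String = "."
) {
  def args: Seq[String] =
    Seq(s"$sparkHome/bin/spark-submit",
        "--class", mainClass,
        "--master", dispatcherUrl,
        "--deploy-mode", "cluster") ++
      (if (supervise) Seq("--supervise") else Seq.empty) ++
      Seq("--executor-memory", executorMemory,
          "--total-executor-cores", totalExecutorCores.toString,
          appJar)

  // Launches spark-submit and returns its exit code
  def run(): Int = Process(args).!
}
```

Building the command as a Seq[String] rather than one concatenated string avoids shell quoting issues. Spark also ships org.apache.spark.launcher.SparkLauncher for launching applications programmatically, which may be preferable in a real project.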

As with Spark in general, running on Mesos exposes many properties that you can set to tune processing. Refer to the Spark Configuration page (http://spark.apache.org/docs/latest/configuration.html) for more information.

Mesos run modes

Spark can run on Mesos in two modes:

Coarse-grained (default mode): Spark acquires one long-running Mesos task on each machine. This gives a much lower startup cost, but the resources remain allocated to Spark for the complete duration of the application.
Fine-grained (deprecated): Each Spark task runs as a separate Mesos task. The benefit is that each application gets cores as it needs them, but the overhead of launching each task can make this mode a poor fit for interactive applications.

Key Spark on Mesos configuration properties

While Spark has a number of properties that can be configured to optimize processing, some of these properties are specific to Mesos. We'll look at a few of those key properties here.

Property Name	Meaning/Default Value
spark.mesos.coarse	Setting it to true (the default) runs Spark on Mesos in coarse-grained mode. Setting it to false runs it in fine-grained mode.
spark.mesos.extra.cores	An advertisement rather than an allocation, used to improve parallelism: an executor pretends it has extra cores, so the driver sends it more tasks. No additional cores are actually allocated. Coarse-grained mode only. Default=0
spark.mesos.mesosExecutor.cores	Fine-grained mode only. The number of cores given to each Mesos executor, not counting the cores used to run Spark tasks. Default=1.0
spark.mesos.executor.home	The directory in which Spark is installed on the executors in Mesos. As discussed, you can instead supply a Spark package through spark.executor.uri; this property is only relevant if you have not specified one.
spark.mesos.executor.memoryOverhead	The amount of additional memory (in MiB) allocated per executor, on top of spark.executor.memory. Default: the larger of 384 MiB and 10% of spark.executor.memory.
spark.mesos.uris	A comma-separated list of URIs to be downloaded when the driver or executor is launched by Mesos.
spark.mesos.principal	The name of the principal used by Spark to authenticate itself with Mesos.
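The memory overhead default is easiest to understand with numbers. The Spark on Mesos documentation states that, unless spark.mesos.executor.memoryOverhead is set, the overhead is the larger of 384 MiB and 10% of spark.executor.memory. The following Scala sketch only does that arithmetic; MesosExecutorMemory is a hypothetical name and the code does not call Spark.

```scala
// Hypothetical calculator for the default Mesos executor memory overhead rule.
object MesosExecutorMemory {
  val MinOverheadMiB = 384
  val OverheadFraction = 0.10

  // Overhead in MiB: an explicit spark.mesos.executor.memoryOverhead value wins;
  // otherwise max(384 MiB, 10% of executor memory)
  def overheadMiB(executorMemoryMiB: Int, explicitOverheadMiB: Option[Int] = None): Int =
    explicitOverheadMiB.getOrElse(
      math.max(MinOverheadMiB, (executorMemoryMiB * OverheadFraction).toInt))

  // Total memory per executor: spark.executor.memory plus the overhead
  def totalMiB(executorMemoryMiB: Int, explicitOverheadMiB: Option[Int] = None): Int =
    executorMemoryMiB + overheadMiB(executorMemoryMiB, explicitOverheadMiB)
}
```

With the 2G executor memory used in the cluster-mode example (2048 MiB), 10% is only about 204 MiB, so the 384 MiB minimum applies and each executor needs 2048 + 384 = 2432 MiB in total. The 10% rule only takes over above 3840 MiB of executor memory.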

You can find other configuration properties at the Spark documentation page (http://spark.apache.org/docs/latest/running-on-mesos.html#spark-properties).

To summarize, this article covered the essentials of getting started with running Spark on Mesos.

To learn more about Spark SQL, Spark Streaming, and machine learning with Spark, refer to the book Learning Apache Spark 2.
