
This article is an excerpt from the book Learning Apache Spark 2 by Muhammad Asif Abbasi. In this book, you will learn how to perform big data analytics using Spark Streaming, machine learning techniques, and more.

In the excerpt below, you will learn how to run Spark on the Mesos cluster manager.

What is Mesos?

Mesos is an open source cluster manager that started as a UC Berkeley research project in 2008 and is quite widely used by a number of organizations. Spark supports Mesos, and Matei Zaharia gave a keynote at MesosCon in June 2016. A recording of the keynote is available on YouTube.

Before you start

If you haven’t installed Mesos previously, the getting started page on the Apache website gives a good walkthrough of installing Mesos on Windows, macOS, and Linux. See https://mesos.apache.org/getting-started/.

  1. Once installed, you need to start up Mesos on your cluster.
  2. Start the Mesos master: ./bin/mesos-master.sh --ip=[MasterIP] --work_dir=/var/lib/mesos
  3. Start a Mesos agent on each of your worker nodes: ./bin/mesos-agent.sh --master=[MasterIP]:5050 --work_dir=/var/lib/mesos
  4. Make sure Mesos is up and running with all your relevant worker nodes configured by browsing to http://[MasterIP]:5050 (a command-line check is sketched below).
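If you prefer the command line, you can also sanity-check the master by querying its HTTP endpoint (this check is our addition, not part of the original walkthrough); it returns a JSON document that includes the registered agents. On older Mesos versions the path is /master/state.json rather than /master/state:

curl http://[MasterIP]:5050/master/state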

Make sure that the Spark binary packages are available and accessible to Mesos. They can be placed at any Hadoop-accessible URI, for example (an upload sketch follows this list):

  1. HTTP via http://
  2. S3 via s3n://
  3. HDFS via hdfs://
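For instance, to host the binaries on HDFS (the package file name and target path here are illustrative), you could upload the Spark distribution like this and then reference it with an hdfs:// URI:

hadoop fs -mkdir -p /spark
hadoop fs -put spark-2.0.0-bin-hadoop2.7.tgz /spark/spark-2.0.0-bin-hadoop2.7.tgz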

You can also install Spark in the same location on all the Mesos slaves and configure spark.mesos.executor.home to point to that location.
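For example, if Spark is unpacked under /opt/spark on every slave (an illustrative path), the corresponding entry in conf/spark-defaults.conf would be:

spark.mesos.executor.home /opt/spark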

Running in Mesos

Mesos can have a single master or multiple masters, which means the master URL differs when submitting an application from Spark to Mesos (an example follows this list):

  1. Single master: mesos://sparkmaster:5050
  2. Multiple masters (using ZooKeeper): mesos://zk://master1:2181,master2:2181/mesos
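For example, starting a shell against a ZooKeeper-backed multi-master setup (the hostnames here are placeholders) looks like this; Spark discovers the current leading master via ZooKeeper:

./bin/spark-shell --master mesos://zk://master1:2181,master2:2181/mesos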

Modes of operation in Mesos

Spark on Mesos supports both the client and cluster modes of operation:

Client mode

Before running in client mode, you need to perform a couple of configuration steps (a consolidated sketch follows this list):

  1. In conf/spark-env.sh, export MESOS_NATIVE_JAVA_LIBRARY=<path to libmesos.so (Linux) or libmesos.dylib (macOS)>
  2. In conf/spark-env.sh, export SPARK_EXECUTOR_URI=<URI of the zipped Spark distribution uploaded to an accessible location, e.g. HTTP, HDFS, or S3>
  3. Set spark.executor.uri to the URI of the zipped Spark distribution uploaded to an accessible location, e.g. HTTP, HDFS, or S3
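Put together, a minimal conf/spark-env.sh might look like the following sketch; the library path and archive URI are placeholders to be replaced with your own:

# conf/spark-env.sh
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so   # libmesos.dylib on macOS
export SPARK_EXECUTOR_URI=hdfs:///spark/spark-2.0.0-bin-hadoop2.7.tgz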

Batch Applications

For batch applications, you need to pass the Mesos master URL when creating your SparkContext in your application program. For example:

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
  .setMaster("mesos://mesosmaster:5050")
  .setAppName("Batch Application")
  .set("spark.executor.uri", "<location of Spark binaries (HTTP, S3, or HDFS)>")

val sc = new SparkContext(sparkConf)

If you are using spark-submit, you can configure the URI in the conf/spark-defaults.conf file using spark.executor.uri.
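For example (the HDFS location is illustrative):

# conf/spark-defaults.conf
spark.executor.uri hdfs:///spark/spark-2.0.0-bin-hadoop2.7.tgz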

Interactive applications

When you are running one of the provided Spark shells for interactive querying, you can pass the master URL as an argument, e.g.:

./bin/spark-shell --master mesos://mesosmaster:5050

Cluster mode

Just as with YARN, you can run Spark on Mesos in cluster mode, which means the driver is launched inside the cluster; the client can disconnect after submitting the application and get results from the Mesos WebUI.

Steps to use the cluster mode

  1. Start the MesosClusterDispatcher in your cluster: ./sbin/start-mesos-dispatcher.sh --master mesos://mesosmaster:5050. This will generally start the dispatcher on port 7077.
  2. From the client, submit a job to the Mesos cluster with spark-submit, specifying the dispatcher URL as the master.

Example:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://dispatcher:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 2G \
  --total-executor-cores 10 \
  s3n://path/to/examples.jar

As with Spark itself, Spark on Mesos has many properties that can be set to optimize processing. You should refer to the Spark Configuration page (http://spark.apache.org/docs/latest/configuration.html) for more information.

Mesos run modes

Spark can run on Mesos in two modes:

  1. Coarse-grained (default mode): Spark acquires one long-running Mesos task on each machine. This offers a much lower startup cost, but the resources remain allocated to Spark for the complete duration of the application.
  2. Fine-grained (deprecated): In this mode a Mesos task is created per Spark task. The benefit is that each application receives cores according to its requirements, but the per-task bootstrapping can act as a deterrent for interactive applications. A configuration sketch follows this list.
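The mode is controlled by the spark.mesos.coarse property (described in the table below); a minimal sketch of setting it explicitly when building the configuration:

import org.apache.spark.SparkConf

// Explicitly select the coarse-grained run mode (already the default).
val conf = new SparkConf()
  .setMaster("mesos://mesosmaster:5050")
  .setAppName("CoarseGrainedExample")
  .set("spark.mesos.coarse", "true") // "false" would select the deprecated fine-grained mode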

Key Spark on Mesos configuration properties

While Spark has a number of properties that can be configured to optimize Spark processing, some of these properties are specific to Mesos. We’ll look at a few of those key properties here.

Property Name: Meaning/Default Value

spark.mesos.coarse: Setting it to true (the default value) will run Mesos in coarse-grained mode; setting it to false will run it in fine-grained mode.

spark.mesos.extra.cores: This is more of an advertisement than an allocation, intended to improve parallelism: an executor will pretend that it has extra cores, resulting in the driver sending it more work. Default = 0.

spark.mesos.mesosExecutor.cores: Only works in fine-grained mode. This specifies how many cores should be given to each Mesos executor.

spark.mesos.executor.home: Identifies the directory of the Spark installation for the executors on Mesos. As discussed, you can point to the distribution using spark.executor.uri as well; if you have not specified that, you can use this property instead.

spark.mesos.executor.memoryOverhead: The amount of additional memory (in MB) to be allocated per executor.

spark.mesos.uris: A comma-separated list of URIs to be downloaded when the driver or executor is launched by Mesos.

spark.mesos.principal: The name of the principal used by Spark to authenticate itself with Mesos.
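As an illustrative sketch of setting a few of these properties at submission time (the application class, jar location, and values are placeholders, not recommendations):

./bin/spark-submit \
  --master mesos://mesosmaster:5050 \
  --conf spark.mesos.coarse=true \
  --conf spark.mesos.extra.cores=2 \
  --conf spark.mesos.executor.home=/opt/spark \
  --class com.example.MyApp \
  hdfs:///apps/myapp.jar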


You can find other configuration properties at the Spark documentation page (http://spark.apache.org/docs/latest/running-on-mesos.html#spark-properties).

To summarize, this excerpt covered what you need to get started with running Spark on Mesos.

To learn more about Spark SQL, Spark Streaming, and machine learning with Spark, you can refer to the book Learning Apache Spark 2.
