

source link: https://blog.knoldus.com/cluster-vs-client-execution-modes-for-a-spark-application/

Cluster vs Client: Execution modes for a Spark application
Reading Time: 3 minutes
Whenever we submit a Spark application to a cluster, a driver process (also called the Spark application master) is started. The driver creates and manages the SparkContext, through which it coordinates with the cluster manager and shares data with the executors across the cluster; the cluster manager then allocates executors on the worker machines for the application. The cluster manager can be Spark Standalone, Hadoop YARN, or Mesos. The workers execute the tasks assigned to them and send their results back to the driver, which consolidates them. A Spark application is executed on the cluster in one of two modes: cluster mode or client mode.
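To make the driver's role concrete, here is a minimal sketch of a Spark application in Scala. The object name SimpleDriverApp and the numbers are illustrative, not taken from Spark's bundled examples. The code in main() is the driver program; the work inside the RDD operations is shipped to the executors as tasks.

import org.apache.spark.{SparkConf, SparkContext}

object SimpleDriverApp {
  def main(args: Array[String]): Unit = {
    // The driver creates the SparkContext, which registers the application
    // with the cluster manager (Standalone, YARN, or Mesos).
    val conf = new SparkConf().setAppName("SimpleDriverApp")
    val sc = new SparkContext(conf)

    // The map below runs as tasks on the executors across 4 partitions...
    val data = sc.parallelize(1 to 1000, 4)
    // ...and reduce() consolidates the partial results back on the driver.
    val sum = data.map(_ * 2).reduce(_ + _)

    println(s"Sum computed on the cluster: $sum")
    sc.stop()
  }
}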
Cluster Mode
In cluster mode, the Spark driver (application master) is started on one of the worker machines inside the cluster. The client that submits the application can therefore disconnect as soon as the submission is accepted, or move on to other work, and the job keeps running. In other words, cluster mode works on a fire-and-forget basis.
The question is: when should we use cluster mode? If we submit an application from a machine that is far from the worker machines, for instance submitting locally from a laptop, it is common to use cluster mode to minimize network latency between the driver and the executors. Likewise, if the job is going to run for a long time and we do not want to wait for the result, we can submit it in cluster mode; once the job is submitted, the client no longer needs to stay online.
How to submit a Spark application in cluster mode
First, go to your Spark installation directory and start a master and any number of workers with the following commands (the master URL, spark://<<hostname/ipaddress>>:portnumber, is printed in the master's log and shown in its web UI, which listens on port 8080 by default; in Spark 3.x the start-slave.sh script is named start-worker.sh):
./sbin/start-master.sh
./sbin/start-slave.sh spark://<<hostname/ipaddress>>:portnumber   # worker 1
./sbin/start-slave.sh spark://<<hostname/ipaddress>>:portnumber   # worker 2
Then run the following command; the trailing 5 is the application's own argument (for SparkPi, the number of partitions):
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://<<hostname/ipaddress>>:portnumber \
  --deploy-mode cluster \
  ./examples/jars/spark-examples_2.11-2.3.1.jar \
  5
NOTE: Your class name, JAR file, and partition count may differ.
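For context, the SparkPi example estimates the value of pi with a Monte Carlo simulation spread over the given number of partitions. The sketch below captures the idea in simplified form; it is not the exact code shipped with Spark, and samplesPerPartition is an illustrative constant.

import scala.util.Random
import org.apache.spark.SparkContext

// Simplified sketch of SparkPi: sample random points in the unit square on
// the executors and count how many land inside the unit circle.
def estimatePi(sc: SparkContext, partitions: Int): Double = {
  val samplesPerPartition = 100000
  val n = partitions * samplesPerPartition
  val inside = sc.parallelize(1 to n, partitions).map { _ =>
    val x = Random.nextDouble() * 2 - 1
    val y = Random.nextDouble() * 2 - 1
    if (x * x + y * y <= 1) 1 else 0
  }.reduce(_ + _)
  4.0 * inside / n   // the circle-to-square area ratio is pi/4
}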
Client Mode
In client mode, the driver runs on the machine of the client that submits the Spark application, and it maintains the SparkContext there. The driver manages the tasks until the job finishes, so the client must stay in touch with the cluster and remain online until the job completes.
In this mode, the client receives continuous feedback about the status of the job and any changes happening to it, so client mode is the natural choice when we want to keep monitoring a particular job. The entire application, however, depends on the local machine, since that is where the driver resides: if anything goes wrong on the local machine, the driver goes down and the entire application goes down with it. Hence this mode is not suitable for production use cases. It is, however, good for debugging and testing, because output can be printed directly to the driver's terminal, i.e. the local machine, as in the sketch below.
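For example, in client mode the output of a driver-side action such as collect() appears directly in the terminal where spark-submit was run. A minimal sketch, assuming an existing SparkContext named sc as in the earlier example (the RDD contents are illustrative):

// collect() brings the results back to the driver, so in client mode the
// println output shows up in the local terminal that submitted the job.
val results = sc.parallelize(Seq("spark", "driver", "executor"))
  .map(_.toUpperCase)
  .collect()
results.foreach(println)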
How to submit a Spark application in client mode
First, go to your Spark installation directory and start a master and any number of workers, using the same commands as in cluster mode above. Then run the following command:
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://<<hostname/ipaddress>>:portnumber \
  --deploy-mode client \
  ./examples/jars/spark-examples_2.11-2.3.1.jar \
  5
The only difference between the two submissions is the --deploy-mode flag: client for client mode and cluster for cluster mode. The same setting can also be supplied as a configuration property, --conf spark.submit.deployMode=client (or cluster).
Conclusion
A Spark application can be submitted in two different modes: cluster mode and client mode. In cluster mode, the driver is started inside the cluster on one of the worker machines, so the client can fire the job and forget it. In client mode, the driver is started on the client machine, so the client has to stay online and in touch with the cluster. If the client machine is "far" from the worker nodes, it makes sense to use cluster mode; if the application is submitted from a gateway machine quite "close" to the worker nodes, client mode can be a good choice.