Deep Understanding of SparkContext & Application’s Driver Process

DRIVER :
A Spark driver is a JVM process that hosts the SparkContext for a Spark application. It is the master node of the application.
The driver (the application’s driver process) splits a Spark application into tasks and schedules them to run on executors. It is the driver’s responsibility to coordinate with the workers and to manage the execution of tasks.

Driver’s Memory :
In client deploy mode the driver’s memory is the memory of the JVM process the Spark application runs in. The driver memory can be set using spark-submit’s --driver-memory command-line option or by setting the spark.driver.memory configuration property.
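For example, the option above can be passed on the spark-submit command line (the application class, master and jar name below are placeholders, not from the original article):

```shell
# Sketch: giving the driver 4 GB of heap in client deploy mode.
# In client mode the driver JVM starts immediately, so this must be set
# on the command line (or in spark-defaults.conf), not in code afterwards.
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --deploy-mode client \
  --driver-memory 4g \
  my-app.jar
```

Equivalently, `--conf spark.driver.memory=4g` sets the same property.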


SparkContext :
Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one.
Once a SparkContext is created, you can use it to create RDDs, accumulators and broadcast variables, access Spark services, and run jobs (until the SparkContext is stopped).
A SparkContext is essentially a client of Spark’s execution environment and acts as the master of your Spark application.
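A minimal sketch of this lifecycle, assuming local mode (the application name is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Build a configuration and create the (single) SparkContext for this JVM
val conf = new SparkConf()
  .setAppName("MyApp")     // shown in the Spark UI; placeholder name
  .setMaster("local[*]")   // local mode; use a cluster URL in production
val sc = new SparkContext(conf)

// Use the context to create an RDD and run a job (the action blocks until done)
val rdd = sc.parallelize(1 to 100)
println(rdd.sum())

// Stop it before creating another SparkContext in the same JVM
sc.stop()
```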

Functions of SparkContext :

  1. Running jobs synchronously
  2. Creating distributed datasets (RDDs), accumulators and broadcast variables
  3. Setting up configuration
  4. Accessing various services
  5. Getting the current status of the application
  6. Cancelling a job
  7. Cancelling a stage
  8. Programmatic dynamic allocation
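Several of the functions above map directly onto public SparkContext methods; a hedged sketch (the data, accumulator name and executor id are illustrative placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("Demo").setMaster("local[*]"))

// (2) Distributed data, accumulators and broadcast variables
val rdd   = sc.parallelize(Seq(1, 2, 3, 4))
val acc   = sc.longAccumulator("errorCount")
val table = sc.broadcast(Map("a" -> 1, "b" -> 2))

// (1) Running a job synchronously: actions block until the job finishes
rdd.foreach { x => if (x < 0) acc.add(1) }

// (5) Current status of the application
println(sc.applicationId)
println(sc.statusTracker.getActiveJobIds().mkString(","))

// (6)/(7) Cancelling work in flight
sc.cancelAllJobs()          // cancel every running or pending job
// sc.cancelStage(stageId)  // cancel a single stage by its id

// (8) Programmatic dynamic allocation (requires dynamic allocation enabled)
// sc.requestExecutors(2)
// sc.killExecutors(Seq("executor-id"))

sc.stop()
```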

For a detailed look at Spark’s architecture, please read my previous article, Apache Spark Architecture.

