Understanding the SparkSession in Spark 2.0

by Brian Uri!, 2016-09-24

Synopsis

This recipe introduces the new SparkSession class from Spark 2.0, which provides a unified entry point for all of the various Context classes previously found in Spark 1.x. There is no hands-on work involved.

Prerequisites

There are no prerequisites for this recipe.

Target Versions

The SparkSession class was introduced in Spark 2.0.0.

Introducing SparkSession

The SparkSession class is a new feature of Spark 2.0 that reduces the number of configuration and helper classes you need to instantiate before writing Spark applications. SparkSession provides a single entry point for many operations that were previously scattered across multiple classes, and it also provides accessor methods to these older classes for maximum compatibility.

In interactive environments, such as the Spark Shell or interactive notebooks, a SparkSession has already been created for you in a variable named spark. For consistency, you should use this same name when you create one in your own application. You create a new SparkSession through a Builder pattern, which uses a "fluent interface" style of coding to build a new object by chaining methods together. Spark properties can be passed in as part of that chain.

The SparkR library doesn't use the "fluent interface" style. Instead, you simply pass parameters into the sparkR.session() function.

At the end of your application, calling stop() on the SparkSession also stops the underlying SparkContext, so you do not need to stop it separately.

In R, you call sparkR.stop() rather than calling stop() on the session object.

Updating Spark 1.x Applications

The developers of Spark 2.0 maintained backwards compatibility with Spark 1.x when they introduced SparkSession, so all of your existing code should still work in Spark 2.0. When you are ready to modernize your code, you should understand the relationships between the older classes and SparkSession.

SparkConf

Previously, this class was required to initialize the configuration properties used by the SparkContext, as well as to set runtime properties while an application was running. Now, all initialization occurs through the SparkSession builder. Runtime properties are set through the conf accessor, which returns a RuntimeConfig object, so you do not need to create a SparkConf manually.

In R, you change properties by reinitializing the entire session.

SparkContext and JavaSparkContext

You will continue to use these classes (via the sparkContext accessor) to perform operations that require the Spark Core API, such as working with accumulators, broadcast variables, or low-level RDDs. However, you do not need to manually create them.

The low-level Spark Core API is not exposed in SparkR.

SQLContext

The SQLContext is completely superseded by SparkSession. Most Dataset and DataFrame operations are available directly on SparkSession. Operations related to table and database metadata are now encapsulated in a Catalog (via the catalog accessor).

SparkR does not use a Catalog.

HiveContext

The HiveContext is completely superseded by SparkSession. You need to enable Hive support when you create your SparkSession and include the necessary Hive library dependencies in your classpath.

SparkR does not use a Catalog.

Reference Links

SparkSession in the Java API Documentation
SparkSession in the Python API Documentation
SparkSession in the R API Documentation
SparkSession in the Scala API Documentation

Change Log

This recipe hasn't had any substantive updates since it was first published.

Apache, Spark, and Apache Spark are trademarks of the Apache Software Foundation (ASF).
Sparkour is © 2016 - 2024 by It is an independent project that is not endorsed or supported by Accenture Federal Services or the ASF.