Introduction to Big Data with Apache Spark

Effective immediately in response to COVID-19, all Technology Training classes will be delivered online until further notice.

In advance of each session, Tech Training will provide you with a Zoom link to your class, along with any required class materials.

This half-day course will introduce you to open-source Big Data technologies, including Apache Spark, and shed light on how enterprise companies often use and tame large data sets to drive problem-solving and decision-making efforts.

Abstract
The training session is intended for software engineers and software architects. It provides a practical learning experience through a combination of about 70% lecture and 30% hands-on demo work with learner participation. The session will include examples of how companies have solved Big Data problems, and the application of Big Data within the industry.

Learning Objectives:

During this course, you will have the opportunity to learn how to:
Understand Big Data ecosystems and Big Data distributions in the industry
Consider the different libraries associated with Apache Spark
Use Apache Spark and work with data structures

Topics include:

History and background of Big Data
Understanding the Big Data Ecosystems
Industry uses for Big Data Distributions
Why use Apache Spark?
Comparing MapReduce vs Apache Spark
Apache Spark Architecture
Understanding libraries associated with Spark: Streaming, Machine Learning
Using Spark (Cloud or On Premises)
Working with Spark data structures used for handling data

University IT Technology Training classes are only available to Stanford University staff, faculty, students and Stanford Hospitals & Clinics employees. A valid SUNet ID is needed in order to enroll in a class.

Custom training workshops are available for this program
