In advance of each session, Tech Training will provide you with a Zoom link to your class, along with any required class materials.
This half-day course will introduce you to open-source Big Data technologies, including Apache Spark, and shed light on how enterprise companies often use and tame large data sets to drive problem-solving and decision-making efforts.
Abstract
The training session is intended for software engineers and software architects. It provides a practical learning experience through a combination of about 70% lecture and 30% hands-on demo work with learner participation. The session will include examples of how companies have solved Big Data problems and how Big Data is applied across the industry.
Learning Objectives:
During this course, you will have the opportunity to learn how to:
- Understand Big Data ecosystems and the Big Data distributions used in industry
- Consider the different libraries associated with Apache Spark
- Use Apache Spark and work with data structures
Topics include:
- History and background of Big Data
- Understanding the Big Data Ecosystems
- Industry uses for Big Data Distributions
- Why use Apache Spark?
- Comparing MapReduce and Apache Spark
- Apache Spark Architecture
- Understanding libraries associated with Spark: Streaming and Machine Learning
- Using Spark (Cloud or On Premises)
- Working with Spark data structures used for handling data
University IT Technology Training classes are available only to Stanford University staff, faculty, and students, and to Stanford Hospitals & Clinics employees. A valid SUNet ID is needed to enroll in a class.