Fundamentals of Big Data (Live Online)

This live online class introduces the history and background of Big Data. You will also get an introduction to working with Big Data ecosystem technologies, including HDFS, MapReduce, Hive, Pig, Machine Learning, and more.

Prerequisite: Basic programming knowledge; SQL and data knowledge preferred.

Course Description:

The Introduction to Big Data course is the first stop in the upcoming Big Data curriculum series at Stanford. It introduces the history and background of Big Data.

Along the way, you will get an introduction to working with Big Data ecosystem technologies for Big Data scenarios: HDFS, MapReduce, Sqoop, Flume, Hive, Pig, Mahout (Machine Learning), R Connector, Ambari, ZooKeeper, Oozie, and NoSQL stores such as HBase.

This course provides an understanding of the Big Data ecosystem before and after Apache Spark. Finally, you will learn Spark core fundamentals and architecture, set up an account on Apache Spark Databricks Cloud, and perform a Big Data analysis exercise using Apache Spark.

Learning Objectives:

In this course, you will have the opportunity to:

Understand the history and background of Big Data and Hadoop
Describe the Big Data landscape, including examples of real-world Big Data problems
Explain the 5 V's of Big Data (volume, velocity, variety, veracity, and value)
Understand the foundational principles that have made Big Data so successful
Explain ecosystem components such as HDFS, MapReduce, Sqoop, Flume, Hive, Pig, Mahout (Machine Learning), R Connector, Ambari, ZooKeeper, Oozie, and NoSQL stores such as HBase
Understand the various industry offerings for Big Data in the cloud and on premises, such as Cloudera, Hortonworks, MapR, Amazon EMR, and Microsoft Azure HDInsight
Understand the impact and value of Apache Spark in the Big Data Ecosystem
Understand the Apache Spark architecture and its libraries for use cases such as Streaming, Machine & Deep Learning, GraphX, etc.
Set up an account on Apache Spark Databricks Cloud
Perform hands-on activities in the Big Data ecosystem

Topic Outline:

Course Introduction
History and background of Big Data and Hadoop
5 V's of Big Data
Secret Sauce of Big Data Hadoop
Big Data Distributions in Industry
Big Data Ecosystem before Apache Spark
Big Data Ecosystem after Apache Spark
Comparison of MapReduce Vs Apache Spark
Understand Apache Spark Architecture and Libraries like Streaming, Machine & Deep Learning, GraphX, etc.
Hands-on exercise 1: Set Up an Account on Apache Spark Databricks Cloud
Hands-on exercise 2: First Spark Program
Hands-on exercise 3: Spark RDD Transformations & Actions
Hands-on exercise 4: Spark RDD Advanced Transformations & Actions
References and Next steps

Note: After registering, you will receive a follow-up email with information on how to join this session.

University IT Technology Training classes are only available to Stanford University staff, faculty, or students. A valid SUNet ID is needed in order to enroll in a class.

Custom training workshops are available for this program.
