
Introduction to Big Data with Hadoop and Spark

Class Sessions

  • Date: Fri Aug 14, 9:00 am to 12:00 pm
  • Location: Live Online
  • Cost: $175

Class Code

ITS-1916

Class Description

Effective immediately in response to COVID-19, all Technology Training classes will be delivered online until further notice.

In advance of each session, Tech Training will provide you with a Zoom link to your class, along with any required class materials.

This lecture provides a non-intimidating introduction to Big Data, Hadoop, and Spark. We will go behind the scenes to understand the secret sauce behind the success of Hadoop and other Big Data technologies.

In this lecture, you will get an introduction to working with Big Data ecosystem technologies (HDFS, MapReduce, Sqoop, Flume, Hive, Pig, Mahout (Machine Learning), R Connector, Ambari, Zookeeper, Oozie, and NoSQL stores like HBase) for Big Data scenarios. It will provide an understanding of the Big Data ecosystem before and after Apache Spark. Finally, we will perform a demo of Big Data analysis using Apache Spark.

After this course, you will be able to:

  • Understand the history and background of Big Data and Hadoop
  • Describe the Big Data landscape, including examples of real-world Big Data problems
  • Explain the 5 V’s of Big Data (volume, velocity, variety, veracity, and value)
  • Understand the foundational principles that have made Big Data so successful
  • Explain ecosystem components such as HDFS, MapReduce, Sqoop, Flume, Hive, Pig, Mahout (Machine Learning), R Connector, Ambari, Zookeeper, Oozie, and NoSQL stores like HBase
  • Understand the various industry offerings for Big Data in the cloud and on premises, such as Cloudera, Hortonworks, MapR, Amazon EMR, and Microsoft Azure HDInsight
  • Understand the impact and value of Apache Spark in the Big Data ecosystem

Topic Outline:

  • Course Introduction
  • History and background of Big Data and Hadoop
  • 5 V’s of Big Data
  • Secret Sauce of Big Data Hadoop
  • Big Data Distributions in Industry
  • Big Data Ecosystem before Apache Spark
  • Big Data Ecosystem after Apache Spark
  • Comparison of MapReduce vs. Apache Spark
  • Understand Apache Spark Architecture and Libraries such as Streaming, Machine and Deep Learning, and GraphX
  • Demo 1 - Data Analysis using Apache Spark on Databricks Cloud
  • References and Next steps

Structured Activity/Exercises/Case Studies:

  • Demo 1 - Data Analysis using Apache Spark on Databricks Cloud

University IT Technology Training classes are only available to Stanford University staff, faculty, or students. A valid SUNet ID is needed in order to enroll in a class.