TECHIE FESTIVAL | Introduction to Big Data with Hadoop and Spark (Lecture)

Mon, 07/29/2019, 13:00 - 16:00

Polya Hall | RM 111 (Turing Auditorium)

Class Code: ITS-1916

Note: This is a lecture. A laptop is recommended, but not required.

This lecture provides an approachable introduction to Big Data with Hadoop and Spark. We will go behind the scenes to understand the secret sauce behind the success of Hadoop and other Big Data technologies.

In this lecture, you will get an introduction to working with Big Data ecosystem technologies (HDFS, MapReduce, Sqoop, Flume, Hive, Pig, Mahout (Machine Learning), R Connector, Ambari, ZooKeeper, Oozie, and NoSQL stores such as HBase) for Big Data scenarios. It will give you an understanding of the Big Data ecosystem before and after Apache Spark. Finally, we will run a demo of big data analysis using Apache Spark.

After this course, you will be able to:

Understand the history and background of Big Data and Hadoop
Describe the Big Data landscape, including examples of real-world Big Data problems
Explain the 5 V's of Big Data (volume, velocity, variety, veracity, and value)
Understand the foundational principles that have made Big Data so successful
Explain ecosystem components such as HDFS, MapReduce, Sqoop, Flume, Hive, Pig, Mahout (Machine Learning), R Connector, Ambari, ZooKeeper, Oozie, and NoSQL stores such as HBase
Understand the industry offerings for Big Data in the cloud and on premises, such as Cloudera, Hortonworks, MapR, Amazon EMR, and Microsoft Azure HDInsight
Understand the impact and value of Apache Spark in the Big Data ecosystem

Topic Outline:

Course Introduction
History and background of Big Data and Hadoop
5 V's of Big Data
Secret Sauce of Big Data and Hadoop
Big Data Distributions in Industry
Big Data Ecosystem before Apache Spark
Big Data Ecosystem after Apache Spark
Comparison of MapReduce vs. Apache Spark
Apache Spark Architecture and Libraries (Streaming, Machine and Deep Learning, GraphX, etc.)
Demo 1 - Data Analysis using Apache Spark on Databricks Cloud
References and Next steps

Structured Activity/Exercises/Case Studies:

Demo 1 - Data Analysis using Apache Spark on Databricks Cloud

University IT Technology Training classes are only available to Stanford University staff, faculty, or students. A valid SUNet ID is needed in order to enroll in a class.

Payment Methods: STAP Funds, Departmental Account, or Credit Card.

Admission Info

Fee: $175


Contact Email

techtraining@stanford.edu

Contact Phone

650-723-4391

More Information

https://uit.stanford.edu/service/techtraining/class/techie-festival-introductio…