
Data Science 3-Day Bootcamp - Earn Your Data Science Proficiency Certification

Class Code

ITS-2569

Class Description

Effective immediately in response to COVID-19, all Technology Training classes through the Spring Quarter will be held online.

In advance of each session, Tech Training will provide you with a Zoom link to your class, along with any required class materials.
 


 

This course provides a thorough understanding of the key Python libraries used for data science, known as the Python data stack: NumPy, Pandas, Matplotlib and Scikit-learn. We will perform data exploration, analysis, visualization and modeling.
 

In three days of hands-on training, you can quickly become a knowledgeable, productive, and efficient Data Science professional and earn a Stanford Technology Training Certificate of Proficiency in Data Science.


Prerequisite: Basic Python Programming



We will begin by discussing the data science process and how to work through a data science problem effectively. We'll talk about how to clean, transform and prepare data for analysis. We will cover descriptive and inferential statistics, which will enable you to perform hypothesis testing and better interpret the significance of your analysis. We will then focus on machine learning and predictive analytics: the various ways to measure model performance, how to select the best model for your project, and ways to refine that model.

 

Learning Objectives:


During this course, you will have the opportunity to:

  • Install Anaconda on a personal computer
  • Have a clear understanding of data science and its role
  • Understand the data science process
  • Understand foundational descriptive statistics
  • Understand foundational inferential statistics
  • Understand the reasons for Python's popularity in data science
  • Learn the primary libraries for data science in Python, including NumPy, Pandas, Matplotlib and Scikit-learn
  • Interact with and manipulate data arrays and matrices using NumPy
  • Perform exploratory data analysis using Pandas
  • Use Matplotlib and Seaborn to perform data visualization
  • Properly clean and prepare data for machine learning
  • Apply machine learning on a variety of datasets
  • Complete a data science project, end to end
  • Understand the big picture and the importance of data science in industry, research and technology

 

Topic Outline:

 

Day 1

  • Course introduction
  • Install Anaconda
  • Overview of data science
  • The data science process
  • Identifying a problem and asking good questions
  • Descriptive statistics
  • Milestone 1: Learn how to use Jupyter Notebooks
  • Essential libraries
      • NumPy
      • Pandas
      • Matplotlib
  • Milestone 2: Exploratory data analysis
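
As a rough illustration of the Day 1 topics (descriptive statistics with NumPy and Pandas), a minimal sketch might look like the following. The data and column names are invented for illustration and are not course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight_kg": [50.0, 60.0, 65.0, 80.0, 95.0],
})

# NumPy: descriptive statistics on a plain array.
heights = df["height_cm"].to_numpy()
mean_height = np.mean(heights)
median_height = np.median(heights)
std_height = np.std(heights, ddof=1)  # ddof=1 gives the sample standard deviation

# Pandas: a summary table (count, mean, std, min, quartiles, max) per column.
summary = df.describe()

# Pandas: Pearson correlation between two columns.
correlation = df["height_cm"].corr(df["weight_kg"])

print(summary)
print(f"mean={mean_height}, median={median_height}, std={std_height:.2f}")
print(f"height/weight correlation: {correlation:.3f}")
```

In a Jupyter Notebook, evaluating `summary` as the last expression in a cell renders the same table without `print`.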

 

Day 2

  • Getting data
  • Feature selection
  • Strategies for imputing missing data
  • Inferential statistics
  • Essential libraries
      • Statsmodels
      • Scikit-learn
  • Confidence intervals
  • Hypothesis testing
  • Milestone 3: Significance testing
  • Transforming data
  • Binary encoding
  • One-hot encoding
  • Feature engineering
  • Training and test sets
  • Standardizing data
  • Milestone 4: Data modeling
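
Several of the Day 2 data preparation topics (one-hot encoding, training and test sets, standardizing data) can be sketched together. This is a minimal example on an invented toy dataset, assuming Pandas and Scikit-learn are installed. The column names and the 75/25 split are illustrative choices, not the course's own.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue", "red", "green"],
    "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "label": [0, 1, 0, 1, 0, 1, 0, 1],
})

# One-hot encoding: each category in "color" becomes its own 0/1 column.
X = pd.get_dummies(df[["color", "size"]], columns=["color"], dtype=float)
y = df["label"]

# Training and test sets: hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Standardizing: fit the scaler on the training rows only, then apply the
# same transformation to the test rows so no test information leaks in.
scaler = StandardScaler()
X_train_scaled = X_train.copy()
X_test_scaled = X_test.copy()
X_train_scaled["size"] = scaler.fit_transform(X_train[["size"]]).ravel()
X_test_scaled["size"] = scaler.transform(X_test[["size"]]).ravel()

print(X_train_scaled)
```

Fitting the scaler before splitting would let statistics from the test rows influence the training data, which is why the split comes first here.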

 

Day 3

  • Machine learning
  • K-fold cross-validation
  • Box plot
  • Measuring performance
  • Milestone 5: Model selection
  • Refining the model
  • Hyperparameter tuning
  • Grid search
  • Milestone 6: End-to-end project
  • Next steps
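
For the Day 3 topics, k-fold cross-validation and grid search might be combined roughly as follows. This sketch assumes Scikit-learn is installed and uses its bundled iris dataset. The choice of logistic regression, the 5 folds, and the candidate values for `C` are assumptions made for illustration, not the course's own setup.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Keep a held-out test set that is never seen during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# A pipeline re-fits the scaler inside each fold, avoiding leakage.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# K-fold cross-validation: 5 accuracy scores, one per fold.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("mean CV accuracy:", cv_scores.mean())

# Grid search: try each candidate value of the regularization strength C,
# scoring every candidate with 5-fold cross-validation.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)

# Final check of the refined model on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```

The per-fold scores in `cv_scores` are the kind of values that can be compared across candidate models, for example with a box plot, before picking one to tune.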
     



University IT Technology Training classes are only available to Stanford University staff, faculty, or students. A valid SUNet ID is needed in order to enroll in a class.