Applied Data Science using Python Libraries like Pandas, Matplotlib and Scikit-Learn (3-day class)

This course provides a thorough understanding of each of the key Python libraries used for data science -- NumPy, Pandas, Matplotlib and Scikit-learn, known as the Python data stack. We will perform data exploration, analysis, visualization and modeling.

Prerequisite: Basic Python programming

We will begin by discussing the data science process and how to work through a data science problem effectively. We'll talk about how to clean, transform and prepare data for analysis. We will also cover descriptive and inferential statistics, which will enable you to perform hypothesis testing so that you can better interpret the significance of your analysis. Finally, we will focus on machine learning and predictive analytics: the various ways to measure model performance, how to select the best model for your project, and ways to refine that model.

Learning Objectives:

During this course, you will have the opportunity to:

Install Anaconda on a personal computer
Have a clear understanding of data science and its role
Understand the data science process
Understand foundational descriptive statistics
Understand foundational inferential statistics
Understand the reasons for Python's popularity in data science
Learn the primary libraries for data science in Python including NumPy, Pandas, Matplotlib and Scikit-learn
Interact with and manipulate data arrays and matrices using NumPy
Perform exploratory data analysis using Pandas
Use Matplotlib and Seaborn to perform data visualization
Properly clean and prepare data for machine learning
Apply machine learning on a variety of datasets
Complete a data science project, end to end
Understand the big picture and the importance of data science in industry, research and technology

Topic Outline:

Day 1

Course introduction
Install Anaconda
Overview of Data Science
The data science process
Identifying a problem and asking good questions
Descriptive statistics
Milestone 1: Learn how to use Jupyter Notebooks
Essential libraries
NumPy
Pandas
Matplotlib
Milestone 2: Exploratory data analysis
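
As a rough illustration of the kind of exploratory work Day 1 builds toward, the sketch below computes descriptive statistics with NumPy and Pandas. The data and column names (`score`, `group`) are made up for illustration and are not taken from the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five scores split across two groups.
scores = np.array([72, 85, 90, 66, 85])
df = pd.DataFrame({"score": scores, "group": ["a", "b", "a", "b", "a"]})

# Descriptive statistics for one column: count, mean, std, quartiles, min, max.
summary = df["score"].describe()

# Median of the column, and the mean score within each group.
median_score = df["score"].median()
group_means = df.groupby("group")["score"].mean()

print(summary)
print(median_score)
print(group_means)
```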

Day 2

Getting data
Feature selection
Strategies for imputing missing data
Inferential statistics
Essential libraries
Statsmodels
Scikit-learn
Confidence intervals
Hypothesis testing
Milestone 3: Significance testing
Transforming data
Binary encoding
One-hot encoding
Feature Engineering
Training and test sets
Standardizing data
Milestone 4: Data modeling
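
The data preparation topics above (one-hot encoding, training and test sets, standardizing data) might fit together roughly as follows. This is a minimal sketch on an invented toy table; the column names and the 75/25 split are assumptions, not the course's own exercise.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one categorical and one numeric feature.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red", "green", "red", "blue"],
    "size_cm": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "label": [0, 1, 0, 1, 0, 1, 0, 1],
})

# One-hot encode the categorical column; the numeric column passes through.
X = pd.get_dummies(df[["color", "size_cm"]], columns=["color"])
y = df["label"]

# Hold out 25% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training rows only, then apply it to both sets,
# so no information from the test set leaks into the preprocessing.
scaler = StandardScaler()
X_train_scaled = X_train.copy()
X_test_scaled = X_test.copy()
X_train_scaled["size_cm"] = scaler.fit_transform(X_train[["size_cm"]]).ravel()
X_test_scaled["size_cm"] = scaler.transform(X_test[["size_cm"]]).ravel()
```

Fitting the scaler on the training set alone is a common convention: the test set is meant to stand in for unseen data.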

Day 3

Machine learning
K-fold cross validation
Box plot
Measuring performance
Milestone 5: Model selection
Refining the model
Hyperparameter tuning
Grid search
Milestone 6: End-to-end project
Next steps
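
For a sense of how the Day 3 topics connect, the sketch below runs K-fold cross validation and a small grid search with Scikit-learn. The dataset (Scikit-learn's bundled iris data), the logistic regression model and the candidate values for `C` are illustrative assumptions, not necessarily what the class uses.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small bundled dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# 5-fold cross validation with shuffling for a reproducible split.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Standardize features, then fit a classifier, as one pipeline so the
# scaling is re-fit inside each training fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Measuring performance: one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=cv)

# Hyperparameter tuning: try a few regularization strengths (assumed values).
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(model, param_grid, cv=cv)
search.fit(X, y)

print(scores.mean())
print(search.best_params_)
```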

Training material provided: Yes (Digital format)

University IT Technology Training classes are only available to Stanford University staff, faculty, or students. A valid SUNet ID is needed in order to enroll in a class.

Custom training workshops are available for this program