
Effective immediately in response to COVID-19, all Technology Training classes will be delivered online until further notice.
In advance of each session, Tech Training will provide you with a Zoom link to your class, along with any required class materials.
Prerequisite: Learners should have an understanding of Basic Python Programming.
Data science and data analysis depend on clean, well-prepared data to learn from. Indeed, most of the effort required to extract insight from data goes into cleaning it.
This course provides a comprehensive guide to using Python data cleaning tools and techniques effectively. We will discuss the practical application of the tools and techniques needed for data ingestion, imputing missing values, detecting unreliable data and statistical anomalies, and feature engineering.
Learning Objectives:
During this course, you will have the opportunity to:
- Think carefully about your data and what you really want to know
- Learn how to ask the right questions to gain the desired insights from your data
- Detect problems from the shape of your data
- Clean your data appropriately so that it says what you mean
- Reasonably and reliably impute missing values
- Prepare data for analytic and machine learning tasks
- Transform your data into numerical values that machines prefer
- Create better features (independent variables) so that the machine can better understand the problem you want it to help you solve
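As a rough illustration of the objectives above (detecting problems from the shape of your data and imputing missing values), here is a minimal sketch using Pandas. The toy DataFrame, its column names, the 1.5 × IQR outlier rule, and median imputation are all assumptions chosen for illustration; they are not taken from the class materials.

```python
import pandas as pd

# Hypothetical toy data: two missing values and one extreme income.
df = pd.DataFrame({
    "age": [25, 31, None, 42, 38, 29],
    "income": [40_000, 52_000, 48_000, None, 61_000, 1_000_000],
})

# Count missing values per column.
missing_per_column = df.isna().sum()

# A strongly positive skew hints at a long right tail.
income_skew = df["income"].skew()

# Flag outliers with the common 1.5 * IQR rule (one convention among several).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# Impute missing values with each column's median, which is less
# sensitive to outliers than the mean.
imputed = df.fillna(df.median(numeric_only=True))

print(missing_per_column)
print(outliers)
print(imputed)
```

Whether to impute, drop rows, or keep an outlier depends on the question you are asking of the data, so treat these choices as a starting point rather than a rule.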
Topic Outline:
- Overview of Data and Data Types
  i. Numerical Values
  ii. Categorical Values
  iii. Ordinal Values
- Asking Clear and Precise Questions
- Feature Selection
- Normal Distribution
  i. Skew
  ii. Outliers
- Data Cleaning
  i. Imputing Missing Values
  ii. Dropping Rows
- Data Transformation
  i. One-Hot Encoding
  ii. Ordinal Transformation
  iii. Discretization
- Feature Engineering
- Introduction to the Python Programming Language
  i. Installing Anaconda
  ii. Python Essentials
  iii. Introduction to Pandas
  Milestone 1: Learning Exercise: Learn how to use Jupyter Notebooks
- Applied Data Cleaning and Preparation
  i. Using Pandas to clean data
  ii. Using Pandas to prepare and transform data for analysis
  Milestone 2: Learning Exercise: Perform data cleaning and preparation for data analysis
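To give a sense of the data transformation topics in the outline (one-hot encoding, ordinal transformation, and discretization), here is a short sketch using Pandas. The column names, category order, and bin edges are hypothetical and chosen only for illustration; the class itself may approach these steps differently.

```python
import pandas as pd

# Hypothetical toy data with a categorical, an ordinal, and a numerical column.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
    "age": [12, 35, 58, 71],
})

# One-hot encoding: one indicator column per category of "color".
one_hot = pd.get_dummies(df, columns=["color"])

# Ordinal transformation: map ordered categories to integer codes.
size_order = ["small", "medium", "large"]  # assumed ordering
df["size_code"] = pd.Categorical(
    df["size"], categories=size_order, ordered=True
).codes

# Discretization: bin a numerical column into labeled ranges.
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 18, 65, 120],  # assumed bin edges, right-inclusive
    labels=["child", "adult", "senior"],
)

print(one_hot)
print(df)
```

One-hot encoding suits categories with no natural order, while integer codes are appropriate only when the order is meaningful, as with the sizes here.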
University IT Technology Training classes are available only to Stanford University staff, faculty, and students, and to Stanford Hospitals & Clinics employees. A valid SUNet ID is required to enroll in a class.