Fundamentals of Data Science (Live Online)

This live online course introduces real-world applications of data science and explains why it has become such an integral part of business and academia. We will discuss the data science process and the tools used to analyze data sets.

Prerequisite: Basic Python Programming training, or equivalent experience

In this class, you will have the opportunity to:

Install Anaconda on a personal computer.
Understand the Data Science Field.
Become familiar with Descriptive and Inferential Statistics and statistical analysis.
Learn the primary toolkit for data science in Python including NumPy, Pandas, Matplotlib and Scikit-learn.
Learn how to perform exploratory data analysis.
Learn the importance of data cleaning.
Utilize common Machine Learning algorithms such as Linear and Logistic Regression.
Learn how to evaluate models and choose the most effective one.
Understand how to interpret a Confusion Matrix.
Understand the uses of the AUC-ROC curve in model evaluation.
Solidify understanding by completing hands-on exercises and milestones.
Create two data science projects.
Understand the big picture and the importance of data science in business, industry, and technology.

Topic Outline:

Course Introduction
Installing Anaconda
Overview of Data Science
The Difference Between Business Intelligence (BI), Data Analytics and Data Science
The Field of Data Science
The Data Science Process
- Define the Problem
- Get the Data
- Explore the Data
- Clean the Data
- Model the Data
- Communicate the Findings
Descriptive Statistics Fundamentals
Central Tendency
- Mean
- Median
- Mode
Spread of the Data
- Variance
- Standard Deviation
- Range
Relative Standing
- Percentile
- Quartile
- Inter-quartile Range
Inferential Statistics Fundamentals
- Normal Distribution
- Central Limit Theorem
- Standard Error
- Confidence Intervals
- Other Distributions
- Samples
- Hypothesis Testing
Milestone 1: Perform statistical analysis on a given data set.
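As a rough illustration of the statistics topics above, the following minimal sketch computes the central tendency, spread, and relative-standing measures with Python's standard `statistics` module, then estimates a standard error and an approximate 95% confidence interval. The data values are made up for the example, and the 1.96 normal critical value is an assumption; the course itself may use different tools or data.

```python
import math
import statistics

# Hypothetical sample (made-up numbers): daily order counts for a small shop.
data = [12, 15, 11, 15, 18, 20, 15, 13, 17, 14]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Spread (sample variance and standard deviation divide by n - 1)
variance = statistics.variance(data)
std_dev = statistics.stdev(data)
value_range = max(data) - min(data)

# Relative standing: quartiles and the inter-quartile range
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Standard error of the mean and an approximate 95% confidence interval.
# 1.96 is the normal critical value; a t critical value would be more
# appropriate for a sample this small.
std_error = std_dev / math.sqrt(len(data))
ci_low = mean - 1.96 * std_error
ci_high = mean + 1.96 * std_error

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std_dev={std_dev:.2f}, range={value_range}")
print(f"IQR={iqr:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```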
Essential Python Data Science Libraries
- NumPy
- Pandas
- Matplotlib
- Scikit-learn
- Statsmodels
Data Exploration
- Describe
- Merging
- Grouping
- Evaluating Features
Data Visualization
- Line
- Scatterplot
- Pairplot
- Histogram
- Density Plot
- Bar Chart
- Boxplot
- Customizing Charts
Milestone 2: Perform Exploratory Data Analysis
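The describe, merging, and grouping steps listed under Data Exploration might look something like the sketch below, which uses Pandas. The two tables, their column names, and their values are hypothetical and exist only to make the example self-contained.

```python
import pandas as pd

# Hypothetical tables (made-up values) standing in for a real data set.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1],
    "amount": [20.0, 35.5, 12.0, 50.0, 8.5, 30.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["West", "East", "West"],
})

# Describe: count, mean, std, min, quartiles, and max for a numeric column
summary = orders["amount"].describe()

# Merging: attach each customer's region to their orders
merged = orders.merge(customers, on="customer_id", how="left")

# Grouping: total and average order amount per region
by_region = merged.groupby("region")["amount"].agg(["sum", "mean"])

print(summary)
print(by_region)
```

A left merge keeps every order even if a customer record is missing, which makes gaps in the data visible as missing values rather than silently dropping rows.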
Data Cleaning
- Dropping Rows
- Imputing Missing Values
- Evaluating Features
Feature Engineering
Data Transformation
- One-Hot Encoding
- Standardization
- Normalization
Test/Train Split
Model Training
Machine Learning
- Linear Regression
- Logistic Regression
- Support Vector Machine
- Decision Tree
- K-Means Clustering
Milestone 3: Apply machine learning algorithms, select and refine the best model.
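To give a sense of how the split, standardization, training, and evaluation topics fit together, here is a minimal sketch using Scikit-learn. It assumes a synthetic data set generated with `make_classification` and default logistic regression settings; it is an illustration only, not the course's own exercise.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data (an assumption for illustration only).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# Hold out 25% of the rows so the model is evaluated on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize the features, then fit logistic regression. Because the scaler
# sits inside the pipeline, it is fitted on the training split only.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))

# AUC-ROC is computed from predicted probabilities for the positive class.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(cm)
print(f"AUC-ROC: {auc:.3f}")
```

Comparing several models on the same held-out split, using the same metrics, is one common way to choose among them.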
Conclusion: Data Science in the real world, next steps.

University IT Technology Training classes are only available to Stanford University staff, faculty, or students. A valid SUNet ID is needed in order to enroll in a class.

Custom training workshops are available for this program.
