This live online course introduces real-world applications of data science and explains why it has become such an integral part of business and academia. We will discuss the data science process and the tools used to analyze data sets.
Prerequisite: Basic Python Programming training, or equivalent experience
In this class, you will have the opportunity to:
- Install Anaconda on a personal computer.
- Understand the Data Science Field.
- Become familiar with Descriptive and Inferential Statistics and statistical analysis.
- Learn the primary toolkit for data science in Python including NumPy, Pandas, Matplotlib and Scikit-learn.
- Learn how to perform exploratory data analysis.
- Learn the importance of data cleaning.
- Utilize common Machine Learning algorithms such as Linear and Logistic Regression.
- Learn how to evaluate models and choose the most effective one.
- Understand how to interpret a Confusion Matrix.
- Understand the uses of the AUC-ROC curve in model evaluation.
- Solidify understanding by completing hands-on exercises and milestones.
- Create two data science projects.
- Understand the big picture and the importance of data science in business, industry, and technology.
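To make the Confusion Matrix objective above a little more concrete, here is a minimal sketch in plain Python. The labels are invented for illustration and the helper name `confusion_counts` is hypothetical; it is not the course's exercise code, and the Scikit-learn library listed above provides its own utilities for the same task.

```python
# Hypothetical labels for a binary classifier (1 = positive, 0 = negative).
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]


def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(actual, predicted)

# Common summary metrics derived from the four counts.
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)  # of predicted positives, how many were right
recall = tp / (tp + fn)     # of actual positives, how many were found

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Reading the four counts side by side shows which kind of error a model makes more often, which a single accuracy figure can hide.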
Topic Outline:
- Course Introduction
- Installing Anaconda
- Overview of Data Science
- The Difference Between Business Intelligence (BI), Data Analytics and Data Science
- The Field of Data Science
- The Data Science Process
- Define the Problem
- Get the Data
- Explore the Data
- Clean the Data
- Model the Data
- Communicate the Findings
- Descriptive Statistics Fundamentals
- Central Tendency
- Mean
- Median
- Mode
- Spread of the Data
- Variance
- Standard Deviation
- Range
- Relative Standing
- Percentile
- Quartile
- Inter-quartile Range
- Inferential Statistics Fundamentals
- Normal Distribution
- Central Limit Theorem
- Standard Error
- Confidence Intervals
- Other Distributions
- Samples
- Hypothesis Testing
- Milestone 1: Perform statistical analysis on a given data set.
- Essential Python Data Science Libraries
- NumPy
- Pandas
- Matplotlib
- Scikit-learn
- Statsmodels
- Data Exploration
- Describe
- Merging
- Grouping
- Evaluating Features
- Data Visualization
- Line
- Scatterplot
- Pairplot
- Histogram
- Density Plot
- Bar Chart
- Boxplot
- Customizing Charts
- Milestone 2: Perform Exploratory Data Analysis
- Data Cleaning
- Dropping Rows
- Imputing Missing Values
- Feature Evaluating
- Feature Engineering
- Data Transformation
- One-Hot Encoding
- Standardization
- Normalization
- Test/Train Split
- Model Training
- Machine Learning
- Linear Regression
- Logistic Regression
- Support Vector Machine
- Decision Tree
- K-Means
- Clustering
- Milestone 3: Apply machine learning algorithms, select and refine the best model.
- Conclusion: Data Science in the real world, next steps.
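As a small illustration of the descriptive statistics topics in the outline above (central tendency, spread, and relative standing), the sketch below uses only Python's standard `statistics` module rather than the course's listed libraries. The `scores` data set is invented for the example, and this is not the actual Milestone 1 exercise.

```python
import statistics

# Hypothetical data set, made up for illustration only.
scores = [62, 70, 70, 75, 81, 84, 88, 93]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Spread of the data (sample statistics, which divide by n - 1)
variance = statistics.variance(scores)
std_dev = statistics.stdev(scores)
value_range = max(scores) - min(scores)

# Relative standing: the three cut points that split the data into quartiles
# (statistics.quantiles uses its default "exclusive" method here).
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # inter-quartile range

print(f"mean={mean} median={median} mode={mode}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f} range={value_range}")
print(f"Q1={q1} Q2={q2} Q3={q3} IQR={iqr}")
```

The sample versions of variance and standard deviation are used on the assumption that the data is a sample from a larger population; `statistics.pvariance` and `statistics.pstdev` are the population counterparts.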
University IT Technology Training classes are only available to Stanford University staff, faculty, or students. A valid SUNet ID is needed in order to enroll in a class.