Fundamentals of Data Science

Class Sessions

Date: Tue Nov 12, 9:00 am to 4:00 pm
Location: Birch Hall 107 (Birch Lab B)
Cost: $400

ITS-1905

Class Description

This course introduces real-world applications of data science and explains why it has become such an integral part of business and academia. We will discuss the data science process and the tools used to analyze data sets.

Prerequisite: Basic Python Programming training, or equivalent experience

Learning Objectives:

In this class, you will have the opportunity to:

• Install Anaconda on a personal computer.
• Understand the Data Science Field.
• Become familiar with Descriptive and Inferential Statistics and statistical analysis.
• Learn the primary toolkit for data science in Python including NumPy, Pandas, Matplotlib and Scikit-learn.
• Learn how to perform exploratory data analysis.
• Learn the importance of data cleaning.
• Utilize common Machine Learning algorithms such as Linear and Logistic Regression.
• Learn how to evaluate models and choose the most effective one.
• Understand how to interpret a Confusion Matrix.
• Understand the uses of the AUC-ROC curve in model evaluation.
• Solidify understanding by completing hands-on exercises and milestones.
• Create 2 data science projects.
• Understand the big picture and the importance of data science in business, industry, and technology.
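
As an illustration of the Confusion Matrix objective above, here is a minimal sketch in plain Python. It is not taken from the course materials: the labels are made up, and the helper name `confusion_counts` is hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier.

    Hypothetical helper written for illustration only.
    """
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct if the actual label is also positive.
            counts["tp" if actual == positive else "fp"] += 1
        else:
            # Predicted negative: a miss if the actual label is positive.
            counts["fn" if actual == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


# Made-up labels, purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_counts(y_true, y_pred)
precision = cm["tp"] / (cm["tp"] + cm["fp"])  # how many predicted positives were right
recall = cm["tp"] / (cm["tp"] + cm["fn"])     # how many actual positives were found
accuracy = (cm["tp"] + cm["tn"]) / len(y_true)

print(cm)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

The four counts are the cells of a 2x2 confusion matrix; precision, recall, and accuracy are all derived from them. Scikit-learn, one of the libraries listed in this class, provides `sklearn.metrics.confusion_matrix` for the same purpose.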

Topic Outline:

• Course Introduction
• Installing Anaconda
• Overview of Data Science
• The Difference Between Business Intelligence (BI), Data Analytics and Data Science
• The Field of Data Science
• The Data Science Process
- Define the Problem
- Get the Data
- Explore the Data
- Clean the Data
- Model the Data
- Communicate the Findings
• Descriptive Statistics Fundamentals
• Central Tendency
- Mean
- Median
- Mode
• Spread of the Data
- Variance
- Standard Deviation
- Range
• Relative Standing
- Percentile
- Quartile
- Inter-quartile Range
• Inferential Statistics Fundamentals
- Normal Distribution
- Central Limit Theorem
- Standard Error
- Confidence Intervals
- Other Distributions
- Samples
- Hypothesis Testing
• Milestone 1: Perform statistical analysis on a given data set.
• Essential Python Data Science Libraries
- NumPy
- Pandas
- Matplotlib
- Scikit-learn
- Statsmodels
• Data Exploration
- Describe
- Merging
- Grouping
- Evaluating Features
• Data Visualization
- Line Chart
- Scatterplot
- Pairplot
- Histogram
- Density Plot
- Bar Chart
- Boxplot
- Customizing Charts
• Milestone 2: Perform Exploratory Data Analysis
• Data Cleaning
- Dropping Rows
- Imputing Missing Values
- Evaluating Features
• Feature Engineering
• Data Transformation
- One-Hot Encoding
- Standardization
- Normalization
• Test/Train Split
• Model Training
• Machine Learning
- Linear Regression
- Logistic Regression
- Support Vector Machine
- Decision Tree
- K-Means Clustering
• Milestone 3: Apply machine learning algorithms, select and refine the best model.
• Conclusion: Data Science in the real world, next steps.
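
For a sense of the descriptive statistics topics in the outline above (central tendency, spread, and relative standing), the following is a minimal sketch using only Python's standard `statistics` module. The sample values are made up, and the class itself may use NumPy or Pandas for these calculations.

```python
import statistics

# Made-up sample, purely for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Spread of the data (population formulas; use variance()/stdev() for a sample)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)
value_range = max(data) - min(data)

# Relative standing: quartiles and the inter-quartile range.
# quantiles() defaults to the "exclusive" method; other tools may use a
# different convention and return slightly different cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"mean={mean} median={median} mode={mode}")
print(f"variance={variance} std_dev={std_dev} range={value_range}")
print(f"Q1={q1} Q2={q2} Q3={q3} IQR={iqr}")
```

Quartile conventions differ between libraries, so small differences in Q1 and Q3 across tools are expected rather than a sign of an error.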

University IT Technology Training classes are only available to Stanford University staff, faculty, or students. A valid SUNet ID is needed in order to enroll in a class.