
Fundamentals of Data Science

Most Technology Training classes will be delivered online until further notice.


Before each session, Tech Training will provide a Zoom link for live online classes, along with any required class materials.


This course introduces real-world applications of data science and explains why the field has become an integral part of business and academia. We will discuss the data science process and the tools used to perform data exploration, analysis, and modeling.

Prerequisite: Basic Python Programming training, or equivalent experience

Learning Objectives
 

In this class, you will have the opportunity to:

  • Install Anaconda on a personal computer
  • Understand the data science field
  • Become familiar with descriptive and inferential statistics
  • Learn the primary Python tools for data science, including Pandas and Scikit-learn
  • Learn how to perform exploratory data analysis
  • Learn the importance of data cleaning
  • Apply common machine learning algorithms such as linear and logistic regression
  • Solidify your understanding by completing hands-on exercises and milestones
  • Walk through two data science projects
  • Understand the big picture and the importance of data science in learning from data
     

Course Outline

  • Course Introduction
  • Install Anaconda
  • Review the Essentials of Python
  • Overview of Data Science
  • The Difference Between Business Intelligence (BI), Data Analytics, and Data Science
  • Descriptive Statistics Fundamentals
  • Central Tendency
       - Mean
       - Median
       - Mode
  • Spread of the Data
       - Variance
       - Standard Deviation
       - Range
  • Relative Standing
       - Percentile
       - Quartile
       - Interquartile Range
  • Inferential Statistics Fundamentals
  • Data Distributions
       - Normal Distribution
       - Uniform Distribution
  • The Data Science Process
       - Define the Problem
       - Get the Data
       - Explore the Data
       - Clean the Data
       - Model the Data
       - Communicate the Findings
  • Feature Selection
  • Data Cleaning
       - Dropping Rows
       - Imputing Missing Values
  • Data Transformation
       - Binary Encoding
       - One-Hot Encoding
       - Standardization
       - Normalization
  • Machine Learning Overview
  • Introduction to Pandas
  • Milestone 1: Use Pandas to perform data analysis on a real-world dataset.
  • Data Exploration
       - Describe
       - Merge
       - Group
       - Feature Evaluation
  • Feature Engineering
  • Milestone 2: Perform exploratory data analysis and feature engineering.
  • Test/Train Split
  • Model Training
  • Basic Machine Learning Implementation
       - Linear Regression
       - Logistic Regression
       - Support Vector Machine
       - Decision Tree
  • Milestone 3: Perform an end-to-end project of the data science process.
  • Conclusion: Next steps
     
  • Structured Activity/Exercises/Case Studies
    • Milestone Project 1: Use Pandas to perform data analysis on a real-world dataset.
    • Milestone Project 2: Perform exploratory data analysis and feature engineering.
    • Milestone Project 3: Perform an end-to-end project of the data science process.
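The descriptive statistics topics in the outline cover central tendency, spread, and relative standing. A short sketch can illustrate them. It assumes a small made-up sample. It also uses Python's built-in `statistics` module rather than Pandas, so it runs without extra packages. The `method="inclusive"` quantiles it computes use the same linear interpolation as the default `Series.quantile` in Pandas.

```python
import statistics

# Hypothetical sample, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)  # average of the two middle values when n is even
mode = statistics.mode(data)      # most frequent value

# Spread of the data
variance = statistics.pvariance(data)        # population variance (divides by n)
sample_variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.pstdev(data)            # population standard deviation
data_range = max(data) - min(data)

# Relative standing: quartiles, interquartile range, and a percentile
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
p90 = statistics.quantiles(data, n=100, method="inclusive")[89]  # 90th percentile

print("mean", mean, "median", median, "mode", mode)
print("variance", variance, "sample variance", sample_variance,
      "std dev", std_dev, "range", data_range)
print("Q1", q1, "Q3", q3, "IQR", iqr, "P90", p90)
```

For this sample, the measures work out as follows:
- Central tendency: the mean is 5, the median 4.5, and the mode 4.
- Spread: the population variance is 4, so the standard deviation is 2, and the range is 7. The sample variance is slightly larger (32/7, about 4.57) because it divides by n - 1.
- Relative standing: Q1 is 4 and Q3 is 5.5, so the IQR is 1.5.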
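The data cleaning and data transformation steps in the outline can be sketched in plain Python on a few hypothetical records, which keeps the arithmetic of each step visible. The field names and values are invented for illustration. "Binary encoding" is assumed here to mean mapping a two-valued column such as yes/no to 1/0. The term is also used for writing category indices as base-2 digits.

```python
import statistics

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "city": "Palo Alto", "member": "yes", "spend": 120.0},
    {"age": None, "city": "San Jose", "member": "no", "spend": 80.0},
    {"age": 40, "city": "Palo Alto", "member": "no", "spend": None},
    {"age": 35, "city": "Oakland", "member": "yes", "spend": 200.0},
]

# Dropping rows: discard records whose target value ("spend") is missing.
rows = [r for r in rows if r["spend"] is not None]

# Imputing missing values: fill a missing age with the mean of the known ages.
age_mean = statistics.mean([r["age"] for r in rows if r["age"] is not None])
for r in rows:
    if r["age"] is None:
        r["age"] = age_mean


def standardize(values):
    """Z-score: subtract the mean, then divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


age_z = standardize([r["age"] for r in rows])                 # standardization
spend_scaled = min_max_normalize([r["spend"] for r in rows])  # normalization
cities = sorted({r["city"] for r in rows})                    # one-hot categories

encoded = []
for r, z, s in zip(rows, age_z, spend_scaled):
    feat = {
        "age_z": z,
        "spend_scaled": s,
        "member": 1 if r["member"] == "yes" else 0,  # binary encoding
    }
    for city in cities:                              # one-hot encoding
        feat[f"city_{city}"] = 1 if r["city"] == city else 0
    encoded.append(feat)

for feat in encoded:
    print(feat)
```

In a real project, compute the mean, standard deviation, minimum, and maximum on the training data only, then reuse them on the test data. Pandas (`dropna`, `fillna`, `get_dummies`) and Scikit-learn (`StandardScaler`, `MinMaxScaler`) provide these operations for whole DataFrames.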
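The test/train split and model training steps can be sketched in plain Python as well, under these assumptions:
- The helper names `split_train_test`, `fit_simple_linear_regression`, and `mean_squared_error` are hypothetical and are not Scikit-learn functions.
- The data is generated from the noise-free line y = 2x + 1.
- Only simple linear regression, one of the algorithms listed in the outline, is shown.

```python
import random


def split_train_test(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed, then hold out a fraction for testing."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(xs) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [xs[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_train = [ys[i] for i in train_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical noise-free data from y = 2x + 1.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]

x_train, x_test, y_train, y_test = split_train_test(xs, ys)
# Train on the training split only.
slope, intercept = fit_simple_linear_regression(x_train, y_train)
# Evaluate on the held-out test split.
predictions = [slope * x + intercept for x in x_test]
mse = mean_squared_error(y_test, predictions)

print(f"train={len(x_train)} test={len(x_test)} "
      f"slope={slope:.3f} intercept={intercept:.3f} test MSE={mse:.2e}")
```

Because this data has no noise, the fit recovers the line and the test error is essentially zero. On real data, the error on the held-out split shows how well the model generalizes. Scikit-learn's `train_test_split` (with `random_state` for reproducibility) and `LinearRegression` cover the same steps.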

Custom training workshops are available for this program

Custom workshops are technology training sessions structured around individual or group learning objectives.


University IT Technology Training sessions are available to a wide range of participants, including Stanford University staff, faculty, students, and employees of Stanford Hospitals & Clinics, such as Stanford Health Care, Stanford Health Care Tri-Valley, Stanford Medicine Partners, and Stanford Medicine Children's Health.

Additionally, some of these programs are open to interested individuals not affiliated with Stanford, allowing for broader community engagement and learning opportunities.