
Python for Data Science

Class Code


Class Description

Most Technology Training classes will be delivered online until further notice.

Before each session, Tech Training will provide a Zoom link for live online classes, along with any required class materials.


Python is one of the most widely used languages in data science, and this class will introduce you to its most important libraries (NumPy, Pandas, Matplotlib, and Scikit-learn) so that you can do data science effectively in Python.

Prerequisite: Basic Python Programming 

In this course, you will have an opportunity to:

  • Install Anaconda on a personal computer
  • Understand the various options for performing data science
  • Understand the reasons for Python's popularity in data science
  • Learn the primary libraries for data science in Python including NumPy, Pandas, Matplotlib and Scikit-learn
  • Perform exploratory data analysis using Pandas
  • Use Matplotlib and Seaborn to perform data visualization
  • Prepare data for machine learning
  • Apply machine learning on a variety of datasets
  • Understand the data science process
  • Understand the big picture and the importance of data science in business, industry, and technology

We will begin by installing Anaconda, which provides the libraries required for most data problems. We will discuss the focus and strengths of the most important libraries and how they enable data analysis and the application of machine learning to defined data problems. We will then use these libraries to perform data exploration, visualization, analysis and modeling on a variety of datasets as we work through the data science process.
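As a rough illustration of the workflow described above (exploration, preparation, and modeling), here is a minimal sketch. It is not taken from the class materials: the synthetic dataset, the column names (hours, score, passed), and the choice of logistic regression are all assumptions made for this example. It uses only NumPy, Pandas, and Scikit-learn, which a standard Anaconda installation typically includes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Build a small synthetic dataset with NumPy (hypothetical data for
# illustration only; the class uses its own datasets).
rng = np.random.default_rng(seed=0)
n_rows = 200
hours = rng.uniform(0, 10, size=n_rows)
score = rng.normal(loc=50 + 4 * hours, scale=5)

# Load the data into a Pandas DataFrame and derive a binary target.
df = pd.DataFrame({"hours": hours, "score": score})
df["passed"] = (df["score"] > 70).astype(int)

# Exploratory data analysis: summary statistics and class balance.
summary = df.describe()
pass_rate = df["passed"].mean()

# Prepare data for machine learning: pick features, split, and scale.
# Only "hours" is used as a feature, because "score" defines the target.
X = df[["hours"]]
y = df["passed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_train)  # fit on training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Apply a machine learning algorithm and evaluate it on held-out data.
model = LogisticRegression().fit(X_train_scaled, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))

print(summary)
print(f"Pass rate: {pass_rate:.2f}")
print(f"Test accuracy: {accuracy:.2f}")
```

The scaler is fitted on the training split only, so that no information from the test rows leaks into the preparation step. Real projects would typically add a visualization step and compare several models.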


Topics covered in this class include: 

  • Course Introduction
  • Overview of data science
  • Reasons for Python's popularity in data science
  • Installing Anaconda
  • Milestone 1: Learn how to use Jupyter Notebooks
  • The data science process
  • Essential Python data science libraries
    - NumPy
    - Pandas
    - Matplotlib
    - Scikit-learn
  • Data Visualization
    - Line Chart
    - Scatterplot
    - Pairplot
    - Histogram
    - Density Plot
    - Bar Chart
    - Boxplot
  • Customizing Charts
  • Prepare data for machine learning
  • Milestone 2: Perform exploratory data analysis using Pandas
  • Milestone 3: Apply machine learning algorithms using Scikit-learn
  • Conclusion: Data Science in the real world, next steps

University IT Technology Training classes are only available to Stanford University staff, faculty, students, and Stanford Hospitals & Clinics employees, including Stanford Health Care, Stanford Health Care Tri-Valley, Stanford Medicine Partners, and Stanford Medicine Children's Health. A valid SUNet ID is needed to enroll in a class.