Most Technology Training classes will be delivered online until further notice.
Before each session, Tech Training will provide a Zoom link for live online classes, along with any required class materials.
Prerequisite:
Learners should have an understanding of basic Python programming.
The ability to locate and acquire important data is a valuable skill for doing data analysis and data science.
In this class, we will:
- Explore sources and repositories for acquiring valuable data, such as open government and university datasets
- Explore popular social APIs (e.g., Facebook, Spotify, Twitter) and domain-specific APIs (e.g., healthcare, news, science and math) that store a wealth of data
- Discuss methods to query web servers, and request and parse data to extract the information you need
- Explore how to scrape various types of data from websites, read and extract text from documents (e.g., PDF, Word), and clean and store sourced and scraped data
Learning Objectives
During this course, you will have the opportunity to:
- Explore a Variety of Public Data Repositories
- Understand Effective Means to Search for Valuable Data
- Use the Python Programming Language to Source and Scrape Data
- Use Popular Social and Domain-specific APIs to Access Data (e.g., Slack)
- Extract Text from Documents (e.g., PDF, Word) and Access PDF Tables
- Scrape Data from Web Pages
- Clean Scraped Data and Store Sourced and Scraped Data
Topic Outline
Overview of Data Sourcing
- Public Open Datasets
- Government Data
- University Data
- Milestone 1 Learning Exercise: Explore public data repositories
Introduction to the Python Programming Language
- Installing Anaconda
- Milestone 2 Learning Exercise: Learn how to use Jupyter Notebooks
Using Public APIs (Application Programming Interfaces)
- Explore Popular and Domain-specific APIs
- Common Conventions
- Parsing JSON
- Milestone 3 Learning Exercise: Access a public API (e.g., Facebook, Twitter, Google)
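As a rough preview of the "Parsing JSON" topic, the sketch below parses a JSON response with Python's standard `json` module. The payload and its field names (`results`, `title`, `tags`, `next_page`) are hypothetical and do not come from any real API; in class, the text would come from a live API response instead of a hard-coded string.

```python
import json

# Hypothetical payload: these field names are invented for illustration
# and do not correspond to any real API.
SAMPLE_RESPONSE = """
{
  "results": [
    {"id": 1, "title": "City budget 2020", "tags": ["finance", "government"]},
    {"id": 2, "title": "Campus energy use", "tags": ["university"]}
  ],
  "next_page": null
}
"""


def extract_titles(raw_text):
    """Parse a JSON string and return the title of each result."""
    payload = json.loads(raw_text)
    # Fall back to an empty list if the response has no "results" key.
    return [item["title"] for item in payload.get("results", [])]


titles = extract_titles(SAMPLE_RESPONSE)
print(titles)
```

Using `payload.get("results", [])` keeps the function from failing on a response that has no results, which is a common situation with paginated APIs.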
Extracting Text from Documents
- Milestone 4 Learning Exercise: Extract data from PDFs
Overview of Data Scraping
- Introduction to BeautifulSoup
- Parsing HTML and JavaScript
- Milestone 5 Learning Exercise: Scrape data from a website
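To give a feel for what parsing HTML involves, here is a minimal sketch that collects links from a page. It deliberately uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without installing anything; it is a stand-in, not the approach taught in class. The sample HTML, the `LinkCollector` class, and the `collect_links` helper are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this text.
SAMPLE_HTML = """
<html><body>
  <h1>Datasets</h1>
  <ul>
    <li><a href="/data/budget.csv">Budget</a></li>
    <li><a href="/data/energy.csv">Energy</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs.
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def collect_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


links = collect_links(SAMPLE_HTML)
print(links)
```

A library such as BeautifulSoup typically reduces this kind of task to a few lines, which is one reason it is popular for scraping.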
Cleaning Scraped Data
Storing Sourced and Scraped Data
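Cleaning and storing can be sketched together in a few lines. The example below assumes scraped rows arrive as dictionaries of raw strings, normalizes them, and writes them out as CSV using only the standard library. The row data, the column names (`name`, `size_mb`), and both helper functions are hypothetical; real scraped data will need its own cleaning rules.

```python
import csv
import io

# Hypothetical scraped rows with typical problems: stray whitespace,
# thousands separators, and a record with no name.
RAW_ROWS = [
    {"name": "  Budget \n", "size_mb": "1,204"},
    {"name": "Energy", "size_mb": " 87 "},
    {"name": "", "size_mb": "12"},
]


def clean_rows(rows):
    """Collapse whitespace, convert sizes to int, and drop unnamed rows."""
    cleaned = []
    for row in rows:
        name = " ".join(row["name"].split())
        if not name:
            continue  # skip rows that have no usable name
        size = int(row["size_mb"].replace(",", "").strip())
        cleaned.append({"name": name, "size_mb": size})
    return cleaned


def rows_to_csv(rows):
    """Serialize cleaned rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "size_mb"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


cleaned = clean_rows(RAW_ROWS)
csv_text = rows_to_csv(cleaned)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; to save to disk, the same writer can be pointed at a file opened with `open(path, "w", newline="")`.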
Conclusion: Next steps