[Data on the Mind 2017] Collecting data from the web
Description
Abstract: This workshop will explore different ways to collect data from the web with Python. Have you ever needed to copy and paste hundreds (or thousands!) of tables on different web pages? Or click through combinations of dropdown menu selections and download files? Are you interested in collecting social media or news data? After this workshop, you'll be well on your way to automating these processes. We will first consider getting data from RESTful APIs. We will walk through using the documentation to build a query for a GET request. We'll then write our response to a CSV spreadsheet. We'll then discuss web scraping, keeping in mind the Terms of Service for websites and ensuring we are not in violation. We will look at two ways of scraping: 1) parsing the HTML response of a GET request using BeautifulSoup, and 2) utilizing the Selenium web driver to interact directly with dynamic web content.
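As a rough illustration of the first two steps mentioned in the abstract (building a query for a GET request and writing the response to a CSV file), here is a minimal sketch using only the Python standard library. It is not the workshop's own material: the endpoint `https://api.example.com/v1/articles`, the parameter names, and the sample records are all hypothetical placeholders, and no request is actually sent.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real API).
BASE_URL = "https://api.example.com/v1/articles"


def build_query_url(base_url, params):
    """Return the full URL for a GET request with URL-encoded query parameters."""
    return f"{base_url}?{urlencode(params)}"


def records_to_csv(records, fieldnames):
    """Serialize a list of dicts (e.g. parsed JSON results) to CSV text.

    Keys that are not listed in `fieldnames` are silently dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Assumed parameter names; a real API's documentation defines its own.
url = build_query_url(BASE_URL, {"q": "data science", "page": 1})

# Made-up records standing in for the parsed JSON body of a response.
sample_records = [
    {"title": "First result", "year": 2017, "internal_id": "a1"},
    {"title": "Second result", "year": 2016, "internal_id": "b2"},
]
csv_text = records_to_csv(sample_records, ["title", "year"])
print(url)
print(csv_text)
```

In practice the URL would be sent with an HTTP client (for example `requests.get(url)`), and the CSV text would be written to a file opened with `newline=""` rather than kept in memory.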
Instructor: Christopher Hench (University of California, Berkeley)
---
Part of the Data on the Mind 2017 summer workshop: http://www.dataonthemind.org/2017-workshop
Funded by the Estes Fund: http://www.psychonomic.org/page/estesfund
Co-produced with the Berkeley D-Lab: http://dlab.berkeley.edu/
Organized in collaboration with Data on the Mind: http://www.dataonthemind.org
Videography by DeNoise Studios: http://www.denoise.com
Workshop hashtag: #dataonthemind