Web scraping with Scrapy: a workshop with Tomáš Bartek
Sunday, 16 June, 14:00 in room EB230
This workshop focuses on a web scraping project called Scrapy. After the workshop, you will have your own working Scrapy project, ready to use as a starting point for scraping any web page you want.
First, we'll give you a short overview of the scraping options available in Python. We will introduce the open-source Scrapy project as a more advanced and powerful tool than the requests library or Beautiful Soup, and we will take a short tour of the Scrapy architecture.
After that, we will show you how to start your own project in various ways: from the cloud, as a process on your local machine, or as a call from another Python script. Then you will choose a web page suitable for scraping and write a simple spider of your own, working through common problems such as following links and pagination.
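To make "following links and pagination" concrete, here is a minimal sketch that uses only the Python standard library, not Scrapy itself. The URLs, the `item` and `next` CSS classes, and the in-memory `FAKE_SITE` are all hypothetical stand-ins so the example runs without network access. In a Scrapy spider, the same idea is usually expressed by yielding `response.follow(...)` from the `parse` callback instead of looping by hand.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" standing in for real HTTP responses.
FAKE_SITE = {
    "https://example.com/flats?page=1": (
        '<a class="item" href="/flat/1">Flat 1</a>'
        '<a class="item" href="/flat/2">Flat 2</a>'
        '<a class="next" href="?page=2">Next</a>'
    ),
    "https://example.com/flats?page=2": (
        '<a class="item" href="/flat/3">Flat 3</a>'
    ),
}


class LinkParser(HTMLParser):
    """Collect item links and the 'next page' link from one listing page."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        if attrs.get("class") == "item":
            self.items.append(href)
        elif attrs.get("class") == "next":
            self.next_href = href


def crawl(start_url, fetch):
    """Follow 'next' links from start_url and return all absolute item URLs."""
    found, seen, url = [], set(), start_url
    while url and url not in seen:  # 'seen' guards against pagination loops
        seen.add(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        # Resolve relative links against the page they were found on.
        found.extend(urljoin(url, href) for href in parser.items)
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return found


item_urls = crawl("https://example.com/flats?page=1", FAKE_SITE.__getitem__)
print(item_urls)
```

Passing `fetch` as a parameter keeps the crawl logic separate from how pages are downloaded, so the dictionary lookup used here could be swapped for a real HTTP call.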
We will prepare a GitHub repo with the necessary infrastructure pieces and skeleton code, so you can focus on coding only the website-specific parts. The workshop is suitable for both beginners and advanced Pythonistas.
The workshop will take 3 hours.
There will be a maximum of 15 attendees.
Bring your own laptop to the event and make sure you have the following installed:
- Python (3.6 or 3.7 recommended)
- Shell (you need to be able to use the command line)
- Docker (repo2docker)
- A text editor.
We’re sorry, but registration is no longer possible.
I am a Python enthusiast with a background in physics (in prehistoric ages, I studied biophysics). Currently, I work as an IT architect/programmer in a startup called Flatzone, where we scrape real-estate-related data from the whole Czech web with the help of a Scrapy project.
I like open data, physics, nature, meeting other people, and teaching them new things. My favorite color is dark blue.