Web scraping with Scrapy: a workshop with Tomáš Bartek

Sunday, 16 June, 14:00 in room EB230

This workshop focuses on a web scraping project called Scrapy. After the workshop, you will have your own working Scrapy project, ready to use as a starting point for scraping any web page you want.

First, we'll give you a short overview of scraping possibilities in Python. We will introduce the open-source Scrapy project as a more advanced and powerful tool than the requests library or Beautiful Soup, and we will take a short tour of the Scrapy architecture.

After that, we will show you how to start your own project in various ways: from the cloud, as a process on your local machine, or as a call from another Python script. Then you will choose a web page suitable for scraping and write a simple spider of your own, dealing with common problems such as following links and pagination.

We will prepare a GitHub repo with the necessary infrastructure and skeleton code, so you can focus on coding the website-specific parts. The workshop will be suitable for both beginners and advanced Pythonistas.

The workshop will take 3 hours.

There will be a maximum of 15 attendees.

We’re sorry, but registration is no longer possible.

Prerequisites

Basics of web technologies (HTTP, HTML, CSS, XPath, JavaScript), ability to use a command line and Git.

Requirements

Bring your own laptop to the event and make sure you have the following installed:

Python (3.6 or 3.7 recommended)
Git
Shell (you need to be able to use the command line)
Docker (repo2docker)
A text editor

Tomáš Bartek

I am a Python enthusiast with a background in physics (in prehistoric ages I studied biophysics). Currently, I work as an IT architect/programmer at a startup called Flatzone, where we scrape real-estate data from across the Czech web with the help of a Scrapy project.

I like open data, physics, nature, meeting people and teaching them new things. My favorite color is dark blue.

itsx
