PyCon CZ

PyCon CZ 23
15–17 September
Prague

Robust Data Transformation with Pandas: Typing, Validation, Testing

A workshop with Jakub Urban & Jan Pipek

Sunday 17 September 10:00 (3 hours)
Room 349

We will explore the options for making our data analyses and transformations in Pandas resilient and production-ready.

We will use type annotations and schema validation with the Pandera library to make our code more readable and robust. We will also show the potential of property-based testing using the Hypothesis package, with strategies generated from Pandera schemas. Throughout the workshop, we'll work with a large time series of weather data, demonstrating advanced group-by, resample, and rolling aggregations.

As a bonus, you'll gain insights into Prague's climate. We will show how to avoid issues with time zones when working with time series data. By the end of the workshop, you will have a deeper understanding of advanced Pandas aggregations and be able to write robust, production-ready Pandas code.
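One common time zone pitfall can be sketched as follows. This is an illustrative example with synthetic data, not the workshop's own code: readings are stored in UTC and grouped by calendar day around the 2023 autumn clock change in Prague.

```python
import pandas as pd

# 25 hourly readings stored in UTC, starting at local midnight in Prague
# on 2023-10-29, the day the clocks go back one hour.
utc_index = pd.date_range(
    "2023-10-28 22:00", periods=25, freq=pd.Timedelta(hours=1), tz="UTC"
)
readings = pd.Series(1.0, index=utc_index)

# Grouping by the UTC calendar date splits the local day in two.
hours_per_utc_day = readings.groupby(readings.index.date).size()

# Converting to local time first keeps the whole local day together;
# it contains 25 hours because of the clock change.
local = readings.tz_convert("Europe/Prague")
hours_per_local_day = local.groupby(local.index.date).size()

# Dropping the time zone leaves an ambiguous wall clock: 02:00 occurs twice.
naive_wall_clock = local.index.tz_localize(None)
has_duplicates = not naive_wall_clock.is_unique
```

Keeping timestamps time-zone-aware and converting only for local-day aggregation avoids both the misaligned day boundaries and the duplicated wall-clock hour.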

Prerequisites

Before the workshop, please prepare your environment by following the instructions in the repository at https://github.com/coobas/robust-pandas-workshop, so that you have all the necessary tools and dependencies installed.

We will keep updating the repository in the run-up to the workshop, so please pull the latest changes on the day to make sure you have the most up-to-date materials.

What do you need to know to enjoy this workshop?

Python level

Medium knowledge: You use frameworks and third-party libraries.

About the topic

You have used it or done it only a few times.

Jakub Urban

A Lead Science Platform Engineer by profession who loves empowering data scientists and their algorithms. More generally, a Python and data (science) enthusiast with an education and career in computational plasma physics. Co-organiser of the PyData Prague meetup and university tutor of scientific Python.

Jan Pipek

A physicist turned data scientist working at Pace Revenue. Organizer of the PyData Prague meetups. Occasional lecturer of scientific Python.