Practical guide to designing implants for pandas, a talk by Jan Pipek

Friday, 14 June, 15:40 in Club

Since version 0.23, pandas has allowed custom user types to serve as the internal storage of Series and DataFrame columns (wherever a NumPy array would otherwise be used) through the newly introduced ExtensionArray and ExtensionDtype interfaces. Version 0.24 takes this further by implementing all of pandas' own “exotic” types in terms of these interfaces.
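A quick way to see these interfaces at work is to inspect columns that use pandas' own extension types. The snippet below is a small sketch assuming pandas 0.24 or later (needed for `Series.array` and the nullable `"Int64"` dtype). It checks that the array backing each column is an `ExtensionArray` and that the column's dtype is an `ExtensionDtype`. The variable names are purely illustrative.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray, ExtensionDtype

# Nullable integers ("Int64", new in pandas 0.24): a missing value does not force a cast to float
ints = pd.Series([1, None, 3], dtype="Int64")
# Categorical data, already backed by an extension array in 0.23
cats = pd.Series(["a", "b", "a"], dtype="category")
# Timezone-aware timestamps, backed by an extension array since 0.24
dates = pd.Series(pd.date_range("2019-06-14", periods=2, tz="UTC"))

for label, s in [("ints", ints), ("cats", cats), ("dates", dates)]:
    # Series.array (pandas >= 0.24) returns the array that actually stores the column
    backing = s.array
    print(
        label,
        type(backing).__name__,
        isinstance(backing, ExtensionArray),
        isinstance(s.dtype, ExtensionDtype),
    )
```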

This opens up two main use cases: making effective use of a data storage library or proxy (like Apache Arrow in the fletcher project), and seamlessly capturing more complex objects in pandas columns (like IP addresses in cyberpandas or geometric objects in geopandas).

The talk will explore the possibilities of extension arrays and will gradually build towards a simple proof-of-concept custom column.
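To give a feel for where such a proof-of-concept column might end up, here is a minimal sketch. It is not the example from the talk. The `MeterDtype` and `MeterArray` names, and the idea of storing lengths in metres as float64, are hypothetical. The sketch implements the core abstract methods of `ExtensionArray` (`_from_sequence`, `_from_factorized`, `__getitem__`, `__len__`, `__eq__`, `dtype`, `nbytes`, `isna`, `take`, `copy`, `_concat_same_type`) and assumes a reasonably recent pandas (1.x or later). A usable column would need more, for example arithmetic, reductions and `__setitem__`.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import (
    ExtensionArray,
    ExtensionDtype,
    register_extension_dtype,
)
from pandas.api.extensions import take as pandas_take


@register_extension_dtype  # makes dtype="meter" resolvable by name
class MeterDtype(ExtensionDtype):
    """Hypothetical dtype: lengths in metres, stored as float64."""

    name = "meter"
    type = float  # scalar type of individual elements
    na_value = np.nan

    @classmethod
    def construct_array_type(cls):
        return MeterArray


class MeterArray(ExtensionArray):
    """Hypothetical extension array wrapping a float64 NumPy array."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype="float64")

    @classmethod
    def _from_sequence(cls, scalars, dtype=None, copy=False):
        # np.array always copies here; None becomes NaN in a float64 array
        return cls(np.array(scalars, dtype="float64"))

    @classmethod
    def _from_factorized(cls, values, original):
        return cls(values)

    def __getitem__(self, item):
        result = self._data[item]
        if np.ndim(result) == 0:  # integer position -> scalar
            return result
        return type(self)(result)  # slice / mask / index array -> new array

    def __len__(self):
        return len(self._data)

    def __eq__(self, other):
        other_data = other._data if isinstance(other, MeterArray) else other
        return self._data == other_data

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray() and astype() see the raw floats
        if copy:
            return np.array(self._data, dtype=dtype)
        return np.asarray(self._data, dtype=dtype)

    @property
    def dtype(self):
        return MeterDtype()

    @property
    def nbytes(self):
        return self._data.nbytes

    def isna(self):
        return np.isnan(self._data)

    def take(self, indices, allow_fill=False, fill_value=None):
        # Used by pandas for reindexing, sorting and alignment
        if allow_fill and fill_value is None:
            fill_value = self.dtype.na_value
        result = pandas_take(
            self._data, indices, allow_fill=allow_fill, fill_value=fill_value
        )
        return type(self)(result)

    def copy(self):
        return type(self)(self._data.copy())

    @classmethod
    def _concat_same_type(cls, to_concat):
        # Used by pd.concat when all pieces share this dtype
        return cls(np.concatenate([arr._data for arr in to_concat]))


lengths = pd.Series(MeterArray([1.5, None, 3.0]))
print(lengths.dtype, lengths.isna().tolist())
```

The `register_extension_dtype` decorator is what allows the string `"meter"` to be passed as a dtype, as in `pd.Series([2.0, 4.0], dtype="meter")`. Without it, you would pass a `MeterDtype()` instance explicitly.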

This talk is aimed at advanced Pythonistas. While beginners might find it interesting, we recommend they choose another talk.
It is part of the PyData track.

Jan Pipek

I am a data scientist and engineer at Showmax, helping neural networks understand what happens in movies and building a video streaming platform for Africa. I only recently switched over from Monte Carlo simulations in medical physics.

I've been using Python for more than ten years, with a strong inclination toward data analysis and visualization (having written several useless and hopefully at least one useful library – physt), but I also try to enjoy the language in the broader sense.

I am both happy and fortunate to be one of the PyData Prague meetup organizers.

janpipek

I also lead the workshop “The Data Trinity – Practical NumPy, pandas and Matplotlib”.