Polars vs Pandas: The battle of the bears, a talk by Nathalie Vecten

Friday 15 September 15:40 (30 minutes)

__floor__

In Python, there are several data analysis libraries besides the famous Pandas, e.g. Dask, Modin, Vaex or Polars. They are worth knowing because they address the speed problem that can arise when dealing with large datasets.

I discovered the last one a few months ago through a colleague at work, but I wish I had known about it earlier, when I ran into the problem mentioned above myself: my laptop's RAM was not sufficient to analyse a larger dataset with Pandas, and in the end I had to fall back on SAS :-(

First, I will describe some technical characteristics of Polars and mention the TPC-H benchmark available in the documentation. Then I will try to demonstrate for myself Polars' superior execution speed compared to Pandas, using a few small code snippets run on the same dataset containing several million rows.

Finally, I will try to show the differences in syntax between the two libraries, e.g. for filtering data or selecting columns, so that you can get started with this other very useful kind of bear.

What do you need to know to enjoy this talk

Python level

You can write basic scripts.

About the topic

No previous knowledge of the topic is required; basic concepts will be explained.

Nathalie Vecten

I am one of the few people analyzing data in the corporate organization I work in (a major CZ bank) who strive to use Python, while most others only use Excel or SAS.

I attended the Czechitas Digital Academy Data in 2020 and the PyLadies Vienna online Data course in 2022.

One of my quirks is perhaps that I started to practice Czech at the same time as I started using programming languages, and depending on the day, I still wonder which is the more difficult.