In Python, several data analysis libraries besides the famous Pandas are worth knowing, because they address the speed problems that can arise when dealing with large datasets, e.g. Dask, Modin, Vaex or Polars.
I discovered the last one a few months ago via a colleague at work, but I wish I had known about it earlier, when I myself ran into the problem mentioned above: my laptop's RAM was not sufficient to analyse a large dataset with Pandas, and in the end I had to fall back on SAS :-(
First, I will present some technical characteristics of Polars and mention the TPC-H benchmark results available in the documentation.
Then I will try to demonstrate Polars' superior execution speed compared to Pandas' with a few small code snippets, all run on the same dataset containing several million rows.
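To give an idea of the kind of comparison I have in mind, here is a minimal timing sketch; the file name "data.csv" and the columns "category" and "value" are purely illustrative assumptions:

```python
import time

import pandas as pd
import polars as pl


def timed(label, fn):
    """Run fn once and print the elapsed wall-clock time."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f} s")
    return result


# Pandas: eager read of the whole file, then a group-by aggregation.
timed("pandas", lambda: pd.read_csv("data.csv")
      .groupby("category")["value"].mean())

# Polars: a lazy scan lets the query optimiser plan the whole query
# before any data is materialised, then collect() executes it.
timed("polars", lambda: pl.scan_csv("data.csv")
      .group_by("category")
      .agg(pl.col("value").mean())
      .collect())
```

(In Polars versions before 0.19, group_by was spelled groupby.)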
Finally, I will show the differences in syntax between the two libraries, e.g. for filtering data or selecting columns, so that you can get comfortable with this other kind of very useful bear.
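As a small taste of those syntax differences, here is a side-by-side sketch; the DataFrames and the column names ("name", "age", "city") are illustrative assumptions:

```python
import pandas as pd
import polars as pl

# The same toy data in each library.
pdf = pd.DataFrame({"name": ["Ada", "Bob"], "age": [36, 17], "city": ["Paris", "Lyon"]})
plf = pl.DataFrame({"name": ["Ada", "Bob"], "age": [36, 17], "city": ["Paris", "Lyon"]})

# Filtering rows: a boolean mask in Pandas vs. an expression in Polars.
adults_pd = pdf[pdf["age"] >= 18]
adults_pl = plf.filter(pl.col("age") >= 18)

# Selecting columns: label indexing in Pandas vs. select() in Polars.
cols_pd = pdf[["name", "city"]]
cols_pl = plf.select("name", "city")
```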
What do you need to know to enjoy this talk
Python level
You can write basic scripts.
About the topic
No previous knowledge of the topic is required; basic concepts will be explained.