Have you ever wondered how social media platforms keep track of the number of likes and comments under each post? In a distributed world, running accurate and fast queries on large amounts of data is challenging, as traditional query techniques can become impractical at scale.
Thankfully, probabilistic techniques allow us to trade a small amount of accuracy for significant performance gains. Introducing data structures like Bloom filters and Count-Min sketches, we’ll develop an intuition for how they are used for efficient set membership and frequency queries, and compare their performance with non-probabilistic queries. We will also compare probabilistic data structures coded in Python with real-world alternatives used in databases.
After this talk, you will have a better understanding of the internals of popular databases, along with new tools at your disposal to help your software services scale.
What do you need to know to enjoy this talk?
Python level
Medium knowledge: You use frameworks and third-party libraries.
About the topic
No previous knowledge of the topic is required; basic concepts will be explained.