Topic modelling and textual analysis with gensim
a workshop by Bhargav Srinivasa Desikan

Topic modelling is a great way to analyse completely unstructured textual data, and with the Python NLP framework gensim, it's very easy to do. The purpose of this tutorial is to guide you through the whole topic modelling process: pre-processing your raw textual data, creating your topic models, evaluating them, and visualising them. The tutorial will also briefly cover advanced topic modelling techniques such as Dynamic Topic Modelling, Topic Coherence, Document Word Coloring, and LSI/HDP.

The Python packages used during the tutorial will be spaCy (for pre-processing), gensim (for topic modelling), and pyLDAvis (for visualisation). The tutorial will be run in a Jupyter notebook.

The takeaway from the tutorial will be the participants' ability to get their hands dirty analysing their own textual data, through the entire lifecycle from cleaning raw data to visualising topics.

Bhargav Srinivasa Desikan


Bhargav is a student researcher currently working at INRIA, France. He is part of the MODAL (Models Of Data Analysis & Learning) team, and he works on Metric Learning, Predictor Aggregation and Data Visualisation.

When not at work (and sometimes when at work), he enjoys contributing to open source, particularly to the Python Machine Learning community. He participated in Google Summer of Code 2016, where he implemented Dynamic Topic Models for Gensim. He has spoken about Gensim at PyCon France 2016 and PyCon Slovakia 2017.

When not coding or working on ML research he enjoys drinking cold beer and reading science fiction.
