An Introduction to Word Embeddings, a talk by Galuh Sahid
Friday, 14 June, 15:00 in Club
“France is to Paris as the Czech Republic is to _.” If I ask you to fill in the blank, you would answer “Prague” right away, even without a clue such as “the answer is a capital city”. Our existing knowledge tells us that France and Paris have a Country – Capital City connection and that the Czech Republic’s capital city is Prague, so that must be the answer.
However, computers don’t know that Prague belongs to the same “category” as Paris and other capital cities unless we tell them so. If we want to get computers to understand human language as well as we do, there are way too many things that we need to teach computers explicitly. Is there a better way?
With word embeddings, we represent each word as a series of numbers, known as a vector. This opens up a whole new world for computers because now they can capture the context of a word and infer relationships between words using numbers and maths, the language they are proficient in. We’ll delve into the details of what word embeddings actually are, popular word embedding algorithms, what problems you can solve using word embeddings, and how you can use word embeddings with Python.
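To make the idea concrete, here is a minimal sketch of how the France/Paris analogy could be solved with vector arithmetic. The three-dimensional vectors are made up by hand purely for illustration; real embeddings are learned from large amounts of text and usually have hundreds of dimensions. The names `EMBEDDINGS`, `cosine_similarity`, and `solve_analogy` are hypothetical and not taken from any library.

```python
import math

# Hand-picked toy vectors (an assumption for illustration, not learned embeddings).
# Roughly: [is a country, is a capital city, France (+1) vs Czech Republic (-1)]
EMBEDDINGS = {
    "France": [1.0, 0.0, 1.0],
    "Paris": [0.0, 1.0, 1.0],
    "Czech Republic": [1.0, 0.0, -1.0],
    "Prague": [0.0, 1.0, -1.0],
    "Germany": [1.0, 0.0, 0.0],
    "Berlin": [0.0, 1.0, 0.0],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, embeddings=EMBEDDINGS):
    """Answer 'a is to b as c is to ?' by finding the word closest to b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # Skip the three input words so the answer is a new word.
    candidates = (word for word in embeddings if word not in {a, b, c})
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))


print(solve_analogy("France", "Paris", "Czech Republic"))  # prints "Prague"
```

Subtracting the vector for France from the vector for Paris leaves something like a "capital city of" direction; adding it to the Czech Republic's vector lands closest to Prague. With learned embeddings the match is rarely exact, which is why the sketch picks the nearest word by cosine similarity.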
My love of both data and coding led me to my current work as a data engineer at GOJEK, one of Indonesia’s unicorn startups. I love getting insights from data, especially unstructured text, which I used heavily in my past research on predicting donation campaigns and automatically categorizing Indonesian e-commerce websites. I also co-organize PyLadies Indonesia and this year’s edition of Global Diversity CFP Day in Jakarta. When not coding, you can find me reading, sketching, or studying Python’s internals & quirks.