PyCon CZ 23
15–17 September

How to actually serve your ML models a talk by Matěj Račinský

Friday 15 September 11:50 (30 minutes)

At Gen Digital and Avast, we use machine learning in production. Serving the models well, so that inference is fast and reliable, is non-trivial: there are many serving tools, and choosing the right one is hard.

I'll share my experience with serving a TensorFlow model with custom C++ code, where we tried TF Serving, Triton and TorchServe, and with serving Hugging Face Transformers in Vertex AI using gunicorn and FastAPI on both CPU and GPU, including how I benchmarked and debugged it.
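The gunicorn/FastAPI setup and the benchmarks themselves are the talk's own material; as a rough illustration of the kind of latency measurement involved, a minimal sketch in plain Python might look like the following (all names are hypothetical, and the stand-in `predict` takes the place of a real HTTP call to the serving endpoint):

```python
# Minimal latency-benchmark sketch (hypothetical, not the talk's code).
# In practice you would POST to the model server; a stand-in predict()
# keeps the example self-contained.
import statistics
import time


def predict(payload):
    # Stand-in for a real inference call (e.g. an HTTP request to FastAPI).
    time.sleep(0.001)
    return {"label": "ok", "input": payload}


def benchmark(fn, payload, n=100):
    """Call fn n times and return p50/p95 latencies in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * (n - 1))],
    }


if __name__ == "__main__":
    print(benchmark(predict, {"text": "hello"}))
```

Reporting percentiles rather than averages matters for serving workloads, since tail latency is usually what user-facing SLOs are written against.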

I'll also briefly talk about our experience with serving ML models in Julia.

Code samples and presentation can be found here:

What do you need to know to enjoy this talk

Python level

Medium knowledge: You use frameworks and third-party libraries.

About the topic

You have used it or done it just a few times.

Matěj Račinský

I studied Artificial Intelligence at FEE CTU and have never stopped working with it since. Currently I'm a researcher/ML engineer at Gen Digital, where I was part of a research group focused on ML in Julia; now I help other researchers get their ML models to production and investigate the best ways to do so.

Sometimes I contribute to OSS, either at work or in my free time. I'm also an otaku, so in my free time I watch anime and visit local anime conventions, where I give talks (not only) about AI relevant to the anime and manga community. I also help with the technical side of organizing anime conventions and maintain and develop the IT infrastructure of one local convention.
