Reproducible data science pipelines in production: a workshop with Tania Allard

Sunday, 16 June, 10:00 in room EB226

How many times have you developed a model or a data application and tested it locally, only to find that it breaks in production? How many times have you seen a data science project that is nearly ready for production, yet the closer it gets, the harder it becomes to answer the question: “Why did the model predict this?”

These are two common issues faced by many data scientists across the globe.

On one hand, explainability is essential for trusting a model and its predictions, and it helps prevent situations in which nobody understands why a prediction was made. On the other hand, as more and more data-intensive applications are created, there is a growing need for practical data operations and processes to improve the deployment of machine learning models.

In this workshop, you’ll learn how to level up your data science workflows with some practical DevOps! We will focus on improving the reliability and quality of your data applications, better preparing them for production deployment and consumption.

We will build an end-to-end machine learning pipeline, focusing on automated testing, integration, and delivery to increase its reliability.
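To give a taste of the automated-testing step, here is a minimal pytest-style sketch of testing individual pipeline steps. The functions and file name are hypothetical illustrations, not part of the official workshop material:

```python
# test_cleaning.py -- hypothetical example of unit-testing data-preparation
# steps in a pipeline; run with `pytest test_cleaning.py`.

def normalise_label(raw: str) -> str:
    """Hypothetical cleaning step: trim whitespace and lower-case a label."""
    return raw.strip().lower()

def impute_missing(values, fill=0.0):
    """Hypothetical cleaning step: replace None entries with a fill value."""
    return [fill if v is None else v for v in values]

def test_normalise_label():
    # Messy raw labels should come out in a canonical form.
    assert normalise_label("  Positive ") == "positive"

def test_impute_missing():
    # Missing values should be replaced, everything else left untouched.
    assert impute_missing([1.0, None, 3.0]) == [1.0, 0.0, 3.0]
```

Small, fast tests like these are what lets a continuous-integration service check every change to the pipeline automatically.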

What are the main takeaways? You will gain an understanding of DataOps and how it can improve your data science workflows. We will focus on model explainability without compromising accuracy. As we work through the examples, you will learn to identify the many challenges faced when putting data applications into production, and how these can be mitigated through good Ops practices. By the end of the workshop, you will have the knowledge required to automate the delivery of your data products, increasing your productivity and the quality of your work.

This workshop is suitable for both beginner and advanced Pythonistas.
The workshop is part of the PyData track.

The workshop will take 3 hours.

There will be a maximum of 30 attendees.

Prerequisites

You need some knowledge of Python and Git (a version control system). You might also need some basic data-manipulation knowledge to follow the tutorial content.

Requirements

Note that this is a bring-your-own-laptop event, and you need to make sure you have the following installed:

  • Python (3.6 or 3.7 recommended)
  • pytest
  • Git
  • A shell (you need to be able to use the command line)
  • repo2docker
  • A text editor (we recommend VS Code)
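Before the session, you may want to verify your setup. A small sketch of such a check in Python (the script name and the exact checks are our own suggestion, not an official workshop script):

```python
# check_setup.py -- hypothetical helper to spot missing workshop prerequisites.
import shutil
import sys

def check_setup():
    """Return a list of human-readable problems with the local setup."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or 3.7 is recommended")
    # shutil.which returns None when a command is not on the PATH.
    for tool in ("git", "pytest", "repo2docker"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on your PATH")
    return problems

if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```

An empty result means the command-line tools above were all found; otherwise, install the missing pieces before the workshop.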

Repository for the workshop


Tania Allard

I’m a Research Engineer and developer advocate with years of experience in academic research and industrial environments. My main areas of expertise are data-intensive applications, scientific computing, and machine learning.

I’m passionate about mentoring, open source and its community, and I'm involved in a number of initiatives aimed at building more diverse and inclusive communities. I'm also a contributor, maintainer, and developer of a number of open-source projects, and the founder of PyLadies NorthWest UK.

ixek trallard

Check out my talk “Jupyter notebooks: Friends or foes?”