sktime – Python toolbox for time series: pipelines and benchmarking
A workshop with Franz Kiraly & Benedikt Heidrich
sktime is a widely used, scikit-learn-compatible library for learning with time series. It is easily extensible by anyone and interoperable with the PyData/NumFOCUS stack. This tutorial introduces advanced forecasting techniques using sktime. Its objectives are:
- Learn how to create forecasting pipelines that integrate forecasters and feature extraction methods.
- Explore more advanced forecasting scenarios and realise them using graphical pipelines and auto-ML.
- Learn how to evaluate the performance of forecasters.
- Learn how benchmarks can be created, similar to M4/M5 competitions.
Additionally, this tutorial introduces reproducibility tools, such as auditable storage of model blueprints and fitted models, together with a methodological primer.
It is structured to cover the following topics:
Basic forecasting pipelines
- transformations of endogenous and exogenous data
- common feature sets: lags, window summaries, dates, holidays
- feature union and subsetting
- combination with tuning and reduction
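To make the "lags" and "window summaries" feature sets above concrete, here is a minimal plain-Python sketch. It does not use sktime's API; the function name `lag_window_features`, its parameters, and the toy series are assumptions chosen purely for illustration.

```python
from statistics import mean


def lag_window_features(y, lags=(1, 2), window=3):
    """Build one feature row per time index that has enough history.

    Hypothetical helper for illustration only, not part of sktime.
    """
    start = max(max(lags), window)  # first index with full history
    rows = []
    for t in range(start, len(y)):
        # lag features: the value observed k steps before t
        row = {f"lag_{k}": y[t - k] for k in lags}
        # window summary: mean of the `window` values strictly before t
        row[f"mean_{window}"] = mean(y[t - window:t])
        row["target"] = y[t]
        rows.append(row)
    return rows


# toy series, made up for the example
y = [10, 12, 14, 13, 15, 18]
features = lag_window_features(y)
for row in features:
    print(row)
```

Each row uses only values observed before the target's time index, which is the property that lets such features feed a tabular regressor without leaking future information.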
Advanced forecasting pipelines
- multiplexing, auto-ML
- graphical pipelines
- pipeline diagnostics
- persisting pipelines
Performance evaluation
- metrics for point forecasts
- metrics for probabilistic forecasts
- using metrics for multi-instance, hierarchical data
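As a rough illustration of the difference between point and probabilistic metrics listed above, the sketch below implements two common point-forecast metrics (MAE, MAPE) and the pinball loss for a single quantile forecast. It is plain Python written for this outline, not sktime's implementation; the function names and numbers are assumptions for the example.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation of point forecasts."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute deviation relative to the true value (as a fraction)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(y_true, y_pred)) / len(y_true)


def pinball_loss(y_true, y_quantile, alpha):
    """Pinball (quantile) loss for forecasts of the alpha-quantile."""
    total = 0.0
    for a, q in zip(y_true, y_quantile):
        diff = a - q
        # under-prediction is weighted by alpha, over-prediction by 1 - alpha
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


# toy values, made up for the example
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]   # point forecasts
y_q90 = [120.0, 190.0, 420.0]    # forecasts of the 0.9 quantile

mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
pb = pinball_loss(y_true, y_q90, alpha=0.9)
print(mae, mape, pb)
```

The pinball loss penalises a 0.9-quantile forecast nine times more for falling below the observed value than for exceeding it, which is why it suits quantile forecasts rather than point forecasts.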
Full benchmarks
- single dataset and multiple dataset benchmarks
- datasets, dataset collections
- serializing model blueprints and fitted models
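The idea of a multiple-dataset benchmark can be sketched as a loop over datasets and forecasters with a simple holdout split. This is a conceptual plain-Python sketch only; it does not use sktime's benchmarking module, and the names `run_benchmark`, `naive_last`, `naive_mean` and the toy series are hypothetical.

```python
from statistics import mean


def naive_last(train, horizon):
    """Forecast the last observed value for every future step."""
    return [train[-1]] * horizon


def naive_mean(train, horizon):
    """Forecast the training mean for every future step."""
    return [mean(train)] * horizon


def mae(y_true, y_pred):
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def run_benchmark(datasets, forecasters, horizon):
    """Score every forecaster on every dataset using the last `horizon` points as test set."""
    results = {}
    for data_name, series in datasets.items():
        train, test = series[:-horizon], series[-horizon:]
        for model_name, forecaster in forecasters.items():
            y_pred = forecaster(train, horizon)
            results[(data_name, model_name)] = mae(test, y_pred)
    return results


# toy series, made up for the example
datasets = {"trend": [1, 2, 3, 4, 5, 6], "flat": [5, 5, 5, 5, 5, 5]}
forecasters = {"last": naive_last, "mean": naive_mean}

results = run_benchmark(datasets, forecasters, horizon=2)
for key, score in sorted(results.items()):
    print(key, score)
```

Keeping datasets and forecasters in named collections means every (dataset, forecaster) pair is scored under the same split and metric, which is what makes the resulting table comparable across entries.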
The tutorial sheds light on certain experimental aspects of the benchmarking module and the graphical pipeline, highlighting opportunities for contributions and improvements. Developed collaboratively by an inclusive community, sktime aims to foster ecosystem integration within a neutral and charitable sphere.
The tutorial welcomes engagement, encouraging contributions from individuals across the world.
Requirements
Bring your own laptop.
The tutorial notebooks can be run on binder.
Alternatively, participants can run the notebooks from a clone of the tutorial repository on their local laptop.
What do you need to know to enjoy this workshop
Python level
Medium knowledge: You use frameworks and third-party libraries.
About the topic
No previous knowledge of the topic is required; basic concepts will be explained.
Benedikt Heidrich
I completed a Master of Science degree in informatics in 2019 at the Karlsruhe Institute of Technology. I am working towards a PhD in informatics, which I will finish this year. My research focuses on using deep generative models in energy systems and coping with concept drift in energy time series forecasting.
Additionally, I investigate how a general pipeline architecture has to be designed for time series analysis tasks.