PyCon CZ

PyCon CZ 23
15–17 September

How to Build an Open‑Source Machine Learning Platform in Biology? A talk by Furkan M. Torun

Saturday 16 September 16:40 (30 minutes)

The global transition to personalized medicine has been accelerated by advances in high-throughput technologies in biology and by increasing computational power and storage capacity. The diverse and rapidly growing volume of complex biological datasets, collectively called “omics” data (e.g., genomics and proteomics), has revolutionized how we understand and interpret disease and health states, although it also brings new challenges.

Machine learning (ML) has shown promising results in revealing complex, hidden patterns in omics data and has opened new opportunities for scientific discovery. Popular packages such as scikit-learn or XGBoost enable predictive data analysis. However, researchers still need programming skills to write their own ML pipelines, and these pipelines are not always easy for non-specialists to follow, since non-specialists may lack the domain knowledge and the pipelines usually lack a graphical interface.

Furthermore, the algorithms expose many parameters that can be tuned, and their defaults and behavior may differ from version to version, which leads to reproducibility issues. To reproduce published results, the same software environment must be set up and configured with matching package versions and algorithm parameters.
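As a rough illustration of that bookkeeping, here is a minimal sketch, using only the Python standard library, of how a pipeline could record the Python version, package versions and algorithm parameters of a run. The function name `snapshot_run`, the package list and the parameter values are hypothetical examples, not OmicLearn's actual mechanism.

```python
import json
import platform
from importlib.metadata import PackageNotFoundError, version


def snapshot_run(packages, params):
    """Collect the Python version, package versions and model parameters of a run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)  # version of the installed distribution
        except PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "packages": versions,
        "params": params,
    }


# Hypothetical, illustrative settings only; these are not OmicLearn's defaults.
params = {"classifier": "XGBoost", "n_estimators": 100, "max_depth": 6, "random_state": 42}
record = snapshot_run(["scikit-learn", "xgboost", "pandas"], params)

# Serialize the record so it can be stored next to the published results.
record_json = json.dumps(record, indent=2, sort_keys=True)
print(record_json)
```

Saving such a record with the results (and pinning the listed versions, e.g. in a `requirements.txt`) makes it much easier for someone else to rebuild a matching environment.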

Additionally, both omics science and ML require specialized domain knowledge, since metrics can be misleading and algorithms may need extra preselection or preprocessing steps.
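To illustrate how a metric can mislead, here is a small worked example in plain Python with a hypothetical, imbalanced cohort of 95 healthy controls and 5 patients. A trivial "model" that always predicts the majority class reaches high accuracy while never detecting a single patient; balanced accuracy (the mean of the per-class recalls) exposes this. The cohort and the helper functions are illustrative assumptions; scikit-learn offers the same metric as `sklearn.metrics.balanced_accuracy_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so every class counts equally regardless of its size."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical cohort: 95 healthy controls (0) and 5 patients (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

print(f"accuracy:          {accuracy(y_true, y_pred):.2f}")           # 0.95
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.2f}")  # 0.50
```

The recall for the patient class is 0/5, so balanced accuracy drops to (1.0 + 0.0) / 2 = 0.5, the same as random guessing, even though plain accuracy looks excellent.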

Thus, transparent and open-source software is highly valuable for open and reproducible science. To address these issues and to give researchers access to state-of-the-art ML algorithms without requiring prior bioinformatics or programming knowledge, we introduce OmicLearn, a ready-to-use, open-source, web-based ML platform developed specifically for omics datasets.

In this talk, you will see how to build a machine learning platform from open-source tools and how to apply state-of-the-art algorithms to omics datasets in a standardized format.

What do you need to know to enjoy this talk

Python level

Medium knowledge: You use frameworks and third-party libraries.

About the topic

You used or did it just a few times.

Furkan M. Torun

I am a molecular biologist and geneticist with research experience and a programming background. After working as a computational biologist and data scientist at a rare disease research laboratory and at OmicEra Diagnostics, I currently work at a cancer biotech company as a Researcher and Data Scientist.

The ultimate goal of my work is to combine the power of computation with mysterious biological questions to reveal the unknown.

So, let's continue 🧬 debugging DNA software!