Writing production-level Machine Learning code from the get-go with minimal effort
Lais Carvalho (~laisbsc)
Suppose that you are a Data Scientist. At the start of a new project, you wrote all the code in an experimental environment (your local laptop, using Jupyter notebooks) and everything is working. Great!
The client now asks you to deploy the project to the production environment. You know that this migration (from experimentation to deployment) will become another huge project: it involves rewriting lots of code, increases the risk of breaking things, and adds a significant time overhead. The process often culminates in something that does not work in the production environment, wasting weeks on debugging and rewriting code... Nightmare!

To mitigate this common issue, we developed Kedro. Kedro is an open-source Python library that helps you apply software engineering best practices to data and machine-learning pipelines. It shines when multiple data scientists are collaborating on a project and need a consistent, standard way of working together. In this talk, I will walk you through the basics of the library, show how it can be used in different environments, and show how it helps streamline your data pipeline workflows.
Kedro is an open-source Python framework that helps you apply software engineering best practices to data and machine-learning pipelines; it is sometimes described as the React of Data Science. We consider it the bridge between Machine Learning and Software Engineering: it introduces coding best practices and provides a boilerplate for building data pipelines that are robust, scalable, deployable, reproducible, and versioned. Kedro has applications in:
- Academia, for producing reproducible experiments;
- Enterprise projects that require a model running in production.
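To give a flavour of the node-and-pipeline idea that Kedro is built around, here is a minimal plain-Python sketch. This is an illustration of the concept only, not Kedro's actual API; all function and variable names here are made up for the example:

```python
# Sketch of the node-and-pipeline idea behind Kedro (illustrative only,
# NOT Kedro's API). A "node" is a pure function with named inputs and
# outputs; a "pipeline" wires nodes together via a catalog of datasets.

def clean(raw_rows):
    """Node: normalise raw text rows."""
    return [r.strip().lower() for r in raw_rows]

def count_words(rows):
    """Node: count words across all rows."""
    return sum(len(r.split()) for r in rows)

# A pipeline as an ordered list of (function, input_name, output_name)
# triples; data is addressed by name, so nodes stay decoupled.
pipeline = [
    (clean, "raw_rows", "clean_rows"),
    (count_words, "clean_rows", "word_count"),
]

def run(pipeline, catalog):
    """Sequentially execute each node, storing outputs back in the catalog."""
    for func, inp, out in pipeline:
        catalog[out] = func(catalog[inp])
    return catalog

catalog = {"raw_rows": ["  Hello World ", "Kedro PIPELINES are fun  "]}
result = run(pipeline, catalog)
print(result["word_count"])  # → 6
```

Because each node only declares named inputs and outputs, the same pipeline definition can run locally against in-memory data or, with a different catalog, against production data stores; this separation of I/O from logic is what makes the experimentation-to-production move cheap.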
The goal of this talk is to introduce the library and its features and show how it can be useful for virtually all use cases. I will go into detail about its components and show how it helps Data Scientists, Data Engineers, and Machine Learning Engineers collaborate seamlessly on their projects, writing code that quickly moves from the experimentation phase into production. I will also show how the visualisation tool, Kedro-viz, can help debug your pipeline code and showcase your project, and how the framework can be adopted on small local projects as well as enterprise-level team ones. The demo will show the usability of Kedro on different projects and walk through the different plugins that can be used in your workflow.
1. Intro [3 min]
2. What is Kedro [3 min]
3. Kedro-viz, the pipeline visualisation tool [2 min]
4. Library application on different use-cases [6 min]
5. Demo [6 min]
6. Summary & Conclusion [5 min]
7. Q&A [5 min]
Attendees need basic knowledge of Python (3+) and a basic grasp of data engineering fundamentals to fully appreciate this talk.
Overall, everyone is welcome to join.
My name is Lais Carvalho and I am a Developer Advocate for QuantumBlack.
I am in my final year of IT at CCT College Dublin, and I have a background in Civil & Environmental Engineering and customer service. I started my programming journey with Java but quickly switched to Python, where I established a wide presence in the Python community by managing remote events for PyData Dublin and Python Ireland.
I was also part of the organisation of EuroPython 2020 and of the first edition of FlaskCon in the same year.