TMVA SOFIE: CERN's Fast Machine Learning Inference Engine

Sanjiban Sengupta (~sanjiban)


Description:

SOFIE, or the System for Optimized Fast Inference code Emit, is a recently developed fast machine learning inference engine from CERN, the European Organization for Nuclear Research in Geneva. Built on open standards and following the ONNX specifications, SOFIE aims to be an engine with minimal latency and few dependencies that can be easily integrated not only into CERN's experiments for physics analysis, but also into any major ML deployment service.
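
Because SOFIE follows the ONNX specifications, a model trained in an external framework only needs to be exported to ONNX before SOFIE can consume it. As a rough, hypothetical sketch (the layer sizes and file name are placeholders, not part of SOFIE itself), such a file could be produced from PyTorch like this:

    import torch
    import torch.nn as nn

    # A toy model: one dense layer followed by ReLU, the kind of
    # network SOFIE supported in its first release.
    model = nn.Sequential(nn.Linear(4, 3), nn.ReLU())
    model.eval()

    # Export to ONNX; the resulting file is what SOFIE's ONNX parser reads.
    dummy_input = torch.randn(1, 4)
    torch.onnx.export(model, dummy_input, "model.onnx")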

SOFIE has three major components (a short usage sketch follows the list):

  1. Parser: Responsible for translating models trained in external frameworks like PyTorch or TensorFlow into SOFIE's Intermediate Representation (IR).
  2. Model Storage: Once a model is parsed, it is kept in SOFIE's IR, which can be serialized into the .root format.
  3. Inference Code Generator: From the stored IR, the generator produces the inference code: a C++ header file containing an easily invokable function along with the model weights.
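
As a rough sketch of how these components fit together in practice, the snippet below parses an ONNX file and emits the C++ inference header through ROOT's Python bindings. The class and method names follow the ROOT/TMVA SOFIE tutorials, but exact signatures may differ across ROOT versions, and the file names are placeholders:

    import ROOT

    # Parser: translate the ONNX model into SOFIE's Intermediate Representation.
    parser = ROOT.TMVA.Experimental.SOFIE.RModelParser_ONNX()
    model = parser.Parse("model.onnx")

    # Inference Code Generator: emit a self-contained C++ header with an
    # invokable inference function and the model weights, which can then
    # be compiled into any C++ application.
    model.Generate()
    model.OutputGenerated("model.hxx")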

The SOFIE project started in 2021 as an engine supporting only basic models with Dense layers and the ReLU activation function; it now supports CNNs, RNNs, and even complex architectures such as Graph Neural Networks. Building on these recent advancements, SOFIE aims to add inference support for Transformers, GANs, and VAEs to enhance physics analyses that require machine learning.


Outline:

  • Introducing TMVA SOFIE
    • Motivation
    • Why does CERN need super-fast ML inference with minimal latency and few dependencies?
    • Why aren't frameworks like TensorFlow or PyTorch a good fit for ML inference at CERN?
  • SOFIE Architecture
    • Parser
    • Model Storage
    • Inference Code Generator
  • SOFIE Parser
    • ONNX Parser
    • Keras Parser
    • PyTorch Parser
  • SOFIE Inference Code Generator
  • SOFIE's Inference Support for Advanced Models
    • Graph Neural Networks
    • Dynamic Computation Graph
  • Inference on Accelerators
  • Demo
  • Future Goals

Prerequisites:

Intermediate knowledge of machine learning and the underlying mathematics will be helpful. The project is an ML inference engine developed in C++ with Python interfaces through the C-Python API, so a basic understanding of the relevant libraries will be beneficial. Familiarity with operations such as GEMM (general matrix multiplication) and ReLU, as well as with hardware accelerators, will be useful for following the latest developments of the project.
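
For reference, here is a minimal NumPy sketch (illustrative only, not SOFIE code) of the two operations named above; chained together they form the dense-layer-plus-ReLU pattern that SOFIE supported in its first release:

    import numpy as np

    def gemm(A, B, C, alpha=1.0, beta=1.0):
        # General matrix multiplication: alpha * (A @ B) + beta * C
        return alpha * (A @ B) + beta * C

    def relu(x):
        # Rectified Linear Unit: element-wise max(x, 0)
        return np.maximum(x, 0.0)

    # A dense layer followed by ReLU, expressed with the two primitives.
    x = np.random.rand(1, 4).astype(np.float32)   # input batch of one
    W = np.random.rand(4, 3).astype(np.float32)   # layer weights
    b = np.zeros((1, 3), dtype=np.float32)        # bias
    y = relu(gemm(x, W, b))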

Content URLs:

GitHub repo: https://github.com/root-project/root/tree/master/tmva/sofie
CERN Documentation for TMVA/SOFIE: https://root.cern/manual/tmva/#sofie
Blog Post introducing SOFIE: https://sanjiban.hashnode.dev/root-project-introducing-sofie

Presentations/Reports:

Speaker Info:

Sanjiban, currently an External User with the EP-SFT department at CERN, has been working in the open-source data science and engineering domain since his junior year of college in 2021. He was accepted into Google Summer of Code 2021 with CERN-HSF, where he worked on developing storage functionality for deep learning models. A year later, he was selected for the CERN Summer Student Program in Geneva, Switzerland, and worked on enhancing SOFIE. He was particularly involved in developing the Keras and PyTorch parsers, machine learning operators based on the ONNX standard, and support for Graph Neural Networks. He also volunteered as a mentor for Google Summer of Code contributors in 2022 and 2023, and for the CERN Summer Students of 2023, all working on CERN’s ROOT Data Analysis Project.

Previously, Sanjiban spoke at PyCon India 2023 about his work on developing Python interfaces for Meta's Velox Engine. He also presented his extended work on the Velox architecture at PyCon Thailand 2023.

Sanjiban finds hackathons and ideation events very interesting and has participated in many of them at different levels. He has also worked with various startups and corporations, gaining industrial experience along the way. During college, he served as the Vice Chair and then the Chair of the ACM Student Chapter of IIIT Bhubaneswar, and as the ML Head of various student technical societies.

His work on CERN's TMVA SOFIE Machine Learning Inference Engine has been presented in the talks listed under Speaker Links below.

Speaker Links:

Past talks:

  • Engineering Velox: the unified execution engine for varied data infrastructures
    PyCon Thailand 2023, Bangkok; December 2023
    Link to talk

  • PyVelox: Interfacing Python bindings for the unified execution engine by Meta
    PyCon India 2023, Hyderabad; October 2023
    Link to talk

  • TMVA SOFIE: Developing the Machine Learning Inference Engine
    CERN Student Sessions 2022, Geneva; August 2022
    Link to talk

  • ROOT Storage of Deep Learning models in TMVA
    CERN-HSF’s GSoC 2021 End of Program Presentation Series; August 2021
    Link to talk

GitHub Profile
LinkedIn Profile
Speaker's Personal website

Section: Artificial Intelligence and Machine Learning
Type: Talk
Target Audience: Intermediate
Last Updated: