Building Super Bots with Python and OpenVINO™: Leveraging Multimodal AI for Vision, Audio, and Text.

Anisha Udayakumar (~AnishaUdayakumar)




Deploying complex multimodal AI models on both edge and client devices presents significant technical challenges due to their computational and resource demands. This session explores the cutting-edge of AI development with this deep dive into building intelligent 'Super Bots' that can process and interpret vision, audio, and text data seamlessly. Utilizing Python and Intel’s OpenVINO™ toolkit, alongside the powerful capabilities of Multimodal AI (LLaVA-NeXT), this session demonstrates the creation and deployment of AI systems that can operate intelligently across various sensory inputs. We'll also demonstrate how Intel's OpenVINO™ toolkit optimizes these models, enabling their efficient deployment in real-world, real-time applications across diverse environments.

We will delve into the complexities of deploying resource-intensive Large Language Models (LLMs) and visual generative AI, highlighting the seamless integration of Python with OpenVINO™ to streamline this process.

The Talk will specifically highlight:

  • Python’s Ecosystem in AI Development: Emphasizing how Python, supported by libraries such as NumPy, Matplotlib, and Hugging Face's Transformers, is essential for developing AI models and managing their lifecycle from prototype to deployment.
  • Optimizing Multimodal Models: Addressing the unique challenges of multimodal AI, including synchronizing varied data types and reducing computational load. Explore OpenVINO's role in model compression, quantization, and efficient inference.
  • Deployment Strategies for Edge and Client Devices: Discussing strategies to deploy optimized models effectively on both edge and client devices. This includes practical considerations for maximizing performance and maintaining data privacy in environments where quick response times are critical.
  • Live Demonstration of Smart Doc Detective using Pix2Struct: Unveil the power of our Document Visual Question Answering system that acts like a "Smart Doc Detective" powered by the Pix2Struct model. This fun, interactive tool dives into document images, extracting and interpreting information on the fly to answer your questions about any textual or visual detail. Watch how this AI detective deciphers the contents of complex documents, showcasing real-time AI capabilities optimized with OpenVINO™ on edge and client devices.
  • Interactive Chatbot with LLaVA-NeXT: Dive into the world of advanced multimodal interactions with our interactive chatbot powered by LLaVA-NeXT. This demonstration will illustrate how the model integrates vision, audio, and text to understand and respond to complex user queries in real-time. See firsthand how LLaVA-NeXT processes multimodal inputs to deliver cohesive and context-aware responses, exemplifying the next level of intelligent AI systems.

Talk is For: This talk is designed for developers, AI enthusiasts, data scientists, and technology innovators keen on harnessing the latest advancements in AI to build applications that are as versatile as they are powerful. Attendees should have a basic understanding of Python and AI/ML concepts, as well as an interest in edge computing.


  • Familiarity with Python and basic concepts of AI/ML.
    • An understanding of edge computing principles.
    • For the hands-on demo, prior installation of the toolkit is recommended. Follow the installation guide here: Installation Guide

Video URL:

Speaker Info:

Anisha is an AI Software Evangelist at Intel, specializing in the OpenVINO™ toolkit. With a solid background as an Innovation Consultant at a leading Indian IT services and consulting firm, she has adeptly steered business leaders towards harnessing emerging technologies for forward-thinking business solutions. Her expertise encompasses AI, Extended Reality, and 5G, where she has developed innovative solutions to meet diverse business challenges. Her passion particularly lies in the realm of computer vision, where she has excelled in devising computer vision-based solutions. One of her notable contributions includes developing vision-based algorithmic solutions that have significantly aided in achieving sustainability goals for a global retail client. At Intel, Anisha is dedicated to enriching the developer community. She illuminates the capabilities of the OpenVINO toolkit, aiding developers in elevating their AI projects. Her role involves actively engaging with developers, enhancing their understanding and application of OpenVINO in crafting cutting-edge AI solutions. A lifelong learner and an ardent innovator, Anisha is enthusiastic about exploring and sharing the transformative impact of technology, continually inspiring the developer community with her insights and discoveries.

Talks: IOTShow 2024 AI For Everyone

Section: Artificial Intelligence and Machine Learning
Type: Talk
Target Audience: Beginner
Last Updated: