Break into AI in the audio and music domain with open Python resources
Gopala Krishna Koduri (~gopala_krishna)
Description:
In this whirlwind BoF session, we will cover the ABCs of getting started on your journey into AI in the audio and music domain, using the many Python resources available in the open. The emphasis will be on signal processing libraries, open data sources and LLMs.
The participants in the discussion will share their own stories of how they have leveraged the existing open stack to build amazing solutions that went live.
The key takeaways for the audience will be: 1. A thorough overview of the audio-music tech landscape, with ample pointers to kick-start or further their journey. 2. A chance to get together with other people and companies that are enthusiastic about, and instrumental in, this space.
Some of the open resources we will be touching upon include:
- Essentia - a comprehensive suite of algorithms to process audio-music data. It is implemented in C++ and comes with Python bindings
- Sonic Visualiser - helps you visualize audio and its features. It supports plenty of Vamp plugins that you can use to process audio and visualize the outputs
- Open data sources including Freesound, AcousticBrainz, etc.
- Pretrained ML models (e.g. those from UPF, Barcelona, Spain)
- Open generative models (e.g. Stable Audio Open)
We will be focussing on real-world examples of possible applications around these resources that constitute the open AI stack.
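As a flavour of the signal-processing building blocks behind such applications, here is a minimal sketch in plain Python (standard library only) that estimates the pitch of a synthetic tone by autocorrelation. The function names, the sample rate and the frequency bounds are illustrative assumptions and are not taken from Essentia or any other resource listed above; those libraries ship far more robust pitch estimators.

```python
import math


def synth_tone(freq_hz, sample_rate=8000, duration_s=0.1):
    """Generate a sine tone as a list of float samples (hypothetical helper)."""
    n = round(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    Only lags corresponding to frequencies between fmin and fmax are searched.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        # The sum is left unnormalized, which favours the shortest period.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


tone = synth_tone(200.0)
print(f"Estimated pitch: {estimate_pitch(tone, 8000):.1f} Hz")
```

Real recordings are noisy and polyphonic, so a naive estimator like this one is only a teaching aid; it is meant to show the idea, not to replace a library algorithm.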
Prerequisites:
- Python programming language basics
- A ton of interest in creating music and audio applications
Content URLs:
We are planning to invite a few audio-music companies in India and abroad to participate in this session, each talking about how they leveraged this stack to build their applications.
A git repo with Jupyter notebooks will be shared with the audience before the session kicks off. The notebooks will serve as starting points for the audience to build similar applications.
Speaker Info:
Gopala is the founder and CEO of Musicmuni Labs, parent company of the popular AI-driven singing practice application, Riyaz. His passion to build Riyaz stems from the fact that he could not get a music education despite his strong interest in it, for various reasons at different stages of life, including cost, availability of teachers and time. He saw technology as the only means that can truly address this issue at scale. To this end, he pursued a PhD in music technology at the Music Technology Group (MTG), Universitat Pompeu Fabra (UPF), Barcelona, Spain. Prior to that, he earned master's and bachelor's degrees in computer science from IIIT - Hyderabad. The technology that powers Riyaz has its roots in the research that he and others carried out in the CompMusic project at MTG, UPF.
Speaker Links:
A couple of talks given on my PhD work: https://www.youtube.com/watch?v=noHdyfkzAm8&pp=ygUNZ29wYWxhIGtvZHVyaQ%3D%3D https://www.youtube.com/watch?v=CzlqU1nA4zg&pp=ygUNZ29wYWxhIGtvZHVyaQ%3D%3D
I have also spoken at the International Society for Music Information Retrieval (ISMIR) conferences (the most prestigious venue in this domain) in the past, among others. Here are the publication references:
- Serrà, Joan; Koduri, Gopala K.; Miron, Marius; Serra, Xavier. Assessing the Tuning of Sung Indian Classical Music. ISMIR, 157-162, 2011.
- Koduri, Gopala K.; Serrà, Joan; Serra, Xavier. Characterization of Intonation in Carnatic Music by Parametrizing Pitch Histograms. ISMIR, 199-204, 2012.
- Sordo, Mohamed; Serrà, Joan; Koduri, Gopala K.; Serra, Xavier. Extracting Semantic Information from an Online Carnatic Music Forum. ISMIR, 355-360, 2012.
- Koduri, Gopala Krishna; Miron, Marius; Serrà Julià, Joan; Serra, Xavier. Computational approaches for the understanding of melody in Carnatic Music. ISMIR 2011: Proceedings of the 12th International Society for Music Information Retrieval Conference; 2011 October 24-28; Miami, Florida (USA). Miami: University of Miami; 2011.