Football (soccer) data analysis: a pedagogic introduction

Indranil Ghosh (~indrag49)


Description:

This talk introduces the following concepts to those who want to start working on football data analysis:

  1. A common hurdle for newcomers is getting access to open football data. I will start by addressing this issue and showing how to get open-access football event data from the StatsBomb API using Python. [3 min]

  2. Next, I will show how to draw a football pitch with the mplsoccer Python module, which will serve as the canvas for most of our football data visualizations. [3 min]

  3. I will then cover simple data visualizations such as pass maps and their corresponding heat maps. [4 min]

  4. Next, I will show how to visualize the pass network of a particular team in a particular game on this pitch. We will then analyze the network with the NetworkX Python module, which is widely used for complex network analysis. We will learn how to compute each player's degree (passes made and received) and the team's degree distribution, identify the most central player by computing a centrality measure for each player node, and so on. [8 min]

  5. Finally, I will show how to apply computational-geometry concepts such as convex hulls, Voronoi diagrams, and Delaunay triangulations, using the Python package scipy.spatial, to open-access football event and tracking data. This lets us analyze, for example, how many passing options a player had at a particular instant of a game, or how a group of players divided up space on the pitch at that instant. [7 min]

  6. Q&A and references [5 min]
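
As an illustration of item 4, a pass network might be built and analyzed roughly as follows. This is a minimal sketch and not the talk's notebook code: the pass list is made up rather than taken from real event data, `build_pass_network` is a hypothetical helper, and degree centrality is only one of several centrality measures that could be used.

```python
import networkx as nx

# Hypothetical completed passes as (passer, receiver) pairs; not real match data.
passes = [
    ("GK", "CB"), ("CB", "CM"), ("CM", "CB"), ("CB", "CM"),
    ("CM", "ST"), ("CM", "LW"), ("LW", "ST"), ("CB", "LW"),
    ("GK", "CM"), ("LW", "CM"),
]


def build_pass_network(pass_pairs):
    """Build a directed graph whose edge weights count passes between two players."""
    graph = nx.DiGraph()
    for passer, receiver in pass_pairs:
        if graph.has_edge(passer, receiver):
            graph[passer][receiver]["weight"] += 1
        else:
            graph.add_edge(passer, receiver, weight=1)
    return graph


network = build_pass_network(passes)

# Weighted out-degree = passes made, weighted in-degree = passes received.
passes_made = dict(network.out_degree(weight="weight"))
passes_received = dict(network.in_degree(weight="weight"))

# Degree centrality: number of distinct passing links (in and out) per player,
# normalized by the number of other players in the network.
centrality = nx.degree_centrality(network)
most_central = max(centrality, key=centrality.get)

print("Passes made:", passes_made)
print("Passes received:", passes_received)
print("Most central player:", most_central)
```

With real data, the pass pairs would come from the event feed of a single team in a single match, and the same graph could be drawn on the pitch by placing each node at the player's average position.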

Many people want to get their hands dirty with football (soccer) data analysis. This talk is for those who are keenly interested in data science and its application to sports but have not yet found the right resources. The talk will be accompanied by Jupyter Notebooks with Python code arranged in a pedagogic flow, and anyone with a little Python experience who knows how to import packages will be able to follow it smoothly.
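
To give a flavor of the geometric tools in item 5, the sketch below applies `scipy.spatial` to a single frame of player positions. The coordinates are invented for illustration (metres on an assumed 105 x 68 pitch), and `delaunay_neighbours` is a hypothetical helper; treating Delaunay neighbours as candidate passing options is one simple modelling assumption, not a definitive method.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay, Voronoi

# Hypothetical (x, y) positions of six teammates in one frame; not real tracking data.
positions = np.array([
    [10.0, 34.0],
    [30.0, 15.0],
    [30.0, 55.0],
    [50.0, 30.0],
    [70.0, 20.0],
    [72.0, 48.0],
])

# Convex hull: the smallest convex polygon containing all players.
# For 2D input, `volume` is the enclosed area and `area` is the perimeter.
hull = ConvexHull(positions)
hull_area = hull.volume

# Delaunay triangulation: connects nearby players into triangles.
triangulation = Delaunay(positions)

# Voronoi diagram: assigns each player the region of the plane closest to them.
voronoi = Voronoi(positions)


def delaunay_neighbours(tri, player_index):
    """Return indices of the players sharing a Delaunay edge with the given player."""
    indptr, indices = tri.vertex_neighbor_vertices
    return sorted(int(i) for i in indices[indptr[player_index]:indptr[player_index + 1]])


print("Players on the hull:", sorted(int(i) for i in hull.vertices))
print("Area covered by the team (m^2):", hull_area)
print("Number of Delaunay triangles:", len(triangulation.simplices))
print("Delaunay neighbours of player 3:", delaunay_neighbours(triangulation, 3))
```

Regions of players on the convex hull are unbounded in a raw Voronoi diagram, so in practice they would need to be clipped to the pitch boundary before their areas are compared.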

Prerequisites:

  1. Beginner-level knowledge of importing and using the numpy, pandas and matplotlib packages, and
  2. A basic knowledge of undergraduate mathematics

Video URL:

https://youtu.be/xNblmcnwfwo

Speaker Info:

I am a first-year Ph.D. student in applied mathematics at the School of Fundamental Sciences, Massey University, NZ. My research is on dynamical systems and robust chaos. I have a master's in Physics from Jadavpur University, Kolkata, India. My main interests include dynamical systems, computational mathematics, optimization, quantum computing, and soccer analysis. I am fascinated by open-source software development and write code mostly in Python and R, and sometimes Fortran. I have developed QGameTheory, an open-source R package for working with the basics of quantum computing and game theory simulations. I love presenting what I have learned on national and international platforms and have presented at conferences on Python, R, open-source software, etc.

Speaker Links:

Personal Website

Twitter

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Intermediate
Last Updated: