What is xG? And why it tells you so much about a football game

Shreyas Khatri
Mar 31, 2021


A love of data science, mathematics and football will eventually lead you to explore how data and mathematical models can be applied to the sport we all love. The data revolution in football is well underway: Opta, founded in 1996, is a leading pioneer, and so is Ted Knutson’s StatsBomb. Gigabytes of free match and event data are available online, and amateur enthusiasts are flocking to GitHub to make use of them. Then there are the accounts and findings of Simon Kuper and Stefan Szymanski in their fantastic book ‘Soccernomics: Why England Loses’, as well as Christoph Biermann’s ‘Football Hackers’. Chris Anderson, who set out to be football’s Billy Beane, also wrote a great account of his findings in ‘The Numbers Game’ with David Sally. And David Sumpter, Professor of Applied Mathematics at Uppsala University, wrote ‘Soccermatics: Mathematical Adventures in the Beautiful Game’ to show how the modern game is filled with numbers, patterns and shapes, and how mathematical models can make sense of them.

Matthew Benham, professional gambler and owner of Brentford FC and FC Midtjylland, and a pioneer of using mathematical models in football

For those unfamiliar with xG, I’ll try to simplify it. xG, or ‘Expected Goals’, was invented by Englishman Sam Green, who first described the idea in 2012 while working at OptaPro. xG measures the probability that a shot will result in a goal based on several factors, including the distance from which the shot was taken, the angle to the goal, the game state (the current score), whether the shot was a header, whether it came during a counter-attack, and others. When a striker misses from 5 yards out or in front of an empty net, he is criticized far more harshly than when he skies a shot from an acute angle or from 25 yards, because we intuitively assign a higher goal probability to the former cases than to the latter. An ‘oomph’ resounded when Paul Scholes belted a 30-yarder against Barcelona in the 2008 Champions League semi-final, a goal that seemed one for the ages. But mathematical models can now assign a probability to that attempt: 3 per cent. That means that if an average player took that shot 100 times, he would be expected to score only about 3 times. Paul Scholes being Paul Scholes, the probability would be higher for him individually, but xG doesn’t account for the individual shooter.
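To make the idea concrete, here is a minimal sketch of a shot-level model. It assumes a logistic-regression form (a common choice for public xG models) and uses only two of the factors named above, distance and angle, plus a header flag. The coefficients and function names are invented for illustration and are not taken from Opta, Understat or any real model; a real model would fit the coefficients to many thousands of past shots. Coordinates are in metres, with the centre of the goal at the origin.

```python
import math

# Invented coefficients for illustration only; a real model fits these
# to many thousands of historical shots.
INTERCEPT = -0.4
DIST_COEF = -0.125   # per metre from the centre of the goal
ANGLE_COEF = 1.5     # per radian of goal mouth visible to the shooter
HEADER_COEF = -1.0   # assumption: headers are harder to score than shots with the foot

GOAL_WIDTH = 7.32    # metres between the posts


def shot_geometry(x: float, y: float) -> tuple[float, float]:
    """Distance (m) and visible goal angle (rad) for a shot taken x metres
    out from the goal line and y metres to the side of the goal's centre."""
    distance = math.hypot(x, y)
    half = GOAL_WIDTH / 2
    # Angle between the lines from the shot location to each post.
    angle = math.atan2(x * GOAL_WIDTH, x * x + y * y - half * half)
    return distance, angle


def shot_xg(x: float, y: float, header: bool = False) -> float:
    """Probability that a shot from (x, y) is scored, under this toy model."""
    distance, angle = shot_geometry(x, y)
    z = INTERCEPT + DIST_COEF * distance + ANGLE_COEF * angle
    if header:
        z += HEADER_COEF
    # The logistic function maps any real score z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(f"penalty spot:  {shot_xg(11, 0):.2f}")
    print(f"30 yards out:  {shot_xg(27.4, 0):.2f}")
    print(f"header at 6 m: {shot_xg(6, 0, header=True):.2f}")
```

With these made-up coefficients, a central shot from the penalty spot comes out at about 0.3 and a central shot from about 30 yards (27.4 m) at about 0.03; a fitted model would produce its own numbers.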

xG helps us predict the expected results of games and is a far more reliable predictor than head-to-head records or a team’s recent form, which have been dismissed as noise in the data. Those who gamble on football matches already use such mathematical models to bet on games.
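One simple way to turn shot-level xG into match-outcome probabilities is sketched below; it is not the method of any particular bookmaker or website. It treats every shot as an independent event that scores with probability equal to its xG, works out each team’s full distribution of goals (a Poisson binomial distribution), and compares the two. Shots within a match are not truly independent, so this is a simplification, and the function names and shot lists are hypothetical.

```python
def goal_distribution(shot_xgs):
    """Probability of scoring exactly k goals (k = 0..n) from n shots,
    treating each shot as an independent event that scores with probability xG."""
    dist = [1.0]  # before any shots: 0 goals with certainty
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # this shot is missed
            new[goals + 1] += prob * p     # this shot is scored
        dist = new
    return dist


def match_outcome(home_xgs, away_xgs):
    """Return (home win, draw, away win) probabilities from each side's shot xGs."""
    home = goal_distribution(home_xgs)
    away = goal_distribution(away_xgs)
    win = draw = loss = 0.0
    for h, p_h in enumerate(home):
        for a, p_a in enumerate(away):
            if h > a:
                win += p_h * p_a
            elif h == a:
                draw += p_h * p_a
            else:
                loss += p_h * p_a
    return win, draw, loss


if __name__ == "__main__":
    # Hypothetical shot lists: a few big chances versus many small ones.
    home_win, draw, away_win = match_outcome([0.76, 0.35, 0.09], [0.1] * 13)
    print(f"home {home_win:.2f}  draw {draw:.2f}  away {away_win:.2f}")
```

Simulating each shot as a random coin flip many times would give approximately the same probabilities; computing the distribution exactly, as here, avoids the simulation noise.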

An xG map illustrating zones with different goal-scoring probabilities

Since xG is a probability, like any probability it ranges from 0 to 1. In practice, though, it never equals either endpoint: however difficult a chance, there is always some possibility of scoring from it, and however easy a chance, there is always some possibility of missing it. The xG value of any goalscoring opportunity therefore lies in the open interval (0, 1).
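If an xG model produces its values through a logistic function, a common choice for shot models and assumed here purely for illustration, the open interval follows directly from the formula. Here z is a weighted score built from the shot’s features, with hypothetical weights β:

```latex
\begin{aligned}
\mathrm{xG} &= \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1\,\mathrm{distance} + \beta_2\,\mathrm{angle} + \cdots \\
e^{-z} > 0 &\;\Longrightarrow\; 1 + e^{-z} > 1 \;\Longrightarrow\; 0 < \sigma(z) < 1 \quad \text{for every finite } z
\end{aligned}
```

σ(z) gets ever closer to 0 as z falls and ever closer to 1 as z rises, but never reaches either endpoint.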

Still, xG is only a mathematical approximation of the game, produced by an algorithm. It may help establish the expected result of a game, but football’s complex, dynamic and low-scoring nature means the actual result and the expected result can differ considerably, especially over small samples. Like any other algorithm, xG is nothing more than a set of instructions designed to solve a problem; it takes a human interested in making the best use of the data (for example, to occupy the best positions to score a goal or to stop one) to crunch the numbers. The reliability of the findings depends on the quantity and quality of the data, on how the algorithm is designed, and on which parameters are chosen. This is why several different xG models exist, each producing different but broadly similar values.
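A rough way to see why small samples mislead is to treat a team’s n shots as independent events (a simplifying assumption), with shot i scoring with probability p_i equal to its xG. The number of goals G then has the mean and variance on the first line below; the second line is an invented illustration of 13 chances worth 0.1 xG each:

```latex
\begin{aligned}
\mathbb{E}[G] &= \sum_{i=1}^{n} p_i, \qquad \operatorname{Var}(G) = \sum_{i=1}^{n} p_i\,(1 - p_i) \\
n = 13,\; p_i = 0.1:\quad \mathbb{E}[G] &= 1.3, \qquad \operatorname{Var}(G) = 13 \times 0.1 \times 0.9 = 1.17, \qquad \sqrt{1.17} \approx 1.08, \qquad P(G = 0) = 0.9^{13} \approx 0.25
\end{aligned}
```

A side ‘expected’ to score 1.3 goals still fails to score about a quarter of the time, and the spread of about 1.08 goals is nearly as large as the expected total, so a single match can easily end far from its xG.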

(An awesome site for such xG data for the top 5 leagues: https://understat.com/)

Besides the predicted result, xG can also tell the story of how the game went. xGPlots (Expected Goals Plots) are laid out along a horizontal 90-minute timeline and present the game’s narrative in chronological order, which adds to our understanding of how best to use Expected Goals. One example of an xGPlot is:

xGPlot for the Manchester Derby on 7th March 2021

I created the above xGPlot by scraping data from https://understat.com/match/14700 and writing Python code, which is available in a GitHub repository at https://github.com/shreyas7kha/xGPlotsUnderstat.
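The repository above holds the actual code; what follows is not that code but an independent minimal sketch of the two steps involved: pulling shot data out of a match page, and turning it into the running totals an xGPlot draws. It assumes that an Understat match page embeds its shots as `shotsData = JSON.parse('...')` with characters written as `\xNN` hex escapes, and that the decoded object holds `h` (home) and `a` (away) lists of shots with `minute` and `xG` fields. These are assumptions about a third-party page that may change, and `extract_embedded_json`, `team_shots` and `cumulative_xg` are hypothetical names.

```python
import json
import re


def extract_embedded_json(html: str, var_name: str):
    """Find `var_name = JSON.parse('...')` in a page and return the decoded data.

    Assumes the payload is a single-quoted string whose special characters
    are written as \\xNN hex escapes.
    """
    pattern = re.escape(var_name) + r"\s*=\s*JSON\.parse\('(.*?)'\)"
    match = re.search(pattern, html)
    if match is None:
        raise ValueError(f"{var_name!r} not found in page")
    # Replace each \xNN escape with the character it encodes, then parse as JSON.
    raw = re.sub(r"\\x([0-9A-Fa-f]{2})",
                 lambda m: chr(int(m.group(1), 16)),
                 match.group(1))
    return json.loads(raw)


def team_shots(data, side):
    """Hypothetical helper: (minute, xG) pairs for side 'h' or 'a'.

    The 'h'/'a', 'minute' and 'xG' keys are assumptions about the payload;
    int()/float() accept the values whether they arrive as strings or numbers.
    """
    return [(int(s["minute"]), float(s["xG"])) for s in data[side]]


def cumulative_xg(shots, full_time=90):
    """Turn (minute, xG) pairs into step-plot coordinates.

    Returns two lists, minutes and running xG totals: a point at kick-off,
    one per shot, and a final point at full time (or the last shot, if later).
    """
    minutes, totals, running = [0], [0.0], 0.0
    for minute, shot_value in sorted(shots):
        running += shot_value
        minutes.append(minute)
        totals.append(running)
    minutes.append(max(full_time, minutes[-1]))
    totals.append(running)
    return minutes, totals
```

In practice you would fetch the page with an HTTP client such as `requests.get(url).text`, pass it to `extract_embedded_json(html, "shotsData")`, and draw each team’s line with `matplotlib.pyplot.step(minutes, totals, where="post")`, so the line stays flat between shots and jumps at each one.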

My code is basic, and professional xGPlots are more informative, but it serves as an example. Gabriel Jesus’ first-minute foul on Anthony Martial gave Manchester United a penalty, which Bruno Fernandes duly converted. Although City dominated possession, they created no substantial goalscoring opportunities, and their woes deepened in the 49th minute when Luke Shaw doubled United’s lead.

The xGPlot shows how United’s xG jumped sharply because of the early penalty (penalties are converted at around a 76% rate, so each penalty is assigned an xG of 0.76). City’s line rose in many small increments, reflecting numerous but low-probability chances. Luke Shaw scored from a relatively low-xG shot, while United missed two further high-xG opportunities, thanks to poor finishing from Anthony Martial and good goalkeeping from Ederson. In all, United created fewer but better chances, while City created plenty of low-probability ones, apart from one or two high-probability spikes. The non-penalty xG for the game was Man City 1.28–1.35 Man United, a fair reflection of how evenly poised the game looked to neutrals after the early penalty.
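Non-penalty xG (npxG) simply leaves penalties out of the total, so under the 0.76-per-penalty convention described above, each penalty accounts for exactly 0.76 of the gap between xG and npxG. A minimal sketch, using a hypothetical `Shot` record rather than any real data source’s schema, and illustrative shots rather than the real match data:

```python
from dataclasses import dataclass

PENALTY_XG = 0.76  # the per-penalty value described in the text


@dataclass
class Shot:
    """Hypothetical shot record; real data sources use their own schemas."""
    minute: int
    xg: float
    is_penalty: bool = False


def xg_totals(shots):
    """Return (total xG, non-penalty xG) for one team's shots."""
    total = sum(s.xg for s in shots)
    non_penalty = sum(s.xg for s in shots if not s.is_penalty)
    return total, non_penalty


if __name__ == "__main__":
    # Illustrative shots, not the real match data.
    shots = [Shot(2, PENALTY_XG, is_penalty=True), Shot(49, 0.09), Shot(70, 0.30)]
    total, npxg = xg_totals(shots)
    print(f"xG {total:.2f}, npxG {npxg:.2f}")
```

By the same arithmetic, if the early penalty was United’s only one, their total xG for the derby would be 1.35 + 0.76 = 2.11.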

Luke Shaw doubling United’s lead in the 49th minute from a shot worth 0.09 xG

In conclusion, the data revolution in football is here to stay. It has taken its time, but some clubs now run large data analytics teams of over 20 analysts, recruited from some of the best universities and without any prior footballing background. Clubs like Brentford and FC Midtjylland, owned by Smartodds owner Matthew Benham, are already among the best-run clubs in the world, thanks to their extensive use of mathematical models and algorithms to rate their own team and the opposition. Data analytics and ‘Moneyball’ thinking have helped financially weaker clubs like Leicester City and Montpellier to fairytale league titles. Pep Guardiola uses xG to get his players into the best goalscoring areas, and his ‘Centurions’, the Manchester City side that reached 100 points in 2017–18, did not score a single goal from outside the box in the first half of that season. Jürgen Klopp has recruited throw-in specialist Thomas Grønnemark to teach his players to throw the ball longer and better. We’re embarking on a data revolution in the beautiful game, and it is only going to get more complex and statistical.
