How I Created My Own xG Calculator
Introduction to xG (expected goals)
Before you read, if you’re not familiar with xG, go watch this video made by Tifo Football.
Gathering data points and information
I’ve always been interested in the technical and tactical analysis of the beautiful, fluid, multidimensional game, but I often felt there had to be a more effective way to understand it than listening to pundits on TV break down a match with metrics like possession or shots on target. Although a lot has changed and statistical analysis of soccer has become widely popular, I was still interested in trying to compute a more accurate representation of the game, one that would elevate the incredible tactical and individual ideas people bring to the field.
Luckily, I had similar ideas to the people who created xG (expected goals), a better, more rational metric for measuring how likely individuals and teams are to score. When I first heard of xG a couple of years ago, I was intrigued and immediately wanted to try to replicate these numbers. But unlike major companies like StatsBomb or Opta, I don’t have millions of data points from thousands of games to analyze. I also don’t have a full team of qualified analysts building sophisticated software to speed up the work. So how did I do it?
The simple answer is that I watched a lot of games. And I mean a lot.
I remember that in high school (back home in Japan), I would wake up at 3 AM to catch a Premier League or Champions League game on a Reddit stream. But I didn’t just watch games; I took note of what happened: interesting plays, standout players, special goals and so on. I started to expand the range of games I watched to the J.League, MLS, USL, Liga MX and of course the ‘Big 5’ (Premier League, La Liga, Bundesliga, Serie A, Ligue 1). Coming to college in the US has also expanded my pool of data, as I started watching my own games as well as other NCAA games. Before I knew it, I had about 4 notebooks’ worth of sketches, notes and insights about different games, and even about training sessions that I was a part of or that I watched.
Started off simple
I also took advantage of free online data, such as Understat.com and the free data provided by StatsBomb, along with the thousands of videos, articles and tweets about xG.
Once I set my sights on replicating the few ‘real’ numbers available, I started out very simple. I separated the field into 35 different sections (as seen below) and just divided the number of goals scored from each section by the number of shots taken from it.
This was obviously heavily skewed and a terrible metric to follow, so I wanted to add a new dimension/variable to the calculation. I took into account which part of the body the shot was taken with: dominant foot, weak foot, header or other. I then computed the same goals-per-shot rate for each of these categories.
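As a rough illustration of the goals-divided-by-shots idea, here is a minimal Python sketch. The shot log, the zone numbers and the `conversion_rates` helper are all hypothetical and invented for this example; they are not my real notebook data.

```python
from collections import defaultdict

# Hypothetical shot log: (zone id, body part, scored?).
# Every row is invented purely for illustration.
shots = [
    (12, "dominant", True),
    (12, "dominant", False),
    (12, "weak", True),
    (12, "header", False),
    (27, "dominant", False),
    (27, "header", True),
    (27, "header", False),
    (27, "weak", False),
]


def conversion_rates(shots, key_index):
    """Return goals / shots for each distinct value in the chosen column."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for shot in shots:
        key = shot[key_index]
        attempts[key] += 1
        goals[key] += int(shot[2])  # True counts as 1 goal, False as 0
    return {key: goals[key] / attempts[key] for key in attempts}


zone_rates = conversion_rates(shots, 0)  # rate per field section
body_rates = conversion_rates(shots, 1)  # rate per body part
print(zone_rates)
print(body_rates)
```

With so few shots per category the rates jump around a lot, which is exactly why this first version was so skewed.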
I then created two more variables in a similar fashion: pressure from defenders (Is it a 1v1 situation? How many players does the shot need to go through? Is the player balanced or unbalanced?) and the pass received (positive pass, through ball on the ground, cutback etc.). Using these 4 variables in a terribly organized Excel sheet, I created a very simple equation to calculate xG.
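To give a feel for what a very simple equation over these 4 variables could look like, here is one possible form in Python: take the section's base rate and scale it by a factor for each of the other three variables. This multiplicative form, the category names and every number in the tables are assumptions made up for the sketch; they are not the values from my spreadsheet.

```python
# All values below are hypothetical placeholders, not real measurements.
ZONE_BASE_RATE = {1: 0.90, 12: 0.30, 27: 0.08}
BODY_PART_FACTOR = {"dominant": 1.0, "weak": 0.8, "header": 0.6, "other": 0.5}
PRESSURE_FACTOR = {"one_v_one": 1.2, "light": 1.0, "heavy": 0.6}
PASS_FACTOR = {"through_ball": 1.1, "cutback": 1.3, "cross": 0.8, "none": 1.0}


def estimate_xg(zone, body_part, pressure, pass_type):
    """Scale the zone's base rate by one factor per variable (assumed form)."""
    raw = (
        ZONE_BASE_RATE[zone]
        * BODY_PART_FACTOR[body_part]
        * PRESSURE_FACTOR[pressure]
        * PASS_FACTOR[pass_type]
    )
    # xG is a probability, so keep the result between 0 and 1.
    return min(max(raw, 0.0), 1.0)


example = estimate_xg(12, "dominant", "one_v_one", "cutback")
print(round(example, 3))
```

Multiplying factors is only one option; a weighted sum or a lookup table per combination would also fit the description, and each would need the factors tuned against the ‘real’ numbers.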
Obviously, these numbers are only accurate within the right context. A large part of them is based on televised professional games, and they are also skewed by the high school and college-level games I included. Nevertheless, this process has allowed me to watch the game from a whole new perspective.
I have continued to add to these variables. I now have the field divided into 138 different sections, each with its own value (considering the distance and angle from goal), with more sections inside the 18-yard and 6-yard boxes, various types and qualities of passes, and an increased awareness of outlying plays. For example, I initially calculated Gareth Bale’s overhead kick in the Champions League final against Liverpool as 0.117. However, common sense says that if 11.7% of overhead kicks from that spot went in, the goal wouldn’t have been as special. After adding to my variables, my current calculations give a value of 0.098, or 9.8%. This still feels off, but I trust the creative instinct of players, who will continue to attempt these outrageous shots and give me more data points.
I also recently learned about a theory called ‘Mean Reversion’ in my economics class, where prices/returns revert to the mean in the long run. Applying this concept, I hope that the more data I collect, the more accurate my numbers will become.
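In statistics, the intuition that a rate settles down as observations pile up is usually described as the law of large numbers rather than mean reversion. One rough way to see it is the standard error of a goals-per-shot rate, sqrt(p(1 - p) / n), which shrinks as the number of shots n grows. The sketch below uses the 0.098 estimate quoted above; the sample sizes are hypothetical, and the formula assumes every shot is an independent attempt with the same chance of scoring, which real shots are not.

```python
import math


def standard_error(rate, shots):
    """Standard error of a goals-per-shot rate estimated from `shots` attempts."""
    return math.sqrt(rate * (1.0 - rate) / shots)


rate = 0.098  # the overhead-kick estimate quoted above
for shots in (10, 100, 1000):  # hypothetical sample sizes
    print(shots, round(standard_error(rate, shots), 3))
```

The uncertainty only falls with the square root of the sample size, so rare shots like overhead kicks will stay noisy for a long time.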
Now, what can I do with these numbers?
Not much, really (haha). It was just fun for me to apply these numbers to different situations and take note of what happens. However, I would love to keep working on this and apply it at the developmental level. xG may be mostly associated with attacking play, but by also looking at xG against or even xA (expected assists), I think teams will be able to develop players much more effectively.