
How I Created My Own xG Calculator

Updated on March 28, 2020
Hosanna Fukuzawa

Not certified, not qualified, not verified. Just a kid with too much time on his hands.

Introduction to xG (expected goals)

Before you read, if you’re not familiar with xG, go watch this video made by Tifo Football.

Gathering data points and information

I’ve always been interested in the technical and tactical analysis of the beautiful, fluid, multidimensional game, but I often felt there was a more effective way to understand it than listening to TV pundits analyze matches with metrics like possession or shots on target. Although a lot has changed and statistical analysis of soccer has become far more popular, I still wanted to compute a more accurate representation of the game, one that would do justice to the incredible tactical and individual ideas people have on the field.

Luckily, my ideas were similar to those of the people who created xG (expected goals), a better, more rational metric for measuring how likely individuals and teams are to score. When I heard of xG a couple of years ago, I was intrigued and immediately wanted to try replicating these numbers. But unlike major companies such as StatsBomb or Opta, I don’t have millions of data points from thousands of games to analyze. I also don’t have a team of qualified analysts building sophisticated software to make the most of my time. So how did I do it?


The simple answer is that I watched a lot of games. And I mean a lot.


I remember waking up at 3 AM in high school (back home in Japan) to catch a Premier League or Champions League game on a Reddit stream. But I didn’t just watch the games; I took notes on what happened: interesting plays, standout players, special goals and so on. I started expanding the leagues I watched: the J.League, MLS, USL, Liga MX and, of course, the ‘Big 5’ (Premier League, La Liga, Bundesliga, Serie A, Ligue 1). Coming to college in the US has also expanded my pool of data, as I started watching my own games as well as other NCAA games. Before I knew it, I had about four notebooks’ worth of sketches, notes and insights about different games, and even about training sessions that I took part in or watched.

Notes on Indy Eleven vs NY Red Bulls II

Started off simple

I also took advantage of free online resources, such as Understat.com, the free data provided by StatsBomb, and thousands of videos, articles and tweets about xG.

Once I set my sights on replicating the few ‘real’ numbers available, I started out very simple. I divided the field into 35 sections (as seen below) and, for each section, divided the number of goals scored by the number of shots taken.

The initial 35 sections
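The first version boils down to one conversion rate per section. Here is a minimal Python sketch, assuming each shot is logged as a (zone, is_goal) pair with sections numbered 1 to 35; the function name and the sample shots are made up for illustration and are not the author’s actual data or spreadsheet.

```python
from collections import defaultdict


def zone_conversion_rates(shots):
    """Return {zone: goals / shots} for an iterable of (zone, is_goal) pairs."""
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for zone, is_goal in shots:
        attempts[zone] += 1
        if is_goal:
            goals[zone] += 1
    # Only zones with at least one shot appear, so there is no division by zero.
    return {zone: goals[zone] / attempts[zone] for zone in attempts}


# Hypothetical shots: four from section 17 (two goals), one from section 3.
shots = [(17, True), (17, False), (17, False), (17, True), (3, False)]
rates = zone_conversion_rates(shots)
print(rates)  # {17: 0.5, 3: 0.0}
```

With so few shots per section, a single lucky goal can swing a rate wildly, which is exactly why this first version was so skewed.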

This was obviously skewed and a terrible metric on its own, so I wanted to add another variable that could affect the calculation: the part of the body the shot was taken with (dominant foot, weak foot, header or other). I calculated conversion rates for these categories the same way.
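Adding the body part amounts to computing the same rate per (section, body part) pair instead of per section. A hypothetical sketch, assuming shots are logged as (zone, body_part, is_goal) tuples and using the four categories above as labels:

```python
from collections import Counter

BODY_PARTS = ("dominant_foot", "weak_foot", "header", "other")


def rates_by_zone_and_body_part(shots):
    """Return {(zone, body_part): goals / shots} for (zone, body_part, is_goal) tuples."""
    attempts = Counter()
    goals = Counter()
    for zone, body_part, is_goal in shots:
        if body_part not in BODY_PARTS:
            raise ValueError(f"unknown body part: {body_part!r}")
        key = (zone, body_part)
        attempts[key] += 1
        goals[key] += int(is_goal)  # True counts as 1, False as 0
    return {key: goals[key] / n for key, n in attempts.items()}


shots = [
    (17, "dominant_foot", True),
    (17, "dominant_foot", False),
    (17, "header", False),
]
print(rates_by_zone_and_body_part(shots))
```

One trade-off: every extra variable splits the same shots into smaller buckets, so each rate rests on fewer shots and gets noisier.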

I then created two more variables in a similar fashion: pressure from defenders (Is it a 1v1? How many players does the shot need to get through? Is the player balanced or off balance?) and the pass received (positive pass, through ball on the ground, cutback, etc.). Using these four variables in a terribly organized Excel sheet, I created a very simple equation to calculate xG.
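The actual equation isn’t spelled out here, so the following is only one simple way four variables could be combined: start from a section’s conversion rate, multiply by an adjustment factor for each other variable, and clamp the result to the 0 to 1 range. Every category label and factor value below is invented for illustration.

```python
# Hypothetical adjustment factors relative to an "average" shot (1.0 = no change).
BODY_PART_FACTOR = {"dominant_foot": 1.0, "weak_foot": 0.8, "header": 0.6, "other": 0.5}
PRESSURE_FACTOR = {"none": 1.3, "light": 1.0, "heavy": 0.6}  # e.g. 1v1 vs. crowded box
PASS_FACTOR = {"through_ball": 1.2, "cutback": 1.3, "cross": 0.8, "none": 1.0}


def simple_xg(zone_rate: float, body_part: str, pressure: str, pass_type: str) -> float:
    """Scale a section's base conversion rate by one factor per variable, clamped to [0, 1]."""
    xg = (
        zone_rate
        * BODY_PART_FACTOR[body_part]
        * PRESSURE_FACTOR[pressure]
        * PASS_FACTOR[pass_type]
    )
    return min(max(xg, 0.0), 1.0)


# A weak-foot shot under heavy pressure after a cutback, from a 10% section.
print(simple_xg(0.10, "weak_foot", "heavy", "cutback"))  # about 0.0624
```

Multiplying factors treats the variables as independent, which real shots are not, so a model like this is a rough approximation at best.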

Obviously, these numbers are only accurate in the right context. Most of them are based on televised professional games, and they are also skewed by the high school and college games I have watched. Nevertheless, this process has taught me to watch the game from a whole new perspective.

I have kept adding to these variables. The field is now divided into 138 sections, each with its own value based on distance and angle from goal, with more sections inside the 18-yard and 6-yard boxes, more types and qualities of passes, and closer attention to outlier plays.

For example, I initially calculated Gareth Bale’s overhead kick in the Champions League final against Liverpool at 0.117. Common sense tells us that if 11.7% of overhead kicks from that spot went in, they wouldn’t be nearly as special. After refining my variables, my current calculation gives 0.098, or 9.8%. This still feels off, but I trust the creative instincts of players, who will keep attempting these outrageous shots and giving me more data points. I also recently read about a theory called ‘mean reversion’ in my economics class: the idea that prices and returns tend to revert to the mean in the long run.

Applying this concept, I hope that the more data I collect, the more accurate my numbers will become.
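Each of the 138 sections gets its value from distance and angle to goal. Both are easy to compute from a shot’s position. This sketch assumes coordinates in metres, with the goal line at x = 0, the centre of the goal at y = 0 and a regulation 7.32 m (8-yard) goal; the coordinate system and function name are assumptions, not necessarily how the sections are laid out. The angle is the one between the lines from the shot to each post, so it shrinks for shots that are far out or wide.

```python
import math

GOAL_WIDTH_M = 7.32  # regulation goal width (8 yards)


def distance_and_angle(x: float, y: float) -> tuple[float, float]:
    """x: metres out from the goal line; y: metres left/right of the goal's centre.

    Returns (distance to the centre of the goal in metres,
             angle between the two posts as seen from the shot, in degrees).
    """
    if x <= 0:
        raise ValueError("shot must be taken in front of the goal line")
    distance = math.hypot(x, y)
    half = GOAL_WIDTH_M / 2
    # atan2(cross, dot) of the vectors from the shot to each post.
    angle = math.atan2(GOAL_WIDTH_M * x, x * x + y * y - half * half)
    return distance, math.degrees(angle)


distance, angle = distance_and_angle(11, 0)  # penalty spot
print(f"{distance:.1f} m, {angle:.1f} degrees")
```

From the penalty spot (11 m out, centred) this gives 11 m and an angle of about 36.8 degrees; moving the same shot 10 m wide shrinks the angle considerably.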

Moving forward

Now, what can I do with these numbers?

Not much, really (haha). It was just fun to apply these numbers to different situations and note what happens. However, I would love to keep investing in this and apply it at the developmental level. xG mostly reflects attacking output, but by also tracking xG against and xA (expected assists), I think teams could develop players much more effectively.
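xG against and xA can both be built from the same per-shot xG values. One common convention, used by sites such as Understat, credits a player’s xA as the sum of the xG of the shots they set up, and a team’s xG against as the sum of the xG of shots it concedes. The Shot record, its field names and the sample values below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shot:
    team: str                # team taking the shot
    opponent: str            # team conceding the shot
    shooter: str
    assister: Optional[str]  # player who made the final pass, if any
    xg: float


def summarize(shots):
    """Return (xG for per team, xG against per team, xA per player)."""
    xg_for = defaultdict(float)
    xg_against = defaultdict(float)
    xa = defaultdict(float)
    for s in shots:
        xg_for[s.team] += s.xg
        xg_against[s.opponent] += s.xg
        if s.assister is not None:
            xa[s.assister] += s.xg  # the passer is credited with the shot's xG
    return dict(xg_for), dict(xg_against), dict(xa)


shots = [
    Shot("Team A", "Team B", "Player 1", "Player 2", 0.30),
    Shot("Team A", "Team B", "Player 2", None, 0.05),
    Shot("Team B", "Team A", "Player 3", "Player 4", 0.10),
]
xg_for, xg_against, xa = summarize(shots)
print(xg_for, xg_against, xa)
```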
