
FIFA World Cup 2018: A not-so-artificially-intelligent predictor

JUN 20, 2018:
It's been close to a week since FIFA World Cup 2018 started. We have already seen almost as many upsets as there were fancy advanced models pre-tournament trying to predict the outcomes. 
There has been a flurry of articles and scholarly papers using Artificial Intelligence and Machine Learning to predict the tournament results this year. Included in that list are the usual suspects such as FiveThirtyEight and bookmakers, as well as unlikely participants such as Goldman Sachs and Cornell University (maybe not that unlikely).

Inspired by these articles, some heated arguments, and a few cups of coffee, and armed with my trusted Microsoft Excel, I set about creating my own not-so-intelligent predictor. It's a fairly simple model that runs Monte Carlo simulations and uses some hard-coded inputs on pre-tournament form, such as:
  • FIFA Ranking Points
  • Football ELO Ratings
  • Goals Scored, Goals Conceded and Undefeated streaks since World Cup 2014
  • Number of Ballon d'Or winners in the team
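
For readers who would rather see the idea outside a spreadsheet, here is a minimal Python sketch of a Monte Carlo knockout simulation. It is not the Excel model: it assumes a single Elo-style rating per team instead of the blend of inputs listed above, the team names and ratings are made up, and the function names are hypothetical.

```python
import random

def win_probability(rating_a, rating_b):
    """Standard Elo expected score for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def simulate_knockout(teams, ratings, rng):
    """Play one single-elimination bracket; teams are paired in list order."""
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            # Draw a random number to decide the tie (no draws in knockouts).
            a_wins = rng.random() < win_probability(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        alive = next_round
    return alive[0]

def title_odds(teams, ratings, n_runs=10_000, seed=0):
    """Repeat the bracket n_runs times and count how often each team wins."""
    rng = random.Random(seed)
    wins = {team: 0 for team in teams}
    for _ in range(n_runs):
        wins[simulate_knockout(teams, ratings, rng)] += 1
    return {team: wins[team] / n_runs for team in teams}

# Illustrative four-team bracket with invented ratings (an assumption).
ratings = {"A": 2100, "B": 2000, "C": 1900, "D": 1800}
odds = title_odds(list(ratings), ratings)
```

Hard-coding a known result, as described later in the post, would amount to skipping the random draw for that match and advancing the actual winner.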

I started on this passion project on Monday night and ran some vanilla simulations to see how different the results are from the other advanced models (spoiler alert: they are not, really). And maybe not everything in the world needs a machine learning tadka.

If I had been living under a rock over the past week and didn't know the results of the matches already played, based on the model I would have said Brazil has the best chance to win the World Cup by defeating Germany in the Final. The overall probability for Brazil is 12% (which is significantly higher than the roughly 3% chance they would have had if all matches were decided by coin tosses).
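
The coin-toss baseline is just symmetry: if every match is a fair coin toss, each of the 32 teams is equally likely to lift the trophy. A quick check of the arithmetic, with variable names that are only illustrative:

```python
N_TEAMS = 32                      # teams at the 2018 World Cup
coin_toss_chance = 1 / N_TEAMS    # every team equally likely by symmetry
model_chance = 0.12               # Brazil's pre-tournament odds from the model

# How many times more likely the model makes Brazil than pure chance.
lift = model_chance / coin_toss_chance
```

So 12% is a little under four times the coin-toss figure of about 3.1%.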

The knockout stage would have probably looked like this:

The really interesting part comes when you start coding in the results as the matches happen, and see how the draw changes. After Match Day 01, which ended with the Poland vs Senegal match on June 19th, the draw looks quite different for two reasons.

1. Germany's loss to Mexico means that they are more likely to finish second in their group, and that leads to a R16 showdown with Brazil (the Final comes early, yay).
2. Argentina's draw with Iceland, coupled with Croatia's win over Nigeria means Argentina are also likely to finish second in their group and end up facing France in R16. 

Both these results combined mean the bottom half of the draw may completely open up. There is a not-so-unrealistic path to the Semis for England, Switzerland, Mexico and Senegal. Brazil is still most likely to win, but their probability has come down to 10% because they now have a more difficult match-up with Germany in R16.



As we head into the final round of group games, here is what the knock-out grid looks like. I have made an update to the model to take into account tournament form. This is done by maintaining a live track of ELO-like ratings based on match results.
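
As a rough illustration of what an ELO-like live rating can look like, here is a minimal sketch of the textbook Elo update. The K-factor of 40 and the function names are assumptions for the example; the post does not say which constants or adjustments the spreadsheet uses.

```python
def expected_score(rating_a, rating_b):
    """Textbook Elo expected score for team A (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_ratings(rating_a, rating_b, score_a, k=40.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k is an assumed K-factor, not a value taken from the post.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so total rating points are conserved.
    return rating_a + delta, rating_b - delta

# Two evenly rated teams: the winner gains k/2 points, the loser drops k/2.
new_a, new_b = update_ratings(1800.0, 1800.0, score_a=1.0)
```

An upset moves the ratings more than an expected result does, since the gap between the actual and expected score is larger.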

Some of the groups are very close to call. 
  • In group D, Croatia have qualified, but are not guaranteed top spot. Argentina has a 50% chance of qualifying, followed by Nigeria at 35% and Iceland at 15%.
  • In group F, anyone can qualify. Mexico have the best odds at 80%, followed by Germany at 65%, Sweden at 45% and Korea at 10%. Mexico - Germany is the most likely 1-2.
  • Group G is practically a coin toss in who finishes first. The model gives a slight edge to England.
    As we had already seen, the bottom half of the draw is much more favorable. It will be interesting to see if the two teams actively try to finish second.
    Fun fact: If the Belgium-England game ends in a draw, the winner will be based on Fair Play points. England currently have one fewer yellow card than Belgium.
  • In Group H too, anyone other than Poland can qualify. Japan has the best odds of qualifying, followed by Senegal and Colombia.

Overall, Brazil is still the favorite to win. Depending on how the 1/2 rank plays out across the groups, any of Portugal, Spain, England or Belgium can make the finals.
Assuming we flip the order of group winners in Groups B and G, then the draw looks something like this:

I'll update the odds at the end of Match Day 03 at which time the R16 matches will be locked in.


A super late post again. A part of the tardiness was caused by the results that I was getting. Based on some of the results in Round 3, the model ends up picking Belgium as the eventual winner. So, I was making sure there were no errors. I did find one (I had coded the result of BRA-SUI match incorrectly), but it did not change model results.

Numerically, though, I can see why Belgium are favourites - they beat England in Groups, and gained a lot of ELO points. As a result their tournament form is assessed to be higher than that of Brazil - which isn't that much of a surprise. But even if I were a betting man (and I am not), I would not bet all my money on Belgium winning it.

This is what the model churned out in the end. Other than the Belgium predictions, I am fairly happy with the results so far. This is in spite of the fact that two out of four predictions have already been proven wrong. But Spain really shouldn't have lost to Russia. And I really believe Mexico have a better chance of beating Brazil than the 18% that FiveThirtyEight has given them.

UPDATE JULY 5th 2018: END OF R16

Finally a post on time, before the first QF kicks off tomorrow between Uruguay and France.

Spain's upset loss against Russia means a lot of churn in the bottom half of the table, leading to Croatia making the final. In the top half, a close game for Belgium and a comfortable win for Brazil mean things are a little bit closer than in the last run of the model. Belgium are still picked to go through and win it all, if only by a whisker against Brazil.
Overall Belgium have the highest probability to win the cup, followed by Brazil, Croatia, England, and France - in that order.

