JUN 20, 2018:
It's been close to a week since the FIFA World Cup 2018 started. We have already seen almost as many upsets as there were fancy advanced models trying to predict the outcomes before the tournament. There has been a flurry of articles and scholarly papers using Artificial Intelligence and Machine Learning to predict the tournament results this year. Included in that list are the usual suspects such as FiveThirtyEight and bookmakers, as well as unlikely participants such as Goldman Sachs and Cornell University (maybe not that unlikely).
Inspired by these articles, some heated arguments, and a few cups of coffee, and armed with my trusted Microsoft Excel, I set about creating my own not-so-intelligent predictor. It's a fairly simple model that runs Monte Carlo simulations and uses some hard-coded inputs on pre-tournament form, such as:
- FIFA Ranking Points
- Football ELO Ratings
- Goals Scored, Goals Conceded and Undefeated streaks since World Cup 2014
- Number of Ballon d'Or winners in the team
I started on this passion project on Monday night and ran some vanilla simulations to see how different the results are from the other advanced models (spoiler alert: they are not really). And maybe not everything in the world needs a machine learning tadka.
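For readers who prefer code to spreadsheets, here is a minimal Python sketch of what a Monte Carlo knockout simulation can look like. It is not the Excel model itself: the team names, the ratings, the Elo-style win formula and the 400-point scale are all assumptions made for illustration, and draws and penalties are ignored.

```python
import random
from collections import Counter

def win_probability(rating_a, rating_b, scale=400.0):
    """Elo-style chance that team A beats team B (assumed formula, no draws)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def simulate_knockout(teams, ratings, rng):
    """Play one single-elimination bracket; adjacent teams in the list meet first."""
    alive = list(teams)  # the number of teams must be a power of two
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[::2], alive[1::2]):
            p = win_probability(ratings[a], ratings[b])
            next_round.append(a if rng.random() < p else b)
        alive = next_round
    return alive[0]

def title_odds(teams, ratings, n_runs=10_000, seed=0):
    """Repeat the bracket many times and count how often each team wins it."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(teams, ratings, rng) for _ in range(n_runs))
    return {team: wins[team] / n_runs for team in teams}

# Hypothetical four-team bracket with made-up ratings.
ratings = {"A": 2100, "B": 1900, "C": 2000, "D": 1800}
odds = title_odds(list(ratings), ratings)
print(odds)
```

The same loop extends to a full 16-team bracket by passing a longer team list, and coding in a real result amounts to fixing that match's winner instead of drawing it at random.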
RESULTS (PRE TOURNAMENT PREDICTIONS)
If I had been living under a rock over the past week and didn't know the results of the matches already played, based on the model I would have said Brazil has the best chance to win the World Cup by defeating Germany in the Final. The overall probability for Brazil is 12% (which is significantly higher than the roughly 3% chance each of the 32 teams would have had if all matches were decided by coin tosses).
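The coin-toss baseline is easy to check: with 32 teams and every match a pure toss-up, each team is equally likely to lift the trophy. A tiny sketch of that arithmetic, where the variable names are only illustrative and the 12% figure is the one quoted above:

```python
n_teams = 32                      # teams at the 2018 World Cup
coin_toss_chance = 1 / n_teams    # every team equally likely under coin tosses
model_chance = 0.12               # Brazil's pre-tournament figure quoted above

# How many times more likely than pure chance the model makes Brazil.
lift = model_chance / coin_toss_chance
print(f"coin toss: {coin_toss_chance:.2%}, lift over coin toss: {lift:.2f}x")
```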
The knockout stage would have probably looked like this:
THE FUN STUFF: MATCH DAY 01
The real interesting part comes in when you start coding in the results as the matches happen, and see how the draw changes. After Match Day 01, which ended with the Poland vs Senegal match on June 19th, the draw looks quite different for two reasons.
1. Germany's loss to Mexico means that they are more likely to finish second in their group, and that leads to an R16 showdown with Brazil (the Final comes early, yay).
2. Argentina's draw with Iceland, coupled with Croatia's win over Nigeria, means Argentina are also likely to finish second in their group and end up facing France in R16.
Both these results combined mean the bottom half of the draw may completely open up. There is a not-so-unrealistic path to the Semis for England, Switzerland, Mexico and Senegal. Brazil is still most likely to win, but their probability has come down to 10% because they now have a more difficult match-up with Germany in R16.
-----
UPDATE JUNE 25 2018: END OF MATCH DAY 02
As we head into the final round of group games, here is what the knock-out grid looks like. I have made an update to the model to take tournament form into account. This is done by maintaining a live track of ELO-like ratings based on match results.
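To give a feel for what "ELO-like" tracking means, here is a minimal Python sketch of a standard Elo update after one match. The K-factor of 40, the 400-point scale and the example ratings are assumptions for illustration, not the values used in the spreadsheet.

```python
def expected_score(rating, opp_rating, scale=400.0):
    """Expected result (between 0 and 1) for a team against an opponent."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / scale))

def update_ratings(rating_a, rating_b, score_a, k=40.0):
    """Return the new ratings of A and B after one match.

    score_a is 1.0 if A won, 0.5 for a draw and 0.0 if A lost.
    k (assumed value) controls how fast ratings react to results.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Hypothetical upset: a 2000-rated favourite loses to an 1850-rated underdog.
favourite, underdog = update_ratings(2000.0, 1850.0, score_a=0.0)
print(round(favourite, 1), round(underdog, 1))
```

The bigger the upset, the more points change hands, which is why a single surprise result can shift a team's tournament form noticeably.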
Some of the groups are very close to call.
- In group D, Croatia have qualified, but are not guaranteed top spot. Argentina has a 50% chance of qualifying, followed by Nigeria at 35% and Iceland at 15%.
- In group F, anyone can qualify. Mexico have the best odds at 80%, followed by Germany at 65%, Sweden at 45% and Korea at 10%. Mexico - Germany is the most likely 1-2.
- Group G is practically a coin toss in who finishes first. The model gives a slight edge to England.
As we saw earlier, the bottom half of the draw is much more favorable. It will be interesting to see if the two teams actively try to finish second.
Fun fact: If the Belgium-England game ends in a draw, the group winner will be decided on Fair Play points. England currently have one fewer yellow card than Belgium.
- In Group H too, anyone other than Poland can qualify. Japan has the best odds of qualifying, followed by Senegal and Colombia.
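Qualification odds like the ones above can be estimated by simulating the last round of group games many times. The sketch below is one way to do that in Python; the team names, current points and outcome probabilities are made up, and ties on points are broken at random as a crude stand-in for goal difference and Fair Play rules.

```python
import random
from collections import Counter

def qualification_odds(points, fixtures, outcome_probs, n_runs=20_000, seed=0):
    """Estimate each team's chance of finishing in the top two of a group.

    points: current points per team.
    fixtures: remaining (home, away) pairs.
    outcome_probs: (p_home_win, p_draw, p_away_win) for each fixture.
    """
    rng = random.Random(seed)
    qualified = Counter()
    for _ in range(n_runs):
        table = dict(points)
        for (home, away), probs in zip(fixtures, outcome_probs):
            result = rng.choices(["home", "draw", "away"], weights=probs)[0]
            if result == "home":
                table[home] += 3
            elif result == "away":
                table[away] += 3
            else:
                table[home] += 1
                table[away] += 1
        # Rank by points; the random number breaks ties (simplifying assumption).
        ranked = sorted(table, key=lambda t: (table[t], rng.random()), reverse=True)
        qualified.update(ranked[:2])
    return {team: qualified[team] / n_runs for team in points}

# Hypothetical group after two rounds, with made-up match probabilities.
points = {"W": 6, "X": 3, "Y": 1, "Z": 1}
fixtures = [("X", "Z"), ("Y", "W")]
outcome_probs = [(0.3, 0.3, 0.4), (0.2, 0.3, 0.5)]
odds = qualification_odds(points, fixtures, outcome_probs)
print(odds)
```

In this made-up group, team W is through whatever happens, while the other three share the remaining spot in proportion to how often the simulated results fall their way.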
Overall, Brazil is still the favorite to win. Depending on how the 1/2 rank plays out across the groups, any of Portugal, Spain, England or Belgium could make the final.
Assuming we flip the order of group winners in Groups B and G, then the draw looks something like this:
I'll update the odds at the end of Match Day 03 at which time the R16 matches will be locked in.
_______
UPDATE JULY 2nd 2018: END OF MATCH DAY 03
A super late post again. Part of the tardiness was caused by the results that I was getting. Based on some of the results in Round 3, the model ends up picking Belgium as the eventual winner. So, I was making sure there were no errors. I did find one (I had coded the result of the BRA-SUI match incorrectly), but it did not change the model results.
Numerically, though, I can see why Belgium are favourites - they beat England in the group stage, and gained a lot of ELO points. As a result their tournament form is assessed to be higher than that of Brazil - which isn't that much of a surprise. But even if I were a betting man (and I am not), I would not bet all my money on Belgium winning it.
This is what the model churned out in the end. Other than the Belgium predictions, I am fairly happy with the results so far. This is in spite of the fact that two out of four predictions have already been proven wrong. But Spain really shouldn't have lost to Russia. And I really believe Mexico have a better chance of beating Brazil than the 18% that FiveThirtyEight has given them.
___
UPDATE JULY 5th 2018: END OF R16
Finally a post on time, before the first QF kicks off tomorrow between Uruguay and France.
Spain's upset loss against Russia means a lot of churn in the bottom half of the draw, leading to Croatia making the final. In the top half, a close game for Belgium and a comfortable win for Brazil mean things are a little bit closer than in the last run of the model. Belgium are still picked to go through and win it all, if only by a whisker against Brazil.
Overall Belgium have the highest probability to win the cup, followed by Brazil, Croatia, England, and France - in that order.