
Simulating the FIFA World Cup 2022

source link: https://towardsdatascience.com/simulating-the-fifa-world-cup-2022-d363fad7da22


Who does the data choose to win the largest international football tournament yet?

Image by Michal Jarmoluk from Pixabay.

The grandest and most exciting of all football tournaments is still a ways off (2022), but in times like these I find solace in the fact that there are better things (like the next World Cup) that are rapidly approaching with every day that passes. The question on everyone’s mind is always: who wins? My mission is to see what the data says. If you’d like to follow along, the code can be found on my GitHub here.

Starting from the bottom: The Qualifiers.

Electronic Arts (the makers of the “FIFA” video game series) has made a lot of money from quantifying player skill. It has an entire methodology for numerically rating every player that takes into account variables like weak-foot rating, shot accuracy, etc. For this study, I used its data from FIFA 20 (which came out in 2019).

I automatically disregarded any country with fewer than 15 players in the database (a starting 11 plus 4 reserves), then took the average rating of each remaining country’s top 30 players (or of all its players, if it had fewer than 30) to form a “national average” score. I used this score, along with FIFA’s qualification rules, to choose the countries that should qualify based on the quality of their individual players.
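A minimal sketch of the “national average” step, assuming the FIFA 20 data has already been reduced to (nationality, rating) pairs. The function name `national_averages` and the toy data are hypothetical; this is not the author’s code.

```python
from collections import defaultdict
from statistics import mean

MIN_PLAYERS = 15  # starting 11 + 4 reserves
TOP_N = 30        # number of best-rated players averaged per country


def national_averages(players):
    """Compute a 'national average' score per country.

    players: iterable of (nationality, rating) pairs (input format assumed).
    Countries with fewer than MIN_PLAYERS players are skipped; otherwise the
    score is the mean rating of the country's best TOP_N players (or of all
    its players, if it has fewer than TOP_N).
    """
    ratings_by_country = defaultdict(list)
    for nationality, rating in players:
        ratings_by_country[nationality].append(rating)

    averages = {}
    for country, ratings in ratings_by_country.items():
        if len(ratings) < MIN_PLAYERS:
            continue  # not enough players in the database to field a squad
        best = sorted(ratings, reverse=True)[:TOP_N]
        averages[country] = mean(best)
    return averages


# Toy data, not real FIFA 20 ratings:
players = [("A", 80)] * 20 + [("B", r) for r in range(50, 90)] + [("C", 90)] * 10
print(national_averages(players))  # {'A': 80, 'B': 74.5}
```

Country “C” is dropped because it has only 10 players, and “B” is averaged over its 30 best players only.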

Here are the 32 teams that made it into the draws:

The 32 qualifying teams (italicized teams qualified through special means).

Qatar qualifies automatically as the host nation, and Chile and Saudi Arabia “won” the inter-confederation play-offs to clinch the last two spots in the 2022 World Cup. Notice that the OFC (Oceania) is not represented here. This is because the OFC has no direct World Cup slot, so its best team must still get through the inter-confederation play-offs. In this case, Oceania’s representative is New Zealand, which loses out to Chile according to the data.

Interestingly enough, Canada has not played in a World Cup since 1986, and Turkey and China PR have not qualified since 2002. The rest of the results seem fairly believable!
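The qualification step can be sketched roughly as follows. This is a simplification under stated assumptions, not FIFA’s full rules or the author’s code: each confederation’s direct places go to its highest-scoring teams, the next-best team becomes its play-off candidate, and a play-off is won by whichever side has the higher national average. The slot counts and team names in the example are made up.

```python
from collections import defaultdict


def qualify(averages, confederation_of, direct_slots, host):
    """Pick qualifiers by national average within each confederation.

    averages:         {country: national average score}
    confederation_of: {country: confederation code}
    direct_slots:     {confederation: number of direct places} (hypothetical
                      values supplied by the caller)
    host:             the host nation, which qualifies automatically

    Returns (qualified, playoff): playoff maps each confederation to its best
    non-qualified team, i.e. its inter-confederation play-off candidate.
    """
    teams_by_confed = defaultdict(list)
    for country in averages:
        if country != host:
            teams_by_confed[confederation_of[country]].append(country)

    qualified, playoff = [host], {}
    for confed, teams in teams_by_confed.items():
        ranked = sorted(teams, key=averages.get, reverse=True)
        n = direct_slots.get(confed, 0)
        qualified.extend(ranked[:n])
        if len(ranked) > n:
            playoff[confed] = ranked[n]
    return qualified, playoff


def playoff_winner(team_a, team_b, averages):
    """Hypothetical deterministic play-off: the higher national average wins."""
    return team_a if averages[team_a] >= averages[team_b] else team_b


# Toy example with made-up teams and slot counts:
averages = {"Host": 60, "E1": 80, "E2": 78, "E3": 70, "S1": 76, "S2": 72, "O1": 65}
confed = {"Host": "AFC", "E1": "UEFA", "E2": "UEFA", "E3": "UEFA",
          "S1": "CONMEBOL", "S2": "CONMEBOL", "O1": "OFC"}
qualified, playoff = qualify(averages, confed, {"UEFA": 2, "CONMEBOL": 1, "OFC": 0}, "Host")
print(qualified)  # ['Host', 'E1', 'E2', 'S1']
print(playoff_winner(playoff["CONMEBOL"], playoff["OFC"], averages))  # S2
```

With zero direct places, the OFC’s best team can only enter through a play-off, which mirrors the New Zealand situation described above.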

Let’s take it a step further.

Rising tensions: The Group Stage.

The Group Stage is based on random draws. Each of the eight groups (A through H) will contain four teams, which are not known until the official FIFA World Cup 2022 draw. To account for this randomness, I performed a Monte Carlo simulation in which I randomized the groups 5,000 times and estimated the probability of each team making it into the elimination round.

I also added a little randomness to each team’s strength, since its performance on the day could be slightly better or worse than expected. Below are the results.

Percentage of the 5000 simulations in which each country advanced to the Round of 16.

The usual European contenders have the highest chance of reaching the Round of 16. For comparison, Spain passed the group stage in four of its last five World Cups (80.0%, the exception being 2014, vs. the model’s 99.7%), while Colombia advanced in three of its last five (60.0% vs. the model’s 64.4%). In other words, there seems to be some historical backing for this method of evaluation using Electronic Arts’ individual player ratings.
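The group-stage Monte Carlo procedure (random draws repeated many times, plus a random match-day term per team) can be sketched as below. The Gaussian noise, its size (`NOISE_SD`), the draw margin (`DRAW_MARGIN`) and the random tie-breaking are all hypothetical choices; the real draw also uses seeding pots and confederation restrictions, which this sketch ignores.

```python
import random

NOISE_SD = 2.0     # hypothetical spread of match-day form, in rating points
DRAW_MARGIN = 1.0  # hypothetical: strengths closer than this produce a draw


def match_points(a, b, strength, rng):
    """Points (3/1/0) for a and b: each side's match-day strength is its
    national average plus Gaussian noise."""
    sa = strength[a] + rng.gauss(0, NOISE_SD)
    sb = strength[b] + rng.gauss(0, NOISE_SD)
    if abs(sa - sb) < DRAW_MARGIN:
        return 1, 1
    return (3, 0) if sa > sb else (0, 3)


def simulate_group_stage(teams, strength, n_sims=5000, seed=0):
    """Percentage of simulations in which each team finishes in its group's top two.

    Each run shuffles the 32 teams into eight groups of four (a plain random
    draw), plays a round robin, and advances the top two. Ties on points are
    broken at random instead of by FIFA's goal-difference tie-breakers.
    """
    rng = random.Random(seed)
    advanced = dict.fromkeys(teams, 0)
    for _ in range(n_sims):
        draw = list(teams)
        rng.shuffle(draw)
        for start in range(0, len(draw), 4):
            group = draw[start:start + 4]
            points = dict.fromkeys(group, 0)
            for i in range(len(group)):
                for j in range(i + 1, len(group)):
                    pa, pb = match_points(group[i], group[j], strength, rng)
                    points[group[i]] += pa
                    points[group[j]] += pb
            # sort by points, with a random number as the tie-breaker
            ranked = sorted(group, key=lambda t: (points[t], rng.random()), reverse=True)
            for team in ranked[:2]:
                advanced[team] += 1
    return {t: 100 * advanced[t] / n_sims for t in teams}


# Toy run with 32 made-up teams of evenly spaced strength:
strength = {f"T{i}": 10 * i for i in range(32)}
pct = simulate_group_stage(list(strength), strength, n_sims=1000)
print(pct["T31"], pct["T0"])
```

Given a dict of 32 real national averages, the returned percentages are the same quantity the Round of 16 figure reports: the share of runs in which each team advanced.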

International glory: The Round of 16.

To supplement my predictions for the most important stage of the World Cup, I pulled historical head-to-head data for each international team and used it, together with the player data and the same randomness function as in the Group Stage, to simulate the entire tournament.
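One way to combine head-to-head history with player ratings in a knockout match is sketched below. The logistic curve, `SCALE`, and the fixed blending weight `H2H_WEIGHT` are assumptions swapped in for illustration, since the article does not say how the two sources were combined. Here the randomness comes from drawing each result from the blended win probability rather than from the group-stage noise function.

```python
import math
import random
from collections import Counter

H2H_WEIGHT = 0.3  # hypothetical weight on head-to-head history
SCALE = 4.0       # hypothetical: rating gap that shifts the log-odds by 1


def win_probability(a, b, strength, h2h):
    """Probability that a knocks out b.

    Rating part: a logistic curve on the national-average gap.
    History part: a's share of past decisive meetings, where h2h[(x, y)] is
    the number of times x has beaten y. Pairs that never met use ratings only.
    """
    p_rating = 1 / (1 + math.exp(-(strength[a] - strength[b]) / SCALE))
    wins_a, wins_b = h2h.get((a, b), 0), h2h.get((b, a), 0)
    if wins_a + wins_b == 0:
        return p_rating
    p_history = wins_a / (wins_a + wins_b)
    return H2H_WEIGHT * p_history + (1 - H2H_WEIGHT) * p_rating


def simulate_knockout(bracket, strength, h2h, rng):
    """Play a single-elimination bracket (pairs 0-1, 2-3, ...) and return the champion."""
    while len(bracket) > 1:
        bracket = [a if rng.random() < win_probability(a, b, strength, h2h) else b
                   for a, b in zip(bracket[::2], bracket[1::2])]
    return bracket[0]


# Toy run: 16 made-up teams, no head-to-head data.
strength = {f"T{i}": 10 * i for i in range(16)}
rng = random.Random(0)
titles = Counter(simulate_knockout(list(strength), strength, {}, rng) for _ in range(5000))
print(titles.most_common(3))
```

Feeding in the 16 group-stage qualifiers from each Monte Carlo run, and counting champions across runs, gives win tallies of the kind shown in the next figure.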

I tallied how many times (out of 5,000 simulations) each team progressed and arrived at the following.

Percentage of the 5000 simulations won by each country.

Keep in mind that these are not probabilities of winning; they’re just the number of times each team won out of the 5,000 simulations. Brazil, for example, won 950 of those 5,000, and the USA won 1. They are, however, approximations of each team’s odds of winning. In this particular simulation, Brazil wins, but its win count exceeds Spain’s by only 3.4%. A little more game-day anxiety on Brazil’s side and it might be Spain that snatches the win away. Some sports websites, like Goal.com, place France and Brazil at the top. Of course, these are just predictions: simplifications of an extremely complicated tournament with a lot of variables.
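Because each simulated tournament either is or is not won by a given team, a win count out of 5,000 runs can be turned into a percentage together with its Monte Carlo standard error (a binomial approximation). The sketch below uses Brazil’s 950 wins from the text; the helper name `win_share` is hypothetical. The error it reports only reflects the finite number of simulations, not any error in the player ratings or the model itself.

```python
import math


def win_share(wins, n_sims):
    """Share of simulations won, in percent, and its Monte Carlo standard
    error in percentage points, treating runs as independent trials."""
    p = wins / n_sims
    se = math.sqrt(p * (1 - p) / n_sims)
    return 100 * p, 100 * se


share, se = win_share(950, 5000)
print(f"Brazil: {share:.1f}% ± {1.96 * se:.1f} points (approx. 95% interval)")
# Brazil: 19.0% ± 1.1 points (approx. 95% interval)
```

Sampling noise from 5,000 runs is therefore on the order of one percentage point for a team near the top.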

Who do you think will win the World Cup?

