The Dixon-Coles model for predicting football matches in R (part 2)
source link: http://opisthokonta.net/?p=913
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
The Dixon-Coles model for predicting football matches in R (part 2)
Part 1 ended with running the optimizer function to estimate the parameters in the model:
library
(alabama)
res <-
auglag
(par=par.inits, fn=DCoptimFn, heq=DCattackConstr, DCm=dcm)
# Take a look at the parameters
res$par
In part 1 I fitted the model to data from the 2011-12 Premier League season. Now it’s time to use the model to make a prediction. As an example I will predict the result of Bolton playing at home against Blackburn.
The first thing we need to do is to calculate the mu and lambda parameters, which is (approximately anyway) the expected number of goals scored by the home and away team. To do this wee need to extract the correct parameters from the res$par vector. Recall that I in the last post gave the parameters informative names that consists of the team name prefixed by either Attack or Defence.
Also notice that I have to multiply the team parameters and then exponentiate the result to get the correct answer.
Update: For some reason I got the idea that the team parameters should be multiplied together, instead of added together, but I have now fixed the code and the results.
# Expected goals home
lambda <-
exp
(res$par[
'HOME'
] + res$par[
'Attack.Bolton'
] + res$par[
'Defence.Blackburn'
])
# Expected goals away
mu <-
exp
(res$par[
'Attack.Blackburn'
] + res$par[
'Defence.Bolton'
])
We get that Bolton is expected to score 2.07 goals and Blackburn is expected to score 1.59 goals.
Since the model assumes dependencies between the number of goals scored by the two teams, it is insufficient to just plug the lambda and mu parameters into R’s built-in Poisson function to get the probabilities for the number of goals scored by the two teams. We also need to incorporate the adjustment for the low-scoring results as well. One strategy to do this is to first create a matrix based on the simple independent Poisson distributions:
maxgoal <- 6
# will be useful later
probability_matrix <-
dpois
(0:maxgoal, lambda) %*%
t
(
dpois
(0:maxgoal, mu))
The number of home goals follows the vertical axis and the away goals follow the horizontal.
Now we can use the estimated dependency parameter rho to create a 2-by-2 matrix with scaling factors, that is then element-wise multiplied with the top left elements of the matrix calculated above:
Update: Thanks to Mike who pointed out a mistake in this code.
scaling_matrix <-
matrix
(
tau
(
c
(0,1,0,1),
c
(0,0,1,1), lambda, mu, res$par[
'RHO'
]), nrow=2)
probability_matrix[1:2, 1:2] <- probability_matrix[1:2, 1:2] * scaling_matrix
With this matrix it is easy to calculate the probabilities for the three match outcomes:
HomeWinProbability <-
sum
(probability_matrix[
lower.tri
(probability_matrix)])
DrawProbability <-
sum
(
diag
(probability_matrix))
AwayWinProbability <-
sum
(probability_matrix[
upper.tri
(probability_matrix)])
This gives a probability of 0.49 for home win, 0.21 for draw and 0.29 for away win.
Calculating the probabilities for the different goal differences is a bit trickier. The probabilities for each goal difference can be found by adding up the numbers on the diagonals, with the sum of the main diagonal being the probability of a draw.
awayG <-
numeric
(maxgoal)
for
(gg
in
2:maxgoal){
awayG[gg-1] <-
sum
(
diag
(probability_matrix[,gg:(maxgoal+1)]))
}
awayG[maxgoal] <- probability_matrix[1,(maxgoal+1)]
homeG <-
numeric
(maxgoal)
for
(gg
in
2:maxgoal){
homeG[gg-1] <-
sum
(
diag
(probability_matrix[gg:(maxgoal+1),]))
}
homeG[maxgoal] <- probability_matrix[(maxgoal+1),1]
goaldiffs <-
c
(
rev
(awayG),
sum
(
diag
(probability_matrix)), homeG)
names
(goaldiffs) <- -maxgoal:maxgoal
It is always nice to plot the probability distribution:
We can also see compare this distribution with the distribution without the Dixon-Coles adjustment (i.e. the goals scored by the two teams are independent):
As expected, we see that the adjustment gives higher probability for draw, and lower probabilities for goal differences of one goal.
Recommend
-
1
The Dixon-Coles model for predicting football matches in R (part 1) Posted on November 1, 2014 Please have a look at the improved code for this model that I ha...
-
7
Underdispersed Poisson alternatives seem to be better at predicting football results Posted on August 24, 2015 In the
-
13
The Dixon-Coles approach to time-weighted Poisson regression Posted on January 19, 2015 In the previous blog posts about predicting football results using Poisson...
-
8
Category Archives: Dixon-Coles model I have written a lot about different models for forecasting football results, and provided a lot of R code along the way. Especially popular are my posts about the
-
5
Predicting football results with Poisson regression pt. 2 Posted on March 7, 2013 In part 1 I wrote about the basics o...
-
74
Post navigation ← Previous Next → The Dixon-Coles model for predicting football matches in R (part 3) Post...
-
4
Intro One of the ultimate goals of sports analytics is to predict future outcomes based on some observed patterns in past events. In this article, I will review the problem of predicting the percentage of games won by a...
-
5
Finally, on CBS, the football matches the business cards John Teti October 01, 2021
-
7
Football Manager 2022 Mobile launches with a fresh set of mechanics to jump into matches faster By Matthew Sholtz Published 2 days ago...
-
2
A simple re-implementation of the Dixon-Coles model Posted on May 6, 2018 A couple of years ago...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK