6

The Dixon-Coles model for predicting football matches in R (part 2)

 3 years ago
source link: http://opisthokonta.net/?p=913
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

The Dixon-Coles model for predicting football matches in R (part 2)

Posted on November 25, 2014

Part 1 ended with running the optimizer function to estimate the parameters in the model:

library(alabama)
res <- auglag(par=par.inits, fn=DCoptimFn, heq=DCattackConstr, DCm=dcm)
# Take a look at the parameters
res$par

In part 1 I fitted the model to data from the 2011-12 Premier League season. Now it’s time to use the model to make a prediction. As an example I will predict the result of Bolton playing at home against Blackburn.

The first thing we need to do is to calculate the mu and lambda parameters, which is (approximately anyway) the expected number of goals scored by the home and away team. To do this wee need to extract the correct parameters from the res$par vector. Recall that I in the last post gave the parameters informative names that consists of the team name prefixed by either Attack or Defence.
Also notice that I have to multiply the team parameters and then exponentiate the result to get the correct answer.

Update: For some reason I got the idea that the team parameters should be multiplied together, instead of added together, but I have now fixed the code and the results.

# Expected goals home
lambda <- exp(res$par['HOME'] + res$par['Attack.Bolton'] + res$par['Defence.Blackburn'])
# Expected goals away
mu <- exp(res$par['Attack.Blackburn'] + res$par['Defence.Bolton'])

We get that Bolton is expected to score 2.07 goals and Blackburn is expected to score 1.59 goals.

Since the model assumes dependencies between the number of goals scored by the two teams, it is insufficient to just plug the lambda and mu parameters into R’s built-in Poisson function to get the probabilities for the number of goals scored by the two teams. We also need to incorporate the adjustment for the low-scoring results as well. One strategy to do this is to first create a matrix based on the simple independent Poisson distributions:

maxgoal <- 6 # will be useful later
probability_matrix <- dpois(0:maxgoal, lambda) %*% t(dpois(0:maxgoal, mu))

The number of home goals follows the vertical axis and the away goals follow the horizontal.

Now we can use the estimated dependency parameter rho to create a 2-by-2 matrix with scaling factors, that is then element-wise multiplied with the top left elements of the matrix calculated above:

Update: Thanks to Mike who pointed out a mistake in this code.

scaling_matrix <- matrix(tau(c(0,1,0,1), c(0,0,1,1), lambda, mu, res$par['RHO']), nrow=2)
probability_matrix[1:2, 1:2] <- probability_matrix[1:2, 1:2] * scaling_matrix

With this matrix it is easy to calculate the probabilities for the three match outcomes:

HomeWinProbability <- sum(probability_matrix[lower.tri(probability_matrix)])
DrawProbability <- sum(diag(probability_matrix))
AwayWinProbability <- sum(probability_matrix[upper.tri(probability_matrix)])

This gives a probability of 0.49 for home win, 0.21 for draw and 0.29 for away win.

Calculating the probabilities for the different goal differences is a bit trickier. The probabilities for each goal difference can be found by adding up the numbers on the diagonals, with the sum of the main diagonal being the probability of a draw.

awayG <- numeric(maxgoal)
for (gg in 2:maxgoal){
awayG[gg-1] <- sum(diag(probability_matrix[,gg:(maxgoal+1)]))
}
awayG[maxgoal] <- probability_matrix[1,(maxgoal+1)]
homeG <- numeric(maxgoal)
for (gg in 2:maxgoal){
homeG[gg-1] <- sum(diag(probability_matrix[gg:(maxgoal+1),]))
}
homeG[maxgoal] <- probability_matrix[(maxgoal+1),1]
goaldiffs <- c(rev(awayG), sum(diag(probability_matrix)), homeG)
names(goaldiffs) <- -maxgoal:maxgoal

It is always nice to plot the probability distribution:

We can also see compare this distribution with the distribution without the Dixon-Coles adjustment (i.e. the goals scored by the two teams are independent):

As expected, we see that the adjustment gives higher probability for draw, and lower probabilities for goal differences of one goal.


Recommend

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK