The Dixon-Coles model for predicting football matches in R (part 2)

Posted on November 25, 2014

Part 1 ended with running the optimizer function to estimate the parameters in the model:

library(alabama)

res <- auglag(par=par.inits, fn=DCoptimFn, heq=DCattackConstr, DCm=dcm)

# Take a look at the parameters

res$par

In part 1 I fitted the model to data from the 2011-12 Premier League season. Now it’s time to use the model to make a prediction. As an example I will predict the result of Bolton playing at home against Blackburn.

The first thing we need to do is to calculate the mu and lambda parameters, which is (approximately anyway) the expected number of goals scored by the home and away team. To do this wee need to extract the correct parameters from the res$par vector. Recall that I in the last post gave the parameters informative names that consists of the team name prefixed by either Attack or Defence.
Also notice that I have to multiply the team parameters and then exponentiate the result to get the correct answer.

Update: For some reason I got the idea that the team parameters should be multiplied together, instead of added together, but I have now fixed the code and the results.

# Expected goals home

lambda <- exp(res$par['HOME'] + res$par['Attack.Bolton'] + res$par['Defence.Blackburn'])

# Expected goals away

mu <- exp(res$par['Attack.Blackburn'] + res$par['Defence.Bolton'])

We get that Bolton is expected to score 2.07 goals and Blackburn is expected to score 1.59 goals.

Since the model assumes dependencies between the number of goals scored by the two teams, it is insufficient to just plug the lambda and mu parameters into R’s built-in Poisson function to get the probabilities for the number of goals scored by the two teams. We also need to incorporate the adjustment for the low-scoring results as well. One strategy to do this is to first create a matrix based on the simple independent Poisson distributions:

maxgoal <- 6 # will be useful later

probability_matrix <- dpois(0:maxgoal, lambda) %*% t(dpois(0:maxgoal, mu))

The number of home goals follows the vertical axis and the away goals follow the horizontal.

Now we can use the estimated dependency parameter rho to create a 2-by-2 matrix with scaling factors, that is then element-wise multiplied with the top left elements of the matrix calculated above:

Update: Thanks to Mike who pointed out a mistake in this code.

scaling_matrix <- matrix(tau(c(0,1,0,1), c(0,0,1,1), lambda, mu, res$par['RHO']), nrow=2)

probability_matrix[1:2, 1:2] <- probability_matrix[1:2, 1:2] * scaling_matrix

With this matrix it is easy to calculate the probabilities for the three match outcomes:

HomeWinProbability <- sum(probability_matrix[lower.tri(probability_matrix)])

DrawProbability <- sum(diag(probability_matrix))

AwayWinProbability <- sum(probability_matrix[upper.tri(probability_matrix)])

This gives a probability of 0.49 for home win, 0.21 for draw and 0.29 for away win.

Calculating the probabilities for the different goal differences is a bit trickier. The probabilities for each goal difference can be found by adding up the numbers on the diagonals, with the sum of the main diagonal being the probability of a draw.

awayG <- numeric(maxgoal)

for (gg in 2:maxgoal){

awayG[gg-1] <- sum(diag(probability_matrix[,gg:(maxgoal+1)]))

}

awayG[maxgoal] <- probability_matrix[1,(maxgoal+1)]

homeG <- numeric(maxgoal)

for (gg in 2:maxgoal){

homeG[gg-1] <- sum(diag(probability_matrix[gg:(maxgoal+1),]))

}

homeG[maxgoal] <- probability_matrix[(maxgoal+1),1]

goaldiffs <- c(rev(awayG), sum(diag(probability_matrix)), homeG)

names(goaldiffs) <- -maxgoal:maxgoal

It is always nice to plot the probability distribution:

We can also see compare this distribution with the distribution without the Dixon-Coles adjustment (i.e. the goals scored by the two teams are independent):

As expected, we see that the adjustment gives higher probability for draw, and lower probabilities for goal differences of one goal.

The Dixon-Coles model for predicting football matches in R (part 2)

The Dixon-Coles model for predicting football matches in R (part 2)

Recommend

The Dixon-Coles model for predicting football matches in R (part 1)

Underdispersed Poisson alternatives seem to be better at predicting football res...

The Dixon-Coles approach to time-weighted Poisson regression

Dixon-Coles model | opisthokonta.net

Predicting football results with Poisson regression pt. 2

The Dixon-Coles model for predicting football matches in R (part 3)

Predicting Football Clubs Winning Percentage in the English Premier League Using...

Finally, on CBS, the football matches the business cards

Football Manager 2022 Mobile launches with a fresh set of mechanics to jump into...

A simple re-implementation of the Dixon-Coles model

About Joyk