Bradley–Terry model

From Wikipedia, the free encyclopedia

The Bradley–Terry model is a probability model that can predict the outcome of a paired comparison. Given a pair of individuals i and j drawn from some population, it estimates the probability that the pairwise comparison i > j turns out true, as

$P(i>j)=\frac{p_i}{p_i+p_j}$

where $p_i$ is a positive real-valued score assigned to individual i. The comparison i > j can be read as "i is preferred to j", "i ranks higher than j", or "i beats j", depending on the application.
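As a minimal illustration, the formula translates directly into a small Python function. The function name `bt_probability` and the example scores are hypothetical; they stand for any positive scores $p_i$ and $p_j$.

```python
def bt_probability(p_i: float, p_j: float) -> float:
    """Bradley-Terry probability that i is preferred to j, given positive scores."""
    if p_i <= 0 or p_j <= 0:
        raise ValueError("scores must be positive")
    return p_i / (p_i + p_j)

# Hypothetical scores: individual i is three times as "strong" as j.
print(bt_probability(3.0, 1.0))  # 0.75
print(bt_probability(1.0, 3.0))  # 0.25
```

The two probabilities for a pair always sum to one, so this basic form of the model leaves no room for ties.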

For example, $p_i$ may represent the skill of a team in a sports tournament, estimated from the number of times i has won a match. $P(i>j)$ then represents the probability that i will win a match against j.[1][2] Another example used to explain the model's purpose is scoring products in a certain category by quality. While it is hard for a person to rank many brands of wine directly, it may be feasible to compare a sample of pairs of wines and say, for each pair, which one is better. The Bradley–Terry model can then be used to derive a full ranking.[2]

History and applications

The model is named after R. A. Bradley and M. E. Terry,[3] who presented it in 1952,[4] although it had already been studied by Zermelo in the 1920s.[1][5][6]

Real-world applications of the model include estimating the influence of statistical journals and ranking documents by relevance in machine-learned search engines.[7] In the latter application, $P(i>j)$ can be read as the probability that document i is more relevant to the user's query than document j, in which case i should be displayed earlier in the results list. The individual scores $p_i$ then express the relevance of each document, and can be estimated from how often users click particular "hits" when presented with a result list.[8]

Definition

The Bradley–Terry model can be parametrized in various ways. One way is to pick a single parameter per individual, leading to a model of n parameters $p_1, \ldots, p_n$.[9] Another variant, in fact the version considered by Bradley and Terry,[2] uses exponential score functions $p_i = e^{\beta_i}$, so that

$P(i>j)=\frac{e^{\beta_i}}{e^{\beta_i}+e^{\beta_j}}$

or, using the logit (and disallowing ties),[1]

$\operatorname{logit}(P(i>j))=\log\left(\frac{P(i>j)}{1-P(i>j)}\right)=\log\left(\frac{P(i>j)}{P(j>i)}\right)=\beta_i-\beta_j$

reducing the model to logistic regression on pairs of individuals.
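The equivalence between the exponential form and the logit form can be checked numerically. It rests on the identity $\frac{e^{\beta_i}}{e^{\beta_i}+e^{\beta_j}} = \frac{1}{1+e^{-(\beta_i-\beta_j)}}$, i.e. $P(i>j)$ is the logistic function of the strength difference. This Python sketch uses hypothetical strengths $\beta_i = 0.8$ and $\beta_j = -0.3$; the helper names are illustrative.

```python
import math

def bt_probability_exp(beta_i: float, beta_j: float) -> float:
    """P(i > j) under the exponential parametrization p_i = exp(beta_i)."""
    return math.exp(beta_i) / (math.exp(beta_i) + math.exp(beta_j))

def logistic(x: float) -> float:
    """Standard logistic function, the inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(q: float) -> float:
    """Log-odds log(q / (1 - q)) for 0 < q < 1."""
    return math.log(q / (1.0 - q))

beta_i, beta_j = 0.8, -0.3  # hypothetical strengths
prob = bt_probability_exp(beta_i, beta_j)
print(prob, logistic(beta_i - beta_j))  # agree up to floating-point rounding
print(logit(prob), beta_i - beta_j)     # both approximately 1.1
```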

Estimating the parameters

The following algorithm computes the parameters $p_i$ of the basic version of the model from a sample of observations. Formally, it computes a maximum likelihood estimate, i.e., it maximizes the likelihood of the observed data. The algorithm dates back to the work of Zermelo.[1]

The observations required are the outcomes of previous comparisons, for example, pairs (i, j) where i beats j. Summarizing these outcomes as $w_{ij}$, the number of times i has beaten j, we obtain the log-likelihood of the parameter vector $\mathbf{p} = (p_1, \ldots, p_n)$ as[1]
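The observations can be tallied into the counts $w_{ij}$ with a few lines of Python. This sketch assumes individuals are numbered 0 to n-1 and each observation is recorded as a (winner, loser) pair; the name `win_matrix` and the sample results are hypothetical.

```python
from typing import List, Sequence, Tuple

def win_matrix(n: int, outcomes: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Return w where w[i][j] counts how often individual i beat individual j.

    `outcomes` is a sequence of (winner, loser) index pairs in range(n).
    """
    w = [[0] * n for _ in range(n)]
    for winner, loser in outcomes:
        if winner == loser:
            raise ValueError("an individual cannot be compared with itself")
        w[winner][loser] += 1
    return w

# Hypothetical results among three individuals 0, 1, 2.
games = [(0, 1), (0, 1), (1, 0), (1, 2), (2, 0), (0, 2)]
w = win_matrix(3, games)
print(w)  # [[0, 2, 1], [1, 0, 1], [1, 0, 0]]
```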

$L(\mathbf{p})=\sum_{i=1}^{n}\sum_{j=1}^{n}\left[w_{ij}\ln p_i - w_{ij}\ln(p_i+p_j)\right].$
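A direct transcription of $L(\mathbf{p})$ in Python, assuming the counts $w_{ij}$ are stored as a nested list `w[i][j]`; the function name and the data are illustrative. Terms with $w_{ij}=0$, including the diagonal, contribute nothing and are skipped.

```python
import math
from typing import Sequence

def log_likelihood(p: Sequence[float], w: Sequence[Sequence[int]]) -> float:
    """Bradley-Terry log-likelihood: sum over i, j of w_ij * (ln p_i - ln(p_i + p_j))."""
    n = len(p)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if w[i][j]:  # zero counts (and the diagonal) add nothing
                total += w[i][j] * (math.log(p[i]) - math.log(p[i] + p[j]))
    return total

w = [[0, 2, 1], [1, 0, 1], [1, 0, 0]]  # hypothetical win counts
print(log_likelihood([1.0, 1.0, 1.0], w))  # 6 * ln(1/2), about -4.159
```

Multiplying every $p_i$ by the same positive constant leaves $L(\mathbf{p})$ unchanged, because each term depends only on the ratio $p_i/(p_i+p_j)$; this is why the estimation procedure renormalizes the scores.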

Denote the number of comparisons won by i as $W_i = \sum_{j} w_{ij}$. Starting from an arbitrary vector $\mathbf{p}$ of positive scores, the algorithm iteratively performs the update

$p'_i = W_i\left(\sum_{j\neq i}\frac{w_{ij}+w_{ji}}{p_i+p_j}\right)^{-1}$

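for all i. After computing all of the new parameters, they should be renormalized, since multiplying every score by the same positive constant leaves all the probabilities unchanged: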

$p_i \leftarrow \frac{p'_i}{\sum_{j=1}^{n} p'_j}.$

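This estimation procedure never decreases the log-likelihood from one iteration to the next, and it converges to the unique maximum whenever that maximum exists. Existence requires a condition on the data: for every way of splitting the individuals into two nonempty groups, some individual in each group must have beaten some individual in the other at least once.[1]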
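A minimal sketch of the iterative procedure above in Python, assuming the win counts are stored as a nested list with `w[i][j]` the number of wins of i over j. All scores are updated from the previous vector before renormalizing, as the text describes. The function name `fit_bradley_terry`, the all-equal starting vector, and the stopping rule (stop once no score changes by more than `tol`, or after `max_iter` passes) are illustrative choices, not part of the source. The guard against individuals without wins is only a necessary check; the sketch assumes the maximum likelihood estimate exists.

```python
from typing import List, Sequence

def fit_bradley_terry(w: Sequence[Sequence[int]], max_iter: int = 10_000,
                      tol: float = 1e-10) -> List[float]:
    """Estimate Bradley-Terry scores from a win-count matrix w (w[i][j] = wins of i over j)."""
    n = len(w)
    wins = [sum(row) for row in w]  # W_i, the number of comparisons won by i
    if any(W_i == 0 for W_i in wins):
        # With zero wins the likelihood is maximized only in the limit p_i -> 0.
        raise ValueError("every individual needs at least one win")
    p = [1.0 / n] * n  # arbitrary positive starting vector
    for _ in range(max_iter):
        # Update every p_i using the *previous* vector p.
        new_p = []
        for i in range(n):
            denom = sum((w[i][j] + w[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(wins[i] / denom)
        # Renormalize so the scores sum to one.
        total = sum(new_p)
        new_p = [x / total for x in new_p]
        step = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if step < tol:
            break
    return p

w = [[0, 2, 1], [1, 0, 1], [1, 0, 0]]  # hypothetical win counts
p = fit_bradley_terry(w)
print([round(x, 3) for x in p])
```

For this hypothetical data the fitted scores order the individuals 0, 1, 2 from strongest to weakest. At convergence each $W_i$ matches its expected number of wins $\sum_{j\neq i}(w_{ij}+w_{ji})\,p_i/(p_i+p_j)$, which is a convenient sanity check on the result.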

References

[1] Hunter, David R. (2004). "MM algorithms for generalized Bradley–Terry models". The Annals of Statistics. 32 (1): 384–406. CiteSeerX 10.1.1.110.7878. doi:10.1214/aos/1079120141. JSTOR 3448514.
[2] Agresti, Alan (2014). Categorical Data Analysis. John Wiley & Sons. pp. 436–439.
[3] van Berkum, E. E. M. "Bradley-Terry model". Encyclopedia of Mathematics. Retrieved 18 November 2014.
[4] Bradley, Ralph Allan; Terry, Milton E. (1952). "Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons". Biometrika. 39 (3/4): 324–345. doi:10.2307/2334029. JSTOR 2334029.
[5] Zermelo, Ernst (1929). "Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung". Mathematische Zeitschrift. 29 (1): 436–460. doi:10.1007/BF01180541.
[6] Ebbinghaus, Heinz-Dieter (2007). Ernst Zermelo: An Approach to His Life and Work. pp. 268–269. ISBN 9783540495536.
[7] Szummer, Martin; Yilmaz, Emine (2011). "Semi-supervised learning to rank with preference regularization". CIKM.
[8] Radlinski, Filip; Joachims, Thorsten (2007). "Active Exploration for Learning Rankings from Clickthrough Data". KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 570–579. doi:10.1145/1281192.1281254.
[9] Wu, Fangzhao; Xu, Jun; Li, Hang; Jiang, Xin (2014). "Ranking Optimization with Constraints". CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. pp. 1049–1058. doi:10.1145/2661829.2661895.
