PlackettLuce.Rd
Fit a PlackettLuce model to a set of rankings. The rankings may be partial (each ranking completely ranks a subset of the items) and include ties of arbitrary order.
PlackettLuce(rankings, npseudo = 0.5, normal = NULL, gamma = NULL, adherence = NULL, weights = freq(rankings), na.action = getOption("na.action"), start = NULL, method = c("iterative scaling", "BFGS", "LBFGS"), epsilon = 1e07, steffensen = 0.1, maxit = c(500, 10), trace = FALSE, verbose = TRUE, ...)
rankings  a 

npseudo  when using pseudodata: the number of wins and losses to add between each object and a hypothetical reference object. 
normal  a optional list with elements named 
gamma  a optional list with elements named 
adherence  an optional vector of adherence values for each ranker. If
missing, adherence is fixed to 1 for all rankers. If 
weights  an optional vector of weights for each ranking. 
na.action  a function to handle any missing rankings, see

start  starting values for the worth parameters and the tie parameters
on the raw scale (worth parameters need not be scaled to sum to 1). If

method  the method to be used for fitting: 
epsilon  the maximum absolute difference between the observed and expected sufficient statistics for the ability parameters at convergence. 
steffensen  a threshold defined as for 
maxit  a vector specifying the maximum number of iterations. If

trace  logical, if 
verbose  logical, if 
...  additional arguments passed to 
An object of class "PlackettLuce"
, which is a list containing the
following elements:
The matched call.
The model coefficients.
The maximized loglikelihood.
The maximized loglikelihood for the null model (all alternatives including ties have equal probability).
The residual degrees of freedom.
The residual degrees of freedom for the null model.
The rank of the model.
If a prior was specified, the maximised log posterior.
If a gamma prior was specified, the list of parameters.
If a normal prior was specified, the list of parameters.
The number of iterations run.
The rankings passed to rankings
, converted to a
"rankings"
object if necessary.
The weights applied to each ranking in the fitting.
The fixed or estimated adherence per ranker.
The ranker index mapping rankings to rankers (the
"index"
attribute of rankings
if specified as a
"grouped_rankings"
object.)
The maximum number of objects observed in a tie.
The convergence code: 0 for successful convergence; 1 if reached
maxit
(outer) iterations without convergence; 2 if Steffensen acceleration
cause loglikelihood to increase; negative number if LBFGS algorithm failed
for other reason.
As the maximum tie order increases, the number of possible choices for
each rank increases rapidly, particularly when the total number of items is
high. This means that the model will be slower to fit with higher \(D\).
In addition, due to the current implementation of the vcov()
method,
computation of the standard errors (as by summary()
) can take almost as
long as the model fit and may even become infeasible due to memory limits.
As a rule of thumb, for > 10 items and > 1000 rankings, we recommend
PlackettLuce()
for ties up to order 4. For higher order ties, a
rankordered logit model, see ROlogit::rologit()
or
generalized Mallows Model as in BayesMallows::compute_mallows()
may be
more suitable, as they do not model tied events explicitly.
A single ranking is given by $$R = \{C_1, C_2, \ldots, C_J\}$$ where the items in set \(C_1\) are ranked higher than (better than) the items in \(C_2\), and so on. If there are multiple objects in set \(C_j\) these items are tied in the ranking.
For a set if items \(S\), let $$f(S) = \delta_{S} \left(\prod_{i \in S} \alpha_i \right)^\frac{1}{S}$$ where \(S\) is the cardinality (size) of the set, \(\delta_n\) is a parameter related to the prevalence of ties of order \(n\) (with \(\delta_1 \equiv 1\)), and \(\alpha_i\) is a parameter representing the worth of item \(i\). Then under an extension of the PlackettLuce model allowing ties up to order \(D\), the probability of the ranking \(R\) is given by $$\prod_{j = 1}^J \frac{f(C_j)}{ \sum_{k = 1}^{\min(D_j, D)} \sum_{S \in {A_j \choose k}} f(S)}$$ where \(D_j\) is the cardinality of \(A_j\), the set of alternatives from which \(C_j\) is chosen, and \(A_j \choose k\) is all the possible choices of \(k\) items from \(A_j\). The value of \(D\) can be set to the maximum number of tied items observed in the data, so that \(\delta_n = 0\) for \(n > D\).
When the worth parameters are constrained to sum to one, they represent the probability that the corresponding item comes first in a ranking of all items, given that first place is not tied.
The 2way tie prevalence parameter \(\delta_2\) is related to the probability that two items of equal worth tie for first place, given that the first place is not a 3way or higher tie. Specifically, that probability is \(\delta_2/(2 + \delta_2)\).
The 3way and higher tieprevalence parameters are similarly interpretable, in terms of tie probabilities among equalworth items.
In order for the maximum likelihood estimate of an object's worth to be defined, the network of rankings must be strongly connected. This means that in every possible partition of the objects into two nonempty subsets, some object in the second set is ranked higher than some object in the first set at least once.
If the network of rankings is not strongly connected then pseudorankings
may be used to connect the network. This approach posits a hypothetical
object with logworth 0 and adds npseudo
wins and npseudo
losses to the set of rankings.
The parameter npseudo
is the prior strength. With npseudo = 0
the MLE is the posterior mode. As npseudo
approaches
infinity the logworth estimates all shrink towards 0. The default,
npseudo = 0.5
, is sufficient to connect the network and has a weak
shrinkage effect. Even for networks that are already connected, adding
pseudorankings typically reduces both the bias and variance of the
estimators of the worth parameters.
Prior information can be incorporated by using normal
to specify a
multivariate normal prior on the logworths. The logworths are then
estimated by maximum a posteriori (MAP) estimation. Model summaries (deviance, AIC,
standard errors) are based on the loglikelihood evaluated at the MAP
estimates, resulting in a finite sample bias that should disappear as
the number of rankings increases. Inference based on these model summaries
is valid as long as the prior is considered fixed and not tuned as part of
the model.
Incorporating a prior is an alternative method of penalization, therefore
npseudo
is set to zero when a prior is specified.
When rankings come from different rankers, the model can be extended to allow for varying reliability of the rankers, as proposed by Raman and Joachims (2014). In particular, replacing \(f(S)\) by $$h(S) = \delta_{S} \left(\prod_{i \in S} \alpha_i \right)^\frac{\eta_g}{S}$$ where \(\eta_g > 0\) is the adherence parameter for ranker \(g\). In the standard model, all rankers are assumed to have equal reliability, so \(\eta_g = 1\) for all rankers. Higher \(\eta_g = 1\) increases the distance between item worths, giving greater weight to the ranker's choice. Conversely, lower \(\eta_g = 1\) shrinks the item worths towards equality so the ranker's choice is less relevant.
The adherence parameters are not estimable by maximum likelihood, since
for given item worths the maximum likelihood estimate of adherence would be
infinity for rankers that give rankings consistent with the items ordered by
worth and zero for all other rankers. Therefore it is essential to include a
prior on the adherence parameters when these are estimated rather than fixed.
Setting gamma = TRUE
specifies the default
\(\Gamma(10,10)\) prior, which has a mean of
1 and a probability of 0.99 that the adherence is between 0.37 and 2.
Alternative parameters can be specified by a list with elements shape
and rate
. Setting scale and rate to a common value \(\theta\)
specifies a mean of 1; \(\theta \ge\) 2 will give low prior probability
to nearzero adherence; as \(\theta\) increases the density becomes
more concentrated (and more symmetrical) about 1.
Since the number of adherence parameters will typically be large and it is assumed the worth and tie parameters are of primary interest, the adherence parameters are not included in model summaries, but are included in the returned object.
For models without priors, using nspseudo = 0
will use standard
maximum likelihood, if the network is connected (and throw an error
otherwise).
The fitting algorithm is set by the method
argument. The default
method "iterative scaling"
is a slow but reliable approach. In
addition, this has the most control on the accuracy of the final fit, since
convergence is determined by direct comparison of the observed and expected
values of the sufficient statistics for the worth parameters, rather than a
tolerance on change in the loglikelihood.
The "iterative scaling"
algorithm is slow because it is a first order
method (does not use derivatives of the likelihood). From a set of starting
values that are 'close enough' to the final solution, the algorithm can be
accelerated using
Steffensen's method.
PlackettLuce
attempts to apply Steffensen's acceleration when all
differences between the observed and expected values of the sufficient
statistics are less than steffensen
. This is an adhoc rule defining
'close enough' and in some cases the acceleration may produce negative
worth parameters or decrease the loglikelihood. PlackettLuce
will
only apply the update when it makes an improvement.
The "BFGS"
and "LBFGS"
algorithms are second order methods,
therefore can be quicker than the default method. Control parameters can be
passed on to optim
or lbfgs
.
For models with priors, the iterative scaling method cannot be used, so BFGS is used by default.
Raman, K. and Joachims, T. (2014) Methods for Ordinal Peer Grading. arXiv:1404.3656.
Handling rankings: rankings
, aggregate
,
group
, choices
,
adjacency
, connectivity
.
Inspect fitted PlackettLuce models: coef
, deviance
,
fitted
, itempar
, logLik
, print
,
qvcalc
, summary
, vcov
.
Fit PlackettLuce tree: pltree
.
# Six partial rankings of four objects, 1 is top rank, e.g # first ranking: item 1, item 2 # second ranking: item 2, item 3, item 4, item 1 # third ranking: items 2, 3, 4 tie for first place, item 1 second R < matrix(c(1, 2, 0, 0, 4, 1, 2, 3, 2, 1, 1, 1, 1, 2, 3, 0, 2, 1, 1, 0, 1, 0, 3, 2), nrow = 6, byrow = TRUE) colnames(R) < c("apple", "banana", "orange", "pear") # create rankings object R < as.rankings(R) # Standard maximum likelihood estimates mod_mle < PlackettLuce(R, npseudo = 0) coef(mod_mle)#> apple banana orange pear tie2 tie3 #> 0.0000000 0.2942875 0.7335113 0.1190960 1.8619467 0.7369735#> apple banana orange pear tie2 tie3 #> 0.00000000 0.25287379 0.61350684 0.08688475 2.15068111 0.79245358# independent N(0, 9) priors on logworths, as in Raman and Joachims prior < list(mu = rep(0, ncol(R)), Sigma = diag(rep(9, ncol(R)))) mod_normal < PlackettLuce(rankings = R, normal = prior) # slightly weaker shrinkage effect vs pseudorankings, # with less effect on tie parameters (but note small number of rankings here) coef(mod_normal)#> apple banana orange pear tie2 tie3 #> 0.0000000 0.2753696 0.6772960 0.1030173 1.8679502 0.7453151# estimate adherence assuming every ranking is from a separate ranker mod_separate < PlackettLuce(rankings = R, normal = prior, gamma = TRUE) coef(mod_separate)#> apple banana orange pear tie2 tie3 #> 0.0000000 0.2310944 0.8436801 0.1654011 1.8637839 0.7405627# gives more weight to rankers 4 & 6 which rank apple first, # so worth of apple increased relative to banana mod_separate$adherence#> [1] 0.8889778 0.8732159 0.8876840 0.9431440 0.8818594 0.9506536# estimate adherence based on grouped rankings #  assume two rankings from each ranker G < group(R, rep(1:3, each = 2)) mod_grouped < PlackettLuce(rankings = G, normal = prior, gamma = TRUE) coef(mod_grouped)#> apple banana orange pear tie2 tie3 #> 0.0000000 0.2855958 0.7611189 0.1201862 1.8678374 0.7453447# first ranker is least consistent so downweighted mod_grouped$adherence#> [1] 0.8834957 0.9137346 0.9144347