Title: | Correlation Coefficients for Information Retrieval |
---|---|
Description: | Provides implementations of various correlation coefficients commonly used in Information Retrieval. In particular, it includes Kendall's (1970, isbn:0852641990) tau coefficient, as well as tau_a and tau_b for the treatment of ties. It also includes the tauAP correlation coefficient of Yilmaz et al. (2008) <doi:10.1145/1390334.1390435>, and the versions tauAP_a and tauAP_b developed by Urbano and Marrero (2017) <doi:10.1145/3121050.3121106> to cope with ties. |
Authors: | Julián Urbano [aut, cre], Mónica Marrero [aut] |
Maintainer: | Julián Urbano |
License: | MIT + file LICENSE |
Version: | 1.0 |
Built: | 2024-11-05 05:27:40 UTC |
Source: | https://github.com/julian-urbano/ircor |
Rank Correlation Coefficients

tau is the rank correlation coefficient by Kendall, where neither vector can contain tied items. tau_a and tau_b are the versions developed to cope with ties under the scenarios of accuracy and agreement, respectively. See the references for details.
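For intuition about the no-ties case, Kendall's tau can be written as the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. The following is a minimal base R sketch of that textbook definition. The helper `kendall_tau_sketch` is hypothetical and not part of ircor, whose internal implementation may differ.

```r
# Hypothetical helper (not part of ircor): Kendall's tau for vectors without ties.
kendall_tau_sketch <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  # This sketch only covers the no-ties case
  stopifnot(!anyDuplicated(x), !anyDuplicated(y))
  n <- length(x)
  pairs <- combn(n, 2)  # every pair i < j, one pair per column
  dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])
  dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])
  # dx * dy is +1 for a concordant pair and -1 for a discordant pair
  sum(dx * dy) / choose(n, 2)
}

x <- c(0.67, 0.45, 0.29, 0.12, 0.57, 0.24, 0.94, 0.75, 0.08, 0.54)
y <- c(0.48, 0.68, 0.32, 0.09, 0.06, 0.61, 0.87, 0.22, 0.44, 0.84)
kendall_tau_sketch(x, y)
cor(x, y, method = "kendall")  # base R; agrees with the sketch when there are no ties
```

The sketch enumerates all pairs, so it is quadratic in the vector length; that is fine for illustration but not tuned for long vectors.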
    tau(x, y)
    tau_a(x, y)
    tau_b(x, y)
x |
a numeric vector. In |
y |
a numeric vector of the same length as x. |
The correlation coefficient.
M.G. Kendall (1970). Rank Correlation Methods. Charles Griffin & Company Limited.
See also tauAP for AP correlation coefficients.
    library(ircor)

    # No ties
    x <- c(0.67, 0.45, 0.29, 0.12, 0.57, 0.24, 0.94, 0.75, 0.08, 0.54)
    y <- c(0.48, 0.68, 0.32, 0.09, 0.06, 0.61, 0.87, 0.22, 0.44, 0.84)
    tau(x, y)
    tau_a(x, y) # same as tau
    tau_b(x, y) # same as tau

    # Ties in y
    y <- round(y, 1)
    tau_a(x, y)
    tau_b(x, y)

    # Ties in x too
    x <- round(x, 1)
    tau_b(x, y)
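To make the effect of ties more concrete, the sketch below computes the textbook tau-a and tau-b statistics in base R. Tau-a divides by the total number of pairs, while tau-b divides by the geometric mean of the numbers of pairs not tied in each vector. The helper `kendall_ab_sketch` is hypothetical and not part of ircor. Whether ircor's tau_a and tau_b reduce to exactly these formulas is an assumption that should be checked against the references.

```r
# Hypothetical helper (not part of ircor): textbook tau-a and tau-b with ties.
kendall_ab_sketch <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  pairs <- combn(length(x), 2)  # every pair i < j, one pair per column
  dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])
  dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])
  s <- sum(dx * dy)   # concordant minus discordant; tied pairs contribute 0
  n0 <- ncol(pairs)   # total number of pairs
  c(tau_a = s / n0,
    tau_b = s / sqrt(sum(dx != 0) * sum(dy != 0)))
}

# Rounding introduces ties in both vectors
x <- round(c(0.67, 0.45, 0.29, 0.12, 0.57, 0.24, 0.94, 0.75, 0.08, 0.54), 1)
y <- round(c(0.48, 0.68, 0.32, 0.09, 0.06, 0.61, 0.87, 0.22, 0.44, 0.84), 1)
res <- kendall_ab_sketch(x, y)
res
cor(x, y, method = "kendall")  # base R computes tau-b when ties are present
```

Because the tau-b denominator never exceeds the total number of pairs, tau-b is at least as large in magnitude as tau-a on the same data.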
tauAP is the AP rank correlation coefficient by Yilmaz et al., where neither vector can contain tied items. tauAP_a and tauAP_b are the versions developed by Urbano and Marrero to cope with ties under the scenarios of accuracy and agreement, respectively. See the references for details.
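As a rough illustration of the no-ties case, the AP correlation of Yilmaz et al. walks down the estimated ranking and, at each position, measures the fraction of items ranked above that are also above it in the true ranking. The sketch below follows that published definition in base R. The helper `tau_ap_sketch` and its argument names `truth` and `estimate` are hypothetical and not part of ircor, and no claim is made here about which of ircor's arguments plays which role.

```r
# Hypothetical helper (not part of ircor): AP correlation without ties.
tau_ap_sketch <- function(truth, estimate, decreasing = TRUE) {
  stopifnot(length(truth) == length(estimate), length(truth) >= 2)
  n <- length(truth)
  # Walk the items from best to worst according to the estimated scores
  ord <- order(estimate, decreasing = decreasing)
  t_sorted <- truth[ord]
  total <- 0
  for (i in 2:n) {
    above <- t_sorted[seq_len(i - 1)]
    # Items placed above position i that the truth also places above it
    correct <- if (decreasing) {
      sum(above > t_sorted[i])
    } else {
      sum(above < t_sorted[i])
    }
    total <- total + correct / (i - 1)
  }
  2 * total / (n - 1) - 1
}

truth <- c(5, 4, 3, 2, 1)
top_swap <- tau_ap_sketch(truth, c(4, 5, 3, 2, 1))     # 0.5
bottom_swap <- tau_ap_sketch(truth, c(5, 4, 3, 1, 2))  # 0.875
```

Both estimates contain a single swapped pair, so Kendall's tau would score them equally. The AP correlation penalizes the swap at the top of the ranking more heavily than the one at the bottom.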
    tauAP(x, y, decreasing = TRUE)
    tauAP_a(x, y, decreasing = TRUE)
    tauAP_b(x, y, decreasing = TRUE)
x |
a numeric vector. In |
y |
a numeric vector of the same length as x. |
decreasing |
logical. Should the sort order be increasing or decreasing (default)? |
Note that the sorting order is decreasing by default, as it should be when, for instance, the scores represent the effectiveness of systems. When the sorting order is ascending, for instance when the vectors represent ranks, the parameter decreasing must be set to FALSE.
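The reason for the two sort orders can be seen with base R alone: scores where larger is better and ranks where smaller is better describe the same ordering, provided each is sorted in the matching direction. The variable names below are illustrative only, and the snippet does not call ircor.

```r
# Scores where larger means better (e.g. system effectiveness)
scores <- c(0.67, 0.45, 0.29, 0.12, 0.57, 0.24, 0.94, 0.75, 0.08, 0.54)
# Ranks where smaller means better (rank 1 is the top item)
ranks <- rank(-scores)

# Item indices from best to worst under each representation
by_scores <- order(scores, decreasing = TRUE)
by_ranks <- order(ranks, decreasing = FALSE)
identical(by_scores, by_ranks)
```

Sorting ranks in decreasing order instead would reverse the ordering, which is why the direction has to be stated when ranks are passed in place of scores.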
The correlation coefficient.
E. Yilmaz, J.A. Aslam and S. Robertson (2008). A New Rank Correlation Coefficient for Information Retrieval. ACM SIGIR.
J. Urbano and M. Marrero (2017). The Treatment of Ties in AP Correlation. ACM ICTIR.
See also tau for Kendall correlation coefficients.
    library(ircor)

    # No ties
    x <- c(0.67, 0.45, 0.29, 0.12, 0.57, 0.24, 0.94, 0.75, 0.08, 0.54)
    y <- c(0.48, 0.68, 0.32, 0.09, 0.06, 0.61, 0.87, 0.22, 0.44, 0.84)
    tauAP(x, y)
    tauAP_a(x, y) # same as tauAP

    # Ties in y
    y <- round(y, 1)
    tauAP_a(x, y)
    tauAP_b(x, y)

    # Ties in x too
    x <- round(x, 1)
    tauAP_b(x, y)

    # Set decreasing to FALSE when x and y already represent ranks
    x <- rank(-x)
    y <- rank(-y)
    tauAP_b(x, y, FALSE) # same as above