Reliability and predictive accuracy

Created 6 Jun 2014 • Last modified 7 Jun 2014

I show with an example that even in situations as simple as linear regression with two predictors, it's difficult to estimate (in advance of measuring the criterion) the consequences of unreliability for predictive accuracy.

In classical test theory, the reliability of a test and the correlation of true scores with some (perfectly reliable) criterion measure together determine the accuracy with which observed scores can predict the criterion measure. Here I show that this theorem doesn't generalize to linear regression models for the criterion with two predictors instead of just one. I give two examples in which the two tests have the same reliabilities and the true scores predict the criterion equally well, but the predictive accuracies of the observed scores differ dramatically.
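For reference, the single-predictor result is the classical attenuation identity. In the notation below (my labels, not the post's), T is a true score, X = T + E is the observed score with error E independent of T and of the criterion Y, and ρ_XX' = σ_T²/σ_X² is the reliability:

```latex
\rho_{XY}
  = \frac{\operatorname{Cov}(T + E,\, Y)}{\sigma_X \sigma_Y}
  = \frac{\operatorname{Cov}(T,\, Y)}{\sigma_T \sigma_Y} \cdot \frac{\sigma_T}{\sigma_X}
  = \rho_{TY} \sqrt{\rho_{XX'}},
\qquad\text{so}\qquad
\rho_{XY}^2 = \rho_{XX'} \, \rho_{TY}^2 .
```

So with one predictor, the observed-score R^2 is simply the true-score R^2 scaled by the reliability.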

Setup:

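library(MASS)   # for mvrnorm
set.seed(3)

n = 10000
xcor = .8   # correlation between the two true scores
# True scores: bivariate normal with unit variances
d.true = data.frame(mvrnorm(n, c(0, 0), matrix(c(1, xcor, xcor, 1), nrow = 2)))
# SDs of the measurement error added to each true score
merr.x1.sd = .5
merr.x2.sd = .5

# Observed scores = true scores + independent normal error
d.observed = transform(d.true,
    X1 = X1 + rnorm(n, sd = merr.x1.sd),
    X2 = X2 + rnorm(n, sd = merr.x2.sd))

# Simulate a criterion y from the true scores via `model`, then report the
# R^2 of regressing y on the true scores and on the observed scores
go = function(model, ε.sd)
   {y = with(d.true,
        model(X1, X2) + rnorm(n, sd = ε.sd))
    round(digits = 2, `colnames<-`(
        as.matrix(c(
            d.true = with(d.true, summary(lm(y ~ X1 + X2))$r.squared),
            d.observed = with(d.observed, summary(lm(y ~ X1 + X2))$r.squared))),
        "R^2"))}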

Notice that the true scores are highly correlated (xcor = .8).
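Because the true scores have unit variance and each error has SD .5, each test's reliability is 1/(1 + .5^2) = .8. Measurement error also weakens the correlation between the observed scores, since the covariance is unchanged while each variance grows. A short sketch of that arithmetic, assuming independent errors as in the setup (the names `merr.sd`, `reliability`, and `obs.cor` are mine):

```r
# Population quantities implied by the setup (true scores have unit variance)
xcor = .8
merr.sd = c(.5, .5)                  # measurement-error SDs for X1 and X2
reliability = 1 / (1 + merr.sd^2)    # Var(true) / Var(observed)
reliability                          # 0.8 for each test
# The covariance between observed scores equals the true-score covariance,
# but each observed variance is 1 + merr.sd^2
obs.cor = xcor / sqrt(prod(1 + merr.sd^2))
obs.cor                              # 0.64
```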

Here we set the data-generating model to be the sum of the true scores, and we choose the residual SD of this model such that the proportion of variance accounted for with true scores is about 1/2.
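The residual SD follows from the true-score covariance: for R^2 = 1/2, the residual variance must equal the variance of the model's systematic part, which for X1 + X2 is 1 + 1 + 2(.8) = 3.6. A minimal sketch with a hypothetical helper `residual.sd` (not part of the original code):

```r
# Hypothetical helper: residual SD that gives R^2 = target.r2 for
# y = w[1]*X1 + w[2]*X2 + e, when the true scores have unit variances
# and correlation xcor
residual.sd = function(w, xcor, target.r2 = .5)
   {S = matrix(c(1, xcor, xcor, 1), nrow = 2)
    signal.var = drop(t(w) %*% S %*% w)    # Var(w1*X1 + w2*X2)
    # R^2 = signal / (signal + noise), so noise = signal * (1 - R^2) / R^2
    sqrt(signal.var * (1 - target.r2) / target.r2)}

residual.sd(c(1, 1), xcor = .8)     # sqrt(3.6), about 1.90
```

The same helper with `w = c(1, -1)` gives sqrt(.4), about .63, the value used for the difference model.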

go(function(X1, X2) X1 + X2, ε.sd = 1.9)
            R^2
d.true     0.50
d.observed 0.44

Now we set the data-generating model to be the difference of the true scores. Again, we choose the residual SD of this model such that the proportion of variance accounted for with true scores is about 1/2.

go(function(X1, X2) X1 - X2, ε.sd = .63)
            R^2
d.true     0.51
d.observed 0.24

Notice that the R^2 for observed scores took a dive: it fell from .51 to .24, whereas with the sum it fell only from .50 to .44.
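The gap can also be computed without simulation. The population R^2 for regressing y on the observed scores is c'Σ⁻¹c / Var(y), where Σ is the observed-score covariance matrix and c holds the covariances of the observed scores with y. A sketch under the setup's assumptions (the function `pop.r2` is hypothetical, not from the original post):

```r
# Population R^2 of y on true vs. observed scores, for y = w[1]*X1 + w[2]*X2 + e
# with the residual SD chosen so that the true-score R^2 equals target.r2.
# Assumes unit-variance true scores and independent measurement errors.
pop.r2 = function(w, xcor = .8, merr.sd = c(.5, .5), target.r2 = .5)
   {S.true = matrix(c(1, xcor, xcor, 1), nrow = 2)
    S.obs = S.true + diag(merr.sd^2)     # error variances add to the diagonal
    signal.var = drop(t(w) %*% S.true %*% w)
    y.var = signal.var / target.r2       # total variance of y
    c.xy = drop(S.true %*% w)            # Cov(observed X, y) = Cov(true X, y)
    c(true = target.r2,
      observed = drop(t(c.xy) %*% solve(S.obs, c.xy)) / y.var)}

round(pop.r2(c(1, 1)), 2)     # true 0.50, observed 0.44
round(pop.r2(c(1, -1)), 2)    # true 0.50, observed 0.22
```

These population values match the simulated .44 and are near the simulated .24. In this symmetric setup, the observed R^2 works out to the true-score R^2 times the reliability of the relevant composite: 3.6/(3.6 + .5) ≈ .88 for the sum, but only .4/(.4 + .5) ≈ .44 for the difference, because the difference of two highly correlated true scores has little variance left relative to the error variance.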