[R] Linear regression with a rounded response variable
Charles C. Berry
ccberry at ucsd.edu
Wed Oct 21 19:57:14 CEST 2015
On Wed, 21 Oct 2015, Ravi Varadhan wrote:
> Hi, I am dealing with a regression problem where the response variable,
> time (in seconds) to walk 15 ft, is rounded to the nearest integer. I do
> not care for the regression coefficients per se, but my main interest is
> in getting the prediction equation for walking speed, given the
> predictors (age, height, sex, etc.), where the predictions will be real
> numbers, and not integers. The hope is that these predictions should
> provide unbiased estimates of the "unrounded" walking speed. This
> sounds like a measurement error problem, where the measurement error is
> due to rounding and hence would be uniformly distributed on (-0.5, 0.5).
>
Not the usual "measurement error model" problem, though, where the errors
are in X and not independent of XB.
Look back at the proof of the unbiasedness of least squares under the
Gauss-Markov setup. The errors in Y need to have expectation zero.
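Spelled out for y = X beta + epsilon, with X held fixed and of full column rank (the standard textbook statement, nothing specific to this data):

```latex
\hat\beta = (X^\top X)^{-1} X^\top y = \beta + (X^\top X)^{-1} X^\top \varepsilon,
\qquad
\mathrm{E}[\hat\beta \mid X] = \beta + (X^\top X)^{-1} X^\top \mathrm{E}[\varepsilon \mid X].
```

So the estimate is unbiased when E(epsilon | X) = 0. If the recorded response is round(y*) rather than y*, the error also contains round(y*) - y*, and what matters is the conditional mean of that term given X, not its marginal distribution.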
From your description (but see caveat below) this is true of walking
*time*, but not exactly true of walking *speed* (modulo the usual
assumptions if they apply to time). In fact, if E(epsilon) = 0 were true of
unrounded time, it would not be true of unrounded speed (and vice versa).
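A quick numerical check of the time-versus-speed point, assuming (hypothetically) a true time of 5 seconds observed with zero-mean Uniform(-0.5, 0.5) noise. That noise distribution is chosen only because E(1/T) then has a closed form; the function name speed_bias_from_time_noise is made up for illustration.

```python
import math


def speed_bias_from_time_noise(true_time, distance_ft=15.0, half_width=0.5):
    """Exact E[distance / (true_time + U)] for U ~ Uniform(-half_width, half_width).

    The noise U has mean zero, so the observed *time* is unbiased, but the
    implied *speed* is not: 1/t is convex for t > 0, so by Jensen's
    inequality E[1/(true_time + U)] > 1/true_time.
    """
    if true_time <= half_width:
        raise ValueError("true_time must exceed half_width so time stays positive")
    lo, hi = true_time - half_width, true_time + half_width
    # Integral of 1/t over [lo, hi], times the uniform density 1/(hi - lo).
    mean_inv_time = math.log(hi / lo) / (hi - lo)
    expected_speed = distance_ft * mean_inv_time
    true_speed = distance_ft / true_time
    return expected_speed, true_speed


e_speed, t_speed = speed_bias_from_time_noise(5.0)
print(f"E[speed] = {e_speed:.5f} ft/s vs true speed = {t_speed:.5f} ft/s")
```

For a 5-second true time this gives an expected speed of about 3.010 ft/s against a true 3.000 ft/s: small, but systematically upward.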
> Are there any canonical approaches for handling this type of a problem?
Work out the bias analytically? Parametric bootstrap? Data augmentation
and friends?
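One of those, the parametric bootstrap, might be sketched as follows. This is a hypothetical illustration, assuming a normal linear model for the unrounded time and plugging in the naive OLS fit to the rounded data as the "truth"; the helper names, the single predictor, and every number in the example are invented. The sigma estimated from rounded data also absorbs the rounding variance (roughly 1/12), so this is at best a first-order correction.

```python
import numpy as np


def ols(X, y):
    """Ordinary least-squares coefficients; X should include an intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def bootstrap_rounding_bias(X, y_rounded, n_boot=500, seed=0):
    """Parametric-bootstrap estimate of the bias that rounding adds to OLS.

    Assumes the unrounded response is X @ beta + Normal(0, sigma^2) and the
    recorded value is that response rounded to the nearest integer.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat = ols(X, y_rounded)                    # naive fit to rounded data
    resid = y_rounded - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - p))    # includes rounding noise
    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        # Simulated unrounded times under the fitted model.
        y_star = X @ beta_hat + rng.normal(0.0, sigma_hat, size=n)
        # np.round sends exact .5 ties to even; ties have probability zero here.
        boot[b] = ols(X, np.round(y_star))          # round as recorded, then refit
    bias = boot.mean(axis=0) - beta_hat             # estimated bias of "round, then OLS"
    return beta_hat, bias, beta_hat - bias          # naive, bias, bias-corrected


# Made-up example: one hypothetical predictor (age) and invented coefficients.
rng = np.random.default_rng(42)
n = 200
age = rng.uniform(60, 90, size=n)
X = np.column_stack([np.ones(n), age])
y = np.round(2.0 + 0.05 * age + rng.normal(0.0, 0.7, size=n))
beta_hat, bias, beta_corrected = bootstrap_rounding_bias(X, y)
print("naive:", beta_hat, "estimated bias:", bias, "corrected:", beta_corrected)
```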
> What is wrong with just doing the standard linear regression?
>
Well, what do the actual values look like?
If half the subjects have a value of 5 seconds and the rest are split
between 4 and 6, your assertion that rounding induces an error of
dunif(epsilon,-0.5,0.5) is surely wrong (more positive errors in the
6-second group and more negative errors in the 4-second group under any
plausible model).
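To see this, here is a small simulation, assuming (purely for illustration) true walking times that are Normal with mean 5 s and SD 0.6 s, which puts roughly 60% of recorded values at 5 and about 20% each at 4 and 6. Error here means recorded minus true value, and the helper mean_rounding_error_by_group is hypothetical.

```python
import numpy as np


def mean_rounding_error_by_group(true_values):
    """Mean of (rounded - true) within each distinct rounded value.

    If rounding errors were Uniform(-0.5, 0.5) independently of everything,
    every group mean would be near zero.
    """
    rounded = np.round(true_values)
    out = {}
    for r in np.unique(rounded):
        mask = rounded == r
        out[float(r)] = float(np.mean(rounded[mask] - true_values[mask]))
    return out


rng = np.random.default_rng(0)
# Hypothetical walking times concentrated near 5 s (mean and SD are made up).
times = rng.normal(5.0, 0.6, size=100_000)
for value, err in mean_rounding_error_by_group(times).items():
    print(f"recorded {value:.0f} s: mean rounding error {err:+.3f}")
```

Under this assumed distribution the mean error comes out positive for the 6-second group, negative for the 4-second group, and near zero for the 5-second group, so the rounding error is not independent of the recorded value.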
HTH,
Chuck