[R] Linear regression with a rounded response variable

Gabor Grothendieck ggrothendieck at gmail.com
Wed Oct 21 22:11:57 CEST 2015


This could be modeled directly using Bayesian techniques. Consider the
Bayesian version of the following model where we only observe y and X.  y0
is not observed.

   y0 <- X b + error
   y <- round(y0)

The following code is based on modifying the code in the README of the CRAN
rcppbugs R package.


library(rcppbugs)
set.seed(123)

# set up the test data - y and X are observed but not y0
NR <- 1e2L
NC <- 2L
X <- cbind(1, rnorm(10))
y0 <- X %*% 1:2
y <- round(y0)

# for comparison run a normal linear model w/ lm.fit using X and y
lm.res <- lm.fit(X,y)
print(coef(lm.res))
##        x1        x2
## 0.9569366 1.9170808

# RCppBugs Model
b <- mcmc.normal(rnorm(NC),mu=0,tau=0.0001)
tau.y <- mcmc.gamma(sd(as.vector(y)),alpha=0.1,beta=0.1)
y.hat <- deterministic(function(X,b) { round(X %*% b) }, X, b)
y.lik <- mcmc.normal(y,mu=y.hat,tau=tau.y,observed=TRUE)
m <- create.model(b, tau.y, y.hat, y.lik)

# run the Bayesian model based on y and X
cat("running model...\n")
runtime <- system.time(ans <- run.model(m, iterations=1e5L, burn=1e4L,
adapt=1e3L, thin=10L))
print(apply(ans[["b"]],2,mean))
## [1] 0.9882485 2.0009989


On Wed, Oct 21, 2015 at 10:53 AM, Ravi Varadhan <ravi.varadhan at jhu.edu>
wrote:

> Hi,
> I am dealing with a regression problem where the response variable, time
> (second) to walk 15 ft, is rounded to the nearest integer.  I do not care
> for the regression coefficients per se, but my main interest is in getting
> the prediction equation for walking speed, given the predictors (age,
> height, sex, etc.), where the predictions will be real numbers, and not
> integers.  The hope is that these predictions should provide unbiased
> estimates of the "unrounded" walking speed. These sounds like a measurement
> error problem, where the measurement error is due to rounding and hence
> would be uniformly distributed (-0.5, 0.5).
>
> Are there any canonical approaches for handling this type of a problem?
> What is wrong with just doing the standard linear regression?
>
> I googled and saw that this question was asked by someone else in a
> stackexchange post, but it was unanswered.  Any suggestions?
>
> Thank you,
> Ravi
>
> Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
> Associate Professor,  Department of Oncology
> Division of Biostatistics & Bionformatics
> Sidney Kimmel Comprehensive Cancer Center
> Johns Hopkins University
> 550 N. Broadway, Suite 1111-E
> Baltimore, MD 21205
> 410-502-2619
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

	[[alternative HTML version deleted]]



More information about the R-help mailing list