[R] How to get variable name while doing series of regressions in an automated manner?
Ravi Varadhan
ravi.varadhan at jhu.edu
Tue Oct 27 19:27:10 CET 2015
Thank you very much, Marc & Bert.
Bert - I think you're correct. With Marc's solution, I am not able to get the response variable name in the call to lm(). But, your solution works well.
Best regards,
Ravi
-----Original Message-----
From: Bert Gunter [mailto:bgunter.4567 at gmail.com]
Sent: Tuesday, October 27, 2015 1:50 PM
To: Ravi Varadhan <ravi.varadhan at jhu.edu>
Cc: r-help at r-project.org
Subject: Re: [R] How to get variable name while doing series of regressions in an automated manner?
Marc, Ravi:
I may misunderstand, but I think Marc's solution labels the list components but not necessarily the summary() outputs. This might be sufficient, as in:
> z <- list(y1=rnorm(10,5),y2 = rnorm(10,8),x=1:10)
>
> ##1
> results1<-lapply(z[-3],function(y)lm(log(y)~x,data=z))
> lapply(results1,summary)
$y1
Call:
lm(formula = log(y) ~ x, data = z)
Residuals:
    Min      1Q  Median      3Q     Max
-0.2185 -0.1259 -0.0643  0.1340  0.3988
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
x           -0.01495    0.02317  -0.645    0.537
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2104 on 8 degrees of freedom
Multiple R-squared: 0.04945, Adjusted R-squared: -0.06937
F-statistic: 0.4161 on 1 and 8 DF, p-value: 0.5369
$y2
Call:
lm(formula = log(y) ~ x, data = z)
Residuals:
      Min        1Q    Median        3Q       Max
-0.229072 -0.094579 -0.006498  0.134303  0.188158
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
x           -0.006226   0.016778  -0.371     0.72
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1524 on 8 degrees of freedom
Multiple R-squared: 0.01692, Adjusted R-squared: -0.106
F-statistic: 0.1377 on 1 and 8 DF, p-value: 0.7202
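Because the names of results1 already carry the response labels, per-response quantities can be tabulated without the names appearing in the Call. A minimal sketch, assuming the same z as above; the seed is arbitrary and not from the original post, so the numbers will differ from the output shown, and coef_table is just an illustrative name:

```r
set.seed(1)  # arbitrary seed, added only to make the sketch reproducible
z <- list(y1 = rnorm(10, 5), y2 = rnorm(10, 8), x = 1:10)

# Fit one model per response; lapply() carries the names y1, y2 over
results1 <- lapply(z[-3], function(y) lm(log(y) ~ x, data = z))

# One row per response, labelled by the list names
coef_table <- t(sapply(results1, coef))
print(coef_table)
```

The row names of coef_table come from names(results1), so the labelling survives even though every Call reads log(y) ~ x.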
## 2
Alternatively, if you want output with the correct variable names,
bquote() can be used, as in:
> results2 <-lapply(names(z)[1:2],
+ function(nm){
+ fo <-formula(paste0("log(",nm,")~x"))
+ eval(bquote(lm(.(u),data=z),list(u=fo)))
+ })
> lapply(results2,summary)
[[1]]
Call:
lm(formula = log(y1) ~ x, data = z)
Residuals:
    Min      1Q  Median      3Q     Max
-0.2185 -0.1259 -0.0643  0.1340  0.3988
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
x           -0.01495    0.02317  -0.645    0.537
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2104 on 8 degrees of freedom
Multiple R-squared: 0.04945, Adjusted R-squared: -0.06937
F-statistic: 0.4161 on 1 and 8 DF, p-value: 0.5369
[[2]]
Call:
lm(formula = log(y2) ~ x, data = z)
Residuals:
      Min        1Q    Median        3Q       Max
-0.229072 -0.094579 -0.006498  0.134303  0.188158
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
x           -0.006226   0.016778  -0.371     0.72
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1524 on 8 degrees of freedom
Multiple R-squared: 0.01692, Adjusted R-squared: -0.106
F-statistic: 0.1377 on 1 and 8 DF, p-value: 0.7202
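Note that results2 comes back unnamed ([[1]], [[2]]) because lapply() takes its names from the first argument, and names(z)[1:2] is an unnamed character vector. A possible variation that keeps both the list names and the variable names in the Call is sketched below. It assumes the same z (again with an arbitrary seed that is not from the original post); results3 and responses are illustrative names:

```r
set.seed(1)  # arbitrary seed, added only to make the sketch reproducible
z <- list(y1 = rnorm(10, 5), y2 = rnorm(10, 8), x = 1:10)
responses <- names(z)[1:2]

# sapply(..., simplify = FALSE) returns a list named after the
# character vector it iterates over
results3 <- sapply(responses, function(nm) {
  fo <- as.formula(paste0("log(", nm, ") ~ x"))
  # Splice the formula object into the call so it is stored in the fit
  eval(bquote(lm(.(fo), data = z)))
}, simplify = FALSE)

names(results3)
formula(results3$y1)
```

With this, lapply(results3, summary) should print under $y1 and $y2, and each Call should show the actual response variable.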
HTH or apologies if I've missed the point and broadcasted noise.
Cheers,
Bert
Bert Gunter
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
-- Clifford Stoll
On Tue, Oct 27, 2015 at 8:19 AM, Ravi Varadhan <ravi.varadhan at jhu.edu> wrote:
> Hi,
>
> I am running through a series of regressions in a loop as follows:
>
> results <- vector("list", length(mydata$varnames))
>
> for (i in 1:length(mydata$varnames)) {
>   results[[i]] <- summary(lm(log(eval(parse(text=varnames[i]))) ~ age + sex + CMV.status, data=mydata))
> }
>
> Now, when I look at the results[[i]] objects, I can't see the original variable names. I only see the following:
>
> Call:
> lm(formula = log(eval(parse(text = varnames[i]))) ~ age + sex + CMV.status,
> data = mydata)
>
>
> Is there a way to display the original variable names on the LHS? In addition, is there a better paradigm for running this kind of series of regressions automatically?
>
> Thank you very much,
> Ravi
>
> Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
> Associate Professor, Department of Oncology Division of Biostatistics
> & Bioinformatics Sidney Kimmel Comprehensive Cancer Center Johns
> Hopkins University
> 550 N. Broadway, Suite 1111-E
> Baltimore, MD 21205
> 410-502-2619