[R] How to get variable name while doing series of regressions in an automated manner?
Bert Gunter
bgunter.4567 at gmail.com
Sun Nov 1 23:06:32 CET 2015
Ravi et al.:
My prior "solution" nagged at me, as I thought it was pretty clumsy --
I was hoping someone would show how to fix it up. As no one did, I
finally realized how to do it myself. Here's how to do the iteration
to get the right labeling with no pasting or formula() call by using
as.name() to substitute via bquote() directly into the (parsed) lm()
call. As one can see, it's a general approach to this sort of thing.
(It's also been offered in the past by others, but I forgot it).
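# Two responses (y1, y2) and one predictor (x).
z <- list(y1 = rnorm(10, 5), y2 = rnorm(10, 8), x = runif(10))
lapply(names(z)[-3], function(u) {
  # Substitute the symbol named by u for .(y), then evaluate the finished call.
  eval(bquote(lm(log(.(y)) ~ x, data = z), list(y = as.name(u))))
})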
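
The same idea should carry over to a data frame with several predictors, as in Ravi's original loop. What follows is only a sketch under assumptions: mydata, its columns (y1, y2, age, sex) and the responses vector are made-up stand-ins, not the original poster's data. Naming the input vector with setNames() also labels the list components.

```r
# Hypothetical stand-in data; the column names are assumptions for illustration.
set.seed(1)
mydata <- data.frame(
  y1  = rexp(20) + 1,
  y2  = rexp(20) + 1,
  age = runif(20, 20, 70),
  sex = factor(rep(c("F", "M"), 10))
)
responses <- c("y1", "y2")

# Build each lm() call with the response symbol substituted in, then evaluate it.
# bquote() looks up u in the calling function's frame by default.
fits <- lapply(setNames(responses, responses), function(u) {
  eval(bquote(lm(log(.(as.name(u))) ~ age + sex, data = mydata)))
})

# The stored call carries the real variable name on the LHS.
formula(fits$y1)
```

lapply(fits, summary) then gives summaries whose Call: lines show log(y1) and log(y2) rather than a placeholder.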
There -- now I feel better. No need to respond.
Cheers,
Bert
Bert Gunter
"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
-- Clifford Stoll
On Tue, Oct 27, 2015 at 10:50 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Marc,Ravi:
>
> I may misunderstand, but I think Marc's solution labels the list
> components but not necessarily the summary() outputs. This might be
> sufficient, as in:
>
>> z <- list(y1=rnorm(10,5),y2 = rnorm(10,8),x=1:10)
>>
>> ##1
>> results1<-lapply(z[-3],function(y)lm(log(y)~x,data=z))
>> lapply(results1,summary)
> $y1
>
> Call:
> lm(formula = log(y) ~ x, data = z)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -0.2185 -0.1259 -0.0643  0.1340  0.3988
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
> x           -0.01495    0.02317  -0.645    0.537
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.2104 on 8 degrees of freedom
> Multiple R-squared: 0.04945, Adjusted R-squared: -0.06937
> F-statistic: 0.4161 on 1 and 8 DF, p-value: 0.5369
>
>
> $y2
>
> Call:
> lm(formula = log(y) ~ x, data = z)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.229072 -0.094579 -0.006498  0.134303  0.188158
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
> x           -0.006226   0.016778  -0.371     0.72
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.1524 on 8 degrees of freedom
> Multiple R-squared: 0.01692, Adjusted R-squared: -0.106
> F-statistic: 0.1377 on 1 and 8 DF, p-value: 0.7202
>
>
> ## 2
>
> Alternatively, if you want output with the correct variable names,
> bquote() can be used, as in:
>
>> results2 <-lapply(names(z)[1:2],
> + function(nm){
> + fo <-formula(paste0("log(",nm,")~x"))
> + eval(bquote(lm(.(u),data=z),list(u=fo)))
> + })
>> lapply(results2,summary)
> [[1]]
>
> Call:
> lm(formula = log(y1) ~ x, data = z)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -0.2185 -0.1259 -0.0643  0.1340  0.3988
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
> x           -0.01495    0.02317  -0.645    0.537
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.2104 on 8 degrees of freedom
> Multiple R-squared: 0.04945, Adjusted R-squared: -0.06937
> F-statistic: 0.4161 on 1 and 8 DF, p-value: 0.5369
>
>
> [[2]]
>
> Call:
> lm(formula = log(y2) ~ x, data = z)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.229072 -0.094579 -0.006498  0.134303  0.188158
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
> x           -0.006226   0.016778  -0.371     0.72
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.1524 on 8 degrees of freedom
> Multiple R-squared: 0.01692, Adjusted R-squared: -0.106
> F-statistic: 0.1377 on 1 and 8 DF, p-value: 0.7202
>
>
> HTH or apologies if I've missed the point and broadcasted noise.
>
> Cheers,
> Bert
>
> On Tue, Oct 27, 2015 at 8:19 AM, Ravi Varadhan <ravi.varadhan at jhu.edu> wrote:
>> Hi,
>>
>> I am running through a series of regression in a loop as follows:
>>
>> results <- vector("list", length(mydata$varnames))
>>
>> for (i in 1:length(mydata$varnames)) {
>> results[[i]] <- summary(lm(log(eval(parse(text=varnames[i]))) ~ age + sex + CMV.status, data=mydata))
>> }
>>
>> Now, when I look at the results[[i]] objects, I won't be able to see the original variable names. Obviously, I will only see the following:
>>
>> Call:
>> lm(formula = log(eval(parse(text = varnames[i]))) ~ age + sex + CMV.status,
>> data = mydata)
>>
>>
>> Is there a way to display the original variable names on the LHS? In addition, is there a better paradigm for running this kind of series of regressions automatically?
>>
>> Thank you very much,
>> Ravi
>>
>> Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
>> Associate Professor, Department of Oncology
>> Division of Biostatistics & Bionformatics
>> Sidney Kimmel Comprehensive Cancer Center
>> Johns Hopkins University
>> 550 N. Broadway, Suite 1111-E
>> Baltimore, MD 21205
>> 410-502-2619