[R] splitting into multiple dataframes and then create a loop to work

Dimitris Rizopoulos d.rizopoulos at erasmusmc.nl
Mon Aug 29 21:37:27 CEST 2011


Well, if a pooled estimate of the residual standard error is not
desirable, then you just need to set the argument 'pool' of lmList() to
FALSE, e.g.,

library(nlme)  # provides lmList()

mlis <- lmList(yvar ~ . - clvar | clvar, data = df, pool = FALSE)
summary(mlis)
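
To see what the pooling changes, here is a rough base-R sketch (not the
lmList() internals, just the arithmetic as I understand it). It fits one
lm() per level of clvar with split() and lapply(), then compares each
group's own residual standard error with a pooled value computed as
sqrt(sum of RSS / sum of residual df). The data frame is rebuilt from the
example posted earlier in the thread; the names fits, sep_rse and
pooled_rse are made up for illustration.

```r
# Example data in the same shape as the original post (4 groups of 10 rows)
set.seed(1234)
df <- data.frame(
  clvar = rep(1:4, each = 10),
  yvar  = rnorm(40, 10, 6),
  var1  = rnorm(40, 10, 4),
  var2  = rnorm(40, 10, 4),
  var3  = rnorm(40, 5, 2),
  var4  = rnorm(40, 10, 3),
  var5  = rnorm(40, 15, 8)
)

# One lm() per level of clvar, each fitted to its own rows only
fits <- lapply(split(df, df$clvar),
               function(d) lm(yvar ~ . - clvar, data = d))

# Residual standard error of each separate fit
sep_rse <- sapply(fits, function(m) summary(m)$sigma)

# Pooled value: total residual sum of squares over total residual df
rss <- sapply(fits, function(m) sum(residuals(m)^2))
rdf <- sapply(fits, df.residual)
pooled_rse <- sqrt(sum(rss) / sum(rdf))

print(sep_rse)
print(pooled_rse)
```

With separate fits, the tests in each group rest on that group's own
entry of sep_rse; with pooling, a single value like pooled_rse would be
shared by all groups, which is why the standard errors and p-values can
differ between the two approaches.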


Best,
Dimitris
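
For the original goal in this thread (a matrix of slope p-values with one
row per level of clvar and one column per predictor), a base-R sketch
without lmList() or plyr might look like the following. It assumes a
simple regression of yvar on each predictor separately, as in the
original attempt; slope_p, pieces and pmat are hypothetical names chosen
for this example.

```r
# Example data in the same shape as the original post (4 groups of 10 rows)
set.seed(1234)
df <- data.frame(
  clvar = rep(1:4, each = 10),
  yvar  = rnorm(40, 10, 6),
  var1  = rnorm(40, 10, 4),
  var2  = rnorm(40, 10, 4),
  var3  = rnorm(40, 5, 2),
  var4  = rnorm(40, 10, 3),
  var5  = rnorm(40, 15, 8)
)

vars <- paste0("var", 1:5)

# p-value of the slope from a simple regression of yvar on one predictor
slope_p <- function(d, v) {
  fit <- lm(reformulate(v, response = "yvar"), data = d)
  summary(fit)$coefficients[2, 4]
}

# split() returns a named list of data frames, one per level of clvar,
# so no df1, df2, ... objects need to be created by hand
pieces <- split(df, df$clvar)

# Rows = levels of clvar, columns = predictors
pmat <- t(sapply(pieces, function(d) {
  sapply(vars, function(v) slope_p(d, v))
}))

print(round(pmat, 3))
```

Because split() works off the levels actually present in clvar, the same
code should carry over to 100 levels without changes.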


On 8/29/2011 9:20 PM, Dennis Murphy wrote:
> Hi:
>
> Dimitris' solution is appropriate, but it needs to be mentioned that
> the approach I offered earlier in this thread differs from the
> lmList() approach. lmList() uses a pooled measure of error MSE (which
> you can see at the bottom of the output from summary(mlis) ), whereas
> the plyr approach subdivides the data into distinct sub-data frames
> and analyzes them as separate entities. As a result, the residual MSEs
> will differ between the two approaches, which in turn affects the
> significance tests on the model coefficients. You need to decide which
> approach is better for your purposes.
>
> Cheers,
> Dennis
>
> On Mon, Aug 29, 2011 at 12:02 PM, Dimitris Rizopoulos
> <d.rizopoulos at erasmusmc.nl>  wrote:
>> You can do this using function lmList() from package nlme, without having to
>> split the data frames, e.g.,
>>
>> library(nlme)
>>
>> mlis <- lmList(yvar ~ . - clvar | clvar, data = df)
>> mlis
>> summary(mlis)
>>
>>
>> I hope it helps.
>>
>> Best,
>> Dimitris
>>
>>
>> On 8/29/2011 5:37 PM, Nilaya Sharma wrote:
>>>
>>> Dear All
>>>
>>> Sorry for this simple question; I could not solve it even after
>>> spending days on it.
>>>
>>> My data looks like this:
>>>
>>> # data
>>> set.seed(1234)
>>> # I have 100 levels for this factor variable
>>> clvar <- c(rep(1, 10), rep(2, 10), rep(3, 10), rep(4, 10))
>>> yvar <- rnorm(40, 10, 6)
>>> var1 <- rnorm(40, 10, 4); var2 <- rnorm(40, 10, 4); var3 <- rnorm(40, 5, 2)
>>> var4 <- rnorm(40, 10, 3); var5 <- rnorm(40, 15, 8) # just example
>>> df <- data.frame(clvar, yvar, var1, var2, var3, var4, var5)
>>>
>>> # manual splitting
>>> df1 <- subset(df, clvar == 1)
>>> df2 <- subset(df, clvar == 2)
>>> df3 <- subset(df, clvar == 3)
>>> df4 <- subset(df, clvar == 4)
>>> df5 <- subset(df, clvar == 5)  # empty here: the example has only 4 levels
>>>
>>> # I tried to mechanize it
>>>
>>> for (i in 1:5) {
>>>     df[i] <- subset(df, clvar == i)
>>> }
>>>
>>> I know it should not work as df[i] is a single variable, so it did. But
>>> I could not find a way to output multiple data frames from this loop.
>>> My limited R knowledge did not help at all!
>>>
>>> # working on each variable, just trying a simple function
>>> a <- 3:7  # column indices of var1..var5
>>> out1 <- lapply(1:5, function(ind) {
>>>     lm(df1$yvar ~ df1[, a[ind]])
>>> })
>>> p1 <- lapply(out1, function(m) summary(m)$coefficients[, 4][2])
>>> p1 <- do.call(rbind, p1)
>>>
>>>
>>> My ultimate objective is to apply this function to all the data frames
>>> created (i.e. df1, df2, df3, df4, df5) and create five corresponding
>>> p-value vectors (p1, p2, p3, p4, p5). The output would then be a matrix
>>> of clvar and the corresponding p-values:
>>> clvar       var1   var2  var3  var4   var5
>>> 1
>>> 2
>>> 3
>>> 4
>>>
>>> Please help me !
>>>
>>> Thanks
>>>
>>> NIL
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Dimitris Rizopoulos
>> Assistant Professor
>> Department of Biostatistics
>> Erasmus University Medical Center
>>
>> Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
>> Tel: +31/(0)10/7043478
>> Fax: +31/(0)10/7043014
>> Web: http://www.erasmusmc.nl/biostatistiek/
>>
>

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/


