[R] How to extract same columns from identical dataframes in a list?
Bert Gunter
bgunter.4567 at gmail.com
Wed Feb 10 16:27:29 CET 2016
Google! (e.g. on "R Language tutorials")
Some specific recommendations can be found here:
https://www.rstudio.com/resources/training/online-learning/#R
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Wed, Feb 10, 2016 at 1:04 AM, Wolfgang Waser
<waser at frankenfoerder-fg.de> wrote:
> Hi,
>
> sapply(l,"[",T,2)
>
> and
>
> sapply(l, function(e) e[, 2])
>
>
> work fine!
>
>
> Thanks a lot!
>
> Why is the second version "brute force and ignorance"? Is one of the
> versions to be preferred? If so, which and why (very briefly, please)?
>
>
> Results of the other options mentioned:
>
>> sapply(l,"[[",2)
>
> results in a single vector of length 7
>
>
>> sapply(l,"[",,2)
> Error in lapply(X = X, FUN = FUN, ...) :
> argument is missing, with no default
>
> These versions probably don't work due to the "data frames" in the list
> actually being matrices.
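A minimal sketch of that difference, assuming the same numbers are stored once as data frames and once as matrices. The names `df_list` and `mat_list` are made up for illustration and do not appear in the thread:

```r
# Hypothetical data: identical values as data frames and as matrices
set.seed(1)
df_list  <- replicate(3, data.frame(w1 = sample(1:4), w2 = sample(1:4)),
                      simplify = FALSE)
mat_list <- lapply(df_list, as.matrix)

# Data frame: "[[" with a single index returns a whole column
from_df <- sapply(df_list, "[[", 2)            # 4 x 3 matrix

# Matrix: "[[" with a single index returns ONE element, so sapply()
# collapses to a plain vector with one value per list element
single_elems <- sapply(mat_list, "[[", 2)      # vector of length 3

# Passing TRUE as the row index selects all rows of column 2
from_mat_true <- sapply(mat_list, "[", TRUE, 2)

# An anonymous function spells out the subscript and works for both types
from_mat_fun <- sapply(mat_list, function(e) e[, 2])
from_df_fun  <- sapply(df_list,  function(e) e[, 2])
```

With only three list elements the collapsed result has length 3; a list of 7 matrices gives the length-7 vector reported above.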
>
>
> I'm not enough of a programmer to always make complete sense of the R
> help pages. Should I have found this information in the sapply - R help
> page?
> Where else could I check before pestering the R mailing list (which, of
> course, provides quick and valuable answers)?
>
>
> Cheers,
>
> Wolfgang
>
>
>
>
> On 09/02/16 16:19, peter dalgaard wrote:
>> Like this?
>>
>>> l <- replicate(3,data.frame(w1=sample(1:4),w2=sample(1:4)), simplify=FALSE)
>>> l
>> [[1]]
>> w1 w2
>> 1 2 2
>> 2 3 3
>> 3 1 1
>> 4 4 4
>>
>> [[2]]
>> w1 w2
>> 1 3 4
>> 2 2 2
>> 3 1 3
>> 4 4 1
>>
>> [[3]]
>> w1 w2
>> 1 1 4
>> 2 4 3
>> 3 2 1
>> 4 3 2
>>
>>> sapply(l,"[[",2)
>> [,1] [,2] [,3]
>> [1,] 2 4 4
>> [2,] 3 2 3
>> [3,] 1 3 1
>> [4,] 4 1 2
>>
>> Or even
>>
>>> sapply(l,"[",,2)
>> [,1] [,2] [,3]
>> [1,] 2 4 4
>> [2,] 3 2 3
>> [3,] 1 3 1
>> [4,] 4 1 2
>>
>>
>> Notice that if dd[1:24] gives you the 1st column, then dd is not a data frame but rather a matrix, and indexing semantics are different. In that case, for some unspeakable reason, the empty index does not work and you'll need something like
>>
>>> l <- replicate(3,cbind(w1=sample(1:4),w2=sample(1:4)), simplify=FALSE)
>>> sapply(l,"[",T,2)
>> [,1] [,2] [,3]
>> [1,] 4 3 2
>> [2,] 1 1 4
>> [3,] 3 2 3
>> [4,] 2 4 1
>>
>> Or, brute-force-and-ignorance:
>>
>>> sapply(l, function(e) e[, 2])
>> [,1] [,2] [,3]
>> [1,] 4 3 2
>> [2,] 1 1 4
>> [3,] 3 2 3
>> [4,] 2 4 1
>>
>>
>>
>>
>>
>> On 09 Feb 2016, at 10:03 , Wolfgang Waser <waser at frankenfoerder-fg.de> wrote:
>>
>>> Hi,
>>>
>>> sorry if my description was too short / unclear.
>>>
>>>> I have a list of 7 data frames, each data frame having 24 rows (hour of
>>>> the day) and 5 columns (weeks) with a total of 5 x 24 values
>>>
>>> [1]
>>> week1 week2 week3 ...
>>> 1 x a m ...
>>> 2 y b n
>>> 3 z c o
>>> . . . .
>>> . . . .
>>> . . . .
>>> 24 . . .
>>>
>>>
>>> [2]
>>> week1 week2 week3 ...
>>> 1 x2 a2 m2 ...
>>> 2 y2 b2 n2
>>> 3 z2 c2 o2
>>> . . . .
>>> . . . .
>>> . . . .
>>> 24 . . .
>>>
>>>
>>> [3]
>>> ...
>>>
>>> .
>>> .
>>> .
>>>
>>>
>>> [7]
>>> ...
>>>
>>>
>>>
>>> I now would like to extract e.g. all week2 columns of all data frames in
>>> the list and combine them in a new data frame using cbind.
>>>
>>> new data frame
>>>
>>> week2 ([1]) week2 ([2]) week2 ([3]) ...
>>> a a2 .
>>> b b2 .
>>> c c2 .
>>> .
>>> .
>>> .
>>>
>>> I will then do further row-wise calculations using e.g. apply(x,1,mean),
>>> the result being a vector of 24 values.
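A rough sketch of that workflow, assuming the list elements really are data frames with a column named `week2`. The simulated data is a stand-in, not the poster's:

```r
# Hypothetical stand-in: 7 data frames, each 24 rows (hours) x 5 columns (weeks)
set.seed(7)
list_of_dataframes <- replicate(
  7,
  as.data.frame(matrix(rnorm(24 * 5), nrow = 24,
                       dimnames = list(NULL, paste0("week", 1:5)))),
  simplify = FALSE
)

# Pull the week2 column out of every data frame; sapply() binds the
# seven columns side by side into a 24 x 7 matrix
week2 <- sapply(list_of_dataframes, function(d) d[["week2"]])

# Row-wise calculation: one value per hour of the day
hourly_mean <- apply(week2, 1, mean)
```

For the mean specifically, `rowMeans(week2)` gives the same result; `apply()` is the general form for other row-wise functions.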
>>>
>>>
>>> I have not found a way to extract specific columns of the data frames in
>>> a list.
>>>
>>>
>>> As mentioned I can use
>>>
>>> sapply(list_of_dataframes,"[",1:24)
>>>
>>> which will pick the first 24 values (first column) of each data frame in
>>> the list and arrange them as an array of 24 rows and 7 columns (7 data
>>> frames are in the list).
>>> To pick the second column (week2) using sapply I have to use the next 24
>>> values from 25 to 48:
>>>
>>> sapply(list_of_dataframes,"[",25:48)
>>>
>>>
>>> It seems that sapply treats the data frames in the list as vectors. I
>>> can of course extract all consecutive weeks using consecutive blocks of
>>> 24 values, but this seems cumbersome.
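The block-of-24 behaviour is what a matrix does with a single index: the values are stored column by column, so positions 25 to 48 are the second column. A small sketch with a made-up 24 x 5 matrix `m`, assuming the list elements are matrices:

```r
# Hypothetical 24 x 5 matrix standing in for one list element
m <- matrix(seq_len(24 * 5), nrow = 24, ncol = 5,
            dimnames = list(NULL, paste0("week", 1:5)))

# A single index runs down the columns, so 25:48 is the second column
block       <- m[25:48]
by_position <- m[, 2]
by_name     <- m[, "week2"]

# For a data frame a single index selects whole columns instead
d <- as.data.frame(m)
one_col <- d[2]   # one-column data frame holding week2
```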
>>>
>>>
>>> The question remains, how to select specific columns from data frames in
>>> a list, e.g. all columns 3 of all data frames in the list.
>>>
>>>
>>> Reformatting (unlist(), dim()) in one data frame with one column for
>>> each week does not help, since I'm not calculating colMeans etc, but
>>> row-wise calculations using apply(x,1,FUN) ("applying a function to
>>> margins of an array or matrix").
>>>
>>>
>>>
>>> Thanks for your help and suggestions!
>>>
>>>
>>> Wolfgang
>>>
>>>
>>>
>>> On 08/02/16 18:00, Dénes Tóth wrote:
>>>> Hi,
>>>>
>>>> Although you did not provide any reproducible example, it seems you
>>>> store the same type of values in your data.frames. If this is true, it
>>>> is much more efficient to store your data in an array:
>>>>
>>>> mylist <- list(a = data.frame(week1 = rnorm(24), week2 = rnorm(24)),
>>>>                b = data.frame(week1 = rnorm(24), week2 = rnorm(24)))
>>>>
>>>> myarray <- unlist(mylist, use.names = FALSE)
>>>> dim(myarray) <- c(nrow(mylist$a), ncol(mylist$a), length(mylist))
>>>> dimnames(myarray) <- list(hour = rownames(mylist$a),
>>>>                           week = colnames(mylist$a),
>>>>                           other = names(mylist))
>>>> # now you can do:
>>>> mean(myarray[, "week1", "a"])
>>>>
>>>> # or:
>>>> colMeans(myarray)
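As a possible extension of the array approach, row-wise (hourly) calculations across the list elements can be written with `apply()` over the array margins. This is only a sketch: the array is rebuilt here with `array()` so the block is self-contained, and the names `week2`, `hourly_mean` and `all_weeks` are illustrative:

```r
set.seed(42)
mylist <- list(a = data.frame(week1 = rnorm(24), week2 = rnorm(24)),
               b = data.frame(week1 = rnorm(24), week2 = rnorm(24)))

# Same layout as above: hour x week x list element
myarray <- array(unlist(mylist, use.names = FALSE),
                 dim = c(24, 2, 2),
                 dimnames = list(hour  = as.character(1:24),
                                 week  = c("week1", "week2"),
                                 other = names(mylist)))

# All week2 columns side by side: 24 rows, one column per list element
week2 <- myarray[, "week2", ]

# Hourly mean of week2 across the list elements (vector of 24 values)
hourly_mean <- apply(week2, 1, mean)

# The same calculation for every week at once: 24 x 2 matrix
all_weeks <- apply(myarray, c(1, 2), mean)
```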
>>>>
>>>>
>>>> Cheers,
>>>> Denes
>>>>
>>>>
>>>> On 02/08/2016 02:33 PM, Wolfgang Waser wrote:
>>>>> Hello,
>>>>>
>>>>> I have a list of 7 data frames, each data frame having 24 rows (hour of
>>>>> the day) and 5 columns (weeks) with a total of 5 x 24 values
>>>>>
>>>>> I would like to combine all 7 columns of week 1 (and 2 ...) in a
>>>>> separate data frame for hourly calculations, e.g.
>>>>>> apply(new.data.frame,1,mean)
>>>>>
>>>>> In some way sapply (lapply) works, but I cannot directly select columns
>>>>> of the original data frames in the list. As a workaround I have to
>>>>> select a range of values:
>>>>>
>>>>>> sapply(list_of_dataframes,"[",1:24)
>>>>>
>>>>> Values 1:24 give the first column, 25:48 the second and so on.
>>>>>
>>>>> Is there an easier / more direct way to select for specific columns
>>>>> instead of selecting a range of values, avoiding loops?
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Wolfgang
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.