[R] aggregate

Gang Chen gangchen6 at gmail.com
Wed Aug 24 22:42:09 CEST 2016


Yes, this works out perfectly! Thanks a lot, David. Have a wonderful day...

On Wed, Aug 24, 2016 at 4:24 PM, David L Carlson <dcarlson at tamu.edu> wrote:
> This will work, but you should double-check to be certain that CP and unique(myData[, 3:5]) are in the same order. It will fail if N is not identical for all rows of the same S-Z combination.
>
>> CP <- sapply(split(myData, paste0(myData$S, myData$Z)), function(x)
> +       crossprod(x[, 1], x[, 2]))
>> data.frame(CP, unique(myData[, 3:5]))
>     CP   N  S Z
> S1A 22 2.1 S1 A
> S1B 38 2.1 S1 B
> S2A 38 3.2 S2 A
> S2B 22 3.2 S2 B
>
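The ordering caveat above can be sidestepped by building each group's row in one step, so that CP and its N, S, Z values never have to be matched up afterwards. A minimal sketch, assuming the eight-row myData posted further down the thread; the names `groups`, `rows`, `keys` and `result` are illustrative only, and `split()` on a list of factors is swapped in for the `paste0()` key:

```r
# Rebuild the eight-row example from the thread.
myData <- data.frame(
  X = c(1, 2, 3, 4, 5, 6, 7, 8),
  Y = c(8, 7, 6, 5, 4, 3, 2, 1),
  N = rep(c(2.1, 3.2), each = 4),
  S = rep(c("S1", "S2"), each = 4),
  Z = rep(c("A", "B"), each = 2, times = 2)
)

# One data frame per S-Z combination.
groups <- split(myData, list(myData$S, myData$Z), drop = TRUE)

rows <- lapply(groups, function(g) {
  keys <- unique(g[, c("N", "S", "Z")])
  # Stop early if N is not constant within the group.
  stopifnot(nrow(keys) == 1)
  data.frame(CP = drop(crossprod(g$X, g$Y)), keys)
})

result <- do.call(rbind, rows)
result
```

Because each row is assembled inside its own group, the result does not depend on the order in which `split()` and `unique()` return the groups.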
> David C
>
> -----Original Message-----
> From: Gang Chen [mailto:gangchen6 at gmail.com]
> Sent: Wednesday, August 24, 2016 2:51 PM
> To: David L Carlson
> Cc: r-help mailing list
> Subject: Re: [R] aggregate
>
> Thanks again for patiently offering great help, David! I just learned
> dput() and paste0() now. Hopefully this is my last question.
>
> Suppose a new dataframe is as below (one more numeric column):
>
> myData <- structure(list(X = c(1, 2, 3, 4, 5, 6, 7, 8), Y = c(8, 7, 6,
> 5, 4, 3, 2, 1), N =c(rep(2.1, 4), rep(3.2, 4)), S = structure(c(1L,
> 1L, 1L, 1L, 2L, 2L, 2L, 2L
> ), .Label = c("S1", "S2"), class = "factor"), Z = structure(c(1L,
> 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor")),
> .Names = c("X",
> "Y", "N", "S", "Z"), row.names = c(NA, -8L), class = "data.frame")
>
>> myData
>
>   X Y   N  S Z
> 1 1 8 2.1 S1 A
> 2 2 7 2.1 S1 A
> 3 3 6 2.1 S1 B
> 4 4 5 2.1 S1 B
> 5 5 4 3.2 S2 A
> 6 6 3 3.2 S2 A
> 7 7 2 3.2 S2 B
> 8 8 1 3.2 S2 B
>
> Once I obtain the cross product,
>
>> sapply(split(myData, paste0(myData$S, myData$Z)), function(x) crossprod(x[, 1], x[, 2]))
> S1A S1B S2A S2B
>  22  38  38  22
>
> how can I easily add the other 3 columns (N, S, and Z) in a new
> dataframe? For S and Z, I can play with the names from the cross
> product output, but I have trouble dealing with the numeric column N.
>
>
>
>
> On Wed, Aug 24, 2016 at 1:07 PM, David L Carlson <dcarlson at tamu.edu> wrote:
>> You need to spend some time with a basic R tutorial. Your data is messed up because you did not use a simple text editor somewhere along the way. R understands ', but not ‘ or ’. The best way to send data to the list is to use dput:
>>
>>> dput(myData)
>> structure(list(X = c(1, 2, 3, 4, 5, 6, 7, 8), Y = c(8, 7, 6,
>> 5, 4, 3, 2, 1), S = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
>> ), .Label = c("S1", "S2"), class = "factor"), Z = structure(c(1L,
>> 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor")), .Names = c("X",
>> "Y", "S", "Z"), row.names = c(NA, -8L), class = "data.frame")
>>
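To illustrate why dput() output is convenient for the list: the printed text is itself R code that rebuilds the object. A small sketch, assuming the four-row data frame from the start of the thread; `dumped` and `restored` are illustrative names:

```r
myData <- data.frame(X = c(1, 2, 3, 4), Y = c(4, 3, 2, 1),
                     Z = factor(c("A", "A", "B", "B")))

# Capture what dput() would print, as a single string of R code.
dumped <- paste(capture.output(dput(myData)), collapse = "\n")

# Evaluating that text rebuilds the object, which is what a reader does
# when pasting the dput() output into a session.
restored <- eval(parse(text = dumped))
all.equal(restored, myData)
```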
>> Combining two labels just requires the paste0() function:
>>
>>> sapply(split(myData, paste0(myData$S, myData$Z)), function(x) crossprod(x[, 1], x[, 2]))
>> S1A S1B S2A S2B
>>  22  38  38  22
>>
>> David C
>>
>> -----Original Message-----
>> From: Gang Chen [mailto:gangchen6 at gmail.com]
>> Sent: Wednesday, August 24, 2016 11:56 AM
>> To: David L Carlson
>> Cc: Jim Lemon; r-help mailing list
>> Subject: Re: [R] aggregate
>>
>> Thanks a lot, David! I want to further expand the operation a little
>> bit. With a new dataframe:
>>
>> myData <- data.frame(X=c(1, 2, 3, 4, 5, 6, 7, 8), Y=c(8, 7, 6, 5, 4,
>> 3, 2, 1), S=c(‘S1’, ‘S1’, ‘S1’, ‘S1’, ‘S2’, ‘S2’, ‘S2’, ‘S2’),
>> Z=c(‘A’, ‘A’, ‘B’, ‘B’, ‘A’, ‘A’, ‘B’, ‘B’))
>>
>>> myData
>>
>>   X Y  S Z
>> 1 1 8 S1 A
>> 2 2 7 S1 A
>> 3 3 6 S1 B
>> 4 4 5 S1 B
>> 5 5 4 S2 A
>> 6 6 3 S2 A
>> 7 7 2 S2 B
>> 8 8 1 S2 B
>>
>> I would like to obtain the same cross product between columns X and Y,
>> but at each combination level of factors S and Z. In other words, the
>> cross product would still be computed over each pair of rows in the new
>> dataframe myData. How can I achieve that?
>>
>> On Wed, Aug 24, 2016 at 11:54 AM, David L Carlson <dcarlson at tamu.edu> wrote:
>>> Your code is fine, but it will be a little simpler if you use sapply() instead:
>>>
>>>> data.frame(Z=levels(myData$Z), CP=sapply(split(myData, myData$Z),
>>> +     function(x) crossprod(x[, 1], x[, 2])))
>>>   Z CP
>>> A A 10
>>> B B 10
>>>
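One point worth hedging on the call above: it relies on Z being a factor. Since R 4.0.0, data.frame() no longer converts strings to factors by default, and levels() returns NULL for a character column. A sketch that does not depend on that setting, taking the labels from the names of the result instead (`cp` and `result` are illustrative names, and vapply() is used here in place of sapply()):

```r
myData <- data.frame(X = c(1, 2, 3, 4), Y = c(4, 3, 2, 1),
                     Z = c("A", "A", "B", "B"))

# split() groups by Z whether it is a factor or a character vector;
# drop() turns the 1 x 1 matrix from crossprod() into a plain number.
cp <- vapply(split(myData, myData$Z),
             function(x) drop(crossprod(x$X, x$Y)),
             numeric(1))

# The group labels travel with the result as its names.
result <- data.frame(Z = names(cp), CP = unname(cp))
result
```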
>>> David C
>>>
>>>
>>> -----Original Message-----
>>> From: Gang Chen [mailto:gangchen6 at gmail.com]
>>> Sent: Wednesday, August 24, 2016 10:17 AM
>>> To: David L Carlson
>>> Cc: Jim Lemon; r-help mailing list
>>> Subject: Re: [R] aggregate
>>>
>>> Thank you all for the suggestions! Yes, I'm looking for the cross
>>> product between the two columns of X and Y.
>>>
>>> A follow-up question: what is a nice way to merge the output of
>>>
>>> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
>>>
>>> with the column Z in myData so that I would get a new dataframe as the
>>> following (the 2nd column is the cross product between X and Y)?
>>>
>>> Z   CP
>>> A   10
>>> B   10
>>>
>>> Is the following legitimate?
>>>
>>> data.frame(Z=levels(myData$Z), CP= unlist(lapply(split(myData,
>>> myData$Z), function(x) crossprod(x[, 1], x[, 2]))))
>>>
>>>
>>> On Wed, Aug 24, 2016 at 10:37 AM, David L Carlson <dcarlson at tamu.edu> wrote:
>>>> Thank you for the reproducible example, but it is not clear what cross product you want. Jim's solution gives you the cross product of the 2-column matrix with itself. If you want the cross product between the columns you need something else. The aggregate function will not work since it will treat the columns separately:
>>>>
>>>>> A <- as.matrix(myData[myData$Z=="A", 1:2])
>>>>> A
>>>>   X Y
>>>> 1 1 4
>>>> 2 2 3
>>>>> crossprod(A) # Same as t(A) %*% A
>>>>    X  Y
>>>> X  5 10
>>>> Y 10 25
>>>>> crossprod(A[, 1], A[, 2]) # Same as t(A[, 1]) %*% A[, 2]
>>>>      [,1]
>>>> [1,]   10
>>>>>
>>>>> # For all the groups
>>>>> lapply(split(myData, myData$Z), function(x) crossprod(as.matrix(x[, 1:2])))
>>>> $A
>>>>    X  Y
>>>> X  5 10
>>>> Y 10 25
>>>>
>>>> $B
>>>>    X  Y
>>>> X 25 10
>>>> Y 10  5
>>>>
>>>>> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
>>>> $A
>>>>      [,1]
>>>> [1,]   10
>>>>
>>>> $B
>>>>      [,1]
>>>> [1,]   10
>>>>
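For two plain vectors, crossprod(x, y) is the sum of the elementwise products, so the per-group value can also be had by multiplying the columns first and summing within groups. A sketch under that assumption, using the four-row myData from the original question; the column name `XY` is made up for the illustration:

```r
myData <- data.frame(X = c(1, 2, 3, 4), Y = c(4, 3, 2, 1),
                     Z = c("A", "A", "B", "B"))

# Multiply first, then sum within each level of Z.
cp_tapply <- tapply(myData$X * myData$Y, myData$Z, sum)

# The same idea with aggregate(): it handles one column at a time,
# so the product is precomputed as a single column XY.
cp_agg <- aggregate(XY ~ Z, data = transform(myData, XY = X * Y), FUN = sum)

cp_tapply
cp_agg
```

This does not give the full 2 x 2 matrix that crossprod() returns for a two-column matrix, only the X-Y entry.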
>>>> -------------------------------------
>>>> David L Carlson
>>>> Department of Anthropology
>>>> Texas A&M University
>>>> College Station, TX 77840-4352
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jim Lemon
>>>> Sent: Tuesday, August 23, 2016 6:02 PM
>>>> To: Gang Chen; r-help mailing list
>>>> Subject: Re: [R] aggregate
>>>>
>>>> Hi Gang Chen,
>>>> If I have the right idea:
>>>>
>>>> for(zval in levels(myData$Z))
>>>>   print(crossprod(as.matrix(myData[myData$Z==zval,c("X","Y")])))
>>>>
>>>> Jim
>>>>
>>>> On Wed, Aug 24, 2016 at 8:03 AM, Gang Chen <gangchen6 at gmail.com> wrote:
>>>>> This is a simple question: With a dataframe like the following
>>>>>
>>>>> myData <- data.frame(X=c(1, 2, 3, 4), Y=c(4, 3, 2, 1), Z=c('A', 'A', 'B', 'B'))
>>>>>
>>>>> how can I get the cross product between X and Y for each level of
>>>>> factor Z? My difficulty is that I don't know how to deal with the fact
>>>>> that crossprod() acts on two variables in this case.
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.


