[R] Follow-up Question: data frames; matching/merging
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Feb 8 19:56:57 CET 2010
It works if you use character class rather than factors for your date
strings. Note last arg to data.frame:
> DF <- data.frame(V1=c(1,1,2,3,1,2),
+ V2=c('2002-03-13','1989-03-10','1988-01-20','1997-05-15','1996-11-18','2000-01-12'),
+ stringsAsFactors = FALSE)
> aggregate(DF[2], DF[1], min)
V1 V2
1 1 1989-03-10
2 2 1988-01-20
3 3 1997-05-15
On Mon, Feb 8, 2010 at 1:05 PM, Jonathan <jonsleepy at gmail.com> wrote:
> Wow.. thanks for the deluge of responses!
>
> Aggregate seems like the way to go here.
>
> But, suppose that instead of integers in column V2, I actually have
> dates (and instead of keeping the minimum integer, I want to keep the
> earliest date):
>
>> df = data.frame(V1=c(1,1,2,3,1,2),V2=c('2002-03-13','1989-03-10','1988-01-20','1997-05-15','1996-11-18','2000-01-12'));
>> aggregate(df$V2,df['V1'],min) #thanks, Phil
> Error in Summary.factor(c(6L, 2L, 3L), na.rm = FALSE) :
> min not meaningful for factors
>
> So I checked:
>> min(df$V2)
> Error in Summary.factor(c(6L, 2L, 1L, 4L, 3L, 5L), na.rm = FALSE) :
> min not meaningful for factors
>
> This can be fixed with:
>> min(as.Date(df$V2))
> [1] "1988-01-20"
>
>
> However, I get a strange response when I try:
>> aggregate(as.Date(df$V2),df['V1'],min)
> V1 x
> 1 1 7008
> 2 2 6593
> 3 3 9996
>
> What I'm looking for, of course, is:
>> aggregate(as.Date(df$V2),df['V1'],min)
> V1 V2
> 1 1 1996-11-18
> 2 2 1988-01-20
> 3 3 1997-05-15
>
>
> Any idea how to produce the desired output?
>
> Thanks,
> Jonathan
>
>
>
> On Mon, Feb 8, 2010 at 12:11 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> Here are 3 solutions assuming DF contains the data frame:
>>
>>> # 1. aggregate
>>> aggregate(DF[2], DF[1], min)
>> V1 V2
>> 1 a 2
>> 2 b 9
>> 3 c 4
>>
>>> # 2. aggregate.formula - requires R 2.11.x
>>> aggregate(V2 ~ V1, DF, min)
>> V1 V2
>> 1 a 2
>> 2 b 9
>> 3 c 4
>>
>>> # 3. SQL using sqldf
>>> library(sqldf)
>>> sqldf("select V1, min(V2) V2 from DF group by V1")
>> V1 V2
>> 1 a 2
>> 2 b 9
>> 3 c 4
>>
>>> # 4. summaryBy in the doBy package
>>> library(doBy)
>>> summaryBy(V2 ~., DF, FUN = min, keep.names = TRUE)
>> V1 V2
>> 1 a 2
>> 2 b 9
>> 3 c 4
>>
>> On Mon, Feb 8, 2010 at 11:39 AM, Jonathan <jonsleepy at gmail.com> wrote:
>>> Hi all,
>>> I'm feeling a little guilty to ask this question, since I've
>>> written a solution using a rather clunky for loop that gets the job
>>> done. But I'm convinced there must be a faster (and probably more
>>> elegant) way to accomplish what I'm looking to do (perhaps using the
>>> "merge" function?). I figured somebody out there might've already
>>> figured this out:
>>>
>>> I have a dataframe with two columns (let's call them V1 and V2). All
>>> rows are unique, although column V1 has several redundant entries.
>>>
>>> Ex:
>>>
>>> V1 V2
>>> 1 a 3
>>> 2 a 2
>>> 3 b 9
>>> 4 c 4
>>> 5 a 7
>>> 6 b 11
>>>
>>>
>>> What I'd like is to return a dataframe cut down to have only unique
>>> entires in V1. V2 should contain a vector, for each V1, that is the
>>> minimum of all the possible choices from the set of redundant V1's.
>>>
>>> Example output:
>>>
>>> V1 V2
>>> 1 a 2
>>> 2 b 9
>>> 3 c 4
>>>
>>>
>>> If somebody could (relatively easily) figure out how to get closer to
>>> a solution, I'd appreciate hearing how. Also, I'd be interested to
>>> hear how you came upon the answer (so I can get better at searching
>>> the R resources myself).
>>>
>>> Regards,
>>> Jonathan
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
More information about the R-help
mailing list