[R] adding an additional column for preserving uniqueness

Chel Hee Lee chl948 at mail.usask.ca
Thu Jan 29 04:20:59 CET 2015


I like the way presented by William Dunlap in the previous post.  You 
may also try this:

 > dat1$item <- Reduce(c,lapply(table(dat1$Date), seq_len))
 > dat2$item <- Reduce(c,lapply(table(dat2$Date), seq_len))
 > dat1
          Date ConcAve item
1  2009-07-08       7    1
2  2009-08-26       1    1
3  2009-08-26       2    2
4  2009-09-15       2    1
5  2009-10-14       2    1
6  2009-10-14       2    2
7  2009-10-16     101    1
8  2009-10-16      93    2
9  2009-11-18       4    1
10 2009-11-18       3    2
11 2010-01-04       4    1
 > dat2
          Date  ConcAve item
1  2009-08-26 4.84e-05    1
2  2009-09-15 4.58e-05    1
3  2009-10-14 3.86e-05    1
4  2009-10-14 3.55e-05    2
5  2009-10-16 3.07e-05    1
6  2009-10-16 2.35e-05    2
7  2009-11-18 2.00e-05    1
8  2009-11-18 1.96e-05    2
9  2010-01-04 1.52e-05    1
10 2010-01-04 1.53e-05    2
11 2010-02-10 2.23e-05    1
 >

I hope this helps.

Chel Hee Lee

On 01/28/2015 07:38 PM, William Dunlap wrote:
>> with(dat1, ave(integer(length(Date)), Date, FUN=seq_along))
>   [1] 1 1 2 1 1 2 1 2 1 2 1
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Jan 28, 2015 at 4:54 PM, Morway, Eric <emorway at usgs.gov> wrote:
>
>> The two datasets below are excerpts from much larger datasets.  Note that
>> there are duplicate dates in both dat1 and dat2, e.g., "2009-10-14".
>>
>> dat1 <- read.table(textConnection("Date ConcAve
>> 2009-07-08       7
>> 2009-08-26       1
>> 2009-08-26       2
>> 2009-09-15       2
>> 2009-10-14       2
>> 2009-10-14       2
>> 2009-10-16     101
>> 2009-10-16      93
>> 2009-11-18       4
>> 2009-11-18       3
>> 2010-01-04       4"),header=T)
>> closeAllConnections()
>>
>> dat2 <- read.table(textConnection("Date  ConcAve
>> 2009-08-26 4.84e-05
>> 2009-09-15 4.58e-05
>> 2009-10-14 3.86e-05
>> 2009-10-14 3.55e-05
>> 2009-10-16 3.07e-05
>> 2009-10-16 2.35e-05
>> 2009-11-18 2.00e-05
>> 2009-11-18 1.96e-05
>> 2010-01-04 1.52e-05
>> 2010-01-04 1.53e-05
>> 2010-02-10 2.23e-05"),header=T)
>> closeAllConnections()
>>
>> I'm seeking an R operation that will append a third column to both
>> data.frame's such that it makes these duplicates unique when I run merge().
>> The desired result for dat1 would be:
>>
>>        Date ConcAve  item
>> 2009-07-08       7     1
>> 2009-08-26       1     1
>> 2009-08-26       2     2
>> 2009-09-15       2     1
>> 2009-10-14       2     1
>> 2009-10-14       2     2
>> 2009-10-16     101     1
>> 2009-10-16      93     2
>> 2009-11-18       4     1
>> 2009-11-18       3     2
>> 2010-01-04       4     1
>>
>> this way, I don't get this:
>>
>> merge(dat1, dat2, by="Date")
>> #         Date ConcAve.x ConcAve.y
>> #1  2009-08-26         1  4.84e-05
>> #2  2009-08-26         2  4.84e-05
>> #3  2009-09-15         2  4.58e-05
>> #4  2009-10-14         2  3.55e-05
>> #5  2009-10-14         2  3.86e-05
>> #6  2009-10-14         2  3.55e-05
>> #7  2009-10-14         2  3.86e-05
>> #8  2009-10-16       101  3.07e-05
>> #9  2009-10-16       101  2.35e-05
>> #10 2009-10-16        93  3.07e-05
>> #11 2009-10-16        93  2.35e-05
>> #12 2009-11-18         4  1.96e-05
>> #13 2009-11-18         4  2.00e-05
>> #14 2009-11-18         3  1.96e-05
>> #15 2009-11-18         3  2.00e-05
>> #16 2010-01-04         4  1.52e-05
>> #17 2010-01-04         4  1.53e-05
>>
>> With the new column, which I've inserted manually in this small example, I
>> instead get the merge result below, which is what I'm after for the larger
>> problem:
>>
>> dat3 <- read.table(textConnection("Date ConcAve  item
>> 2009-07-08       7     1
>> 2009-08-26       1     1
>> 2009-08-26       2     2
>> 2009-09-15       2     1
>> 2009-10-14       2     1
>> 2009-10-14       2     2
>> 2009-10-16     101     1
>> 2009-10-16      93     2
>> 2009-11-18       4     1
>> 2009-11-18       3     2
>> 2010-01-04       4     1"),header=T)
>> closeAllConnections()
>>
>> dat4 <- read.table(textConnection("Date  ConcAve  item
>> 2009-08-26 4.84e-05     1
>> 2009-09-15 4.58e-05     1
>> 2009-10-14 3.86e-05     1
>> 2009-10-14 3.55e-05     2
>> 2009-10-16 3.07e-05     1
>> 2009-10-16 2.35e-05     2
>> 2009-11-18 2.00e-05     1
>> 2009-11-18 1.96e-05     2
>> 2010-01-04 1.52e-05     1
>> 2010-01-04 1.53e-05     2
>> 2010-02-10 2.23e-05     1"),header=T)
>> closeAllConnections()
>>
>> merge(dat3, dat4, by=c("Date","item"))
>> #        Date item ConcAve.x ConcAve.y
>> #1 2009-08-26    1         1  4.84e-05
>> #2 2009-09-15    1         2  4.58e-05
>> #3 2009-10-14    1         2  3.86e-05
>> #4 2009-10-14    2         2  3.55e-05
>> #5 2009-10-16    1       101  3.07e-05
>> #6 2009-10-16    2        93  2.35e-05
>> #7 2009-11-18    1         4  2.00e-05
>> #8 2009-11-18    2         3  1.96e-05
>> #9 2010-01-04    1         4  1.52e-05
>>
>>          [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list