[R] Create sequential vector for values in another column
William Dunlap
wdunlap at tibco.com
Fri Oct 11 18:50:54 CEST 2013
At this point 3 functions have been suggested and I'll add a 4th:
f1 <- function(x)unlist(lapply(unname(split(rep.int(1L,length(x)), x)), cumsum))
f2 <- function(x)unlist(sapply(rle(x)$lengths, function(k) 1:k ))
f3 <- function(x)ave(x,x,FUN=seq)
f4 <- function(x)ave(seq_along(x), x, FUN=seq_along)
You can compare their results with ftest (as long as their results have the
same lengths):
ftest <- function(x) {
  data.frame(x, f1=f1(x), f2=f2(x), f3=f3(x), f4=f4(x))
}
They all return the same result for Steven's sample data, which is numeric
and in sorted order:
x0 <- c(123.45, 123.45, 123.45, 123.45, 234.56,
234.56, 234.56, 234.56, 234.56, 234.56, 234.56, 345.67, 345.67,
345.67, 456.78, 456.78, 456.78, 456.78, 456.78, 456.78, 456.78,
456.78, 456.78)
However, f1() gives the wrong answer if x is not sorted:
> ftest(c(30,30,30, 20,20))
   x f1 f2 f3 f4
1 30  1  1  1  1
2 30  2  2  2  2
3 30  1  3  3  3
4 20  2  1  1  1
5 20  3  2  2  2
f1() and f2() give the wrong answer if the groups are split up in the data:
> ftest(c(10,10, 8,8,8, 10,10,10)) # 10's not contiguous
   x f1 f2 f3 f4
1 10  1  1  1  1
2 10  2  2  2  2
3  8  3  1  1  1
4  8  1  2  2  2
5  8  2  3  3  3
6 10  3  1  3  3
7 10  4  2  4  4
8 10  5  3  5  5
(It is not clear what result the OP wants here.)
f3() gives the wrong answer if x is not numeric:
> f3(c("a","a","a", "b","b"))
[1] "1" "2" "3" "1" "2"
f3() also gives an ominous warning if there is a singleton in x (be
> f3(c(1,1,1, 11))
[1] 1 2 3 1
Warning message:
In `split<-.default`(`*tmp*`, g, value = lapply(split(x, g), FUN)) :
number of items to replace is not a multiple of replacement length
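The warning can be traced to how seq() treats a single number: given one numeric argument of length 1, it counts from 1 up to that value instead of counting along its argument. A small illustration (the variable name `singleton` is made up for this example):

```r
# A group containing the single value 11, as in f3(c(1,1,1, 11))
singleton <- c(11)

# seq() with one length-1 numeric argument returns 1:11, so ave() gets
# 11 values back for a group that has room for only 1
viaSeq <- seq(singleton)
length(viaSeq)        # 11

# seq_along() always counts along its argument, whatever the values are
viaSeqAlong <- seq_along(singleton)
length(viaSeqAlong)   # 1
```

This is why f4() uses seq_along rather than seq as the FUN argument.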
f2() fails to give an answer if x is a factor:
> f2(factor(c("x","y","z")))
Error in rle(x) : 'x' must be an atomic vector
I think f4 gives the correct result for all those cases.
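A quick check of that claim, re-running the problem inputs from above through f4() alone. This is only a sketch, and it assumes the intended result for split-up groups is a count within each distinct value rather than within each run:

```r
f4 <- function(x) ave(seq_along(x), x, FUN = seq_along)

unsorted  <- f4(c(30, 30, 30, 20, 20))             # x not sorted
splitUp   <- f4(c(10, 10, 8, 8, 8, 10, 10, 10))    # 10's not contiguous
character <- f4(c("a", "a", "a", "b", "b"))        # x not numeric
singleton <- f4(c(1, 1, 1, 11))                    # singleton group, no warning
asFactor  <- f4(factor(c("x", "y", "z")))          # x is a factor

print(list(unsorted, splitUp, character, singleton, asFactor))
```

In each case the result is an integer vector, regardless of the type of x.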
I think all of the above call lapply(split()) at some point and that can use
a lot of memory when there are lots of unique values in x. You can use
a sort-based algorithm to avoid that problem.
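One possible sort-based version, as a sketch only: the name f5 and the details below are assumptions for illustration, not code posted in this thread. It assumes x contains no NA values and relies on order() keeping tied values in their original order.

```r
f5 <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  ord <- order(x)                 # ties stay in their original order
  xs <- x[ord]                    # equal values are now adjacent
  # TRUE at the first element of each run of equal values in sorted x
  isStart <- c(TRUE, xs[-1L] != xs[-n])
  idx <- seq_len(n)
  # sorted position at which the current run started
  startIdx <- cummax(ifelse(isStart, idx, 0L))
  within <- idx - startIdx + 1L   # 1, 2, ... inside each run
  out <- integer(n)
  out[ord] <- within              # put the counts back in the original order
  out
}

f5(c(10, 10, 8, 8, 8, 10, 10, 10))
# [1] 1 2 1 2 3 3 4 5
```

It makes one call to order() and a few vectorized passes over x, so it never builds a list with one element per unique value.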
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of arun
> Sent: Friday, October 11, 2013 6:43 AM
> To: Steven Ranney; r-help at r-project.org
> Subject: Re: [R] Create sequential vector for values in another column
>
>
>
> Also,
>
> it might be faster to use ?data.table()
> library(data.table)
> dt1<- data.table(dat1,key='id.name')
> dt1[,x:=seq(.N),by='id.name']
> A.K.
>
>
> On , arun <smartpink111 at yahoo.com> wrote:
> Hi,
> Try:
> dat1<-
>
> structure(list(id.name = c(123.45, 123.45, 123.45, 123.45, 234.56,
> 234.56, 234.56, 234.56, 234.56, 234.56, 234.56, 345.67, 345.67,
> 345.67, 456.78, 456.78, 456.78, 456.78, 456.78, 456.78, 456.78,
> 456.78, 456.78)), .Names = "id.name", class = "data.frame", row.names = c(NA,
> -23L))
> dat1$x <- with(dat1,ave(id.name,id.name,FUN=seq))
> A.K.
>
>
>
> On Friday, October 11, 2013 9:28 AM, Steven Ranney <steven.ranney at gmail.com>
> wrote:
> Hello all -
>
> I have an example column in a dataFrame
>
> id.name
> 123.45
> 123.45
> 123.45
> 123.45
> 234.56
> 234.56
> 234.56
> 234.56
> 234.56
> 234.56
> 234.56
> 345.67
> 345.67
> 345.67
> 456.78
> 456.78
> 456.78
> 456.78
> 456.78
> 456.78
> 456.78
> 456.78
> 456.78
> ...
> [truncated]
>
> And I'd like to create a second vector of sequential values (i.e., 1:N) for
> each unique id.name value. In other words, I need
>
> id.name x
> 123.45 1
> 123.45 2
> 123.45 3
> 123.45 4
> 234.56 1
> 234.56 2
> 234.56 3
> 234.56 4
> 234.56 5
> 234.56 6
> 234.56 7
> 345.67 1
> 345.67 2
> 345.67 3
> 456.78 1
> 456.78 2
> 456.78 3
> 456.78 4
> 456.78 5
> 456.78 6
> 456.78 7
> 456.78 8
> 456.78 9
>
> The number of unique id.name values is different; for some values, nrow()
> may be 42 and for others it may be 36, etc.
>
> The only way I could think of to do this is with two nested for loops. I
> tried it but because this data set is so large (nrow = 112,679 with 2,161
> unique values of id.name), it took several hours to run.
>
> Is there an easier way to create this vector? I'd appreciate your thoughts.
>
> Thanks -
>
> SR
> Steven H. Ranney
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.