[R] Counting occurrences of variables in a dataframe
Petr Savicky
savicky at cs.cas.cz
Sun Feb 12 08:11:06 CET 2012
On Sat, Feb 11, 2012 at 04:05:25PM -0500, David Winsemius wrote:
>
> On Feb 11, 2012, at 1:17 PM, Kai Mx wrote:
>
> >Hi everybody,
> >I have a large dataframe similar to this one:
> >knames <-c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
> >kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
> >'20101201', '20110105', '20101001', '20110504', '20110603',
> >'20110201'),
> >format="%Y%m%d")
> >kdata <- data.frame (knames, kdate)
>
> > ave(unclass(kdate), knames, FUN=order )
> [1] 2 2 1 1 1 2 1 2 1 1
>
>
> That was actually not using the dataframe values but you could also do
> this:
>
> > kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order ))
> > kdata
>    knames      kdate ord
> 1      ab 2011-10-01   2
> 2      aa 2011-11-02   2
> 3      ac 2010-10-01   1
> 4      ad 2010-03-15   1
> 5      ab 2010-12-01   1
> 6      ac 2011-01-05   2
> 7      aa 2010-10-01   1
> 8      ad 2011-05-04   2
> 9      ae 2011-06-03   1
> 10     af 2011-02-01   1
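For readers unfamiliar with ave(): it splits its first argument by the grouping variable, applies FUN to each group, and puts the results back in the original row positions. The sketch below roughly mimics this for a single grouping vector using base R's split() and unsplit(); my_ave() is a hypothetical helper for illustration only, not the actual source of ave().

```r
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'), format = "%Y%m%d")

# Hypothetical helper imitating ave(x, g, FUN = FUN) for one grouping vector
my_ave <- function(x, g, FUN) {
  pieces <- split(x, g)            # one sub-vector per distinct value of g
  unsplit(lapply(pieces, FUN), g)  # apply FUN per group, restore row order
}

# Same values as ave(unclass(kdate), knames, FUN = order): 2 2 1 1 1 2 1 2 1 1
res <- my_ave(unclass(kdate), knames, order)
print(res)
```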
Hi.
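This is a good solution if there are at most two occurrences
of each name. For two values without ties, order() and rank() give
the same result, but in general they differ: order() returns the
indices that would sort the vector, while rank() returns the position
of each element in the sorted vector, which is the sequence number we
want. So if there are more occurrences, the function "order" should be
replaced by "rank". Replacing the name "aa" in row 2 by "ab",
we get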
knames <- c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
    '20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
    format="%Y%m%d")
kdata <- data.frame(knames, kdate)
# within-group order() versus rank() of the dates
kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order))
kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
kdata
   knames      kdate ord rank
1      ab 2011-10-01   3    2
2      ab 2011-11-02   1    3
3      ac 2010-10-01   1    1
4      ad 2010-03-15   1    1
5      ab 2010-12-01   2    1
6      ac 2011-01-05   2    2
7      aa 2010-10-01   1    1
8      ad 2011-05-04   2    2
9      ae 2011-06-03   1    1
10     af 2011-02-01   1    1
In date order, the name "ab" occurs in row 5, then row 1, then row 2, so
row 1 should get index 2 and row 2 index 3. This is what rank() gives;
order() gives 3 and 1 instead.
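A quick check of order() versus rank() on the three "ab" dates from the second example (rows 1, 2 and 5). The names ab and two are illustrative only and not part of the original code.

```r
# Dates of the "ab" rows 1, 2 and 5, as plain numbers (days since 1970-01-01)
ab <- unclass(as.Date(c("2011-10-01", "2011-11-02", "2010-12-01")))

order(ab)  # 3 1 2: indices that sort ab (row 5's date first, then rows 1 and 2)
rank(ab)   # 2 3 1: position of each date in sorted order, i.e. its sequence number

# With only two distinct values, order() and rank() agree
two <- c(15000, 14000)
stopifnot(identical(order(two), as.integer(rank(two))))
```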
If some of the dates repeat, then rank() by default assigns each group
of tied dates the average of their indices. In this case, the following
function f(), which breaks ties by order of appearance, may be used:
knames <- c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111001', '20101001', '20100315',
    '20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
    format="%Y%m%d")
kdata <- data.frame(knames, kdate)
kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
# break ties by order of appearance instead of averaging them
f <- function(x) rank(x, ties.method="first")
kdata$f <- with(kdata, ave(unclass(kdate), knames, FUN=f))
kdata
   knames      kdate rank f
1      ab 2011-10-01  2.5 2
2      ab 2011-10-01  2.5 3
3      ac 2010-10-01  1.0 1
4      ad 2010-03-15  1.0 1
5      ab 2010-12-01  1.0 1
6      ac 2011-01-05  2.0 2
7      aa 2010-10-01  1.0 1
8      ad 2011-05-04  2.0 2
9      ae 2011-06-03  1.0 1
10     af 2011-02-01  1.0 1
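rank() offers other tie rules through its ties.method argument besides "first" (documented values include "average", "first", "last", "random", "max" and "min"). The sketch below compares some of them on the tied "ab" dates; g() is a hypothetical variant of f() for the case where rows with the same name and date should share one number.

```r
# The tied example: "ab" appears twice on 2011-10-01
knames <- c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111001', '20101001', '20100315',
    '20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
    format = "%Y%m%d")
kdata <- data.frame(knames, kdate)

# Dates of the "ab" rows (rows 1, 2 and 5)
x <- unclass(kdate[knames == "ab"])
rank(x)                         # 2.5 2.5 1.0  (default "average")
rank(x, ties.method = "first")  # 2 3 1        (ties broken by position)
rank(x, ties.method = "min")    # 2 2 1        (tied rows share the lower index)
rank(x, ties.method = "max")    # 3 3 1        (tied rows share the higher index)

# Hypothetical variant of f(): equal name and date give the same number
g <- function(x) rank(x, ties.method = "min")
kdata$g <- with(kdata, ave(unclass(kdate), knames, FUN = g))
print(kdata)
```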
Hope this helps.
Petr Savicky.