[R] how to select the first observation only?
William Dunlap
wdunlap at tibco.com
Thu Apr 22 05:41:19 CEST 2010
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of gallon li
> Sent: Wednesday, April 21, 2010 7:18 PM
> To: r-help
> Subject: [R] how to select the first observation only?
>
> Dear r-helpers,
>
> I have a very simple question. Suppose my data is like
>
> id=c(rep(1,2),rep(2,2))
> b=c(2,3,4,5)
> m=cbind(id,b)
>
> > m
>      id b
> [1,]  1 2
> [2,]  1 3
> [3,]  2 4
> [4,]  2 5
> I wish to select the first observation for each id. That is, I want to
> quickly select two rows:
>
> id b
> 1 2
> 2 4
The following will quickly select the first row
in each run of identical 'id's. If your data
is sorted by 'id' (so that all rows sharing an id
are adjacent) then it solves your problem.
> isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
> m[ isFirstInRun(m[,"id"]), , drop=FALSE]
     id b
[1,]  1 2
[2,]  2 4
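
If the rows are not sorted by 'id', the run test marks the start of every run rather than one row per id. Here is a minimal sketch of two possible workarounds, assuming a made-up unsorted matrix `m2` (not the poster's data). The first sorts with order() before applying the run test. The second uses base R's duplicated(), a different technique that does not depend on row order.

```r
isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])

# Hypothetical unsorted data: ids 1 and 2 are interleaved.
m2 <- cbind(id = c(1, 2, 1, 2), b = c(2, 4, 3, 5))

# Applied directly, every row starts a new run, so all four rows are kept.
direct <- m2[isFirstInRun(m2[, "id"]), , drop = FALSE]

# Option 1: sort by id first (order() leaves ties in their original order),
# then apply the run test to the sorted matrix.
s <- m2[order(m2[, "id"]), , drop = FALSE]
firstSorted <- s[isFirstInRun(s[, "id"]), , drop = FALSE]

# Option 2: duplicated() flags repeats of an earlier value regardless of order.
firstDup <- m2[!duplicated(m2[, "id"]), , drop = FALSE]

print(firstSorted)  # rows with b = 2 and b = 4
print(firstDup)     # same two rows
```

Both options keep the earliest row for each id in the original row order. Which one is preferable probably depends on whether the sorted matrix is wanted for other purposes.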
If the 'id' column contains NAs then you need
to decide how a run of NAs should be handled.
E.g., turning it into a factor with NA among the
levels,
   m[ isFirstInRun(factor(m[,"id"], exclude=NULL)), , drop=FALSE]
will select the first row in each run of NAs, and
   isNaOrTrue <- function(x) is.na(x) | x
   m[ isNaOrTrue(isFirstInRun(m[,"id"])), , drop=FALSE]
will treat each NA in 'id' as a unique value (a run
of length 1).
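
To see how the two NA treatments differ, here is a small sketch on a made-up matrix `m3` (hypothetical data, with a run of two NA ids in the middle). It reuses the two helper functions defined above. The names `raw`, `asRun` and `asSingles` are introduced only for this illustration.

```r
isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
isNaOrTrue <- function(x) is.na(x) | x

# Hypothetical data: ids 1, 1, NA, NA, 2.
m3 <- cbind(id = c(1, 1, NA, NA, 2), b = c(2, 3, 4, 5, 6))

# Plain comparison: any test involving an NA yields NA,
# so the result cannot be used directly as a row index.
raw <- isFirstInRun(m3[, "id"])

# Treatment 1: NA becomes a factor level, so the two NA rows form one run.
asRun <- isFirstInRun(factor(m3[, "id"], exclude = NULL))
print(m3[asRun, , drop = FALSE])      # keeps rows with b = 2, 4, 6

# Treatment 2: every NA counts as its own run of length 1.
asSingles <- isNaOrTrue(raw)
print(m3[asSingles, , drop = FALSE])  # keeps rows with b = 2, 4, 5, 6
```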
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
>
> only. how should i do this?