[R] using match to obtain non-sorted index values from non-sorted vector

David Winsemius dwinsemius at comcast.net
Wed Jul 9 23:01:29 CEST 2014


On Jul 9, 2014, at 1:13 PM, Folkes, Michael wrote:

> So nice! 
> Apply wins again.

I doubt that `sapply( ..., which(,) )` would win a foot race with `match`:

> match(Tset, pop.df$pop)
[1] 5 4 2
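
A minimal sketch of how the two approaches compare, assuming the same pop.df and Tset as in the original post. The value 99 and the names idx_match, idx_which, idx_sapply, Tset2, idx_match2 and idx_sapply2 are made up here for illustration only:

```r
# Data from the original post
pop.df <- data.frame(pop = c(1, 6, 4, 3, 10))
Tset   <- c(10, 3, 6)

# match(x, table): position in 'table' of each element of 'x',
# returned in the order of 'x', so nothing needs to be sorted
idx_match <- match(Tset, pop.df$pop)

# which(... %in% ...) reports positions in ascending order instead
idx_which <- which(pop.df$pop %in% Tset)

# sapply/which agrees with match when every value occurs exactly once
idx_sapply <- sapply(Tset, function(x) which(pop.df$pop == x))

# Illustrative assumption: 99 does not occur in pop.df$pop.
# match() marks it with NA; sapply() can no longer simplify
# and falls back to returning a list.
Tset2       <- c(10, 99, 6)
idx_match2  <- match(Tset2, pop.df$pop)
idx_sapply2 <- sapply(Tset2, function(x) which(pop.df$pop == x))
```

Note that match() only reports the first position of a repeated value, so the two approaches can also differ if pop.df$pop contains duplicates.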

-- 
David.
> Thanks David.
> Michael
> 
> -----Original Message-----
> From: David L Carlson [mailto:dcarlson at tamu.edu] 
> Sent: July-09-14 1:11 PM
> To: Folkes, Michael; r-help at r-project.org
> Subject: RE: using match to obtain non-sorted index values from
> non-sorted vector
> 
> There may be a faster way, but 
> 
>> sapply(Tset, function(x) which(pop.df$pop==x))
> [1] 5 4 2
> 
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Folkes, Michael
> Sent: Wednesday, July 9, 2014 2:58 PM
> To: r-help at r-project.org
> Subject: [R] using match to obtain non-sorted index values from
> non-sorted vector
> 
> Hello all,
> 
> I've been struggling with the best way to find index values from a large
> vector with elements that will match elements of a subset vector [the
> table argument in match()]. 
> 
> BUT the index values can't come out sorted (as we'd get in  which(X %in%
> Y) ).
> 
> My 'population' vector can't be sorted. 
> 
> pop.df <- data.frame(pop=c(1,6,4,3,10)) 
> 
> The subset:  Tset = c(10,3,6)
> 
> 
> 
> So I'd like to get these index values (from pop.df), in this order:
> 5,4,2
> 
> 
> 
> If it could be sorted I could use:
> 
> which(sort(pop.df$pop) %in% sort(Tset))
> 
> 
> 
> But sorting will cause more grief later, so best not mess with it.
> 
> Here is my hopefully adequate MWE of a solution. I'm keen to see if
> anybody has a better suggestion. 
> 
> Thanks!
> 
> _____________________
> 
> ###BEGIN R
> 
> #pop is the full set of values, it has no info on their ranking
> 
> # I don't want to sort these data. They need to remain in this order.
> 
> pop.df <- data.frame(pop=c(1,6,4,3,10))
> 
> 
> 
> #rank.df is my dataframe that tells me the top three rankings (derived
> # elsewhere)
> 
> rank.df <- data.frame(rank=1:3, Tset = c(10,3,6))   # Target set
> 
> 
> 
> #match.df will be my source of row index based on rank
> 
> match.df <- data.frame(match.vec= match(pop.df$pop, table=rank.df$Tset),
> index.vec=1:nrow(pop.df))
> 
> 
> #rank.df will now include the index location in the pop.df where I can
> # find the top three ranks.
> 
> rank.df  <- merge(rank.df, match.df, by.x='rank', by.y='match.vec')
> 
> rank.df
> 
> 
> ####END
> 
> 
> 
> _______________________________________________________
> 
> Michael Folkes
> 
> Salmon Stock Assessment
> 


David Winsemius
Alameda, CA, USA


