[R] How to match vector with a list ?

William Dunlap wdunlap at tibco.com
Fri Mar 5 19:29:44 CET 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Carlos Petti
> Sent: Friday, March 05, 2010 9:43 AM
> To: r-help at r-project.org
> Subject: [R] How to match vector with a list ?
> 
> Dear list,
> 
> I have a vector of characters and a list of two named elements :
> 
> i <- c("a","a","b","b","b","c","c","d")
> 
> j <- list(j1 = c("a","c"), j2 = c("b","d"))
> 
> I'm looking for a fast way to obtain a vector with names, as follows :
> 
> [1] "j1" "j1" "j2" "j2" "j2" "j1" "j1" "j2"

A request with such a nice copy-and-pastable
example in it deserves an answer.

It looks to me like you want to map the item names
in i to the group names that are the names of the list j,
which maps group names to the items in each group.
When there are lots of groups it can be faster to
first invert the list j into a mapping vector pair,
as in:

f2 <- function (i, j) {
    groupNames <- rep(names(j), sapply(j, length)) # map to groupName
    itemNames <- unlist(j, use.names = FALSE) # map from itemName
    groupNames[match(i, itemNames, nomatch = NA)]
}
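
To make the inversion concrete, here is a rough walk-through of
what f2 builds for your small example. It only re-runs the steps
of f2 one at a time; the variable k is just an illustrative name
for the result.

```r
i <- c("a","a","b","b","b","c","c","d")
j <- list(j1 = c("a","c"), j2 = c("b","d"))

# One group name per item, in the same order as the flattened items
groupNames <- rep(names(j), sapply(j, length))
groupNames            # "j1" "j1" "j2" "j2"

# The items themselves, flattened, without the j11, j12, ... names
itemNames <- unlist(j, use.names = FALSE)
itemNames             # "a" "c" "b" "d"

# Position of each element of i within itemNames
pos <- match(i, itemNames)
pos                   # 1 1 3 3 3 2 2 4

# Use those positions to pick out the corresponding group names
k <- groupNames[pos]
k                     # "j1" "j1" "j2" "j2" "j2" "j1" "j1" "j2"
```

groupNames and itemNames are parallel vectors, so a single call
to match() does all the lookups at once.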

I put your original code into a function, as this makes
testing and development easier:

f0 <- function (i, j) {
    match <- lapply(j, function(x) {
        which(i %in% x)
    })
    k <- vector()
    for (y in 1:length(match)) {
        k[match[[y]]] <- names(match[y])
    }
    k
}

With your original data these give identical results:

> identical(f0(i,j), f2(i,j))
[1] TRUE
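
The two agree here because every item belongs to exactly one
group. As a hedged sketch of where they can differ (the inputs
below, including jOverlap, are made up for illustration and are
not from your data): an item at the end of i that is in no group,
and an item that is listed in two groups.

```r
f0 <- function (i, j) {
    match <- lapply(j, function(x) {
        which(i %in% x)
    })
    k <- vector()
    for (y in 1:length(match)) {
        k[match[[y]]] <- names(match[y])
    }
    k
}
f2 <- function (i, j) {
    groupNames <- rep(names(j), sapply(j, length))
    itemNames <- unlist(j, use.names = FALSE)
    groupNames[match(i, itemNames, nomatch = NA)]
}

j <- list(j1 = c("a","c"), j2 = c("b","d"))

# "z" is in no group and comes last: f0 never extends k that far,
# while f2 always returns one element per element of i
trailing0 <- f0(c("a","z"), j)   # "j1"
trailing2 <- f2(c("a","z"), j)   # "j1" NA

# "a" is in both groups: f0 keeps the last group that contains it,
# f2 keeps the first, because match() returns the first position
jOverlap <- list(j1 = c("a","c"), j2 = c("a","d"))
overlap0 <- f0("a", jOverlap)    # "j2"
overlap2 <- f2("a", jOverlap)    # "j1"
```

If either situation can arise in your data you will want to decide
which behavior you need before swapping one function for the other.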

I made a list describing 1000 groups, each containing
an average of 10 members:

jBig <- split(paste("N",1:10000,sep=""),
sample(paste("G",1:1000,sep=""),size=10000,replace=TRUE))

and a vector of a million items sampled from those
member names:

iBig <- sample(paste("N",1:10000,sep=""), replace=TRUE, size=1e6)

Then I compared the times it took f0 and f2 to compute
the result and verified that their outputs were identical:

> system.time(r0<-f0(iBig,jBig))
   user  system elapsed 
 100.89   10.20  111.27 
> system.time(r2<-f2(iBig,jBig))
   user  system elapsed 
   0.14    0.00    0.14 
> identical(r0,r2)
[1] TRUE
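
The inverted mapping can also be kept in one named vector and
indexed by item name. This is only a variation on f2, not something
I timed above; f3 is a hypothetical name, and lengths() assumes
R 3.2.0 or later (use sapply(j, length) otherwise).

```r
f3 <- function (i, j) {
    # group names, named by the items they contain
    lookup <- setNames(rep(names(j), lengths(j)),
                       unlist(j, use.names = FALSE))
    # index by item name, then drop the names from the result
    unname(lookup[i])
}

i <- c("a","a","b","b","b","c","c","d")
j <- list(j1 = c("a","c"), j2 = c("b","d"))
r3 <- f3(i, j)
r3   # "j1" "j1" "j2" "j2" "j2" "j1" "j1" "j2"
```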

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> I used :
> 
> match <- lapply(j, function (x) {which(i %in% x)})
> k <- vector()
> for (y  in 1:length(match)) {
> k[match[[y]]] <- names(match[y])}
> k
> [1] "j1" "j1" "j2" "j2" "j2" "j1" "j1" "j2"
> 
> But, I think a better way exists ...
> 
> Thanks in advance,
> Carlos
> 


