[R] matching vectors against vectors
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Thu Mar 31 14:36:25 CEST 2005
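You can use merge(), but to do so you first need to define the common
key. This can be the row names in the case of a matrix, or the names in
the case of a vector. In the call below, by=0 matches on row names and
all=TRUE keeps keys found in only one input, filling the gaps with NA.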
v1 <- 1:10
names(v1) <- LETTERS[1:10]
v2 <- 101:105
names(v2) <- sample( LETTERS[1:10], 5 )
> merge( v1, v2, by=0, all=TRUE )
   Row.names  x   y
1          A  1  NA
2          B  2 102
3          C  3 104
4          D  4 103
5          E  5 105
6          F  6  NA
7          G  7  NA
8          H  8 101
9          I  9  NA
10         J 10  NA
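
Applied to the data in the original question, the same idea might look like the sketch below. It is only an illustration under some assumptions: the data frames A and B, the column name "id" and the helper names ids, res and tab are made up here, and the second data set B is invented purely to show how several sets combine. For a single data set, match() is a lighter alternative to merge(); for several, merge() can be applied repeatedly with Reduce().

```r
# All identifiers that should appear as rows in the final table.
ids <- 10:20

# Data set A uses the values from the question; data set B is hypothetical.
A <- data.frame(id = c(12, 13, 15, 16, 17, 18),
                A  = c(112, 113, 115, 116, 117, 118))
B <- data.frame(id = c(10, 12, 19),
                B  = c(110, 112, 119))

# Single data set: match() returns the position of each id in A$id,
# or NA when the id is absent; indexing with NA yields NA.
res <- A$A[match(ids, A$id)]

# Several data sets: start from the full id list and left-join each
# data set in turn, so every id keeps its row and gaps become NA.
tab <- Reduce(function(x, y) merge(x, y, by = "id", all.x = TRUE),
              list(data.frame(id = ids), A, B))
print(tab)
```

Both versions avoid an explicit loop over identifiers. The match() form is probably the cheaper one when there is a single measurement column and the full list of identifiers is known in advance.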
Regards, Adai
On Tue, 2005-03-29 at 22:47 +0200, Piet van Remortel wrote:
> Hi all.
>
> I have a recurring typical problem that I don't know how to solve
> efficiently.
>
> The situation is the following: I have a number of data-sets
> (A,B,C,...) , consisting of an identifier (e.g. 11,12,13,...,20) and a
> measurement (e.g. in the range 100-120). I want to compile a large
> table, with all available identifiers in all data-sets in the rows, and
> a column for every dataset.
>
> Now, not all datasets have a measurement for every identifier, so I
> want NA if the set does not contain the identifier.
>
> an example for a single dataset:
>
> #all identifiers
> > rep <- c(10:20)
>
> #Identifiers in my dataset (a subset of rep)
> > rep1 <- c(12,13,15,16,17,18)
>
> #measurements in this dataset
> > rep1.r <- c(112,113,115,116,117,118)
>
> #a vector which should become a column in the final table, now containing all NAs
> #(one entry per identifier in rep, i.e. 11 entries for 10:20)
> > res <- rep(NA,11)
>
> #the IDs and values of my dataset together
> > data <- cbind(rep1, rep1.r)
>
> data looks like this:
>      rep1 rep1.r
> [1,]   12    112
> [2,]   13    113
> [3,]   15    115
> [4,]   16    116
> [5,]   17    117
> [6,]   18    118
>
> Now, I want to put the values 112, 113, 115,... in the correct rows of
> the final table, using the identifiers as an indicator of which row to
> put it in, so that I finally obtain:
>
> rep res
>  10  NA
>  11  NA
>  12 112
>  13 113
>  14  NA
>  15 115
>  16 116
>  17 117
>  18 118
>  19  NA
>  20  NA
>
> I try to avoid repeating 'which' a lot and filling in every
> identifier's observation etc, since I will be doing this for thousands
> of rows at once. There must be an efficient way using factors,
> tapply etc, but I have trouble finding it. Ideally this could be
> done in one go, instead of looping.
>
> Any suggestions ?
>
> Thanks,
>
> Piet
>