[R] populating matrix with binary variable after matching data from data frame
William Dunlap
wdunlap at tibco.com
Thu Aug 14 18:00:23 CEST 2014
This is what I got:
> x1 <- data.frame(V1=c("K","D","K","M"), V2=c("L","A","M","A"))
> X <- array(0, c(4,4), rep(list(LETTERS[1:4]), 2))
> f(X, x1, badEntryAction="omitRows")
  A B C D
A 0 0 0 0
B 0 0 0 0
C 0 0 0 0
D 1 0 0 0
> table(lapply(x1, factor, levels=LETTERS[1:4]))
   V2
V1  A B C D
  A 0 0 0 0
  B 0 0 0 0
  C 0 0 0 0
  D 1 0 0 0
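The table() call above returns counts. If a 0/1 matrix is wanted instead, something like the following sketch might do. It reuses the toy data from this thread; the names `lev`, `counts` and `X01` are only for illustration, and stringsAsFactors = FALSE is an assumption (the factor() call behaves the same either way).

```r
# Toy data from this thread. stringsAsFactors = FALSE is an assumption;
# factor() below gives the same result with or without it.
x1 <- data.frame(V1 = c("K", "D", "K", "M"),
                 V2 = c("L", "A", "M", "A"),
                 stringsAsFactors = FALSE)
lev <- LETTERS[1:4]   # the names allowed on both axes

# Strings not in 'lev' become NA in the factors, and table() drops
# NA entries by default, so rows with unknown names are ignored.
counts <- table(lapply(x1, factor, levels = lev))

# Cap the counts at 1 to get presence/absence as 1/0.
X01 <- pmin(counts, 1)
X01
```

Only the D/A pair survives here, because K, L and M are not among the levels.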
I think you should sort out how your attempts went wrong.
My original 'f' assumed, perhaps foolishly, that x1 had column names
"V1" and "V2".
Perhaps it should have just said i <- as.matrix(x1) and checked that the result
was a 2-column matrix of character data. E.g.,
f <- function (x, x1, badEntryAction = c("error", "omitRows", "expandX"))
{
    badEntryAction <- match.arg(badEntryAction)
    i <- as.matrix(x1)
    stopifnot(is.character(i), ncol(i)==2)
    if (badEntryAction == "omitRows") {
        i <- i[is.element(i[, 1], dimnames(x)[[1]]) &
               is.element(i[, 2], dimnames(x)[[2]]), , drop = FALSE]
    }
    else if (badEntryAction == "expandX") {
        extraDimnames <- lapply(1:2, function(k) setdiff(i[, k],
            dimnames(x)[[k]]))
        # if you want the same dimnames on both axes,
        # take union of the 2 extraDimnames
        if ((n <- length(extraDimnames[[1]])) > 0) {
            x <- rbind(x, array(0, c(n, ncol(x)),
                dimnames = list(extraDimnames[[1]], NULL)))
        }
        if ((n <- length(extraDimnames[[2]])) > 0) {
            x <- cbind(x, array(0, c(nrow(x), n),
                dimnames = list(NULL, extraDimnames[[2]])))
        }
    }
    x[i] <- 1
    x
}
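
The comment in the "expandX" branch mentions taking the union of the extra dimnames when both axes should carry the same names. A rough sketch of that variant follows. `expandSquare` is a hypothetical helper, not part of f, and it assumes x starts out with identical row and column names.

```r
# Hypothetical helper (not part of f above): grow a square matrix so that
# every name mentioned in 'pairs' appears on BOTH axes, then mark the pairs.
# Assumes x already has identical row and column names.
expandSquare <- function(x, pairs) {
    i <- as.matrix(pairs)
    stopifnot(is.character(i), ncol(i) == 2,
              identical(rownames(x), colnames(x)))
    # union of the names missing from x, taken over both columns of 'pairs'
    extra <- setdiff(as.vector(i), rownames(x))
    allNames <- c(rownames(x), extra)
    out <- array(0, rep(length(allNames), 2), list(allNames, allNames))
    out[rownames(x), colnames(x)] <- x   # keep the existing entries
    out[i] <- 1                          # (row name, column name) pairs
    out
}

x <- array(0, c(4, 4), rep(list(LETTERS[1:4]), 2))
x1 <- data.frame(V1 = c("K", "D", "K", "M"),
                 V2 = c("L", "A", "M", "A"),
                 stringsAsFactors = FALSE)
res <- expandSquare(x, x1)
dim(res)
```

With the toy data the result is 7 by 7 (A to D plus K, M and L), and only the four listed pairs are set to 1. The mirrored entries such as res["A", "D"] stay 0.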
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Thu, Aug 14, 2014 at 8:15 AM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote:
> Hi Bill,
> Sorry for the trouble. Neither solution worked:
> Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds
>
>
> My x matrix may not have all the items that x1 has.
>
> Say x only has A, B, C, D, whereas x1 has K, L, M, A and D. However,
> x1 does not have any relationship between B and C, so B-C will be
> zero anyway.
>
> x1 :
>
> K L
> D A
> K M
> M A
> Although M associates with A, M is not present in X, so we will
> not mark this association with 1. Since A and D are both present in X,
> we will assign 1.
>
>
>
>   A B C D
> A 0 0 0 0
> B 0 0 0 0
> C 0 0 0 0
> D 1 0 0 0
>
>
> I tried this simple for loop but I get the same subscript error:
>
>
> for(k in nrow(x1)){
> x[x1[k,]$V1,x1[k,]$V2] <- 1
> x[x1[,k]$V1,x1[,k]$V2] <- 1
> x[x1[,k]$V2,x1[,k]$V1] <- 1
> }
>
> Error in `[<-`(`*tmp*`, hprd[x, ]$V1, hprd[x, ]$V2, value = 1) :
> subscript out of bounds
>
> Thanks again.
>
> On Wed, Aug 13, 2014 at 6:02 PM, William Dunlap <wdunlap at tibco.com> wrote:
>> Another solution is to use table to generate your x matrix, instead of
>> trying to make one and adding to it. If you want the table to have
>> the same dimnames on both sides, make factors out of the columns of x1
>> with the same factor levels in both. E.g., using a *small* example:
>>
>> > X1 <- data.frame(V1=c("A","A","B"), V2=c("C","C","A"))
>> > X <- table(lapply(X1, factor, levels=union(levels(X1[[1]]), levels(X1[[2]]))))
>> > X
>>    V2
>> V1  A B C
>>   A 0 0 2
>>   B 1 0 0
>>   C 0 0 0
>>
>> If you don't want counts, but just a TRUE for presence and FALSE for
>> absence, use X>0. If you want 1 for presence and 0 for absence you
>> can use pmin(X, 1).
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Wed, Aug 13, 2014 at 2:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>> I may have missed something, but I didn't see the result you want for
>>> your example. Also,
>>> none of the entries in the x1 you showed are row or column names in x,
>>> making it hard to show what you want to happen.
>>>
>>> Here is a function that gives you the choice of
>>>   * error: stop if any row of x1 is 'bad'
>>>   * omitRows: ignore rows of x1 that are 'bad'
>>>   * expandX: expand the x matrix to include all rows or columns named in x1
>>> (Row i of x1 is 'bad' if x1[i,1] is not a row name of x or x1[i,2]
>>> is not a column name of x.)
>>>
>>> f
>>> function (x, x1, badEntryAction = c("error", "omitRows", "expandX"))
>>> {
>>>     badEntryAction <- match.arg(badEntryAction)
>>>     i <- as.matrix(x1[, c("V1", "V2")])
>>>     if (badEntryAction == "omitRows") {
>>>         i <- i[is.element(i[, 1], dimnames(x)[[1]]) &
>>>                is.element(i[, 2], dimnames(x)[[2]]), , drop = FALSE]
>>>     }
>>>     else if (badEntryAction == "expandX") {
>>>         extraDimnames <- lapply(1:2, function(k) setdiff(i[, k],
>>>             dimnames(x)[[k]]))
>>>         # if you want the same dimnames on both axes,
>>>         # take union of the 2 extraDimnames
>>>         if ((n <- length(extraDimnames[[1]])) > 0) {
>>>             x <- rbind(x, array(0, c(n, ncol(x)),
>>>                 dimnames = list(extraDimnames[[1]], NULL)))
>>>         }
>>>         if ((n <- length(extraDimnames[[2]])) > 0) {
>>>             x <- cbind(x, array(0, c(nrow(x), n),
>>>                 dimnames = list(NULL, extraDimnames[[2]])))
>>>         }
>>>     }
>>>     x[i] <- 1
>>>     x
>>> }
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>> On Wed, Aug 13, 2014 at 2:33 PM, Adrian Johnson
>>> <oriolebaltimore at gmail.com> wrote:
>>>> Hello again. Sorry to ask again.
>>>>
>>>> Maybe I was not clear before.
>>>>
>>>> I don't want to remove rows from matrix, since row names and column
>>>> names are identical in matrix.
>>>>
>>>>
>>>> I tried your suggestion and here is what I get:
>>>>
>>>> > fx <- function(x,x1){
>>>> +   i <- as.matrix(x1[,c("V1","V2")])
>>>> +   x[i]<-1
>>>> +   x
>>>> + }
>>>> > fx(x, x1)
>>>>
>>>> Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds
>>>>
>>>>
>>>>
>>>>
>>>> > x[1:4,1:4]
>>>>        ABCA10 ABCA12 ABCA13 ABCA4
>>>> ABCA10      0      0      0     0
>>>> ABCA12      0      0      0     0
>>>> ABCA13      0      0      0     0
>>>> ABCA4       0      0      0     0
>>>>
>>>>
>>>> > x1[1:10,]
>>>>       V1       V2
>>>> 1   AKT3    TCL1A
>>>> 2  AKTIP    VPS41
>>>> 3  AKTIP    PDPK1
>>>> 4  AKTIP   GTF3C1
>>>> 5  AKTIP    HOOK2
>>>> 6  AKTIP    POLA2
>>>> 7  AKTIP KIAA1377
>>>> 8  AKTIP FAM160A2
>>>> 9  AKTIP    VPS16
>>>> 10 AKTIP    VPS18
>>>>
>>>>
>>>> For instance, I loop over x1. For the first row, I take V1 and
>>>> check whether x has a row with that name, then check whether V2
>>>> exists in the column names. If both match, I assign 1. If not, I go
>>>> to row 2.
>>>>
>>>> In some rows, it is possible that only the element in V2 exists in
>>>> the row names. Since the element in V1 does not exist in the X
>>>> matrix, I will give 0. (Since matrix X has identical row and column
>>>> names, I feel it does not matter whether we check an element in the
>>>> column names after we check the row names.)
>>>>
>>>>
>>>>
>>>> Now, for instance, if I see ABCA10 in x1$V1 and ABCA10 in
>>>> x1$V2, then row 1, column 1 of matrix X should get 1.
>>>>
>>>> dput - follows..
>>>>
>>>> x <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(4L,
>>>> 4L), .Dimnames = list(c("ABCA10", "ABCA12", "ABCA13", "ABCA4"
>>>> ), c("ABCA10", "ABCA12", "ABCA13", "ABCA4")))
>>>>
>>>>
>>>> x1 <- structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP",
>>>> "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A",
>>>> "VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2",
>>>> "VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA,
>>>> 10L), class = "data.frame")
>>>>
>>>>
>>>>
>>>> Thanks for your time.
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Aug 13, 2014 at 12:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>>>> You can replace the loop
>>>>> > for (i in nrow(x1)) {
>>>>> >   x[x1$V1[i], x1$V2[i]] <- 1;
>>>>> > }
>>>>> by
>>>>>   f <- function(x, x1) {
>>>>>     i <- as.matrix(x1[, c("V1","V2")]) # 2-column matrix to use as a subscript
>>>>>     x[ i ] <- 1
>>>>>     x
>>>>>   }
>>>>>   f(x, x1)
>>>>>
>>>>> You will get an error if not all the strings in the subscript matrix
>>>>> are in the row or column names of x. What do you want to happen in
>>>>> this case? You can choose to first omit the bad rows in the
>>>>> subscript matrix
>>>>>   goodRows <- is.element(i[,1], dimnames(x)[[1]]) &
>>>>>     is.element(i[,2], dimnames(x)[[2]])
>>>>>   i <- i[goodRows, , drop=FALSE]
>>>>>   x[ i ] <- 1
>>>>> or you can choose to expand x to include all the names found in x1.
>>>>>
>>>>> It would be good if you included some toy data to better illustrate
>>>>> what you want to do.
>>>>> E.g., with
>>>>> x <- array(0, c(3,3), list(Row=paste0("R",1:3),Col=paste0("C",1:3)))
>>>>> x1 <- data.frame(V1=c("R1","R3"), V2=c("C2","C1"))
>>>>> the above f() gives
>>>>> > f(x, x1)
>>>>>     Col
>>>>> Row  C1 C2 C3
>>>>>   R1  0  1  0
>>>>>   R2  0  0  0
>>>>>   R3  1  0  0
>>>>> Is that what you are looking for?