[R] Creating a sparse matrix from a file
Martin Maechler
maechler at stat.math.ethz.ch
Tue Oct 27 11:42:46 CET 2009
PP> Hi all,
PP> I used the SparseM package to create a sparse matrix and
PP> followed the commands below.
I'd strongly recommend using the package 'Matrix', which is part of
every R distribution (since R 2.9.0).
PP> The sequence of commands is:
>> ex <- read.table('fileName',sep=',')
>> M <- as.matrix.csr(0,22638,80914)
>> for (i in 1:nrow(ex)) { M[ex[i,1],ex[i,2]]<-ex[i,3]}
This is very slow in either 'Matrix' or 'SparseM'
as soon as nrow(ex) is non-small, because every single-element
assignment has to rebuild the compressed sparse structure.
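To get a feel for this on your own machine, you could time the loop on
a deliberately small problem. The following is only a sketch: the
helper names make_ex() and fill_by_loop() and the sizes are made up for
illustration, and the timings (not shown, since they vary by machine
and package version) are meant to be compared with each other, not
taken as absolute numbers.

```r
library(Matrix)

## Hypothetical helper: 'nnz' random triplets at distinct cells of an n x m matrix
make_ex <- function(nnz, n, m) {
    idx <- sample(n * m, nnz)                 # distinct linear cell indices
    cbind(i = (idx - 1) %% n + 1,
          j = (idx - 1) %/% n + 1,
          x = sample(1:9, nnz, replace = TRUE))
}

## Hypothetical helper: fill the matrix one element at a time (the slow way)
fill_by_loop <- function(ex, n, m) {
    M <- Matrix(0, n, m, sparse = TRUE)
    for (k in seq_len(nrow(ex)))
        M[ex[k, 1], ex[k, 2]] <- ex[k, 3]
    M
}

set.seed(1)
n <- 200; m <- 300                            # tiny, assumed sizes
t_small <- system.time(M_small <- fill_by_loop(make_ex( 50, n, m), n, m))
t_large <- system.time(M_large <- fill_by_loop(make_ex(400, n, m), n, m))
rbind(t_small, t_large)                       # compare elapsed times
```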
However, there are very efficient ways to construct the sparse
matrix directly from your 'ex' structure:
In 'Matrix' you should use the sparseMatrix() function as you
had proposed.
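For the two-line file from the original question, a minimal sketch
might look like the following. The temporary file only stands in for
your real 'fileName', and the 20 x 20 'dims' and the column names
i, j, x are assumptions chosen for illustration.

```r
library(Matrix)

## Stand-in for the real data file: write the two example lines to a temp file
f <- tempfile(fileext = ".csv")
writeLines(c("1,2,14", "2,4,15"), f)

## Read the "rowIndex,columnIndex,value" triplets
ex <- read.table(f, sep = ",", col.names = c("i", "j", "x"))

## Build the sparse matrix in one call; 'dims' is optional, and without it
## the dimensions are taken from max(i) and max(j)
M <- sparseMatrix(i = ex$i, j = ex$j, x = as.numeric(ex$x),
                  dims = c(20, 20))
M[1:3, 1:5]
```

For the real data you would replace 'f' by your file name and 'dims'
by the true number of rows and columns.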
Here I provide a reproducible example,
using a random 'ex':
n <- 22638
m <- 80914
nnz <- 300000 # no idea if this is realistic for you
set.seed(101)
ex <- cbind(i = sample(n, nnz, replace=TRUE),
            j = sample(m, nnz, replace=TRUE),
            x = round(100 * rnorm(nnz)))
library(Matrix)
M <- sparseMatrix(i = ex[,"i"],
                  j = ex[,"j"],
                  x = ex[,"x"],
                  dims = c(n, m)) # otherwise dims are max(i) x max(j)
MM. <- tcrossprod(M) # == MM' := M %*% t(M)
M.1 <- M %*% rep(1, ncol(M))
stopifnot(identical(drop(M.1), rowSums(M)))
## .... and now do other stuff with your sparse matrix M
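One difference from the element-wise loop is worth knowing: if the same
(i, j) pair occurs more than once (as can happen in the random 'ex'
above), sparseMatrix() by default adds the corresponding x values,
whereas the loop keeps only the last one. A small sketch, with made-up
data, of both behaviours:

```r
library(Matrix)

## Made-up triplets: two entries for cell (1, 2), one for cell (2, 4)
ex <- cbind(i = c(1, 1, 2), j = c(2, 2, 4), x = c(10, 4, 15))

## Default: values of duplicated (i, j) pairs are added
M_sum <- sparseMatrix(i = ex[, "i"], j = ex[, "j"], x = ex[, "x"],
                      dims = c(3, 5))
M_sum[1, 2]

## To mimic the loop (last value wins), drop the earlier duplicates first
keep <- !duplicated(ex[, c("i", "j")], fromLast = TRUE)
M_last <- sparseMatrix(i = ex[keep, "i"], j = ex[keep, "j"],
                       x = ex[keep, "x"], dims = c(3, 5))
M_last[1, 2]
```

If your file never repeats a (row, column) pair, the two variants give
the same matrix.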
PP> Even after 4 hours, I can still see the above command running. But I am not
PP> sure whether it got stuck somewhere.
PP> Also, when I initialize matrix M and try to display the values, I can see
PP> something like this
PP> [1] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
PP> 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
PP> 2 2 2 2 2 2 2 2 2 2
PP> [85] 2 2
PP> And after I stopped the above command that fills the matrix from the table
PP> (after 4 hours), I could see different values.
PP> Could someone kindly explain what these numbers are and how I can test
PP> that my command is running and not just stuck somewhere?
PP> Also, it would be great if someone could point me to a tutorial, if any, on
PP> sparse matrices in R, as I couldn't find one on the internet.
PP> Thanks
PP> Pallavi
PP> Pallavi Palleti wrote:
>>
>> Hi David,
>>
>> Thanks for your help. This is exactly what I want.
>> But my matrix has 25k rows and 80k columns. So,
>> when I define a matrix object, it throws an error saying it cannot
>> allocate a vector of length (25k * 80k). I heard that this data can still
>> be loaded into R using sparseMatrix. However, I couldn't find the syntax for
>> creating one. Could someone kindly help me in this regard?
>>
>> Thanks
>> Pallavi
>>
>>
>> David Winsemius wrote:
>>>
>>>
>>> On Oct 26, 2009, at 5:06 AM, Pallavi Palleti wrote:
>>>
>>>>
>>>> Hi all,
>>>>
>>>> I am new to R and learning the same. I would like to create a sparse
>>>> matrix from an existing file whose contents are in the format
>>>> "rowIndex,columnIndex,value"
>>>>
>>>> for ex:
>>>> 1,2,14
>>>> 2,4,15
>>>>
>>>> I would like to create a sparse matrix by taking the above as input.
>>>> However, I couldn't find an example where the data was being read
>>>> from a file. I tried searching in R tutorial and also searched for
>>>> the same in web but in vain. Could some one kindly help me how to
>>>> give the above format as input in R to create a sparse matrix.
>>>
>>> ex <- read.table(textConnection("1,2,14
>>> 2,4,15") , sep=",")
>>> ex
>>> # V1 V2 V3
>>> #1 1 2 14
>>> #2 2 4 15
>>>
>>> M <- Matrix(0, 20, 20)
>>>
>>> > M
>>> #20 x 20 sparse Matrix of class "dsCMatrix"
>>>
>>> [1,] . . . . . . . . . . . . . . . . . . . .
>>> [2,] . . . . . . . . . . . . . . . . . . . .
>>> [3,] . . . . . . . . . . . . . . . . . . . .
>>> snip
>>>
>>> for (i in 1:nrow(ex) ) { M[ex[i, 1], ex[i, 2] ] <- ex[i, 3] }
>>>
>>> > M
>>> 20 x 20 sparse Matrix of class "dgCMatrix"
>>>
>>> [1,] . 14 . . . . . . . . . . . . . . . . . .
>>> [2,] . . . 15 . . . . . . . . . . . . . . . .
>>> [3,] . . . . . . . . . . . . . . . . . . . .
>>> snip
>>> >
>>> --
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>