[R] Cube of Matrices or list of Matrices
Jeff Newmiller
jdnewmil at dcn.davis.CA.us
Tue Jan 20 04:13:31 CET 2015
I use plyr and am learning dplyr and magrittr, but those are just syntactic sugar. What I have been having difficulty with in this thread is the idea that it somehow makes sense to pad vectors with NA values... because I really don't think it does. It seems more like a hammer looking for a nail because that is what it knows how to deal with.
You have a list of matrices with data in them, and switching from for loops to lapply is not in itself going to fix a memory or speed problem... normally the big improvements are in the way you allocate and use your data. Burns talks about pre-allocating the result to speed things up, but I don't understand the problem well enough to suggest an efficient data structure to pre-allocate.
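To make the pre-allocation point concrete, here is a minimal sketch on made-up data. The names expr, nGenes, nSamples and byGene are hypothetical stand-ins, not objects from this thread. It regroups the data gene by gene into a pre-allocated list and keeps the ragged lengths rather than padding with NA; whether that structure suits the downstream correlation step is an assumption.

```r
# Toy stand-in for the real data: a list of matrices with the same number of
# columns (genes) but different numbers of rows (samples). All names and
# values here are made up for illustration.
set.seed(42)
nGenes <- 4
nSamples <- c(Disease1 = 5, Disease2 = 3, Disease3 = 7)
expr <- lapply(nSamples, function(n) {
  matrix(rnorm(n * nGenes), nrow = n, ncol = nGenes)
})

# Pre-allocate the result once instead of growing it inside the loop.
byGene <- vector("list", nGenes)
for (k in seq_len(nGenes)) {
  # One ragged list per gene: column k from every disease, no NA padding.
  byGene[[k]] <- lapply(expr, function(m) m[, k])
}

# Each element keeps the true number of samples per disease.
sapply(byGene[[1]], length)
```

The loop itself is not the expensive part here; allocating byGene once and only extracting columns keeps the work proportional to the size of the data.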
I suggest that Karim read and adhere to the Posting Guide (particularly the bits about giving a reproducible example and posting in plain text so it doesn't get scrambled) if help with optimizing is desired. The discussion at [1] might clarify what "reproducible" means.
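As a rough illustration of what a reproducible example could look like for this kind of data, the sketch below builds a deliberately tiny list of matrices (made-up values, not the real expression data) and uses dput() to produce text that can be pasted into a plain-text email.

```r
# A deliberately tiny stand-in for the real list of matrices (made-up values).
x <- list(Disease1 = matrix(1:6, nrow = 3, ncol = 2),
          Disease2 = matrix(7:10, nrow = 2, ncol = 2))

# dput() prints R code that rebuilds the object; paste that output into the
# plain-text email so readers can recreate x exactly.
dput(x)

# Round trip: deparse and re-evaluate to check that nothing is lost.
x2 <- eval(parse(text = deparse(x)))
identical(x, x2)
```

Readers can then assign the pasted dput() output to a variable and run the rest of the posted code unchanged.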
I will also mention that efficient algorithms for this subject area are frequently available in the Bioconductor project, so I hope you are not re-inventing the wheel and have already reviewed their tools.
[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On January 19, 2015 6:11:38 PM PST, Ben Tupper <btupper at bigelow.org> wrote:
>Hi,
>
>On Jan 19, 2015, at 5:17 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>
>> Thanks Ben.
>> I need to learn more about apply. Do you have a link or tutorial about
>> apply? The R documentation is very short.
>>
>> How can I obtain:
>> z <- list (Col1, Col2, Col3, Col4......)?
>>
>
>This may not be the most efficient way and there certainly is no error
>checking, but you can wrap one lapply within another as shown below.
>The innermost iterates over your list of input matrices, extracting one
>column specified per list element. The outer lapply iterates over the
>various column numbers you want to extract.
>
>
>getMatrices <- function(colNums, dataList = x) {
>  # the number of rows required
>  n <- max(sapply(dataList, nrow))
>  # iterate along requested columns
>  lapply(colNums, function(x, dat, n) {
>    # iterate along input data list
>    do.call(cbind, lapply(dat, getColumn, x, len = n))
>  }, dataList, n)
>}
>
>getMatrices(c(1,3), dataList = x)
>
>If we are lucky, one of the plyr package users might show us how to do
>the same with a one-liner.
>
>
>There are endless resources online; here are some gems.
>
>http://www.r-project.org/doc/bib/R-books.html
>http://www.rseek.org/
>http://www.burns-stat.com/documents/
>http://www.r-bloggers.com/
>
>Also, I found "Data Manipulation with R" (
>http://www.r-project.org/doc/bib/R-books_bib.html#R:Spector:2008 )
>helpful.
>
>Ben
>
>> Thanks
>>
>> Ô__
>> c/ /'_;~~~~kmezhoud
>> (*) \(*) ⴽⴰⵔⵉⵎ ⵎⴻⵣⵀⵓⴷ
>> http://bioinformatics.tn/
>>
>>
>>
>> On Mon, Jan 19, 2015 at 8:22 PM, Ben Tupper <btupper at bigelow.org>
>wrote:
>> Hi again,
>>
>> On Jan 19, 2015, at 1:53 PM, Karim Mezhoud <kmezhoud at gmail.com>
>wrote:
>>
>>> Yes Many thanks.
>>> That is my request using lapply.
>>>
>>> do.call(cbind,col1)
>>>
>>> converts col1 to a matrix but does not fill empty values with NA.
>>>
>>> Even for
>>>
>>> matrix(unlist(col1), ncol=5,byrow = FALSE)
>>>
>>>
>>> How can I get col1 as a matrix, with the empty values filled with NA?
>>>
>>
>> Perhaps best is to determine the maximum number of rows required
>first, then force each subset to have that length.
>>
>> # make a list of matrices, each with nCol columns and differing
>> # number of rows
>> nCol <- 3
>> nRow <- sample(3:10, 5)
>> x <- lapply(nRow, function(x, nc) {
>>   matrix(x:(x + nc*x - 1), ncol = nc, nrow = x)
>> }, nCol)
>> x
>>
>> # make a simple function to get a single column from a matrix
>> getColumn <- function(x, colNum, len = nrow(x)) {
>> y <- x[,colNum]
>> length(y) <- len
>> y
>> }
>>
>> # what is the maximum number of rows
>> n <- max(sapply(x, nrow))
>>
>> # use the function to get the column from each matrix
>> col1 <- lapply(x, getColumn, 1, len = n)
>> col1
>>
>> do.call(cbind, col1)
>>       [,1] [,2] [,3] [,4] [,5]
>>  [1,]    3    8    5    7    9
>>  [2,]    4    9    6    8   10
>>  [3,]    5   10    7    9   11
>>  [4,]   NA   11    8   10   12
>>  [5,]   NA   12    9   11   13
>>  [6,]   NA   13   NA   12   14
>>  [7,]   NA   14   NA   13   15
>>  [8,]   NA   15   NA   NA   16
>>  [9,]   NA   NA   NA   NA   17
>>
>> Ben
>>
>>> Thanks
>>> Karim
>>>
>>>
>>> Ô__
>>> c/ /'_;~~~~kmezhoud
>>> (*) \(*) ⴽⴰⵔⵉⵎ ⵎⴻⵣⵀⵓⴷ
>>> http://bioinformatics.tn/
>>>
>>>
>>>
>>> On Mon, Jan 19, 2015 at 4:36 PM, Ben Tupper <ben.bighair at gmail.com>
>wrote:
>>> Hi,
>>>
>>> On Jan 18, 2015, at 4:36 PM, Karim Mezhoud <kmezhoud at gmail.com>
>wrote:
>>>
>>> > Dear All,
>>> > I am trying to compute correlations between diseases (80, in columns),
>>> > with samples in rows (UNEQUAL numbers), using gene expression (at least
>>> > 1000 genes, numeric). For this I can use the CORREP package with its
>>> > cor.unbalanced function.
>>> >
>>> > But before getting this final matrix I need to load and store the
>>> > expression of 1000 genes for every disease (80). Every disease has
>>> > a different number of samples (between 50 and 500).
>>> >
>>> > Is it possible to get a cube of matrices with equal columns but unequal
>>> > rows? I think NOT, so I can't use the array function.
>>> >
>>> > I am trying to get a list of matrices having the same number of columns
>>> > but different numbers of rows, as follows:
>>> >
>>> > Cubist <- vector("list", 1)
>>> > Cubist$Expression <- vector("list", 1)
>>> >
>>> >
>>> > for (i in 1:80) {
>>> >   # getGeneExpression() stands for whatever loads one disease's matrix
>>> >   mat <- getGeneExpression(Disease[i])
>>> >   Cubist$Expression[[Disease[i]]] <- mat
>>> > }
>>> >
>>> > At this step I have:
>>> > length(Cubist$Expression)
>>> > #80
>>> > dim(Cubist$Expression$Disease1)
>>> > #526 1000
>>> > dim(Cubist$Expression$Disease2)
>>> > #106 1000
>>> >
>>> > names(Cubist$Expression$Disease1[4])
>>> > #ABD
>>> >
>>> > names(Cubist$Expression$Disease2[4])
>>> > #ABD
>>> >
>>> > Now I need to build the final matrices for every gene (1000) that I
>>> > will use with the CORREP function.
>>> >
>>> > Is there a way to directly extract the first column (first gene) for
>>> > all diseases (80) from Cubist$Expression? or
>>> >
>>>
>>> I don't understand most of your question, but the above seems to be
>>> straightforward. Here's a toy example:
>>>
>>> # make a list of matrices, each with nCol columns and differing
>>> # number of rows, nRow
>>> nCol <- 3
>>> nRow <- sample(3:10, 5)
>>> x <- lapply(nRow, function(x, nc) {
>>>   matrix(x:(x + nc*x - 1), ncol = nc, nrow = x)
>>> }, nCol)
>>> x
>>>
>>> # make a simple function to get a single column from a matrix
>>> getColumn <- function(x, colNum) {
>>> return(x[,colNum])
>>> }
>>>
>>> # use the function to get the column from each matrix
>>> col1 <- lapply(x, getColumn, 1)
>>> col1
>>>
>>> Does that help answer this part of your question? If not, you may
>need to create a very small example of your data and post it here using
>the head() and dput() functions.
>>>
>>> Ben
>>>
>>>
>>>
>>> > Do I need to build 1000 matrices with 80 columns and unequal rows?
>>> >
>>> > Cublist$Diseases <- vector("list", 1)
>>> >
>>> > for (k in 1:1000){
>>> > for (i in 1:80){
>>> >
>>> > Cublist$Diseases[[gene[k] ]] <- Cubist$Expression[[Diseases[i]
>]][k]
>>> > }
>>> >
>>> > }
>>> >
>>> > This double loop is time-consuming... Is there a way to do this
>>> > faster?
>>> >
>>> > Thanks,
>>> > karim
>>> > Ô__
>>> > c/ /'_;~~~~kmezhoud
>>> > (*) \(*) ⴽⴰⵔⵉⵎ ⵎⴻⵣⵀⵓⴷ
>>> > http://bioinformatics.tn/
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>> Ben Tupper
>> Bigelow Laboratory for Ocean Sciences
>> 60 Bigelow Drive, P.O. Box 380
>> East Boothbay, Maine 04544
>> http://www.bigelow.org
>>
>
>Ben Tupper
>Bigelow Laboratory for Ocean Sciences
>60 Bigelow Drive, P.O. Box 380
>East Boothbay, Maine 04544
>http://www.bigelow.org