[BioC] sum the values with same ID
Hervé Pagès
hpages at fhcrc.org
Thu Mar 6 23:17:22 CET 2014
Hi anonymous guest,
On 03/06/2014 01:43 PM, guest [guest] wrote:
>
> Dear R user,
Note that this is the Bioconductor mailing list. Your question looks
like a general R question, not a Bioconductor-specific one.
>
> I have a matrix like:
>
> ID group1 group2 group3
> s1 0 2 3
> s2 1 0 4
> s1 3 4 1
> s4 2 2 0
>
> I would like to sum the values with same ID to have the matrix as below:
> ID group1 group2 group3
> s1 3 6 4
> s2 1 0 4
> s4 2 2 0
>
> I checked aggregate() may help to complete this job, but unfortunately I have the error message when I do this.
>
>> all.data <- read.csv("test.csv")
Note that 'all.data' is a data.frame, not a matrix.
>> aggregate(group1 ~ ID, data=all.data, FUN=sum)
> Error in eval(expr, envir, enclos) : object 'ID' not found
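The error message says that no variable named ID could be found in 'all.data'. A quick check is to look at the column names that were actually read. The sketch below uses a hypothetical data frame whose first column ended up named X instead of ID. That name is only an assumption for illustration; the real contents of test.csv are not known.

```r
# Hypothetical stand-in for read.csv("test.csv"): the first column is
# named "X" here, NOT "ID" (an assumption, for illustration only).
all.data <- data.frame(X = c("s1", "s2", "s1", "s4"),
                       group1 = c(0, 1, 3, 2))

# Inspect what was actually read.
has_id <- "ID" %in% names(all.data)   # FALSE for this stand-in
print(names(all.data))

# Once the ID column carries the expected name, the formula works.
names(all.data)[1] <- "ID"
res <- aggregate(group1 ~ ID, data = all.data, FUN = sum)
print(res)
```

If names(all.data) already contains ID, the cause lies elsewhere and str(all.data) is the next thing to look at.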
Trying with a matrix:
m <- matrix(sample(12L), ncol=3)
ID <- c("s1", "s2", "s1", "s4")
rownames(m) <- ID
colnames(m) <- paste0("group", 1:3)
Then:
> m
group1 group2 group3
s1 1 9 7
s2 11 12 10
s1 2 5 6
s4 8 3 4
> aggregate(group1 ~ ID, data=m, FUN=sum)
ID group1
1 s1 3
2 s2 11
3 s4 8
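The call above only sums group1. To sum every group column at once, the formula can use a dot on the left-hand side. A minimal sketch, assuming a data frame that really has an ID column (the values are retyped by hand from the question, not read from test.csv):

```r
# Values retyped from the question; the layout of the real file is assumed.
all.data <- data.frame(ID     = c("s1", "s2", "s1", "s4"),
                       group1 = c(0, 1, 3, 2),
                       group2 = c(2, 0, 4, 2),
                       group3 = c(3, 4, 1, 0))

# ". ~ ID" means: aggregate all other columns, grouped by ID.
res <- aggregate(. ~ ID, data = all.data, FUN = sum)
print(res)
```

The result has one row per distinct ID (s1, s2, s4), with each group column summed within that ID.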
aggregate() will probably be too slow anyway on a matrix with a very
large number of rows (hundreds of thousands or more). Here is a faster
solution that leverages the IRanges infrastructure:
library(IRanges)
m2 <- apply(m, 2, function(x) sum(splitAsList(x, ID)))
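For comparison, base R also provides rowsum(), which sums the rows of a matrix by group and needs no extra package. This is an alternative to the IRanges call above, not part of it. A minimal sketch using the values from the question (the matrix is retyped by hand, so its layout is an assumption about test.csv):

```r
# Values retyped from the question; matrix() fills column by column.
m <- matrix(c(0, 1, 3, 2,    # group1
              2, 0, 4, 2,    # group2
              3, 4, 1, 0),   # group3
            ncol = 3)
colnames(m) <- paste0("group", 1:3)
ID <- c("s1", "s2", "s1", "s4")

# One output row per distinct ID; each column is summed within the group.
m2 <- rowsum(m, group = ID)
print(m2)
```

The row names of m2 are the distinct IDs, so a single ID can be looked up with m2["s1", ].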
Cheers,
H.
PS: IRanges is a Bioconductor package.
>
> Please help me to generate the sum for the matrix. It's been appreciated for any help.
>
> Thanks a lot
>
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] RColorBrewer_1.0-5 vegan_2.0-10 lattice_0.20-24 permute_0.8-0 Heatplus_2.6.0 gplots_2.12.1
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-6 caTools_1.16 gdata_2.13.2 grid_3.0.2 gtools_3.1.1 KernSmooth_2.23-10 tools_3.0.2
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319