[R] Help on averaging sets of rows defined by row name
Liaw, Andy
andy_liaw at merck.com
Fri Apr 20 16:09:11 CEST 2007
You might want to check which of the following scales better for the
size of data you have.
## Make up some data to try.
R> dat <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9),
s2=runif(9))
R> dat
  gene        s1        s2
1    a 0.9959172 0.9531052
2    a 0.2064497 0.4257022
3    a 0.4791100 0.5977923
4    b 0.1307096 0.8256453
5    b 0.7887983 0.8904983
6    b 0.7841745 0.6901540
7    c 0.3356583 0.7125086
8    c 0.5859311 0.0509323
9    c 0.7681325 0.8677725
## Use aggregate():
R> aggregate(dat[-1], dat[1], mean)
  gene        s1        s2
1    a 0.5604923 0.6588666
2    b 0.5678941 0.8020992
3    c 0.5632407 0.5437378
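The question's code calls mean() with na.rm=TRUE. Because aggregate() passes any extra arguments on to FUN, the same option can be given here. A minimal sketch, using a hypothetical copy dat.na of the made-up data above with one value set to NA:

```r
# Same made-up layout as above, with one missing value added for illustration
set.seed(1)
dat.na <- data.frame(gene = rep(letters[1:3], each = 3),
                     s1 = runif(9), s2 = runif(9))
dat.na$s1[2] <- NA

# Arguments after FUN are handed on to mean(), so NAs are dropped within
# each gene instead of turning that gene's mean into NA
res <- aggregate(dat.na[-1], dat.na[1], mean, na.rm = TRUE)
res
```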
## Do it "by hand": needs a bit more work if there are NAs.
R> rowsum(dat[-1], dat[[1]]) / table(dat[[1]])
         s1        s2
a 0.5604923 0.6588666
b 0.5678941 0.8020992
c 0.5632407 0.5437378
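The extra work for NAs could look like the following sketch (one possible approach, not necessarily the one meant above): sum with rowsum(..., na.rm=TRUE) and divide by the per-gene count of non-missing values in each column, rather than by table(), which counts all rows. The names dat.na, vals, sums and counts are hypothetical.

```r
# Made-up data with a couple of missing values
set.seed(2)
dat.na <- data.frame(gene = rep(letters[1:3], each = 3),
                     s1 = runif(9), s2 = runif(9))
dat.na$s1[c(2, 5)] <- NA

vals <- as.matrix(dat.na[-1])
# Per-gene column sums, skipping NAs
sums <- rowsum(vals, dat.na$gene, na.rm = TRUE)
# Per-gene count of non-missing values in each column
# (multiply by 1 because rowsum() requires numeric, not logical, input)
counts <- rowsum((!is.na(vals)) * 1, dat.na$gene)
means <- sums / counts
means
```

A gene whose values in some column are all NA gets 0/0 = NaN there, which is also what mean(x, na.rm=TRUE) returns for an all-NA x.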
Andy
From: Booman, M
>
> Dear all,
>
> This is my problem: I have a table of gene expression data,
> where the 1st column is the gene name and the 2nd-39th columns
> hold expression data for 38 samples. There are multiple
> measurements per sample for each gene, so there are multiple
> rows for each gene name. I want to average these measurements
> so I end up with one value per sample for each gene name. The
> output data frame (table.averaged) is further used in another R
> script. The code I use now (see below) takes 20 secs for every
> 100 genes, so it takes 45 minutes to average my files of 13500
> unique genes. Can anyone help me do this faster?
>
> Cheers, marije
>
> Code I use:
>
>
> # table.imputed is a data.frame: 1st column = gene name (class
> # factor, converted to character here), rest of columns =
> # expression data (class numeric)
> table.imputed[,1] <- as.character(table.imputed[,1])
>
> # Calculates the average for each sample (column) and outputs one
> # row of average values together with the gene name
> givemean <- function (gene, table.imputed) {
>   # make a subset containing only the rows for one gene name
>   thisgene <- table.imputed[table.imputed[,1]==gene,]
>   data.frame(gene, t(sapply(thisgene[,2:ncol(thisgene)], mean,
>                             na.rm=TRUE)))
> }
>
> # To make a list of unique genes in the set
> genesunique <- unique(table.imputed[,1])
>
> table.averaged <- NULL
> for (j in seq_along(genesunique)) {
>   if (j %% 100 == 0) {
>     # To report progress
>     cat(j, "genes finished", sep=" ", fill=TRUE)
>   }
>   # collects all rows of average values and binds them back into
>   # one data frame
>   table.averaged <- rbind(table.averaged,
>                           givemean(genesunique[j], table.imputed))
> }