[BioC] identifying consistently expressed genes between replicates

Mon Apr 11 06:02:35 CEST 2011

Hi Wendy,

First, let me mention that fit$sigma holds the between-replicate standard 
deviation for each gene (pooled over all the cell types), which is probably 
what you were looking for in your original post.
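To make the first point concrete, here is a minimal base-R sketch for one 
made-up gene. The values, the cell types A/B/C and the names sigma.lm and 
sigma.manual are hypothetical. It assumes an unweighted fit with no missing 
values. In that case the per-gene residual standard deviation that lmFit() 
stores in fit$sigma should equal the residual standard error of an ordinary 
lm() fit. That value is the scatter of the replicates around their own 
cell-type mean, pooled across cell types:

```r
# Hypothetical log2-expression values for a single gene on 8 arrays:
# three made-up cell types (A, B, C) with 2-3 replicate arrays each
y <- c(5.1, 5.3, 4.9,   # cell type A
       7.0, 7.4,        # cell type B
       6.2, 6.0, 6.4)   # cell type C
f <- factor(c("A", "A", "A", "B", "B", "C", "C", "C"))

# Residual standard deviation from a one-way linear model for this gene;
# for an unweighted fit this is the per-gene value lmFit() keeps in fit$sigma
sigma.lm <- summary(lm(y ~ f))$sigma

# The same quantity by hand: deviations of each replicate from its own
# cell-type mean, pooled over cell types, on n - (number of types) df
dev <- y - ave(y, f)
sigma.manual <- sqrt(sum(dev^2) / (length(y) - nlevels(f)))

print(c(lm = sigma.lm, manual = sigma.manual))
```

A gene with a small sigma has replicates that agree closely within every 
cell type.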

Second, here is a way to compare each cell type with each of the others. 
Suppose you want signature genes for BCELLA2.  The following will compare 
each of the other cell types with BCELLA2:

   library(limma)
   f <- factor(samplenames)
   BCELLA2vs <- relevel(f, ref="BCELLA2")  # make BCELLA2 the reference level
   design <- model.matrix(~BCELLA2vs)      # coefs 2..24 = other type minus BCELLA2
   fit <- eBayes(lmFit(es.mx, design))
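Here is a small base-R illustration of what relevel() and model.matrix() do 
in this setting. It uses made-up sample names, including a hypothetical NK 
type, and one made-up gene. The intercept estimates the BCELLA2 mean, and 
each other column estimates that cell type's mean minus the BCELLA2 mean. 
So a gene that is up in BCELLA2 gets negative coefficients. lm.fit() is used 
here only as a single-gene stand-in for lmFit():

```r
# Hypothetical sample labels: three cell types with replicate arrays
samplenames <- c("BASO1", "BASO1", "BCELLA2", "BCELLA2", "BCELLA2", "NK", "NK")
f <- factor(samplenames)
BCELLA2vs <- relevel(f, ref = "BCELLA2")   # BCELLA2 becomes the baseline level
design <- model.matrix(~BCELLA2vs)
print(colnames(design))

# One made-up gene: BASO1 mean 5, BCELLA2 mean 10, NK mean 3
y <- c(4, 6, 9, 10, 11, 2, 4)
coefs <- lm.fit(design, y)$coefficients
print(coefs)   # intercept = BCELLA2 mean; others = (other mean) - (BCELLA2 mean)
```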

Now test all 23 comparisons with BCELLA2, asking for an FDR below 0.1 and 
a fold change of at least 1.5 (you can choose the settings you want):

   results <- decideTests(fit[,-1], p.value=0.1, lfc=log2(1.5))  # [,-1] drops the intercept
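If it helps to see roughly what decideTests() does with its default 
method="separate", here is a simplified base-R sketch. It is not limma's 
actual code, and decide_sketch() and the toy logFC and p-value matrices are 
hypothetical. Each comparison (column) is BH-adjusted on its own. A gene is 
called -1 or 1 only if it passes both the FDR cutoff and the fold-change 
cutoff. limma takes the sign from the t-statistic, which has the same sign 
as the log-fold-change:

```r
# Hypothetical results for 3 genes and 2 comparisons
# (columns named as model.matrix(~BCELLA2vs) would name them)
cmp <- list(paste0("gene", 1:3), c("BCELLA2vsBASO1", "BCELLA2vsNK"))
logFC <- matrix(c(-2.0, -1.0, 0.1,
                  -1.5, -0.5, 0.2), nrow = 3, dimnames = cmp)
pval  <- matrix(c(0.001, 0.004, 0.8,
                  0.002, 0.010, 0.5), nrow = 3, dimnames = cmp)

# Simplified sketch of decideTests(method = "separate"): BH-adjust each
# column separately, then keep the sign of the log-fold-change for genes
# passing both the FDR cutoff and the minimum absolute log-fold-change
decide_sketch <- function(logFC, pval, p.value = 0.05, lfc = 0) {
  adj <- apply(pval, 2, p.adjust, method = "BH")
  sign(logFC) * (adj < p.value & abs(logFC) >= lfc)
}

results <- decide_sketch(logFC, pval, p.value = 0.1, lfc = log2(1.5))
print(results)
```

Here gene2 is significant in both comparisons but is dropped from the 
second one because its fold change (2^0.5, about 1.41) is below 1.5.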

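Because each coefficient is (other cell type) minus BCELLA2, genes that are 
up in BCELLA2 have negative log-fold-changes.  You can find the indices of 
positive signature genes, up in BCELLA2 in all comparisons, by:

  i <- apply(results<0,1,all)

or negative signature genes by

  i <- apply(results>0,1,all)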
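Here is how these apply() calls behave on a toy matrix coded like 
decideTests() output (-1 down, 0 not significant, 1 up). The gene names and 
the NK and TCELL columns are hypothetical. Since each column is (other cell 
type) minus BCELLA2, genes up in BCELLA2 are the rows that are -1 everywhere:

```r
# Hypothetical decideTests-style matrix: 4 genes (rows) by 3 comparisons
# with BCELLA2 (columns); each column is (other cell type) minus BCELLA2
results <- matrix(c(-1, -1, -1,
                     1,  1,  1,
                    -1,  0, -1,
                     1, -1,  0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("gene", 1:4),
                                  c("BCELLA2vsBASO1", "BCELLA2vsNK", "BCELLA2vsTCELL")))

# Lower in every other cell type than in BCELLA2, i.e. up in BCELLA2
up.in.BCELLA2 <- apply(results < 0, 1, all)
# Higher in every other cell type, i.e. down in BCELLA2
down.in.BCELLA2 <- apply(results > 0, 1, all)

print(which(up.in.BCELLA2))
print(which(down.in.BCELLA2))
```

gene3 is excluded from both lists because one comparison is not significant.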

However, you have so many cell types, some of which are probably quite 
similar, so you might allow some of these comparisons to be non-significant. 
Suppose you decide to restrict to genes that are up in BCELLA2 (negative 
coefficient) vs at least 20 of the 23 other cell types:

  i <- rowSums(results<0) >= 20
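The same idea with a relaxed threshold, again on a hypothetical 
decideTests-style matrix (three made-up genes and five comparisons instead 
of 23, so the cutoff here is 4 rather than 20). Counting the negative 
entries per row gives the number of comparisons in which each gene is 
significantly higher in BCELLA2:

```r
# Hypothetical decideTests-style matrix: 3 genes by 5 comparisons,
# each column being (other cell type) minus BCELLA2
results <- rbind(gene1 = c(-1, -1, -1, -1, -1),
                 gene2 = c(-1, -1, -1,  0, -1),
                 gene3 = c(-1,  0,  1,  0, -1))

# Number of comparisons in which each gene is higher in BCELLA2
n.up <- rowSums(results < 0)
# Keep genes up in BCELLA2 in at least 4 of the 5 comparisons
i <- n.up >= 4

print(n.up)
print(i)
```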

Any variation on this is easy to implement in the same way.

Best wishes
Gordon

On Sun, 10 Apr 2011, Wendy Qiao wrote:

> Dear Gordon,
>
> Thank you very much for your information.
>
> You are right: I am comparing each cell type to the average of all the
> others. Ideally, I want to compare each cell type with the others pairwise
> and find the signature genes as you suggested. I tried this before, but I am
> afraid that I did not take full advantage of limma as I am new here.
> Here is my problem. I am comparing 24 blood cell types (92 arrays in total).
> The steps I took are below. The pairwise comparisons take dozens of
> lines. Then I used topTable to find overexpressed genes from each
> comparison, and finally took the intersection. I believe that there is an
> easy way to do all the pairwise comparisons and use decideTests(). Would you
> mind giving me some hints on that?
>
> Thank you very much.
> Wendy
>
> f<-factor(samplenames)   # samplenames = colnames of the 92 arrays;
>                          # replicates have the same name
> design<-model.matrix(~0+f)
> fit<-lmFit(es.mx,design)
> fit<-eBayes(fit)
>
> contrast.matrix<-makeContrasts(fBASO1-fBCELLA1, fBASO1-fBCELLA2.....
>
>
>
>   fBASO1 fBCELLA1 fBCELLA2 fBCELLA3 ...
> 1       1        0        0        0     ...
> 2       1        0        0        0    ...
> 3       1        0        0        0      ...
> 4       0        1        0        0     ...
> ...
> 92      0        0        0        0   ...
>
>
> On 10 April 2011 18:30, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Wendy,
>>
>> From your email, I assume that you have found signature genes by comparing
>> each cell type to all the other cell types treated as one group.  As you
>> have correctly observed, this does not take account of consistency within
>> the other cell types.  Another way to find signature genes, which I think is
>> superior, is to choose signature genes to be those genes that are uniquely
>> higher or lower in the relevant cell type with respect to each of the other
>> cell types individually.  In other words, a positive signature gene is
>> higher in the relevant cell type against every other cell type, not just
>> against the average of the other cell types.  This was the method used in:
>>
>> Lim E, Vaillant F, Wu D, Forrest NC, Pal B, Hart AH, Asselin-Labat ML,
>> Gyorki DE, Ward T, Partanen A, Feleppa F, Huschtscha LI, Thorne HJ; kConFab,
>> Fox SB, Yan M, French JD, Brown MA, Smyth GK, Visvader JE, Lindeman GJ.
>> Aberrant luminal progenitors as the candidate target population for basal
>> tumor development in BRCA1 mutation carriers.  Nature Medicine 2009.
>>
>> to find stem cell signature genes.  If you do it this way, consistency
>> within the cell types is automatically taken care of, because the t-tests
>> will only choose genes with consistent behaviour.   limma can do all the
>> relevant pairwise tests for you in a couple of lines, then use decideTests()
>> to choose the signature genes.
>>
>> Best wishes
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> NHMRC Senior Research Fellow,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> Tel: (03) 9345 2326, Fax (03) 9347 0852,
>> smyth at wehi.edu.au
>> http://www.wehi.edu.au
>> http://www.statsci.org/smyth
>>
>>
>>  Date: Sat, 9 Apr 2011 19:57:25 -0400
>>> From: Wendy Qiao <wendy2.qiao at gmail.com>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] identifying consistently expressed genes between
>>>        replicates
>>>
>>> Hi all,
>>>
>>> I am comparing a number of cell types, and want to find the
>>> signature genes of each cell type. I used the limma package to do
>>> this. The signature genes of a given cell type are found from the fold
>>> difference between the given cell type and the grand mean of all the cell
>>> types, as well as the BH-adjusted p-values. I want to add another
>>> condition to test the consistency of expression levels of the selected
>>> genes for each cell type. I can do this by looking at the standard
>>> deviations of gene expression between replicates. I am just wondering
>>> if there is any function in limma or another Bioconductor package to do
>>> this.
>>>
>>> Thank you in advance,
>>> Wendy
