[BioC] identifying consistently expressed genes between replicates
Gordon K Smyth
smyth at wehi.EDU.AU
Mon Apr 11 06:02:35 CEST 2011
Hi Wendy,
First, let me mention that fit$sigma holds the between-replicate standard
deviation for each gene, which is probably what you were looking for in
your original post.
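
As a rough illustration of what fit$sigma contains, the sketch below fits a group-means model to simulated data and compares the first gene's value with a residual standard deviation computed by hand. The data, group names and sizes are made up for the example; only lmFit() and the sigma component come from limma.

```r
library(limma)

set.seed(1)
# Simulated data: 3 hypothetical cell types, 3 replicates each, 50 genes
samplenames <- rep(c("A", "B", "C"), each = 3)
es.mx <- matrix(rnorm(50 * 9), nrow = 50)

f <- factor(samplenames)
design <- model.matrix(~0 + f)
fit <- lmFit(es.mx, design)

# One residual (between-replicate) standard deviation per gene
head(fit$sigma)

# Manual check for gene 1: pooled SD around the group means,
# using 9 arrays - 3 groups = 6 residual degrees of freedom
res <- es.mx[1, ] - ave(es.mx[1, ], f)
manual <- sqrt(sum(res^2) / (length(f) - nlevels(f)))
all.equal(unname(fit$sigma[1]), manual)
```

Note that fit$sigma is the unmoderated estimate; after eBayes() the moderated variances are stored separately in fit$s2.post.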
Second, here is a way to compare each cell type with each of the others.
Suppose you want signature genes for BCELLA2. The following will compare
all other cell types back to BCELLA2:
f <- factor(samplenames)
BCELLA2vs <- relevel(f,ref="BCELLA2")
design <- model.matrix(~BCELLA2vs)
fit <- eBayes(lmFit(es.mx,design))
Now do all the pairwise tests, asking for an FDR below 0.1 and a fold
change of at least 1.5 (you can choose the settings you want). Dropping the
first coefficient (the intercept) leaves one comparison per other cell type:
results <- decideTests(fit[,-1], p.value=0.1, lfc=log2(1.5))
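Because BCELLA2 is the reference level, each coefficient is (other cell
type) minus BCELLA2, so a gene that is higher in BCELLA2 shows up as
negative in results. You can find the indices of positive signature genes
(up in BCELLA2 in all comparisons) by:
i <- apply(results<0,1,all)
or negative signature genes (down in BCELLA2 in all comparisons) by
i <- apply(results>0,1,all)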
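
Since the reference level is subtracted from every other cell type, it is worth confirming on toy data which sign marks a gene that is higher in the reference. The following is a minimal sketch on simulated data; the group names A to D, the array counts and the size of the effect are all assumptions made for the example.

```r
library(limma)

set.seed(42)
# Simulated data: 4 hypothetical cell types, 3 replicates each, 200 genes
samplenames <- rep(c("A", "B", "C", "D"), each = 3)
es.mx <- matrix(rnorm(200 * 12, sd = 0.3), nrow = 200)
# Make gene 1 strongly higher in cell type B only
es.mx[1, samplenames == "B"] <- es.mx[1, samplenames == "B"] + 5

f <- factor(samplenames)
Bvs <- relevel(f, ref = "B")
design <- model.matrix(~Bvs)
fit <- eBayes(lmFit(es.mx, design))
results <- decideTests(fit[, -1], p.value = 0.1, lfc = log2(1.5))

# The columns are A-B, C-B and D-B, so gene 1 (up in B) is negative in each
colnames(results)
n.lower.others <- rowSums(results < 0)
n.lower.others[1]
```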
However, you have so many cell types, some of which are probably quite
similar. You might allow some of these comparisons to be non-significant.
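Suppose you decide to restrict to genes that are up in BCELLA2 vs 20 out
of the 23 other cell types. Each comparison is (other cell type) minus
BCELLA2, so "up in BCELLA2" means a negative entry in results:
i <- rowSums(results<0) >= 20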
You can see that any variation of this is quite easy.
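
One such variation is to repeat the same steps with each cell type in turn as the reference, which gives a signature for every cell type. The sketch below is only an illustration: signature_up() and its min.others argument are hypothetical names, not limma functions, and the data are simulated.

```r
library(limma)

# Hypothetical helper (not part of limma): for each cell type, return the
# row indices of genes that are higher in that type than in at least
# `min.others` of the other types.
signature_up <- function(es.mx, samplenames, min.others,
                         p.value = 0.1, lfc = log2(1.5)) {
  f <- factor(samplenames)
  sig <- lapply(levels(f), function(ref) {
    vs <- relevel(f, ref = ref)
    fit <- eBayes(lmFit(es.mx, model.matrix(~vs)))
    results <- decideTests(fit[, -1], p.value = p.value, lfc = lfc)
    # Coefficients are (other - ref), so "up in ref" is a negative result
    which(rowSums(results < 0) >= min.others)
  })
  names(sig) <- levels(f)
  sig
}

# Toy usage on simulated data: gene 1 is higher in B, gene 2 is higher in C
set.seed(7)
samplenames <- rep(c("A", "B", "C", "D"), each = 3)
es.mx <- matrix(rnorm(200 * 12, sd = 0.3), nrow = 200)
es.mx[1, samplenames == "B"] <- es.mx[1, samplenames == "B"] + 5
es.mx[2, samplenames == "C"] <- es.mx[2, samplenames == "C"] + 5

sig <- signature_up(es.mx, samplenames, min.others = 3)
sig$B
sig$C
```

Refitting the model once per reference level is wasteful for large data sets, but it keeps the code close to the steps described above.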
Best wishes
Gordon
On Sun, 10 Apr 2011, Wendy Qiao wrote:
> Dear Gordon,
>
> Thank you very much for your information.
>
> You are right: I am comparing each cell type to the average of all the
> others. Ideally, I want to compare each cell type to each of the others
> pairwise and find the signature genes as you suggested. I tried this before,
> but I am afraid that I did not take full advantage of limma, as I am new here.
> Here is my problem. I am comparing 24 blood cell types (92 arrays in total).
> The steps that I took are shown below. The pairwise comparisons take dozens
> of lines. Then I used topTable to find overexpressed genes from each
> comparison, and finally took the 'intersect'. I believe that there is an easy
> way to do all the pairwise comparisons and use decideTests(). Would you mind
> giving me some hints on that?
>
> Thank you very much.
> Wendy
>
> f<-factor(samplenames) # samplenames = colnames of the 92 arrays;
>                        # replicates have the same name
> design<-model.matrix(~0+f)
> fit<-lmFit(es.mx,design)
> fit<-eBayes(fit)
>
> contrast.matrix<-makeContrasts(fBASO1-fBCELLA1, fBASO1-fBCELLA2.....
>
>
>
> fBASO1 fBCELLA1 fBCELLA2 fBCELLA3 ...
> 1 1 0 0 0 ...
> 2 1 0 0 0 ...
> 3 1 0 0 0 ...
> 4 0 1 0 0 ...
> ...
> 92 0 0 0 0 ...
>
>
> On 10 April 2011 18:30, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Wendy,
>>
>> From your email, I assume that you have found signature genes by comparing
>> each cell type to all the other cell types treated as one group. As you
>> have correctly observed, this does not take account of consistency within
>> the other cell types. Another way to find signature genes, that I think is
>> superior, is to choose signature genes to be those genes that are uniquely
>> higher or lower in the relevant cell type with respect to each of the other
>> cell types individually. In other words, a positive signature gene is
>> higher in the relevant cell type against every other cell type, not just
>> against the average of the other cell types. This was the method used in:
>>
>> Lim E, Vaillant F, Wu D, Forrest NC, Pal B, Hart AH, Asselin-Labat ML,
>> Gyorki DE, Ward T, Partanen A, Feleppa F, Huschtscha LI, Thorne HJ; kConFab,
>> Fox SB, Yan M, French JD, Brown MA, Smyth GK, Visvader JE, Lindeman GJ.
>> Aberrant luminal progenitors as the candidate target population for basal
>> tumor development in BRCA1 mutation carriers. Nature Medicine 2009.
>>
>> to find stem cell signature genes. If you do it this way, consistency
>> within the cell types is automatically taken care of, because the t-tests
>> will only choose genes with consistent behaviour. limma can do all the
>> relevant pairwise tests for you in a couple of lines, then use decideTests()
>> to choose the signature genes.
>>
>> Best wishes
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> NHMRC Senior Research Fellow,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> Tel: (03) 9345 2326, Fax (03) 9347 0852,
>> smyth at wehi.edu.au
>> http://www.wehi.edu.au
>> http://www.statsci.org/smyth
>>
>>
>>> Date: Sat, 9 Apr 2011 19:57:25 -0400
>>> From: Wendy Qiao <wendy2.qiao at gmail.com>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] identifying consistently expressed genes between
>>> replicates
>>>
>>> Hi all,
>>>
>>> I am comparing a number of cell types, and am wanting to find the
>>> signature genes of each cell type. I used the limma package to do
>>> this. The signature genes of a given cell type are found by the fold
>>> difference between the given cell type and the grand mean of all the cell
>>> types, as well as the BH-adjusted p-values. I want to add another
>>> condition to test the consistency of expression levels of the selected
>>> genes for each cell type. I can do this by looking at the standard
>>> deviations of gene expressions between replicates. I am just wondering
>>> if there is any function in limma or other BioConductor package to do
>>> this.
>>>
>>> Thank you in advance,
>>> Wendy