[BioC] Filtering out tags with low counts in edgeR results in fewer differentially expressed genes.....

Gordon K Smyth smyth at wehi.EDU.AU
Wed Feb 6 02:07:47 CET 2013


Dear Mark,

Getting few DE genes is to be expected.  When you filter in this way, you 
are saying that you don't want to see genes for which differential 
expression is driven by only a subset of individuals in the up-group.  If 
you are happy to see genes that are DE but are only expressed in (say) 
half of the samples in the up-group, then use

   rowSums(cpm(y)>1) >= 6.

Best wishes
Gordon

> Date: Mon, 4 Feb 2013 18:27:33 -0500
> From: Mark Christie <christim at science.oregonstate.edu>
> To: bioconductor at r-project.org
> Subject: [BioC] Filtering out tags with low counts in edgeR results in
> 	fewer differentially expressed genes.....
>
> Dear edgeR users,
>
> Contrary to expectation, when I filter out tags with with low counts in
> edgeR I get fewer genes that are differentially expressed.  More
> importantly, the MDS plot shows no differences between groups after tags
> have been filtered, versus substantial differences between groups when no
> filtering was conducted.  RNA-Seq experiment details: 24 biological
> replicates averaging 10 million mapped reads per replicate. 12 samples
> belong to group one and 12 samples belong to group two.  Individuals from
> both groups were reared in common environments, but had different genetic
> backgrounds such that the differences between groups may be subtle (or at
> least more subtle than typical gene expression studies with a strict
> control and treatment).
>
> When filtering, I used the recommended approach:
>
> keep <- rowSums(cpm(y)>1) >= 12         # where 12 is the minimum group size
> y <- y[keep,]
> dim(y)
>
> The above filtering reduces the number of tags from 90,000 to ~ 50,000.
> After filtering a total of 29 genes are differentially expressed (adjusted
> p-value <= 0.05), versus 97 when the count data are not filtered.  Could
> this result simply be because there are actual biological differences
> between the two groups that result in many individuals of one group having
> cpm < 1.  Or could this somehow be an artifact of how I ran edgeR?  On
> perhaps a related note, should I alter my code because I have a fairly
> large number of biological replicates.  Below is the entire code I am
> running.  Many thanks for your advice!
>
>
>
>
> dat<- read.table("GeneCounts_males.txt", header=TRUE, sep="\t",
> na.strings="?",dec=".", strip.white=TRUE,row.names=1)
> group=factor(c("group1","group1","group1","group1","group1","group1","group1","group1","group1","group1","group1","group1","group2","group2","group2","group2","group2","group2","group2","group2","group2","group2","group2","group2"))
>
> y <- DGEList(counts=dat,group=group)
> colnames(y)
> dim(y)        #total number of unique tags
> y$samples     #number of tags ("genes") per sample
>
>
> #filter (optional?)
> keep <- rowSums(cpm(y)>1) >= 12
> y <- y[keep,]
> dim(y)
>
> #recompute library sizes
> y$samples$lib.size <- colSums(y$counts)
> y$samples
>
> #normalize
> y <- calcNormFactors(y)
> y$samples
>
> #plot MDS of treatments
> plotMDS(y)
>
>
> #estimate dispersion
> y <- estimateCommonDisp(y, verbose=TRUE)
> y <- estimateTagwiseDisp(y)                      #uses empirical Bayes
> method, if prior null then df=20
> #y <- estimateTagwiseDisp(y,prior.df=5)          #I am not sure if this is
> the correct implementaion, put more weight on tagwise vs. common
> y
> plotBCV(y)
>
> #test for DE
> et <- exactTest(y)
> top <- topTags(et)
> top
> cpm(y)[rownames(top), ]
>
> #total number of genes at 5% FDR
> summary(de <- decideTestsDGE(et))                   #-1 equals upregulate,
> 1 equals down regulated:  need to be carefuul though as order (control,vs
> treatment matters)
> top <- topTags(et,n=40) #take top n genes
> write.table(top,file="DEgenesMales2.txt",col.names=TRUE,sep="\t",append=FALSE)
> #move column headers one column to the right (inapproporiate format due to
> as.matrix)
>
> detags <- rownames(y)[as.logical(de)]
> plotSmear(et, de.tags=detags)
> abline(h=c(-1, 1), col="blue")
>
>
>
>
>
>
>
> -- 
> Mark R. Christie, PhD
> Department of Zoology,
> Oregon State University
> http://www.science.oregonstate.edu/~christim/
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list