[BioC] Interspecies differential expression of orthologs with Edger
assaf www
assafwww at gmail.com
Wed Aug 27 17:14:22 CEST 2014
This is very helpful for me, thanks.
A slight change that I made in the code you sent, to avoid some R erros, is
# to replace:
offset = offset + gl
# with:
offset = sweep(gl,2,offset,"+")
In addition to differential expression tests, I wanted also to extract
FPKMs values (and/or normalized CPM values), that would take into account
all components of the offset (which if I'm not mistaken are: library_size +
TMM + gene_size).
I assume rpkm()/cpm() should correct only for library_size + TMM.
Is there a possibly "decent" solution for that ?
all the best, and thanks,
Assaf
On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> It works something like this:
>
> library(edgeR)
> y <- DGEList(counts=counts)
> y <- calcNormFactors(y)
>
> # Column correct log gene lengths
> # Columns of gl should add to zero
>
> gl <- log(geneLength)
> gl <- t(t(gl)-colMeans(gl))
>
> # Combine library sizes, norm factors and gene lengths:
>
> offset <- expandAsMatrix(getOffset(y))
> offset <- offset + gl
>
> Then
>
> y$offset <- offset
> y <- estimateGLMCommonDisp(y,design)
>
> etc.
>
> Note that I have not tried this myself. It should work in principle from
> a differential expression point of view.
>
> On the other hand, there may be side effects regarding dispersion trend
> estimation -- I do not have time to explore this.
>
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> http://www.statsci.org/smyth
>
> On Wed, 27 Aug 2014, assaf www wrote:
>
> Probably wrong, but the reason I thought of using quantile normalization
>> is
>> the need to correct both for the species-length, and library size.
>>
>>
>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au>
>> wrote:
>>
>> That doesn't look helpful to me. I suggested that you incorporate gene
>>> lengths into the offsets, not do quantile normalization of cpms.
>>>
>>> Sorry, I just don't have time to develop a code example for you. I hope
>>> someone else will help.
>>>
>>> The whole topic of interspecies differential expression is a can of
>>> worms.
>>> Even if you adjust for gene length, there will still be differences in gc
>>> content and mappability between the species.
>>>
>>> Gordon
>>>
>>>
>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>
>>> Dear Gordon thanks,
>>>
>>>>
>>>> Suppose I start with the following matrices:
>>>>
>>>> # 'counts' is the Rsem filtered counts
>>>>
>>>> counts[1:4,]
>>>>>
>>>>> h0 h1 h2 n0 n1 n2
>>>> ENSRNOG00000000021 36 17 20 10 25 38
>>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>>> ENSRNOG00000000028 26 12 11 18 23 28
>>>> ENSRNOG00000000029 22 13 12 16 17 15
>>>>
>>>> # 'geneLength' is the species-specific gene lengths, for species 'h' and
>>>> 'n':
>>>>
>>>> geneLength[1:3,]
>>>>>
>>>>> h0.length h1.length h2.length n0.length n1.length
>>>> n2.length
>>>> ENSRNOG00000000021 1200 1200 1200 1303 1303
>>>> 1303
>>>> ENSRNOG00000000024 1050 1050 1050 3080 3080
>>>> 3080
>>>> ENSRNOG00000000028 1047 1047 1047 1121 1121
>>>> 1121
>>>>
>>>>
>>>> does the following code look correct, and may allow allows across
>>>> species
>>>> analysis ?:
>>>> (technically it works, I checked)
>>>>
>>>> # quantile normalization: (as in here:
>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>>> digital+gene+expression
>>>> )
>>>>
>>>> x1 = counts*1000/geneLength
>>>> library(limma)
>>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>>> offset = log(counts+0.1)-log(x2+0.1)
>>>>
>>>> ...
>>>>
>>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>>> fit <- glmFit(y,design,offset=offset)
>>>>
>>>> ...
>>>>
>>>>
>>>> Thanks in advance for any help..,
>>>> Assaf
>>>>
>>>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:10}}
More information about the Bioconductor
mailing list