[BioC] Efficiently running DEXSeq for Large Cohorts
Alejandro Reyes
alejandro.reyes at embl.de
Mon Jan 14 10:44:39 CET 2013
Sorry, there was an error in part of my code. Each step of the "TRT"
analysis should operate on the object returned by the previous step, like this:
data(pasillaExons, package="pasilla")
pasillaExonsTRT <- estimateSizeFactors( pasillaExons )
pasillaExonsTRT <- estimateDispersionsTRT( pasillaExonsTRT )
pasillaExonsTRT <- fitDispersionFunction( pasillaExonsTRT )
pasillaExonsTRT <- testForDEUTRT( pasillaExonsTRT )
Bests,
Alejandro
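
A minimal base-R sketch of the "this"/"others" idea described in the quoted
message below, assuming made-up counts for a single gene. The helper
collapseForExon is hypothetical and is not part of DEXSeq; it only
illustrates how five exon bins reduce to two rows when E001 is the exon
under test.

```r
# Toy counts for one hypothetical gene: 5 exon bins (rows) x 4 samples (columns).
# The numbers are made up purely for illustration.
counts <- matrix(
  c(10, 12,  8, 11,
    20, 18, 25, 22,
     5,  7,  6,  4,
    30, 28, 35, 33,
    15, 14, 16, 17),
  nrow = 5, byrow = TRUE,
  dimnames = list(sprintf("E%03d", 1:5), paste0("sample", 1:4))
)

# Hypothetical helper (NOT a DEXSeq function): collapse the gene's counts into
# two rows for the exon under test, "this" and "others".
collapseForExon <- function(counts, exon) {
  stopifnot(exon %in% rownames(counts))
  rbind(
    this   = counts[exon, ],
    others = colSums(counts[rownames(counts) != exon, , drop = FALSE])
  )
}

collapsed <- collapseForExon(counts, "E001")
print(collapsed)

# Rows per sample entering the fit: one per exon bin versus two.
cat("rows per sample, all exon bins:", nrow(counts), "\n")
cat("rows per sample, this/others:  ", nrow(collapsed), "\n")
```

The total count per sample is unchanged by the collapse; only the number of
rows (and therefore the size of the model) shrinks.
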
> Dear Fong Chun Chan,
>
> Thank you for your interest in DEXSeq, and sorry in advance for the
> long e-mail. We have also noticed that the computing time increases
> considerably when you have a large number of samples, conditions, or
> exons per gene. For users in these situations, we have implemented
> variants of estimateDispersions and testForDEU (estimateDispersionsTRT
> and testForDEUTRT) in the most recent versions of DEXSeq in the svn.
>
> The difference lies in how the model matrix is prepared. In the
> "normal" functions, the model matrices used to fit the GLMs are
> prepared for each exon, such that each exon bin is treated
> individually, independently of which exon you are testing. For
> example, if you have a gene with 5 exons, when testing exon E001
> you would consider E002, E003, ..., E005 individually in the model.
>
> In the "TRT" implementation, the same model matrix is used for all the
> exons. In the same example as before, you would consider E001 and the
> sum of all the remaining exons of the same gene. This reduces the model
> and allows DEXSeq to be used with a large number of samples. To make
> this clearer, you could compare the normal model frame of a gene with
> the TRT model frame:
>
> data(pasillaExons, package="pasilla")
> modelFrameForGene(pasillaExons, "FBgn0000256")
> # vs
> modelFrameForTRT( pasillaExons )
>
> Using the same example, in the second (TRT) model frame, "this" would
> be E001 and "others" would be the sum of E002 + E003 + ... + E005.
>
> This would be the "normal" DEXSeq analysis:
>
> pasillaExons <- estimateSizeFactors( pasillaExons )
> pasillaExons <- estimateDispersions( pasillaExons )
> pasillaExons <- fitDispersionFunction( pasillaExons )
> pasillaExons <- testForDEU( pasillaExons )
>
> This would be the "TRT",
>
> pasillaExonsTRT <- estimateSizeFactors( pasillaExons )
> pasillaExonsTRT <- estimateDispersionsTRT( pasillaExons )
> pasillaExonsTRT <- fitDispersionFunction( pasillaExons )
> pasillaExonsTRT <- testForDEUTRT( pasillaExons )
>
> And you can see that you get the same results:
>
> plot(fData(pasillaExons)$pvalue, fData(pasillaExonsTRT)$pvalue, log="xy")
>
> I have tried the "TRT" functions on large cohorts with complex models,
> and they work nicely and in reasonable computing times.
>
> Best regards,
> Alejandro Reyes
>
> P.S. These changes still need to be added to the vignette.
>
>
>> Hi all,
>>
>> I've been trying to get DEXSeq to run on a fairly large RNA-seq cohort
>> that I have. To be specific, I have 89 samples and I am attempting to
>> generate DE exon usage results on more than 500,000 exons.
>>
>> I've followed the latest tutorial (1.5.6) on Bioconductor and so far
>> I've had relatively few problems. It's just that the two steps that are
>> mentioned, estimateDispersions and testForDEU, are taking a fairly long
>> time. I've already attempted to parallelize this on a 48-core 256GB
>> machine, but I get very little improvement in the run time of these
>> functions.
>>
>> I was just wondering if anyone has a good way of running DEXSeq on such
>> a large cohort. Any tips on how to reduce run time? Is there a way to
>> parallelize these jobs across a cluster rather than rely on a single
>> multi-core machine? Any help would be greatly appreciated.
>>
>> Thanks,
>>
>> Fong
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>