[BioC] easyRNASeq for miRNA read counts
Nicolas Delhomme
nicolas.delhomme at umu.se
Wed Nov 20 14:19:09 CET 2013
Hej Vicky!
I get it now. I’ve had a look at the file and that’s how it looks like:
2L . miRNA_primary_transcript 243035 243141 . - . ID=MI0005821;Alias=MI0005821;Name=dme-mir-965
2L . miRNA 243055 243076 . - . ID=MIMAT0005480;Alias=MIMAT0005480;Name=dme-miR-965-3p;Derives_from=MI0005821
2L . miRNA 243096 243118 . - . ID=MIMAT0020861;Alias=MIMAT0020861;Name=dme-miR-965-5p;Derives_from=MI0005821
What exactly are you interested in, the miRNA only or both miRNA and pri-miRNA? If you want both, it’s going to be more tricky as you will have multiple counting indeed - the pri-miRNA containing both miRNA.
The gff would anyway need reformatting. It should look like:
2L . exon 243055 243076 . - . ID=MIMAT0005480:1;Alias=MIMAT0005480;Name=dme-miR-965-3p;Parent=MI0005821
2L . exon 243096 243118 . - . ID=MIMAT0020861:1;Alias=MIMAT0020861;Name=dme-miR-965-5p;Parent=MI0005821
This can easily be done using the genomeIntervals package:
library(genomeIntervals)
gff <- readGff3(“dmel.gff3”)
## select only miRNA
gff <- gff[gff$type == “miRNA” ]
## change the third column
gff$type=“exon”
## add a “:exon.number” to the ID. exon.number is always 1 here as we do not expect several exon per miRNA
annotation(gff)$gffAttributes <- sub(“;Alias”,”:1;Alias”,annotation(gff)$gffAttributes)
## replace Derives_from with Parent
annotation(gff)$gffAttributes <- sub(“Derives_from”,”Parent”,annotation(gff)$gffAttributes)
## write it out
writeGff3(gff,"dmel-miRNA-for-easyRNASeq.gff3”)
I haven’t tested that code, so there may be typos. But if there are it should not be far off ;-)
Then you could call easyRNASeq with count=exons.
Before that, to test your modified gff file compatibility with easyRNASeq you could do:
easyRNASeq:::.getGffRange(filename="dmel-miRNA-for-easyRNASeq.gff3”) and that should return a RangedData object.
Cheers,
Nico
---------------------------------------------------------------
Nicolas Delhomme
Nathaniel Street Lab
Department of Plant Physiology
Umeå Plant Science Center
Tel: +46 90 786 7989
Email: nicolas.delhomme at plantphys.umu.se
SLU - Umeå universitet
Umeå S-901 87 Sweden
---------------------------------------------------------------
On 20 Nov 2013, at 01:00, Victoria Church <victoriaachurch at gmail.com> wrote:
> Hi,
>
> Thanks for your response. For the input, I have bam files of short reads aligned to the drosophila melanogaster genome (dm3) as well as bam files of short reads aligned to miRbase. Ideally, I would be able to use easyRNASeq to make Count Data Sets for the bamfiles aligned to miRbase with the miRbase annotation file for drosophila, (dme.gff3, found here: ftp://mirbase.smith.man.ac.uk/pub/mirbase/CURRENT/genomes). I was able to use it fine for the files aligned to the genome, but I have not quite figured out if it will be possible to reformat the gff file of miRbase to be recognized by easyRNASeq.
>
> In the case of "transcripts" if you define mono-exonic transcript, does that mean that you will only count each read once?
>
> Thank you,
> Vicky
>
>
> On Mon, Nov 18, 2013 at 6:49 AM, Nicolas Delhomme <delhomme at embl.de> wrote:
> Hej Vicky!
>
> The very short answer is that it depends on the content of your annotation. I can’t really tell you more without knowing more about the kind of data you want to provide as input to easyRNASeq, both for the read alignments and as annotation. In my view of things using “features” would be the more appropriate, but that’s just a name. If you define mono-exonic transcript and use “transcripts”, the results would be the same.
>
> Now, with regards to miRNA, I assume that easyRNASeq could get you a count table, but the seldom time I’ve worked with miRNA, I directly got an occurrence count table as output, e.g. form the UEA workbench. If you would detail ab it more your data processing, I could tell you more accurately if there would be any caveats in using easyRNASeq for that.
>
> Finally, I’ve never - so far - tried a differential analysis approach with miRNA data, and I would think that DESeq, DESeq2, edgeR, etc.. are valid tools for that but maybe others from the list can chime in?
>
> Cheers,
>
> Nico
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> Genome Biology Computational Support
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8310
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
>
>
>
>
>
> On 18 Nov 2013, at 05:12, Vicky Chu [guest] <guest at bioconductor.org> wrote:
>
> >
> > Hi,
> >
> > I am looking to use DESeq to analyze my 18-29 nt small RNA libraries. Is it appropriate to prepare a countDataSet that counts "transcripts" , or possibly "features" in this case? What do you recommend?
> >
> > Thank you!
> >
> > -- output of sessionInfo():
> >
> > sessionInfo()
> > R version 3.0.2 (2013-09-25)
> > Platform: x86_64-apple-darwin10.8.0 (64-bit)
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] parallel stats graphics grDevices utils datasets methods base
> >
> > other attached packages:
> > [1] easyRNASeq_1.8.1 ShortRead_1.20.0 Rsamtools_1.14.1 GenomicRanges_1.14.3
> > [5] DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biostrings_2.30.1
> > [9] XVector_0.2.0 IRanges_1.20.5 edgeR_3.4.0 limma_3.18.3
> > [13] biomaRt_2.18.0 Biobase_2.22.0 genomeIntervals_1.18.0 BiocGenerics_0.8.0
> > [17] intervals_0.14.0
> >
> > loaded via a namespace (and not attached):
> > [1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6 DBI_0.2-7
> > [5] genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 hwriter_1.3
> > [9] latticeExtra_0.6-26 LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1
> > [13] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4
> > [17] tools_3.0.2 XML_3.95-0.2 xtable_1.7-1 zlibbioc_1.8.0
> >>
> >
> > --
> > Sent via the guest posting facility at bioconductor.org.
>
>
More information about the Bioconductor
mailing list