[BioC] HTSeq-Count

Yuan Hao yuan.x.hao at gmail.com
Fri Sep 5 19:44:43 CEST 2014


Hi Julia,

Depending on the question at hand, there are good reasons either to consider only uniquely mappable reads or to take multiple mappings seriously (for example when studying TEs or working with repetitive genomes). With conventional RNA-Seq data, however, I have usually found that multi-mapped reads make up a substantial share of the data, and I personally believe that including them should give more accurate expression estimates.

Cheers,
Yuan


On Sep 5, 2014, at 11:48 AM, Pickl, Julia <j.pickl at dkfz-heidelberg.de> wrote:

> Hi Yuan,
>  
> thank you very much for your reply.
>  
> Would you say that I should take multi-mapped reads into account, since there are more of them than uniquely mapped reads, or do you think the high count for __alignment_not_unique is not a problem per se?
>  
> I am a beginner to this field, so I would be happy if you could share your experience with me.
>  
> Best wishes
> Julia
>  
> From: Yuan Hao [mailto:yuan.x.hao at gmail.com]
> Sent: Friday, 5 September 2014 17:18
> To: Julia [guest]
> Cc: bioconductor at r-project.org; Pickl, Julia
> Subject: Re: [BioC] HTSeq-Count
>  
> Hi Julia,
> 
> You evidently allowed multiple mappings when running STAR; HTSeq, however, counts only uniquely mapped reads, so the multi-mapped ones end up under “__alignment_not_unique”. About 1.5M reads mapped to intergenic and/or intronic regions and were counted as “__no_feature”. Reads counted as “__ambiguous” mapped to regions annotated with more than one gene.
> 
> If you want to take multi-mapped reads into account, you can distribute them evenly across all the genes they hit (each hit weighted by one over the number of mappings), distribute them in proportion to the uniquely mapped reads, or distribute them in a more sophisticated way with an iterative EM algorithm (as RSEM does).
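
The first option above (an even split, weighted by the number of mappings) can be sketched in a few lines of Python. This is only a toy illustration and not something htseq-count does itself; the function name `fractional_counts`, the input layout (read name mapped to the list of genes hit by its alignments) and the gene names are all made up for the example.

```python
from collections import defaultdict

def fractional_counts(read_to_genes):
    """Give every read a total weight of 1, split evenly over its alignments.

    read_to_genes: dict mapping a read name to the list of genes hit by
    its alignments (one list entry per alignment). Hypothetical layout.
    """
    counts = defaultdict(float)
    for genes in read_to_genes.values():
        if not genes:
            continue  # read hit no annotated gene; nothing to distribute
        share = 1.0 / len(genes)  # normalise by the number of mappings
        for gene in genes:
            counts[gene] += share
    return dict(counts)

# Toy input: one unique read, one read with 2 mappings, one with 4.
toy = {
    "read1": ["geneA"],
    "read2": ["geneA", "geneB"],
    "read3": ["geneB", "geneC", "geneA", "geneD"],
}
result = fractional_counts(toy)
for gene in sorted(result):
    print(gene, result[gene])
```

Each read still contributes exactly one count in total, so the fractional counts sum to the number of reads. Note that the resulting values are no longer integers, which matters for downstream tools that expect raw integer counts.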
> 
> Cheers,
> Yuan
> On Sep 5, 2014, at 10:41 AM, Julia [guest] <guest at bioconductor.org> wrote:
> 
> 
> Hi all,
> I am new to the field of sequencing and performed a RIP-Seq experiment, using htseq-count to count the reads.
> I now get the following (using the union mode; it doesn't look any better with intersection-strict):
> __no_feature  1503377
> __ambiguous  490772
> __too_low_aQual       0
> __not_aligned 0
> __alignment_not_unique       5277314
>             
> When I sum up counts for all genes, I get 3227845.
> 
> The numbers for __no_feature, __ambiguous and __alignment_not_unique look very high.
> 
> Does anybody have an idea why?
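
To put the numbers above in proportion, here is a small Python sketch that adds up the reported categories and prints each one's share. It assumes the categories can simply be summed as if they partitioned one set of records; whether multi-mapped entries are tallied per read or per alignment depends on the HTSeq version, so treat the percentages as rough. The label `assigned_to_genes` is just a name chosen here for the sum over all genes.

```python
# Counts copied from the htseq-count summary quoted above.
counts = {
    "assigned_to_genes": 3227845,   # sum over all genes, as reported
    "__no_feature": 1503377,
    "__ambiguous": 490772,
    "__too_low_aQual": 0,
    "__not_aligned": 0,
    "__alignment_not_unique": 5277314,
}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:<24}{n:>10d}{shares[name]:>8.1%}")
print(f"{'total':<24}{total:>10d}")
```

Under that assumption the total is 10499308, of which roughly half falls under __alignment_not_unique, about 31% is assigned to genes, about 14% is __no_feature and about 5% is __ambiguous.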
> 
> (Additional info: we used random priming, mapped with STAR, and masked the rRNA loci.)
> 
> Best wishes
> Julia
> 
> --
> Sent via the guest posting facility at bioconductor.org.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

