[BioC] Rsubread : featureCounts PEReadsReordering=TRUE deletes unpaired reads?

Wei Shi shi at wehi.EDU.AU
Mon Nov 11 03:08:30 CET 2013


Yes, that should work. But the changes we plan to make to featureCounts do not need this separate bam file including unmapped reads, so it should be more efficient.

Wei
 
On Nov 11, 2013, at 12:23 PM, Ryan wrote:

> Ah, I see now. I am using tophat, which outputs the unmapped reads to a separate bam file. I guess I should merge the two bam files and then featureCounts should work as I expect?
> 
> On Sun Nov 10 16:59:32 2013, Wei Shi wrote:
>> Dear Ryan,
>> 
>> featureCounts requires that both reads from the same pair are present in the BAM/SAM file. But this doesn't seem to be the case for your data. It looks like unmapped reads were not included in your BAM files and therefore you got quite a few unpaired reads in your data. When counting in paired-end mode (isPairedEnd=TRUE) and read reordering being required (PEReadsReordering=TREU), featureCounts tries to reorder reads from the same pair to make them be adjacent to each other before performing read counting. If unpaired reads are encountered, featureCounts simply discards them.
>> 
>> Simply sorting your bam files by name is not going to solve this problem because many reads are unpaired in your data. You may use an aligner that outputs all the reads in it mapping output (such as the align function in Rsubread) and then runs featureCounts on it.
>> 
>> In the meantime, we are now trying to improve featureCounts to deal with this simulation. Will let you know when this is done.
>> 
>> Also note that this problem only occurs for paired-end data, but not for single-end read data.
>> 
>> Best regards,
>> Wei
>> 
>> On Nov 9, 2013, at 10:54 AM, Ryan C. Thompson wrote:
>> 
>>> Hello,
>>> 
>>> I am using the featureCounts function in Rsubread to count RNA-seq reads. Since my BAM files are coordinate-sorted paired-end, I am using "isPairedEnd=TRUE" and "PEReadsReordering=TRUE". When I run my script, I see the following in the output:
>>> 
>>> Process /gpfs/home/rcthomps/Projects/cyno/tophat_results/R77_CN7314_P4 ...
>>>   Resort the input file ...
>>>   2718714 unpaired reads were ignored in reordering.
>>>   Total number of fragments is : 1653582
>>>   Number of successfully assigned fragments is : 1529491 (92.5%)
>>>   Running time : 3.39 minutes
>>> 
>>> In particular, I am worried about the line "2718714 unpaired reads were ignored in reordering." The documentation for PEReadsReordering doesn't say anything about discarding unpaired reads, so I'm wondering whether this is intended behavior, and whether there is a way around this? If I sort my bam files by name so that pairs are adjacent in the file, will subread still discard the unpaired ones, or will it count all of them?
>>> 
>>> -Ryan Thompson
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the addressee.
>> You must not disclose, forward, print or use it without the permission of the sender.
>> ______________________________________________________________________

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}



More information about the Bioconductor mailing list