[BioC] Rsamtools: sortBam() and 'samtools sort' does not give the same results

Henrik Bengtsson hb at biostat.ucsf.edu
Thu Aug 22 23:13:11 CEST 2013


Hi,

I've got a BAM file 'foo.bam' [425,850,792 bytes] that I'm trying to
sort.  It was generated using BWA.  I fail to sort it with
Rsamtools::sortBam() [Rsamtools v1.12.4] whereas with 'samtools sort'
[samtools v0.1.18 (r982:295)] it works.

BAM FILE:
% samtools view -H foo0.bam | head
@HD     VN:1.3  SO:coordinate
@SQ     SN:1    LN:249250621
@SQ     SN:2    LN:243199373
@SQ     SN:3    LN:198022430
@SQ     SN:4    LN:191154276
@SQ     SN:5    LN:180915260
@SQ     SN:6    LN:171115067
@SQ     SN:7    LN:159138663
@SQ     SN:8    LN:146364022
@SQ     SN:9    LN:141213431

The following:
> Rsamtools::sortBam("foo.bam", "foo.sort.byR")

outputs file 'foo.sort.byR.bam' [425,293,387 bytes], but when trying
to generate an index, both indexBam() and 'samtools index' throws an
error:

> Rsamtools::indexBam("foo.sort.byR.bam")
Error in FUN("foo.sort.byR.bam"[[1L]], ...) : failed to build index
  file: foo.sort.byR.bam
In addition: Warning messages:
1: In FUN("foo.sort.byR.bam"[[1L]], ...) :
  [bam_index_core] the alignment is not sorted: reads without
coordinates prior to reads with coordinates.
2: In FUN("foo.sort.byR.bam"[[1L]], ...) :
  [bam_index_build2] fail to index the BAM file.

and

% samtools index foo.sort.byR.bam
[bam_index_core] the alignment is not sorted: reads without
coordinates prior to reads with coordinates.
[bam_index_build2] fail to index the BAM file.

Trying to call sortBam() multiple times changes nothing.  However, if
I use 'samtools sort' to sort the BAM file, it all works:

% samtools sort foo.bam foo.sort
  [bam_sort_core] merging from 3 files...

outputs file 'foo.sort.bam' [425,254,135 bytes]. With this file both

% samtools index foo.sort.bam

and

> Rsamtools::indexBam("foo.sort.bam")

work.  They both generate identical file 'foo.sort.bam.bai' [8,274,272 bytes].

I've tried also with a toy sample BAM file 'longreads.bam' [1,923,379
bytes] and there I don't get any errors (although Rsamtools and
'samtools sort' do output differently sized *.sort.bam files).

I've never had this issue before. Comments?

/Henrik

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Rsamtools_1.12.4     Biostrings_2.28.0    GenomicRanges_1.12.4
[4] IRanges_1.18.2       BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.0.1   tools_3.0.1    zlibbioc_1.6.0



More information about the Bioconductor mailing list