[BioC] possible bug in extendReads() in package chipseq
Martin Morgan
mtmorgan at fhcrc.org
Sat Nov 21 17:54:04 CET 2009
Hi Nora --
Nora Rieber <n.rieber at dkfz-heidelberg.de> writes:
> Dear all,
>
> I'm using the ShortRead package to read in my MAQ alignment output files
> with readAligned. Then I extract a subset of reads mapping to one
> chromosome:
>
> aln_Pat_chr<-aln_Pat[chromosome(aln_Pat)==chromosomeString]
>
> and extend the reads with extendReads:
>
> aln_Pat_chr_extended<-extendReads(aln_Pat_chr, seqLen=350)
>
> However, extendReads seems to misplace the reads. Here's the output of
> aln_Pat_chr:
>
>>aln_Pat_chr
> class: AlignedRead
> length: 1025425 reads; width: 36 cycles
> chromosome: gi|51511732|ref|NC_000016.8|NC_000016
> gi|51511732|ref|NC_000016.8|NC_000016 ...
> gi|51511732|ref|NC_000016.8|NC_000016 gi|51511732|ref|NC_000016.8|NC_000016
> position: 553 553 ... 88698835 88762373
> strand: - - ... + -
> alignQuality: IntegerQuality
> alignData varLabels: nMismatchBestHit mismatchQuality nExactMatch24
> nOneMismatch24
>
> and the output of aln_Pat_chr_extended:
>
>> aln_Pat_chr_extended
> $`gi|51511732|ref|NC_000016.8|NC_000016`
> IRanges of length 1025426
> start end width
> [1] 553 902 350
> [2] 239 588 350
> [3] 239 588 350
> [4] 239 588 350
> [5] 239 588 350
> [6] 239 588 350
> [7] 239 588 350
> [8] 239 588 350
> [9] 239 588 350
> ... ... ... ...
> [1025418] 239 588 350
> [1025419] 239 588 350
> [1025420] 239 588 350
> [1025421] 239 588 350
> [1025422] 239 588 350
> [1025423] 239 588 350
> [1025424] 239 588 350
> [1025425] 239 588 350
> [1025426] 239 588 350
>
> While the highest mapped position is 88762373 before extension, it is
> 902 after extension. This happens with any dataset I use the function
> with. Does anybody observe the same thing?
>
> I tried to recreate this using the maq alignment data coming with the
> ShortRead package, but using extendReads on it throws an error:
>
>>testaln<-readAligned("~/R/2.10/ShortRead/extdata/maq","out.aln.2.txt",type="MAQMapview")
>>readsextended<-extendReads(testaln,seqLen=40)
> Error in s1[[ineg]] : attempt to select less than one element.
>
> Am I doing something wrong?
There do seem to be bugs in extendReads applied to AligendRead
objects. A work around (the package is being revised; look for version
0.2.1 in the next few days) is
extendAlignedRead <-
function(reads, seqLen=200, strand=c("+", "-"))
{
rng <- IRanges(ifelse(strand(reads) == "+",
position(reads),
position(reads) + width(reads) - seqLen),
ifelse(strand(reads) == "+",
position(reads) + seqLen - 1L,
position(reads) + width(reads) - 1L))
strandIdx <- strand(reads) %in% strand
split(rng[strandIdx], chromosome(reads)[strandIdx])
}
Martin
> This is my session info:
>
> sessionInfo()
> R version 2.10.0 (2009-10-26)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8
> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] chipseq_0.2.0 ShortRead_1.4.0 lattice_0.17-26 BSgenome_1.14.1
> [5] Biostrings_2.14.5 IRanges_1.4.6
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.0 grid_2.10.0 hwriter_1.1 tools_2.10.0
>
> Best,
> Nora
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list