[BioC] report a possible bug of

Martin Morgan mtmorgan at fhcrc.org
Sat Mar 3 14:28:58 CET 2012


On 02/28/2012 08:27 AM, Martin Morgan wrote:
> On 02/28/2012 08:20 AM, wang peter wrote:
>> Hi all, I used ShortRead to trim some sequences:
>>
>> max.mismatchs<- 0.25*1:nchar(DNAString(PCR2rc))
>
> using 1:nchar(...) can be bad, e.g., when nchar() returns 0, because
> 1:0 counts down instead of being empty
>
>  > 1:0
> [1] 1 0
>
> use seq_len instead, so
>
> 0.25 * seq_len(nchar(DNAString(PCR2rc)))
>
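A small base-R sketch of the difference described above. The variable `n` is a hypothetical stand-in for `nchar(DNAString(PCR2rc))` and does not come from the original post.

```r
# n is a hypothetical stand-in for nchar(DNAString(PCR2rc))
n <- 0L

# 1:0 counts down (1, 0), so this yields two thresholds instead of none
bad <- 0.25 * 1:n

# seq_len(0) is integer(0), so the result is empty, as intended
good <- 0.25 * seq_len(n)

# For a positive length the two forms give the same values
same <- identical(0.25 * 1:4, 0.25 * seq_len(4))
```

The two forms only differ at length zero, which is why the problem tends to go unnoticed until an empty pattern turns up.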
>> trimmedCoords<- trimLRPatterns(Rpattern = PCR2rc, subject =
>> sread(highQuaReads), max.Rmismatch= max.mismatchs,
>> with.Rindels=TRUE, ranges=TRUE)
>> trimmedReads<- narrow(highQuaReads, start=start(trimmedCoords),
>> end=end(trimmedCoords))
>>
>> but some empty lines appear in the output; this is the first time I
>> have seen this happen
>>
>>
>> @HWI-ST132:506:D0CNUABXX:3:1101:4456:1871 1:N:0:TTCACA
>>
>> +
>>
>> @HWI-ST132:506:D0CNUABXX:3:1101:4331:1931 1:N:0:TTCACA
>> GGTGGCTGTAGTTTAGTGGTAAGAATTCTACG
>> +
>> ?<?DDD?D:<CDFFFG<FCFACGGIII@AEEH
>
> How can I reproduce this?

I took the bug to be that 0-width reads could be written but not read:

 > fl = tempfile()
 > xx = ShortReadQ(DNAStringSet(""), FastqQuality(""))
 > writeFastq(xx, fl)
 > cat(paste(readLines(fl), collapse="\n"), "\n")
@

+

 > readFastq(fl)
Error: Input/Output
   file(s):
     /tmp/RtmpdORPSm/file2780493c
   message: unexpected empty line /tmp/RtmpdORPSm/file2780493c:1

This has been fixed in devel, version 1.13.14.

If you want to remove zero-width records before writing, then subset as

   xx[width(xx) != 0]
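
As a rough sketch of that subsetting step in context, assuming ShortRead is installed: the sequences, qualities and ids below are made up for illustration, and `reads`, `nonEmpty` and `back` are hypothetical names.

```r
library(ShortRead)

# Two illustrative records (made-up data): one zero-width, one ordinary
reads <- ShortReadQ(
    DNAStringSet(c("", "GGTGGC")),
    FastqQuality(c("", "?<?DDD")),
    BStringSet(c("empty", "kept"))
)

# Drop zero-width records before writing
nonEmpty <- reads[width(reads) != 0]

# Write the remaining records and read them back
fl <- tempfile()
writeFastq(nonEmpty, fl)
back <- readFastq(fl)
```

Filtering before writeFastq() keeps the file readable by versions without the fix.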

If the problem is with trimLRPatterns then please ask the question again 
with a simpler illustration.

Martin

>
>
>>
>>
>>> sessionInfo()
>> R version 2.14.1 (2011-12-22)
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>>
>>
>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


