[BioC] [Bioc-sig-seq] first remove low quality or adaptor

Hervé Pagès hpages at fhcrc.org
Tue Sep 27 02:06:17 CEST 2011


Thanks for sending a reproducible example!

This bug should be fixed in Biostrings release (2.20.4) and devel
(2.21.10). The new versions of the package will normally propagate
to the public repositories and become available via biocLite() in
the next 24-36 hours.

Cheers,
H.


On 11-09-23 11:35 AM, wang peter wrote:
>
> dear herve:
>          i send you the coding for the problem
> rm(list=ls())
> fastqfile="query.fastq" #downloaded from
> http://biocluster.ucr.edu/~tbackman/query.fastq
> ####################################
> ##Trimming Low Quality nucleotides##
> ####################################
> library(ShortRead)
> reads <- readFastq(fastqfile);
> seqs <- sread(reads);
> #the low quality positions below the qualityCutoff will be replaced by "N"
> qualityCutoff <- 20
> qual <- PhredQuality(quality(quality(reads))) # get quality score list
> as PhredQuality
> myqual_mat <- matrix(charToRaw(as.character(unlist(qual))),
> nrow=length(qual), byrow=TRUE) # convert quality score to matrix
> at <- myqual_mat <
> charToRaw(as.character(PhredQuality(as.integer(qualityCutoff))))
> letter_subject <- DNAString(paste(rep.int <http://rep.int>("N",
> width(seqs)[1]), collapse=""))
> letter <- as(Views(letter_subject, start=1, end=rowSums(at)),
> "DNAStringSet")
> injectedseqs <- replaceLetterAt(seqs, at, letter)
> #remove "N" on 5' and 3' end, see what is left in the middle of reads
> seqsWithoutNend <-trimLRPatterns(Rpattern = letter_subject, Lpattern =
> letter_subject,subject = injectedseqs)
> nCount<-alphabetFrequency(seqsWithoutNend)[,"N"]
> nDist<- table(nCount)
> #filter all the reads with more than 2 "N" in the middle
> nCutoff=2
> middleN <- nCount < nCutoff
> reads<-reads[middleN]
> trimmedCoords<-trimLRPatterns(Rpattern = letter_subject, Lpattern =
> letter_subject,subject = injectedseqs,ranges=T)
> trimmedCoords<-trimmedCoords[middleN]
> trimmedReads <- narrow(reads, start=start(trimmedCoords),
> end=end(trimmedCoords))
> PCR2rc<-DNAString("AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA")#this
> should be clipped
> clippedCoords <- trimLRPatterns(Rpattern = PCR2rc, subject =
> sread(trimmedReads), max.Rmismatch=0.1, with.Rindels=T,ranges=T)
> clippedReads <- narrow(trimmedReads, start=start(clippedCoords),
> end=end(clippedCoords))
> that is the result:
>   in solveUserSEW(width(x), start = start, end = end, width = width) :
>    solving row 686: 'allow.nonnarrowing' is FALSE and the supplied start
> (0) is < 1


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list