[BioC] Amplicon and exon level read counts and GC content
    Martin Morgan 
    mtmorgan at fhcrc.org
       
    Wed May 30 07:59:56 CEST 2012
    
    
  
On 05/29/2012 10:44 PM, Martin Morgan wrote:
> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote:
>> Hi,
>>
>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon
>> sequencing, is there any package/function which can extract
>> amplicon-level read counts and GC content from an aligned file of BAM
>> format? The same question to exon-level read counts and GC content. More
>> generally, given a genomic interval, how could I extract the read count
>> and GC content for that interval?
>
> The Rsamtools package has scanBam for read input, also
> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps
> for higher-level read counting. The requirement is generally a
> 'GRanges', which can be queried from TxDb packages (e.g.,
> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many
> other sources. There are vignettes in GenomicRanges, GenomicFeatures,
> rtracklayer, and Rsamtools packages; see
>
> http://bioconductor.org/packages/release/BiocViews.html#___Software
More specifically, after
   library(Rsamtools)
   example(scanBam)  # defines 'fl', a path to a bam file
for a _single_ genomic range
   param = ScanBamParam(what="seq",
                        which=GRanges("seq1", IRanges(100, 500)))
   dna = scanBam(fl, param=param)[[1]][["seq"]]
   length(dna)  # 365 reads overlap region
   alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
though you'd likely want to specify several regions (vector arguments to 
GRanges) and think about flags (scanBamFlag() and the flag argument to 
ScanBamParam), read mapping quality, reads overlapping more than one 
region, etc. (summarizeOverlaps implements several counting strategies, 
but it is 'easy' to implement arbitrary approaches).
>
> Martin
>
>>
>> Thanks for any input!
>>
>> Best,
>> Yu Chuan
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
    
    
More information about the Bioconductor
mailing list