[BioC] [devteam-bioc] Can you give me a sample to calculate the Read Depth with Rsamtools?
Renjie Tan
renjie.tan at duke.edu
Tue Jan 22 16:54:55 CET 2013
Hi Martin,
Thank you for your answer. I still have some questions about that.
1, the seqinfo() only result to 86 sequences for a really bam file. So I know the 86 sequence is not the read sequence. What the 86 sequences are? And where are the reads?
2, " cvg <- coverage(bf)" this step really time costly. It needs more than 10 mins for a really 5G bam file with a cluster. If I just want to calculate some parts of region's RD. should us read the whole bam file into RAM? If so how can I parse hundreds of bam samples?
3, the Result's output format is like this? What's the " [ 9 9 8 9 9 8 8 8 7 7 8 8 9 10 7 6 6 ...]"parts mean? Which number is the RD for this region?
start end width
[1] 100 299 200 [ 9 9 8 9 9 8 8 8 7 7 8 8 9 10 7 6 6 ...]
[2] 1000 1199 200 [35 36 38 38 39 40 40 39 39 39 40 39 40 43 43 45 44 ...]
4, the output format I expect is look like GATK's form. Such as
Target total_coverage
1:14467-14587 261
1:14639-14883 1974
1:14943-15064 3412
1:15671-15990 374
1:16591-16719 0
I appreciate your kindly help.
Thanks,
Renjie
-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
Sent: Monday, January 21, 2013 6:45 PM
To: Renjie Tan
Cc: Maintainer; bioconductor at r-project.org
Subject: Re: [devteam-bioc] Can you give me a sample to calculate the Read Depth with Rsamtools?
Hi Renjie -- it's better to ask these questions on the Bioconductor mailing list, e.g., through the 'guest posting' facility if you don't want to subscribe
http://bioconductor.org/help/mailing-list/mailform/
That way everyone benefits from both questions and answers. For a reproducible example I ran
library(Rsamtools)
example(scanBam)
This creates a variable 'fl' that points to a small bam file. I made an instance of the BamFile class, then discovered the information about chromosomes in the file
bf <- BamFile(fl)
si <- seqinfo(bf)
which told me that I have two short sequences
> si
Seqinfo of length 2
seqnames seqlengths isCircular genome
seq1 1575 NA <NA>
seq2 1584 NA <NA>
Suppose some 'regions of interest' were on 'seq1', e.g., windows of width 200, starting at position 100 and 1000.
roi <- GRanges("seq1", IRanges(c(100, 1000), width=200), seqinfo=si)
I could calculate coverage in my BAM file, just over my regions of interest
cvg <- coverage(bf)
and then create 'Views' on my regions of interest
v <- Map(Views, cvg, ranges(split(roi, seqnames(roi))))
> v
$seq1
Views on a 1575-length Rle subject
views:
start end width
[1] 100 299 200 [ 9 9 8 9 9 8 8 8 7 7 8 8 9 10 7 6 6 ...]
[2] 1000 1199 200 [35 36 38 38 39 40 40 39 39 39 40 39 40 43 43 45 44 ...]
$seq2
Views on a 1584-length Rle subject
views: NONE
You could then access elements of this as, e.g., v$seq1[1] so for a playful animation
for (i in seq_along(v$seq1)) {
plot(v$seq1[[i]], type="l")
Sys.sleep(2)
}
If using the 'devel' (to be 3.0) version of R, coverage(bf) takes an argument
param=ScanBamParam(which=roi) which would just calculate coverage of (reads that
overlap) your regions of interest; this would obviously save a lot of computation if you had just a few regions, or regions on a single chromosome.
Martin
On 01/21/2013 01:29 PM, Maintainer wrote:
> Hi Martin,
>
> I am a new beginner Rsamtools user. I know that's really a powerful
> tool. Can you give me a sample to calculate the Read Depth with Rsamtools?
>
> My input is a bam file(s) and a series regions like:
>
> Y:2595298-2595418
>
> Y:2601341-2601461
>
> Y:2606003-2606363
>
> ........................
>
> I need to calculate the *Read depth* for every regions above. Not the read count.
>
> Thanks,
>
> Renjie
>
>
>
> ______________________________________________________________________
> __
> devteam-bioc mailing list
> To unsubscribe from this mailing list send a blank email to
> devteam-bioc-leave at lists.fhcrc.org
> You can also unsubscribe or change your personal options at
> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list