[BioC] [devteam-bioc] Can you give me a sample to calculate the Read Depth with Rsamtools?

Renjie Tan renjie.tan at duke.edu
Tue Jan 22 16:54:55 CET 2013


Hi Martin,

Thank you for your answer. I still have some questions about that.

1, the seqinfo() only result to 86 sequences for a really bam file. So I know the 86 sequence is not the read sequence. What the 86 sequences are? And where are the reads?
2, " cvg <- coverage(bf)" this step really time costly. It needs more than 10 mins for a really 5G bam file with a cluster. If I just want to calculate some parts of region's RD. should us read the whole bam file into RAM? If so how can I parse hundreds of bam samples?
3, the Result's output format is like this? What's the " [ 9  9  8  9  9  8  8  8  7  7  8  8  9 10  7  6  6 ...]"parts mean? Which number is the RD for this region?
     	start  end width
	[1]   100  299   200 [ 9  9  8  9  9  8  8  8  7  7  8  8  9 10  7  6  6 ...]
	[2]  1000 1199   200 [35 36 38 38 39 40 40 39 39 39 40 39 40 43 43 45 44 ...]

4, the output format I expect is look like GATK's form. Such as 
	Target		total_coverage
	1:14467-14587		261
	1:14639-14883		1974
	1:14943-15064		3412
	1:15671-15990		374
	1:16591-16719		0

I appreciate your kindly help.

Thanks,
Renjie


-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
Sent: Monday, January 21, 2013 6:45 PM
To: Renjie Tan
Cc: Maintainer; bioconductor at r-project.org
Subject: Re: [devteam-bioc] Can you give me a sample to calculate the Read Depth with Rsamtools?

Hi Renjie -- it's better to ask these questions on the Bioconductor mailing list, e.g., through the 'guest posting' facility if you don't want to subscribe

   http://bioconductor.org/help/mailing-list/mailform/

That way everyone benefits from both questions and answers. For a reproducible example I ran

   library(Rsamtools)
   example(scanBam)

This creates a variable 'fl' that points to a small bam file. I made an instance of the BamFile class, then discovered the information about chromosomes in the file

   bf <- BamFile(fl)
   si <- seqinfo(bf)

which told me that I have two short sequences

 > si
Seqinfo of length 2
seqnames seqlengths isCircular genome
seq1           1575         NA   <NA>
seq2           1584         NA   <NA>

Suppose some 'regions of interest' were on 'seq1', e.g., windows of width 200, starting at position 100 and 1000.

   roi <- GRanges("seq1", IRanges(c(100, 1000), width=200), seqinfo=si)

I could calculate coverage in my BAM file, just over my regions of interest

   cvg <- coverage(bf)

and then create 'Views' on my regions of interest

   v <- Map(Views, cvg, ranges(split(roi, seqnames(roi))))

 > v
$seq1
Views on a 1575-length Rle subject

views:
     start  end width
[1]   100  299   200 [ 9  9  8  9  9  8  8  8  7  7  8  8  9 10  7  6  6 ...]
[2]  1000 1199   200 [35 36 38 38 39 40 40 39 39 39 40 39 40 43 43 45 44 ...]

$seq2
Views on a 1584-length Rle subject

views: NONE

You could then access elements of this as, e.g., v$seq1[1] so for a playful animation

   for (i in seq_along(v$seq1)) {
      plot(v$seq1[[i]], type="l")
      Sys.sleep(2)
   }

If using the 'devel' (to be 3.0) version of R, coverage(bf) takes an argument
param=ScanBamParam(which=roi) which would just calculate coverage of (reads that
overlap) your regions of interest; this would obviously save a lot of computation if you had just a few regions, or regions on a single chromosome.

Martin

On 01/21/2013 01:29 PM, Maintainer wrote:
> Hi Martin,
>
> I am a new beginner Rsamtools user. I know that's really a powerful 
> tool. Can you give me a sample to calculate the Read Depth with Rsamtools?
>
> My input is a bam file(s) and a series regions like:
>
> Y:2595298-2595418
>
> Y:2601341-2601461
>
> Y:2606003-2606363
>
> ........................
>
> I need to calculate the *Read depth* for every regions above. Not the read count.
>
> Thanks,
>
> Renjie
>
>
>
> ______________________________________________________________________
> __
> devteam-bioc mailing list
> To unsubscribe from this mailing list send a blank email to 
> devteam-bioc-leave at lists.fhcrc.org
> You can also unsubscribe or change your personal options at 
> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
>


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list