[BioC] Transformation of Grange object to density per bin
Martin Morgan
mtmorgan at fhcrc.org
Tue Jan 22 00:23:11 CET 2013
On 01/20/2013 12:45 PM, Hermann Norpois wrote:
> I am sorry, but I still have a problem. With your syntax I get a vector
> that is going over all chromosomes. How do I get , for instance separate
> vectors for each chromosomes?
I wrote Dario's original solution as:
Get seqlengths (probably these should have been added to testset.gr when it was
created, e.g., coverage(BamFile("myBam"))
library(BSgenome.Hsapiens.UCSC.hg19)
seqlengths(testset.gr) <- seqlengths(Hsapiens)[seqlevels(testset.gr)]
Calculate coverage
Gcoverage <- coverage(testset.gr)
Here's Dario's function, revised to take a width argument and for the first
argument, 'x', to be the coverage of a single chromosome
FUN <- function(x, y, width=200) {
starts <- seq(1, y, width)
mean(Views(x, start = starts, width=width), na.rm = TRUE)
}
Then the calculation over all chromosomes
coverageMeans <-
unlist(Map(FUN, Gcoverage, seqlengths(testset.gr)), use.names=FALSE)
And then to get separate vectors for each chromosome, it's just a matter of not
unlisting
binsByChr <- Map(FUN, Gcoverage, seqlengths(testset.gr))
with, e.g.,
binsByChr$chr1
But maybe I'm not understanding what you're after?
Martin
> Thanks
>
> 2013/1/17 Hermann Norpois <hnorpois at googlemail.com>
>
>> Hello,
>>
>> I would like to transform a GRange-object into something with density
>> information related of bins, e.g. 1 bin=200bps. So it starts at poistion 1
>> etc ...
>> That means (according to testset.gr):
>> bin 1 0
>> bin 2 0
>> ...
>> bin 50 0
>> bin 51 0.28 #10000-10200 => (200-144)/200
>> bin 52 0.765 #10200-10400 => 153/200
>> ...
>> etc
>> according to testset.gr (see below).
>>
>> How does it work? Is there a special function?
>> Thanks
>> Hermann
>>
>>
>>> testset.gr
>> GRanges with 20 ranges and 0 metadata columns:
>> seqnames ranges strand
>> <Rle> <IRanges> <Rle>
>> [1] chr1 [ 10144, 10353] *
>> [2] chr1 [ 10441, 10463] *
>> [3] chr1 [235633, 235766] *
>> [4] chr1 [237717, 237895] *
>> [5] chr1 [521444, 521624] *
>> [6] chr1 [540609, 540786] *
>> [7] chr1 [564495, 564672] *
>> [8] chr1 [565254, 566081] *
>> [9] chr1 [566537, 567272] *
>> ... ... ... ...
>> [12] chr1 [569610, 570190] *
>> [13] chr1 [601057, 601196] *
>> [14] chr1 [713227, 713475] *
>> [15] chr1 [752533, 752578] *
>> [16] chr1 [752654, 752695] *
>> [17] chr1 [754405, 754539] *
>> [18] chr1 [755336, 755918] *
>> [19] chr1 [756027, 756458] *
>> [20] chr1 [756793, 756843] *
>> ---
>> seqlengths:
>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX
>> chrY
>> NA NA NA NA NA NA ... NA NA NA NA
>> NA NA
>>> dput (testset.gr)
>> new("GRanges"
>> , seqnames = new("Rle"
>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12",
>> "chr13",
>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2",
>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7",
>> "chr8", "chr9", "chrX", "chrY"), class = "factor")
>> , lengths = 20L
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , ranges = new("IRanges"
>> , start = c(10144L, 10441L, 235633L, 237717L, 521444L, 540609L,
>> 564495L,
>> 565254L, 566537L, 567450L, 568038L, 569610L, 601057L, 713227L,
>> 752533L, 752654L, 754405L, 755336L, 756027L, 756793L)
>> , width = c(210L, 23L, 134L, 179L, 181L, 178L, 178L, 828L, 736L, 453L,
>> 545L, 581L, 140L, 249L, 46L, 42L, 135L, 583L, 432L, 51L)
>> , NAMES = NULL
>> , elementType = "integer"
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , strand = new("Rle"
>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor")
>> , lengths = 20L
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , elementMetadata = new("DataFrame"
>> , rownames = NULL
>> , nrows = 20L
>> , listData = structure(list(), .Names = character(0))
>> , elementType = "ANY"
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , seqinfo = new("Seqinfo"
>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14",
>> "chr15",
>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21",
>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9",
>> "chrX", "chrY")
>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_)
>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA)
>> , genome = c(NA_character_, NA_character_, NA_character_,
>> NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
>> )
>> )
>> , metadata = list()
>> )
>>>
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list