[BioC] stranded intronic variants with VariantAnnotation::locateVariants()
Valerie Obenchain
vobencha at fhcrc.org
Wed Nov 6 01:15:23 CET 2013
This is implemented in version 1.9.7. locateVariants() now returns the strand
of the subject that was hit, except for IntergenicVariants().
The intergenic case returns multiple preceding and following gene IDs. When
'ignore.strand=TRUE', genes on both strands are searched and the result
can be a mixture of '+' and '-', so in this case the returned strand is
'*'. When 'ignore.strand=FALSE', only genes on the same strand as the
'query' are searched, so the returned strand matches the query.
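A minimal sketch for checking the new behaviour. It assumes VariantAnnotation 1.9.7 or later and a current Bioconductor install in which rowRanges() returns the variant GRanges. The 2013 snippet quoted below uses rowData(), which in current releases returns the metadata columns instead of the ranges. The object names `intr` and `inter` are illustrative only:

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

## Example chr22 VCF shipped with VariantAnnotation (same data as the vignette)
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")
## Use the UCSC-style chromosome name expected by the TxDb
seqlevels(vcf) <- "chr22"
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## Variant positions as GRanges (assumption: rowRanges(); older releases used rowData())
rd <- rowRanges(vcf)

## Intronic hits should now carry the strand of the transcript that was hit
intr <- locateVariants(rd, txdb, IntronVariants())
table(as.character(strand(intr)))

## Intergenic hits: with ignore.strand = TRUE, genes on both strands are
## searched, so the reported strand is expected to be "*"
inter <- locateVariants(rd, txdb, IntergenicVariants(), ignore.strand = TRUE)
table(as.character(strand(inter)))
```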
Valerie
On 10/18/2013 02:41 PM, Robert Castelo wrote:
> Great! thanks a lot Valerie!!
>
> robert.
>
> On 10/18/13 10:19 PM, Valerie Obenchain wrote:
>> Hi Robert,
>>
>> Yes, I can add that. I'll let you know when it's done.
>>
>> Valerie
>>
>> On 10/17/2013 04:01 AM, Robert Castelo wrote:
>>> hi,
>>>
>>> i have the following feature request for the VariantAnnotation package.
>>>
>>> currently, the function predictCoding() annotates the strand of variants
>>> within exons according to a given gene annotation. would it be possible
>>> for the function locateVariants() in the VariantAnnotation package to
>>> also annotate the strand of intronic variants?
>>>
>>> introns are non-coding and therefore not annotated by predictCoding(),
>>> but they are stranded (GT-AG).
>>>
>>> here is a code snippet that illustrates what i'm talking about
>>> (adapted from the vignette):
>>>
>>> =================
>>> library(VariantAnnotation)
>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>
>>> fl <- system.file("extdata", "chr22.vcf.gz",
>>>                   package="VariantAnnotation")
>>> vcf <- readVcf(fl, "hg19")
>>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>> seqlevels(vcf) <- "chr22"
>>> rd <- rowData(vcf)
>>> loc <- locateVariants(rd, txdb, IntronVariants())
>>>
>>> head(loc, n=3)
>>> GRanges with 3 ranges and 7 metadata columns:
>>>       seqnames               ranges strand | LOCATION   QUERYID      TXID     CDSID      GENEID
>>>          <Rle>            <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <character>
>>>   [1]    chr22 [50300078, 50300078]      * |   intron         1     75253      <NA>       79087
>>>   [2]    chr22 [50300086, 50300086]      * |   intron         2     75253      <NA>       79087
>>>   [3]    chr22 [50300101, 50300101]      * |   intron         3     75253      <NA>       79087
>>>             PRECEDEID        FOLLOWID
>>>       <CharacterList> <CharacterList>
>>>   [1]
>>>   [2]
>>>   [3]
>>>   ---
>>>   seqlengths:
>>>    chr22
>>>       NA
>>> =================
>>>
>>> i.e., the strand column is set to * for the intronic variants. it's fine
>>> if this new feature is added only to the devel version, as normally
>>> happens with new features.
>>>
>>>
>>> thanks!
>>> robert.
>>> ps: sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices utils datasets methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
>>> [2] GenomicFeatures_1.14.0
>>> [3] AnnotationDbi_1.24.0
>>> [4] Biobase_2.22.0
>>> [5] VariantAnnotation_1.8.0
>>> [6] Rsamtools_1.14.1
>>> [7] Biostrings_2.30.0
>>> [8] GenomicRanges_1.14.1
>>> [9] XVector_0.2.0
>>> [10] IRanges_1.20.0
>>> [11] BiocGenerics_0.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] biomaRt_2.18.0     bitops_1.0-6       BSgenome_1.30.0    DBI_0.2-7
>>> [5] RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.22.0 stats4_3.0.2
>>> [9] tools_3.0.2        XML_3.95-0.2       zlibbioc_1.8.0
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>