[BioC] GENEID is missing when LOCATION is non-intergenic in VariantAnnotation package

Adaikalavan Ramasamy adaikalavan.ramasamy at gmail.com
Tue Feb 19 15:41:20 CET 2013


Dear all,

I am seeing some results from the VariantAnnotation package that I did
not expect. In some cases GENEID is missing (NA) even though LOCATION
is coding, promoter, intron, threeUTR or fiveUTR, where I would expect
a gene to be assigned. Here is an example with five SNPs (among many
more). I have marked the unexpected rows with "##".


library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Each c() mixes text and numbers, so every value is coerced to text;
# pos is converted back to numeric below.
tmp <- rbind.data.frame(c("rs10917388",  "chr1",  23803138),
                        c("rs1063412",   "chr1",  172410967),
                        c("rs78291220",  "chr2",  60890373),
                        c("rs116917239", "chr17", 44061025),
                        c("rs11593",     "chrX",  153627145))
colnames(tmp) <- c("rsid", "chr", "pos")
tmp$pos <- as.numeric(as.character(tmp$pos))

target <- with(tmp, GRanges(seqnames = Rle(chr),
                            ranges   = IRanges(pos, end = pos, names = rsid),
                            strand   = Rle(strand("*"))))

loc <- locateVariants(target, TxDb.Hsapiens.UCSC.hg19.knownGene, AllVariants())
names(loc) <- NULL
out <- as.data.frame(loc)
out$rsid <- names(target)[ out$QUERYID ]
out <- out[ , c("rsid", "seqnames", "start", "LOCATION", "GENEID",
                "PRECEDEID", "FOLLOWID")]
out <- unique(out)
rownames(out) <- NULL
out

           rsid  seqnames     start   LOCATION GENEID PRECEDEID FOLLOWID
1   rs10917388     chr1  23803138     intron  55616      <NA>     <NA>
2   rs10917388     chr1  23803138   promoter   <NA>      <NA>     <NA> ##

3    rs1063412     chr1 172410967     intron  92346      <NA>     <NA>
4    rs1063412     chr1 172410967     intron   5279      <NA>     <NA>
5    rs1063412     chr1 172410967     coding   5279      <NA>     <NA>
6    rs1063412     chr1 172410967     coding   <NA>      <NA>     <NA> ##

7   rs78291220     chr2  60890373   promoter   <NA>      <NA>     <NA> ##
8   rs78291220     chr2  60890373 intergenic   <NA>     64895   400957

9  rs116917239    chr17  44061025     coding   4137      <NA>     <NA>
10 rs116917239    chr17  44061025     intron   4137      <NA>     <NA>
11 rs116917239    chr17  44061025     coding   <NA>      <NA>     <NA> ##

12     rs11593     chrX 153627145     intron   6134      <NA>     <NA>
13     rs11593     chrX 153627145   promoter   6134      <NA>     <NA>
14     rs11593     chrX 153627145   promoter  26778      <NA>     <NA>
15     rs11593     chrX 153627145   promoter   <NA>      <NA>     <NA> ##
16     rs11593     chrX 153627145    fiveUTR   <NA>      <NA>     <NA> ##
17     rs11593     chrX 153627145   threeUTR   <NA>      <NA>     <NA> ##
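
The rows marked "##" can also be pulled out programmatically rather than by eye. What follows is only a minimal sketch in base R: it retypes rows 12 to 17 of the output above into a small data frame (the name `unexpected` is just an illustrative choice) and keeps the rows where LOCATION is not intergenic but GENEID is NA. On the real data the same `subset()` call could be applied to `out` directly.

```r
# Rows 12-17 of the output above, retyped by hand for illustration only.
out <- data.frame(
  rsid     = "rs11593",
  LOCATION = c("intron", "promoter", "promoter",
               "promoter", "fiveUTR", "threeUTR"),
  GENEID   = c("6134", "6134", "26778", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Keep only the rows where a non-intergenic LOCATION has no GENEID.
unexpected <- subset(out, is.na(GENEID) & LOCATION != "intergenic")
print(unexpected)

# Count the flagged rows per LOCATION class.
print(table(unexpected$LOCATION))
```

This only isolates the affected rows; it does not explain why the gene identifier is absent.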

Can anyone help explain what is happening please? Is this to be
expected? Thank you.

Regards, Adai


