[BioC] Problem locating SNP by rsID for SNPlocs.Hsapiens.dbSNP.20120608 package
Hervé Pagès
hpages at fhcrc.org
Thu Jan 17 21:53:51 CET 2013
Hi Christina,
On 01/16/2013 10:15 AM, Christina Chaivorapol wrote:
> Thanks for your help Tim and Herve.
>
> It would be very useful to include the SNPs that have a value for the
> chr-pos field even if they have more than 1 CTG line for my purposes
> since I deal with a lot of immune-related genes that tend to be
> difficult to map. Would it be possible to include these types of SNPs,
> but flag them as having more than 1 CTG line?
So I've included them in version 0.99.9 of SNPlocs.Hsapiens.dbSNP.20120608.
They're not flagged though. Note that only a *single* location on the
reference genome is still reported for those SNPs, because the other
"locations" are reported as ? (question mark), and it seems fair not to
consider ? as a location.
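The relaxed rule described above (keep a SNP with several reference-assembly CTG lines as long as exactly one of them gives a real chr-pos) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code actually used to build the package: the helper name `single_ref_location()` is hypothetical, and it assumes each CTG line is one `|`-separated line of `key=value` fields, as in the ds_flat_ch6.flat record for rs7775397 quoted further down this thread. It returns the chromosome position when exactly one reference-assembly CTG line has a non-"?" chr-pos, and `NA` otherwise.

```r
# Hypothetical helper (not part of SNPlocs): given the CTG lines of one
# dbSNP flat-file record, return the single chromosome position reported
# on the reference assembly, or NA if there is not exactly one.
single_ref_location <- function(ctg_lines, assembly = "GRCh37.p5") {
  # Split "CTG | key=value | ..." lines into trimmed fields.
  fields <- lapply(strsplit(ctg_lines, "|", fixed = TRUE), trimws)
  # Return the value of a "key=value" field, or NA if the key is absent.
  get_field <- function(f, key) {
    hit <- grep(paste0("^", key, "="), f, value = TRUE)
    if (length(hit) == 0L) NA_character_
    else sub(paste0("^", key, "="), "", hit[1L])
  }
  asm <- vapply(fields, get_field, character(1), key = "assembly")
  pos <- vapply(fields, get_field, character(1), key = "chr-pos")
  # Keep only CTG lines that refer to the reference assembly.
  pos <- pos[!is.na(asm) & asm == assembly]
  # Do not count "?" (no chromosome position) as a location.
  pos <- pos[!is.na(pos) & pos != "?"]
  if (length(pos) == 1L) as.integer(pos) else NA_integer_
}

# The first two CTG lines reported for rs7775397 in ds_flat_ch6.flat.
ctg <- c(
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=32261252 | NT_007592.15 | ctg-start=32201252 | ctg-end=32201252 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_113891.2 | ctg-start=3732030 | ctg-end=3732030 | loctype=2 | orient=+"
)
single_ref_location(ctg)  # 32261252
```

Under the stricter rule used up to version 0.99.8, the mere presence of a second reference-assembly CTG line would have been enough to drop the SNP; here only lines with an actual chr-pos value are counted.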
With this new version of the package:
> library(SNPlocs.Hsapiens.dbSNP.20120608)
> sum(getSNPcount())
[1] 45697775
that is, 281064 more SNPs (about 0.6% more) than in the previous version
(0.99.8). rs7775397 is now one of them:
> rsidsToGRanges("rs7775397")
GRanges with 1 range and 2 metadata columns:
      seqnames               ranges strand |   RefSNP_id alleles_as_ambig
         <Rle>            <IRanges>  <Rle> | <character>      <character>
  [1]      ch6 [32261252, 32261252]      + |     7775397                K
  ---
  seqlengths:
          ch1       ch2       ch3       ch4 ...       chX      chY  chMT
    249250621 243199373 198022430 191154276 ... 155270560 59373566 16569
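As a quick check of the figures above, the number of SNPs added in 0.99.9 is the difference between the two `sum(getSNPcount())` totals; the 0.99.8 total (45416711) is the one quoted further down this thread. Both numbers below are copied from the thread, not recomputed from the packages.

```r
# Totals reported by sum(getSNPcount()) in this thread.
n_0998 <- 45416711  # SNPlocs.Hsapiens.dbSNP.20120608 0.99.8
n_0999 <- 45697775  # SNPlocs.Hsapiens.dbSNP.20120608 0.99.9

added <- n_0999 - n_0998
added                           # 281064
round(100 * added / n_0998, 1)  # 0.6 (percent)
```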
SNPlocs.Hsapiens.dbSNP.20120608 version 0.99.9 will be available in
Bioc devel (which requires the devel version of R, i.e. R 3.0) through
biocLite() in about 45 minutes. Only the source package is available for
now; you should be able to install it on Windows or Mac with
biocLite( , type="source").
Let me know if you have questions about this.
Cheers,
H.
>
> Thanks for your help,
> Christina
>
>
> On Tue, Jan 15, 2013 at 11:00 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Christina,
>
> According to the official announcement:
>
>
> http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2012q2/000122.html
>
> there are 53,558,214 rs ids in dbSNP 137 for Human.
>
> But in SNPlocs.Hsapiens.dbSNP.20120608:
>
> > library(SNPlocs.Hsapiens.dbSNP.20120608)
> > sum(getSNPcount())
> [1] 45416711
>
> As explained in ?SNPlocs.Hsapiens.dbSNP.20120608, the package (like
> all other SNPlocs packages) was curated:
>
> SNPs from dbSNP were filtered to keep only those satisfying the
> following 3 criteria:
>
> • The SNP is a single-base substitution, i.e. its type is "snp".
> Other types used by dbSNP are "in-del", "mixed", "microsatellite",
> "named-locus", "multinucleotide-polymorphism", etc. All those SNPs
> were dropped.
>
> • The SNP is marked as notwithdrawn.
>
> • A *single* location on the reference genome (GRCh37.p5) is
> reported for the SNP, and this location is on chromosomes
> 1-22, X, Y, MT.
>
> rs7775397 was dropped for this last reason. More precisely, the record
> for this SNP in ds_flat_ch6.flat contains the following CTG lines:
>
> CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=32261252 | NT_007592.15 | ctg-start=32201252 | ctg-end=32201252 | loctype=2 | orient=+
> CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_113891.2 | ctg-start=3732030 | ctg-end=3732030 | loctype=2 | orient=+
> CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167245.1 | ctg-start=3540499 | ctg-end=3540499 | loctype=2 | orient=+
> CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167246.1 | ctg-start=3604088 | ctg-end=3604088 | loctype=2 | orient=+
> CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167248.1 | ctg-start=3522471 | ctg-end=3522471 | loctype=2 | orient=+
> CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167249.1 | ctg-start=3609047 | ctg-end=3609047 | loctype=2 | orient=+
>
> That is, there is more than 1 CTG line corresponding to the reference
> assembly (GRCh37.p5), which is why the SNP was dropped.
>
> I realize now that maybe I could keep those SNPs that have more than
> 1 CTG line corresponding to the reference assembly as long as exactly
> 1 of them actually provides a value for the chr-pos field. Would that
> be reasonable?
>
> Thanks,
> H.
>
>
>
> On 01/15/2013 05:19 PM, Christina Chaivorapol wrote:
>
> Hi,
>
> Has anyone ever had a case where a SNP was not found in
> SNPlocs.Hsapiens.dbSNP.20120608 but is found in dbSNP 137? I am having
> this problem with SNP rs7775397.
>
> library(SNPlocs.Hsapiens.dbSNP.20120608)
> rsidsToGRanges('rs7775397')
>
> Error in .snpid2rowidx(x, snpid) : SNP id(s) not found: 7775397
>
> Thanks,
> Christina
>
> sessionInfo()
>
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] datasets utils grDevices graphics stats methods base
>
> other attached packages:
> [1] SNPlocs.Hsapiens.dbSNP.20120608_0.99.8
> [2] BSgenome_1.26.1
> [3] Biostrings_2.26.2
> [4] GenomicRanges_1.10.5
> [5] IRanges_1.16.4
> [6] BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] parallel_2.15.2 stats4_2.15.2
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
>
>
> --
> Christina Chaivorapol, Ph.D.
> Genentech, Inc.
> Bioinformatics & Computational Biology
> phone: 650-225-6903
> chrichai at gene.com