[BioC] SNPlocs -> VCF or similar
Hervé Pagès
hpages at fhcrc.org
Wed Jan 30 21:52:44 CET 2013
Hi Michael,
On 01/30/2013 11:21 AM, Michael Lawrence wrote:
> Hi,
>
> Is there any easy way to convert the output of getSNPlocs(), i.e., a
> GRanges with ambiguity codes, to something more like a VCF? Or would it be
> better to just access the dbSNP VCF file?
>
> I've made a function that does the above (shown below), but it would be
> nice to have a built-in path.
>
> stripRefSNPs <- function(x) {
>   x[x$alt != getSeq(Hsapiens, x, as.character = TRUE)]
> }
> explodeSNPAlleles <- function(x) {
>   alleles <- strsplit(IUPAC_CODE_MAP[x$alleles_as_ambig], NULL)
>   x <- x[rep(seq_len(length(x)), elementLengths(alleles))]
>   x$alleles_as_ambig <- NULL
>   x$alt <- unlist(alleles)
>   stripRefSNPs(x)
> }

A nice shortcut would be to be able to call the VCF() constructor
on the GRanges object returned by getSNPlocs():

  library(SNPlocs.Hsapiens.dbSNP.20120608)
  chr1_snps <- getSNPlocs("ch1", as.GRanges=TRUE)
  library(VariantAnnotation)
  chr1_vcf <- VCF(chr1_snps)

Actually it works:

  > chr1_vcf
  class: CollapsedVCF
  dim: 3517088 0
  rowData(vcf):
    GRanges with 2 metadata columns: RefSNP_id, alleles_as_ambig
  info(vcf):
    DataFrame with 0 columns:
  geno(vcf):
    SimpleList of length 0:

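and seems to produce a valid VCF object. However, judging from the
output above, the alleles are only carried along as an ordinary
metadata column (alleles_as_ambig) and nothing is stored as REF or
ALT, so I doubt this object contains the information normally
expected to be found in VCF objects.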
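
For reference, a VCF data line carries eight fixed fields (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), so at the very least REF and ALT would need to be filled in. Below is a rough base-R sketch of what such a line looks like. The helper format_vcf_line() is hypothetical (it is not part of VariantAnnotation or any other package), and the SNP id, position and alleles are invented for illustration:

```r
# Hypothetical helper: format one SNP as a tab-separated VCF data line.
# QUAL, FILTER and INFO are unknown here, so they are written as ".",
# which is how the VCF format marks a missing value.
format_vcf_line <- function(chrom, pos, id, ref, alt) {
  paste(chrom, pos, id, ref, paste(alt, collapse = ","),
        ".", ".", ".", sep = "\t")
}

# Invented example record, not taken from dbSNP.
vcf_line <- format_vcf_line("1", 10000L, "rs0000001", "A", c("G", "T"))
cat(vcf_line, "\n")
```

This is only meant to show which pieces are missing from the object above, not how VariantAnnotation stores or writes them internally.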

Note that the VCF() constructor also works on the "exploded" GRanges
object returned by your code:

  > exploded_chr1_snps <- explodeSNPAlleles(chr1_snps)
  > VCF(exploded_chr1_snps, collapsed=FALSE)
  class: ExpandedVCF
  dim: 3537354 0
  rowData(vcf):
    GRanges with 2 metadata columns: RefSNP_id, alt
  info(vcf):
    DataFrame with 0 columns:
  geno(vcf):
    SimpleList of length 0:

but, as before, the information is probably not stored in the
expected way.
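
To make the "exploded" representation concrete, here is a toy base-R sketch of the same idea: one row per allele, with the reference allele dropped. It assumes a plain data frame instead of a GRanges and a ref column instead of a getSeq() lookup. The SNP ids, positions and bases are made up, and the iupac vector just spells out the values of Biostrings' IUPAC_CODE_MAP so that the example runs without Bioconductor:

```r
# Same mapping as Biostrings' IUPAC_CODE_MAP, written out by hand so
# that the example runs in plain R.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
           V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Invented SNPs: ids, positions and reference bases are made up.
snps <- data.frame(id = c("rsA", "rsB"), pos = c(100L, 200L),
                   ref = c("A", "C"), ambig = c("R", "H"),
                   stringsAsFactors = FALSE)

# Split each ambiguity code into its individual bases.
alleles <- strsplit(unname(iupac[snps$ambig]), "")

# Repeat each SNP once per allele, then drop the rows where the allele
# is the reference base.
exploded <- snps[rep(seq_len(nrow(snps)), lengths(alleles)),
                 c("id", "pos", "ref")]
exploded$alt <- unlist(alleles)
exploded <- exploded[exploded$alt != exploded$ref, ]
rownames(exploded) <- NULL
print(exploded)
```

With ref and alt side by side like this, each row maps naturally onto the REF and ALT fields of a VCF record, which is the part the objects above do not capture.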
Cheers,
H.
>
> Thanks,
> Michael
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319