[BioC] fasta biostrings bioconductor
Martin Morgan
mtmorgan at fhcrc.org
Fri Mar 28 17:56:14 CET 2014
On 03/28/2014 09:43 AM, DNAStringSet Error Biostrings in R [guest] wrote:
>
> I posted this same quandary on Biostars and stack overflow.
>
> I am attempting to import a fasta file of sequences into R using Bioconductor's 'Biostrings' package and the 'DNAStringSet' function but I keep getting the same error:
>
> Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
> key 112 (char 'p') not in lookup table
>
> My fasta file ("FileName.fa") is comprised of various length sequences, in the following format:
>
>> GeneNameOne
> CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA
>> GeneNameTwo
> CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC
> ...etc
>
> I performed 'grep p FileName.fa' in the Unix terminal, but I received no output.
You could try a divide-and-conquer approach: split the file in two, read each
half, keep the half that fails, and repeat until the problem record is isolated.
Please continue reading below...
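The halving idea can be sketched in base R. This is only an illustration under stated assumptions, not a Biostrings facility: read_records(), find_bad_record() and iupac_ok() are hypothetical helpers, the file is assumed to start with a ">" header line, and the failure is assumed to come from individual records. iupac_ok() is a stand-in check for letters outside the IUPAC DNA alphabet (plus the gap characters "-", "+" and "."); with Biostrings attached you would more likely pass a check that wraps DNAStringSet() in try().

```r
## Hypothetical helper: parse a FASTA file into a named character vector,
## one element per record (assumes the first non-empty line is a header).
read_records <- function(path) {
    lines <- readLines(path)
    lines <- lines[nzchar(lines)]
    is_header <- startsWith(lines, ">")
    id <- cumsum(is_header)                 # record number for every line
    seqs <- tapply(lines[!is_header], id[!is_header], paste, collapse = "")
    headers <- trimws(sub("^>", "", lines[is_header]))
    setNames(as.vector(seqs), headers[as.integer(names(seqs))])
}

## Hypothetical helper: repeatedly halve the records, keeping the half
## for which ok() returns FALSE, until a single record remains.
find_bad_record <- function(records, ok) {
    if (ok(records)) return(NULL)           # nothing fails the check
    while (length(records) > 1L) {
        mid <- length(records) %/% 2L
        left <- records[seq_len(mid)]
        records <- if (!ok(left)) left else records[-seq_len(mid)]
    }
    records
}

## Stand-in check (an assumption, not the Biostrings lookup table itself):
## TRUE when every sequence uses only IUPAC DNA letters or gap characters.
iupac_ok <- function(records)
    !any(grepl("[^ACGTMRWSYKVHDBN+.-]", toupper(records)))

## Small made-up file with a stray 'p' in the second record.
fa <- tempfile(fileext = ".fa")
writeLines(c("> GeneNameOne", "CAGACACC",
             "> GeneNameTwo", "CGCGpACATG",
             "> GeneNameThree", "ACGT"), fa)
bad <- find_bad_record(read_records(fa), iupac_ok)
names(bad)                                  # name of the offending record
```

To bisect on the actual Biostrings error instead, one possible check is function(x) !inherits(try(DNAStringSet(x), silent = TRUE), "try-error").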
>
> Does anyone have an idea on what is going on?
>
> Thanks in advance.
>
> -- output of sessionInfo():
>
> Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
> key 112 (char 'p') not in lookup table
Rather than repeating the error without context, it is usually more helpful to
cut-and-paste the relevant portion of the session that causes the problem, e.g.,
> library(Biostrings)
> readLines("FileName.fa", 4) ## correct file?
[1] "> GeneNameOne"
[2] "CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA"
[3] "> GeneNameTwo"
[4] "CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC"
> readDNAStringSet("FileName.fa")
Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),
: key 112 (char 'p') not in lookup table
What is being asked for here is the output of the command sessionInfo(), which
makes basic information about your system available; here's mine:
> library(Biostrings)
> sessionInfo()
R version 3.0.2 Patched (2014-01-02 r64626)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] stats4_3.0.2
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793