[R] Problem with number characters
Gabor Grothendieck
ggrothendieck at myway.com
Fri Oct 15 16:47:25 CEST 2004
Note that there are also regexp classes that define certain character
sets, most notably [:graph:], which can make it easy to create
appropriate regexps. More details are in ?regex.
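
As a rough illustration of the [:graph:] idea, here is a minimal sketch. The helper name keep_graph and the sample string are made up for this example (the sample uses ASCII control characters rather than the bytes from the original file), and a plain space is added to the class because [:graph:] itself excludes it.

```r
# keep_graph() is a hypothetical helper name, not part of base R.
# It drops every character that is neither in [:graph:] nor a plain space.
keep_graph <- function(s) gsub("[^[:graph:] ]", "", s)

# Made-up sample: a name with three embedded ASCII control characters
# (octal escapes 001, 002 and 177).
x <- "Draszt\001 0\0023\177"

clean <- keep_graph(x)

# Flag strings that consist only of graphical characters and spaces.
is_clean <- grepl("^[[:graph:] ]*$", c(x, clean))

print(clean)
print(is_clean)
```

How [:graph:] treats non-ASCII characters depends on the locale, so results on real data may differ from this ASCII-only case.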
Martin Maechler <maechler <at> stat.math.ethz.ch> writes:
:
: >>>>> "Spencer" == Spencer Graves <spencer.graves <at> pdf.com>
: >>>>> on Thu, 14 Oct 2004 13:41:24 -0700 writes:
:
: Spencer> It looks like you have several non-printing
: Spencer> characters. "nchar" will give you the total number
: Spencer> of characters in each character string.
:
: Spencer> "strsplit" can break character strings into single
: Spencer> characters, and "%in%" can be used to classify
: Spencer> them.
:
: and you give nice coding examples:
:
: Spencer> Consider the following:
: >> x <- "Draszt 0%/1ÃÂÃÂ?iso8859-15ó"
: >> nx <- nchar(x)
: >> x. <- strsplit(x, "")
: >> length(x.[[1]])
: Spencer> [1] 29
: >>
: >> namechars <- c(letters, LETTERS, as.character(0:9), ".")
:
: just to be precise: If 'namechars' is supposed to mean
: ``characters valid in R object names'', then you should have
: added "_" as well:
:
: namechars <- c(letters, LETTERS, as.character(0:9), ".", "_")
:
: >> punctuation <- c(",", "!", "+", "*", "&", "|")
: >> legalchars <- c(namechars, punctuation)
:
: and 'legalchars' would have to contain quite a bit more I
: presume, e.g. "$", "@", ....
: (but that wouldn't have been a reason to write this e-mail..)
:
: >> legalx <- lapply(x., function(y)(y %in% legalchars))
: >> x.[[1]][!legalx[[1]]]
: Spencer> [1] " " "" "%" "/" "Ã" "Â" "Ã" "Â?" "-" "" "Ã" "³"
: >>
: >> sapply(legalx, sum)
: Spencer> [1] 17
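
The strsplit / %in% steps above could be wrapped up for a whole vector of strings along these lines. This is only a sketch: find_illegal is a hypothetical name, the sample strings are invented, and including " " in the legal set is an assumption made here (the session above treats the space as illegal).

```r
# Assumed set of acceptable characters; the space is added for this sketch.
legalchars <- c(letters, LETTERS, as.character(0:9), ".", "_", " ")

# find_illegal() is a hypothetical helper: for each string it returns the
# characters that are not members of 'legal'.
find_illegal <- function(strings, legal) {
  chars <- strsplit(strings, "")
  lapply(chars, function(ch) ch[!(ch %in% legal)])
}

# Invented input: one clean string, one with a control character and a "%".
x <- c("Draszt 03", "Draszt 0\0013%")

bad <- find_illegal(x, legalchars)
n_bad <- sapply(bad, length)   # number of illegal characters per string

print(n_bad)
```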
:
: Spencer> Will this give you ideas about how to do what you want?
: Spencer> hope this helps. spencer graves
:
: (and this too)
:
: Martin Maechler, ETH Zurich
:
:
: Spencer> Gabor Grothendieck wrote:
:
: >> Assuming that the problem is that your input file has
: >> additional embedded characters added by the data base
: >> program you could try extracting just the text using
: >> the UNIX strings program:
: >>
: >> strings myfile.csv > myfile.txt
: >>
: >> and see if myfile.txt works with R and if not check out
: >> what the differences are between it and the .csv file.
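
As a complement to the strings suggestion above, the suspicious bytes can also be located from within R by reading the file as raw bytes. The sketch below is an assumption-laden illustration: it writes a small made-up file to a temporary path instead of using myfile.csv, and it treats printable ASCII plus tab, newline and carriage return as the "expected" bytes.

```r
# Create a small stand-in file; the byte values are invented for the example.
path <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x44, 0x72, 0x01, 0x20, 0x30, 0xb3, 0x0a)), path)

# Read the whole file as raw bytes and convert to integer codes (0..255).
bytes <- readBin(path, what = "raw", n = file.info(path)$size)
codes <- as.integer(bytes)

# Assumed definition of "expected": printable ASCII, tab, LF, CR.
printable <- (codes >= 32 & codes <= 126) | codes %in% c(9, 10, 13)

# Offsets (1-based) and hex values of everything else.
suspicious <- data.frame(offset = which(!printable),
                         byte   = as.character(bytes[!printable]))
print(suspicious)

unlink(path)
```

Running the same steps on both the .csv file and the output of strings should show which bytes were dropped.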
: >>
: >> Date: Thu, 14 Oct 2004 11:31:33 -0700
: >> From: Scott Waichler <scott.waichler <at> pnl.gov>
: >> To: <r-help <at> stat.math.ethz.ch>
: >> Subject: [R] Problem with number characters
: >>
: >>
: >> I am trying to process text fields scanned in from a csv file that is
: >> output from the Windows database program FileMakerPro. The characters
: >> onscreen look like regular text, but R does not like their underlying
: >> binary form.
: >> For example, one of the text fields contains a name and a number, but
: >> R recognizes the number as something other than what it appears
: >> to be in plain text. The character string "Draszt 03" after being
: >> read into R using scan and ="" becomes "Draszt 03" where the 3 is
: >> displayed in my R session as a superscript. Here is the result pasted
: >> into this email I'm composing in emacs: "Draszt 0%/1ÃÂÃÂ?iso8859-
15ó"
: >> Another clue for the knowledgeable: when I try to display the vector
: >> element causing trouble, I get
: >> <CHARSXP: "Draszt 0%/1ÃÂÃÂ?iso8859-15ó">
: >> where again the superscript part is just "3" in my R session. I'm
: >> working in Linux, R version 1.9.1, 2004-06-21. Your help will be much
: >> appreciated.
: >>
: >> Scott Waichler
: >> Pacific Northwest National Laboratory
: >> scott.waichler <at> pnl.gov
:
: ______________________________________________
: R-help <at> stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
:
: