[R] Problem with number characters

Gabor Grothendieck ggrothendieck at myway.com
Fri Oct 15 16:47:25 CEST 2004


Note that there are also regexp character classes that define certain
character sets, most notably [:graph:], which can make it easy to create
appropriate regexps.  More is in ?regex.
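
As a rough sketch of that idea: the strings below are made up for
illustration, with a control character standing in for whatever the
database export embedded. A bracket expression built on [:graph:] can flag
or strip anything that is neither a visible character nor a plain space.
What counts as [:graph:] for non-ASCII characters depends on the locale, so
treat this as a starting point only.

```r
# Hypothetical inputs: one clean string, one with an embedded control character
clean <- "Draszt 03"
dirty <- paste0("Draszt 0", "\x01", "3")

# Matches any character that is neither in [:graph:] nor a plain space
bad <- "[^[:graph:] ]"

flagged  <- grepl(bad, c(clean, dirty))  # which strings contain such characters
stripped <- gsub(bad, "", dirty)         # remove them from the dirty string
```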

Martin Maechler <maechler <at> stat.math.ethz.ch> writes:

: 
: >>>>> "Spencer" == Spencer Graves <spencer.graves <at> pdf.com>
: >>>>>     on Thu, 14 Oct 2004 13:41:24 -0700 writes:
: 
:     Spencer>   It looks like you have several non-printing
:     Spencer> characters.  "nchar" will give you the total number
:     Spencer> of characters in each character string.
: 
:     Spencer> "strsplit" can break character strings into single
:     Spencer> characters, and "%in%" can be used to classify
:     Spencer> them.
: 
: and you give nice coding examples:
: 
:     Spencer> Consider the following:
:     >> x <- "Draszt 0%/1ÂÂÂÂ?iso8859-15³"
:     >> nx <- nchar(x)
:     >> x. <- strsplit(x, "")
:     >> length(x.[[1]])
:     Spencer> [1] 29
:     >> 
:     >> namechars <- c(letters, LETTERS, as.character(0:9), ".")
: 
: just to be precise:  If 'namechars' is supposed to mean
: ``characters valid in R object names'', then you should have
: added "_" as well:
: 
: namechars <- c(letters, LETTERS, as.character(0:9), ".", "_")
: 
:     >> punctuation <- c(",", "!", "+", "*", "&", "|")
:     >> legalchars <- c(namechars, punctuation)
: 
: and 'legalchars' would have to contain quite a bit more I
: presume, e.g. "$", "@", ....
: (but that wouldn't have been a reason to write this e-mail..)
: 
:     >> legalx <- lapply(x., function(y)(y %in% legalchars))
:     >> x.[[1]][!legalx[[1]]]
:     Spencer> [1] " " "" "%" "/" "Â" "Â" "Â" "Â?" "-" "" "Â" "³"
:     >> 
:     >> sapply(legalx, sum)
:     Spencer> [1] 17
: 
:     Spencer> Will this give you ideas about how to do what you want?
:     Spencer> hope this helps. spencer graves
: 
: (and this too)
: 
: Martin Maechler, ETH Zurich
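
For reference, the strsplit / %in% approach quoted above can be condensed
into a short self-contained sketch. The input string here is hypothetical
and pure ASCII so that it runs the same in any locale. The character set
includes "_" as Martin suggests, and the variable names chars, is_name and
offending are mine, not from the earlier posts.

```r
# Hypothetical input; substitute the string read from the file
x <- "Draszt_0%/1"

# Characters treated as valid in R object names, including "_"
namechars <- c(letters, LETTERS, as.character(0:9), ".", "_")

chars     <- strsplit(x, "")[[1]]   # split into single characters
is_name   <- chars %in% namechars   # TRUE where the character is in the set
offending <- chars[!is_name]        # the characters outside the set
n_ok      <- sum(is_name)           # how many characters were accepted
```

Extending namechars with punctuation, as in the quoted 'legalchars',
narrows what ends up in offending.
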
: 
: 
:     Spencer> Gabor Grothendieck wrote:
: 
:     >> Assuming that the problem is that your input file has 
:     >> additional embedded characters added by the data base
:     >> program you could try extracting just the text using
:     >> the UNIX strings program:
:     >> 
:     >> strings myfile.csv > myfile.txt
:     >> 
:     >> and see if myfile.txt works with R and if not check out
:     >> what the differences are between it and the .csv file.
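
If the strings program is not available, something similar can be
approximated inside R by filtering the raw bytes. This is only a sketch
under my own assumptions: keep_printable is a made-up helper, it keeps
printable ASCII plus tab, newline and carriage return, and unlike strings
it does not discard short runs. It also drops every non-ASCII byte, which
may or may not be what is wanted.

```r
# Keep only printable ASCII bytes plus tab, newline, and carriage return.
# A rough stand-in for `strings`; it does not drop short runs.
keep_printable <- function(bytes) {
  b  <- as.integer(bytes)
  ok <- (b >= 32L & b <= 126L) | b %in% c(9L, 10L, 13L)
  rawToChar(bytes[ok])
}

# Hypothetical file contents with two embedded non-text bytes (0x01, 0xb3)
raw_in  <- as.raw(c(0x44, 0x72, 0x01, 0x61, 0xb3, 0x0a))
cleaned <- keep_printable(raw_in)
```

For a real file the bytes could be obtained with
readBin("myfile.csv", "raw", n = file.info("myfile.csv")$size).
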
:     >> 
:     >> Date:   Thu, 14 Oct 2004 11:31:33 -0700 
:     >> From:   Scott Waichler <scott.waichler <at> pnl.gov>
:     >> To:   <r-help <at> stat.math.ethz.ch> 
:     >> Subject:   [R] Problem with number characters 
:     >> 
:     >> 
:     >> I am trying to process text fields scanned in from a csv file that is
:     >> output from the Windows database program FileMakerPro. The characters
:     >> onscreen look like regular text, but R does not like their underlying
:     >> binary form.
:     >> For example, one of the text fields contains a name and a number, but
:     >> R recognizes the number as something other than what it appears
:     >> to be in plain text. The character string "Draszt 03" after being
:     >> read into R using scan and ="" becomes "Draszt 03" where the 3 is 
:     >> displayed in my R session as a superscript. Here is the result pasted
:     >> into this email I'm composing in emacs: "Draszt 0%/1ÂÂÂÂ?iso8859-15³"
:     >> Another clue for the knowledgeable: when I try to display the vector
:     >> element causing trouble, I get
:     >> <CHARSXP: "Draszt 0%/1ÂÂÂÂ?iso8859-15³">
:     >> where again the superscript part is just "3" in my R session. I'm
:     >> working in Linux, R version 1.9.1, 2004-06-21. Your help will be much
:     >> appreciated.
:     >> 
:     >> Scott Waichler
:     >> Pacific Northwest National Laboratory
:     >> scott.waichler <at> pnl.gov
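
One way to see what is actually stored is to look at the bytes. The sketch
below assumes, without having seen the file, that the trailing character
is the single byte 0xb3. That byte is superscript three in both Latin-1
and ISO-8859-15, which would fit the "iso8859-15" in the pasted text. The
string is rebuilt by hand here purely for illustration.

```r
# Hypothetical reconstruction: "Draszt 0" followed by the byte 0xb3
x <- rawToChar(c(charToRaw("Draszt 0"), as.raw(0xb3)))

bytes <- charToRaw(x)                          # the last byte is b3
y     <- iconv(x, from = "latin1", to = "UTF-8")
cp    <- utf8ToInt(y)                          # Unicode code points of y

# Replace superscript three (U+00B3) with an ordinary "3"
fixed <- sub("\u00b3", "3", y, fixed = TRUE)
```

If the assumption holds, the last code point in cp is 179 and fixed is the
plain string "Draszt 03".
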
: 



