[R] Identifying special characters in a text file
jim holtman
jholtman at gmail.com
Fri Feb 12 03:31:06 CET 2010
Set up a regular expression to keep only what you want. This example
keeps letters, digits, whitespace, commas and periods:
> x <- readLines(textConnection('I discovered that the following works:
+ any(is.na(strsplit(readLines(FILE), "")))
+
+ I am wondering whether anyone has a better approach to this problem.
+
+ Dennis bullet ©©©ƒƒƒƒƒƒŽŽŽŽŽŽŸŸŸ
+
+ Dennis Fisher MD
+ P < (The "P Less Than" Company)
+ Phone: 1-866-PLessThan (1-866-753-7784)
+ Fax: 1-866-PLessThan (1-866-753-7784)
+ www.PLessThan.com'))
> closeAllConnections()
> # replace characters not matching alphanum, space, period, comma
> gsub("[^[:alnum:][:space:],.]", "", x) # adjust the character class as needed
[1] "I discovered that the following works"
[2] " anyis.nastrsplitreadLinesFILE, "
[3] ""
[4] "I am wondering whether anyone has a better approach to this problem."
[5] ""
[6] "Dennis bullet "
[7] ""
[8] "Dennis Fisher MD"
[9] "P The P Less Than Company"
[10] "Phone 1866PLessThan 18667537784"
[11] "Fax 1866PLessThan 18667537784"
[12] "www.PLessThan.com"
>
>
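If the goal is to detect the characters and warn rather than strip them, the same idea can be turned around with grepl(). This is only a rough sketch, not tested against your files: the sample vector is made up, "\u2022" stands in for the option-8 bullet, and "allowed" is assumed to mean printable ASCII plus tab, which may not match what the other program accepts.

```r
# Made-up sample input; "\u2022" stands in for the Mac option-8 bullet.
x <- c("I discovered that the following works:",
       "Dennis bullet \u2022",
       "Phone: 1-866-PLessThan (1-866-753-7784)")

# Flag every line holding a byte outside printable ASCII (space to tilde) or tab.
# perl = TRUE allows the \x byte ranges; useBytes = TRUE matches byte by byte,
# so lines in an unexpected encoding do not raise an error.
bad <- grepl("[^\\x20-\\x7E\\t]", x, perl = TRUE, useBytes = TRUE)

# Report the offending line numbers so the calling script can exit gracefully.
if (any(bad)) {
  warning("special characters found on line(s): ",
          paste(which(bad), collapse = ", "))
}
```

In a real script x would come from readLines(FILE), and the character class can be widened if the other program tolerates more than plain ASCII.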
On Thu, Feb 11, 2010 at 8:46 PM, Dennis Fisher <fisher at plessthan.com> wrote:
> Colleagues
>
> R 2.10.1 on a Mac
>
> I read in text files using readLines, then I process those files, then I use R to execute another program. Occasionally those files contain characters other than letters / numbers / routine punctuation marks. For example, a bullet (option-8 on a Mac) triggers the problem.
>
> Although R can read and process those characters, the other program cannot so I would like to identify these characters and exit gracefully with a warning.
>
> I discovered that the following works:
> any(is.na(strsplit(readLines(FILE), "")))
>
> I am wondering whether anyone has a better approach to this problem.
>
> Dennis
>
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?