[R] Using gregexpr and regmatches but getting an iconv error

Adel adel.daoud at socav.gu.se
Thu Dec 11 16:40:30 CET 2014


I have run into a problem when using gregexpr() and regmatches(); I get the
following error message:

Error in iconv(x, "latin1", "ASCII") : 
  'x' must be a list of NULL or raw vectors 
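The message comes from iconv(), not from the regex itself. Judging by the error call, regmatches() passed its x argument to iconv(x, "latin1", "ASCII") (at least in the R version used here). iconv() accepts a character vector, or a list whose elements are NULL or raw vectors, so a list of character vectors is rejected with exactly this message. A minimal sketch reproducing it, with a small hypothetical list standing in for the real data:

```r
# Hypothetical list shaped like the output of regmatches() on gregexpr() matches:
# one character vector per input string.
lst <- list(c("a", "b"), "c")

# iconv() only accepts a list if every element is NULL or a raw vector,
# so this call fails; capture the error message instead of stopping.
msg <- tryCatch(
  iconv(lst, "latin1", "ASCII"),
  error = function(e) conditionMessage(e)
)
print(msg)
```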

The data: 

I have two journal articles, and after some regex manipulation I have reached
the following point:

# manipulate only two full-text articles
author.test <- articles1[1:2]
# extract author information
r <- gregexpr("(\"authors\":(.*?)\"(.*?)\")|(\"authors\": \\[(.*?)\\],)",
authors.raw <- regmatches(author.test, r) 
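Note that regmatches() applied to gregexpr() match data returns a list with one character vector per input string, not a character vector. A small sketch with two hypothetical input strings (not the real articles) shows the shape:

```r
# Two made-up strings standing in for the real articles.
x <- c('"authors": ["A", "B"]', '"authors": "C"')

# gregexpr() finds every quoted substring; regmatches() then returns
# a list holding one character vector of matches per element of x.
res <- regmatches(x, gregexpr('"[^"]*"', x))
str(res)                          # a list of 2 character vectors
vapply(res, length, integer(1))   # number of matches per input string
```

That list is what ends up in authors.raw, while ?regmatches documents its x argument as a character vector, which is why passing authors.raw back into regmatches() fails.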

[1] "\"authors\": [\"Allan G. KING\", \"B. Lindsay LOWELL\", \"Frank D.

[1] "\"authors\": \"Chris Baldry\", \"" 

Now, if I try to apply a second regex step to authors.raw, I get the error
stated above:

r <- gregexpr("([^(\"authors\":)])(.*?)(\"(.*?)\")", authors.raw)
authors.raw <- regmatches(authors.raw, r) 

Error in iconv(x, "latin1", "ASCII") : 
  'x' must be a list of NULL or raw vectors 

One way to avoid this is to call unlist() on the result (see below), but then
I lose information that was contained in the list structure: the first
element holds three character strings, which are the authors of the first
paper. I want to keep them grouped in that list format.

> authors.raw <- unlist(regmatches(authors.raw, r)) 
> authors.raw 
[1] " [\"Allan G. KING\""     ", \"B. Lindsay LOWELL\"" ", \"Frank D. BEAN\""     " \"Chris Baldry\""

So what I want is to avoid unlist() and apply gregexpr() several times in a
row while keeping the list structure. Any ideas?
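One way to do this is to apply the second regex to each list element separately with lapply(), since each element is an ordinary character vector that regmatches() accepts, and lapply() returns a list with one element per article. The sketch below is a minimal illustration under assumptions: author.test holds made-up strings in roughly the shape of the posted output (the real articles1 is not shown), and both patterns and the helper extract_names() are hypothetical, not the ones from the post. Note also that the bracket expression [^(...)] in the second pattern matches a single character not in the listed set; it does not exclude the text "authors":, so the sketch strips that key with sub() instead.

```r
# Hypothetical stand-in for articles1[1:2]; the real full texts are not shown.
author.test <- c(
  'id: 1, "authors": ["Allan G. KING", "B. Lindsay LOWELL", "Frank D. BEAN"], "year": 2014',
  'id: 2, "authors": "Chris Baldry", "year": 2014'
)

# Step 1 (hypothetical pattern): grab the authors field, which is either
# a [...] array or a single "..." string. Result: a list, one element per article.
pat1 <- '"authors": (\\[[^]]*\\]|"[^"]*")'
authors.raw <- regmatches(author.test, gregexpr(pat1, author.test, perl = TRUE))

# Step 2: work on one list element (a plain character vector) at a time.
extract_names <- function(s) {
  body <- sub('^"authors": ', "", s)                     # drop the key
  quoted <- regmatches(body, gregexpr('"[^"]*"', body))  # every quoted string
  gsub('"', "", unlist(quoted))                          # strip the quote marks
}

# lapply() keeps the one-element-per-article grouping; no global unlist().
authors.clean <- lapply(authors.raw, extract_names)
str(authors.clean)
```

With this input, authors.clean[[1]] holds "Allan G. KING", "B. Lindsay LOWELL" and "Frank D. BEAN", and authors.clean[[2]] holds "Chris Baldry". Further regex steps can be chained the same way, each as another lapply() over the previous list.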

Thanks in advance 

View this message in context: http://r.789695.n4.nabble.com/Using-gregexpr-and-regmatches-but-getting-Iconv-error-tp4700677.html
Sent from the R help mailing list archive at Nabble.com.
