[R] extracting character values
Uwe Ligges
ligges at statistik.tu-dortmund.de
Sun Jan 13 18:06:02 CET 2013
On 13.01.2013 09:53, Biau David wrote:
> Dear all,
>
> I have a data frame of names (netw), with each cell containing the last name and initials of an author; some cells are NA. I would like to extract only the last name from each cell into a new data frame called 'res'.
>
>
> Here is what I do:
>
> res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>
> for (i in seq_len(ncol(netw)))
> {
> wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
> res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
> }
>
>
> The problem is that I cannot extract 'complex' names such as ' van der hoops bf ' properly: here I only get 'van', whereas the real last name is 'van der hoops' and 'bf' are the initials. Basically, the last name always has a minimum of 3 consecutive letters, but may consist of several groups of 3 or more letters separated by one or more spaces; the cell may start with a space too; initials never have more than 2 letters.
>
> Would someone have a nice idea for that? Thanks,
>
Maybe some people will, but an example of your data would actually help
them to help.
Your code is not reproducible without providing the netw object.
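
For illustration only, a self-contained example could look like the sketch below. The contents of netw are made up here (the real object was not posted), and last_name() is a hypothetical helper, not anything from the original code. It assumes the initials are always a single trailing token of one or two letters, and it strips that token after trimming surrounding whitespace instead of trying to match the name itself.

```r
# Hypothetical stand-in for the poster's 'netw' object (made-up values).
netw <- data.frame(
  a1 = c(" van der hoops bf ", "smith j", NA),
  a2 = c("jones ab", NA, " de groot k"),
  stringsAsFactors = FALSE
)

# Assumption: the initials are one trailing token of 1 or 2 letters.
# Trim leading/trailing whitespace, then drop that token. NA stays NA.
last_name <- function(x) {
  x <- gsub("^\\s+|\\s+$", "", as.character(x))
  sub("\\s+[a-z]{1,2}$", "", x, ignore.case = TRUE)
}

# Apply column by column; column names and dimensions are preserved.
res <- as.data.frame(lapply(netw, last_name), stringsAsFactors = FALSE)
print(res)
```

Under that assumption ' van der hoops bf ' becomes 'van der hoops'. A cell that has no initials but ends in a one- or two-letter word would lose that word, so this would need checking against the real data.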
Best,
Uwe Ligges
>
> David
>