[R] Unicode Text Segmentation Algorithms already implemented in R?

Sascha Wolfer wolfer at ids-mannheim.de
Thu Mar 3 10:47:02 CET 2016


Hello list members,

I am looking for an implementation of Unicode text segmentation (word boundary detection) algorithms in R. You can find information about the algorithms here: http://www.unicode.org/reports/tr29/#Word_Boundaries

The help page for the function 'casefuns' from the excellent 'Unicode' package says: "Other methods will be added eventually (once the Unicode text segmentation algorithm is implemented for detecting word boundaries)." My simple question is: Are these algorithms already implemented in an R package? I didn’t find anything on the web, but I am counting on the power of this list. My Stata-using colleague is already teasing me about it… (in Stata, the function 'ustrword' does exactly what I want to do in R).

Thanks for your help, and have a good day, everyone!
Sascha W.