[R] TM reader with text

Mickael R problem clevenot.mickael at gmail.com
Thu Mar 1 00:00:17 CET 2012


Hello everybody,
I am trying to work with the tm package, but I have a problem with some
special words in French. I think it comes from the way the PDF was converted
to text, but I am not entirely sure.
Here is an example:

findFreqTerms(tdm1,30)
    [33] "<U+F0A3>"            "<U+FB01>n"           "<U+FB01>nancement"  
"<U+FB01>nancier"     "<U+FB01>nancière"    "<U+FB01>nancières"  
"<U+FB01>nanciers"    "<U+FB01>xe"         
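
A minimal sketch of one possible workaround, assuming (this is not confirmed) that <U+FB01> in the output above is the single-character "fi" ligature U+FB01 left behind by the PDF-to-text conversion. The helper expand_ligatures is a hypothetical name and not part of tm; it uses only base R to rewrite the Latin ligature code points U+FB00 to U+FB04 as plain letters before the term-document matrix is built:

```r
# Hypothetical helper (not part of tm): expand the Latin ligature
# code points U+FB00..U+FB04 into their plain-letter equivalents.
expand_ligatures <- function(x) {
  from <- c("\ufb00", "\ufb01", "\ufb02", "\ufb03", "\ufb04")
  to   <- c("ff",     "fi",     "fl",     "ffi",    "ffl")
  x <- enc2utf8(x)  # make sure the input is handled as UTF-8
  for (i in seq_along(from)) {
    # fixed = TRUE: literal replacement, no regular expression involved
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

# Example input modelled on the terms shown above (assumed ligature forms)
cleaned <- expand_ligatures(c("\ufb01nancement", "in\ufb02ation"))
print(cleaned)
```

With a recent version of tm, something like tm_map(corpus, content_transformer(expand_ligatures)) could apply the same cleanup to a whole corpus before calling TermDocumentMatrix (corpus is a placeholder name here). The "<U+F0A3>" entry is a private-use code point rather than a ligature, so this sketch leaves it untouched.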

Some French words are not read correctly by tm with the reader readPlain. I
tried reader = readPDF, but it did not work, so I had to convert the PDF to
plain text first. Some words are not recognized, so when I build the
TermDocumentMatrix a word like "inflation" disappears. This is a big problem
for me, and I have spent a lot of time on it. Any ideas? Thank you for your
time.
Best regards,
Mickaël


--
View this message in context: http://r.789695.n4.nabble.com/TM-reader-with-text-tp4433394p4433394.html
Sent from the R help mailing list archive at Nabble.com.


