[R] DocumentTermMatrix error
Matevž Pavlič
matevz.pavlic at gi-zrmk.si
Sat May 21 14:58:42 CEST 2011
Got it... the problem was with the Slovenian characters. Once I replaced them with unaccented characters, it worked fine.
Thanks anyway, m
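For anyone who would rather keep the Slovenian characters, a minimal sketch of converting the text to UTF-8 before lower-casing might look like the following. It assumes the file was saved in Windows-1250 (CP1250), which the thread does not confirm, and the sample bytes are made up for illustration rather than taken from test.txt.

```r
# Bytes for "ŽELEZO" as they would appear in a Windows-1250 file
# (0x8E is "Ž" in that code page). Hypothetical stand-in for a line of test.txt.
raw_txt <- rawToChar(as.raw(c(0x8e, 0x45, 0x4c, 0x45, 0x5a, 0x4f)))

# These bytes are not valid UTF-8, which is the kind of input that
# tolower() rejects with "invalid input ... in 'utf8towcs'".
validUTF8(raw_txt)

# Convert from the ASSUMED source encoding (CP1250) to UTF-8.
fixed <- iconv(raw_txt, from = "CP1250", to = "UTF-8")
validUTF8(fixed)

# Lower-casing should now go through without the error.
tolower(fixed)

# With tm, the encoding can probably be declared when the source is created
# instead (again assuming CP1250), e.g.:
# tekst <- Corpus(DirSource(".", encoding = "CP1250"))
```

If the file turns out to use a different encoding, only the `from` value (or the `encoding` argument) should need to change.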
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Matevž Pavlič
Sent: Saturday, May 21, 2011 1:27 PM
To: r-help at r-project.org
Cc: feinerer at logic.at
Subject: [R] DocumentTermMatrix error
Hi all,
I have tried to create a DocumentTermMatrix with the tm package, but I get this error:
Error in tolower(txt) :
invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
I followed the approach shown in:
http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining),
with this R code:
setwd("C:/Users/mpavlic/Desktop/temp")
tekst <- Corpus(DirSource("."))
>Warning message:
>In readLines(y, encoding = x$Encoding) :
>incomplete final line found on './test.txt'
meta(tekst, "Heading", "local") <- c("test")
meta(tekst[[1]])
>Available meta data pairs are:
>Author :
>DateTimeStamp: 2011-05-21 11:25:21
>Description :
>Heading : test
>ID : test.txt
>Language : en
>Origin :
test <- TermDocumentMatrix(tekst)
> Error in tolower(txt) :
> invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
Attached is a small sample (test.txt) that I worked with.
Any help would be appreciated,
m