[R] tm: Read a single text file into a corpus as single document?
Alexander James Rickett
ack.vandal at gmail.com
Tue Jul 19 10:11:46 CEST 2011
Hello everyone,
I'm doing some development on JGR (a GUI front end for R), specifically adding functionality from tm. To let users select text files from a file dialog and turn them into a corpus, I need to be able to generate a corpus from a *SINGLE* text file as a single document, and to append a new document to an existing corpus. I know that if I could read each file into a single character vector I'd be in business, but I can't find out how to do that either. This seems like it should be a no-brainer, so I'm at my wits' end.
Here's pseudo code of what I'd like to be able to do:
##########################################
> corp1doc <- Corpus(singleTextDocSource("path/to/doc")) #read in 1 text doc as a 1-document corpus
> corp1doc
A corpus with 1 text document
> corp1doc[[2]] <- AnotherSingleTextDoc("path/to/doc") #append a second document to the same corpus
> corp1doc
A corpus with 2 text documents
##########################################
I can almost do this with DirSource by setting pattern='filename', but that requires me to also split off the path to the enclosing directory, which shouldn't be necessary.
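For reference, here is a rough sketch of that workaround. The helper name single_file_corpus is just something made up for illustration (it is not part of tm), and it assumes tm is installed and that DirSource passes pattern on as a regular expression:

```r
# Hypothetical helper (the name is made up, not part of tm):
# wraps the DirSource workaround for a single file.
single_file_corpus <- function(path) {
  # DirSource wants a directory plus a file name pattern,
  # so the full path has to be split into its two parts first.
  doc_dir  <- dirname(path)
  doc_file <- basename(path)
  # Caveat (assumption): pattern is treated as a regular expression, so a
  # name like "doc.txt" could also match other files in the same directory.
  tm::Corpus(tm::DirSource(doc_dir, pattern = doc_file))
}
```

Calling single_file_corpus("path/to/doc") should then give a one-document corpus, but the splitting and the loose pattern match are exactly the parts I'd like to avoid.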
Thanks for taking a look!