[R] TD Matrix
Huntsinger, Reid
reid_huntsinger at merck.com
Fri Mar 18 02:11:59 CET 2005
Do you mean when you encounter a new term? I would think document *length*
wouldn't matter; presumably you have a list of terms already. If so, you
could treat each document as a vector of integer term codes (indices into
that list), then use "tabulate" to get the column of counts for that document.
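
A rough sketch of what I mean, assuming a fixed term list is already in hand. The names "terms", "docs" and "td_column" and the toy data are made up for illustration; only match(), tabulate() and vapply() are actual base R functions.

```r
# Hypothetical vocabulary and tokenized documents (illustrative data only)
terms <- c("apple", "banana", "cherry", "date")
docs <- list(
  d1 = c("apple", "banana", "apple"),
  d2 = c("cherry", "apple", "date", "date", "date", "banana")
)

# Turn one document into a column of term counts.
# match() maps each token to its integer code in the term list;
# tabulate() counts the codes, with nbins fixing the column length
# so every document yields a vector of the same size.
td_column <- function(tokens, terms) {
  codes <- match(tokens, terms)
  codes <- codes[!is.na(codes)]   # drop tokens that are not in the term list
  tabulate(codes, nbins = length(terms))
}

# One column per document; rows follow the order of the term list
td <- vapply(docs, td_column, integer(length(terms)), terms = terms)
rownames(td) <- terms
print(td)
```

Since every column has length(terms) entries no matter how long the document is, nothing needs resizing when a longer document comes along. Raw counts are used here; any weighting would be applied to the result.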
If you're using all terms that appear in any document, and you don't want to
compile a list of terms first, then you might want to think of creating a
sparse representation as in the SparseM package and using the sparse linear
algebra routines there. Just an idea, though.
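
To make the sparse idea a bit more concrete, here is a sketch in base R only that stores one (term, document, count) triplet per nonzero cell, so the term list never has to be compiled up front. This is not SparseM's own API; the names "docs", "triplets", "vocab", "i" and "j" and the toy data are assumptions for illustration, and the index vectors are the sort of thing a sparse matrix package would take as input.

```r
# Hypothetical tokenized documents (illustrative data only)
docs <- list(
  d1 = c("apple", "banana", "apple"),
  d2 = c("cherry", "apple", "date", "date")
)

# Count terms within each document; table() finds the terms as it goes
counts <- lapply(docs, table)

# Coordinate (triplet) form: one row per nonzero cell of the matrix
triplets <- data.frame(
  term  = unlist(lapply(counts, names), use.names = FALSE),
  doc   = rep(names(docs), lengths(counts)),
  count = unlist(lapply(counts, as.vector), use.names = FALSE),
  stringsAsFactors = FALSE
)

# The vocabulary falls out of the triplets afterwards;
# integer row and column indices follow from match()
vocab <- sort(unique(triplets$term))
i <- match(triplets$term, vocab)
j <- match(triplets$doc, names(docs))

# Dense reconstruction, only to check the triplets on this toy example
dense <- matrix(0L, nrow = length(vocab), ncol = length(docs),
                dimnames = list(vocab, names(docs)))
dense[cbind(i, j)] <- triplets$count
print(dense)
```

Adding a document then just appends rows to the triplet table, and a new term only extends the vocabulary.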
Reid Huntsinger
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ryan Steckel
Sent: Thursday, March 17, 2005 6:01 PM
To: r-help at stat.math.ethz.ch
Subject: [R] TD Matrix
I'm trying to create a term document matrix where the columns are the
documents, the rows are the terms in the documents, and the cells are a
weight of term frequency in the document. My problem is the documents
are all different lengths. So when I add a new document, if the document
length is greater than the max document length in the matrix, I have to
resize the matrix and do a cbind operation.
Does anyone know of an easier way?