[R] Finding unique elements faster

apeshifter ch_koch at gmx.de
Tue Dec 9 11:02:57 CET 2014

Thank you all for your suggestions! I must say I am amazed by the number of
people who are willing to help out a stranger! Feels like it was a good idea
to start using R - back when I was still using Perl for such tasks, I would
have been happy to have this kind of support!

@ Gheorghe Postelnicu: Unfortunately, the data is not yet in a data frame
when this part of the program starts. At this point, I am trying to fill in
all the relevant vectors (all.word.pairs, word1, word2, freq.word1,
freq.word2, typefreq.w1, typefreq.w2, ...) and then combine them into a data
frame. I will try to get my head around the doParallel package for the
foreach loop, since parallel computing would certainly be helpful.
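For reference, a toy sketch of the "fill the vectors, then build the data
frame" step I mean (the vector names are from my script; the toy data here
just stands in for the real corpus, and the table() lookups are only one
possible way to get the token frequencies):

```r
# Toy data standing in for the word tokens read from the corpus files.
word1 <- c("a", "b", "a", "c")
word2 <- c("b", "c", "b", "d")
all.word.pairs <- paste(word1, word2)

# Token frequency of each word, mapped back onto the token vectors
# by indexing the count table with the (repeated) word tokens.
freq.word1 <- as.vector(table(word1)[word1])
freq.word2 <- as.vector(table(word2)[word2])

# Only once all vectors are filled do they get combined:
d <- data.frame(all.word.pairs, word1, word2, freq.word1, freq.word2,
                stringsAsFactors = FALSE)
```

This keeps the pairs in corpus order, which is the constraint I mentioned
below in my reply to Stefan.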

@ Jeff Newmiller: Sounds interesting, but I fear the same problem applies as
with Gheorghe's suggestion: I would need a data frame first, for which I do
not yet have all the correct values... I will keep the package in mind,
though, for future projects.

@ Stefan Evert-3: I am not sure I understand what you mean in the second
example. Since the counting of types is exactly my problem at the moment, I
do not see how I could provide a function that would work more efficiently
in the context you are describing. The line of code I gave is exactly my
attempt at doing this... Sorry, I might just not be getting what you are
aiming at... :-/  Your assumptions are quite correct, though: word1 and
word2 do indeed contain word tokens, as does all.word.pairs. The reason for
this is that I need the word pairs within the vector to be in the same order
as they appeared in the original corpus files. Also, thank you for the link;
I will check it out when I am analysing collocates, although I didn't find
notes on my specific problem in the slides. Please do not think I was not
using reference material when designing my script: I was in fact using
Gries 2009, "Quantitative Corpus Linguistics with R", for this. The trouble
is that the methods in the book help as far as simple n-gram frequency
calculations are concerned (since, e.g., table() would just do the trick),
but methods for this number of repeated checks on tables are not included.
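To make the type-counting part concrete: if I understand my own typefreq.w1
vector correctly, it should hold, for each word1 token, the number of
distinct word2 types that word combines with. Instead of a repeated check
per token, I am now trying a single grouped pass and a lookup (toy data
again, not my real corpus):

```r
# Toy tokens; each word1 token needs its count of distinct word2 partners.
word1 <- c("a", "b", "a", "a", "c")
word2 <- c("x", "y", "x", "z", "y")

# Count distinct partner types once per word1 type...
types.per.w1 <- tapply(word2, word1, function(x) length(unique(x)))

# ...then map the counts back onto the token vector, preserving corpus order.
typefreq.w1 <- as.vector(types.per.w1[word1])
```

Whether this is what Stefan had in mind with providing a function, I am not
sure, but it avoids re-scanning the table for every single token.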


View this message in context: http://r.789695.n4.nabble.com/Finding-unique-elements-faster-tp4700539p4700582.html
Sent from the R help mailing list archive at Nabble.com.
