[R] Fanny Clustering
Philippe Grosjean
phgrosjean at sciviews.org
Thu Mar 29 13:06:08 CEST 2007
Sergio Della Franca wrote:
> Ok,
>
> How can i increase the memory of your computer available to R?
Well, if you would like to increase memory of MY computer... you are
welcome to do so... but I doubt it would be of any use for you ;-)
You don't tell us how much RAM you currently have, which platform you
use, etc. The general approach is to use a computer with more RAM, up
to the limit that a 32-bit system allows R to address, and then to switch
to a 64-bit version under Linux if you need even more RAM.
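To get a feeling for how much memory is at stake, here is a rough back-of-the-envelope sketch. It assumes that fanny() works from the full set of pairwise dissimilarities (n * (n - 1) / 2 values stored as doubles), which is my reading of the cluster package documentation; dissim_gb() is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: memory (in GB) needed to hold all pairwise
# dissimilarities among n cases, at 8 bytes per double.
dissim_gb <- function(n) {
  n_pairs <- n * (n - 1) / 2
  n_pairs * 8 / 1e9
}

n <- 70000                       # rows reported in the original question
n_pairs <- n * (n - 1) / 2       # number of pairwise dissimilarities

# More elements than a single R vector could hold in 2007 (2^31 - 1)
too_long <- n_pairs > .Machine$integer.max

gb_full <- dissim_gb(n)          # full data set
gb_sub  <- dissim_gb(5000)       # a 5,000-row subsample, for comparison

print(too_long)
print(round(c(full = gb_full, subsample = gb_sub), 2))
```

Under that assumption, the full data set needs roughly 19.6 GB for the dissimilarities alone, while a 5,000-row subsample needs about 0.1 GB, which is why subsampling is usually the more practical route.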
The other proposed solution is not stupid. With 70,000 cases, you have a
fairly large dataset. You don't tell us how many groups you expect from
your clustering, but it is often better to use a few tens, or a few
hundred, representative cases for each group, no more. In supervised
classification, it is easier to build such a training set with a
relatively balanced number of items in each group, because the target
classification is known a priori from the manual classification provided.
With unsupervised classification, you could either try pure random
subsampling, or select your subsample based on similarity according to a
given distance measure. I did something like that using a
Mahalanobis distance, MDS, and then stratified subsampling inside a
regular grid placed on top of the MDS plot.
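A minimal sketch of that kind of grid-based subsampling could look like the code below. This is only an illustration under several assumptions, not the code I actually used: grid_subsample() and its arguments are hypothetical names, classical MDS via cmdscale() stands in for whichever MDS variant you prefer, the 10 x 10 grid and 5 cases per cell are arbitrary, and the data are simulated.

```r
library(cluster)  # recommended package shipped with R; provides fanny()

# Hypothetical helper: keep at most `per_cell` rows from each cell of a
# regular grid laid over a 2-D MDS map of Mahalanobis distances.
grid_subsample <- function(x, n_breaks = 10, per_cell = 5) {
  # Whitening with the inverse Cholesky factor of the covariance makes
  # Euclidean distances on z equal to Mahalanobis distances on x.
  z <- x %*% solve(chol(cov(x)))
  mds <- cmdscale(dist(z), k = 2)          # classical (metric) MDS
  cell <- interaction(cut(mds[, 1], n_breaks),
                      cut(mds[, 2], n_breaks), drop = TRUE)
  rows_by_cell <- split(seq_len(nrow(x)), cell)
  keep <- lapply(rows_by_cell, function(i) {
    if (length(i) > per_cell) sample(i, per_cell) else i
  })
  sort(unlist(keep, use.names = FALSE))
}

set.seed(1)
# Simulated stand-in for the real data: two groups, 300 cases each,
# 3 variables.
x <- rbind(matrix(rnorm(900, mean = 0), ncol = 3),
           matrix(rnorm(900, mean = 4), ncol = 3))

keep <- grid_subsample(x)
fit <- fanny(x[keep, ], k = 2)   # k = 2 is an assumption for this toy data
print(table(fit$clustering))
```

Note that the MDS step itself needs all pairwise distances, so with 70,000 rows you would first draw a plain random subsample small enough for dist() and cmdscale(), and only then apply the grid to thin out the dense regions.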
Otherwise, I am not a specialist in unsupervised classification, and
other people here may have better suggestions.
Best,
Philippe Grosjean
>
> 2007/3/29, Philippe Grosjean <phgrosjean at sciviews.org>:
>> 1) Reduce the size of your sample (random or stratified subsampling),
>>
>> 2) Increase the memory of your computer available to R.
>>
>> Best,
>>
>> Philippe Grosjean
>>
>> ..............................................<°}))><........
>> ) ) ) ) )
>> ( ( ( ( ( Prof. Philippe Grosjean
>> ) ) ) ) )
>> ( ( ( ( ( Numerical Ecology of Aquatic Systems
>> ) ) ) ) ) Mons-Hainaut University, Belgium
>> ( ( ( ( (
>> ..............................................................
>>
>> Sergio Della Franca wrote:
>>> Dear R-Helpers,
>>>
>>>
>>> I'd like to develop a fanny clustering on my data set(70.000 rows), but when
>>> i run the procedure i obtain this error:
>>>
>>> error in vector("double", length): too big dimension for
>>> the selected vector.
>>>
>>>
>>> How can i solve this problem?
>>>
>>>
>>> Thank you in advance.
>>>
>>>
>>> Sergio Della Franca.
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>