[R] cluster by unique value

jim holtman jholtman at gmail.com
Mon Jul 18 13:39:36 CEST 2011


Also read R FAQ 7.31 ("Why doesn't R think these numbers are equal?")
before using 'numerics' as grouping factors.
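
A small sketch of the kind of surprise that FAQ entry describes, using made-up values rather than the poster's data: two doubles that print the same can still differ in their last bits, so an exact comparison treats them as two groups. Rounding to 3 decimal places is only an assumption for illustration; the right precision depends on the data.

```r
# Two values that both print as 0.3 but are not the same double
x <- c(0.1 + 0.2, 0.3)
print(x)

same_exact <- x[1] == x[2]                      # exact comparison
same_tol   <- isTRUE(all.equal(x[1], x[2]))     # comparison within a tolerance

# Number of distinct groups seen by an exact comparison ...
n_raw <- length(unique(x))
# ... and after rounding to a chosen precision (3 digits is an assumption)
n_rounded <- length(unique(round(x, 3)))
```

Rounding (or formatting to a fixed number of digits) before building the grouping key makes the intended precision explicit instead of leaving it to the default number-to-character conversion.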

On Mon, Jul 18, 2011 at 6:36 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Your data1 and your data1_class file differ in the first three
> columns. Assuming that's an error, here's one way to do it:
>
>> data1 <- data.frame(layer1=c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),layer2=c(2,3,2,2,1,2,3,2,2,2), layer3=c(1,1,1,1,1,1,1,1,1,4))
>> data1 <- cbind(data1, class=as.numeric(as.factor(do.call(paste, data1))))
>> data1
>   layer1 layer2 layer3 class
> 1     0.2      2      1     2
> 2     0.5      3      1     4
> 3     0.2      2      1     2
> 4     0.8      2      1     5
> 5     0.2      1      1     1
> 6     0.5      2      1     3
> 7     0.5      3      1     4
> 8     0.8      2      1     5
> 9     0.2      2      1     2
> 10    0.8      2      4     6
>
> You didn't give a reproducible example, and I didn't want to type in
> all the decimal places, but you should be able to get the idea from
> this example. Note that the class numbers follow the sorted order of
> the pasted character rows, from lowest to highest, rather than the
> order of first appearance used in your example. If you do need the
> latter, some combination of unique() and subsetting or merge() may
> work for you.
>
> Sarah
>
> On Mon, Jul 18, 2011 at 6:23 AM, Alfredo Alessandrini
> <alfreale74 at gmail.com> wrote:
>> Hi,
>>
>> I need to make a cluster classification by the unique values of the data frame.
>>
>> Let me explain the problem. I need to classify this table and assign
>> each row that has the same combination of values to the same cluster:
>>
>>
>>> data1
>>             layer_1 layer_2 layer_3
>>   [1,] 0.2460000000       2    -0.1
>>   [2,] 0.5460000000       3    -0.1
>>   [3,] 0.2460000000       2    -0.1
>>   [4,] 0.8460000000       2    -0.1
>>   [5,] 0.2460000000       1    -0.1
>>   [6,] 0.5460000000       2    -0.1
>>   [7,] 0.2460000000       2    -0.1
>>   [8,] 0.8460000000       2    -0.1
>>   [9,] 0.2460000000       2    -0.1
>>  [10,] 0.2460000000       2    -0.1
>>
>>
>>> data1_class
>>             layer_1 layer_2 layer_3 class
>>   [1,] 0.2460000000       2    -0.1  1
>>   [2,] 0.5460000000       3    -0.1  2
>>   [3,] 0.2460000000       2    -0.1  1
>>   [4,] 0.8460000000       2    -0.1  3
>>   [5,] 0.2460000000       1    -0.1  4
>>   [6,] 0.5460000000       2    -0.1  5
>>   [7,] 0.5460000000       3    -0.1  2
>>   [8,] 0.8460000000       2    -0.1  3
>>   [9,] 0.2460000000       2    -0.1  1
>>  [10,] 0.8460000000       2    -0.4  6
>>
>>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>
>
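
For the case mentioned in the quoted reply, where classes should be numbered by first appearance rather than by sorted order, one possible sketch is below. It reuses the simplified data1 from the reply; the helper variable `key` and the use of match() are choices made here for illustration and are not taken from the thread.

```r
# Simplified example data from the quoted reply
data1 <- data.frame(layer1 = c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),
                    layer2 = c(2, 3, 2, 2, 1, 2, 3, 2, 2, 2),
                    layer3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 4))

# One character key per row, built from all columns
key <- do.call(paste, data1)

# unique() keeps keys in order of first appearance, so match() returns
# the position of each row's key in that order: the class number
data1$class <- match(key, unique(key))
data1
```

With this data the classes come out as 1 2 1 3 4 5 2 3 1 6, the same numbering pattern as the class column in the original question. The key is still built by converting numbers to character, so the FAQ 7.31 caveat applies here as well.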



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


