[R] Why 'gbm' is not giving me error when I change the response from numeric to categorical?
Marc Schwartz
marc_schwartz at me.com
Fri Oct 4 21:50:26 CEST 2013
On Oct 4, 2013, at 2:35 PM, peter dalgaard <pdalgd at gmail.com> wrote:
>
> On Oct 4, 2013, at 21:16 , Mary Kindall wrote:
>
>> Y[Y < mean(Y)] = 0 #My edit
>> Y[Y >= mean(Y)] = 1 #My edit
>
> I have no clue about gbm, but I don't think the above does what I think you think it does.
>
> Y <- as.integer(Y >= mean(Y))
>
> might be closer to the mark.
Good catch Peter! I didn't pay attention to that initially.
Here is an example:
set.seed(1)
Y <- rnorm(10)
> Y
[1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078 -0.8204684
[7] 0.4874291 0.7383247 0.5757814 -0.3053884
> mean(Y)
[1] 0.1322028
Before changing Y:
> Y[Y < mean(Y)]
[1] -0.6264538 -0.8356286 -0.8204684 -0.3053884
> Y[Y >= mean(Y)]
[1] 0.1836433 1.5952808 0.3295078 0.4874291 0.7383247 0.5757814
However, the incantation that Mary is using, which calculates mean(Y) separately in each call, results in:
Y[Y < mean(Y)] = 0
> Y
[1] 0.0000000 0.1836433 0.0000000 1.5952808 0.3295078 0.0000000
[7] 0.4874291 0.7383247 0.5757814 0.0000000
# mean(Y) is no longer the original value from above
> mean(Y)
[1] 0.3909967
Thus:
Y[Y >= mean(Y)] = 1
> Y
[1] 0.0000000 0.1836433 0.0000000 1.0000000 0.3295078 0.0000000
[7] 1.0000000 1.0000000 1.0000000 0.0000000
Two of the values in Y (0.1836433 and 0.3295078) are left unchanged: they are at or above the original mean (0.1322028) but below the recalculated mean (0.3909967), so neither assignment touches them. As Peter noted, you don't end up with a dichotomous vector.
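If the two-step assignment is to be kept, one way around the moving threshold is to compute the mean once and build the comparison from the untouched values. This is only a minimal sketch under the same seed as above; the names cutoff, below and Y2 are mine, not from Mary's code:

```r
set.seed(1)
Y <- rnorm(10)

# Compute the threshold once, before anything in Y is modified
cutoff <- mean(Y)

# Build the logical mask from the original values. Reusing a stored cutoff
# alone is not enough in general: with a negative cutoff, the zeros written
# by the first assignment would satisfy ">= cutoff" and be turned into 1.
below <- Y < cutoff

Y2 <- Y
Y2[below]  <- 0
Y2[!below] <- 1
Y2
```

With this seed, Y2 holds the same 0/1 pattern that as.integer(Y >= mean(Y)) gives on the original Y.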
Using Peter's method on the original Y (regenerated with set.seed(1) and rnorm(10) as above, not the modified vector):
Y <- as.integer(Y >= mean(Y))
> Y
[1] 0 1 0 1 1 0 1 1 1 0
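A quick way to see the difference between the two approaches is to count the distinct values each one leaves behind. The following is a sketch using base R only; Y_two_step and Y_one_step are illustrative names, not from the thread:

```r
set.seed(1)
Y <- rnorm(10)

# Two-step version: mean() is re-evaluated on the already modified vector
Y_two_step <- Y
Y_two_step[Y_two_step < mean(Y_two_step)] <- 0
Y_two_step[Y_two_step >= mean(Y_two_step)] <- 1

# One-step version: a single comparison against a single mean
Y_one_step <- as.integer(Y >= mean(Y))

# Count distinct values in each result
n_two_step <- length(unique(Y_two_step))  # more than 2 here: not dichotomous
n_one_step <- length(unique(Y_one_step))  # exactly 2
c(two_step = n_two_step, one_step = n_one_step)
```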
That being said, the original advice stands: don't do this at all, because dichotomizing a continuous response throws away information.
Regards,
Marc Schwartz