[R] Remove highly correlated variables from a data frame or matrix

Bert Gunter <bgunter.4567 using gmail.com>
Thu Nov 14 21:09:08 CET 2019


Obvious advice:

DON'T DO THIS!

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, Nov 14, 2019 at 10:50 AM Ana Marija <sokovic.anamarija using gmail.com>
wrote:

> Hello,
>
> I have a data frame like this (a matrix):
> head(calc.rho)
>             rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
> rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
> rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
> rs145984817     0.975     0.309     0.371     0.371     0.371     0.638
> rs1807401       0.975     0.309     0.371     0.371     0.371     0.638
> rs1807402       0.975     0.309     0.371     0.371     0.371     0.638
> rs35350506      0.975     0.309     0.371     0.371     0.371     0.638
>
> > dim(calc.rho)
> [1] 246 246
>
> I would like to remove all highly correlated variables from this data,
> i.e. those with a correlation greater than 0.8.
>
> I tried this:
>
> > data<- calc.rho[,!apply(calc.rho,2,function(x) any(abs(x) > 0.80))]
> > dim(data)
> [1] 246   0
>
> But this removes everything.
>
> Can you please advise?
>
> Thanks,
> Ana
>
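
A note on why the quoted attempt returns zero columns, with a small sketch.
The test any(abs(x) > 0.80) is applied to each whole column, and if calc.rho
is a correlation matrix over the same variables in rows and columns (an
assumption; the post shows only the first six rows), every column contains
that variable's correlation with itself, which is 1. Every column therefore
fails the test and is dropped. The sketch below reproduces this on a
hand-made 4 x 4 toy matrix and then shows one possible greedy alternative
that ignores the diagonal. The function name drop_correlated, its arguments,
and the toy data are hypothetical and not from the original thread; the code
also assumes a square, symmetric matrix with rows and columns in the same
order and no missing values.

```r
# Toy correlation matrix (made up for illustration): x1 and x2 are highly correlated.
v <- c("x1", "x2", "x3", "x4")
rho <- matrix(c(1.00, 0.95, 0.10, 0.20,
                0.95, 1.00, 0.15, 0.10,
                0.10, 0.15, 1.00, 0.30,
                0.20, 0.10, 0.30, 1.00),
              nrow = 4, byrow = TRUE, dimnames = list(v, v))

# The attempt from the post: the diagonal 1s exceed 0.8 in every column,
# so no column survives.
attempt <- rho[, !apply(rho, 2, function(x) any(abs(x) > 0.80)), drop = FALSE]
dim(attempt)  # 4 0

# Hypothetical helper: repeatedly drop the variable that takes part in the
# most above-cutoff pairs until no off-diagonal |correlation| exceeds cutoff.
drop_correlated <- function(rho, cutoff = 0.8) {
  stopifnot(is.matrix(rho), nrow(rho) == ncol(rho))
  a <- abs(rho)
  diag(a) <- 0                       # ignore self-correlations
  keep <- rep(TRUE, ncol(a))
  repeat {
    sub <- a[keep, keep, drop = FALSE]
    if (max(sub) <= cutoff) break    # nothing left above the cutoff
    n_high <- colSums(sub > cutoff)  # above-cutoff partners per variable
    worst <- which(keep)[which.max(n_high)]
    keep[worst] <- FALSE             # ties go to the first such variable
  }
  rho[keep, keep, drop = FALSE]
}

reduced <- drop_correlated(rho, cutoff = 0.8)
colnames(reduced)  # "x2" "x3" "x4"
```

Which member of a correlated pair gets dropped here is arbitrary (ties go to
the first column), so the result depends on column order. The caret package's
findCorrelation() offers a similar cutoff-based selection. Whether discarding
variables this way is a good idea at all is a separate question, and the
reply above advises against it.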
