[R] Choosing and preserving a random duplicate

Peter Alspach Peter.Alspach at plantandfood.co.nz
Wed Mar 31 01:57:33 CEST 2010


Tena koe Jeff

If I understand you correctly, one approach would be to shuffle the rows
of your dataframe, keep only the first row for each gene (which is now a
randomly chosen one), and then sort the result by PVAL to get back to the
original order:

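# shuffle the rows
g10dfA <- g10df[sample(nrow(g10df)), ]
# duplicated() flags every occurrence after the first, so this keeps one row per gene
g10dfA <- g10dfA[!duplicated(g10dfA$GENE), ]
# restore ascending PVAL order
g10dfA <- g10dfA[order(g10dfA$PVAL), ]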

All untested.
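
For what it is worth, here is a minimal self-contained sketch of the three steps above that you could paste into a fresh session. The toy data are invented for illustration (the gene names come from your post, but the duplicated rows and the p-values are made up), and set.seed() is only there so the random choice is repeatable:

```r
# Toy data, invented for illustration: sorted by ascending PVAL,
# duplicated genes, unique p-values.
g10df <- data.frame(
  GENE = c("KCTD12", "UNC93A", "KCTD12", "CDKN3", "UNC93A", "DAB2"),
  PVAL = c(1e-22, 2e-22, 3e-22, 4e-22, 5e-22, 6e-22),
  stringsAsFactors = FALSE
)

set.seed(1)  # assumption: fixed seed, only so the result is repeatable

# 1. Shuffle the rows.
g10dfA <- g10df[sample(nrow(g10df)), ]
# 2. Keep the first row seen for each gene; after shuffling, that is a random one.
g10dfA <- g10dfA[!duplicated(g10dfA$GENE), ]
# 3. Sort by PVAL to restore the original relative order.
g10dfA <- g10dfA[order(g10dfA$PVAL), ]

g10dfA
```

Sorting by PVAL restores the original order only because the dataframe was sorted by PVAL to begin with. If it were in some other order, sorting on as.numeric(rownames(g10dfA)) should do the same job, assuming the default row names 1, 2, 3, ...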

HTH ....

Peter Alspach

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of jeff.m.ewers
> Sent: Wednesday, 31 March 2010 12:33 p.m.
> To: r-help at r-project.org
> Subject: [R] Choosing and preserving a random duplicate
> 
> 
> Dear R-Helpers,
> 
> I have a dataframe (g10df) formatted like this:
> 
>     GENE             PVAL
> 1 KCTD12      4.06904e-22
> 2 UNC93A      9.91852e-22
> 3  CDKN3      1.24695e-21
> 4 CLEC2B      4.71759e-21
> 5   DAB2      1.12062e-20
> 
> The rows are ranked in ascending order by PVAL, and I need to end up with
> the same relative order. There are duplicate entries for genes in the first
> column with corresponding p-values in the second, but the p-values are
> unique. I had intended to use the plyr package to remove these duplicates:
> 
> ddply(g10df, "GENE", summarise, PVAL = mean(PVAL))
> 
> But it occurred to me that instead of averaging the p-values for each set
> of duplicates, I should select one duplicate at random and remove the rest.
> 
> I am relatively new to R, and I have not been able to find a way to do this,
> with plyr or otherwise. Any help would be greatly appreciated.
> 
> Thanks and best regards,
> 
> Jeff
> 
> 
> 
> 
> --
> View this message in context:
> http://n4.nabble.com/Choosing-and-preserving-a-random-duplicate-tp1746091p1746091.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


