[R] Using apply function on duplicates in a data.frame
David Winsemius
dwinsemius at comcast.net
Mon Feb 1 00:22:04 CET 2010
On Jan 31, 2010, at 6:05 PM, Sunny Srivastava wrote:
> Dear R-Helpers,
> I have a data.frame (df) and the head of data.frame looks like
>
>      ProbeUID ControlType     ProbeName GeneName SystematicName
> 1665     1577           0 pSysX_50_22_1 pSysX_50       pSysX_50
> 5422     5147           0  pSysX_49_8_1 pSysX_49       pSysX_49
> 4042     3843           0 pSysX_51_18_1 pSysX_51       pSysX_51
> 3646     3466           0   sll1514_0_2  sll1514        sll1514
> 2946     2807           0   sll1514_0_1  sll1514        sll1514
> 624       582           0  pSysX_49_8_2 pSysX_49       pSysX_49
>
>      Description   logFC AveExpr       t    P.Value  adj.P.Val
> 1665     Unknown  4.3887  9.5662  61.038 1.0938e-08 9.4449e-05
> 5422     Unknown -3.5251  6.9103 -35.908 1.7596e-07 3.5912e-04
> 4042     Unknown  2.5302  8.7497  35.112 1.9786e-07 3.5912e-04
> 3646     Unknown  2.3457 11.1678  33.962 2.3549e-07 3.5912e-04
> 2946     Unknown  2.3151 11.3153  32.689 2.8751e-07 3.5912e-04
> 624      Unknown -3.6256  6.8986 -31.777 3.3333e-07 3.5912e-04
>
>           B
> 1665 9.8342
> 5422 8.1650
> 4042 8.0758
> 3646 7.9408
> 2946 7.7822
> 624  7.6622
>
tdf <- tapply(df$logFC, df$GeneName, mean)
ndf <- data.frame(Gnames = names(tdf), mn.logFC = tdf)
> I want to "collapse" this data frame into a new data.frame so that
> df$GeneName contains no duplicate GeneNames (e.g. sll1514) and
> df$logFC contains the average of the df$logFC values for each
> duplicated GeneName.
>
> I am aware of an inefficient strategy using loops, but I believe
> there should be a way using the apply functions or maybe plyr?
>
> I am not able to think of one at the moment. Can you please help me?
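
The tapply() suggestion above can be tried on a small stand-in for the poster's data. The following is only a sketch: the toy df holds just the GeneName and logFC columns copied from the head() output, and the aggregate() call is an alternative base-R route added here for comparison, not something from the original reply.

```r
# Toy data: only the two relevant columns, values taken from the posted head()
df <- data.frame(
  GeneName = c("pSysX_50", "pSysX_49", "pSysX_51",
               "sll1514", "sll1514", "pSysX_49"),
  logFC    = c(4.3887, -3.5251, 2.5302, 2.3457, 2.3151, -3.6256),
  stringsAsFactors = FALSE
)

# tapply route: one mean per distinct GeneName, returned as a named 1-d array
tdf <- tapply(df$logFC, df$GeneName, mean)

# Wrap names and values in a data.frame; as.vector() drops the array
# attributes so mn.logFC is a plain numeric column
ndf <- data.frame(Gnames = names(tdf), mn.logFC = as.vector(tdf))

# Alternative (an assumption of this sketch): aggregate() with a formula
# returns the collapsed data.frame in a single step
adf <- aggregate(logFC ~ GeneName, data = df, FUN = mean)

print(ndf)
print(adf)
```

Both results have one row per GeneName, with the duplicated names (sll1514 and pSysX_49) replaced by the mean of their logFC values. If other columns also need to be carried along, they would have to be summarised separately or merged back in.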
David Winsemius, MD
Heritage Laboratories
West Hartford, CT