[R] Re: reducing data.frame

Thu Feb 25 08:04:51 CET 2010

Hi

You can use aggregate or tapply. You did not specify which function to use 
for the "reduction", so I assume mean.

aggregate(multi[, c("n", "wi", "wi.tau", "z", "k")], multi[, c("id", "r")], mean, na.rm = TRUE)

But this does not handle the character columns. For those you could try 
?ave, or a split/sapply approach.
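
A minimal sketch of one way to cover both kinds of column, assuming that "first non-missing value per group" is an acceptable rule for the character columns. It reuses aggregate() with a small helper rather than ave or split/sapply. The toy data and the helper name first_non_na are made up for illustration; only the column names come from the post.

```r
# Toy data loosely modelled on the poster's 'multi' (values are invented)
multi <- data.frame(
  id      = c("100", "100", "100", "101", "101"),
  r       = c(0.28, 0.28, 0.28, 0, 0.542379),
  wi.tau  = c(21.7, 21.8, 22.4, 19.7, 17.7),
  k       = c(210, 182, 206, 182, 98),
  eml     = c(NA, "Early", NA, "Early", NA),
  a.rater = c(NA, NA, "Client", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value, or NA if all are missing
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0) x[1] else NA_character_
}

keys <- multi[, c("id", "r")]

# Numeric columns: group means, as in the aggregate() call above
num <- aggregate(multi[, c("wi.tau", "k")], keys, mean, na.rm = TRUE)

# Character columns: keep one surviving label per (id, r) group
chr <- aggregate(multi[, c("eml", "a.rater")], keys, first_non_na)

# Both results use the same grouping, so their rows line up
reduced <- cbind(num, chr[, c("eml", "a.rater")])
print(reduced)
```

If a group can hold several different labels in one column, the helper silently keeps only the first; pasting the unique values together would be an alternative rule.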

Another possible issue is the r values: they appear to be fractional 
numerics, and depending on how they were computed, values that print 
identically may not be exactly equal, so they would end up in different groups.
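
A small illustration of that point, plus one possible workaround: rounding r before using it as a grouping key. The choice of 7 decimals is an assumption based on the precision printed in the post, not something the data guarantees.

```r
# Two values that print the same but differ in the last bits
r <- c(0.1 + 0.2, 0.3)
print(r)                                       # both display as 0.3

exact_equal  <- r[1] == r[2]                   # exact comparison
nearly_equal <- isTRUE(all.equal(r[1], r[2]))  # comparison within tolerance

# Assumption: rounding to 7 decimals is fine enough to keep real differences
r_key <- round(r, 7)
n_groups <- length(unique(r_key))

print(c(exact = exact_equal, nearly = nearly_equal))
print(n_groups)
```

A rounded key such as round(multi$r, 7) could then be passed to aggregate in place of the raw r column.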

Regards
Petr

r-help-bounces at r-project.org wrote on 25.02.2010 06:44:03:

> Hi All,
> 
> Is there an easy way to reduce a data.frame to 1 'id' per row while keeping
> information from the other rows of that same variable, if applicable? e.g.:
> 
> # data
> 
>  multi[1:15,]
>      id         r  n wi   wi.tau         z   k alliance a.rater   eml   treatment outcome  o.rater german
> 1   100 0.2800000 44 41 21.72514 0.2876821 210     <NA>    <NA>  <NA>        <NA>    <NA>   Client   <NA>
> 2   100 0.2800000 44 41 21.80953 0.2876821 182     <NA>    <NA> Early        <NA>    <NA>     <NA>   <NA>
> 3   100 0.2800000 44 41 22.36641 0.2876821 206     <NA>  Client  <NA>        <NA>    <NA>     <NA>   <NA>
> 4   100 0.2800000 44 41 23.59224 0.2876821 188     <NA>    <NA>  <NA>        <NA>    <NA>     <NA>  Other
> 5   100 0.2800000 44 41 23.83157 0.2876821 147      WAI    <NA>  <NA>        <NA>    <NA>     <NA>   <NA>
> 6   101 0.0000000 37 34 19.65678 0.0000000 182     <NA>    <NA> Early        <NA>    <NA>     <NA>   <NA>
> 7   101 0.5423790 37 34 17.65078 0.6075200  98     <NA>    <NA>  <NA> Psychodymic    <NA>     <NA>   <NA>
> 8   101 0.5423790 37 34 19.58820 0.6075200 210     <NA>    <NA>  <NA>        <NA>    <NA> Observer   <NA>
> 9   101 0.5423790 37 34 21.09334 0.6075200 188     <NA>    <NA>  <NA>        <NA>    <NA>     <NA>  Other
> 10  101 0.9075737 37 34 19.65678 1.5135878 182     <NA>    <NA>  Late        <NA>    <NA>     <NA>   <NA>
> 11 103a 0.4950000 18 15 10.36364 0.5426615  90     <NA>    <NA>  <NA>        <NA>     SCL     <NA>   <NA>
> 12 103a 0.6171548 18 15 11.32425 0.7203964 210     <NA>    <NA>  <NA>        <NA>    <NA> Observer   <NA>
> 13 103a 0.6171548 18 15 11.34714 0.7203964 182     <NA>    <NA> Early        <NA>    <NA>     <NA>   <NA>
> 14 103a 0.6171548 18 15 11.49606 0.7203964 206     <NA>  Client  <NA>        <NA>    <NA>     <NA>   <NA>
> 15 103a 0.6171548 18 15 11.81150 0.7203964 188     <NA>    <NA>  <NA>        <NA>    <NA>     <NA>  Other
> 
> # with the goal of having a reduced df (1 id per row) like this:
> 
>      id         r  n wi   wi.tau         z   k alliance a.rater   eml   treatment outcome  o.rater german
> 1   100 0.2800000 44 41 21.72514 0.2876821 210      wai  client early        <NA>    <NA>   Client  other
>      101 etc...
> 
> Ideally, I would like to reduce by id and r, if the values are the same and
> keep any discrepant values as a separate row (if possible), e.g.:
> 
> 6   101 0.0000000 37 34 19.65678 0.0000000 182     <NA>    <NA> Early        <NA>    <NA>     <NA>   <NA>
> 7   101 0.5423790 37 34 17.65078 0.6075200  98     <NA>    <NA>  Late Psychodymic    <NA> Observer  Other
> 
> I appreciate any assistance,
> 
> AC