[R] Can somebody help me with following data manipulation?

David L Carlson dcarlson at tamu.edu
Thu Dec 6 21:42:55 CET 2012


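Converting to factors does not get all combinations: aggregate() still
returns only the 10 that occur in the data. Merging the means with the
full grid of V1 and V2 values does: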

> v3mean <- aggregate(V3~V1+V2, dat, mean)
> cats <- with(dat, expand.grid(V1=unique(V1), V2=unique(V2)))
> merge(cats, v3mean, all=TRUE)
   V1 V2        V3
1   C  0 0.5000000
2   C  1        NA
3   G  0 1.0000000
4   G  1        NA
5   I  0 0.3333333
6   I  1 0.4285714
7   O  0 1.0000000
8   O  1 0.0000000
9   R  0 0.0000000
10  R  1 0.6666667
11  T  0 0.8333333
12  T  1 0.5000000
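
The same table can also be built without a merge. A minimal sketch,
assuming the dat from the original post (rebuilt here with factor()
instead of structure(); the names cell_means and v3all are only
placeholders): tapply() returns one cell per combination of V1 and V2
and leaves the empty cells as NA.

```r
# Rebuild the data from the original post.
dat <- data.frame(
  V1 = factor(c(1, 4, 5, 3, 3, 5, 6, 6, 4, 3, 5, 6, 5, 5, 4, 4, 6, 2,
                3, 4, 3, 3, 2, 5, 3, 6, 3, 3, 6, 3, 6, 1, 6, 5, 2, 2),
              levels = 1:6, labels = c("C", "G", "I", "O", "R", "T")),
  V2 = c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L,
         0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
         1L, 0L, 0L, 0L),
  V3 = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
         1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L,
         1L, 0L, 1L, 1L)
)

# One cell per V1 x V2 combination; cells with no data come back as NA.
cell_means <- with(dat, tapply(V3, list(V1 = V1, V2 = V2), mean))

# Reshape the 6 x 2 matrix into a long data frame with columns V1, V2, V3.
v3all <- as.data.frame(as.table(cell_means), responseName = "V3")
```

Note that v3all is sorted with V1 varying fastest and that V2 comes back
as a factor, so it is not row-for-row identical to the merged result.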

But the OP's dat1 contains only 6 rows (V1 = C, G, and I), not all 12.
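
If those six rows are what is wanted, the merged table can be cut down
to the three V1 values that appear in dat1. A rough sketch: res is
typed in by hand from the printed output above rather than recomputed,
and res, keep and want are placeholder names.

```r
# The 12-row merged result, typed in from the printed output.
res <- data.frame(
  V1 = rep(c("C", "G", "I", "O", "R", "T"), each = 2),
  V2 = rep(0:1, times = 6),
  V3 = c(0.5, NA, 1, NA, 0.3333333, 0.4285714,
         1, 0, 0, 0.6666667, 0.8333333, 0.5)
)

# dat1 lists only I, C and G, in that order.
keep <- c("I", "C", "G")
want <- res[res$V1 %in% keep, ]

# Sort to match dat1: by position in keep, then by V2.
want <- want[order(match(want$V1, keep), want$V2), ]
rownames(want) <- NULL
```

Why the OP wants only C, G and I is not stated in the post, so keep is
a guess read off dat1.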
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Sarah Goslee
> Sent: Thursday, December 06, 2012 2:04 PM
> To: Christofer Bogaso
> Cc: r-help
> Subject: Re: [R] Can somebody help me with following data manipulation?
> 
> If I understand what you want correctly, aggregate() should do it.
> 
> > aggregate(V3 ~ V1 + V2, "mean", data=dat)
>    V1 V2        V3
> 1   C  0 0.5000000
> 2   G  0 1.0000000
> 3   I  0 0.3333333
> 4   O  0 1.0000000
> 5   R  0 0.0000000
> 6   T  0 0.8333333
> 7   I  1 0.4285714
> 8   O  1 0.0000000
> 9   R  1 0.6666667
> 10  T  1 0.5000000
> 
> That returns the combinations that actually exist.
> 
> If you convert V1 and V2 to factors, thus setting the possible levels,
> all combinations will be returned:
> > dat$V1 <- factor(dat$V1)
> > dat$V2 <- factor(dat$V2)
> > aggregate(V3 ~ V1 + V2, "mean", data=dat)
>    V1 V2        V3
> 1   C  0 0.5000000
> 2   G  0 1.0000000
> 3   I  0 0.3333333
> 4   O  0 1.0000000
> 5   R  0 0.0000000
> 6   T  0 0.8333333
> 7   I  1 0.4285714
> 8   O  1 0.0000000
> 9   R  1 0.6666667
> 10  T  1 0.5000000
> 
> Sarah
> 
> On Thu, Dec 6, 2012 at 2:35 PM, Christofer Bogaso
> <bogaso.christofer at gmail.com> wrote:
> > Dear all, let's say I have the following data:
> >
> > dat <- structure(list(V1 = structure(c(1L, 4L, 5L, 3L, 3L, 5L, 6L, 6L,
> > 4L, 3L, 5L, 6L, 5L, 5L, 4L, 4L, 6L, 2L, 3L, 4L, 3L, 3L, 2L, 5L,
> > 3L, 6L, 3L, 3L, 6L, 3L, 6L, 1L, 6L, 5L, 2L, 2L), .Label = c("C",
> > "G", "I", "O", "R", "T"), class = "factor"), V2 = c(0L, 0L, 0L,
> > 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L,
> > 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L,
> > 0L), V3 = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L,
> > 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L,
> > 0L, 1L, 0L, 1L, 0L, 1L, 1L)), .Names = c("V1", "V2", "V3"), class =
> > "data.frame", row.names = c(NA,
> > -36L))
> >
> > Now I want to get the following kind of data frame out of that:
> >
> > dat1 <- structure(list(V1 = structure(c(3L, 3L, 1L, 1L, 2L, 2L),
> >     .Label = c("C", "G", "I"), class = "factor"),
> >     V2 = c(0L, 1L, 0L, 1L, 0L, 1L),
> >     V3 = c(0.333333333, 0.428571429, 0.5, NA, 1, NA)),
> >     .Names = c("V1", "V2", "V3"), class = "data.frame",
> >     row.names = c(NA, -6L))
> >
> > Basically, the 3rd column of 'dat1' is, for 'V1 = I' & 'V2 = 0', the
> > proportion of 1s in 'V3', and so on.
> >
> > Is there any R function to achieve that directly?
> >
> > Thanks and regards,
> >
> 



