[R] strange answer when using 'aggregate()' with a formula
Fox, John
jfox at mcmaster.ca
Thu Jan 21 07:52:36 CET 2016
Dear Chel Hee Lee,
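With the formula method, the default na.action is na.omit, so rows with a missing y are dropped before your function is applied. Specifying na.action=na.pass keeps them; thus,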
> aggregate(y~grp, data=tmp, function(x) sum(is.na(x)), na.action=na.pass)
  grp y
1   2 1
2   3 0
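Here is a minimal sketch contrasting the two calls, using the data from the message quoted below. The helper name count_na is only for illustration and does not appear in the original post.

```r
# Same data as in the original post
tmp <- data.frame(grp = c(2, 3, 2, 3), y = c(NA, 0.5, 3, 0.5))

# Illustrative helper: number of missing values in a vector
count_na <- function(x) sum(is.na(x))

# Formula method with the default na.action = na.omit:
# the row with y = NA is removed before grouping, so FUN never sees it
res_omit <- aggregate(y ~ grp, data = tmp, FUN = count_na)

# Formula method with na.action = na.pass:
# incomplete rows are kept, so the NA in group 2 is counted
res_pass <- aggregate(y ~ grp, data = tmp, FUN = count_na, na.action = na.pass)

print(res_omit)
print(res_pass)
```

The first call should reproduce the zeros reported in the question, and the second should agree with the result from the non-formula call, aggregate(x=tmp$y, by=list(grp=tmp$grp), ...).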
I hope this helps,
John
-----------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
Web: socserv.mcmaster.ca/jfox
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Chel Hee Lee
> Sent: January 21, 2016 5:08 AM
> To: R-help at r-project.org
> Subject: [R] strange answer when using 'aggregate()' with a formula
>
> Could you kindly test the following code? I found a strange answer
> when 'aggregate()' is used with a formula.
>
> I am trying to count how many missing data entries are in each group.
> For this exercise, I created data as below:
>
> > tmp <- data.frame(grp=c(2,3,2,3), y=c(NA, 0.5, 3, 0.5))
> > tmp
>   grp   y
> 1   2  NA
> 2   3 0.5
> 3   2 3.0
> 4   3 0.5
>
> I see that the observations (variable y) fall into two groups (variable
> grp). For group 2, y has NA and 3.0. For group 3, y has 0.5 and 0.5. Hence, the
> number of missing values is 1 and 0 for groups 2 and 3, respectively. This
> can be computed using 'aggregate()' in the 'stats' package as below:
>
> > aggregate(x=tmp$y, by=list(grp=tmp$grp), function(x) sum(is.na(x)))
>   grp x
> 1   2 1
> 2   3 0
>
> A formula can be used as below:
>
> > aggregate(y~grp, data=tmp, function(x) sum(is.na(x)))
>   grp y
> 1   2 0
> 2   3 0
>
> What a surprise! Is this a bug? I would appreciate it if you could share
> the results after testing the code. Thank you so much for your help in
> advance!
>
> Chel Hee Lee
>