[R] Hmisc summarize() with level "" in by variable

Frank E Harrell Jr f.harrell at vanderbilt.edu
Sat Jun 13 15:46:36 CEST 2009


Sorry about the bug, which is now fixed.  You can get the fix by entering

source('http://biostat.mc.vanderbilt.edu/cgi-bin/viewvc.cgi/*checkout*/Hmisc/trunk/R/summary.formula.s?rev=661')

until we update the package.
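
For anyone who cannot source the patched file, the following is a minimal sketch of one possible stopgap, not something confirmed in this thread: relabel the "" level before calling summarize(). The label "(blank)" and the object names tst2 and expected are assumptions made for illustration, and whether relabeling avoids the problem in Hmisc 3.6-0 is untested here. The base R tapply() call gives the per-level means to compare against.

```r
# Data from the original report: one level of the by variable is "".
tst1 <- data.frame(a = factor(c("", "A", "B", "C")), x = 1:4)

# Reference result from base R; the "" level should appear with mean 1.
expected <- tapply(tst1$x, tst1$a, mean)

# Hypothetical workaround: give the empty level a visible label.
# "(blank)" is an arbitrary placeholder, not an Hmisc convention.
tst2 <- tst1
levels(tst2$a)[levels(tst2$a) == ""] <- "(blank)"

# Compare against summarize() only if Hmisc is installed.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  res <- with(tst2, Hmisc::summarize(x, by = Hmisc::llist(a), FUN = mean))
  print(res)
}
```

If the printed result has four rows whose x values match expected, the relabeled data is being summarized correctly.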

Frank


Michael Erickson wrote:
> I was using summarize() in a data set in which one of the levels of
> the by variable was "".  The summary statistic was consistently off by
> one level and the "" level was not in the output data frame.  I tried
> to report it as a bug, but I could not log into the Hmisc bug
> reporting website to do so.  I searched for this in the email
> archives.  If it's there, I failed to find it.  Should I try to pursue
> this as a bug, or am I using summarize incorrectly?  Here is my
> example along with the output:
> 
>> tst1 <- data.frame(a=factor(c("", "A", "B", "C")),
> +                   x=1:4)
>> tst1
>   a x
> 1   1
> 2 A 2
> 3 B 3
> 4 C 4
>> with(tst1, summarize(x, by=llist(a), FUN=mean))
>   a x
> 1 A 1
> 2 B 2
> 3 C 3
>> with(tst1, aggregate(x, by=list(a), FUN=mean))
>   Group.1 x
> 1         1
> 2       A 2
> 3       B 3
> 4       C 4
> 
>> sessionInfo()
> R version 2.9.0 (2009-04-17)
> i486-pc-linux-gnu
> 
> locale:
> LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] Hmisc_3.6-0
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.13 grid_2.9.0      lattice_0.17-22
> 
> 
> Michael
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



