[R] Nicely formatted summary table with mean, standard deviation or number and proportion

Keith Wong keithw at med.usyd.edu.au
Mon May 14 06:25:41 CEST 2007

Prof Harrell

Thanks for the hint!

I am using Hmisc 3.3-2, on R version 2.5.0 with windows XP.

I saw the argument "prmsd = T" in the help for summary.formula(), and 
couldn't understand how to make it work, but just now realised that it 
should be applied to the latex() function, and not to summary.formula() itself:

options(digits = 2)
(x = summary(treatment ~ age + sex, method = "reverse"))
latex(x, prmsd = TRUE)  # prmsd adds mean and SD; the quartiles are still shown

That's what I needed, thank you.
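
For anyone reading this in the archive who wants only "mean (SD)" with no quartiles at all (prmsd adds to the quartiles rather than replacing them), here is a minimal base-R sketch that builds such a table by hand. It does not use Hmisc, and it is not how summary.formula() works internally. The helpers mean_sd() and n_pct() are hypothetical names made up for this illustration, and the data are simulated as in the quoted example below.

```r
# Simulated data, as in the quoted example
set.seed(173)
sex       <- factor(sample(c("m", "f"), 500, replace = TRUE))
age       <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))

# Hypothetical helper: "mean (SD)" of x within each level of g
mean_sd <- function(x, g, digits = 1) {
  sapply(split(x, g), function(z) {
    z <- z[!is.na(z)]
    paste0(formatC(mean(z), format = "f", digits = digits), " (",
           formatC(sd(z),   format = "f", digits = digits), ")")
  })
}

# Hypothetical helper: "n (pct%)" for one level of a factor within each level of g
n_pct <- function(x, g, level) {
  sapply(split(x, g), function(z) {
    z <- z[!is.na(z)]
    k <- sum(z == level)
    sprintf("%d (%.0f%%)", k, 100 * k / length(z))
  })
}

# One row per variable, one column per treatment group
tab <- rbind(age       = mean_sd(age, treatment),
             "sex : m" = n_pct(sex, treatment, "m"))
colnames(tab) <- paste0(levels(treatment), " (N=", table(treatment), ")")
print(tab, quote = FALSE)
```

The result is a plain character matrix in the same orientation as the method = "reverse" table, so it could be passed on to whatever typesetting step is preferred.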


At 12:11 PM 14/05/2007, Frank E Harrell Jr wrote:
>Keith Wong wrote:
>>Dear all,
>>The incredibly useful Hmisc package provides a method to generate summary 
>>tables that can be typeset in LaTeX. The Alzola and Harrell book "An 
>>introduction to S and the Hmisc and Design libraries" provides an example 
>>that generates mean and quartiles for continuous variables, and numbers 
>>and percentages for count variables: summary() with method = 'reverse'.
>>I wonder if there is a way to change it so the mean and standard 
>>deviation are reported instead for continuous variables.
>>I illustrate my question below using an example from the book.
>>Thank you.
>Newer versions of Hmisc have an option to add mean and SD for 
>method='reverse'.  Quartiles are always there.
>>  > ####
>>  > library(Hmisc)
>>  >
>>  > set.seed(173)
>>  > sex = factor(sample(c("m", "f"), 500, replace = TRUE))
>>  > age = rnorm(500, 50, 5)
>>  > treatment = factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))
>>  > summary(sex ~ treatment, fun = table)
>>sex    N=500
>>|         |       |N  |f  |m  |
>>|treatment|Drug   |263|140|123|
>>|         |Placebo|237|133|104|
>>|Overall  |       |500|273|227|
>>  >
>>  >
>>  >
>>  > (x = summary(treatment ~ age + sex, method = "reverse"))
>>  > # generates quartiles for continuous variables
>>Descriptive Statistics by treatment
>>|       |Drug          |Placebo       |
>>|       |(N=263)       |(N=237)       |
>>|age    |46.5/49.9/53.2|46.7/50.0/53.4|
>>|sex : m|   47% (123)  |   44% (104)  |
>>  >
>>  >
>>  > # latex(x) generates a very nicely formatted table
>>  > # but I'd like "mean (standard deviation)" instead of quartiles.
>>  > # this function is from http://tolstoy.newcastle.edu.au/R/e2/help/06/11/4713.html
>>  > g <- function(y) {
>>+   s <- apply(y, 2,
>>+              function(z) {
>>+                z <- z[!is.na(z)]
>>+                n <- length(z)
>>+                if(n==0) c(N=0, Mean=NA, SD=NA) else
>>+                if(n==1) c(N=1, Mean=z, SD=NA) else {
>>+                  m <- mean(z)
>>+                  s <- sd(z)
>>+                  c(N=n, Mean=m, SD=s)
>>+                }
>>+              })
>>+   w <- as.vector(s)
>>+   names(w) <-  as.vector(outer(rownames(s), colnames(s), paste, sep=''))
>>+   w
>>+ }
>>  >
>>  > summary(treatment ~ age + sex, method = "reverse", fun = g)
>>  > # does not work: the 'fun' (or 'FUN') argument is ignored.
>>Descriptive Statistics by treatment
>>|       |Drug          |Placebo       |
>>|       |(N=263)       |(N=237)       |
>>|age    |46.5/49.9/53.2|46.7/50.0/53.4|
>>|sex : m|   47% (123)  |   44% (104)  |
>>  >
>>  >
>>  > (x1 = summarize(cbind(age), llist(treatment), FUN = g, stat.name=c("n", "mean", "sd")))
>>    treatment   n mean   sd
>>1      Drug 263 49.9 4.94
>>2   Placebo 237 50.1 4.97
>>  >
>>  > # this works, but the table is rotated, and count data have to be
>>  > # treated separately.
>Frank E Harrell Jr   Professor and Chair           School of Medicine
>                      Department of Biostatistics   Vanderbilt University

