[R] Tukey HSD plot with lines indicating (non-)significance
Karl Ove Hufthammer
karl at huftis.org
Tue Jan 15 12:46:23 CET 2013
Den 2013-01-14 19:58 skreiv Richard M. Heiberger:
> Please look at the MMC (Mean-mean Multiple Comparisons) plot in the
> HH package.
> It displays both the means and the differences.
>
> install.packages("HH") ## if you don't already have it.
> library(HH)
> ?MMC
I have now coded a very quick-and-dirty solution for my line graph.
Example follows. I use the results of the 'cld' function in multcomp
to calculcate which groups are significantly different from each other.
Important note: The LetterMatrix object contains the needed
data, but the rows are not in the same order as the original levels,
and unfortunately the row names do *not* match the level names
(spaces seems to be stripped -- I don't know why). That's the reason
I remove the spaces in the example below.
Note: The levels should preferably be ordered by the group means,
but the function works (modulo any bugs) even if they're not.
Dotted lines are then used to indicate significant differences.
Important note: There are bound to be bugs. Do check the results
before using the resulting graph.
# Calculate non-significance lines for Tukey HSD
tukeyHSDLines=function(l, removeEndLines=TRUE,
removeSingleGroups=FALSE) {
# Calculate Tukey HSD values
library(multcomp)
l.glht=glht(l, linfct = mcp(trt = "Tukey"))
# Calculcate a compact letter display (CLD)
l.cld=cld(l.glht)
lmat=l.cld$mcletters$LetterMatrix[levels(d$trt),]
# Calculate the non-significance lines that should be
# drawn, based on the CLD data
calc.lines=function(x) {
r=rle(x)
lend=cumsum(r$lengths) # Line end index
lstart=c(1,lend[-length(lend)]+1) # Line start index
d2=data.frame(lstart=lstart-.4, lend=lend+.4, nodraw=!r$values)
if( removeEndLines ) {
if(d2$nodraw[1]) d2=d2[-1,] # Remove 'leading'
dotted lines
if(d2$nodraw[nrow(d2)]) d2=d2[-nrow(d2),] # Remove 'trailing'
dotted lines
}
d2
}
linlist=apply(lmat, 2, calc.lines)
lindat=do.call(rbind, linlist)
lindat$y=rep(seq_along(linlist), times=sapply(linlist, nrow))
if (removeSingleGroups) {
lindat=subset(lindat, !(y %in% which(tabulate(lindat$y)==1)))
lindat$y=as.numeric(factor(lindat$y))
}
lindat
}
# Example:
library(DAAG)
d=rice # The dataset to use
levels(d$trt)=gsub(" ", "", levels(d$trt)) # Remove spaces from level
names
# d$trt=reorder(d$trt, -d$ShootDryMass) # Graph looks better with
reordered factor
l=aov(ShootDryMass ~ trt, data=d) # Fit a one-way ANOVA
lindat=tukeyHSDLines(l)
lindat$y=150-lindat$y*3
library(ggplot2)
ggplot(d, aes(x=trt, y=ShootDryMass)) + geom_boxplot(outlier.colour=NA)
+
geom_jitter(col="red", size=3, position=position_jitter(width=.1)) +
geom_segment(aes(x=lstart, xend=lend, y=y, yend=y, linetype=nodraw),
lindat)
--
Karl Ove Hufthammer
More information about the R-help
mailing list