[BioC] limma, contrasts, and NA's

Albyn Jones jones at reed.edu
Fri Aug 13 20:42:14 CEST 2010


Following up on a thread I initiated last year on missing values
in limma and contrasts.fit 
     (see
      gmane.science.biology.informatics.conductor:26494
      gmane.science.biology.informatics.conductor:26483)
I have prepared an example illustrating the fact that when there are
missing data, the contrast.fit function not only doesn't give exact
results, but the results depend on the source chosen as the
"reference" in modelMatrix().  When I pointed out to my colleagues in
biology that they could manually construct a design matrix that would
fit contrasts exactly, they rebelled at the need to construct N-1 linearly
indpendent columns for N sources.  The utility of contrasts.fit() is
clear, avoiding the need for non-statisticians to learn to code design
matrices.  

albyn
-- 
Albyn Jones
Reed College
jones at reed.edu

=========================================================================
library(limma)
### necessary files and data are at:
url <- "http://people.reed.edu/~renns/jones_renn/maternal_data/"

targets <- readTargets(paste(url, "targets_fixed.csv", sep = ""),  sep = ",")

RG <- read.maimages(paste(url, targets$FileName, sep = ""), columns = 
    list(R="F635 Median", G="F532 Median", Rb="B635 Median", Gb="B532 Median"),
    other.columns = c("Flags","B635 SD","B532 SD","F635 % Sat.","F532 % Sat."))
names(RG$other) <- c("Flags", "RbSD", "GbSD","RSat" ,"GSat" )

RG$genes <- readGAL(paste(url,"Fishchip4.03_annotatedGAL_20090730HEM.gal", 
            sep = ""))

RG$printer <- getLayout(RG$genes)

### Filter features manually flagged bad and automatedly flagged "not found"
RG$R[RG$other$Flags < 0]<-NA 
RG$G[RG$other$Flags < 0]<-NA

as.numeric(apply(is.na(RG$R),2,sum)) 

### Filter faint features
RG$R[(RG$R < (RG$Rb + 2*RG$other$RbSD))&(RG$G < (RG$Gb + 2*RG$other$GbSD))]<-0
RG$G[RG$R == 0]<-NA
RG$R[RG$R == 0]<-NA

as.numeric(apply(is.na(RG$R),2,sum)) 
table(apply(is.na(RG$R),1,sum))

### Normalize
MA<- normalizeWithinArrays(RG, method = "printtiploess", bc.method = "minimum")

### set up the design matrices
design1<- modelMatrix(targets, ref = "a1126")
design2<- modelMatrix(targets, ref = "a1217")

### contrast coding
cont1 <-  makeContrasts(WvsL = ( - a1111 - a1210 + a0201 + a1217 + 
    a1220 - a1211 - a1219 + a1128 + a1202 - a1028 - a0128)/6, levels=design1)
cont2 <- makeContrasts(WvsL = (a1126 - a1111 - a1210 + a0201  + 
    a1220 - a1211 - a1219 + a1128 + a1202 - a1028 - a0128)/6, levels=design2)

### analysis
fit1 <- lmFit (MA, design1)
fit.cont1 <-  contrasts.fit(fit1, cont1) 
fit.ebayes1 <- eBayes(fit.cont1)

fit2 <- lmFit (MA, design2)
fit.cont2 <-  contrasts.fit(fit2, cont2) 
fit.ebayes2 <- eBayes(fit.cont2)

results1 <- decideTests(fit.ebayes1, method = "separate", 
            adjust.method="none", p.value=0.05)
results2 <- decideTests(fit.ebayes2, method = "separate", 
            adjust.method="none", p.value=0.05)

res<-cbind(results1[,1], results2[,1])
colnames(res)<-c("ref1", "ref2")
vennCounts(res, include="both")
#     ref1 ref2 Counts
#[1,]    0    0  11862
#[2,]    0    1     30
#[3,]    1    0    100
#[4,]    1    1    814



More information about the Bioconductor mailing list