[BioC] XPS: new quality control features (e.g. residual images)
cstrato
cstrato at aon.at
Fri Apr 15 17:26:58 CEST 2011
Dear all,
Let me first thank Dan Tenenbaum and Herve Page for their continuous
support of the special requirements of xps. It is their credit that the
daily multiple platform build/check works so seamlessly.
Now I am pleased to announce that the new release of "xps" does finally
support many quality control features, asked for by many users for quite
some time: These quality control features are based on the new S4 class
"QualTreeSet", which adds quality control features for the raw
intensities, background-corrected and normalized intensities, as well as
to the summarization.
Concretely, the following QC methods are new:
- borderplot(): boxplot of positive and negative border elements of arrays.
- coiplot(): "Center of Intensity" plots of positive and negative border
elements.
- corplot(): a heat map of the array-array Spearman rank correlation
coefficients.
- madplot(): an array-array expression level distance plot (heat map).
- nuseplot(): NUSE boxplots for raw data, background adjusted and
normalized intensities.
- pcaplot(): a PCA plot of the first two principle components of
expression levels or their correlations.
- rleplot(): RLE boxplots for raw, background adjusted and normalized
expression levels.
- image(): a heat map of residual images (both residuals and weights).
- YES, you can draw residual images for Exon ST arrays, Gene ST arrays
and plate arrays!
- plotAffyRNAdeg(): Affymetrix RNA degradation plots.
- YES, you can draw RNA degradation plots for Exon ST arrays, Gene ST
arrays and plate arrays!
- Both, residual images and RNA degradation plots can be created for the
raw intensities, background corrected intensities and normalized
intensities (see parameter 'qualopt').
Furthermore, improvements in the handling of data, especially of large
datasets, have been made in the C++ code:
- treeInfo(): many tree data such as min/max intensities, quantiles,
percent P/M/A calls are
now preprocessed during creation of the root trees and
are stored in the trees
as tree info ('userinfo'); this information allows to
draw boxplots, callplots, etc much faster,
and is accessible by calling method treeInfo().
The following QC methods have been improved:
- boxplot(): it is now possible to draw boxplots of raw data w/o the
need to attachInten() first.
- callplot(): barplots of percent P/M/A calls now access the preprocessd
tree info.
- image(): it is now possible to draw images of intensities w/o the need
to attachInten() first.
The following functions are improved replacements of similar existing
functions:
- plotBoxplot(): allows to easily save boxplots as png, jpeg, etc files
(will replace function boxplot.dev()).
- plotImage(): allows to easily save one or more images as png, jpeg,
etc files (will replace function image.dev()).
The following QC methods have not been changed:
- hist(): density plots of raw data and expression levels still depend
on a non-empty slot 'data', thus for large datasets it is still
necessary to use function root.density().
- mvaplot(): scatter plot of M vs A values.
- pmplot(): barplot of PM and MM intensities still depends on a
non-empty slot 'data'.
Please note that S4 class "QualTreeSet" is still work in progress, there
are still missing features, missing documentation, and the current
features are not tested extensively. For this reason there may be still
some hidden problems/bugs.
If you experience any problems, please report them as follows:
- run R from the console, i.e. RTerm, xterm
- set "verbose=TRUE" in all functions
- send me the complete output of the console
- add your sessionInfo()
- add the ROOT version that you use (should be root_v5.27.04)
As an example you can find below some demo code for the HuGene ST array.
Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
#------------------------------------------------------------------------------#
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Import raw data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### new R session: load library xps_1.11.12
library(xps)
### define directories:
scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes/na31"
celdir <- "/Volumes/GigaDrive/ChipData/Exon/HuGene"
datdir <- getwd()
### import raw data (e.g. Affymetrix human tissue dataset for HuGene ST
arrays)
scheme.gene <- root.scheme(file.path(scmdir, "hugene10stv1.root"))
celfiles <-
c("TisMap_Breast_01_v1_WTGene1.CEL","TisMap_Breast_02_v1_WTGene1.CEL",
"TisMap_Breast_03_v1_WTGene1.CEL","TisMap_Prostate_01_v1_WTGene1.CEL",
"TisMap_Prostate_02_v1_WTGene1.CEL","TisMap_Prostate_03_v1_WTGene1.CEL")
celnames <-
c("Breast01","Breast02","Breast03","Prostate01","Prostate02","Prostate03")
data.gene <- import.data(scheme.gene, "TisMapData", filedir=datdir,
celdir=celdir, celfiles=celfiles, celnames=celnames)
str(data.gene)
### show data stored in "userinfo" of ROOT trees, see "?treeInfo"
info <- treeInfo(data.gene, treetype="cel", varlist="*")
### boxplot of raw data (taking advantage of preprocessed quantiles in
"userinfo")
boxplot(data.gene, which="userinfo:fIntenQuant")
### draw whiskers at 10%, 90% and show min/max values as outliers
boxplot(data.gene, which="userinfo:fIntenQuant", range=1.5)
### images of raw data (no longer necessary to attachInten())
names <- unlist(treeNames(data.gene))
image(data.gene, names=names[4], add.legend=TRUE)
### save image as png-file
outfile <- paste("Image_", names[4], sep="")
plotImage(data.gene, type="intensity", names=names[4], dev="png",
outfile=outfile)
### density plot
outfile <- paste("DensityPlot_", names[4], sep="")
root.density(data.gene, canvasname=outfile, save.as="png")
### the following plots still require attachInten()
data.gene <- attachInten(data.gene)
# density plot
hist(data.gene, which="core")
# PM/MM plot
pmplot(data.gene)
data.gene <- removeInten(data.gene)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Quality control
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### new R session
library(xps)
### qualification - RLM
rlm.all <- fitRLM(data.gene, "TisMapRLM_allcore", filedir=datdir, tmpdir="",
background="antigenomic", normalize=TRUE, qualopt="all",
option="transcript", exonlevel="core+affx",
add.data=FALSE)
### show data stored in "userinfo" of ROOT trees
info <- treeInfo(rlm.all, treetype="rlm", varlist="*")
### boxplot of positive/negative border elements
borderplot(rlm.all, transfo=NULL)
borderplot(rlm.all, type="pos", transfo=NULL)
borderplot(rlm.all, type="neg", transfo=NULL, ylim=c(0,1000))
### Center-of-Intensity plots for positive/negative border elements
coiplot(rlm.all)
coiplot(rlm.all, type="pos", )
coiplot(rlm.all, type="neg")
### NUSE plot
nuseplot(rlm.all, names="namepart")
# for raw data only (usual case)
nuseplot(rlm.all, names="namepart:raw")
### RLE plot
rleplot(rlm.all, names="namepart")
# for normalized data only (usual case)
rleplot(rlm.all, names="namepart:normalized")
### residual images of raw data
resname <- getTreeNames(rootFile(rlm.all), treetype="res")
image(rlm.all, type="resids", names=resname[4], add.legend=TRUE)
image(rlm.all, type="pos.resids", names=resname[4])
image(rlm.all, type="neg.resids", names=resname[4])
image(rlm.all, type="sign.resids", names=resname[4])
image(rlm.all, type="weights", names=resname[4], add.legend=TRUE)
### save residual image as png-file
outfile <- paste("ResidualImage_", resname[4], sep="")
plotImage(rlm.all, type="weights", names=resname[4], dev="png",
outfile=outfile)
### residual images of background adjusted data
image(rlm.all, type="weights", qualopt="adjusted", names=resname[10],
add.legend=TRUE)
### residual images of normalized data
image(rlm.all, type="weights", qualopt="normalized", names=resname[16],
add.legend=TRUE)
### get RNA degradation
rnadeg <- AffyRNAdeg(rlm.all)
result <- summaryAffyRNAdeg(rnadeg)
### plot RNA degradation
plotAffyRNAdeg(rnadeg, add.legend=TRUE)
### plot slope of RNA degradation
plotAffyRNAdeg(rnadeg, summary=TRUE)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Preprocessing: RMA
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### new R session
library(xps)
### RMA
data.rma <- rma(data.gene, "TisMapRMAcore", filedir=datdir, tmpdir="",
background="antigenomic", normalize=TRUE,
option="transcript",
exonlevel="core+affx")
### show data stored in "userinfo" of ROOT trees
info <- treeInfo(data.rma, treetype="mdp", varlist="*")
### boxplot of normalized expression levels
boxplot(data.rma, which="userinfo:fLevelQuant")
### density plot
hist(data.rma)
### RLE plot
rleplot(data.rma)
mboxplot(data.rma, ylim=c(-3,3))
### M vs A plots
mvaplot(data.rma, pch=20, ylim=c(-4,4))
### Detection Above Background Call (DABG)
call.dabg <- dabg.call(data.gene,"TisMapDABGcore",filedir=datdir,
exonlevel="core+affx")
# note: alpha1 and alpha2 need to be adjusted to get usable P/M/A calls
### show data stored in "userinfo" of ROOT trees
info <- treeInfo(call.dabg, treetype="dab", varlist="*")
### detection call plots
callplot(call.dabg)
callplot(call.dabg, beside=FALSE)
#---------------------------------------------------------------------#
More information about the Bioconductor
mailing list