[BioC] Limma analysis of focused arrays vs. whole genome arrays

Tue Jun 7 15:33:51 CEST 2005

Hi,

The lab I work with has used "whole genome" human arrays (~18,000 
genes) for a couple of years and I have helped with the analysis using 
Limma.  Due to costs, they are now considering switching from 
whole genome arrays to focused arrays with ~400 genes of interest 
(selected from the whole-genome array results).

The obvious analysis problems with a focused array where most genes are 
changing are:

1. LOESS normalization assumes most genes are not changing.  If most of 
the genes are expected to change, there is no basis to recenter the 
data around zero.  The response from the lab was that they would be 
willing to include 100-150 genes that are not expected to change.

2. The B-statistic in Limma requires a parameter specifying the 
fraction of genes assumed to be changing.  The corresponding moderated 
t-statistic uses the data from all genes to moderate the standard error 
in the t calculation.  Both of these could change dramatically if most 
of the genes on the array are changing.
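
To make the two concerns concrete, here is a rough sketch in Python (not Limma code, and not a proposed solution). The first function recenters log-ratios using only the control genes; it uses a simple median shift as a stand-in for LOESS, which would fit an intensity-dependent curve instead. The second function computes a moderated t-statistic in the usual shrinkage form, where the gene's own variance is pulled toward a prior variance estimated from all genes on the array. The function names, gene names, and all numbers are made up for illustration; in particular the prior variance and prior degrees of freedom are hypothetical values, where Limma would estimate them from the data.

```python
from math import sqrt
from statistics import median


def center_on_controls(log_ratios, control_ids):
    """Shift every log-ratio so that the control genes have median zero.

    Simplified stand-in for LOESS: one constant offset, not a curve
    fitted against intensity.
    """
    offset = median(log_ratios[g] for g in control_ids)
    return {gene: m - offset for gene, m in log_ratios.items()}


def moderated_t(effect, s2_gene, df_gene, s2_prior, df_prior, unscaled_sd):
    """Moderated t: shrink the gene-wise variance toward a prior variance.

    s2_prior and df_prior are hyperparameters that would be estimated
    from all genes on the array; here they are supplied by hand.
    """
    s2_post = (df_prior * s2_prior + df_gene * s2_gene) / (df_prior + df_gene)
    return effect / (sqrt(s2_post) * unscaled_sd)


# Made-up log2 ratios: three genes of interest and three control genes.
log_ratios = {
    "geneA": 1.5, "geneB": 2.0, "geneC": 1.0,
    "ctrl1": 0.5, "ctrl2": 0.25, "ctrl3": 0.75,
}
controls = ["ctrl1", "ctrl2", "ctrl3"]
centered = center_on_controls(log_ratios, controls)

# Same gene (effect 1.0, variance 0.04, 3 df, 4 arrays so unscaled sd 0.5),
# evaluated under two hypothetical prior variances.
t_small_prior = moderated_t(1.0, 0.04, 3, s2_prior=0.04, df_prior=4, unscaled_sd=0.5)
t_large_prior = moderated_t(1.0, 0.04, 3, s2_prior=0.25, df_prior=4, unscaled_sd=0.5)

print(centered)
print(t_small_prior, t_large_prior)
```

With these made-up numbers the same gene gets a moderated t of about 10 under the small prior variance and about 5 under the large one, which is the kind of sensitivity to the array's gene composition I am worried about.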

My questions are:

1. Are my concerns valid, and are there ways around them?  Are 
there other analysis pitfalls with this scenario?

2. Can Limma handle situations where most of an array is expected to 
change?  What modifications, if any, need to be made to the Limma 
analysis to account for this?

3. Alternatively, is there a more appropriate statistical package to 
use in this case?

Thanks.

--
Mike