[BioC] Tiling array question, Agilent platform, multiple different chips.

Wed Apr 18 03:09:47 CEST 2007

Eugene Bolotin wrote:
> Dear Bioconductor mailing list,
> I am analysing some tiling, chip-on-chip, two color (one is input and the
> other is chromatin),  Agilent data. There are 10 arrays with ~44000
> features, scanned with GenePix,  each that represent most of the human
> promoters of about 8kb regions, each with one biological replicate giving
> the grand total of 20 arrays. I am interested in getting high resolution
> peaks, hopefully with p-values. 
If I understand you, you have 10 different array designs, each covering 
a portion of the genome?  I will make that assumption below, so correct 
me if I misunderstood.

Look at the ACME or Ringo packages (both in the devel/bioc 2.0 
repository).  Both are geared toward NimbleGen arrays, but they offer 
some methods for dealing with ChIP-chip data. 

> I am trying to use Limma to normalize them
> using RMA. However all these arrays have different probes, so in the end I
> should end up with ~440,000 different probe values. However Limma treats
> these arrays as replicates and I only end up with 44,000 probes. How can I
> keep it from doing that? 
You can't.  limma expects every array in a data set to share the same 
probes, so if you want to use it, you will need to load each set of 
arrays with the same probes as a separate batch. 
> Also, any suggestions about normalization methods
> would be greatly appreciated.
>   
I would load the arrays separately, median-center them, and scale each 
set of arrays to have the same MAD (median absolute deviation) on the 
log2 scale.  Because probes that lie close together on the chromosome 
are strongly correlated, probe-specific artifacts and the like are much 
less harmful here than in gene expression analyses.  Nonlinear 
normalization methods can reduce real signal, so unless you have a 
strong reason to use them, I would suggest avoiding them. 
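
As a rough illustration of the centering and scaling step, here is a 
minimal sketch in plain Python (not R, and not taken from limma, ACME 
or Ringo).  It assumes the log2 ratios are already loaded; the names 
normalize_sets, design_A, design_B and target_mad are made up for the 
example, and pooling the centered arrays of a set to compute its MAD is 
only one reading of "scale each set".

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation around the median."""
    values = list(values)
    m = median(values)
    return median([abs(v - m) for v in values])


def median_center(log_ratios):
    """Shift one array's log2 ratios so that their median is zero."""
    m = median(log_ratios)
    return [v - m for v in log_ratios]


def normalize_sets(sets, target_mad=1.0):
    """Median-center every array, then rescale each set to target_mad.

    sets maps a (hypothetical) design name to a list of arrays, each
    array being a list of log2(ChIP/input) ratios for that design.
    """
    normalized = {}
    for design, arrays in sets.items():
        centered = [median_center(a) for a in arrays]
        # Assumption: the MAD of a set is computed on its pooled values.
        pooled = [v for a in centered for v in a]
        spread = mad(pooled)
        if spread == 0:
            raise ValueError("MAD is zero for set %r" % design)
        scale = target_mad / spread
        normalized[design] = [[v * scale for v in a] for a in centered]
    return normalized


# Toy data: two replicate arrays for one design, one array for another.
sets = {
    "design_A": [[0.0, 1.0, 2.0, 3.0, 10.0], [1.0, 1.5, 2.0, 2.5, 3.0]],
    "design_B": [[-4.0, 0.0, 4.0, 8.0, 12.0]],
}
result = normalize_sets(sets, target_mad=1.0)
```

Both steps are linear (a shift and a multiplication), so the relative 
shape of a peak along the chromosome is left unchanged.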

After loading the arrays, you will want to combine them, perhaps as a 
data.frame.  Then, order the probes by chromosome and chromosome 
position.  Finally, you can take your combined data and form one of the 
data structures required by ACME or Ringo.
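
The combining and ordering step could look roughly like the sketch 
below, again in plain Python rather than R.  The record layout (the 
keys chrom, pos and log_ratio) and the helper names are assumptions 
made for the example; it does not build an actual ACME or Ringo 
object.

```python
def chrom_key(chrom):
    """Sort key: numbered chromosomes in numeric order, the rest after."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    # Non-numeric names (X, Y, M, ...) follow, in alphabetical order.
    return (1, 0, name)


def combine_and_order(per_design_probes):
    """Pool the probes of all designs and sort by chromosome, position."""
    combined = [p for probes in per_design_probes for p in probes]
    return sorted(combined, key=lambda p: (chrom_key(p["chrom"]), p["pos"]))


# Toy probes from two hypothetical array designs.
design_a = [
    {"chrom": "chr2", "pos": 500, "log_ratio": 0.1},
    {"chrom": "chr1", "pos": 900, "log_ratio": 1.2},
]
design_b = [
    {"chrom": "chr10", "pos": 100, "log_ratio": -0.3},
    {"chrom": "chr1", "pos": 150, "log_ratio": 0.8},
    {"chrom": "chrX", "pos": 50, "log_ratio": 0.0},
]
ordered = combine_and_order([design_a, design_b])
```

Sorting on the numeric part of the name keeps chr2 ahead of chr10, 
which a plain string sort would not.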

Sean