[BioC] Tiling array question, Agilent platform, multiple different chips.
Sean Davis
sdavis2 at mail.nih.gov
Wed Apr 18 03:09:47 CEST 2007
Eugene Bolotin wrote:
> Dear Bioconductor mailing list,
> I am analysing some tiling, ChIP-on-chip, two-color (one channel is input
> and the other is chromatin) Agilent data. There are 10 arrays with ~44,000
> features each, scanned with GenePix, that together represent most of the
> human promoters as regions of about 8 kb. Each array has one biological
> replicate, giving a grand total of 20 arrays. I am interested in getting
> high-resolution peaks, hopefully with p-values.
If I understand you, you have 10 different array designs, each covering
a portion of the genome? I will make that assumption below, so correct
me if I misunderstood.
Look at the ACME or Ringo packages (both in the devel/BioC 2.0
repository). Both are geared toward NimbleGen arrays, but they offer
some methods for dealing with ChIP-chip data.
> I am trying to use Limma to normalize them
> using RMA. However all these arrays have different probes, so in the end I
> should end up with ~440,000 different probe values. However Limma treats
> these arrays as replicates and I only end up with 44,000 probes. How can I
> keep it from doing that?
You can't. If you want to use limma, you will need to load each set of
arrays with the same probes as a separate batch.
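
To make the batching concrete, here is a minimal sketch in Python (used only for illustration; the same bookkeeping is easy in R). The array names and design identifiers are hypothetical, and the function only groups names; it does not read any files.

```python
from collections import defaultdict

def group_by_design(arrays):
    """Group array names by array design.

    `arrays` is an iterable of (array_name, design_id) pairs. Both fields
    are hypothetical labels; use whatever identifies your chip designs.
    Returns a dict mapping design_id -> list of array names, i.e. one
    batch per design, each batch sharing the same probe set.
    """
    batches = defaultdict(list)
    for name, design in arrays:
        batches[design].append(name)
    return dict(batches)
```

With 10 designs and one biological replicate each, this would give 10 batches of 2 arrays, and each batch would be loaded and normalized on its own.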
> Also, any suggestions about normalization methods
> would be greatly appreciated.
>
I would load the arrays separately, median center them and scale each
set of arrays to have the same MAD (on the log2 scale). Because of the
strong correlation between probes along the chromosome, probe-specific
artifacts, etc. are much less harmful than for gene expression analyses.
Nonlinear normalization methods can attenuate real signal, so unless
you have a strong reason to use them, I would suggest not using them.
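
A rough sketch of the median centering and MAD scaling described above, in plain Python for illustration. It assumes each array is already a list of log2 ratios, and it picks the median of the per-array MADs as the common target, which is my assumption for the example; any shared value would do. The function names are hypothetical.

```python
from statistics import median

def mad(values):
    """Raw median absolute deviation (no consistency constant;
    note that R's mad() multiplies by 1.4826 by default)."""
    values = list(values)
    m = median(values)
    return median([abs(v - m) for v in values])

def center_and_scale(arrays):
    """Median center each array, then rescale all to a common MAD.

    `arrays` maps array name -> list of log2 ratios.
    The target MAD is the median of the per-array MADs (an assumption).
    """
    centered = {}
    for name, values in arrays.items():
        m = median(values)
        centered[name] = [v - m for v in values]
    spreads = {name: mad(vals) for name, vals in centered.items()}
    target = median(list(spreads.values()))
    scaled = {}
    for name, vals in centered.items():
        s = spreads[name]
        # Leave a constant array untouched rather than divide by zero.
        scaled[name] = vals if s == 0 else [v * target / s for v in vals]
    return scaled
```

Both steps are linear, so the shape of each array's distribution, and with it any peaks, is preserved.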
After loading the arrays, you will want to combine them, perhaps as a
data.frame. Then, order the probes by chromosome and chromosome
position. Finally, you can take your combined data and form one of the
data structures required by ACME or Ringo.
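
The combine-and-order step could look something like the sketch below, again in plain Python for illustration. The probe fields ("chrom", "pos", "m") and the "chr" naming convention are assumptions, not anything ACME or Ringo requires; building their actual input objects is a separate step not shown here.

```python
def chrom_key(chrom):
    """Sort key giving natural chromosome order: chr1, chr2, ..., chr10,
    then non-numeric names (chrX, chrY, ...) alphabetically."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    return (1, 0, name)

def combine_and_order(batches):
    """Concatenate per-design probe lists and sort by chromosome, then
    by position. Each probe is a dict with hypothetical keys
    "chrom", "pos", and "m" (the normalized log2 ratio)."""
    combined = [probe for batch in batches for probe in batch]
    return sorted(combined, key=lambda p: (chrom_key(p["chrom"]), p["pos"]))
```

A plain string sort would put chr10 before chr2, which is why the key converts numeric chromosome names to integers.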
Sean