[BioC] averaging multiple probes for same gene on agilent array

Tobias Straub tstraub at med.uni-muenchen.de
Thu Jul 23 09:23:01 CEST 2009


Hi Alison,

I agree that from a biologist's point of view a summarization on the
gene level is very much wanted, so I would prefer to summarize as
early as possible (before testing for differential expression). I
think, however, that the strategy will depend a bit on the rationale
of the probe design: if probes are, e.g., always placed on different exons,
then you might expect very different Ms, and the summarization is very
problematic (also from a biological point of view).
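One hedged way to check whether the probes of a gene disagree in this way is to measure how far apart the probe-level Ms of each gene are on each array. The sketch below is in Python rather than R/limma. The input layout (a dict of probe ID to per-array M values, plus a probe-to-gene mapping) and the function name probe_disagreement are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import median


def probe_disagreement(m_values, probe_to_gene):
    """For each gene with two or more probes, return the median (across arrays)
    of the per-array range (max M - min M) of its probes.

    m_values: dict probe ID -> list of M values, one per array (hypothetical layout)
    probe_to_gene: dict probe ID -> gene ID (hypothetical layout)
    """
    by_gene = defaultdict(list)
    for probe, ms in m_values.items():
        by_gene[probe_to_gene[probe]].append(ms)

    result = {}
    for gene, probes in by_gene.items():
        if len(probes) < 2:
            continue  # a single probe cannot disagree with itself
        n_arrays = len(probes[0])
        # spread of the probes of this gene on each array
        ranges = [max(p[j] for p in probes) - min(p[j] for p in probes)
                  for j in range(n_arrays)]
        result[gene] = median(ranges)
    return result


m = {
    "D137-cbdb_A1587_1": [1.0, 1.2],
    "D137-cbdb_A1587_2": [0.9, 1.1],
    "D137-cbdb_A1587_3": [-1.0, -0.8],
    "D138-cbdb_A1594": [0.5, 0.4],
}
genes = {p: p.split("-")[0] for p in m}
print(probe_disagreement(m, genes))
```

A large value (here about 2 M units for D137, whose third probe points the other way) would flag a gene where averaging the probes is questionable; any threshold for "large" is a judgment call.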

My personal way to deal with your problem on Agilent arrays is to
first filter the probes, before gene summarization, based on several
criteria:
a) Agilent spot quality criteria (whatever you have, whatever you like)
b) at present I also apply A-value cutoffs, as the Ms are not reliable
below and above certain expression levels
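A minimal sketch of these two filters, assuming each probe has a list of A-values (one per array) and a single pass/fail flag standing in for whichever Agilent spot quality criteria you use. Applying the cutoff to the median A across arrays is only one possible choice (an assumption), and the cutoff values in the example are placeholders, not recommendations.

```python
from statistics import median


def filter_probes(a_values, passes_qc, a_low, a_high):
    """Return the set of probe IDs that pass the spot-quality check and whose
    median A-value across arrays lies within [a_low, a_high].

    a_values: dict probe ID -> list of A values, one per array (hypothetical layout)
    passes_qc: dict probe ID -> bool, a stand-in for spot quality flags
    """
    kept = set()
    for probe, a_vals in a_values.items():
        if not passes_qc.get(probe, False):
            continue  # criterion a): failed spot quality
        if a_low <= median(a_vals) <= a_high:
            kept.add(probe)  # criterion b): A inside the range where M is trusted
    return kept


a = {"p1": [8.0, 9.0, 10.0], "p2": [5.0, 5.5, 6.0],
     "p3": [15.0, 15.5, 16.0], "p4": [9.0, 9.0, 9.0]}
qc = {"p1": True, "p2": True, "p3": True, "p4": False}
print(filter_probes(a, qc, a_low=7.0, a_high=14.0))  # placeholder cutoffs
```

Here p2 is too dim, p3 too bright, and p4 fails the quality flag, so only p1 survives.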

My gene summarization is based on the assumption that the highest Ms
are the most meaningful (maybe the most 'real'); therefore I do not
calculate medians or anything similar but simply keep only the probe with
the highest median of absolute Ms across the arrays. If most of your
genes have only 3 probes, averaging is difficult anyway.
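This selection rule (per gene, keep the probe with the highest median |M| across arrays) can be sketched as follows, again in Python with a hypothetical input layout and a hypothetical function name. Ties keep the first probe encountered.

```python
from statistics import median


def best_probe_per_gene(m_values, probe_to_gene):
    """For each gene, return the probe whose median |M| across arrays is largest.

    m_values: dict probe ID -> list of M values, one per array (hypothetical layout)
    probe_to_gene: dict probe ID -> gene ID (hypothetical layout)
    """
    best = {}  # gene -> (probe, score)
    for probe, ms in m_values.items():
        gene = probe_to_gene[probe]
        score = median([abs(x) for x in ms])
        if gene not in best or score > best[gene][1]:
            best[gene] = (probe, score)
    return {gene: probe for gene, (probe, _) in best.items()}


m = {"D137_1": [0.2, -0.3], "D137_2": [1.5, 1.1],
     "D137_3": [-0.9, -1.0], "D138": [0.4, 0.6]}
genes = {"D137_1": "D137", "D137_2": "D137", "D137_3": "D137", "D138": "D138"}
print(best_probe_per_gene(m, genes))  # {'D137': 'D137_2', 'D138': 'D138'}
```

Because |M| is used, a probe with consistently strong negative M is treated the same as one with strong positive M.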

If anyone has better ideas, I look forward to hearing them!
best
Tobias


On Jul 22, 2009, at 9:00 PM, Alison Waller wrote:

> Dear Bioconductor list,
>
> I am analysing data from a custom Agilent array with 3600 spots
> using Limma.
>
> There are 3 probes for each gene (usually; however, some genes only
> have one probe), and all probes are in duplicate.
>
> I would like to obtain an average M value for each gene.
>
> Examples of the spot IDs are below:
> D137-cbdb_A1587_1
> D137-cbdb_A1587_1
> D137-cbdb_A1587_2
> D137-cbdb_A1587_2
> D137-cbdb_A1587_3
> D137-cbdb_A1587_3
> D138-cbdb_A1594
> D138-cbdb_A1594
>
>
> One option I thought of was to adjust the GAL file to have identical  
> IDs for all of the probes for the same gene and then use the  
> avereps() function.
>
> ID	Name
>
> D137	D137-cbdb_A1587_1
> D137	D137-cbdb_A1587_1
> D137	D137-cbdb_A1587_2
> D137	D137-cbdb_A1587_2
> D137	D137-cbdb_A1587_3
> D137	D137-cbdb_A1587_3
> D138	D138-cbdb_A1594
> D138	D138-cbdb_A1594
>
> However, the avereps() function seems more suitable for actual
> duplicates; for probe sets I would like to use some weighted average
> in which probes with intensities further from the mean of the
> probe set are down-weighted (for example, the Tukey biweight).
>
> Does anyone have experience with similar arrays, or suggestions for an
> appropriate function?
>
> thank you,
>
> alison
>
> ---------------------------------------------------------
> Alison Waller  Ph.D
> alison.waller at utoronto.ca
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
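For the approach proposed in the question, here is a sketch in Python (not a limma function; all names are hypothetical) that derives the gene ID from the spot name as in the proposed GAL ID column, averages the duplicate spots of each probe, and then combines the probes of each gene with a one-step Tukey biweight. The constants c = 5 and eps = 1e-4 are assumptions borrowed from the one-step biweight commonly used for Affymetrix MAS5 probe sets.

```python
from collections import defaultdict
from statistics import median


def gene_id(spot_name):
    # Assumption: the gene ID is the part before the first hyphen,
    # e.g. "D137-cbdb_A1587_2" -> "D137", matching the proposed GAL ID column.
    return spot_name.split("-", 1)[0]


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean: values far from the median get weight near 0."""
    m = median(values)
    mad = median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def summarize_array(spots):
    """spots: list of (spot_name, M) pairs for ONE array, duplicates included.
    Duplicate spots of the same probe are averaged first; the probe means
    of each gene are then combined with the biweight."""
    by_probe = defaultdict(list)
    for name, m in spots:
        by_probe[name].append(m)
    by_gene = defaultdict(list)
    for name, ms in by_probe.items():
        by_gene[gene_id(name)].append(sum(ms) / len(ms))
    return {gene: tukey_biweight(ms) for gene, ms in by_gene.items()}


spots = [
    ("D137-cbdb_A1587_1", 1.0), ("D137-cbdb_A1587_1", 1.2),
    ("D137-cbdb_A1587_2", 0.9), ("D137-cbdb_A1587_2", 1.1),
    ("D137-cbdb_A1587_3", 3.0), ("D137-cbdb_A1587_3", 3.0),
    ("D138-cbdb_A1594", 0.5), ("D138-cbdb_A1594", 0.7),
]
print(summarize_array(spots))
```

In this example the outlying third probe of D137 (M = 3.0) receives zero weight, so the gene value lands between the other two probe means. With only one to three probes per gene the biweight has little to work with: with two probes both get the same weight, so it reduces to the plain mean.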

----------------------------------------------------------------------
Tobias Straub   ++4989218075439   Adolf-Butenandt-Institute, München D


