[BioC] Summarising Probe Sets for Agilent 4x44 Arrays

Francois Pepin francois.pepin at sequentainc.com
Thu Dec 8 22:44:20 CET 2011


Hi Sam,

It depends a lot on how you have designed your custom array. One of the reasons the whole-genome 44K arrays carry multiple probes for a gene is that those probes gave different results in Agilent's test datasets. In that case, summarizing can be counterproductive.

>  1.  Is summarisation ever a good idea for Agilent probe sets (we have 8 probes per transcript), and if so, are there routines in R that would enable us to do this?

It could be, depending on the probe design and on what your goal is. One way would be to simply average over the probes. If the probes behave in more complicated ways relative to each other (for example, consistent probe-to-probe differences in intensity), then an RMA-style summarization could work well. Without knowing what your design is and what your data look like, it's hard to tell. I'm not aware of R routines that do this out of the box, but I haven't checked in a while, and they should be easy to write.
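
As a rough illustration of what "RMA-style" means here: RMA summarizes each probe set by fitting Tukey's median polish to the probes x arrays matrix of log2 intensities and reporting overall effect + array effect as the expression value for each array. The sketch below is plain Python, not an R or Bioconductor routine; it follows the same sweep order as R's stats::medpolish, assumes the intensities are already background-corrected, normalized and log2-transformed, and uses a hypothetical 3-probe example.

```python
from statistics import median


def median_polish(matrix, max_iter=10, eps=0.01):
    """Tukey's median polish of a probes x arrays matrix (rows = probes).

    Fits y[i][j] = overall + row[i] + col[j] + resid[i][j] by alternately
    sweeping row and column medians, in the same order as R's stats::medpolish.
    """
    nr, nc = len(matrix), len(matrix[0])
    z = [list(map(float, row)) for row in matrix]  # residuals
    overall = 0.0
    row_eff = [0.0] * nr  # probe effects
    col_eff = [0.0] * nc  # array effects
    old_sum = 0.0
    for _ in range(max_iter):
        # Remove each probe's median across arrays.
        rdelta = [median(r) for r in z]
        z = [[v - d for v in r] for r, d in zip(z, rdelta)]
        row_eff = [e + d for e, d in zip(row_eff, rdelta)]
        delta = median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        # Remove each array's median across probes.
        cdelta = [median(z[i][j] for i in range(nr)) for j in range(nc)]
        z = [[v - d for v, d in zip(r, cdelta)] for r in z]
        col_eff = [c + d for c, d in zip(col_eff, cdelta)]
        delta = median(row_eff)
        row_eff = [e - delta for e in row_eff]
        overall += delta
        # Stop once the total absolute residual stops changing.
        new_sum = sum(abs(v) for r in z for v in r)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, z


def rma_style_summary(log2_matrix):
    """One expression value per array: overall effect + array effect."""
    overall, _, col_eff, _ = median_polish(log2_matrix)
    return [overall + c for c in col_eff]


# Hypothetical probe set: 3 probes x 3 arrays with purely additive effects.
probes = [[5, 6, 8],
          [6, 7, 9],
          [7, 8, 10]]
print(rma_style_summary(probes))  # prints [6.0, 7.0, 9.0]
```

With real data (8 probes per transcript, plus noise), the residual matrix returned by median_polish is also a quick way to spot probes that disagree with the rest of their set.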

Another type of "summarization" would be to choose a representative probe per gene (e.g. genefilter::findLargest). You'd end up throwing away 7/8 of your array, but this works well if some probes are clearly better than others.
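
The representative-probe idea can be sketched as "for each gene, keep the probe with the largest value of some per-probe statistic", which is, as I understand it, what genefilter::findLargest does with a test statistic you supply. The Python below is not that function; the probe IDs, gene names and intensities are hypothetical, and using the variance across arrays as the ranking statistic is an assumption standing in for whatever criterion you trust.

```python
from statistics import pvariance


def pick_representative(probe_to_gene, stat):
    """For each gene, keep the single probe with the largest statistic.

    probe_to_gene: dict probe ID -> gene ID
    stat: dict probe ID -> number (larger = preferred)
    Ties keep the probe seen first.
    """
    best = {}
    for probe, gene in probe_to_gene.items():
        if gene not in best or stat[probe] > stat[best[gene]]:
            best[gene] = probe
    return best


# Hypothetical log2 intensities (probe ID -> one value per array).
intensities = {
    "p1": [8.0, 8.1, 7.9],
    "p2": [6.0, 9.0, 7.5],
    "p3": [10.0, 10.0, 10.0],
    "p4": [5.0, 7.0, 6.0],
}
probe_to_gene = {"p1": "geneA", "p2": "geneA", "p3": "geneB", "p4": "geneB"}

# Assumed ranking criterion: variance across arrays.
variability = {p: pvariance(v) for p, v in intensities.items()}
print(pick_representative(probe_to_gene, variability))
# prints {'geneA': 'p2', 'geneB': 'p4'}
```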

>  2.  If summarisation is a bad idea for Agilent data sets would taking the median signal intensity be a better strategy?
I'd consider taking the median a form of summarization, just like the average I suggested above. If all your probes show very similar signals, it could be a good option.
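
To make the mean-versus-median point concrete, here is a small Python sketch (hypothetical numbers, not Agilent data) that collapses a probes x arrays matrix of log2 intensities to one value per array. With one discordant probe, the mean is pulled towards it while the median stays near the probes that agree.

```python
from statistics import mean, median


def summarise(probe_matrix, how):
    """Collapse a probes x arrays matrix to one value per array using `how`."""
    n_arrays = len(probe_matrix[0])
    return [how([row[j] for row in probe_matrix]) for j in range(n_arrays)]


# Hypothetical log2 intensities: 4 probes x 2 arrays, last probe discordant.
probes = [
    [8.0, 9.0],
    [8.2, 9.1],
    [7.9, 8.9],
    [12.0, 5.0],
]
print(summarise(probes, mean))    # pulled towards the discordant probe
print(summarise(probes, median))  # stays close to the three agreeing probes
```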

>  3.  Can anybody recommend a good hierarchical clustering routine in R that would be suitable for our Agi one-colour data, whether we take all individual probes or just the median signal intensity?  (I thought maybe oompa or BiClust?)

I'm a fan of the basic hclust function with method='ward' (renamed 'ward.D' in R 3.1.0, which also added 'ward.D2'), but that doesn't mean the others aren't good.
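
For anyone curious what Ward's criterion does, here is a minimal pure-Python sketch of agglomerative clustering that, at each step, merges the pair of clusters whose union increases the total within-cluster sum of squares the least. It is not hclust itself; the merge heights are reported as sqrt(2 x that increase), which I believe puts them on the same scale as hclust(dist(x), method='ward.D2'), and the "arrays" below are hypothetical summarised profiles.

```python
import math
from itertools import combinations


def centroid(points, members):
    dim = len(points[0])
    return [sum(points[m][d] for m in members) / len(members) for d in range(dim)]


def ward_cost(points, a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(points, a), centroid(points, b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clustering(points):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Merge the pair whose union increases the within-cluster SS least.
        a, b = min(combinations(clusters, 2), key=lambda pair: ward_cost(points, *pair))
        height = math.sqrt(2 * ward_cost(points, a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b, height))
    return merges


def cut(n_points, merges, k):
    """Replay merges until k clusters remain; return sorted member lists."""
    clusters = [frozenset([i]) for i in range(n_points)]
    for a, b, _ in merges:
        if len(clusters) <= k:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical summarised profiles: one row per array, one column per gene.
arrays = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
tree = ward_clustering(arrays)
print(cut(len(arrays), tree, 2))  # prints [[0, 1], [2, 3]]
```

This brute-force search is cubic in the number of arrays, which is fine for a handful of samples; for real work the built-in hclust is the sensible choice.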

Hope this helps,

François Pepin
Scientist
 
Sequenta, Inc.
400 E. Jamie Court, Suite 301
South San Francisco, CA 94080
 
650 243 3929 p
 
francois.pepin at sequentainc.com
www.sequentainc.com
 