[BioC] what is the best baseline transformation method before clustering

Sean Davis sdavis2 at mail.nih.gov
Tue Sep 16 15:40:29 CEST 2008


On Tue, Sep 16, 2008 at 7:42 AM, Ruppert Valentino <ruppert7 at hotmail.com> wrote:
>
> Hello,
>
> I tried to cluster data from Affy U133A by normalising with gcrma and then z-scoring, but I get different values and results than when using the Eisen Cluster 3.0 software and other commercial software. I am wondering what is the best way to baseline transform the data after normalisation so as to show the most variable data in the set, which can then be used to show the relationships in the clustering.
>

There is not a "best way".  And, of course, you will potentially get
different clustering depending on the parameters that you use in the
different software packages.  The nice thing about R is that you can
use pretty much any transformation you like combined with any
clustering method and distance measure.
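
To make the point about parameters concrete, here is a minimal sketch in plain Python (standard library only; the same idea carries over to R). The three gene profiles are made-up numbers, and the helper names euclidean and pearson_distance are chosen for illustration, not taken from any package. It shows that the nearest neighbour of a gene can change with nothing more than the distance measure.

```python
from math import sqrt
from statistics import mean


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation; small when two profiles have the same shape."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)


# Hypothetical log2 profiles across four samples.
g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [5.0, 6.0, 7.0, 8.0]  # same shape as g1, higher overall level
g3 = [2.0, 2.0, 3.0, 3.0]  # similar level to g1, different shape

# Euclidean distance puts g3 closest to g1 ...
print(euclidean(g1, g2), euclidean(g1, g3))
# ... while correlation distance puts g2 closest to g1.
print(pearson_distance(g1, g2), pearson_distance(g1, g3))
```

Two packages that differ only in their default distance (or in whether they centre the genes first) can therefore give different trees from the same normalised data.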

As for the most variable genes, you will want to determine those
BEFORE you Z-score transform.  There are many messages in the list
archives dealing with filtering genes.
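
The reason for the ordering is that after a Z-score transform every gene has variance 1, so variance can no longer rank the genes. A rough sketch of filter-then-scale, again in plain Python with invented values (the gene names, the numbers, and the helper names top_variable_genes and zscore are all assumptions for illustration):

```python
from statistics import mean, stdev, variance


def top_variable_genes(expr, n):
    """Return the names of the n genes with the largest sample variance."""
    ranked = sorted(expr, key=lambda g: variance(expr[g]), reverse=True)
    return ranked[:n]


def zscore(values):
    """Centre a gene on its mean and scale it to unit standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical log2 expression values: gene -> one value per sample.
expr = {
    "geneA": [7.0, 7.1, 6.9, 7.0],  # nearly flat
    "geneB": [5.0, 9.0, 5.5, 8.5],  # highly variable
    "geneC": [6.0, 6.5, 8.0, 7.5],  # moderately variable
}

# Step 1: pick the most variable genes on the untransformed log scale.
selected = top_variable_genes(expr, 2)

# Step 2: only then Z-score the genes that were kept.
scaled = {g: zscore(expr[g]) for g in selected}

print(selected)
print({g: variance(v) for g, v in scaled.items()})
```

Replacing mean with median in the centring step (and skipping the division) would give a per-gene median baseline of the kind described in the quoted question below.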

Hope that helps,
Sean

> In GeneSpring they use a baseline transformation as follows:
>
> Baseline to median of all samples: For each probe the median of the log summarized values from all the samples is calculated and subtracted from each of the samples.
>
> In Cluster 3.0
>
> It is recommended to log transform the data and mean or median centre the genes to transform the data.
>
>
> What is the best way to go about baseline transforming (e.g. scaling, mean centering) the data in Bioconductor before clustering?
>
> Thanks
>
>
> Ruppert.