[BioC] summarized expression values from beadarray versus GenomeStudio
Ina Hoeschele
inah at vbi.vt.edu
Fri Apr 15 19:58:12 CEST 2011
Mark,
as to the options in GenomeStudio, it was confirmed to me that I actually have two sets of values:
(1) summarized and quantile normalized (without the global background normalization)
(2) summarized and NOT quantile normalized (without the global background normalization)
There are no other options to set, except not to impute missing values, in which case GenomeStudio deletes any bead types that failed for at least one sample.
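As a rough illustration of the no-imputation behaviour described above, the following R sketch drops every bead type that has a missing value in at least one sample. The matrix `expr` and its probe and sample names are made up for the example. This is not GenomeStudio's actual implementation, only the row-wise filter that the described behaviour appears to correspond to.

```r
# Toy matrix: rows are bead types, columns are samples (all values hypothetical).
expr <- matrix(
  c(120.5,  98.2, 101.7,
       NA, 455.0, 470.3,
    210.1, 205.6,    NA,
     77.9,  80.4,  79.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("probe_A", "probe_B", "probe_C", "probe_D"),
                  c("sample_1", "sample_2", "sample_3"))
)

# Keep only bead types that were observed in every sample.
keep <- stats::complete.cases(expr)
expr_complete <- expr[keep, , drop = FALSE]

# Bead types that survive the filter.
rownames(expr_complete)
```

With more samples in a project, more bead types are likely to be dropped this way, so the number of summarized probes can depend on which samples are grouped together.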
Thanks, Ina
----- Original Message -----
From: "Ina Hoeschele" <inah at vbi.vt.edu>
To: "Mark Dunning" <mark.dunning at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Sent: Thursday, April 14, 2011 5:13:10 PM
Subject: Re: [BioC] summarized expression values from beadarray versus GenomeStudio
Hi Mark,
sorry for my slow response (I am dealing with the 450K methylation data at the same time ...).
<<
Could you send me the Illumina IDs and/or ArrayAddress IDs of any bead
types that do not get summarized by beadarray?
>>
I currently have this information only for the "Gene" probes, not for the controls (my collaborator sent me the GenomeStudio summarized data only for the Gene probes). Below is the difference between the summarized Gene probes from GenomeStudio versus beadarray:
> length(ILMN_GSData_Gene)
[1] 47320 # number of Gene probes summarized by GenomeStudio
> length(ILMN_BSData_Gene)
[1] 47224 # number of Gene probes summarized by beadarray
> setdiff(ILMN_GSData_Gene,ILMN_BSData_Gene)
[1] "ILMN_2038777" "ILMN_2038774" "ILMN_3164734" "ILMN_3164750" "ILMN_3164765"
[6] "ILMN_3164808" "ILMN_3164838" "ILMN_3164858" "ILMN_3164875" "ILMN_3164905"
[11] "ILMN_3164915" "ILMN_3164950" "ILMN_3164979" "ILMN_3165007" "ILMN_3165027"
[16] "ILMN_3165033" "ILMN_3165086" "ILMN_3165100" "ILMN_3165113" "ILMN_3165130"
[21] "ILMN_3165170" "ILMN_3165190" "ILMN_3165201" "ILMN_3165218" "ILMN_3165229"
[26] "ILMN_3165245" "ILMN_3165277" "ILMN_3165303" "ILMN_3165334" "ILMN_3165363"
[31] "ILMN_3165378" "ILMN_3165415" "ILMN_3165426" "ILMN_3165438" "ILMN_3165457"
[36] "ILMN_3165474" "ILMN_3165484" "ILMN_3165533" "ILMN_3165547" "ILMN_3165565"
[41] "ILMN_3165590" "ILMN_3165604" "ILMN_3165619" "ILMN_3165638" "ILMN_3165650"
[46] "ILMN_3165668" "ILMN_3165687" "ILMN_3165699" "ILMN_3165727" "ILMN_3165745"
[51] "ILMN_3165757" "ILMN_3165768" "ILMN_3165829" "ILMN_3165877" "ILMN_3165896"
[56] "ILMN_3165903" "ILMN_3165920" "ILMN_3165933" "ILMN_3165993" "ILMN_3166057"
[61] "ILMN_3166075" "ILMN_3166098" "ILMN_3166114" "ILMN_3166132" "ILMN_3166177"
[66] "ILMN_3166194" "ILMN_3166223" "ILMN_3166238" "ILMN_3166255" "ILMN_3166311"
[71] "ILMN_3166325" "ILMN_3166368" "ILMN_3166404" "ILMN_3166414" "ILMN_3166430"
[76] "ILMN_3166475" "ILMN_3166491" "ILMN_3166504" "ILMN_3166519" "ILMN_3166551"
[81] "ILMN_3166569" "ILMN_3166578" "ILMN_3166596" "ILMN_3166630" "ILMN_3166640"
[86] "ILMN_3166655" "ILMN_3166673" "ILMN_3166687" "ILMN_3166703" "ILMN_3166721"
[91] "ILMN_3166728" "ILMN_3166775" "ILMN_3166789" "ILMN_3166804" "ILMN_1343295"
[96] "ILMN_2038772" "ILMN_2038775" "ILMN_2038776" "ILMN_2038773"
> setdiff(ILMN_BSData_Gene,ILMN_GSData_Gene)
[1] "ILMN_1657147" "ILMN_3246658" "ILMN_3247816"
Is this of any use?
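The counts above are consistent with each other: 99 IDs appear only in the GenomeStudio list and 3 appear only in the beadarray list, which gives the net difference of 96 Gene probes. A minimal sketch of that bookkeeping follows. `compare_ids` is a hypothetical helper and the short ID vectors are toy data, not the real probe lists.

```r
# Hypothetical helper: split two ID vectors into shared and one-sided sets.
compare_ids <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  list(shared = intersect(a, b),
       only_a = setdiff(a, b),
       only_b = setdiff(b, a))
}

# Toy data standing in for ILMN_GSData_Gene and ILMN_BSData_Gene.
gs_ids <- c("id_1", "id_2", "id_3", "id_4")
bs_ids <- c("id_2", "id_3", "id_5")
res <- compare_ids(gs_ids, bs_ids)
lengths(res)

# Same identity applied to the counts reported in this thread:
# each total minus its one-sided IDs must give the same shared count.
n_gs <- 47320L; n_bs <- 47224L   # totals from length()
only_gs <- 99L; only_bs <- 3L    # lengths of the two setdiff() results
stopifnot(n_gs - only_gs == n_bs - only_bs)
n_gs - only_gs                   # number of shared Gene probes
```

Running the same helper on the real ID vectors would also make it easy to restrict both datasets to the shared probes before comparing values.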
<<
Could you give a bit more detail on how the GenomeStudio data were
exported? i.e. with/without normalisation
>>
Without normalization and without the second background correction. I will check with my collaborator about any other details and send them tomorrow.
Many thanks, Ina
On Tue, Apr 12, 2011 at 3:22 PM, Ina Hoeschele <inah at vbi.vt.edu> wrote:
> Hi Mark and Wei,
>
> thank you very much for your suggestions.
>
> For all of my 8 BSData objects the first dimension is 48,107 probes (47,224 gene probes, 883 control probes). The corresponding dataset produced by GenomeStudio contains 47,320 gene probes and 886 control probes, so I seem to have 96 fewer gene probes and 3 fewer control probes. I do not know why there is this difference, but these numbers do not suggest that anything is really messed up.
>
> I would not be so worried about the discrepancy in values, but since the correlations among (control) samples (on different chips) are so much worse for Bioconductor compared to GenomeStudio (.91-.92 versus .98-.99), something must be going wrong somewhere.
>
> Related to this, for each sample run on a bead chip, there may be some bead types that failed. For all samples that are combined in a 'project' in GenomeStudio, bead types that have failed in any of these samples are excluded from the summarized data (unless one checks the impute option). I wonder how this is handled in the summarization in beadarray. Since beadarray deals with a single chip at a time, a project in beadarray would be a single chip. So if beadarray also excludes failed bead types, then different BSData objects (each representing a single chip) may have different bead types represented. I need to check whether this might have messed up my correlations between control samples from different chips. But for my first batch of 8 chips, all BSData objects have the same first dimension, which is a bit smaller than the number of summarized probes from GenomeStudio.
>
> Ina
>
>
>
> ----- Original Message -----
> From: "Mark Dunning" <mark.dunning at gmail.com>
> To: "Ina Hoeschele" <inah at vbi.vt.edu>
> Cc: bioconductor at stat.math.ethz.ch
> Sent: Thursday, April 7, 2011 5:33:09 AM
> Subject: Re: summarized expression values from beadarray versus GenomeStudio
>
> Hi Ina,
>
> Nothing seems to be wrong with your approach and it should re-create
> the BeadStudio intensities. We tried it out on some of our own data
> and managed to get very close to the BeadStudio values.
>
> Does the number of observations reported by beadarray and GenomeStudio
> agree? What are the dimensions of your BSData object and are they what
> you are expecting? It could be that summarize is incorrectly trying to
> combine data from multiple strips.
>
> Best,
>
> Mark
>
>
>
> On Mon, Apr 4, 2011 at 11:13 PM, Ina Hoeschele <inah at vbi.vt.edu> wrote:
>> Hi Mark et al.,
>> I have calculated correlations among the expression vectors of different samples (in particular for a control sample that we use on each BeadChip), both for the expression data that I have processed in Bioconductor using the beadarray package and for the expression data produced by GenomeStudio (selecting quantile normalization). The correlations (especially for the control samples from different chips) are clearly worse for the Bioconductor processed data and I have been trying to track down where I have a problem.
>>
>> I also have the summarized (bead-type) intensities from GenomeStudio without normalization. I obtain the corresponding summarized values from beadarray with the following code:
>>
>> library(beadarray)
>>
>> # Summary functions: mean and standard error, ignoring missing values
>> myMean = function(x) mean(x, na.rm = TRUE)
>> mySe = function(x) sd(x, na.rm = TRUE)/sqrt(sum(!is.na(x)))
>> # Use the raw (untransformed) green-channel intensities
>> GreenChannelTransform <- function (BLData, array)
>> {
>>   x = getBeadData(BLData, array = array, what = "Grn")
>>   return(x)
>> }
>> greenChannel = new("illuminaChannel",GreenChannelTransform,illuminaOutlierMethod,myMean,mySe,"G")
>>
>> # Summarize each chip separately and save the result in the chip directory
>> for (iChip in seq_len(nChips))
>> {
>>   setwd(Chip.Dir[iChip])
>>   BLData = readIllumina(useImages=FALSE, illuminaAnnotation="Humanv4")
>>   BSData <- summarize(BLData,list(greenChannel),useSampleFac=TRUE,sampleFac=NULL,removeUnMappedProbes=TRUE)
>>   save(BSData,file="BSData.rda")
>>   rm(BLData); rm(BSData); gc()
>> }
>>
>>
>> If the data are summarized in this way using Bioconductor/beadarray, would you not expect the summarized values to be identical to those from GenomeStudio?
>>
>> I checked the summarized value for one beadtype on the first several sections of chip 1.
>> The summary values from GenomeStudio are: 77.93, 159.16, 174.93, 131.05, 484.39
>> The summary values from beadarray are: 90.0, 192.0, 188.5, 157.0, 492.0
>> (I also calculated the first summary value by hand and came up with 103.36!)
>>
>> Why are these values different, any hint?
>>
>> Many thanks as always, Ina
>>
>