[BioC] Integrating Codelink data with bioconductor (using affy and limma functions)

Mon Apr 25 15:15:22 CEST 2005

Gordon Smyth wrote:
> At 09:35 PM 25/04/2005, Diego Díez Ruiz wrote:
> 
>> Dear Gordon,
>>
>> Thanks for your response. I will use the data as before, but which do 
>> you think would affect the normalization process more: some points 
>> assigned NA values, or some points with lower A values because one of 
>> the intensities was assigned a value of, say, 0.01?
> 
> 
> Unless you're doing much more than I think you are, you must avoid NAs 
> at all costs. If you have to live with low intensities, then so be it.

then so be it.
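
To make the trade-off concrete, here is a minimal sketch of what happens to one spot in each case. It assumes M and A are computed for a pair of intensities on the log2 scale (M as the log-ratio, A as the average log-intensity); the function name and the numbers are only illustrative.

```python
import math

def ma_values(x, y):
    """Return (M, A) on the log2 scale for a pair of intensities.

    A NaN input (standing in for NA) propagates to both M and A.
    """
    lx, ly = math.log2(x), math.log2(y)
    return lx - ly, (lx + ly) / 2.0

# Spot whose second intensity was floored to 0.01 (illustrative numbers).
m_low, a_low = ma_values(100.0, 0.01)

# Same spot with the second intensity set to missing instead.
m_na, a_na = ma_values(100.0, float("nan"))

print(m_low, a_low)  # finite: a large M at a low A
print(m_na, a_na)    # both NaN: the spot is lost to any later step
```

The floored spot stays in the data with an extreme but finite M at a low A, whereas the missing spot yields no M or A at all.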

> 
>> I'd let you see my class definition and parser, of course. This is 
>> really the first time I have used classes and stored everything as an 
>> R package, so I thought that the best way to make something usable 
>> quickly, without having to read "Writing R Extensions" completely, was 
>> to learn from other packages (that is one of the great things about 
>> open source :). Of course I will have to read it one day.
>> Briefly:
>> 1. The parser reads exported txt files from the Codelink software.
> 
> 
> I've never seen Codelink output, but my understanding is that it is 
> essentially just ImaGene output. Is that not correct?

I've never seen ImaGene output. This is the header and the column names 
from Codelink output:

CodeLink Expression Analysis 4.1.0.29054
CNIC Report for Slide (T00241792)
LAYOUT  EXP294X192-912.22.ID
PROJECT
EXPERIMENT
PRODUCT Human Whole Genome
Sample Name Array 1 Sample001
Median Array 1  86,6547470092773
Report( 1 ): 310105-Person
--------------------------------------------------------------------------------
Idx Array   Sample_name Probe_name  Annotation_PIN  Annotation_NCBI_Acc 
Annotation_NCBI_NID Annotation_LocusLink    Annotation_OGS 
Annotation_UniGene  Annotation_ENSEMBL  Probe_type  Feature_id 
Raw_intensity   Normalized_intensity    Quality_flag    Signal_strength 
Logical_row Logical_col Center_X    Center_Y    Spot_mean   Spot_median 
Spot_stdev  Spot_area   Spot_diameter   Spot_noise_level    Bkgd_mean 
Bkgd_median Bkgd_stdev  Bkgd_area   Annotation_Molecular_Function
Annotation_Biological_Process   Annotation_Cellular_Component 
Annotation_Cytoband Annotation_HS_Homology  Annotation_MM_Homology 
Annotation_RN_Homology  Annotation_Analogous_CodeLink 
Annotation_Legacy_Probe_Name    Description

The header could be fewer than 10 rows (it is customizable), and the 
columns can also be customized (for example, in my own data I leave out 
the Annotation_* and Description columns). I'm not sure whether this 
example includes all the possible fields.
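
Since the header length and the exported columns both vary, a reader cannot rely on fixed row or column positions. A minimal sketch of one way to handle that follows. It assumes, from the single example above only, that the header ends at a line made up entirely of dashes and that the next line holds the tab-separated column names; `read_codelink_header` is a hypothetical name, not part of any package.

```python
import io

def read_codelink_header(handle):
    """Split a Codelink text export into header lines and column names.

    Assumption: the header ends at a line consisting only of dashes and
    the following line holds the tab-separated column names.
    """
    header = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line and set(line) == {"-"}:
            break  # separator reached; column names come next
        header.append(line)
    else:
        raise ValueError("no dashed separator line found")
    names = next(handle, "").rstrip("\r\n").split("\t")
    columns = [n.strip() for n in names if n.strip()]
    return header, columns

# Shortened, made-up export used only to exercise the function.
example = io.StringIO(
    "CodeLink Expression Analysis 4.1.0.29054\n"
    "PRODUCT\tHuman Whole Genome\n"
    "-----\n"
    "Idx\tArray\tProbe_name\tRaw_intensity\n"
    "1\t1\tX\t86.6\n"
)
header, columns = read_codelink_header(example)
print(header)
print(columns)
```

Because the column names are read from the file itself, the data rows that follow can be looked up by name whatever subset of fields was exported.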

D

> 
> Gordon
> 
>>  It works fine with 3 different chips, so I think it should work fine 
>> with all types. One problem is that the exported text data have custom 
>> fields (you can choose among all fields, including Raw_intensity, 
>> Median_foreground, etc.), so it is possible to find files in which 
>> some fields were not exported. I know that it is possible to export as 
>> XML, but I haven't tried that yet.
>>
>> 2. The class definition is very simple. I based it on RGList and used 
>> almost all the redefinitions of dim(), as.matrix(), etc. that you use 
>> in limma. I also based the subsetting system on the one used for 
>> AffyBatch objects in affy. A Codelink object stores 3 matrices as a 
>> list: one of intensities, one of flags, and a last one with probe name 
>> and probe type. I currently name the slots "val", "flags" and "info", 
>> but I don't think those names are appropriate, so this week I want to 
>> import all possible fields and name them as they are called in the 
>> exported files. I will probably also check which fields are present 
>> and warn or raise an error if a *must have* field is missing.
>>
>> When I have clearer and cleaner code I will have no problem 
>> letting you see it.
>>
>> D
> 
>
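
The container described in point 2 of the quoted message (three parallel tables, dim-style and subsetting helpers, and a check for *must have* fields) can be sketched roughly as below. This is only an illustration of the idea, not the actual class. The required-field list, the attribute names and the flag and probe-type values in the example are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical choice of *must have* fields; the real list is undecided.
REQUIRED = ("Probe_name", "Raw_intensity")

def check_fields(columns, required=REQUIRED):
    """Raise an error naming every required field absent from columns."""
    missing = [f for f in required if f not in columns]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))

@dataclass
class Codelink:
    intensities: list  # rows are probes, columns are arrays
    flags: list        # same shape as intensities
    info: list         # one (probe_name, probe_type) pair per probe

    def dim(self):
        """Return (number of probes, number of arrays)."""
        n_probes = len(self.intensities)
        n_arrays = len(self.intensities[0]) if n_probes else 0
        return n_probes, n_arrays

    def subset(self, rows, cols):
        """Return a new Codelink restricted to the given row/column indices."""
        def pick(m):
            return [[m[i][j] for j in cols] for i in rows]
        return Codelink(pick(self.intensities), pick(self.flags),
                        [self.info[i] for i in rows])

# Made-up values, only to show the three tables staying in step.
obj = Codelink(
    intensities=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    flags=[["G", "G"], ["L", "G"], ["G", "M"]],
    info=[("p1", "DISCOVERY"), ("p2", "DISCOVERY"), ("p3", "POSITIVE")],
)
sub = obj.subset(rows=[0, 2], cols=[1])
print(obj.dim(), sub.dim())
```

Subsetting all three tables in one method keeps intensities, flags and probe information aligned, which is the main point of wrapping them in a single object.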