[BioC] Annotation.db: how automatically call a mapping?

Tue Jun 30 18:42:47 CEST 2009

Hooiveld, Guido wrote:
> Hi Martin,
> 
> Indeed, another useful, straightforward possibility for mapping.
> However, I am now facing the problem of properly combining the
> annotation info with the expression data. This is what I would like to
> do:
> 
>> Tab_data <- exprs(eset[probeids])
>> Tab_data <- cbind(Tab_data, fit2$Amean)   # add average expression from LIMMA output
>> Tab_data <- cbind(Tab_data, fit2$p.value) # add p-value from LIMMA output
> etc.
> 
> This all goes fine, however adding the annotation info 'mixes-up' the
> content of Tab_data; the annotation data replaces the first column of
> Tab_data, and the content of all cells is replaced by 'null'. I suspect
> it has something to do with the type of object I would like to merge,
> but I am not sure.
> 
>> map.entrez <- getAnnMap("ENTREZID", annotation(eset))
>> map.entrez <- as.list(map.entrez[probeids])
> 
> 
>> Tab_data <- cbind(Tab_data, map.entrez)

This cbinds a matrix and a list, which is likely the source of the
mix-up. Check that the mapping between probe id and Entrez id is
strictly 1:1, convert the list to a named vector, and use the names to
subset and assign in a coordinated way:

  library(annotate)
  data(sample.ExpressionSet)
  obj <- sample.ExpressionSet             # save typing ;)
  map <- getAnnMap('ENTREZID', annotation(obj))

  submap <- map[featureNames(obj)]
  elts <- as.list(submap)
  stopifnot(all(sapply(elts, length) == 1))  # every probe maps to exactly one id

  tabdat <- as.data.frame(exprs(obj)) # conceptually no longer a matrix
  tabdat[names(elts), "ENTREZID"] <- unlist(elts, use.names=FALSE)
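
The same pattern can be tried without any Bioconductor packages. What
follows is only a toy sketch in base R: the matrix 'expr', the list
'ids', and the probe names p1..p3 are made-up stand-ins for exprs(obj)
and as.list(submap), not real annotation data.

```r
# Toy stand-in for exprs(obj): 3 probes x 2 samples (made-up values)
expr <- matrix(c(5.1, 6.2, 7.3, 8.4, 9.5, 10.6), nrow = 3,
               dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))

# Toy stand-in for as.list(submap), deliberately in a different order
ids <- list(p3 = "103", p1 = "101", p2 = "102")

# cbind() of a numeric matrix and a list gives a list-mode matrix,
# which no longer behaves like a numeric table
mixed <- cbind(expr, ids)
typeof(mixed)

# Check the mapping is 1:1, then assign by row name into a data frame
stopifnot(all(sapply(ids, length) == 1))
tab <- as.data.frame(expr)
tab[names(ids), "ENTREZID"] <- unlist(ids, use.names = FALSE)
tab
```

Because the assignment is by row name, the order of the list does not
need to match the row order of the expression matrix.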

If the objective were something other than exporting data from R, and
the data 'SomeData' were experiment-specific (like the p-values from
limma), I'd suggest something along the lines of

  featureData(obj)[["SomeData", labelDescription="describe SomeData"]] <-
      SomeData

to add the data to obj, and to carry it forward in a coordinated fashion
for subsequent analysis, e.g., eventually

  forOutput <- cbind(exprs(obj), fData(obj))

(the syntax for simultaneously creating and assigning a _subset_ of
featureData is a little convoluted, featureData(obj)[["...",
labelD...]][indexToCreate] <- values ).

In this case also one wants to make sure the data is appropriately
formatted for standard R operations, e.g., cbinding a matrix / data
frame with a vector, rather than a list.
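
If the 1:1 check fails because some probes map to several ids or to
none, one possible workaround is to collapse each list element to a
single string before combining. This is only a sketch in base R under
my own assumptions: the helper 'collapse_ids', the ";" separator, and
the toy data are hypothetical and not part of annotate or Biobase.

```r
# Hypothetical helper (not part of any package): collapse each list element
# to one string so the result is a plain named character vector.
collapse_ids <- function(elts, sep = ";") {
  vapply(elts, function(x) {
    x <- x[!is.na(x)]                      # drop missing ids
    if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
  }, character(1))
}

# Made-up mapping: one id, two ids, and no id
ids <- list(p1 = "101", p2 = c("102", "202"), p3 = NA_character_)
flat <- collapse_ids(ids)

# Made-up expression values; the vector is aligned by probe name
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
tab <- data.frame(expr, ENTREZID = flat[rownames(expr)],
                  stringsAsFactors = FALSE)
tab
```

The result is an ordinary data frame with one character column, which
write.table() can export without surprises.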

Martin

>   ^ in R this seems to work, but when saved as .txt the content of
> Tab_data is completely mixed up. Before 'adding' map.entrez, Tab_data
> is OK.
> 
> 
>> write.table(cbind(rownames(Tab_data2), Tab_data2), file="test_1234.txt",
>              sep="\t", col.names=TRUE, row.names=FALSE)
> 
>> class(Tab_data)
> [1] "matrix"
>> class(map.entrez)
> [1] "list"
> 
> 
> Do you, or someone else, have a suggestion how to properly link these
> two types of data?
> Thanks again,
> Guido
> 
> 
> 
>  
> 
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch 
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of 
>> Martin Morgan
>> Sent: 30 June 2009 00:00
>> To: Hooiveld, Guido
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] Annotation.db: how automatically call a mapping?
>>
>> Hooiveld, Guido wrote:
>>> Hi,
>>>  
>>> I am facing a problem I cannot solve myself, despite everything I
>>> have read and know. But I assume the solution is easy for the more
>>> knowledgeable folks in BioC/R...
>>>  
>>> This does work:
>>>> library(moe430a.db)
>>>> xxyy <- moe430aSYMBOL
>>>> xxyy
>>> SYMBOL map for chip moe430a (object of class "AnnDbBimap")
>>>  
>>> However, for this to work you need to know the array type of the data
>>> that is analyzed.
>>>  
>>>  
>>> Now I would like to automatically extract the (e.g.) SYMBOL mapping
>>> from an annotation.db, thus by retrieving the array type from the eset.
>>>  
>>>> library(affy)
>>>> eset <- rma(data)
>>>> probeids <- featureNames(eset)
>>>> annotation(eset)
>>> [1] "moe430a"
>>>  
>>> But how can i use this info to properly call the SYMBOL mapping?
>> Hi Guido --
>>
>> to get the appropriate map
>>
>>   library(annotate)
>>   map = getAnnMap("SYMBOL", annotation(eset))
>>
>> to select just the relevant probes
>>
>>   map[probeids]
>>
>> toTable(map[probeids]) or as.list(map[probeids]) might be the 
>> next step in the work flow.
>>
>> Martin
>>
>>>  
>>> I tried this:
>>>> arraytype <- annotation(eset)
>>>> arraytype <- paste(arraytype, "db", sep = ".")
>>>> arraytype
>>> [1] "moe430a.db"
>>>> arraytype <- paste("package", arraytype, sep = ":")
>>>> gh <- ls(arraytype)
>>>> gh
>>>  [1] "moe430a"              "moe430a_dbconn"       "moe430a_dbfile"
>>> "moe430a_dbInfo"       "moe430a_dbschema"     "moe430aACCNUM"
>>> "moe430aALIAS2PROBE"   "moe430aCHR"           "moe430aCHRLENGTHS"
>>> "moe430aCHRLOC"       
>>> [11] "moe430aCHRLOCEND"     "moe430aENSEMBL"
>>> "moe430aENSEMBL2PROBE" "moe430aENTREZID"      "moe430aENZYME"
>>> "moe430aENZYME2PROBE"  "moe430aGENENAME"      "moe430aGO"
>>> "moe430aGO2ALLPROBES"  "moe430aGO2PROBE"     
>>> [21] "moe430aMAP"           "moe430aMAPCOUNTS"     "moe430aMGI"
>>> "moe430aMGI2PROBE"     "moe430aORGANISM"      "moe430aPATH"
>>> "moe430aPATH2PROBE"    "moe430aPFAM"          "moe430aPMID"
>>> "moe430aPMID2PROBE"   
>>> [31] "moe430aPROSITE"       "moe430aREFSEQ"        "moe430aSYMBOL"
>>> "moe430aUNIGENE"       "moe430aUNIPROT"
>>>  
>>>> gh[33]
>>> [1] "moe430aSYMBOL"
>>>> symbols <- mget(probeids, gh[33])
>>> Error in mget(probeids, gh[33]) : second argument must be an 
>>> environment
>>>  
>>> This also doesn't work:
>>>> symbols <- mget(probeids, envir=gh[33])
>>> Error in mget(probeids, envir = gh[33]) : 
>>>   second argument must be an environment
>>>  
>>> My approach thus is the wrong approach to automatically extract
>>> mappings from an annotation.db.
>>> Since I don't know of any other possibility, I would appreciate it if
>>> someone could point me to a working solution.
>>>  
>>> Thanks,
>>> Guido
>>>  
>>>
>>> ------------------------------------------------
>>> Guido Hooiveld, PhD
>>> Nutrition, Metabolism & Genomics Group Division of Human Nutrition 
>>> Wageningen University Biotechnion, Bomenweg 2
>>> NL-6703 HD Wageningen
>>> the Netherlands
>>> tel: (+)31 317 485788
>>> fax: (+)31 317 483342 
>>> internet:   http://nutrigene.4t.com
>>> email:      guido.hooiveld at wur.nl 
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor