[BioC] GO's to gene's

Martin Morgan mtmorgan at fhcrc.org
Mon Mar 1 04:30:34 CET 2010


On 02/28/2010 07:17 PM, Loren Engrav wrote:
> Thank you both.
> Given my skills, it might be easier/quicker to do it "manually" with AmiGO,
> but I am trying both methods.
> 
> For the second method I get
> 
>> library(GO.db)
> Loading required package: AnnotationDbi
> Loading required package: Biobase
> 
> Welcome to Bioconductor
> 
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
> 
> Loading required package: DBI
>> terms <- Term(GOTERM)
> Error in function (classes, fdef, mtable)  :
>   unable to find an inherited method for function "Term", for signature
> "GOTermsAnnDbBimap"
> 
>> sessionInfo()
> R version 2.9.2 Patched (2009-09-05 r49613)
> i386-apple-darwin9.8.0
> 
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base

Update to R version 2.10 and the associated Bioc packages. Alternatively,
for a (much) slower solution, use eapply (you'll want to check that Term
and Ontology return ids in identical order):

  terms = eapply(GOTERM, Term)

and similarly for Ontology. For reference, I have

> sessionInfo()
R version 2.10.1 Patched (2010-02-23 r51168)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GO.db_2.3.5         RSQLite_0.7-3       DBI_0.2-4
[4] AnnotationDbi_1.8.1 Biobase_2.6.1

loaded via a namespace (and not attached):
[1] tools_2.10.1
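A minimal sketch of the slower eapply() route with the id-order check. So that it runs without GO.db, a small hypothetical environment `goterm_toy` stands in for GOTERM, and anonymous functions stand in for Term and Ontology; with GO.db loaded the calls would be eapply(GOTERM, Term) and eapply(GOTERM, Ontology), as suggested above. The R help for eapply() notes that the order of the result is arbitrary for hashed environments, so the sketch aligns the two results by GO id instead of trusting their positions.

```r
# Hypothetical stand-in for GOTERM: an environment keyed by GO id,
# each value holding a term and its ontology (toy values only).
goterm_toy <- new.env()
assign("GO:0005518", list(term = "collagen binding", ontology = "MF"), envir = goterm_toy)
assign("GO:0070052", list(term = "collagen V binding", ontology = "MF"), envir = goterm_toy)
assign("GO:0030199", list(term = "collagen fibril organization", ontology = "BP"), envir = goterm_toy)
assign("GO:0005488", list(term = "binding", ontology = "MF"), envir = goterm_toy)

# eapply() visits entries in an unspecified order; unlist() keeps the GO ids as names.
terms <- unlist(eapply(goterm_toy, function(x) x$term))
ontologies <- unlist(eapply(goterm_toy, function(x) x$ontology))

# Reorder the second result by GO id so element i of both vectors refers to the same term.
ontologies <- ontologies[names(terms)]
stopifnot(identical(names(terms), names(ontologies)))

# Molecular-function terms whose name contains "collagen"
collagen <- terms[grepl("collagen", terms) & ontologies == "MF"]
```

Indexing by names costs little compared with the eapply() calls themselves and removes any dependence on the order in which the environment is traversed.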


Martin

> 
>> From: Martin Morgan <mtmorgan at fhcrc.org>
>> Date: Sun, 28 Feb 2010 18:42:33 -0800
>> To: Vincent Carey <stvjc at channing.harvard.edu>
>> Cc: Loren Engrav <engrav at u.washington.edu>, "bioconductor at stat.math.ethz.ch"
>> <bioconductor at stat.math.ethz.ch>
>> Subject: Re: [BioC] GO's to gene's
>>
>> On 02/28/2010 06:14 PM, Vincent Carey wrote:
>>> Perhaps there is a package with such functionality.  However, with the
>>> GO.db package in place, you need to do a little
>>> programming, perhaps along the lines of
>>>
>>> querGO = function(str, attr = "definition", ont = "MF") {
>>>   require(GO.db, quietly = TRUE)
>>>   gc = GO_dbconn()
>>>   quer.1 = paste("select go_id, term from go_term where",
>>>   attr, "like('%")
>>>   quer.2 = "%') and ontology = '"
>>>   quer.3 = "'"
>>>   quer = paste(quer.1, str, quer.2, ont, quer.3, collapse = "",
>>>   sep = "")
>>>   dbGetQuery(gc, quer)
>>> }
>>>
>>> whereby
>>>
>>>> querGO("collagen", "term")
>>>        go_id                                                           term
>>> 1 GO:0004656                     procollagen-proline 4-dioxygenase activity
>>> 2 GO:0005518                                               collagen binding
>>> 3 GO:0008475                      procollagen-lysine 5-dioxygenase activity
>>> 4 GO:0019797                     procollagen-proline 3-dioxygenase activity
>>> 5 GO:0019798                       procollagen-proline dioxygenase activity
>>> 6 GO:0033823                       procollagen glucosyltransferase activity
>>> 7 GO:0042329 structural constituent of collagen and cuticulin-based cuticle
>>> 8 GO:0050211                     procollagen galactosyltransferase activity
>>> 9 GO:0070052                                             collagen V binding
>>>>
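querGO() pastes `str` directly into the SQL text, so a search word containing a single quote would break the query. The sketch below uses a hypothetical helper, `build_go_query`, that builds the same SELECT statement. It restricts `attr` and `ont` to known values with match.arg() and doubles any single quotes in the search word. Like querGO(), it assumes the go_term table has term, definition and ontology columns. The returned string would then be passed to dbGetQuery(GO_dbconn(), ...) exactly as querGO() does.

```r
# Hypothetical helper: builds the query that querGO() sends, but validates
# the column and ontology names and escapes the search word.
build_go_query <- function(str, attr = c("definition", "term"),
                           ont = c("MF", "BP", "CC")) {
  attr <- match.arg(attr)   # errors on anything other than the listed columns
  ont <- match.arg(ont)     # defaults to "MF", as in querGO()
  # Double any single quotes so SQL treats them as literal text
  str <- gsub("'", "''", str, fixed = TRUE)
  # '%%' in sprintf() yields a literal '%', the SQL LIKE wildcard
  sprintf("select go_id, term from go_term where %s like '%%%s%%' and ontology = '%s'",
          attr, str, ont)
}

q <- build_go_query("collagen", "term")
# With GO.db loaded: dbGetQuery(GO_dbconn(), q)
```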
>>
>> Also
>>
>>   library(GO.db)
>>   terms <- Term(GOTERM)  # or maybe Definition(GOTERM) ?
>>   ontologies <- Ontology(GOTERM)
>>   collagen <- terms[grepl("collagen", terms) & ("MF" == ontologies)]
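The filter above pairs terms and ontologies by position, and a plain substring match also picks up the "procollagen ..." terms seen in the querGO() output. A small sketch on toy named vectors, which stand in for Term(GOTERM) and Ontology(GOTERM), first checks that the names line up. It then compares the plain match with a case-insensitive, word-start match. The choice of pattern is an assumption about what the user wants, not part of the original suggestion.

```r
# Toy named vectors standing in for Term(GOTERM) and Ontology(GOTERM);
# the entries are illustrative, keyed by GO id.
terms <- c("GO:0004656" = "procollagen-proline 4-dioxygenase activity",
           "GO:0005518" = "collagen binding",
           "GO:0070052" = "collagen V binding",
           "GO:0030199" = "collagen fibril organization")
ontologies <- c("GO:0004656" = "MF", "GO:0005518" = "MF",
                "GO:0070052" = "MF", "GO:0030199" = "BP")

# The & below pairs elements by position, so the ids must line up.
stopifnot(identical(names(terms), names(ontologies)))

# Plain substring match: also keeps "procollagen-..." terms.
any_collagen <- terms[grepl("collagen", terms) & ontologies == "MF"]

# Word-start match (Perl-style \b), case-insensitive: skips "procollagen".
collagen <- terms[grepl("\\bcollagen", terms, ignore.case = TRUE, perl = TRUE) &
                  ontologies == "MF"]
names(collagen)
```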
>>
>> and the next step,
>>
>>   library(org.Hs.eg.db)
>>   egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)
>>   egids <- egids[!is.na(egids)]
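mget() returns a list with one element per GO id, so one more step is needed to get a single gene list. The sketch below flattens a toy list shaped like that result. The Entrez ids are made up, and naming each id by its evidence code reflects my understanding of how org.Hs.egGO2EG values are labelled, so check this against the package documentation. Note that org.Hs.egGO2EG covers genes annotated directly to each term. org.Hs.eg.db also provides org.Hs.egGO2ALLEGS, which includes genes annotated to more specific child terms and may be closer to "all the genes" in the original question.

```r
# Toy stand-in for mget(names(collagen), org.Hs.egGO2EG, ifnotfound = NA):
# one element per GO id; made-up Entrez ids, named by evidence code.
egids <- list("GO:0005518" = c(IDA = "1001", IEA = "1002"),
              "GO:0070052" = NA,
              "GO:0004656" = c(IEA = "1002", TAS = "1003"))

# Drop GO ids without annotations; on a list, is.na() is TRUE only for a bare NA
egids <- egids[!is.na(egids)]

# Unique genes across all matching GO terms
genes <- unique(unlist(egids, use.names = FALSE))

# Long table of GO id / gene pairs, keeping the evidence code
gene_go <- data.frame(go_id = rep(names(egids), lengths(egids)),
                      entrez = unlist(egids, use.names = FALSE),
                      evidence = unlist(lapply(egids, names), use.names = FALSE),
                      stringsAsFactors = FALSE)
```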
>>
>>
>>>
>>> On Sun, Feb 28, 2010 at 8:56 PM, Loren Engrav <engrav at u.washington.edu>
>>> wrote:
>>>> Is there a BioC package that will find all the GO terms containing some
>>>> word, perhaps "collagen", and then find all the genes annotated to the
>>>> terms found?
>>>>
>>>> I scanned
>>>> goProfiles
>>>> GOSemSim
>>>> GOstats
>>>> goTools and
>>>> topGO
>>>>
>>>> and could not determine that any of them would do that.
>>>>
>>>> Thank you.
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>
>>
>> -- 
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
> 




