[BioC] Genes annotated to GO:0031281 using org.Hs.eg.db

James W. MacDonald jmacdon at uw.edu
Mon Mar 24 17:01:06 CET 2014


Hi Tim,

Different approach.

gos <- summary(hgOverGO)[,1]
gIU <- geneIdUniverse(hgOverGO)[gos]
gns <- geneIds(hgOverGO)
golst <- lapply(gIU, function(x) x[x %in% gns])

And as a test

 > univ <- keys(org.Hs.eg.db)
 > set.seed(0xabeef)
 > gns <- sample(300, univ)
 > p <- new(:GOHyperGParams, geneIds = gns, universeGeneIds = univ, 
ontology = "BP", annotation = "org.Hs.eg.db")
 > hyp <- hyperGTest(p)
 > gos <- summary(hyp)[,1]
 > gIU <- geneIdUniverse(hyp)[gos]
 > golst <- lapply(gIU, function(x) x[x %in% gns])
 > all.equal(summary(hyp)[,"Count"], sapply(golst, length), 
check.attributes = FALSE)
[1] TRUE

Best,

Jim



On 3/21/2014 8:12 PM, Tim Smith wrote:
> Hi James,
>
> Thanks for that. However, the number of genes associated with the GO 
> term with your code example, doesn't match the number of genes 
> associated with that GO term with summary(hgOverGO):
>
> For example, my hgOverGO gives 43 genes annotated to the node:
>
> GOBPID Pvalue OddsRatio ExpCount Count Size Term
> GO:0007268 2.65828422773352e-07 2.64555618294749 19.0653745072273 43 
> 365 synaptic transmission
>
> While if I use your example code, it results in 53 genes. There are 
> many such mismatches. Wouldn't it be great if GOstats gives the option 
> of accessing the actual genes that it found annotated to a GO term?
>
> thanks !!
>
>
>
>
>
>
> On Thursday, March 20, 2014 10:38 AM, James W. MacDonald 
> <jmacdon at uw.edu> wrote:
> Hi Tim,
>
> Note that you _always_ use a set of genes for hyperGTest. Even if you
> started with an array, you can only count a significantly differentially
> expressed gene once.
>
> The difference here is that you used the org.Hs.eg.db package as the
> annotation rather than some array annotation package, so
> probeSetSummary() won't work, as it is trying to match things back to
> probe IDs.
>
> So you can do something like this:
>
> gn2go <- select(org.Hs.eg.db, geneIds(hgOverGO), "GOALL")
> gos <- summary(hgOverGO)[,1]
> golst <- lapply(gos, function(x) gn2go[gn2go[,2] %in% x,])
>
> Best,
>
> Jim
>
>
>
> On 3/19/2014 7:03 PM, Tim Smith wrote:
> > Hi James,
> >
> > Hmmm....I seem to get an error if I use probeSetSummary. As I
> > mentioned before I run the hyperGTest using a set of genes (and not a
> > set of probes).
> >
> > Here is what I get:
> >
> > hgOverGO <- hyperGTest(paramsGO)
> >
> > > hgOverGO
> > Gene to GO BP Conditional test for over-representation
> > 3843 GO BP ids tested (220 have p < 0.05)
> > Selected gene set size: 318
> >    Gene universe size: 6088
> >    Annotation package: org.Hs.eg
> >
> > > class(hgOverGO)
> > [1] "GOHyperGResult"
> > attr(,"package")
> > [1] "GOstats"
> >
> > > head(summary(hgOverGO))
> >      GOBPID Pvalue OddsRatio  ExpCount Count Size               Term
> > 1 GO:0007268 2.658284e-07  2.645556 19.0653745 43  365
> >  synaptic transmission
> > 2 GO:0001975 1.299252e-06 21.622722  0.6790407 7  13    response
> > to amphetamine
> > 3 GO:0035637 2.726576e-06  2.320191 22.8784494 46  438
> > multicellular organismal signaling
> > 4 GO:0072511 1.195032e-05  3.566708  6.2680683 19  120
> >  divalent inorganic cation transport
> > 5 GO:0018958 1.788607e-05  9.280645  1.2536137 8  24
> > phenol-containing compound metabolic process
> > 6 GO:0042886 2.537415e-04  3.066360  5.9546649 16  114
> >  amide transport
> >
> >
> > > ps <- probeSetSummary(hgOverGO,pvalue=0.05)
> > Error in (function (classes, fdef, mtable)  :
> >  unable to find an inherited method for function ‘columns’ for
> > signature ‘"function"’
> >
> > > traceback()
> > 7: stop(gettextf("unable to find an inherited method for function %s
> > for signature %s",
> >  sQuote(fdef at generic <mailto:fdef at generic>), sQuote(cnames)), domain 
> = NA)
> > 6: (function (classes, fdef, mtable)
> >    {
> >        methods <- .findInheritedMethods(classes, fdef, mtable)
> >        if (length(methods) == 1L)
> >  return(methods[[1L]])
> >        else if (length(methods) == 0L) {
> >            cnames <- paste0("\"", sapply(classes, as.character),
> >                "\"", collapse = ", ")
> >  stop(gettextf("unable to find an inherited method for function %s for
> > signature %s",
> >  sQuote(fdef at generic <mailto:fdef at generic>), sQuote(cnames)), domain 
> = NA)
> >        }
> >        else stop("Internal error in finding inherited methods; didn't
> > return a unique method",
> >            domain = NA)
> >  })(list("function"), function (x)
> >  standardGeneric("columns"), <environment>)
> > 5: columns(db)
> > 4: match(x, table, nomatch = 0L)
> > 3: map %in% columns(db)
> > 2: getAnnMap(ids, annotation(result))
> > 1: probeSetSummary(hgOverGO, pvalue = 0.05)
> >
> >
> > Am I calling the probeSetSummary function incorrectly?
> >
> >
> > thanks!!
> >
> >
> > On Wednesday, March 19, 2014 5:54 PM, Tim Smith
> > <tim_smith_666 at yahoo.com <mailto:tim_smith_666 at yahoo.com>> wrote:
> > Hi James,
> >
> > Thanks for the reply! I had 371 genes ( gene universe ~ 7k genes) for
> > which I was checking enrichment and I got this term as one of the
> > significant terms. The details are:
> >
> >                GOBPID      Pvalue OddsRatio ExpCount Count Size
> >            Term
> > GO:0031281 0.021301601 10.113271  0.22913929    2 15
> >            positive regulation of cyclase activity
> >
> > I used GOstats for this analysis. If Count = 2, then shouldn't there
> > be two genes that are directly annotated to this term?
> >
> >
> >
> >
> >
> > On Wednesday, March 19, 2014 10:17 AM, James W. MacDonald
> > <jmacdon at uw.edu <mailto:jmacdon at uw.edu> <mailto:jmacdon at uw.edu 
> <mailto:jmacdon at uw.edu>>> wrote:
> >
> > Hi Tim,
> >
> > There may not be any IDs mapped to that term directly. You can use
> > GO2ALLEGS, which maps all direct and child terms to Entrez Gene IDs.
> >
> > > get("GO:0031281", org.Hs.egGO2ALLEGS)
> >      IDA    IDA    TAS    TAS    IDA    TAS IEA TAS TAS
> > TAS
> >    "49"  "116"  "116"  "117"  "135"  "135" "136" "136" "140"
> > "153"
> >      IDA    IDA    TAS    IDA    IEA    IDA IDA TAS IDA
> > IEA
> >    "154"  "155"  "554"  "796"  "796"  "799" "1394" "1394" "1812"
> > "1812"
> >      IDA    IEA    NAS    TAS    TAS    IEA TAS TAS IEA
> > TAS
> >  "1816"  "1816"  "1816"  "1816"  "1909"  "2692" "2696"  "2740" "2774"
> > "2778"
> >      IDA    ISS    ISS    IBA    IBA    IMP ISS TAS TAS
> > TAS
> >  "2852"  "3973"  "4763"  "4842"  "4843"  "4846" "4846"  "4914" "4915"
> > "5032"
> >      ISS    NAS    IEA    IDA    IEA    TAS    TAS
> >  "5578"  "5894"  "7077"  "7432"  "7434" "10486" "10487"
> >
> > Or the more powerful select() method:
> >
> > > select(org.Hs.eg.db, "GO:0031281", c("ENTREZID", "SYMBOL"), "GOALL")
> >          GOALL EVIDENCEALL ONTOLOGYALL ENTREZID SYMBOL
> > 1  GO:0031281        IDA          BP      49 ACR
> > 2  GO:0031281        IDA          BP      116 ADCYAP1
> > 3  GO:0031281        TAS          BP      116 ADCYAP1
> > 4  GO:0031281        TAS          BP      117 ADCYAP1R1
> > 5  GO:0031281        IDA          BP      135 ADORA2A
> > 6  GO:0031281        TAS          BP      135 ADORA2A
> > 7  GO:0031281        IEA          BP      136 ADORA2B
> > 8  GO:0031281        TAS          BP      136 ADORA2B
> > 9  GO:0031281        TAS          BP      140 ADORA3
> > 10 GO:0031281        TAS          BP      153 ADRB1
> > 11 GO:0031281        IDA          BP      154 ADRB2
> > 12 GO:0031281        IDA          BP      155 ADRB3
> > 13 GO:0031281        TAS          BP      554 AVPR2
> > 14 GO:0031281        IDA          BP      796 CALCA
> > 15 GO:0031281        IEA          BP      796 CALCA
> > 16 GO:0031281        IDA          BP      799 CALCR
> > 17 GO:0031281        IDA          BP    1394 CRHR1
> > 18 GO:0031281        TAS          BP    1394 CRHR1
> > 19 GO:0031281        IDA          BP    1812 DRD1
> > 20 GO:0031281        IEA          BP    1812 DRD1
> > 21 GO:0031281        IDA          BP    1816 DRD5
> > 22 GO:0031281        IEA          BP    1816 DRD5
> > 23 GO:0031281        NAS          BP    1816 DRD5
> > 24 GO:0031281        TAS          BP    1816 DRD5
> > 25 GO:0031281        TAS          BP    1909 EDNRA
> > 26 GO:0031281        IEA          BP    2692 GHRHR
> > 27 GO:0031281        TAS          BP    2696 GIPR
> > 28 GO:0031281        TAS          BP    2740 GLP1R
> > 29 GO:0031281        IEA          BP    2774 GNAL
> > 30 GO:0031281        TAS          BP    2778 GNAS
> > 31 GO:0031281        IDA          BP    2852 GPER1
> > 32 GO:0031281        ISS          BP    3973 LHCGR
> > 33 GO:0031281        ISS          BP    4763 NF1
> > 34 GO:0031281        IBA          BP    4842 NOS1
> > 35 GO:0031281        IBA          BP    4843 NOS2
> > 36 GO:0031281        IMP          BP    4846 NOS3
> > 37 GO:0031281        ISS          BP    4846 NOS3
> > 38 GO:0031281        TAS          BP    4914 NTRK1
> > 39 GO:0031281        TAS          BP    4915 NTRK2
> > 40 GO:0031281        TAS          BP    5032 P2RY11
> > 41 GO:0031281        ISS          BP    5578 PRKCA
> > 42 GO:0031281        NAS          BP    5894 RAF1
> > 43 GO:0031281        IEA          BP    7077 TIMP2
> > 44 GO:0031281        IDA          BP    7432 VIP
> > 45 GO:0031281        IEA          BP    7434 VIPR2
> > 46 GO:0031281        TAS          BP    10486 CAP2
> > 47 GO:0031281        TAS          BP    10487 CAP1
> > Warning message:
> > In .generateExtraRows(tab, keys, jointype) :
> >    'select' resulted in 1:many mapping between keys and return rows
> >
> > Best,
> >
> > Jim
> >
> >
> >
> > On 3/19/2014 4:28 AM, Tim Smith wrote:
> > > Hi,
> > >
> > > I was trying to get the genes annotated to the GO term "GO:0031281".
> > My code:
> > >
> > > library(org.Hs.eg.db)
> > >
> > > genes <- get("GO:0031281", org.Hs.egGO2EG)
> > >
> > > When I run the code, I get:
> > >
> > >> genes <- get("GO:0031281", org.Hs.egGO2EG)
> > > Error in .checkKeys(value, Rkeys(x), x at ifnotfound 
> <mailto:x at ifnotfound>
> > <mailto:x at ifnotfound <mailto:x at ifnotfound>>) :
> > >    value for "GO:0031281" not found
> > >
> > > If I check in AMIGO for this GO term, it seems to have many gene
> > products for Homo sapiens. Am I doing something wrong? Is there an
> > alternate package that I can try, just to double check the results?
> > >
> > > thanks!
> > >    [[alternative HTML version deleted]]
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at r-project.org <mailto:Bioconductor at r-project.org> 
> <mailto:Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>>
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> >
> >    [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org <mailto:Bioconductor at r-project.org> 
> <mailto:Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>>
>
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list