[BioC] Genes annotated to GO:0031281 using org.Hs.eg.db
James W. MacDonald
jmacdon at uw.edu
Mon Mar 24 17:01:06 CET 2014
Hi Tim,
Different approach.
gos <- summary(hgOverGO)[,1]
gIU <- geneIdUniverse(hgOverGO)[gos]
gns <- geneIds(hgOverGO)
golst <- lapply(gIU, function(x) x[x %in% gns])
And as a test
> univ <- keys(org.Hs.eg.db)
> set.seed(0xabeef)
> gns <- sample(300, univ)
> p <- new(:GOHyperGParams, geneIds = gns, universeGeneIds = univ,
ontology = "BP", annotation = "org.Hs.eg.db")
> hyp <- hyperGTest(p)
> gos <- summary(hyp)[,1]
> gIU <- geneIdUniverse(hyp)[gos]
> golst <- lapply(gIU, function(x) x[x %in% gns])
> all.equal(summary(hyp)[,"Count"], sapply(golst, length),
check.attributes = FALSE)
[1] TRUE
Best,
Jim
On 3/21/2014 8:12 PM, Tim Smith wrote:
> Hi James,
>
> Thanks for that. However, the number of genes associated with the GO
> term with your code example, doesn't match the number of genes
> associated with that GO term with summary(hgOverGO):
>
> For example, my hgOverGO gives 43 genes annotated to the node:
>
> GOBPID Pvalue OddsRatio ExpCount Count Size Term
> GO:0007268 2.65828422773352e-07 2.64555618294749 19.0653745072273 43
> 365 synaptic transmission
>
> While if I use your example code, it results in 53 genes. There are
> many such mismatches. Wouldn't it be great if GOstats gives the option
> of accessing the actual genes that it found annotated to a GO term?
>
> thanks !!
>
>
>
>
>
>
> On Thursday, March 20, 2014 10:38 AM, James W. MacDonald
> <jmacdon at uw.edu> wrote:
> Hi Tim,
>
> Note that you _always_ use a set of genes for hyperGTest. Even if you
> started with an array, you can only count a significantly differentially
> expressed gene once.
>
> The difference here is that you used the org.Hs.eg.db package as the
> annotation rather than some array annotation package, so
> probeSetSummary() won't work, as it is trying to match things back to
> probe IDs.
>
> So you can do something like this:
>
> gn2go <- select(org.Hs.eg.db, geneIds(hgOverGO), "GOALL")
> gos <- summary(hgOverGO)[,1]
> golst <- lapply(gos, function(x) gn2go[gn2go[,2] %in% x,])
>
> Best,
>
> Jim
>
>
>
> On 3/19/2014 7:03 PM, Tim Smith wrote:
> > Hi James,
> >
> > Hmmm....I seem to get an error if I use probeSetSummary. As I
> > mentioned before I run the hyperGTest using a set of genes (and not a
> > set of probes).
> >
> > Here is what I get:
> >
> > hgOverGO <- hyperGTest(paramsGO)
> >
> > > hgOverGO
> > Gene to GO BP Conditional test for over-representation
> > 3843 GO BP ids tested (220 have p < 0.05)
> > Selected gene set size: 318
> > Gene universe size: 6088
> > Annotation package: org.Hs.eg
> >
> > > class(hgOverGO)
> > [1] "GOHyperGResult"
> > attr(,"package")
> > [1] "GOstats"
> >
> > > head(summary(hgOverGO))
> > GOBPID Pvalue OddsRatio ExpCount Count Size Term
> > 1 GO:0007268 2.658284e-07 2.645556 19.0653745 43 365
> > synaptic transmission
> > 2 GO:0001975 1.299252e-06 21.622722 0.6790407 7 13 response
> > to amphetamine
> > 3 GO:0035637 2.726576e-06 2.320191 22.8784494 46 438
> > multicellular organismal signaling
> > 4 GO:0072511 1.195032e-05 3.566708 6.2680683 19 120
> > divalent inorganic cation transport
> > 5 GO:0018958 1.788607e-05 9.280645 1.2536137 8 24
> > phenol-containing compound metabolic process
> > 6 GO:0042886 2.537415e-04 3.066360 5.9546649 16 114
> > amide transport
> >
> >
> > > ps <- probeSetSummary(hgOverGO,pvalue=0.05)
> > Error in (function (classes, fdef, mtable) :
> > unable to find an inherited method for function ‘columns’ for
> > signature ‘"function"’
> >
> > > traceback()
> > 7: stop(gettextf("unable to find an inherited method for function %s
> > for signature %s",
> > sQuote(fdef at generic <mailto:fdef at generic>), sQuote(cnames)), domain
> = NA)
> > 6: (function (classes, fdef, mtable)
> > {
> > methods <- .findInheritedMethods(classes, fdef, mtable)
> > if (length(methods) == 1L)
> > return(methods[[1L]])
> > else if (length(methods) == 0L) {
> > cnames <- paste0("\"", sapply(classes, as.character),
> > "\"", collapse = ", ")
> > stop(gettextf("unable to find an inherited method for function %s for
> > signature %s",
> > sQuote(fdef at generic <mailto:fdef at generic>), sQuote(cnames)), domain
> = NA)
> > }
> > else stop("Internal error in finding inherited methods; didn't
> > return a unique method",
> > domain = NA)
> > })(list("function"), function (x)
> > standardGeneric("columns"), <environment>)
> > 5: columns(db)
> > 4: match(x, table, nomatch = 0L)
> > 3: map %in% columns(db)
> > 2: getAnnMap(ids, annotation(result))
> > 1: probeSetSummary(hgOverGO, pvalue = 0.05)
> >
> >
> > Am I calling the probeSetSummary function incorrectly?
> >
> >
> > thanks!!
> >
> >
> > On Wednesday, March 19, 2014 5:54 PM, Tim Smith
> > <tim_smith_666 at yahoo.com <mailto:tim_smith_666 at yahoo.com>> wrote:
> > Hi James,
> >
> > Thanks for the reply! I had 371 genes ( gene universe ~ 7k genes) for
> > which I was checking enrichment and I got this term as one of the
> > significant terms. The details are:
> >
> > GOBPID Pvalue OddsRatio ExpCount Count Size
> > Term
> > GO:0031281 0.021301601 10.113271 0.22913929 2 15
> > positive regulation of cyclase activity
> >
> > I used GOstats for this analysis. If Count = 2, then shouldn't there
> > be two genes that are directly annotated to this term?
> >
> >
> >
> >
> >
> > On Wednesday, March 19, 2014 10:17 AM, James W. MacDonald
> > <jmacdon at uw.edu <mailto:jmacdon at uw.edu> <mailto:jmacdon at uw.edu
> <mailto:jmacdon at uw.edu>>> wrote:
> >
> > Hi Tim,
> >
> > There may not be any IDs mapped to that term directly. You can use
> > GO2ALLEGS, which maps all direct and child terms to Entrez Gene IDs.
> >
> > > get("GO:0031281", org.Hs.egGO2ALLEGS)
> > IDA IDA TAS TAS IDA TAS IEA TAS TAS
> > TAS
> > "49" "116" "116" "117" "135" "135" "136" "136" "140"
> > "153"
> > IDA IDA TAS IDA IEA IDA IDA TAS IDA
> > IEA
> > "154" "155" "554" "796" "796" "799" "1394" "1394" "1812"
> > "1812"
> > IDA IEA NAS TAS TAS IEA TAS TAS IEA
> > TAS
> > "1816" "1816" "1816" "1816" "1909" "2692" "2696" "2740" "2774"
> > "2778"
> > IDA ISS ISS IBA IBA IMP ISS TAS TAS
> > TAS
> > "2852" "3973" "4763" "4842" "4843" "4846" "4846" "4914" "4915"
> > "5032"
> > ISS NAS IEA IDA IEA TAS TAS
> > "5578" "5894" "7077" "7432" "7434" "10486" "10487"
> >
> > Or the more powerful select() method:
> >
> > > select(org.Hs.eg.db, "GO:0031281", c("ENTREZID", "SYMBOL"), "GOALL")
> > GOALL EVIDENCEALL ONTOLOGYALL ENTREZID SYMBOL
> > 1 GO:0031281 IDA BP 49 ACR
> > 2 GO:0031281 IDA BP 116 ADCYAP1
> > 3 GO:0031281 TAS BP 116 ADCYAP1
> > 4 GO:0031281 TAS BP 117 ADCYAP1R1
> > 5 GO:0031281 IDA BP 135 ADORA2A
> > 6 GO:0031281 TAS BP 135 ADORA2A
> > 7 GO:0031281 IEA BP 136 ADORA2B
> > 8 GO:0031281 TAS BP 136 ADORA2B
> > 9 GO:0031281 TAS BP 140 ADORA3
> > 10 GO:0031281 TAS BP 153 ADRB1
> > 11 GO:0031281 IDA BP 154 ADRB2
> > 12 GO:0031281 IDA BP 155 ADRB3
> > 13 GO:0031281 TAS BP 554 AVPR2
> > 14 GO:0031281 IDA BP 796 CALCA
> > 15 GO:0031281 IEA BP 796 CALCA
> > 16 GO:0031281 IDA BP 799 CALCR
> > 17 GO:0031281 IDA BP 1394 CRHR1
> > 18 GO:0031281 TAS BP 1394 CRHR1
> > 19 GO:0031281 IDA BP 1812 DRD1
> > 20 GO:0031281 IEA BP 1812 DRD1
> > 21 GO:0031281 IDA BP 1816 DRD5
> > 22 GO:0031281 IEA BP 1816 DRD5
> > 23 GO:0031281 NAS BP 1816 DRD5
> > 24 GO:0031281 TAS BP 1816 DRD5
> > 25 GO:0031281 TAS BP 1909 EDNRA
> > 26 GO:0031281 IEA BP 2692 GHRHR
> > 27 GO:0031281 TAS BP 2696 GIPR
> > 28 GO:0031281 TAS BP 2740 GLP1R
> > 29 GO:0031281 IEA BP 2774 GNAL
> > 30 GO:0031281 TAS BP 2778 GNAS
> > 31 GO:0031281 IDA BP 2852 GPER1
> > 32 GO:0031281 ISS BP 3973 LHCGR
> > 33 GO:0031281 ISS BP 4763 NF1
> > 34 GO:0031281 IBA BP 4842 NOS1
> > 35 GO:0031281 IBA BP 4843 NOS2
> > 36 GO:0031281 IMP BP 4846 NOS3
> > 37 GO:0031281 ISS BP 4846 NOS3
> > 38 GO:0031281 TAS BP 4914 NTRK1
> > 39 GO:0031281 TAS BP 4915 NTRK2
> > 40 GO:0031281 TAS BP 5032 P2RY11
> > 41 GO:0031281 ISS BP 5578 PRKCA
> > 42 GO:0031281 NAS BP 5894 RAF1
> > 43 GO:0031281 IEA BP 7077 TIMP2
> > 44 GO:0031281 IDA BP 7432 VIP
> > 45 GO:0031281 IEA BP 7434 VIPR2
> > 46 GO:0031281 TAS BP 10486 CAP2
> > 47 GO:0031281 TAS BP 10487 CAP1
> > Warning message:
> > In .generateExtraRows(tab, keys, jointype) :
> > 'select' resulted in 1:many mapping between keys and return rows
> >
> > Best,
> >
> > Jim
> >
> >
> >
> > On 3/19/2014 4:28 AM, Tim Smith wrote:
> > > Hi,
> > >
> > > I was trying to get the genes annotated to the GO term "GO:0031281".
> > My code:
> > >
> > > library(org.Hs.eg.db)
> > >
> > > genes <- get("GO:0031281", org.Hs.egGO2EG)
> > >
> > > When I run the code, I get:
> > >
> > >> genes <- get("GO:0031281", org.Hs.egGO2EG)
> > > Error in .checkKeys(value, Rkeys(x), x at ifnotfound
> <mailto:x at ifnotfound>
> > <mailto:x at ifnotfound <mailto:x at ifnotfound>>) :
> > > value for "GO:0031281" not found
> > >
> > > If I check in AMIGO for this GO term, it seems to have many gene
> > products for Homo sapiens. Am I doing something wrong? Is there an
> > alternate package that I can try, just to double check the results?
> > >
> > > thanks!
> > > [[alternative HTML version deleted]]
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> <mailto:Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>>
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> <mailto:Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>>
>
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list