[BioC] GoStats

Sun Oct 10 17:21:30 CEST 2004

hi michael

In case you're impatient: i wrote a function (it's at the bottom of  
this message) that sort of does what you want automatically. I need to  
warn you beforehand: i am a biologist and a beginner at R so it's not  
perfect at all! The table you get as an output will need some cleaning  
up as you will see. I do that using an excel macro, but it should be  
possible in R, i just don't know how.

In this function, MATR is my matrix with gene expression changes of  
8000 genes over over time, so replace that with your own. The first  
column is the spot identifier (so probename in the case of an affy  
array). Im using my own annotation package called 'floor', so replace  
that with your affy package. Use

GOtable("0006818")

If you want to know which genes belong to the 'hydrogen transport'  
class, for instance.

The output is a textfile in the form of table with all genes that  
belong to the GO you specified with the feature identifier and the  
genename and its regulation that is in the matrix. The last row is the  
name of the GO and the amount of representing genes on the array. The  
name of the textfile is the GO identifier.

if you change it for the better, please let me know. I suppose it  
should be possible to use this function in a sapply command to execute  
it on a list of overrepresented GOs. I also made a function that makes  
tables of overrepresented GO identifiers and  corresponding p values  
based on a matrix of gene x condition that exists of 0 (for not sign.  
regulated) and 1 (for sign regulated genes). Let me know if you want  
it. Have Fun.

Floor

GOtable<-function(GOID){

GObase<-paste("GO:", GOID, sep = "")
filename<-paste("GO_", GOID, ".xls", sep = "")
tmpv<-mget(GObase, env=floorGO2ALLPROBES, ifnotfound="NA")
tmp<-unlist(tmpv)

tmpindex<-as.integer(tmp)

index <- MATR[,1] %in% tmpindex

subset <- MATR[index,]
tmp<-unique(sort(as.integer(tmp)))
tmp<-as.character(tmp)
tmp<-mget(tmp, env=floorGENENAME, ifnotfound="NA")
tmp<-as.matrix(tmp)
colnames(tmp)=list("GeneName")
data<-cbind(tmp, subset)
bla<-get(GObase, env=GOTERM)
rows<-(dim(data))[1]
rows1<-rows+1

data[rows1,1]<-bla
write.table(data, file=filename, sep="\t")
}

_______________________________________________________
Floor Stam

Vrije Universiteit Amsterdam
Faculty of Earth and Life Sciences
Department of Molecular and Cellular Neurobiology
De Boelelaan 1085
1081HV Amsterdam
The Netherlands

Ph: 	+31-20-4447114
	+31-20-5665512
Fax: 	+31-20-4447112
e-mail: fjstam at bio.vu.nl
_______________________________________________________
On 9 Oct 2004 , at 5:26, Robert Gentleman wrote:

> On Thu, Oct 07, 2004 at 06:25:45PM +0200, Auer Michael wrote:
>> When applying GOHyperG to a series of Locus IDs, the function only  
>> reports
>> the GO ID and the corresponding frequencies. But what most people  
>> want to
>> have, when interpreting the GO results is the corresponding Affy IDS  
>> which
>> map on the different Locus IDs and GO IDS.
>>
>
>   I will add this feature to the next release.
>   You will then get the GO ID and the set of LocusLink IDs that are
>   mapped to that GO ID. You would then use some other tools (like
>   those you suggest below) to find the related Affymetrix (or other
>   manufacturer IDs).
>
>
>> Code I use
>>
>> Locusids<-mget(AffyIDS,env=hgu133aLOCUSID)
>> /*retrieves the LOCUS LINK IDS from a series of AffyIDS*/
>>
>> MF<- GOHyperG(Locusids,lib="hgu133a",what="MF")
>> /*reports the occurring Gene Ontologies, the frequencies and the
>> hypergeometric p values*/
>>
>> How can I get a result of the form
>>
>> 1. GO ID, freq (output of GoHyperG)
>>
>> 2. GO ID, LOCUS ID (multiple occurring GO IDs)
>>
>> 3. LOCUS ID, AFFY ID (multiple occurring LOCUS IDs)
>>
>
>   What you would need to do now, is to take each of the set of
>   LocusLink IDs you supplied to the call to GOHyperG, and the output
>   set of GO IDs and find which LL's are linked to which GO IDs. This
>   is done in GOALLLOCUSID, in GO 1.6.4.
>
>   Finally you would mapp the LLs to whatever manufacturer IDs you
>   have. That would be done from the manufacturer specific data
>   package.
>
>>
>> I know that there is the function hgu95av2GO2ALLPROBES$"GO:0000166"
>> which
>> retrieves all the probes mapping on a GO, but still it is not the  
>> result.
>> Ok, one would have to make a for loop and look for the occurring  
>> probes in
>> the sample. But isn't there a better way????
>
>   I am not sure what you mean by a better way? Nor how it would
>   necessarily answer your question.
>
>
>>
>>
>> Thanks a lot
>>
>> Michael Auer
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
> --  
> +---------------------------------------------------------------------- 
> -----+
> | Robert Gentleman                 phone : (617) 632-5250               
>      |
> | Associate Professor              fax:   (617)  632-2444               
>      |
> | Department of Biostatistics      office: M1B20                        
>      |
> | Harvard School of Public Health  email: rgentlem at jimmy.harvard.edu    
>      |
> +---------------------------------------------------------------------- 
> -----+
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>