[BioC] getHomolog in biomaRt
Steve Pederson
stephen.pederson at student.adelaide.edu.au
Wed Apr 11 08:30:29 CEST 2007
Hi Steffen,
Thanks for the response; that sorted my problem out rather well. I had
been using biomaRt 1.8.2.
Cheers,
Steve
Steffen Durinck wrote:
> Hi Steve,
>
> Which version of biomaRt are you using?
> I would recommend using the devel version, as this will return both the
> query id and its homolog id.
>
> > human = useMart("ensembl", dataset="hsapiens_gene_ensembl")
> > mouse = useMart("ensembl", dataset="mmusculus_gene_ensembl")
> > getHomolog( id = c("66645","64058"), to.type = "entrezgene",
>     from.type = "entrezgene", from.mart = mouse, to.mart = human )
> V1 V2
> 1 64058 64065
> 2 66645 55269
>
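Once the result carries both the query ID and the homolog ID, mapping it back onto the original list needs only base R. The following is a minimal sketch, not biomaRt code: the data frame is typed in by hand in the two-column shape shown above (V1 = query ID, V2 = homolog ID), and "73663" is added to the query purely as an assumed example of an ID that returns no row.

```r
# Result typed in by hand, in the same shape as the output above.
res <- data.frame(V1 = c("64058", "66645"),
                  V2 = c("64065", "55269"),
                  stringsAsFactors = FALSE)

# Original query vector; "73663" is an assumed ID with no row in res.
query <- c("73663", "66645", "64058")

# match() gives, for each query ID, the first row of res that holds it
# (NA when the ID is absent), so the output keeps the query order.
matched <- data.frame(mouse.ID = query,
                      human.ID = res$V2[match(query, res$V1)],
                      stringsAsFactors = FALSE)

# IDs for which nothing came back
unmatched <- matched$mouse.ID[is.na(matched$human.ID)]
```

Note that match() keeps only the first hit per query ID; if one ID can map to several homologs, merge(data.frame(V1 = query), res, all.x = TRUE) would keep every pair instead.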
>
> > sessionInfo()
> R version 2.4.0 (2006-10-03)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
>
> attached base packages:
> [1] "methods" "stats" "graphics" "grDevices" "utils" "datasets"
> [7] "base"
>
> other attached packages:
> biomaRt RCurl XML
> "1.9.22" "0.8-0" "1.4-1"
>
> Cheers,
> Steffen
>
> Steve Pederson wrote:
>> Hi,
>>
>> I'm still on a steep learning curve with R and am trying to convert a
>> large batch of mouse Entrez IDs to their homologous human Entrez IDs.
>> When I send them as a batch to biomaRt, the search result doesn't
>> contain the query ID (could this be added in the next update?), so
>> the results cannot be matched to the original IDs. For example:
>>
>> > getHomolog( id = c("73663","66645","74855"), to.type =
>> "entrezgene", from.type = "entrezgene", from.mart = mouse,
>> to.mart=human )
>> V1
>> 1 55269
>>
>> As a result, I'm sending one at a time via a quick function that I set
>> up. The batch regularly seems to fail & I get the following error
>> message:
>> Error in read.table(con, sep = "\t", header = FALSE, quote = "",
>> comment.char = "", :
>> no lines available in input
>>
>> This is an example of the exact code that causes it:
>> library(biomaRt)
>> human <- useMart("ensembl","hsapiens_gene_ensembl")
>> mouse <- useMart("ensembl","mmusculus_gene_ensembl")
>> getHomolog( id = "380768", to.type = "entrezgene", from.type =
>> "entrezgene", from.mart = mouse, to.mart=human )
>>
>> The response is not simply NULL, which my code is set up to handle.
>>
>> My main question is: does anyone know how I can stop the loop aborting
>> when I receive this error message, which I think is external? If I can
>> record which specific IDs are causing the error, I could exclude them
>> from the original batch, but the error handling is a bit murky to my
>> reading of the R help. My actual function is included below
>> (biomaRt.conversion).
>>
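One common way to keep a loop running past such an error is base R's tryCatch(), recording the offending IDs instead of stopping. The sketch below is only an illustration under stated assumptions: fake.getHomolog and safe.lookup are hypothetical names, and fake.getHomolog is a stand-in that imitates the behaviour described here (NULL for no homolog, an error for "380768") so the example runs without a connection to Ensembl.

```r
# Hypothetical stand-in for getHomolog(), so the sketch runs offline.
# It returns NULL when no homolog is known and raises an error for one ID,
# imitating the read.table() failure quoted above.
fake.getHomolog <- function(id) {
  known <- c("66645" = "55269", "64058" = "64065")
  if (id == "380768") stop("no lines available in input")
  if (!(id %in% names(known))) return(NULL)
  data.frame(V1 = id, V2 = known[[id]], stringsAsFactors = FALSE)
}

# Hypothetical wrapper: look up one ID at a time and sort each outcome
# into hits, IDs without a homolog, and IDs that raised an error.
safe.lookup <- function(ids, lookup = fake.getHomolog) {
  hits <- list()
  none <- character(0)
  failed <- character(0)
  for (id in ids) {
    # On error, return the condition object instead of aborting the loop
    res <- tryCatch(lookup(id), error = function(e) e)
    if (inherits(res, "error")) {
      failed <- c(failed, id)
    } else if (is.null(res)) {
      none <- c(none, id)
    } else {
      hits[[length(hits) + 1]] <- res
    }
  }
  list(homologs = do.call(rbind, hits), no.homolog = none, failed = failed)
}

out <- safe.lookup(c("66645", "380768", "73663", "64058"))
```

In real use, the lookup argument would be a small function wrapping the actual getHomolog() call; out$failed then lists the IDs to exclude from, or retry in, a later batch.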
>> Unfortunately, I don't have any MySQL experience (yet) so that isn't
>> an option for me as an alternative.
>>
>> The list is derived from those unable to be matched from
>> ProbeMatchDB2.0, as that database maps via Unigene
>> http://brainarray.mbni.med.umich.edu/Brainarray/Database/ProbeMatchDB/ncbi_probmatch_para_step1.asp
>>
>>
>> Thanks,
>>
>> Steve
>>
>>
>>
>> biomaRt.conversion <- function(x, from.id, to.id, from.sp, to.sp)
>> {
>>   # x is the initial list of ids
>>   # from.id & to.id are the types of codes (e.g. entrez or unigene)
>>   # from.sp & to.sp can only be "human" or "mouse"
>>   # Warnings need to be suppressed in case no match exists
>>   homologs <- c()
>>   no.homolog <- c()
>>   if (from.sp == "human") mart1 <- useMart("ensembl", "hsapiens_gene_ensembl")
>>   if (to.sp == "human") mart2 <- useMart("ensembl", "hsapiens_gene_ensembl")
>>   if (from.sp == "mouse") mart1 <- useMart("ensembl", "mmusculus_gene_ensembl")
>>   if (to.sp == "mouse") mart2 <- useMart("ensembl", "mmusculus_gene_ensembl")
>>   for (i in seq_along(x))
>>   {
>>     suppressWarnings(hum <- getHomolog(id = x[i], to.type = to.id,
>>       from.type = from.id, from.mart = mart1, to.mart = mart2))
>>     if (!is.null(hum)) # if a homolog was found
>>     {
>>       # Duplicate removal: keep only the first occurrence of each ID
>>       hum <- unique(as.character(unlist(hum)))
>>       for (j in seq_along(hum))
>>       {
>>         homologs <- rbind(homologs, c(x[i], hum[j]))
>>       }
>>     }
>>     else # if no homolog was found
>>     {
>>       no.homolog <- c(no.homolog, x[i])
>>     }
>>   }
>>   colnames(homologs) <-
>>     c(paste(from.sp, "ID", sep = "."), paste(to.sp, "ID", sep = "."))
>>   list(homologs = data.frame(homologs), no.homolog = no.homolog)
>> }
>>