[BioC] getHomolog in biomaRt
Steve Pederson
stephen.pederson at student.adelaide.edu.au
Tue Apr 10 14:36:25 CEST 2007
Hi,
I'm still on a steep learning curve with R and am trying to convert a
large batch of mouse Entrez Gene IDs to their homologous human Entrez
Gene IDs. When I send them to biomaRt as a batch, the result doesn't
contain the query IDs, so the results can't be matched back to the
originals. (Could this be considered as a suggestion for the next
update?) For example:
> getHomolog( id = c("73663","66645","74855"), to.type = "entrezgene",
from.type = "entrezgene", from.mart = mouse, to.mart=human )
V1
1 55269
As a result, I'm sending the IDs one at a time via a quick function I set
up. The run regularly fails with the following error message:
Error in read.table(con, sep = "\t", header = FALSE, quote = "", comment.char = "", :
  no lines available in input
This is an example of the exact code that causes it:
library(biomaRt)
human <- useMart("ensembl","hsapiens_gene_ensembl")
mouse <- useMart("ensembl","mmusculus_gene_ensembl")
getHomolog(id = "380768", to.type = "entrezgene", from.type = "entrezgene",
           from.mart = mouse, to.mart = human)
The call does not return NULL (which my code is set up to handle); it
stops with the error above instead.
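The error text points at read.table(), which suggests biomaRt parses the server's tab-separated reply with it; read.table() raises exactly this error when it is handed zero lines. A minimal sketch, assuming an empty reply is the cause, that reproduces the message locally and shows tryCatch() returning it instead of stopping:

```r
# Simulate an empty server reply: a text connection with no lines
empty_reply <- textConnection(character(0))

# read.table() on zero lines signals an error; tryCatch() turns it into a value
msg <- tryCatch(
  read.table(empty_reply, sep = "\t", header = FALSE, quote = "",
             comment.char = ""),
  error = function(e) conditionMessage(e)  # return the message, don't abort
)
close(empty_reply)
print(msg)  # [1] "no lines available in input"
```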
My main question: does anyone know how I can stop the loop from aborting
when I receive this error message, which I think originates outside my
own code? If I could record which specific IDs cause the error, I could
exclude them from the original batch, but I find the error-handling
section of the R help a bit murky. My actual function
(biomaRt.conversion) is included below.
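One way to keep the loop going is to wrap each query in tryCatch() and record the IDs that fail. A minimal sketch: lookup_homolog() is a hypothetical stand-in that mimics three outcomes (a hit, no hit, an error) so the example runs without a network connection, and convert_ids() is likewise a made-up name.

```r
# Hypothetical stand-in for the real per-ID query (e.g. a getHomolog() call):
# it errors for one ID, finds nothing for another, and returns a hit otherwise.
lookup_homolog <- function(id) {
  if (id == "380768") stop("no lines available in input")
  if (id == "74855") return(NULL)
  data.frame(V1 = "55269", stringsAsFactors = FALSE)
}

convert_ids <- function(ids, lookup) {
  homologs <- list()            # query ID -> unique homolog IDs
  no.homolog <- character(0)    # queries that returned NULL
  failed <- character(0)        # queries whose lookup raised an error
  for (id in ids) {
    # On error, return the condition object instead of stopping the loop
    res <- tryCatch(lookup(id), error = function(e) e)
    if (inherits(res, "error")) {
      message("Query failed for ", id, ": ", conditionMessage(res))
      failed <- c(failed, id)
    } else if (is.null(res)) {
      no.homolog <- c(no.homolog, id)
    } else {
      homologs[[id]] <- unique(as.character(unlist(res)))
    }
  }
  list(homologs = homologs, no.homolog = no.homolog, failed = failed)
}

out <- convert_ids(c("73663", "66645", "74855", "380768"), lookup_homolog)
out$failed  # "380768"
```

With the real service, the lookup argument would wrap the call shown earlier, e.g. function(id) suppressWarnings(getHomolog(id = id, to.type = "entrezgene", from.type = "entrezgene", from.mart = mouse, to.mart = human)); the IDs collected in failed can then be excluded or retried later.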
Unfortunately, I don't have any MySQL experience (yet), so querying the
database directly isn't an option for me.
The list consists of the IDs that could not be matched with
ProbeMatchDB2.0, since that database maps via UniGene:
http://brainarray.mbni.med.umich.edu/Brainarray/Database/ProbeMatchDB/ncbi_probmatch_para_step1.asp
Thanks,
Steve
library(biomaRt)
biomaRt.conversion <- function(x,from.id,to.id,from.sp,to.sp)
{
  # x is the initial vector of IDs
  # from.id & to.id are the ID types (e.g. "entrezgene" or "unigene")
  # from.sp & to.sp can only be "human" or "mouse"
  # Warnings are suppressed because no match may exist for some IDs
  homologs <- NULL
  no.homolog <- c()
  if (from.sp=="human") mart1 <- useMart("ensembl","hsapiens_gene_ensembl")
  if (from.sp=="mouse") mart1 <- useMart("ensembl","mmusculus_gene_ensembl")
  if (to.sp=="human") mart2 <- useMart("ensembl","hsapiens_gene_ensembl")
  if (to.sp=="mouse") mart2 <- useMart("ensembl","mmusculus_gene_ensembl")
  for (i in seq_along(x))
  {
    hum <- suppressWarnings(getHomolog(id = x[i], to.type = to.id,
             from.type = from.id, from.mart = mart1, to.mart = mart2))
    if (!is.null(hum)) # if a homolog was found
    {
      # Duplicate removal: keep the first occurrence of each homologous ID
      hits <- unique(as.character(unlist(hum)))
      # One row per (query ID, homolog ID) pair
      homologs <- rbind(homologs, cbind(x[i], hits))
    }
    else # if no homolog was found
    {
      no.homolog <- c(no.homolog, x[i])
    }
  }
  if (!is.null(homologs))
    colnames(homologs) <- c(paste(from.sp,"ID",sep="."), paste(to.sp,"ID",sep="."))
  list(homologs=data.frame(homologs), no.homolog=no.homolog)
}
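The duplicate-removal stage can be reduced to unique(), and cbind() recycles a single query ID alongside every hit, which gives the two-column table directly. A minimal sketch on a made-up data frame shaped like the V1 output shown earlier (the second ID, 10001, is invented for illustration):

```r
# Toy result shaped like the getHomolog() output above (one column, V1),
# with a duplicated hit; values are illustrative only.
hum <- data.frame(V1 = c("55269", "55269", "10001"), stringsAsFactors = FALSE)
query <- "73663"

hits <- unique(hum$V1)        # keep the first occurrence of each ID
pairs <- cbind(query, hits)   # the length-1 query is recycled for every hit
colnames(pairs) <- c("mouse.ID", "human.ID")
print(pairs)
```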