[BioC] getHomolog in biomaRt
Steffen Durinck
durincks at mail.nih.gov
Tue Apr 10 17:17:38 CEST 2007
Hi Steve,
Which version of biomaRt are you using?
I would recommend using the devel version, as it returns both the
query ID and its homolog ID:
> human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> mouse = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
> getHomolog(id = c("66645", "64058"), to.type = "entrezgene",
+   from.type = "entrezgene", from.mart = mouse, to.mart = human)
     V1    V2
1 64058 64065
2 66645 55269
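Once the result carries the query ID next to the homolog ID, it can be joined back onto the original ID list so that queries without a homolog show up as NA. A minimal base-R sketch, assuming the result has the two columns V1 (query) and V2 (homolog) as in the output above; the query vector, including the ID that is absent from the result, is made up for illustration:

```r
# Two-column result as in the output above: V1 = query ID, V2 = homolog ID
res <- data.frame(V1 = c("64058", "66645"), V2 = c("64065", "55269"),
                  stringsAsFactors = FALSE)

# Hypothetical query vector; "73663" is deliberately absent from 'res'
query <- c("66645", "64058", "73663")

# Left join: every query ID is kept, unmatched ones get NA in V2
mapped <- merge(data.frame(V1 = query, stringsAsFactors = FALSE), res,
                by = "V1", all.x = TRUE)
names(mapped) <- c("mouse.ID", "human.ID")

# Query IDs for which no homolog came back
no.homolog <- mapped$mouse.ID[is.na(mapped$human.ID)]
print(mapped)
print(no.homolog)
```

Note that merge() sorts the rows by the key column, so the output order need not follow the original query order.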
> sessionInfo()
R version 2.4.0 (2006-10-03)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] "methods" "stats" "graphics" "grDevices" "utils" "datasets"
[7] "base"
other attached packages:
biomaRt RCurl XML
"1.9.22" "0.8-0" "1.4-1"
Cheers,
Steffen
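On keeping the loop going when a single query fails: base R's tryCatch() turns an error into a return value, so the failing ID can be recorded and the loop simply moves on. A minimal sketch, where lookup() is a hypothetical stand-in for the getHomolog() call; it is set up to fail on 380768 (the ID reported in the message below), and its other return values are invented for illustration:

```r
# Hypothetical stand-in for getHomolog(): errors on one ID, returns NULL when
# there is no match, otherwise a homolog ID. Replace with the real call.
lookup <- function(id) {
  if (id == "380768") stop("no lines available in input")
  if (id == "66645") return("55269")
  NULL
}

ids <- c("66645", "380768", "73663")
homologs <- list()
no.homolog <- character(0)
failed <- character(0)

for (id in ids) {
  # The error handler's value becomes the value of tryCatch(),
  # so an error no longer aborts the loop
  res <- tryCatch(lookup(id), error = function(e) {
    message("lookup failed for ", id, ": ", conditionMessage(e))
    e  # return the condition object so it can be detected below
  })
  if (inherits(res, "error")) {
    failed <- c(failed, id)          # record IDs that raised an error
  } else if (is.null(res)) {
    no.homolog <- c(no.homolog, id)  # query ran but found nothing
  } else {
    homologs[[id]] <- res
  }
}
```

A simpler alternative is res <- try(lookup(id), silent = TRUE), followed by a check with inherits(res, "try-error").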
Steve Pederson wrote:
> Hi,
>
> I'm still on a steep learning curve with R and am trying to convert a
> large batch of mouse Entrez IDs to their homologous human Entrez IDs.
> When I send them to biomaRt as a batch, the result doesn't contain the
> query IDs (could this be added in the next update?), so the results
> can't be matched back to the originals. For example:
>
> > getHomolog(id = c("73663", "66645", "74855"), to.type = "entrezgene",
> +   from.type = "entrezgene", from.mart = mouse, to.mart = human)
>      V1
> 1 55269
>
> As a result, I'm sending the IDs one at a time via a quick function I
> set up. The batch regularly fails, and I get the following error message:
> Error in read.table(con, sep = "\t", header = FALSE, quote = "", comment.char = "", :
>   no lines available in input
>
> This is an example of the exact code that causes it:
> library(biomaRt)
> human <- useMart("ensembl","hsapiens_gene_ensembl")
> mouse <- useMart("ensembl","mmusculus_gene_ensembl")
> getHomolog(id = "380768", to.type = "entrezgene", from.type = "entrezgene",
>   from.mart = mouse, to.mart = human)
>
> The response is not NULL (my code is already set up to handle a NULL
> response); the call fails with the error above instead.
>
> My main question is: does anyone know how to stop the loop from
> aborting when I receive this error message, which I think is external?
> If I could record which IDs cause the error, I could exclude them from
> the original batch, but I find the error-handling documentation in the
> R help a bit murky. My actual function (biomaRt.conversion) is
> included below.
>
> Unfortunately, I don't have any MySQL experience (yet) so that isn't an
> option for me as an alternative.
>
> The list consists of the IDs that could not be matched with
> ProbeMatchDB2.0, since that database maps via UniGene:
> http://brainarray.mbni.med.umich.edu/Brainarray/Database/ProbeMatchDB/ncbi_probmatch_para_step1.asp
>
> Thanks,
>
> Steve
>
>
>
> biomaRt.conversion <- function(x, from.id, to.id, from.sp, to.sp)
> {
>   # x is the initial vector of IDs
>   # from.id & to.id are the ID types (e.g. "entrezgene" or "unigene")
>   # from.sp & to.sp can only be "human" or "mouse"
>   # Warnings are suppressed in case no match exists
>   homologs <- NULL
>   no.homolog <- c()
>   if (from.sp == "human") mart1 <- useMart("ensembl", "hsapiens_gene_ensembl")
>   if (to.sp == "human") mart2 <- useMart("ensembl", "hsapiens_gene_ensembl")
>   if (from.sp == "mouse") mart1 <- useMart("ensembl", "mmusculus_gene_ensembl")
>   if (to.sp == "mouse") mart2 <- useMart("ensembl", "mmusculus_gene_ensembl")
>   for (i in seq_along(x))
>   {
>     hum <- suppressWarnings(getHomolog(id = x[i], to.type = to.id,
>       from.type = from.id, from.mart = mart1, to.mart = mart2))
>     if (!is.null(hum)) { # if a homolog was found
>       # Duplicate removal: the homolog IDs are in the last column
>       # (V1 here, V2 in the devel version); keep the first copy of each
>       hum <- unique(as.character(hum[, ncol(hum)]))
>       homologs <- rbind(homologs, cbind(x[i], hum))
>     } else { # if no homolog was found
>       no.homolog <- c(no.homolog, x[i])
>     }
>   }
>   if (!is.null(homologs))
>     colnames(homologs) <- c(paste(from.sp, "ID", sep = "."),
>                             paste(to.sp, "ID", sep = "."))
>   list(homologs = data.frame(homologs), no.homolog = no.homolog)
> }
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
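The duplicate-removal loop in biomaRt.conversion can also be expressed with unique(), collecting one small data frame per query and binding them once at the end instead of growing a matrix with rbind() inside the loop. A base-R sketch; the results list is hypothetical (the repeated homolog for 64058 is invented to show the de-duplication):

```r
# Hypothetical per-query results, keyed by query ID; one query returned the
# same homolog twice, which the original while loop removed by hand
results <- list("64058" = c("64065", "64065"), "66645" = "55269")

rows <- lapply(names(results), function(q) {
  hits <- unique(results[[q]])  # drop duplicate homolog IDs, keep the first
  data.frame(mouse.ID = rep(q, length(hits)), human.ID = hits,
             stringsAsFactors = FALSE)
})

# Bind all per-query pieces in a single call
homologs <- do.call(rbind, rows)
print(homologs)
```

If results is empty, do.call(rbind, list()) returns NULL, so the caller should check for that before using the table.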
--
Steffen Durinck, Ph.D.
Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877