[BioC] genes in region of miRNA genes
Iain Gallagher
iaingallagher at btopenworld.com
Mon Jun 8 15:18:47 CEST 2009
Hi list
I have some data on the chromosome and the start and end positions of some microRNAs of interest:
miR          chromosome  start      end
hsa-mir-572  17          10979549   10979643
hsa-mir-583  18          95440598   95440672
hsa-mir-587  19          107338693  107338788
hsa-mir-598  21          10930126   10930222
hsa-mir-599  21          100618040  100618134
hsa-mir-210  3           558089     558198
hsa-mir-141  4           6943521    6943615
hsa-mir-492  4           93752305   93752420
hsa-mir-639  11          14501355   14501452
hsa-mir-663  13          26136822   26136914
hsa-mir-503  24          133508024  133508094
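One thing worth checking with a chromosome column like this: labels such as X and Y make it a text column, which read.table() in R 2.9 converts to a factor by default, and cbind() on plain vectors replaces a factor with its internal integer codes (as ?cbind notes), so chromosome labels can silently turn into level numbers. A small offline check with made-up IDs:

```r
# Chromosome labels as a factor, mimicking read.table()'s default in R 2.9
chrom <- factor(c("4", "11", "X"))
ids   <- c("mir-a", "mir-b", "mir-c")   # made-up IDs for illustration

levels(chrom)                           # "11" "4" "X" (sorted as text)

bad  <- cbind(ids, chrom)               # factor replaced by its internal codes
good <- cbind(ids, as.character(chrom)) # convert first to keep the labels

bad[, 2]                                # "2" "1" "3"
good[, 2]                               # "4" "11" "X"
```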
I was hoping to use biomaRt to retrieve information on genes upstream and downstream of these miRNAs (see script below).
I have created a list in the correct form for a multi-filter query using biomaRt, but the following query only retrieves data for chromosome 17. I gather that looping over biomaRt queries is discouraged (presumably to avoid overloading the servers), and I was wondering whether there is a better way of doing this.
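The getBM() documentation describes `values` for several filters as a list with one vector per filter, in the same order as `filters`. as.list() on the 3-row matrix built in the script below instead gives one element per matrix cell, in column order, so the first three elements are just the first miRNA's chromosome, start and end, which may be why only chromosome 17 comes back. A small offline check, using the first two rows of the table above:

```r
# First two rows of the table above (chromosome, start, end)
chroms <- c("17", "18")
starts <- c(10979549, 95440598)
ends   <- c(10979643, 95440672)

# rbind() then as.list(), as in the script: one element per cell, column order
flat <- as.list(rbind(chroms, starts, ends))
length(flat)                  # 6
unname(unlist(flat[1:3]))     # "17" "10979549" "10979643"

# The shape getBM() documents for several filters: one vector per filter
perFilter <- list(chroms, starts, ends)
length(perFilter)             # 3
```

Even in the per-filter form, BioMart combines filters with AND (as far as I understand it), so several start and end values would not be paired up per miRNA.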
In the following script the allMirs table is the result of:
allMirs <- "ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/genomes/hsa.gff"
allMirs<-read.table(allMirs)
However, I did massage the data outside R to remove some extraneous columns (mainly those full of full stops) and to add column names.
The 'miRsUpInFlu.txt' table is the one shown above.
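The column massaging can also be done inside R. A minimal sketch, assuming the miRBase file is tab-separated GFF with nine columns and that column 9 holds the miRNA name as ID="..."; (that attribute format, the "." strand values and the `wanted` vector are assumptions for illustration; the coordinates are reused from the table above). Setting quote = "" matters because column 9 contains double quotes.

```r
# Two hypothetical GFF lines reusing coordinates from the table above;
# the ID="..." attribute format and the "." strand are assumptions
gffLines <- c(
  '17\t.\tmiRNA\t10979549\t10979643\t.\t.\t.\tID="hsa-mir-572";',
  '18\t.\tmiRNA\t95440598\t95440672\t.\t.\t.\tID="hsa-mir-583";'
)

# On the real file, pass the URL or a local path instead of a textConnection
con <- textConnection(gffLines)
gff <- read.table(con, sep = "\t", quote = "", comment.char = "#",
                  stringsAsFactors = FALSE,
                  col.names = c("seqname", "source", "feature", "start", "end",
                                "score", "strand", "frame", "attributes"))
close(con)
gff$seqname <- as.character(gff$seqname)   # keep chromosome names as text

# Pull the miRNA name out of the attribute column
gff$id <- sub('.*ID="([^"]+)".*', "\\1", gff$attributes)

# Keep only the miRs of interest (hypothetical list) and the needed columns
wanted <- c("hsa-mir-583")
mirCoords <- gff[gff$id %in% wanted, c("id", "seqname", "start", "end")]
print(mirCoords)
```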
#get miR chromosome coords from biomaRt
rm(list=ls())
library(biomaRt)
#read in list of miRs
mirs<-read.table('miRsUpInFlu.txt', header=TRUE, sep='\t')
mirs<-sub('R', 'r', as.character(mirs[,1])) #correct miR labels
allMirs<-read.table('miRbaseJune2009.txt', header=TRUE, sep='\t')
mirRow<-which(as.character(allMirs$id) %in% mirs)
mirsData<-allMirs[mirRow,]
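#minor miRs (e.g. the * forms) are missing
#as.character() keeps the chromosome labels; cbind() would otherwise store factor codes
mirRow<-cbind(as.character(mirsData$id), as.character(mirsData[,2]), mirsData[,4], mirsData[,5])
#now we have a matrix containing the miR id, chromosome, start and stop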
#we have to extend the start and stop sites by 500000 bp (500 kb) on each side
#then retrieve genes in these regions
starts<-as.numeric(mirRow[,3])
stops<-as.numeric(mirRow[,4])
limitStarts<-starts-500000#going 5'
limitStops<-stops+500000#going 3'
#this creates a matrix (one column per miR) in the form we need for list conversion
vals<-rbind(mirRow[,2], limitStarts, limitStops)
#the list conversion is required for the biomaRt query because we are using more than one filter
vals<-as.list(vals)
#generate query
db<-useMart('ensembl', dataset='hsapiens_gene_ensembl')
query<-getBM(c('hgnc_symbol', 'ensembl_transcript_id', 'chromosome_name', 'external_gene_id'), filters=c('chromosome_name', 'start', 'end'), values=vals, mart=db)
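One alternative to looping is to send every window in a single query as region strings. A sketch, assuming the Ensembl gene mart exposes a filter named `chromosomal_region` that accepts several "chromosome:start:end" values (confirm with `listFilters(db)` before relying on it). The coordinates are copied from the table above, the getBM() call is commented out because it needs a network connection, and `genes` is a hypothetical stand-in for its result, used to show how each gene can be mapped back to its miRNA locally.

```r
# Coordinates copied from the first three rows of the table above
mirs <- data.frame(
  id    = c("hsa-mir-572", "hsa-mir-583", "hsa-mir-210"),
  chrom = c("17", "18", "3"),
  start = c(10979549, 95440598, 558089),
  end   = c(10979643, 95440672, 558198),
  stringsAsFactors = FALSE
)

flank <- 500000
# Extend each miRNA by 500 kb on both sides; clamp at 1 near chromosome starts
winStart <- pmax(mirs$start - flank, 1)
winEnd   <- mirs$end + flank

# One "chromosome:start:end" string per window; %d avoids "1e+05"-style output
regions <- sprintf("%s:%d:%d", mirs$chrom, as.integer(winStart), as.integer(winEnd))
print(regions)

# Assumed filter name; check listFilters(db) first (needs network access)
# library(biomaRt)
# db <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# genes <- getBM(attributes = c("hgnc_symbol", "ensembl_gene_id", "chromosome_name",
#                               "start_position", "end_position"),
#                filters = "chromosomal_region", values = regions, mart = db)

# Hypothetical stand-in for the getBM() result
genes <- data.frame(
  hgnc_symbol     = c("GENE_A", "GENE_B"),
  chromosome_name = c("17", "3"),
  start_position  = c(11000000, 2000000),
  end_position    = c(11010000, 2005000),
  stringsAsFactors = FALSE
)

# All windows come back in one table, so map genes back to each miRNA locally:
# a gene belongs to a window if it overlaps it on the same chromosome
hits <- lapply(seq_len(nrow(mirs)), function(i) {
  inWindow <- genes$chromosome_name == mirs$chrom[i] &
              genes$end_position >= winStart[i] &
              genes$start_position <= winEnd[i]
  genes$hgnc_symbol[inWindow]
})
names(hits) <- mirs$id
str(hits)
```

If a single request with many windows turns out to be too large, splitting `regions` into a few batches still avoids one query per miRNA.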
Any help would be appreciated.
Thanks
Iain
R version 2.9.0 (2009-04-17)
x86_64-pc-linux-gnu
locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.0.0
loaded via a namespace (and not attached):
[1] RCurl_0.94-1 XML_2.3-0