[BioC] How can I obtain gene name from chromosome location?
Yoo, Seungyeul
seungyeul.yoo at mssm.edu
Tue Jul 10 05:34:34 CEST 2012
Hi Tim,
Thank you for your advice. Sorry for another naive question, but how can I tell from the raw data whether these chromosome locations are promoter regions or not?
I'm reading raw DNA methylation data, which come as pairs of untreated and methylated .xys files, as follows:
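library(charm)                         # readCharm(), methp(), pns(), validatePd(), ...
library(BSgenome.Hsapiens.UCSC.hg18)   # provides Hsapiens for getControlIndex()
# Sample description: one row per .xys file (two files per sample)
pd<-read.table("CTRL_sample.txt",header=TRUE,sep="\t")
res<-validatePd(pd)                    # sanity-check the sample description
rawData<-readCharm(pd$filename,path="/projects/zhuj05a/Lung_Dataset/LGRC/Raw/Charm/3_CTRL",sampleKey=pd)
# Control probes in CpG-free regions, used for normalization
ctrlind<-getControlIndex(rawData,subject=Hsapiens)
grp<-pData(rawData)$tissue
# Percentage methylation estimates: one row per probe, one column per sample
p<-methp(rawData,controlIndex=ctrlind,plotDensity="density_CTRL.pdf",plotDensityGroups=grp)
rownames(p)<-pns(rawData)              # region names of the form "chr:start-end"
colnames(p)<-unique(pd$sampleID)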
I want the row names of the matrix p to be gene names rather than chromosome locations.
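A minimal sketch of Steve's steps (1) to (4) below, applied to the pns() names. It rests on assumptions that are not stated in the thread. It uses the prebuilt TxDb.Hsapiens.UCSC.hg18.knownGene package instead of building a TranscriptDb yourself. It uses the gene-level genes() summary rather than transcripts(). It uses org.Hs.eg.db to turn Entrez IDs into symbols. The helper names locus_to_granges() and genes_for_loci() are hypothetical. The hg18 build is an inference rather than a given: chr19:60540822-60563218 lies past the end of chr19 in hg19 (about 59.1 Mb), so these regions look like hg18 coordinates. Check which build your CHARM array design uses.

```r
# Sketch only: label CHARM regions ("chr:start-end" strings, as returned by
# pns(rawData)) with the symbols of the genes they overlap.
# Assumptions: hg18 coordinates and UCSC knownGene annotation, taken from the
# TxDb.Hsapiens.UCSC.hg18.knownGene and org.Hs.eg.db annotation packages.
suppressPackageStartupMessages({
  library(GenomicRanges)
  library(GenomicFeatures)
  library(TxDb.Hsapiens.UCSC.hg18.knownGene)
  library(org.Hs.eg.db)
})

# Hypothetical helper: turn "chr19:4205395-4220723"-style strings into GRanges.
locus_to_granges <- function(loci) {
  m <- regmatches(loci, regexec("^(chr[^:]+):([0-9]+)-([0-9]+)$", loci))
  stopifnot(all(lengths(m) == 4))        # every string must parse
  parts <- do.call(rbind, m)
  GRanges(seqnames = parts[, 2],
          ranges = IRanges(start = as.integer(parts[, 3]),
                           end = as.integer(parts[, 4])),
          locus = loci)
}

# Hypothetical helper: for each region, join the symbols of all overlapping
# genes into one string ("A;B" if several overlap, NA if none do).
genes_for_loci <- function(loci, txdb) {
  regions <- locus_to_granges(loci)
  g <- genes(txdb)                       # one range per Entrez gene ID
  sym <- mapIds(org.Hs.eg.db, keys = g$gene_id,
                column = "SYMBOL", keytype = "ENTREZID")
  hits <- findOverlaps(regions, g)
  hit_sym <- sym[subjectHits(hits)]
  # Fall back to the Entrez ID when no symbol is known.
  hit_sym[is.na(hit_sym)] <- names(hit_sym)[is.na(hit_sym)]
  per_region <- split(unname(hit_sym),
                      factor(queryHits(hits), levels = seq_along(regions)))
  labels <- vapply(per_region,
                   function(s) paste(unique(s), collapse = ";"), character(1))
  labels[labels == ""] <- NA_character_
  unname(labels)
}

txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene
pns_example <- c("chr19:4205395-4220723", "chr16:73793547-73835933",
                 "chr22:18115791-18146966")
gene_labels <- genes_for_loci(pns_example, txdb)
print(data.frame(locus = pns_example, genes = gene_labels))
```

Row names of p must stay unique, and a region can overlap several genes or none. It is therefore probably safer to keep pns(rawData) as the row names and carry the gene labels alongside, e.g. data.frame(locus = pns(rawData), genes = genes_for_loci(pns(rawData), txdb)). You can then match them to the expression data by symbol.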
I will try flank() as you suggested, and will also try the other advice from Steve and Brian.
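Whether a region is a promoter cannot be read off the .xys files themselves. It comes from comparing the regions with annotated transcription start sites, which is what Tim's flank() suggestion does. The sketch below makes the same assumptions as above: hg18 coordinates and TxDb.Hsapiens.UCSC.hg18.knownGene transcripts. The window of 2 kb on either side of each TSS is an arbitrary choice, not a CHARM or Bioconductor convention, so adjust it to your own promoter definition.

```r
# Sketch only: flag which CHARM regions overlap a promoter window.
# Assumptions: hg18 coordinates, UCSC knownGene transcripts, and a promoter
# defined (arbitrarily) as 2 kb on either side of each transcription start site.
suppressPackageStartupMessages({
  library(GenomicRanges)
  library(GenomicFeatures)
  library(TxDb.Hsapiens.UCSC.hg18.knownGene)
})

tx <- transcripts(TxDb.Hsapiens.UCSC.hg18.knownGene)

# flank() is strand-aware: with start = TRUE it works from the 5' end of each
# transcript (the TSS) on either strand, and both = TRUE makes the window extend
# 2 kb upstream and 2 kb downstream of it. trim() clips windows that would run
# off the ends of a chromosome (flank() warns about those, hence suppressWarnings).
tss_windows <- trim(suppressWarnings(
  flank(tx, width = 2000, start = TRUE, both = TRUE)))

# Regions in the same "chr:start-end" format as pns(rawData).
pns_example <- c("chr19:4205395-4220723", "chr16:73793547-73835933",
                 "chr22:18115791-18146966")
parts <- do.call(rbind, strsplit(pns_example, "[:-]"))
regions <- GRanges(seqnames = parts[, 1],
                   ranges = IRanges(start = as.integer(parts[, 2]),
                                    end = as.integer(parts[, 3])))

# TRUE where a region overlaps at least one promoter window.
in_promoter <- overlapsAny(regions, tss_windows)
print(data.frame(locus = pns_example, in_promoter = in_promoter))
```

More recent GenomicRanges releases also offer promoters(tx, upstream = 2000, downstream = 2000), which builds this kind of TSS window directly.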
Thanks,
Seungyeul Yoo
Postdoctoral Fellow
Department of Genetics and Genomic Sciences
Institute of Genomics and Multiscale Biology
Mount Sinai School of Medicine
(office) 212-659-6877
On Jul 9, 2012, at 4:56 PM, Tim Triche, Jr. wrote:
> The original poster did not specify whether these are promoter regions
> or genic regions; if they are the former, flank() will be useful.
>
>
> On Mon, Jul 9, 2012 at 12:43 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi Seungyeul,
>>
>> On Mon, Jul 9, 2012 at 3:28 PM, Yoo, Seungyeul <seungyeul.yoo at mssm.edu> wrote:
>>> Hi all,
>>>
>>> I'm working on DNA-methylation data of Lung Genomes.
>>>
>>> I'm using CHARM packages for the analysis of differentially methylated regions.
>>>
>>> I have a list of chromosomal locations indicating genes, but I don't know how to map these locations to specific gene names.
>>>
>>>> head(pns)
>>> [1] "chr19:4205395-4220723" "chr16:73793547-73835933"
>>> [3] "chr22:18115791-18146966" "chr19:60540822-60563218"
>>> [5] "chr16:14630202-14638324" "chr19:49197954-49200178"
>>>
>>> Because I also have a gene expression dataset that I want to integrate with the DNA methylation data, obtaining gene names is critical.
>>>
>>> Any advice would be appreciated.
>>
>> I'll just point you in the right direction, and leave the (important) task of
>> learning how to use these packages up to you (or to another poster who
>> feels that giving you the exact commands is the best way to help you
>> ;-)
>>
>> (1) Use the GenomicFeatures package to build a TranscriptDb for your
>> organism and annotation source of choice (RefSeq, Ensembl, UCSC Known
>> Genes):
>>
>> http://bioconductor.org/packages/2.10/bioc/html/GenomicFeatures.html
>>
>> (2) Represent your ranges (chr22:XXX-YYY) as a GenomicRanges object:
>>
>> http://bioconductor.org/packages/2.10/bioc/html/GenomicRanges.html
>>
>> (3) Extract the "transcripts" from your TranscriptDb object using the
>> `transcripts` function
>>
>> (4) Use findOverlaps and friends (e.g. subsetByOverlaps) to find which
>> of your ranges overlap which transcripts.
>>
>> The GenomicFeatures, GenomicRanges, and (if you really want to master
>> your craft) IRanges packages each have pretty extensive documentation
>> in terms of vignettes and API documentation that are worth your time
>> to read -- once you do so, using those packages to perform the tasks
>> outlined above will be rather straightforward.
>>
>> HTH,
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
> --
> A model is a lie that helps you see the truth.
>
> Howard Skipper