[BioC] Start and End positions in the GTF/GFF3 files

Hervé Pagès hpages at fhcrc.org
Thu Jul 4 04:22:44 CEST 2013


Hi Delasa,

If you are lucky, the exact version of the reference genome is
specified in the header of the file (i.e. in the first lines of
comments -- those lines should start with ## in a GFF3 file). But I
don't think the specs for GTF/GFF3 require this information to be
present, so when it's missing there is no way to know just by looking
at the content of the file. This is information one generally gathers from
the provider of the file. For example on the NCBI or UCSC FTP servers,
things are organized in a way that makes it clear which GTF/GFF3 files
go with which reference genomes. If this is not clear, then it's a
problem with how the provider is distributing those files and the best
way to clarify is to contact them.
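
To check whether a given file happens to carry that header information,
something along these lines could be used. It is only a minimal sketch in
Python: the helper name header_pragmas and the demo content (including the
placeholder build name) are made up for illustration. It relies on the
GFF3 spec defining an optional "##genome-build source buildName"
directive; many files omit it, and GTF files often have no such header at
all.

```python
import io


def header_pragmas(handle, max_lines=200):
    """Collect '##key value' directives from the leading comment lines.

    Hypothetical helper: stops at the first non-comment line, so only
    the header of the file is inspected.
    """
    pragmas = {}
    for i, line in enumerate(handle):
        if i >= max_lines or not line.startswith("#"):
            break
        if line.startswith("##"):
            parts = line[2:].strip().split(None, 1)
            if parts:
                value = parts[1] if len(parts) > 1 else ""
                pragmas.setdefault(parts[0], []).append(value)
    return pragmas


# Made-up demo content; with a real file use open("annotation.gff3").
demo = io.StringIO(
    "##gff-version 3\n"
    "##genome-build ExampleSource ExampleBuild1\n"
    "chr1\tdemo\tgene\t100\t200\t.\t+\t.\tID=gene1\n"
)
pragmas = header_pragmas(demo)
print(pragmas.get("genome-build", "no genome-build directive found"))
```

If nothing is reported, that only means the file does not say; the
provider remains the authoritative source.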

H.

On 07/03/2013 04:56 PM, Delasa Aghamirzaie wrote:
> Hi Bioconductors,
> I have a question regarding the start and end positions in GTF/GFF3
> files, and I don't know if this is the right place to ask. I would
> appreciate it if anyone could answer my question.
>
> Does anybody know which version of the reference genome the start and
> end positions in GTF/GFF3 files are based on? I see different versions:
> hard_masked.fa or soft-masked.fa, cds.fa, cds_primaryTranscriptOnly.fa.
> I have a GTF file and I want to find the corresponding sequences in the
> FASTA references, but I don't know which file to use. I used the
> hard-masked one, the same one we mapped the reads to, but the
> corresponding positions in the GTF file do not give me the correct
> sequence for each gene.
>
>
> Sincerely Yours,
> Delasa Aghamirzaie
> Genetics, Bioinformatics, and Computational Biology (GBCB) PhD Student
> Virginia Tech
> Blacksburg, Virginia

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


