[BioC] rtracklayer gff import

Kathi Zarnack zarnack at ebi.ac.uk
Thu Apr 14 14:16:08 CEST 2011


Hi,

I am using the package rtracklayer to import transcript.gtf files 
produced by Cufflinks.

As I understand the gff3 specification, feature coordinates are given as 
"start and end of the feature, in 1-based integer coordinates" (also 
discussed in this mailing list lately), meaning that the line below from 
my gtf file corresponds to an exon ranging from 1310534 to 1310771.

original line from the gtf file:
chr1    transcripts_C4    exon    1310534    1310771    78    -    .    Parent=CUFF.1065.1

However, after import with rtracklayer, the exon ends at 1310770 (see 
below). So, as I understand it, rtracklayer's import.gff() interprets the 
gtf coordinates as "1-based right-open" (on export with export.gff3(), the 
end becomes 1310771 again). I tried importing with version="3" specified 
explicitly and also updated to the latest rtracklayer version, but neither 
helped. Is this a bug in the rtracklayer function, or am I interpreting the 
gff coordinates wrongly? Any comments will be appreciated.
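
To make the arithmetic explicit, here is a minimal base-R sketch of what I 
would expect, assuming the closed, 1-based reading of the coordinates is 
the right one. The variable names are only illustrative, and the line is 
retyped with single spaces rather than the tabs of the real file; the 
imported end coordinate is copied from the gff[177] output further down.

```r
# Exon line quoted above, retyped with single spaces (assumption: the
# real file is tab-separated, which this split pattern also handles).
gtf_line <- "chr1 transcripts_C4 exon 1310534 1310771 78 - . Parent=CUFF.1065.1"
fields <- strsplit(gtf_line, "[[:space:]]+")[[1]]

# Columns 4 and 5 hold the start and end coordinates.
file_start <- as.integer(fields[4])
file_end   <- as.integer(fields[5])

# Closed, 1-based interval: both endpoints belong to the feature.
expected_width <- file_end - file_start + 1L

# End coordinate reported for gff[177] after import.gff().
imported_end   <- 1310770L
imported_width <- imported_end - file_start + 1L

cat("expected width:", expected_width, "\n")
cat("imported width:", imported_width, "\n")
cat("end offset:    ", file_end - imported_end, "\n")
```

If the closed-interval reading holds, the imported range is one base 
shorter than the feature in the file.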

Thanks for your help.

Best regards,
Kathi


 > library(rtracklayer)
Loading required package: RCurl
Loading required package: bitops

 > gff=import.gff("/nfs/research2/luscombe/kathi/data/expression_data/hnRNPC_mRNAseq/cufflinks_0.9.3/cufflinks_C4/transcripts_C4.gtf",
+ genome="hg19",asRangedData=FALSE)

 > gff[177]
GRanges with 1 range and 11 elementMetadata values
    seqnames             ranges strand |        type       source       phase
       <Rle>          <IRanges>  <Rle> | <character>  <character> <character>
[1]     chr1 [1310534, 1310770]      - |        exon Cufflinks_C4          NA
      conf_hi   conf_lo       cov      FPKM      frac          ID      Parent
    <numeric> <numeric> <numeric> <numeric> <numeric> <character> <character>
[1]        NA        NA        NA        NA        NA          NA CUFF.1065.1
        score
    <numeric>
[1]        78

seqlengths
  chr1 chr10 chr11 chr12 chr13 chr14 ...  chr7  chr8  chr9  chrM  chrX  chrY
    NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

 > export.gff3(gff[177],"test_export.gtf")


[zarnack at ebi-001 ~]$ more test_export.gtf
##gff-version 3
##date 2011-04-14
chr1    Cufflinks_C4    exon    1310534    1310771    78    -    NA    Parent=CUFF.1065.1


 > sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C             
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8   
 [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8  
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] rtracklayer_1.10.6 RCurl_1.5-0        bitops_1.0-4.1   

loaded via a namespace (and not attached):
[1] Biobase_2.10.0      Biostrings_2.18.0   BSgenome_1.18.1   
[4] GenomicRanges_1.2.1 IRanges_1.8.9       tools_2.12.0      
[7] XML_3.2-0


-- 
Dr. Kathi Zarnack
Luscombe Group
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, UK
tel +44 1223 494 526


