[BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF
Katja Hebestreit
katjah at stanford.edu
Mon Apr 14 07:14:49 CEST 2014
Actually, the error was not reproducible with the lines I included, but it is reproducible with these lines (four additional lines):
chr1 mm9_refFlat stop_codon 3206103 3206105 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat CDS 3206106 3207049 0.000000 - 2 gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat exon 3204563 3207049 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat CDS 3411783 3411982 0.000000 - 1 gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat exon 3411783 3411982 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat CDS 3660633 3661429 0.000000 - 0 gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat start_codon 3661427 3661429 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat exon 3660633 3661579 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat stop_codon 4283062 4283064 0.000000 - . gene_id "Rp1"; transcript_id "Rp1";
chr1 mm9_refFlat CDS 4283065 4283093 0.000000 - 2 gene_id "Rp1"; transcript_id "Rp1";
Let me know if you would like the entire file.
Thank you!!
Katja
----- Original Message -----
From: "Michael Lawrence" <lawrence.michael at gene.com>
To: "Katja Hebestreit" <katjah at stanford.edu>
Cc: bioconductor at r-project.org, "Rsamtools Maintainer" <maintainer at bioconductor.org>
Sent: Sunday, April 13, 2014 10:02:13 PM
Subject: Re: [BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF
On Sun, Apr 13, 2014 at 7:18 PM, Katja Hebestreit <katjah at stanford.edu> wrote:
> Hello,
>
> I get an error when I try to import my gff file:
>
> txdb <- makeTranscriptDbFromGFF(file="file.gtf", format="gtf")
>
> Error in .parse_attrCol(attrCol, file, colnames) :
> Some attributes do not conform to 'tag value' format
>
> This is what my file looks like:
>
> chr1 mm9_refFlat stop_codon 3206103 3206105 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat CDS 3206106 3207049 0.000000 - 2 gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat exon 3204563 3207049 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat CDS 3411783 3411982 0.000000 - 1 gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat exon 3411783 3411982 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
> chr1 mm9_refFlat CDS 3660633 3661429 0.000000 - 0 gene_id "Xkr4"; transcript_id "Xkr4";
>
> I have the feeling that this has something to do with the missing exon
> rank information in my file. Is that true? Is there a way to import this
> file? All I want to do is to determine the gene lengths.
>
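Since the stated goal is only gene lengths, a TxDb may not be needed at all. The sketch below is a minimal base-R example under several assumptions: "gene length" is taken to mean the number of bases covered by the union of a gene's exons, the file is tab-delimited as GTF requires, and every exon line carries a `gene_id` attribute. `gene_exon_lengths()` is a hypothetical helper, not part of any package. If the TxDb does build, the usual Bioconductor route to the same quantity is `sum(width(reduce(exonsBy(txdb, by = "gene"))))`.

```r
# Hypothetical helper: per-gene length = bases covered by the union of that
# gene's exons. GTF coordinates are 1-based and inclusive.
gene_exon_lengths <- function(lines) {
  # Ignore blank lines and comment/header lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  f <- strsplit(lines, "\t", fixed = TRUE)
  # Keep only well-formed "exon" records (assumes tab-delimited input)
  f <- f[vapply(f, function(x) length(x) >= 9 && x[3] == "exon", logical(1))]
  start <- as.integer(vapply(f, `[`, character(1), 4))
  end   <- as.integer(vapply(f, `[`, character(1), 5))
  # Assumes each exon line has gene_id "..."; extract the quoted value
  gene  <- sub('.*gene_id "([^"]+)".*', "\\1", vapply(f, `[`, character(1), 9))
  vapply(split(data.frame(start, end), gene), function(d) {
    d <- d[order(d$start), ]
    total <- 0L
    cur_s <- d$start[1]
    cur_e <- d$end[1]
    for (i in seq_len(nrow(d))[-1]) {
      if (d$start[i] <= cur_e + 1L) {
        # Overlapping or adjacent exon: extend the current merged interval
        cur_e <- max(cur_e, d$end[i])
      } else {
        # Gap: close the current interval and start a new one
        total <- total + (cur_e - cur_s + 1L)
        cur_s <- d$start[i]
        cur_e <- d$end[i]
      }
    }
    total + (cur_e - cur_s + 1L)
  }, integer(1))
}
```

Usage would be `gene_exon_lengths(readLines("file.gtf"))`, which returns a named integer vector with one entry per `gene_id`. Merging overlapping exons first avoids counting bases shared by alternative transcripts more than once.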
It is most likely as the error says: some of your attributes are malformed.
Is that the entire file listed above, or is there more? If you could get me
the file somehow I could diagnose the issue.
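To find which records trigger the "tag value" error, a minimal base-R sketch can list the lines whose attribute column does not split into `tag value` pairs. It assumes the file is tab-delimited (as GTF requires) and approximates the check with a regular expression; `check_gtf_attributes()` is a hypothetical helper and not the exact logic GenomicFeatures uses internally. Lines with fewer than nine tab-separated columns (for example, if tabs were converted to spaces) are flagged as well.

```r
# Hypothetical helper (not part of GenomicFeatures/rtracklayer): return the
# GTF records whose 9th column is not a ";"-separated list of "tag value" pairs.
check_gtf_attributes <- function(lines) {
  # Ignore blank lines and comment/header lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  bad <- vapply(fields, function(f) {
    # Fewer than 9 tab-separated columns (e.g. spaces instead of tabs)
    if (length(f) < 9) return(TRUE)
    attrs <- trimws(strsplit(f[9], ";", fixed = TRUE)[[1]])
    attrs <- attrs[nzchar(attrs)]
    # Assumed rule: a tag, then whitespace, then a non-empty value
    !all(grepl("^[^[:space:]]+[[:space:]]+[^[:space:]]", attrs))
  }, logical(1))
  lines[bad]
}
```

Running `check_gtf_attributes(readLines("file.gtf"))` on the full file would return the offending lines, which is usually enough to spot the malformed attribute.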
>
> Could anyone help? That would be awesome!
> Cheers,
> Katja
>
>
> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] GenomicFeatures_1.16.0 AnnotationDbi_1.25.19 Biobase_2.23.6
> [4] GenomicRanges_1.16.0 GenomeInfoDb_0.99.32 IRanges_1.21.45
> [7] BiocGenerics_0.9.3 BiocInstaller_1.14.1
>
> loaded via a namespace (and not attached):
> [1] BatchJobs_1.2 BBmisc_1.5 BiocParallel_0.6.0
> [4] biomaRt_2.20.0 Biostrings_2.32.0 bitops_1.0-6
> [7] brew_1.0-6 BSgenome_1.32.0 codetools_0.2-8
> [10] DBI_0.2-7 digest_0.6.4 fail_1.2
> [13] foreach_1.4.2 GenomicAlignments_1.0.0 iterators_1.0.7
> [16] plyr_1.8.1 Rcpp_0.11.1 RCurl_1.95-4.1
> [19] Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0
> [22] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2
> [25] tools_3.1.0 XML_3.98-1.1 XVector_0.4.0
> [28] zlibbioc_1.10.0