[BioC] AnnBuilder doesn't work on gene information ("GENENAME", "SYMBOL", etc)

nli at fhcrc.org
Thu Jul 13 05:12:05 CEST 2006


Hi, Weijun,

Sorry for the late reply; I was travelling yesterday. Please add one empty
column to your base file myBase so that it looks like this:
10001_at\t\t10001
10002_at\t\t10002
10003_at\t\t10003
Here "\t" stands for a TAB character, not the literal string "\t". This change
should solve your problem, and you should then get environments such as
xxACCNUM, xxGENENAME, xxSYMBOL, xxGO, xxOMIM, xxPMID, etc. All of that
information is obtained from Entrez Gene.

The Entrez Gene data is parsed by one of the parsers in the folder
AnnBuilder/scripts (or AnnBuilder/inst/scripts in the source package). The
input parameter "baseMapType" of ABPkgBuilder determines which parser is used;
for baseMapType "ll" the parser is "llParser". This parser appears to assume
that the baseFile has at least three TAB-delimited columns, and that the first
three columns are the probeset ID, the GenBank accession number, and the Entrez
Gene ID. Because only the Entrez Gene ID is used for the mapping, the 2nd
column can be empty, but the two TAB delimiters must still be there. When
baseFile has only two columns, the parser does something else. I am not sure
whether that is a feature or a bug, but I am hesitant to modify it because
other packages may depend on it. If anyone is familiar with this part, I would
appreciate your comments.
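If myBase currently has only two columns (probeset ID and Entrez Gene ID), the
empty middle column can be added with any text tool. Below is a minimal Python
sketch of that conversion, assuming a plain TAB-delimited file with no header
line. The function name to_three_column and the script name fix_base.py are
hypothetical and not part of AnnBuilder. Lines that already have three or more
columns are passed through unchanged.

```python
import sys


def to_three_column(lines):
    """Turn 'probeID<TAB>EntrezGeneID' lines into 'probeID<TAB><TAB>EntrezGeneID'.

    Hypothetical helper (not part of AnnBuilder): inserts an empty GenBank
    accession column so the file has the three columns the "ll" parser expects.
    """
    out = []
    for line in lines:
        line = line.rstrip("\r\n")  # drop Unix or Windows line endings
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) == 2:
            probe, entrez = fields
            out.append(f"{probe}\t\t{entrez}")  # empty 2nd column, two TABs
        elif len(fields) >= 3:
            out.append(line)  # already three or more columns: leave unchanged
        else:
            raise ValueError(f"expected at least 2 TAB-delimited fields: {line!r}")
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python fix_base.py myBase > myBase3
    with open(sys.argv[1]) as fh:
        for row in to_three_column(fh):
            print(row)
```

The resulting file (for example myBase3) has lines of the form probeset ID,
two TABs, Entrez Gene ID, and can then be used as the baseFile.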

Also, just FYI: even though many annotation packages in bioc were generated by
ABPkgBuilder, none of them use "ll" as the "baseMapType", so we had not noticed
this problem before. Thank you for bringing it up. We will either update the
documentation or provide a patch in the near future.

many thanks

nianhua

Nianhua Li
computational biology, PHS, FHCRC


