[BioC] can not link the genomic positions queried and their specific annotation, when getting genomic variants annotated by biomaRt package

Mao Jianfeng jianfeng.mao at gmail.com
Tue Feb 8 17:47:59 CET 2011


Dear listers, Sean and Steve,

I posted a similar question to this list earlier, but I am still
confused. So I will try to describe my question in more detail, to
make it clearer. Please read all six sections below.

Thanks a lot. My question is not a student's homework. This list is
my only way to get help with R and Bioconductor. I taught myself both
in a somewhat isolated environment, so any help you can give is
very valuable to me.

Jian-Feng,



(1) The genomic variant data I need to annotate:
# SNPs,chromosome,start,end
SNP_1,1,43,43
SNP_2,2,56,56

(2) What I want to get (the annotation). There may be multiple terms
in a given annotation column for one variant. They can either be
combined in one cell or placed in separate rows of the same column.
Either way, each genomic position must stay attached to its own
annotations.

# SNPs,chromosome,start,end,annotation_term
SNP_1,1,43,43,go_1:go_3
SNP_2,2,56,56,go_100:go_1000

or

# SNPs,chromosome,start,end,go_term
SNP_1,1,43,43,go_1
SNP_1,1,43,43,go_3
SNP_2,2,56,56,go_100
SNP_2,2,56,56,go_1000
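
To make the two layouts above concrete, here is a minimal base-R sketch
of converting the second (one row per term) into the first (terms
combined in one cell). It assumes the annotation has already been
retrieved as a long table; the data frame `long` only holds the toy
rows shown above, not real query results.

```r
# Toy long-format table: one row per (SNP, GO term) pair.
# These are the placeholder values from section (2), not real annotations.
long <- data.frame(
  SNPs       = c("SNP_1", "SNP_1", "SNP_2", "SNP_2"),
  chromosome = c(1, 1, 2, 2),
  start      = c(43, 43, 56, 56),
  end        = c(43, 43, 56, 56),
  go_term    = c("go_1", "go_3", "go_100", "go_1000"),
  stringsAsFactors = FALSE
)

# Collapse to one row per SNP, joining its terms with ":".
wide <- aggregate(go_term ~ SNPs + chromosome + start + end,
                  data = long, FUN = paste, collapse = ":")
names(wide)[names(wide) == "go_term"] <- "annotation_term"
wide <- wide[order(wide$SNPs), ]
print(wide)
```

Either layout keeps the position columns next to the terms, so the
choice mainly depends on what the downstream analysis expects.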

(3) I have been told that the biomaRt package provides this
functionality.

(4) What I have tried with the biomaRt package:
library(biomaRt)
listMarts()
plant = useMart("plant_mart_7")
alyr=useDataset("alyrata_eg_gene", mart=plant)
atha = useDataset ("athaliana_eg_gene",mart=plant)

listAttributes(alyr)
listFilters(alyr)

# Ten query positions on chromosome 1 (start == end for single-base variants)
chr <- rep(1, 10)
start <- c(33, 999, 3000, 7000, 9000, 10000, 12000, 19000, 80000, 100000)
end <- start

# "end_position" is requested as well, since it appears in the output in (5)
getBM(attributes = c("chromosome_name", "start_position", "end_position",
                     "ensembl_gene_id", "go_biological_process_linkage_type"),
      filters = c("chromosome_name", "start", "end"),
      values = list(chr, start, end),
      mart = alyr, uniqueRows = TRUE)

(5) What I got:

   chromosome_name start_position end_position                ensembl_gene_id go_biological_process_linkage_type
1                1          48875        49123            Al_scaffold_0001_16
2                1          72255        72617            Al_scaffold_0001_21
3                1          10652        11944             Al_scaffold_0001_4                                IEA
4                1          82573        83367 fgenesh1_pg.C_scaffold_1000018
5                1          87206        90301 fgenesh1_pg.C_scaffold_1000020
6                1          29681        31614 fgenesh1_pm.C_scaffold_1000009
7                1          51526        52636 fgenesh1_pm.C_scaffold_1000016
8                1          78367        80505 fgenesh1_pm.C_scaffold_1000020
9                1          35461        39593 fgenesh2_kg.1__12__AT1G02120.1
10               1          39949        42531 fgenesh2_kg.1__13__AT1G02110.1
11               1          46396        48761 fgenesh2_kg.1__19__AT1G02090.1
12               1          55814        56468 fgenesh2_kg.1__20__AT1G02070.1
13               1          74785        76652 fgenesh2_kg.1__23__AT1G02065.1
14               1          80941        82330 fgenesh2_kg.1__25__AT1G02050.1                                IEA
15               1          80941        82330 fgenesh2_kg.1__25__AT1G02050.1                                IEA
16               1          90714       113497 fgenesh2_kg.1__28__AT1G02010.1                                IEA
17               1          90714       113497 fgenesh2_kg.1__28__AT1G02010.1                                IEA
18               1           3311         6198  fgenesh2_kg.1__2__AT1G02190.2                                IEA
19               1           3311         6198  fgenesh2_kg.1__2__AT1G02190.2                                IEA
20               1           9512        10567  fgenesh2_kg.1__3__AT1G02180.1
21               1          12552        13416  fgenesh2_kg.1__5__AT1G02160.2
22               1             47         2523              scaffold_100001.1                                IEA
23               1             47         2523              scaffold_100001.1                                IEA
24               1           7429         7630              scaffold_100003.1
25               1          13702        15386              scaffold_100007.1
26               1          15665        19464              scaffold_100008.1                                IEA
27               1          19692        20609              scaffold_100009.1
28               1          24515        27497              scaffold_100010.1
29               1          33055        34772              scaffold_100013.1                                IEA
30               1          33055        34772              scaffold_100013.1                                IEA
31               1          33055        34772              scaffold_100013.1                                IEA
32               1          33055        34772              scaffold_100013.1                                IEA
33               1          33055        34772              scaffold_100013.1                                IEA
34               1          33055        34772              scaffold_100013.1                                IEA
35               1          43130        46178              scaffold_100016.1
36               1          49553        51020              scaffold_100018.1                                IEA
37               1          49553        51020              scaffold_100018.1                                IEA
38               1          57579        57871              scaffold_100022.1
39               1          58865        72177              scaffold_100023.1

(6) My problem is that I cannot link the genomic positions I queried
to their specific annotations: nothing in the result shows which
queried position a given row belongs to.
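
One possible way to make this link, sketched here under assumptions
rather than as the intended biomaRt workflow, is to match the queried
positions against the gene intervals locally once the gene table has
been downloaded. The function name `link_snps_to_genes`, the SNP labels
and the `pos` column are hypothetical; the gene rows are copied from
the output in (5).

```r
# Three of the queried positions from (4); the SNP labels are made up.
snps <- data.frame(
  SNPs            = c("SNP_A", "SNP_B", "SNP_C"),
  chromosome_name = c(1, 1, 1),
  pos             = c(999, 3000, 10000),
  stringsAsFactors = FALSE
)

# Three gene rows copied from the getBM() output in (5).
genes <- data.frame(
  chromosome_name = c(1, 1, 1),
  start_position  = c(47, 3311, 9512),
  end_position    = c(2523, 6198, 10567),
  ensembl_gene_id = c("scaffold_100001.1",
                      "fgenesh2_kg.1__2__AT1G02190.2",
                      "fgenesh2_kg.1__3__AT1G02180.1"),
  go_biological_process_linkage_type = c("IEA", "IEA", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: attach to each SNP the genes whose interval contains it.
link_snps_to_genes <- function(snps, genes) {
  # All SNP x gene pairs that share a chromosome.
  pairs <- merge(snps, genes, by = "chromosome_name")
  # Keep only pairs where the SNP lies inside the gene interval.
  hits <- pairs[pairs$pos >= pairs$start_position &
                pairs$pos <= pairs$end_position, ]
  # Re-attach SNPs without a hit (NA annotation) so no query is lost.
  merge(snps, hits, all.x = TRUE)
}

linked <- link_snps_to_genes(snps, genes)
print(linked)
```

In this toy input, position 3000 falls between two genes, so its
annotation columns are NA. The pairwise merge grows quickly with table
size; for large inputs an interval-overlap function such as
findOverlaps() from the GenomicRanges package may be a better fit.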






-- 
Jian-Feng, Mao

the Institute of Botany,
Chinese Academy of Botany,


