[R] splitting multiple values in one column into multiple rows with one entry per row
Felix Müller-Sarnowski
drflxms at googlemail.com
Sun Jul 26 21:26:53 CEST 2009
Dear R colleagues,
I annotated a list of single nucleotide polymorphisms (SNPs) with the
corresponding genes using biomaRt. The result is the following
data.frame (pasted from R):
   snp         ensembl_gene_id
1  rs8032583
2  rs1071600   ENSG00000101605
3  rs13406898  ENSG00000167165
4  rs7030479   ENSG00000107249
5  rs1244414   ENSG00000165629
6  rs1005636   ENSG00000230681
7  rs927913    ENSG00000151655;ENSG00000227546
8  rs4832680
9  rs4435168   ENSG00000229164;ENSG00000225227;ENSG00000211817
10 rs7035549
11 rs12707538  ENSG00000186472
As you can see, the SNP with the identifier rs4435168 corresponds to 3
gene ids and rs927913 corresponds to 2 gene ids. Since I'd like to join
several data.frames on ensembl_gene_id later on, I'd like to split
entries containing multiple gene identifiers into separate rows with
only one Ensembl gene identifier each. So for the example of rs4435168
it should look like this (faked output):
   snp         ensembl_gene_id
...
9  rs4435168   ENSG00000229164
10 rs4435168   ENSG00000225227
11 rs4435168   ENSG00000211817
...
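A small sketch of why the split matters for the later join, using base R's merge(): a combined value such as "ENSG00000151655;ENSG00000227546" would never equal a single id in another table, but once each id sits in its own row it can be matched. The second data.frame `genes` and its `annotation` column are hypothetical placeholders, not real gene data.

```r
# Two of the rows shown above, one of them with two gene ids.
dat <- data.frame(
  snp = c("rs1071600", "rs927913"),
  ensembl_gene_id = c("ENSG00000101605", "ENSG00000151655;ENSG00000227546"),
  stringsAsFactors = FALSE
)

# Long format: one Ensembl gene id per row, snp repeated as needed.
ids  <- strsplit(dat$ensembl_gene_id, ";", fixed = TRUE)
long <- data.frame(snp = rep(dat$snp, lengths(ids)),
                   ensembl_gene_id = unlist(ids),
                   stringsAsFactors = FALSE)

# Hypothetical second data.frame keyed by Ensembl gene id;
# `annotation` is a placeholder column, not real data.
genes <- data.frame(ensembl_gene_id = c("ENSG00000227546", "ENSG00000101605"),
                    annotation = c("placeholder A", "placeholder B"),
                    stringsAsFactors = FALSE)

# Inner join on the now single-valued key column.
joined <- merge(long, genes, by = "ensembl_gene_id")
print(joined)
```

merge() performs an inner join by default; adding all.x = TRUE would also keep SNP/gene rows that have no match in `genes`.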
This is just a simple example. In the real data there will be many
other columns, which should be replicated in the same way as the snp column.
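One way to replicate any number of other columns, sketched in base R under the assumption that multiple ids are always separated by ";": split the column, repeat each row index once per value it contains, and subset the data.frame with those indices, so every other column is copied automatically. `split_to_rows()` is a hypothetical helper name, and `input_row` below is only a placeholder standing in for the other columns.

```r
# Hypothetical helper: split column `col` of `df` at the literal string `sep`
# and return one row per value, copying all other columns unchanged.
split_to_rows <- function(df, col, sep = ";") {
  # as.character() guards against factor columns; fixed = TRUE treats
  # `sep` literally rather than as a regular expression.
  parts <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  # strsplit() turns "" into character(0); keep such rows as a single NA
  # so that SNPs without a gene are not silently dropped.
  parts[lengths(parts) == 0] <- NA_character_
  # Repeat each row index once per value in that row's entry.
  idx <- rep(seq_len(nrow(df)), lengths(parts))
  out <- df[idx, , drop = FALSE]
  out[[col]] <- unlist(parts)
  rownames(out) <- NULL
  out
}

# Usage on three of the rows above; `input_row` is a placeholder
# for the other columns that should be replicated.
dat <- data.frame(
  snp = c("rs8032583", "rs927913", "rs12707538"),
  ensembl_gene_id = c("", "ENSG00000151655;ENSG00000227546", "ENSG00000186472"),
  input_row = 1:3,
  stringsAsFactors = FALSE
)
res <- split_to_rows(dat, "ensembl_gene_id")
print(res)
```

Keeping empty entries as one NA row is a design choice; drop the line that assigns NA_character_ if SNPs without any gene should disappear before the join.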
Does anyone know how to do this? I tried strsplit, which nicely splits
the multiple entries in column ensembl_gene_id, but how do I go on from there?
I'd very much appreciate any help!
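A minimal base-R sketch of the step after strsplit(), assuming ";" is the only separator: repeat each snp as many times as its entry has ids, and pair it with the unlisted ids. It uses lengths(), which is available in R 3.2.0 and later. Note that rows with an empty ensembl_gene_id, such as rs8032583, give character(0) from strsplit() and therefore drop out of the result.

```r
# Example data: a subset of the rows shown above.
dat <- data.frame(
  snp = c("rs8032583", "rs927913", "rs4435168"),
  ensembl_gene_id = c("",
                      "ENSG00000151655;ENSG00000227546",
                      "ENSG00000229164;ENSG00000225227;ENSG00000211817"),
  stringsAsFactors = FALSE
)

# Split each entry at ";" (fixed = TRUE: literal match, not a regex).
# as.character() guards against the column being a factor.
ids <- strsplit(as.character(dat$ensembl_gene_id), ";", fixed = TRUE)

# lengths(ids) gives the number of ids per row; empty entries count 0
# and are therefore dropped.
long <- data.frame(
  snp = rep(dat$snp, lengths(ids)),
  ensembl_gene_id = unlist(ids),
  stringsAsFactors = FALSE
)
print(long)
```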
Best regards from Munich,
Felix