[BioC] Gene names

Sun Nov 6 13:08:48 CET 2005

Hi Narendra,

R is also very good for this sort of thing. Have a look at the strsplit 
function.

   x = readLines("yourfile")
   # fixed=TRUE: treat "|" as a literal character, not as a regular expression
   sp = strsplit(x, split="|", fixed=TRUE)

(see the man page of strsplit) and from this you can construct e.g. a 
vector with the j-th column through

  sapply(sp, "[", j)
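
To turn all fields into a table at once, something along these lines should work. This is only a sketch: it assumes every line has exactly four "|"-separated fields, the column names are made up for illustration, and the second record is invented just to show more than one row.

```r
# In practice: x = readLines("yourfile")
x = c("SFTPB|NM_000542.1|4506904|surfactant, pulmonary-associated protein B",
      "GENE2|ACC_2|12345|made-up example record")   # hypothetical second line

# Split each line at the literal "|" character
sp = strsplit(x, split="|", fixed=TRUE)

# Stack the per-line vectors into a character matrix, one row per line
# (assumes all lines have the same number of fields)
tab = do.call(rbind, sp)

# Illustrative column names, not part of the original file
colnames(tab) = c("symbol", "accession", "gi", "description")

# Same idea as sapply(sp, "[", j), here with j = 1
symbols = tab[, "symbol"]
```

The matrix can then be written out tab-delimited with write.table(tab, file="outfile", sep="\t", quote=FALSE, row.names=FALSE), where "outfile" is a placeholder name.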

Cheers
  Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax:   +44 1223 494486
Http:  www.ebi.ac.uk/huber
-------------------------------------

J.delasHeras at ed.ac.uk wrote:
> Quoting Narendra Kaushik <kaushiknk at Cardiff.ac.uk>:
> 
> 
>>I have a gene file in this format, everything in one column (no spaces at all):
>>SFTPB|NM_000542.1|4506904|surfactant, pulmonary-associated protein B
>>Is there any way to convert it into this format (four columns) other than
>>manually?
>>
>>SFTPB                        NM_000542.1               4506904
>>surfactant, pulmonary-associated protein B
>>
>>Any suggestions?
>>
>>Narendra
> 
> 
> Maybe too obvious, but Excel is very good for this sort of thing.
> Functions like Search allow you to obtain the position of a particular
> character (like "|"), and knowing that, you can select the text to the
> left or right of it... if you do that consecutively you can split it up
> like that. It'll take a minute.
>