[R] quick help needed: split a number and "find and replace" type of function that works like in MS excel
Steve Lianoglou
mailinglist.honeypot at gmail.com
Sun May 1 23:03:59 CEST 2011
Hi,
There are a couple of ways to do what you want.
I'll provide the fodder and let you finish the implementation.
On Sun, May 1, 2011 at 4:26 PM, Ram H. Sharma <sharma.ram.h at gmail.com> wrote:
> Hi R experts
>
> I have a couple of quick questions:
>
> Q1
> #my data
> set.seed(12341)
> SN <- 1:100
> pool<- c(12,13,14, 23, 24, 34)
> CT1<- sample(pool, 100, replace= TRUE)
> set.seed(1242)
> CT2 <- sample(pool, 100, replace= TRUE)
> set.seed(142)
> CT3 <- sample(pool, 100, replace= TRUE)
> # the number of variables runs up to column 20000
> mydf <- data.frame(SN, CT1, CT2, CT3)
>
> First question: how can I split 12 into 1 2, 13 into 1 3, 14 into 1 4?
> What I am trying here is to split each number into two and make separate
> variables CT1a and CT1b, CT2a and CT2b, CT3a and CT3b.
>
> Tried with strsplit () but I believe this works with characters only
You can convert your numbers to characters, if you like. Using your
dataset, consider:
R> ct1.char <- as.character(mydf$CT1)
R> ct1.char <- strsplit(ct1.char, '')
R> ct1a <- sapply(ct1.char, '[', 1) ## "non-obvious" use of '[' as
R> ct1b <- sapply(ct1.char, '[', 2) ## a function is intentional :-)
R> head(data.frame(ct1a=ct1a, ct1b=ct1b))
  ct1a ct1b
1    3    4
2    1    4
3    2    3
4    1    4
5    3    4
6    2    3
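If you want to do this for every CT column at once, here is one possible sketch. The helper name split_digits is hypothetical (not from any package), and it assumes every value has exactly two digits. The data are rebuilt in the same shape as in the question, but without resetting the seed per column, so the values will not match the output above.

```r
# Example data in the same shape as the question (values will differ)
set.seed(12341)
pool <- c(12, 13, 14, 23, 24, 34)
mydf <- data.frame(SN  = 1:100,
                   CT1 = sample(pool, 100, replace = TRUE),
                   CT2 = sample(pool, 100, replace = TRUE),
                   CT3 = sample(pool, 100, replace = TRUE))

# Hypothetical helper: for each named column, return two character
# columns holding the first and second digit (assumes two-digit values)
split_digits <- function(df, cols) {
  pieces <- lapply(cols, function(nm) {
    chars <- as.character(df[[nm]])
    out <- data.frame(substr(chars, 1, 1), substr(chars, 2, 2),
                      stringsAsFactors = FALSE)
    names(out) <- paste0(nm, c("a", "b"))
    out
  })
  do.call(cbind, pieces)
}

# Pick up every column whose name starts with "CT"
ct.cols <- grep("^CT", names(mydf), value = TRUE)
mydf.split <- cbind(mydf["SN"], split_digits(mydf, ct.cols))
head(mydf.split)
```

With 20000 columns the lapply over column names should still be manageable, since substr is vectorized over each column, but I have not timed it on data of that size.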
> Q2
> Is there any function that works in the same manner as the find and replace
> function in MS Excel? Just for example, if I want to replace all 1s in the
> above data frame with "A", 2 with "B". Thus the number 12 will be converted
> to "AB". I tried with car, but it is very slow, as I need to process a very
> large data frame.
Try gsub:
R> head(ct1a)
[1] "3" "1" "2" "1" "3" "2"
R> head(gsub("1", "A", ct1a))
[1] "3" "A" "2" "A" "3" "2"
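Note that gsub handles one pattern per call, so mapping several digits would take repeated calls. If each digit maps to exactly one letter, base R's chartr can translate all of them in one pass, which turns 12 directly into "AB". A minimal sketch, using the first six CT1 values shown in this thread and assuming the mapping continues as 3 -> "C" and 4 -> "D" (the question only specified 1 and 2):

```r
# First six CT1 values from this thread
ct1 <- c(34, 14, 23, 14, 34, 23)

# chartr(old, new, x): each character in `old` is replaced by the
# character at the same position in `new`
ct1.letters <- chartr("1234", "ABCD", as.character(ct1))
ct1.letters
# [1] "CD" "AD" "BC" "AD" "CD" "BC"
```

To apply the same translation to many columns of a data frame, something like lapply over the relevant columns with the same chartr call should work.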
Or you can use a "translation table":
R> xlate <- c('1'='A', '2'='B', '3'='C')
R> head(xlate[ct1a])
  3   1   2   1   3   2 
"C" "A" "B" "A" "C" "B" 
You might also consider not converting your original data into
characters at all and extracting the digits numerically instead. You
can use division by 10 and modulo arithmetic to get each digit, i.e.:
R> head(mydf$CT1)
[1] 34 14 23 14 34 23
## First digit
R> head(as.integer(mydf$CT1 / 10))
[1] 3 1 2 1 3 2
## Second digit
R> head(mydf$CT1 %% 10)
[1] 4 4 3 4 4 3
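A slightly more idiomatic variant of the same idea uses R's integer-division operator %/% in place of as.integer(x / 10). Because the digits stay numeric, they can also index the built-in LETTERS vector, which gives the letter recoding from Q2 without any string splitting. A small sketch on the six values above, assuming two-digit inputs with digits 1 to 9 and the mapping 1 -> "A", 2 -> "B", and so on:

```r
# First six CT1 values from this thread
ct1 <- c(34, 14, 23, 14, 34, 23)

tens <- ct1 %/% 10   # integer division: first digit
ones <- ct1 %% 10    # remainder: second digit

# Numeric digits index the built-in LETTERS vector (1 -> "A", 2 -> "B", ...)
ct1.coded <- paste0(LETTERS[tens], LETTERS[ones])

data.frame(ct1a = tens, ct1b = ones, coded = ct1.coded)
```

A digit of 0 would index nothing in LETTERS, so this only fits data like yours where every digit is at least 1.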
There's some food for thought ..
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact