[R] searching and replacing in a data frame.

Joshua Wiley jwiley.psych at gmail.com
Mon Jul 18 10:04:06 CEST 2011


On Mon, Jul 18, 2011 at 12:22 AM, Ashim Kapoor <ashimkapoor at gmail.com> wrote:
>> ttt <- data.frame(A = c(Inf, 0, 0), B = c(1, 2, 3))
>>
>> apply(ttt, 2, function(x) {x[is.infinite(x)] <- 0; x})
>>
>
> Ok, thank you. That does work. What does
>
> apply(ttt, 1, function(x) x[is.infinite(x)] <- 0)
>
> return? I get all 0's, but can you explain why?

I think so, though it gets a bit messy.  First we can simplify things
by getting rid of apply for now and just dealing with a simple vector.

x <- c(Inf, 1)

When you type:

x[is.infinite(x)] <- 0

This expression has the side effect of altering the object 'x', but
the value of the expression itself is not x.  As with any assignment
in R, its value (returned invisibly) is the right-hand side.  Let's
see what apply() gets to work with:

## simple example vector
x <- c(Inf, 1)
## store output of subassignment function
test <- x[is.infinite(x)] <- 0

## look at test and x
> test
[1] 0
> x
[1] 0 1
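
The same can be checked with a longer right-hand side.  A small
sketch (the vector 'y', the name 'test2', and the replacement values
10 and 20 are made up for illustration):

```r
## example vector with two infinite entries (hypothetical values)
y <- c(Inf, 1, -Inf)
## chained assignment: 'test2' receives the value of the inner assignment
test2 <- y[is.infinite(y)] <- c(10, 20)

test2  # the right-hand side, c(10, 20), not the modified y
y      # the modified vector: 10 1 20
```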

If you try different examples, you will see that 'test' will be
whatever the object on the right of the assignment operator was.  In
your case, it is a single 0.  Now, we can go look at the
documentation, ?apply, specifically the "Value" section, which
describes what is returned.

     If each call to 'FUN' returns a vector of length 'n', then 'apply'
     returns an array of dimension 'c(n, dim(X)[MARGIN])' if 'n > 1'.
     If 'n' equals '1', 'apply' returns a vector if 'MARGIN' has length
     1 and an array of dimension 'dim(X)[MARGIN]' otherwise.  If 'n' is
     '0', the result has length 0 but not necessarily the 'correct'
     dimension.

Since n = 1 and MARGIN has length 1, apply returns a vector whose
length is dim(X)[MARGIN], which in your case is:

> dim(ttt)[1]
[1] 3

so a vector of length 3 is returned, populated with whatever value you
were using to replace Inf (with MARGIN = 1:2 you would get a 3 x 2
array of that value instead).  You might think that because ttt is a
data frame, the data frame method for `[<-` would get dispatched, but
this is not the case because what you are actually passing is rows or
columns of the data frame, which are just vectors:

> class(ttt)
[1] "data.frame"
> apply(ttt, 2, class)
        A         B
"numeric" "numeric"
> apply(ttt, 1, class)
[1] "numeric" "numeric" "numeric"
> apply(ttt, 1:2, class)
     A         B
[1,] "numeric" "numeric"
[2,] "numeric" "numeric"
[3,] "numeric" "numeric"
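
A quick check of the shapes apply() hands back when FUN returns only
the replacement value.  This is just a sketch using the ttt from the
top of the thread; the names 'by_row' and 'by_cell' are made up here:

```r
ttt <- data.frame(A = c(Inf, 0, 0), B = c(1, 2, 3))

## FUN returns only the replacement value (length 1) on every call
by_row  <- apply(ttt, 1,   function(x) x[is.infinite(x)] <- 0)
by_cell <- apply(ttt, 1:2, function(x) x[is.infinite(x)] <- 0)

by_row        # one 0 per row, a plain vector
dim(by_cell)  # same dimensions as ttt
```

With a single margin the result has no dim attribute; with both
margins it is a matrix of zeros the same shape as ttt.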


The simple way around all of this is to be explicit about what you
want the anonymous function (function(x)) to return.
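
For example, a minimal sketch of that idea, assuming the same ttt as
at the top of the thread (the name 'fixed' is made up, and the
lapply() variant is one possible alternative, not the only way to do
it):

```r
ttt <- data.frame(A = c(Inf, 0, 0), B = c(1, 2, 3))

## modify x, then make x itself the last expression so it is returned
fixed <- apply(ttt, 2, function(x) {
  x[is.infinite(x)] <- 0
  x
})
fixed  # a 3 x 2 numeric matrix, not a data frame

## an alternative that keeps the data frame, column by column
ttt[] <- lapply(ttt, function(x) replace(x, is.infinite(x), 0))
```

Note that apply() converts its input with as.matrix(), so 'fixed' is a
matrix; the lapply() form leaves ttt as a data frame.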

People better versed in the inner workings of R may have some
corrections to how I have explained it.

HTH,

Josh

>
> Thank you.
> Ashim
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/


