[R] How to globally convert NaN to NA in dataframe?
peter dalgaard
pdalgd sending from gmail.com
Fri Sep 3 11:51:55 CEST 2021
Yes, even
> summary(NA_real_)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
     NA      NA      NA     NaN      NA      NA       1
which is presumably because the mean is an empty sum (= 0) divided by a zero count, and 0/0 = NaN.
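The arithmetic can be sketched by hand. This is only an illustration, assuming that summary() drops the missing values before averaging (as mean(..., na.rm=TRUE) does); the names `x` and `kept` are made up for the example:

```r
x <- NA_real_
kept <- x[!is.na(x)]        # nothing survives the filter: numeric(0)
s <- sum(kept)              # the empty sum, 0
n <- length(kept)           # the zero count, 0
s / n                       # 0/0, i.e. NaN
```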
Notice also the difference between
> mean(NA_real_)
[1] NA
> mean(NA_real_, na.rm=TRUE)
[1] NaN
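To confirm that the NaN in the summary comes from the calculation rather than from the data, and to repair the lapply() attempt quoted further down, something along these lines should do. It is a sketch only: the data frame `dat` is toy data invented for the example, and the only change to the quoted code is that the anonymous function returns `xx` (a function that ends on an assignment returns the assigned value, here NA).

```r
# Toy data: one all-missing numeric column, one column with real NaN values
dat <- data.frame(y = rep(NA_real_, 5), z = c(1, NaN, 3, NaN, 5))

# The function must return xx; ending on the assignment would return NA
dat[] <- lapply(dat, function(xx) {
  xx[is.nan(xx)] <- NA
  xx
})

# Check whether any NaN is left in the data
has_nan <- any(sapply(dat, function(col) any(is.nan(col))))
has_nan

# The mean of the all-NA column once the NAs are dropped
mean(dat$y, na.rm = TRUE)
```

Assigning into `dat[]` rather than `dat` keeps the object a data frame instead of replacing it with the plain list that lapply() returns.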
> On 3 Sep 2021, at 09:59 , Luigi Marongiu <marongiu.luigi using gmail.com> wrote:
>
> Fair enough, I'll check the actual data to see whether there are indeed any
> NaN values (there should not be, since the data are categories, not generated
> by math).
> Thanks!
>
> On Fri, Sep 3, 2021 at 8:26 AM PIKAL Petr <petr.pikal using precheza.cz> wrote:
>>
>> Hi Luigi.
>>
>> Weird. But maybe it is the desired behaviour of summary when calculating
>> the mean of a numeric column full of NAs.
>>
>> See example
>>
>> dat <- data.frame(x=rep(NA, 110), y=rep(1, 110), z= rnorm(110))
>>
>> # change all values in second column to NA
>> dat[,2] <- NA
>> # change rows 5 and 6 of columns 2 and 3 to NaN
>> dat[5:6, 2:3] <- 0/0
>>
>> # see summary
>> summary(dat)
>>     x                 y             z
>>  Mode:logical   Min.   : NA   Min.   :-1.9798
>>  NA's:110       1st Qu.: NA   1st Qu.:-0.4729
>>                 Median : NA   Median : 0.1745
>>                 Mean   :NaN   Mean   : 0.1856
>>                 3rd Qu.: NA   3rd Qu.: 0.8017
>>                 Max.   : NA   Max.   : 2.5075
>>                 NA's   :110   NA's   :2
>>
>> # change NaN values to NA
>> dat[sapply(dat, is.nan)] <- NA
>> *************************
>>
>> # summary is the same
>> summary(dat)
>>     x                 y             z
>>  Mode:logical   Min.   : NA   Min.   :-1.9798
>>  NA's:110       1st Qu.: NA   1st Qu.:-0.4729
>>                 Median : NA   Median : 0.1745
>>                 Mean   :NaN   Mean   : 0.1856
>>                 3rd Qu.: NA   3rd Qu.: 0.8017
>>                 Max.   : NA   Max.   : 2.5075
>>                 NA's   :110   NA's   :2
>>
>> # but there is no NaN value in the data
>> dat[1:10,]
>>     x  y          z
>> 1  NA NA -0.9148696
>> 2  NA NA  0.7110570
>> 3  NA NA -0.1901676
>> 4  NA NA  0.5900650
>> 5  NA NA         NA
>> 6  NA NA         NA
>> 7  NA NA  0.7987658
>> 8  NA NA -0.5225229
>> 9  NA NA  0.7673103
>> 10 NA NA -0.5263897
>>
>> So my "nice compact command"
>> dat[sapply(dat, is.nan)] <- NA
>>
>> works as expected, but summary still reports the mean as NaN.
>>
>> Cheers
>> Petr
>>
>>> -----Original Message-----
>>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Luigi Marongiu
>>> Sent: Thursday, September 2, 2021 3:46 PM
>>> To: Andrew Simmons <akwsimmo using gmail.com>
>>> Cc: r-help <r-help using r-project.org>
>>> Subject: Re: [R] How to globally convert NaN to NA in dataframe?
>>>
>>> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I still get
>>> NaN when using the summary function; for instance, one of the columns gives:
>>> ```
>>> Min.   : NA
>>> 1st Qu.: NA
>>> Median : NA
>>> Mean   :NaN
>>> 3rd Qu.: NA
>>> Max.   : NA
>>> NA's   :110
>>> ```
>>> I tried to implement the second solution but:
>>> ```
>>> df <- lapply(x, function(xx) {
>>>   xx[is.nan(xx)] <- NA
>>> })
>>> > str(df)
>>> List of 1
>>>  $ sd_ef_rash_loc___palm: logi NA
>>> ```
>>> What am I getting wrong?
>>> Thanks
>>>
>>> On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons <akwsimmo using gmail.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>>
>>>> I would use something like:
>>>>
>>>>
>>>> x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
>>>>     as.data.frame()
>>>> x[] <- lapply(x, function(xx) {
>>>>     xx[is.nan(xx)] <- NA_real_
>>>>     xx
>>>> })
>>>>
>>>>
>>>> This prevents attributes from being changed in 'x', but accomplishes the
>>>> same thing as you have above. I hope this helps!
>>>>
>>>> On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu <marongiu.luigi using gmail.com>
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>> I have some NaN values in some elements of a dataframe that I would
>>>>> like to convert to NA.
>>>>> The command `df1$col[is.nan(df1$col)]<-NA` works column by column.
>>>>> Is there an alternative that modifies all instances globally at
>>>>> once?
>>>>> I have seen from
>>>>> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
>>>>> that one could use:
>>>>> ```
>>>>>
>>>>> is.nan.data.frame <- function(x)
>>>>> do.call(cbind, lapply(x, is.nan))
>>>>>
>>>>> data123[is.nan(data123)] <- 0
>>>>> ```
>>>>> replacing 0 with NA, but I got
>>>>> ```
>>>>> str(df)
>>>>>> logi NA
>>>>> ```
>>>>> when modifying my dataframe df.
>>>>> What would be the correct syntax?
>>>>> Thank you
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Luigi
>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com