Re: [R] 2 small problems: integer division and the nature of NA
Denis Chabot
chabotd at globetrotter.net
Sat Feb 5 16:31:33 CET 2005
Thanks to the many R users who convinced me that the sum of NAs should
be zero once they are removed with na.rm=TRUE (the sum of an empty set
is zero), and who gave me a solution if I did not want it to be zero.
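A minimal R sketch of the behavior discussed in the thread. With na.rm = TRUE, a vector made only of NAs becomes empty, and the sum of an empty vector is 0. The helper sum_or_na is hypothetical (an illustration, not necessarily the solution proposed on the list). It returns NA when a cell has no valid data, for example inside tapply.

```r
# Without na.rm, a single NA makes the whole sum NA
sum(c(1, NA, 2))                          # NA
# With na.rm = TRUE, all-NA input leaves an empty vector, whose sum is 0
sum(c(NA_real_, NA_real_), na.rm = TRUE)  # 0
sum(numeric(0))                           # 0

# Hypothetical helper (illustration only): NA when there is no valid value
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Per-cell sums, as when summarising subsets of a response variable
y <- c(1, NA, NA, 4)
g <- c("a", "b", "b", "a")
cell_sums <- tapply(y, g, sum_or_na)      # a = 5, b = NA
```

Note that sum_or_na(numeric(0)) also returns NA, because all() of an empty vector is TRUE.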
Thank you also for the explanations of rounding errors in floating-point
arithmetic. I did not expect them. This small error was a real problem
for me because I was trying to recode numeric values into intervals.
Because I wanted the result to remain numeric, I tried not to use cut or
cut2 (which return factors by default). Hence, to convert a range of
temperatures into 0.2-degree intervals, I had written:
(let's first make a fake temperature variable k for testing)
k <- seq(-5, 5, 0.1)
# midpoint of the 0.2-degree interval containing each value
k1 <- ifelse(k < 0, -0.2 * (abs(k) %/% 0.2) - 0.1, 0.2 * (k %/% 0.2) + 0.1)
Note that this works well to quickly recode a numeric variable that
only takes integer values. But when there are decimals, it produces the
problem that prompted my call for help: some values end up in a
different class than you would expect.
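A short R sketch of why k1 misclassifies some values, tied to the 4.8 %/% 0.2 question quoted below. Neither 4.8 nor 0.2 is exactly representable in binary (4.8 is stored slightly below 4.8 and 0.2 slightly above), so the quotient of the stored values falls just short of 24 and %/% can round it down. ?Arithmetic notes that such results are platform-dependent. Working in whole tenths, as k2 does with round(10 * k), avoids the problem.

```r
# 0.2 and 4.8 are stored as the nearest binary doubles, not the exact decimals
sprintf("%.20f", c(0.2, 4.8))   # the trailing digits reveal the stored error
# So 4.8 %/% 0.2 may give 23 rather than 24 (platform-dependent)
4.8 %/% 0.2
# With whole-number operands the division is exact
48 %/% 2                        # 24
# Scale to whole tenths first, then divide by the interval width in tenths
round(4.8 * 10) %/% 2           # 24
```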
Considering your answers, I found 3 solutions:
k2 <- ifelse(k < 0, -0.2 * (abs(round(10 * k)) %/% 2) - 0.1,
             0.2 * (round(10 * k) %/% 2) + 0.1)
k3 <- (-0.1 + min(k)) + 0.2 * as.numeric(cut(k,
        seq(min(k), max(k) + 0.2, 0.2), right = FALSE, labels = FALSE))
library(Hmisc)  # provides cut2
k4 <- cut2(k, seq(min(k), max(k) + 0.2, 0.2), levels.mean = TRUE)
k5 <- as.numeric(levels(k4))[k4]  # factor levels (interval means) back to numbers
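Why k5 goes through levels(): converting a factor directly with as.numeric() returns its internal integer codes, not the numbers shown as labels. A base-R sketch of the idiom recommended in R FAQ 7.10 (no Hmisc needed; the factor f is made up for illustration):

```r
# A factor whose labels are numbers, e.g. interval means
f <- factor(c("4.9", "-0.1", "4.9"))
as.numeric(f)                  # internal codes (positions in levels(f)), not values
as.numeric(levels(f))[f]       # 4.9 -0.1 4.9: convert each level once, then index by code
as.numeric(as.character(f))    # same result, but converts every element
```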
I could "round" to 1 decimal to be even more exact, but this is good
enough. If it can be done more elegantly, please let me know!
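One possibly more compact alternative, sketched under assumptions: the hypothetical function bin_midpoint first rounds values to `digits` decimals (as k2 does with round(10 * k)), then applies floor() in integer units so every interval is closed on the left, like k3 with right = FALSE. This differs from k1 and k2 for negative values: those mirror the intervals around zero, so k2 puts -0.2 in the interval labelled -0.3, whereas bin_midpoint puts it in the one labelled -0.1.

```r
# Hypothetical helper: midpoint of the left-closed interval of width `width`
# containing each value, computed in integer units to avoid %/% surprises
bin_midpoint <- function(x, width = 0.2, digits = 1) {
  scale <- 10^digits
  w <- round(width * scale)    # interval width in integer units (2 tenths)
  xi <- round(x * scale)       # values in integer units (tenths)
  lower <- floor(xi / w) * w   # lower bound of each interval
  (lower + w / 2) / scale      # midpoint, back in the original units
}

k <- seq(-5, 5, 0.1)
k6 <- bin_midpoint(k)
bin_midpoint(c(-0.2, -0.1, 0, 0.1, 4.8))   # -0.1 -0.1 0.1 0.1 4.9
```

Because of the rounding step, values with more decimals than `digits` are snapped to the nearest 0.1 before binning, just as round(10 * k) does in k2.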
Denis
> Subject: [R] 2 small problems: integer division and the nature of NA
>
>
> Hi,
>
> I'm wondering why
>
> 48 %/% 2 gives 24
> but
> 4.8 %/% 0.2 gives 23...
> I'm not trying to round up here, but to find out how many times
> something fits into something else, and the answer should have been the
> same for both examples, no?
>
> On a different topic, I like the behavior of NAs better in R than in
> SAS (at least they are not considered the smallest value for a
> variable), but at the same time I am surprised that the sum of NAs is 0
> instead of NA.
>
> The sum of a vector having at least one NA but also valid data gives NA
> if we do not specify na.rm=T. But with na.rm=T, we are telling sum to
> give the sum of valid data, ignoring NAs that do not tell us anything
> about the value of a variable. While summing small subsets of my data
> (such as when subsetting by several variables), I found that sometimes
> a "cell" contained only NAs for my response variable. I would have
> expected the sum to be NA in such cases, as I do
> not have a single data point telling me the value of my response here.
> But R tells me the sum was zero in that cell! Was this behavior
> considered "desirable" when sum was built? If not, any hope it will be
> fixed?
>
> Sincerely,
>
> Denis Chabot
>