[R] rounding down with as.integer

Duncan Murdoch murdoch.duncan at gmail.com
Thu Jan 1 21:06:12 CET 2015


On 01/01/2015 2:43 PM, Mike Miller wrote:
> On Thu, 1 Jan 2015, Duncan Murdoch wrote:
> 
>> On 01/01/2015 1:21 PM, Mike Miller wrote:
>>> On Thu, 1 Jan 2015, Duncan Murdoch wrote:
>>>
>>>> On 31/12/2014 8:44 PM, David Winsemius wrote:
>>>>>
>>>>> On Dec 31, 2014, at 3:24 PM, Mike Miller wrote:
>>>>>
>>>>>> This is probably a FAQ, and I don't really have a question about it, but I just ran across this in something I was working on:
>>>>>>
>>>>>>> as.integer(1000*1.003)
>>>>>> [1] 1002
>>>>>>
>>>>>> I didn't expect it, but maybe I should have.  I guess it comes down to machine precision combined with the fact that as.integer always rounds down:
>>>>>>
>>>>>>
>>>>>>> as.integer(1000*1.003 + 255 * .Machine$double.eps)
>>>>>> [1] 1002
>>>>>>
>>>>>>> as.integer(1000*1.003 + 256 * .Machine$double.eps)
>>>>>> [1] 1003
>>>>>>
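>>>>>> Strictly speaking, as.integer() truncates toward zero; for positive
>>>>>> values like these that is the same as rounding down:
>>>>>>
>>>>>>> as.integer(2.9)
>>>>>> [1] 2
>>>>>>> as.integer(-2.9)
>>>>>> [1] -2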
>>>>>>
>>>>>> This does it right...
>>>>>>
>>>>>>> as.integer( round( 1000*1.003 ) )
>>>>>> [1] 1003
>>>>>>
>>>>>> ...but this seems to always give the same answer and it is a little faster in my application:
>>>>>>
>>>>>>> as.integer( 1000*1.003 + .1 )
>>>>>> [1] 1003
>>>>>>
>>>>>>
>>>>>> FYI - I'm reading in a long vector of numbers from a text file with no more than three digits to the right of the decimal.  I'm converting them to integers and saving them in binary format.
>>>>>>
>>>>>
>>>>> So just add 0.0001 or even 0.0000001 to all of them and coerce to integer.
>>>>
>>>> I don't think the original problem was stated clearly, so I'm not sure
>>>> whether this is a solution, but it looks wrong to me.  If you want to
>>>> round to the nearest integer, why not use round() (without the
>>>> as.integer afterwards)?  Or if you really do want an integer, why add
>>>> 0.1 or 0.0001 rather than 0.5 before calling as.integer()?  That is the
>>>> classical way to implement round().
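>>>>
>>>> For nonnegative values the classical trick would be something like this
>>>> minimal sketch (as.integer() truncates, so adding 0.5 first rounds to
>>>> the nearest integer):
>>>>
>>>> x <- c(1.003, 1.118, 1.9)
>>>> as.integer(1000 * x + 0.5)   # 1003 1118 1900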
>>>>
>>>> To state the problem clearly, I'd like to know what result is expected
>>>> for any real number x.  Since R's numeric type only approximates the
>>>> real numbers we might not be able to get a perfect match, but at least
>>>> we could quantify how close we get.  Or is the input really character
>>>> data?  The original post mentioned reading numbers from a text file.
>>>
>>>
>>> Maybe you'd like to know what I'm really doing.  I have 1600 text files
>>> each with up to 16,000 lines with 3100 numbers per line, delimited by a
>>> single space.  The numbers are between 0 and 2, inclusive, and they have
>>> up to three digits to the right of the decimal.  Every possible value in
>>> that range will occur in the data.  Some example numbers: 0 1 2 0.325
>>> 1.12 1.9.  I want to multiply by 1000 and store them as 16-bit integers
>>> (uint16).
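>>>
>>> Once the integers exist (the ints vector created below), the binary write
>>> might be a sketch like this, with "out.bin" as a stand-in file name
>>> (values 0..2000 fit comfortably in two bytes):
>>>
>>> con <- file("out.bin", "wb")
>>> writeBin(ints, con, size = 2)   # write each value as a 16-bit integer
>>> close(con)
>>> # read back with readBin(f, integer(), n, size = 2, signed = FALSE)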
>>>
>>> I've been reading in the data like so:
>>>
>>>> data <- scan( file=FILE, what=double(), nmax=3100*16000)
>>>
>>> At first I tried making the integers like so:
>>>
>>>> ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm
>>>     user  system elapsed
>>>    0.187   0.387   0.574
>>>
>>> I decided I should compare with the result I got using round():
>>>
>>>> ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ; proc.time()-ptm
>>>     user  system elapsed
>>>    1.595   0.757   2.352
>>>
>>> It is a curious fact that only a few of the values from 0 to 2000 disagree
>>> between the two methods:
>>>
>>>> table( ints2[ ints2 != ints ] )
>>>
>>>   1001  1003  1005  1007  1009  1011  1013  1015  1017  1019  1021  1023
>>> 35651 27020 15993 11505  8967  7549  6885  6064  5512  4828  4533  4112
>>>
>>> I understand that it's all about the problem of representing decimal
>>> numbers in binary, but I still find some of the results a little
>>> surprising, like that list of numbers from the table() output.  For
>>> another example:
>>>
>>>> 1000+3 - 1000*(1+3/1000)
>>> [1] 1.136868e-13
>>>
>>>> 3 - 1000*(0+3/1000)
>>> [1] 0
>>>
>>>> 2000+3 - 1000*(2+3/1000)
>>> [1] 0
>>>
>>> See what I mean?  So there is something special about the numbers around
>>> 1000.
>>
>> I think it's really that there is something special about the numbers
>> near 1, and you're multiplying that by 1000.
>>
>> Numbers from 1 to just below 2 are stored as their fractional part, with
>> 52-bit precision.  Some intermediate calculations will store them with
>> 64-bit precision.  52 bits gives about 15 or 16 decimal digits.
>>
>> If your number x is close to 3/1000, it is stored as the fractional part
>> of 2^9 * x.  This gives it an extra 2 or 3 decimal digits of precision,
>> so that's why these values are accurate.
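>>
>> You can see the stored values directly by printing more digits than R
>> shows by default:
>>
>> sprintf("%.20f", 3/1000)       # essentially exact at this scale
>> sprintf("%.20f", 1 + 3/1000)   # stored slightly below 1.003
>> 1003 - 1000 * (1 + 3/1000)     # positive: the product falls just short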
>>
>> If your number x is close to 2.003, it is stored as the fractional part
>> of x/2, i.e. with errors like 1.0015 would have.  So I would have
>> guessed that 2.006 would have the same problems as 1.003, but I thought
>> you didn't see that.  So I tried it myself, and I do see that:
>>
>>> 1000+3 - 1000*(1+3/1000)
>> [1] 1.136868e-13
>>> 2000+6 - 1000*(2+6/1000)
>> [1] 2.273737e-13
>>
>> Reading more closely, I see that you didn't test this particular case,
>> so there's no contradiction here.
>>
>> The one thing I couldn't think of an explanation for is why other
>> numbers between 1 and 2 don't have the same sorts of problems.  So I
>> tried the following:
>>
>> # Set data to 1.000 thru 1.999
>> data <- 1 + 0:999/1000
>>
>> # Find the errors
>> errors <- 1000 + 0:999 - 1000*data
>>
>> # Plot them
>> plot(data, errors)
>>
>> The plot doesn't show a uniform distribution, but it is much more uniform
>> than yours, so I think your data doesn't really cover all possible values
>> from 0.000 to 1.999.  (I get a similar plot if I look at cases where 
>> ints != ints2 with my data.)
> 
> No, my data do cover all of the values in the range.  I checked that by 
> listing all of them in a file:
> 
> tr ' ' '\n' < FILE | uniq | sort -n | uniq >| uniq.txt
> 
> They are definitely all there -- all 2001 of them.  The thing is, your way 
> of making the numbers is a little different from reading them from a file. 
> You construct the number through an arithmetic operation and you get the 
> error:
> 
>> as.integer( 1000 * (1+118/1000) )
> [1] 1117
> 
> (Your first error-producing value was 1.118.)  But when the number is read 
> from a file, it starts as character and is translated to numeric.  So I 
> start with the arithmetic, convert to character, back to numeric, and then 
> I do not see the error:
> 
>> as.integer( 1000 * as.numeric( as.character( 1+118/1000 ) ) )
> [1] 1118
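> 
> The differing as.integer() results imply that the arithmetic double and
> the parsed double are not the same number, and comparing them directly
> confirms it:
> 
>> (1 + 118/1000) == as.numeric("1.118")
> [1] FALSE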
> 
> That reflects what was happening in my work.  To see what I'm seeing, you 
> have to do this:
> 
> data <- 1 + 0:999/1000
> errors <- 1000 + 0:999 - 1000 * as.numeric( as.character( data ) )
> plot(data, errors)

That's really interesting!  I guess the differences basically cancel out.

Duncan Murdoch

> 
> Or you could cover the full range from 0 to 2:
> 
> data <- 0:2000/1000
> errors <- 0:2000 - 1000 * as.numeric( as.character( data ) )
> plot(data, errors)
> 
> Mike
>


