[R] Improvement: function cut
Leonard Mada
leo.mada using syonic.eu
Sat Sep 18 00:26:08 CEST 2021
Hello Andrew,
But "cut" generates factors. In most cases with real data, one expects the
ends of the interval to be included as well: the argument "include.lowest"
is both ugly and too long.
[The test code in the ftable thread contains this error! I have run
into this error a couple of times.]
The only real situation I can imagine being problematic:
- if the interval extends to +Inf (or -Inf): I do not know whether
including +Inf (or -Inf) would have any side effects.
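A quick check of the +Inf/-Inf case, as a minimal sketch using base R's cut() with example values chosen only for illustration. Judging from these codes, include.lowest only changes the result for an x exactly equal to the infinite end break: -Inf when right = TRUE, +Inf when right = FALSE.

```r
# Example values chosen for illustration
x <- c(-Inf, -1, 0, 1, Inf)
b <- c(-Inf, 0, Inf)

# right = TRUE (default): intervals (-Inf,0] and (0,Inf].
# x = -Inf equals the lowest break and is left out (NA).
codes_right <- as.integer(cut(x, b))
# c(NA, 1, 1, 2, 2)

# include.lowest = TRUE closes the first interval: [-Inf,0].
codes_right_incl <- as.integer(cut(x, b, include.lowest = TRUE))
# c(1, 1, 1, 2, 2)

# right = FALSE: intervals [-Inf,0) and [0,Inf).
# x = Inf equals the highest break and is left out (NA).
codes_left <- as.integer(cut(x, b, right = FALSE))
# c(1, 1, 2, 2, NA)

# include.lowest = TRUE now closes the last interval: [0,Inf].
codes_left_incl <- as.integer(cut(x, b, right = FALSE, include.lowest = TRUE))
# c(1, 1, 2, 2, 2)
```

Finite values get the same codes in every variant; only the infinite end point moves from NA into the outermost bin.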
Leonard
On 9/18/2021 1:14 AM, Andrew Simmons wrote:
> While it is not explicitly mentioned anywhere in the documentation for
> .bincode, I suspect 'include.lowest = FALSE' is the default to keep
> the definitions of the bins consistent. For example:
>
>
> x <- 0:20
> breaks1 <- seq.int(0, 16, 4)
> breaks2 <- seq.int(0, 20, 4)
> cbind(
> .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
> .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
> )
>
>
> By having 'include.lowest = TRUE' with different end points, you can get
> inconsistent behaviour. While this probably wouldn't be an issue with
> 'real' data, it seems like something you'd want to avoid by
> default. The definitions of the bins are
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16]
>
>
> and
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16)
> [16, 20]
>
>
> so you can see where the inconsistent behaviour comes from: the bin
> starting at 12 is [12, 16] in the first case but [12, 16) in the second.
> You might be able to get R-core to add an argument 'warn', but probably
> not to change the default of 'include.lowest'. I hope this helps.
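The same comparison can be made visible through the factor labels that cut() produces, since they show which end of each interval is closed. This is a sketch reusing Andrew's x, breaks1 and breaks2; the single test value 16 is chosen for illustration.

```r
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

# Labels show which end of each interval is closed
lv1 <- levels(cut(x, breaks1, right = FALSE, include.lowest = TRUE))
# c("[0,4)", "[4,8)", "[8,12)", "[12,16]")
lv2 <- levels(cut(x, breaks2, right = FALSE, include.lowest = TRUE))
# c("[0,4)", "[4,8)", "[8,12)", "[12,16)", "[16,20]")

# The value 16 lands in the bin starting at 12 only with breaks1
bin16_1 <- as.character(cut(16, breaks1, right = FALSE, include.lowest = TRUE))
# "[12,16]"
bin16_2 <- as.character(cut(16, breaks2, right = FALSE, include.lowest = TRUE))
# "[16,20]"
```

With the default include.lowest = FALSE, the bin starting at 12 is [12,16) under both sets of breaks, which is the consistency Andrew describes.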
>
>
> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <leo.mada using syonic.eu> wrote:
>
> Thank you Andrew.
>
>
> Is there any reason not to make include.lowest = TRUE the default?
>
>
> Regarding the NA:
>
> The user still has to suspect that some values were not included
> and run that test.
>
>
> Leonard
>
>
> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>> Regarding your first point, the argument 'include.lowest' already
>> handles this specific case; see ?.bincode.
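A small sketch of how include.lowest covers the "extremes" proposal in both directions, using base R's .bincode() with cut points and values chosen for illustration: with right = TRUE it closes the first interval on the left, and with right = FALSE it closes the last interval on the right.

```r
# Cut points and test values chosen for illustration
b <- c(0, 5, 10)
x <- c(0, 5, 10)

# right = TRUE: (0,5], (5,10]; x = 0 is dropped unless include.lowest = TRUE
r_open   <- .bincode(x, b, right = TRUE,  include.lowest = FALSE)  # NA 1 2
r_closed <- .bincode(x, b, right = TRUE,  include.lowest = TRUE)   #  1 1 2

# right = FALSE: [0,5), [5,10); x = 10 is dropped unless include.lowest = TRUE
l_open   <- .bincode(x, b, right = FALSE, include.lowest = FALSE)  #  1 2 NA
l_closed <- .bincode(x, b, right = FALSE, include.lowest = TRUE)   #  1 2 2
```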
>>
>> As for your second point, it could be helpful, but since both
>> 'cut.default' and '.bincode' return NA if a value isn't within a
>> bin, you could write something like this yourself.
>> It might be worth pitching to R-bugs as a wishlist item.
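One way to write such a check yourself, as a minimal sketch: cut_warn() below is a hypothetical helper, not part of base R. It relies only on the documented fact that cut() returns NA for values outside every interval.

```r
# Hypothetical helper, not part of base R: wraps cut() and warns when
# non-missing values of x fall outside every interval (cut() returns NA
# for them).
cut_warn <- function(x, breaks, ...) {
  res <- cut(x, breaks, ...)
  # Count only values that were not already NA in x
  n_out <- sum(is.na(res) & !is.na(x))
  if (n_out > 0L) {
    warning(sprintf("%d value(s) not included in any interval", n_out),
            call. = FALSE)
  }
  res
}

# 16:20 lie outside [0,4), ..., [12,16): warns about 5 values
f1 <- cut_warn(0:20, seq.int(0, 16, 4), right = FALSE)
# include.lowest = TRUE closes the last bin to [12,16]: warns about 4 values
f2 <- cut_warn(0:20, seq.int(0, 16, 4), right = FALSE, include.lowest = TRUE)
```

Missing values already present in x are excluded from the count, so the warning fires only for values that cut() itself dropped.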
>>
>>
>>
>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
>> <r-help using r-project.org> wrote:
>>
>> Hello List members,
>>
>>
>> The following improvements would be useful for function cut
>> (and .bincode):
>>
>>
>> 1.) Argument: include extremes
>> extremes = TRUE
>> if (right == FALSE) {
>>     # also close the last interval on the right;
>> } else {
>>     # also close the first interval on the left;
>> }
>>
>>
>> 2.) Argument: warn = TRUE
>>
>> Warn if any values are not included in the intervals.
>>
>>
>> Motivation:
>> - reduce the risk of errors when using function cut();
>>
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>