[R] Condition to factor (easy to remember)

Gabor Grothendieck ggrothendieck at gmail.com
Wed Sep 30 16:44:36 CEST 2009

1. A common way of doing this is cut:

  > cut(data, c(-Inf, 10, Inf), labels = levs, right = TRUE)
  [1] Pre  Pre  Pre  Post Post
  Levels: Pre Post

We don't actually need right = TRUE, since it is the default, but if you
omit it, it can be hard to remember whether the right end of each
interval is included or excluded, so I would recommend including it as
a matter of course.  A slightly less safe alternative, if you know the
values are integers, is to use 10.5 as the breakpoint.  Then no value
falls on a boundary, so the setting of right= does not matter and the
argument can be dropped.
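
To see why right= matters at a breakpoint of 10 but not at 10.5, here is
a small sketch using the data and levs from the question below.  Like
the paragraph above, it assumes the values are integers, so none of
them can sit exactly on 10.5.

```r
# data and levs as in the original question; values assumed to be integers
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# Breakpoint 10: the value 10 lies on the boundary, so right= decides its group.
r10 <- cut(data, c(-Inf, 10, Inf), labels = levs, right = TRUE)   # (-Inf,10], (10,Inf]
l10 <- cut(data, c(-Inf, 10, Inf), labels = levs, right = FALSE)  # [-Inf,10), [10,Inf)
print(r10)  # 10 is classified as "Pre"
print(l10)  # 10 is classified as "Post"

# Breakpoint 10.5: no integer lies on the boundary, so right= makes no difference.
r105 <- cut(data, c(-Inf, 10.5, Inf), labels = levs, right = TRUE)
l105 <- cut(data, c(-Inf, 10.5, Inf), labels = levs, right = FALSE)
identical(r105, l105)  # TRUE
```

With the 10 breakpoint only the value 10 moves between Pre and Post
when right= is flipped; with 10.5 both settings give the same factor.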

2. findInterval is similar to cut, so the subscripting in your first
solution could also be done with findInterval:

   > # findInterval has no right= argument; left.open = TRUE gives (a, b] intervals
   > levs[ findInterval(data, c(-Inf, 10), left.open = TRUE) ]
   [1] "Pre"  "Pre"  "Pre"  "Post" "Post"

The same comment regarding 10.5 applies.  I've omitted the
factor(..., levels = levs) wrapper here to focus on the difference,
and I've done the same in the remaining examples.
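
Here is a sketch of the 10.5 version with findInterval, again using data
and levs from the question and assuming integer values, with the
factor(..., levels = levs) wrapper put back so the result is a factor
with Pre before Post:

```r
# data and levs as in the original question; values assumed to be integers
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# findInterval returns, for each x, the index i with vec[i] <= x < vec[i+1]
# (or length(vec) if x >= the last breakpoint). With vec = c(-Inf, 10.5)
# that is 1 for x < 10.5 and 2 otherwise, so no right=/left.open= is needed.
idx <- findInterval(data, c(-Inf, 10.5))
print(idx)  # 1 1 1 2 2

# Use the indices to pick labels, then fix the level order explicitly.
f <- factor(levs[idx], levels = levs)
print(f)
```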

3. Either of these could replace the ifelse.  Both work by vectorizing
an ordinary scalar if, but sapply is the more common way to do that, so
it is probably preferable for clarity.

   > # 3a
   > sapply(data, function(x) if (x <= 10) levs[1] else levs[2])
   [1] "Pre"  "Pre"  "Pre"  "Post" "Post"

   > # 3b
   > Vectorize(function(x) if (x <= 10) levs[1] else levs[2])(data)
   [1] "Pre"  "Pre"  "Pre"  "Post" "Post"
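
A close relative of 3a, not mentioned above, is vapply.  It is like
sapply but takes a template for each result (here character(1)), so it
stops with an error rather than quietly returning a list if the function
ever returns anything other than a single string.  A sketch using data
and levs from the question; pre_post is just an illustrative name:

```r
# data and levs as in the original question
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# pre_post is an illustrative helper name, not from the original thread.
# It applies the rule to one value at a time.
pre_post <- function(x) if (x <= 10) levs[1] else levs[2]

# vapply checks that every call returns exactly one character string.
labs <- vapply(data, pre_post, character(1))

# levels = levs keeps "Pre" before "Post" instead of alphabetical order.
f <- factor(labs, levels = levs)
print(f)
```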

4. The subscripting in your first solution could be done like this,
which is a bit longer but arguably easier to understand:

   > levs[ 1 * (data <= 10) + 2 * (data > 10) ]
   [1] "Pre"  "Pre"  "Pre"  "Post" "Post"
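
Because logicals are coerced to 0/1 in arithmetic, the expression in 4
yields an index of exactly 1 or 2 for each element.  The sketch below
(data and levs from the question) prints that index vector and, as an
added consistency check rather than part of the thread, confirms that
wrapping it in factor() gives the same result as the original
as.integer(data > 10) + 1 approach and as cut.

```r
# data and levs as in the original question
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# TRUE/FALSE become 1/0, so each element gets 1 * 1 + 2 * 0 or 1 * 0 + 2 * 1.
idx <- 1 * (data <= 10) + 2 * (data > 10)
print(idx)  # 1 1 1 2 2

f_arith <- factor(levs[idx], levels = levs)
f_orig  <- factor(levs[as.integer(data > 10) + 1], levels = levs)
f_cut   <- cut(data, c(-Inf, 10, Inf), labels = levs, right = TRUE)

identical(f_arith, f_orig) && identical(f_arith, f_cut)  # TRUE
```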

On Wed, Sep 30, 2009 at 3:43 AM, Dieter Menne
<dieter.menne at menne-biomed.de> wrote:
> Dear List,
> creating factors in a given non-default order is notoriously difficult to
> explain in a course. Students love the ifelse construct given below most,
> but I remember some comment from Martin Mächler (?) that ifelse should be
> banned from courses.
> Any better ideas? Not necessarily short, easy to remember is important.
> Dieter
> data = c(1,7,10,50,70)
> levs = c("Pre","Post")
> # Typical C-Programmer style
> factor(levs[as.integer(data >10)+1], levels=levs)
> # Easiest to understand
> factor(ifelse(data <=10, levs[1], levs[2]), levels=levs)
