[R] the difference between "-" and "!" between base and data.table package
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Sun Apr 16 09:51:28 CEST 2017
! is a logical operator: it means "not". When you write
lidx <- seq_along( mtcars[[ 1 ]] ) %in% train_indices
you end up with a vector of logical values, one per row, for which ! makes sense. Since R supports logical indexing, this can be a very convenient way to select one group or the other.
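A minimal sketch of the mask approach, assuming the same 75/25 split as in the original question (the set.seed() call is an addition so the result is reproducible):

```r
set.seed(42)  # assumption: seed fixed only so the example is reproducible
train_indices <- sample(nrow(mtcars), round(0.75 * nrow(mtcars)))

# Logical mask: TRUE for each row position that appears in train_indices
lidx <- seq_along(mtcars[[1]]) %in% train_indices

train <- mtcars[lidx, ]   # rows where the mask is TRUE
test  <- mtcars[!lidx, ]  # ! flips the mask, so this selects the other rows

nrow(train)  # 24
nrow(test)   # 8
```

Because both subsets come from the same mask, every row lands in exactly one of the two groups.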
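If you give an integer vector to the ! operator, each non-zero value is treated as TRUE (and zero as FALSE) before being negated. That can be useful sometimes, but not in this case: all of the train_indices are greater than zero, so !train_indices is a vector of all FALSE values, and indexing the rows with it selects no rows at all. Look at what !train_indices actually is.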
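A short sketch of what happens with the original code, again assuming a fixed seed purely for reproducibility:

```r
set.seed(42)  # assumption: seed fixed only so the example is reproducible
train_indices <- sample(nrow(mtcars), round(0.75 * nrow(mtcars)))

# ! coerces the integers to logical (non-zero -> TRUE) and then negates;
# every index here is >= 1, so the result is all FALSE
neg <- !train_indices
class(neg)  # "logical"
any(neg)    # FALSE

# An all-FALSE logical row index (recycled over the 32 rows) keeps nothing
test <- mtcars[neg, ]
nrow(test)  # 0
```

The columns are all still there; only the rows have been filtered away, which is why the result looks like "zero observations".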
As the Introduction to R document says, integer indexing always starts at 1 instead of zero as in many other languages. Since no valid position is negative, R can let negative integers used as indexes represent the idea of excluding those positions. Thus
identical( mtcars[ !lidx, ], mtcars[ -train_indices, ] )
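A self-contained sketch checking that equivalence, assuming the same kind of random split (the seed is an addition for reproducibility):

```r
set.seed(42)  # assumption: any seed works; fixed only for reproducibility
train_indices <- sample(nrow(mtcars), round(0.75 * nrow(mtcars)))
lidx <- seq_along(mtcars[[1]]) %in% train_indices

# Negative integer indexes drop the listed row positions
test_neg <- mtcars[-train_indices, ]
# Negating the logical mask keeps exactly the same rows, in the same order
test_logic <- mtcars[!lidx, ]

identical(test_neg, test_logic)  # TRUE
nrow(test_neg)                   # 8
```

Either form works for a train/test split; the logical mask has the advantage that the same object gives you both groups via lidx and !lidx.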
The Introduction to R document is really quite informative to re-read occasionally. For example, look up indexing with a matrix as the index.
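As a quick illustration of matrix indexing, here is a sketch on a hypothetical 3 x 4 matrix (not from the thread): each row of a two-column index matrix is read as a (row, column) pair.

```r
m <- matrix(1:12, nrow = 3)  # hypothetical example; filled column by column

# Each row of idx is one (row, column) pair to extract
idx <- cbind(c(1, 2, 3), c(4, 1, 2))

m[idx]  # m[1,4], m[2,1], m[3,2], i.e. 10 2 6
```

This picks out scattered individual cells in one call, which ordinary m[i, j] indexing cannot do because it always returns the full cross-product of rows i and columns j.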
--
Sent from my phone. Please excuse my brevity.
On April 15, 2017 5:18:43 PM PDT, Carl Sutton via R-help <r-help at r-project.org> wrote:
>Hi
>
>
>I normally use package data.table but today was doing some base R
>coding. Had a problem for a bit which I finally resolved. I was
>attempting to separate a data frame between train and test sets, and in
>base R was using the "!" to exclude training set indices from the data
>frame. All I was getting was zero observations. Changed to using "-"
>and it worked. I recalled that in data.table the "!" function worked,
>so created this little bit of code.
>
># Base R Functions
>str(mtcars)
>train_indices <- sample(nrow(mtcars), round(0.75*nrow(mtcars)))
>train <- mtcars[train_indices,]
>mode(train_indices); class(train_indices)
>test <- mtcars[!train_indices,] # the "!" function returning 0 observations
>test_1 <- mtcars[-train_indices,]
>identical(test, test_1)
>
># Using data.table package
>library(data.table)
>dt1 <- data.table(mtcars)
>train_indices <- sample(nrow(dt1), round(0.75*nrow(dt1)))
>train <- dt1[train_indices,]
>mode(train_indices); class(train_indices)
>test <- dt1[!train_indices,] # the "!" function
>test_1 <- dt1[-train_indices,]
>identical(test, test_1)
>The documentation appears to me to accept "!" in base, so do I have
>some kind of ridiculous error or ..??
>Carl Sutton
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.