[R] Removing variables from data frame with a wild card
Valentin Petzel
v@|ent|n @end|ng |rom petze|@@t
Sat Jan 14 19:21:07 CET 2023
Hello Avi,
While something like d$something <- ... may look as if it modifies the data directly, it does not actually do so. Most R objects behave as immutable values: once created, an object does not change. This guarantees that if you hold a binding to an object, that object won't change behind your back.
One data structure is in fact mutable: the environment. For example, compare
L <- list()
local({L$a <- 3})  # assigns to a modified copy of L that is local to this block
L$a                # NULL: the outer L is unchanged
with
E <- new.env()
local({E$a <- 3})  # modifies the one environment that E refers to
E$a                # 3
The latter does in fact work, because the same environment is modified, whereas in the first case a modified copy of the list is made.
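
The same value semantics apply to function arguments. As a minimal sketch (the data frame d and the helper drop_x are made up for illustration), a function that removes a column from its argument leaves the caller's data frame untouched unless the result is assigned back:

```r
# Hypothetical toy data; not from the original thread.
d <- data.frame(x = 1:3, y = 4:6)

# Hypothetical helper: removes column x from its (local) argument.
drop_x <- function(df) {
  df$x <- NULL   # rebinds the local name df to a modified copy
  df
}

res <- drop_x(d)
names(d)         # "x" "y": the caller's d is unchanged
names(res)       # "y"

d <- drop_x(d)   # only rebinding d makes the change visible
names(d)         # "y"
```
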
Under the hood we have a parser trick: If R sees something like
f(a) <- ...
it will look for a function named `f<-` and call
a <- `f<-`(a, value = ...)
(this also happens for example when you do names(x) <- ...)
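
To make the rewriting concrete, here is a small sketch with a made-up replacement function `second<-` (not part of base R) that sets the second element of a vector. Both forms below should give the same result:

```r
# Hypothetical replacement function: sets the second element of a vector.
# The last argument of a replacement function must be named `value`.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

x <- c(1, 2, 3)
second(x) <- 10                  # replacement-call syntax
x                                # 1 10 3

y <- c(1, 2, 3)
y <- `second<-`(y, value = 10)   # the explicit form of the same call
identical(x, y)                  # TRUE
```
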
So in our case this is equivalent to creating a copy with the columns removed and rebinding the symbol in the current environment to the result.
The data.table package breaks with this convention and uses C-based routines that change data without copying the object. Doing
d[, (cols_to_remove) := NULL]
will actually change the data.
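
A rough sketch of what that means in practice, assuming the data.table package is installed (the data and the name d_alias are invented for the example): a second binding to the same table sees the removal as well, because no copy is made.

```r
# Assumes the data.table package is installed; the data are made up.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  d <- data.table(year = 2001:2003, yr3 = 1:3, yr4 = 4:6)
  d_alias <- d                     # a second binding to the same object, no copy

  cols_to_remove <- grep("^yr", names(d), value = TRUE)
  d[, (cols_to_remove) := NULL]    # removes the columns by reference

  print(names(d))                  # "year"
  print(names(d_alias))            # "year": the alias sees the change too
}
```

If an independent copy is wanted, data.table provides copy() for that purpose.
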
Regards,
Valentin
14.01.2023 18:28:33 avi.e.gross using gmail.com:
> Steven,
>
> Just want to add a few things to what people wrote.
>
> In base R, the methods mentioned will let you make a copy of your original DF that is missing the items you are selecting that match your pattern.
>
> That is fine.
>
> For some purposes, you want to keep the original data.frame and remove a column within it. You can do that in several ways but the simplest is something where you set the column to NULL as in:
>
> mydata$NAME <- NULL
>
> Using the mydata["NAME"] notation, you can do the same for every column matched by your grep, either with a loop or with a functional programming method.
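
A minimal sketch of that idea, using a toy data frame with invented values (only the yr* naming follows the question). The loop sets each matching column to NULL; the list(NULL) assignment is a one-step alternative:

```r
# Toy data frame with hypothetical values.
mydata <- data.frame(year = 2001:2002, weight = c(1.5, 2.5),
                     yr3 = 0:1, yr4 = 1:0)
mydata2 <- mydata   # second copy for the one-step variant

# Names of the columns matching the pattern.
to_drop <- grep("^yr", names(mydata), value = TRUE)

# Variant 1: set each matching column to NULL in turn.
for (nm in to_drop) {
  mydata[[nm]] <- NULL
}
names(mydata)       # "year" "weight"

# Variant 2: remove all matching columns in one assignment.
mydata2[to_drop] <- list(NULL)
names(mydata2)      # "year" "weight"
```
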
>
> R does have optimizations that make this less useful, since a partial copy of a data.frame shares the common parts with the original until something changes.
>
> For those who like to use the tidyverse, it comes with lots of tools that let you select columns that start with or end with or contain some pattern and I find that way easier.
>
>
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Steven Yen
> Sent: Saturday, January 14, 2023 7:49 AM
> To: Andrew Simmons <akwsimmo using gmail.com>
> Cc: R-help Mailing List <r-help using r-project.org>
> Subject: Re: [R] Removing variables from data frame with a wile card
>
> Thanks to all. Very helpful.
>
> Steven from iPhone
>
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons <akwsimmo using gmail.com> wrote:
>>
>> You'll want to use grep() or grepl(). By default, grep() uses
>> extended regular expressions to find matches, but you can also use
>> perl regular expressions and globbing (after converting to a regular expression).
>> For example:
>>
>> grepl("^yr", colnames(mydata))
>>
>> will tell you which 'colnames' start with "yr". If you'd rather
>> use globbing:
>>
>> grepl(glob2rx("yr*"), colnames(mydata))
>>
>> Then you might write something like this to remove the columns starting with yr:
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
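
The steps above can be tried on a small made-up data frame (the values are invented; the column names mimic those in the question). Note that "year" is kept, since it does not start with "yr":

```r
# Toy data frame; the column names mimic those in the question.
mydata <- data.frame(year = 2001:2002, weight = c(1.5, 2.5),
                     yr3 = 0:1, yr28 = 1:0)

is_yr <- grepl("^yr", colnames(mydata))
is_yr                  # FALSE FALSE TRUE TRUE ("year" does not match)

glob2rx("yr*")         # "^yr", the same pattern written as a glob

# Keep only the columns that do not match; drop = FALSE keeps a data frame
# even if a single column remains.
mydata <- mydata[, !is_yr, drop = FALSE]
colnames(mydata)       # "year" "weight"
```
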
>>
>>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen <styen using ntu.edu.tw> wrote:
>>>
>>> I have a data frame containing variables "yr3",...,"yr28".
>>>
>>> How do I remove them with a wild card----something similar to "del yr*"
>>> in Windows/DOS? Thank you.
>>>
>>> > colnames(mydata)
>>> [1] "year" "weight" "confeduc" "confothr" "college"
>>> [6] ...
>>> [41] "yr3" "yr4" "yr5" "yr6" "yr7"
>>> [46] "yr8" "yr9" "yr10" "yr11" "yr12"
>>> [51] "yr13" "yr14" "yr15" "yr16" "yr17"
>>> [56] "yr18" "yr19" "yr20" "yr21" "yr22"
>>> [61] "yr23" "yr24" "yr25" "yr26" "yr27"
>>> [66] "yr28"...
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.