[R] How to remove all rows that have a numeric in the first (or any) column
Avi Gross
avigross using verizon.net
Wed Sep 15 07:22:56 CEST 2021
You are correct, Andrew; I am aware of that trick of wrapping an object in I() so that it is kept as is rather than converted in the usual ways.
And you can indeed use base R to work with the contents of the list column beta, as defined in the messages quoted below. Here is an incremental demo:
> sapply(mydf$beta, is.numeric)
[1] FALSE TRUE TRUE FALSE
> !sapply(mydf$beta, is.numeric)
[1] TRUE FALSE FALSE TRUE
> keeping <- !sapply(mydf$beta, is.numeric)
> mydf[keeping, ]
# A tibble: 2 x 2
alpha beta
<int> <list>
1 1 <chr [1]>
2 4 <chr [1]>
> str(mydf[keeping, ])
tibble [2 x 2] (S3: tbl_df/tbl/data.frame)
$ alpha: int [1:2] 1 4
$ beta :List of 2
..$ : chr "Hello"
..$ : chr "bye"
Now for the bad news. The original request was for ANY column. One way to handle that, neither efficient nor the best, would be to loop over the names of all the columns: starting with the original data.frame, whittle it down column by column, searching the next column each time, until nothing numeric is left anywhere.
Now if I were using dplyr, I wonder whether there is a nice way to use rowwise() to evaluate across a row.
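A minimal base-R sketch of that column-by-column approach follows. The helper name drop_numeric_rows() is hypothetical (not from the thread), and it assumes every column is either a list column or an ordinary atomic vector.

```r
# Hypothetical helper: subset the data frame one column at a time,
# removing rows whose element in that column is numeric
drop_numeric_rows <- function(df) {
  for (col in names(df)) {
    # vapply() calls is.numeric() once per element of the current column,
    # which works for list columns as well as atomic ones
    is_num <- vapply(df[[col]], is.numeric, logical(1))
    df <- df[!is_num, , drop = FALSE]
  }
  df
}

# Example data built with I() so that both columns stay lists
mydf <- data.frame(alpha = I(list("first", 2, 3.3, "Last")),
                   beta  = I(list(1, "second", 3.3, "Lasting")))
result <- drop_numeric_rows(mydf)
result$alpha[[1]]  # "Last"
result$beta[[1]]   # "Lasting"
```

Note that an atomic numeric column (such as alpha = 1:4) would remove every row, which is consistent with the "any column" rule but may not be what you want.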
Using your technique I made the following data.frame:
mydf <- data.frame(alpha=I(list("first", 2, 3.3, "Last")),
                   beta=I(list(1, "second", 3.3, "Lasting")))
> mydf
alpha beta
1 first 1
2 2 second
3 3.3 3.3
4 Last Lasting
Do we agree that only the fourth row should be kept, since each of the others has one or two numeric values?
Here is some code I cobbled together that seems to work:
rowwise(mydf) %>%
  mutate(alphazoid=!is.numeric(unlist(alpha)),
         betazoid=!is.numeric(unlist(beta))) %>%
  filter(alphazoid & betazoid) -> result
str(result)
print(result)
result[[1,1]]
result[[1,2]]
as.data.frame(result)
The results below show that only the fourth row was kept:
> rowwise(mydf) %>%
+ mutate(alphazoid=!is.numeric(unlist(alpha)),
+ betazoid=!is.numeric(unlist(beta))) %>%
+ filter(alphazoid & betazoid) -> result
>
> str(result)
rowwise_df [1 x 4] (S3: rowwise_df/tbl_df/tbl/data.frame)
$ alpha :List of 1
..$ : chr "Last"
..- attr(*, "class")= chr "AsIs"
$ beta :List of 1
..$ : chr "Lasting"
..- attr(*, "class")= chr "AsIs"
$ alphazoid: logi TRUE
$ betazoid : logi TRUE
- attr(*, "groups")= tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
..$ .rows: list<int> [1:1]
.. ..$ : int 1
.. ..@ ptype: int(0)
> print(result)
# A tibble: 1 x 4
# Rowwise:
alpha beta alphazoid betazoid
<I<list>> <I<list>> <lgl> <lgl>
1 <chr [1]> <chr [1]> TRUE TRUE
> result[[1,1]]
[[1]]
[1] "Last"
> result[[1,2]]
[[1]]
[1] "Lasting"
> as.data.frame(result)
alpha beta alphazoid betazoid
1 Last Lasting TRUE TRUE
Of course, the temporary columns for alphazoid and betazoid can trivially be removed.
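As a sketch of that cleanup, assuming dplyr is installed, the rowwise pipeline above can end with ungroup() to drop the rowwise grouping and select() to remove the two helper columns:

```r
library(dplyr)

mydf <- data.frame(alpha = I(list("first", 2, 3.3, "Last")),
                   beta  = I(list(1, "second", 3.3, "Lasting")))

# Same rowwise filter as above, then return an ordinary tibble
# without the temporary logical columns
result <- mydf %>%
  rowwise() %>%
  mutate(alphazoid = !is.numeric(unlist(alpha)),
         betazoid  = !is.numeric(unlist(beta))) %>%
  filter(alphazoid & betazoid) %>%
  ungroup() %>%
  select(-alphazoid, -betazoid)
```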
From: Andrew Simmons <akwsimmo using gmail.com>
Sent: Wednesday, September 15, 2021 12:44 AM
To: Avi Gross <avigross using verizon.net>
Cc: Gregg Powell via R-help <r-help using r-project.org>
Subject: Re: [R] How to remove all rows that have a numeric in the first (or any) column
I'd like to point out that base R can handle a list as a data frame column; you just have to give the list class "AsIs". So in your example
temp <- list("Hello", 1, 1.1, "bye")
data.frame(alpha = 1:4, beta = I(temp))
means that column "beta" will still be a list.
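The difference is easy to check in base R. A minimal sketch using the same temp list:

```r
temp <- list("Hello", 1, 1.1, "bye")

# Without I(), data.frame() spreads the list out into separate columns
plain <- data.frame(alpha = 1:4, beta = temp)
ncol(plain)         # 5: alpha plus one column per list element

# With I(), the list is kept intact as a single "AsIs" list column
kept <- data.frame(alpha = 1:4, beta = I(temp))
ncol(kept)          # 2
class(kept$beta)    # "AsIs"
is.list(kept$beta)  # TRUE
```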
On Wed, Sep 15, 2021, 00:40 Avi Gross via R-help <r-help using r-project.org> wrote:
Calling something a data.frame does not make it a data.frame.
The abbreviated object shown below is a list of singletons. If it is a column in a larger object that is a data.frame, then it is a list column, which is valid but can be ticklish to handle in base R, less so in the tidyverse.
For example, if I make a data.frame the usual way, the list is split into multiple columns and each value is recycled down every row, which is not what was expected. Some tidyverse functionality does better.
Like this:
library(tidyverse)
temp=list("Hello", 1, 1.1, "bye")
Now making a data.frame has an odd result:
> mydf=data.frame(alpha=1:4, beta=temp)
> mydf
alpha beta..Hello. beta.1 beta.1.1 beta..bye.
1 1 Hello 1 1.1 bye
2 2 Hello 1 1.1 bye
3 3 Hello 1 1.1 bye
4 4 Hello 1 1.1 bye
But a tibble handles it:
> mydf=tibble(alpha=1:4, beta=temp)
> mydf
# A tibble: 4 x 2
alpha beta
<int> <list>
1 1 <chr [1]>
2 2 <dbl [1]>
3 3 <dbl [1]>
4 4 <chr [1]>
So the data may well look like this, with a list column, but access can be tricky: subsetting a list with [] returns a list, and you need [[]] to get at the element itself.
I found a somewhat odd solution like this:
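To make the [] versus [[]] distinction concrete, here is a small base-R illustration on the same temp list; it is also why calling is.numeric() on a whole list column never looks at the individual values.

```r
temp <- list("Hello", 1, 1.1, "bye")

# Single brackets return a sub-list, so is.numeric() tests the list
class(temp[2])         # "list"
is.numeric(temp[2])    # FALSE

# Double brackets extract the element, so is.numeric() sees the number
class(temp[[2]])       # "numeric"
is.numeric(temp[[2]])  # TRUE
```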
mydf %>%
filter(!map_lgl(beta, is.numeric)) -> mydf2
# A tibble: 2 x 2
alpha beta
<int> <list>
1 1 <chr [1]>
2 4 <chr [1]>
When I saved that result into mydf2, I got this.
Original:
> str(mydf)
tibble [4 x 2] (S3: tbl_df/tbl/data.frame)
$ alpha: int [1:4] 1 2 3 4
$ beta :List of 4
..$ : chr "Hello"
..$ : num 1
..$ : num 1.1
..$ : chr "bye"
Output when any row with a numeric is removed:
> str(mydf2)
tibble [2 x 2] (S3: tbl_df/tbl/data.frame)
$ alpha: int [1:2] 1 4
$ beta :List of 2
..$ : chr "Hello"
..$ : chr "bye"
So if you try variations on your code along the lines shown here, good luck. I am sure there are many better ways, but I repeat: it can be tricky.
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Jeff Newmiller
Sent: Tuesday, September 14, 2021 11:54 PM
To: Gregg Powell <g.a.powell using protonmail.com>
Cc: Gregg Powell via R-help <r-help using r-project.org>
Subject: Re: [R] How to remove all rows that have a numeric in the first (or any) column
You cannot apply vectorized operators to list columns... you have to use a map function like sapply or purrr::map_lgl to obtain a logical vector by running the function once for each list element:
sapply( VPN_Sheet1$HVA, is.numeric )
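To see why the original attempt neither worked nor raised an error, here is a sketch on a small hypothetical stand-in for VPN_Sheet1. The name VPN_demo and its four values are invented for illustration, modelled loosely on the str() output quoted below.

```r
# Hypothetical stand-in: a list column mixing names and numbers
VPN_demo <- data.frame(id  = 1:4,
                       HVA = I(list("Eloisa Libas", 1, "Percival Esquejo", 2)))

# is.numeric() on the whole column asks about the list itself: one FALSE
is.numeric(VPN_demo$HVA)                       # FALSE
# so !is.numeric(...) is a single TRUE, recycled over all rows:
# no error, but nothing is removed
nrow(VPN_demo[!is.numeric(VPN_demo$HVA), ])    # 4

# sapply() runs is.numeric() once per element instead
keep <- !sapply(VPN_demo$HVA, is.numeric)
keep                                           # TRUE FALSE TRUE FALSE
cleaned <- VPN_demo[keep, ]
nrow(cleaned)                                  # 2
```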
On September 14, 2021 8:38:35 PM PDT, Gregg Powell <g.a.powell using protonmail.com> wrote:
>Here is the output:
>
>> str(VPN_Sheet1$HVA)
>List of 2174
> $ : chr "Email: fffd using fffffffffff.com"
> $ : num 1
> $ : chr "Eloisa Libas"
> $ : chr "Percival Esquejo"
> $ : chr "Louchelle Singh"
> $ : num 2
> $ : chr "Charisse Anne Tabarno, RN"
> $ : chr "Sol Amor Mucoy"
> $ : chr "Josan Moira Paler"
> $ : num 3
> $ : chr "Anna Katrina V. Alberto"
> $ : chr "Nenita Velarde"
> $ : chr "Eunice Arrances"
> $ : num 4
> $ : chr "Catherine Henson"
> $ : chr "Maria Carla Daya"
> $ : chr "Renee Ireine Alit"
> $ : num 5
> $ : chr "Marol Joseph Domingo - PS"
> $ : chr "Kissy Andrea Arriesgado"
> $ : chr "Pia B Baluyut, RN"
> $ : num 6
> $ : chr "Gladys Joy Tan"
> $ : chr "Frances Zarzua"
> $ : chr "Fairy Jane Nery"
> $ : num 7
> $ : chr "Gladys Tijam, RMT"
> $ : chr "Sarah Jane Aramburo"
> $ : chr "Eve Mendoza"
> $ : num 8
> $ : chr "Gloria Padolino"
> $ : chr "Joyce Pearl Javier"
> $ : chr "Ayza Padilla"
> $ : num 9
> $ : chr "Walfredson Calderon"
> $ : chr "Stephanie Anne Militante"
> $ : chr "Rennua Oquilan"
> $ : num 10
> $ : chr "Neil John Nery"
> $ : chr "Maria Reyna Reyes"
> $ : chr "Rowella Villegas"
> $ : num 11
> $ : chr "Katelyn Mendiola"
> $ : chr "Maria Riza Mariano"
> $ : chr "Marie Vallianne Carantes"
> $ : num 12
>
>------- Original Message -------
>
>On Tuesday, September 14th, 2021 at 8:32 PM, Jeff Newmiller <jdnewmil using dcn.davis.ca.us> wrote:
>
>> An atomic column of data by design has exactly one mode, so if any
>> values are non-numeric then the entire column will be non-numeric.
>> What does
>>
>> str(VPN_Sheet1$HVA)
>>
>> tell you? It is likely either a factor or character data.
>>
>> On September 14, 2021 7:01:53 PM PDT, Gregg Powell via R-help <r-help using r-project.org> wrote:
>>
>> > Stuck on this problem: how does one remove all rows in a data frame that have a numeric in the first (or any) column?
>> >
>> > Seems straightforward, but I'm having trouble.
>> >
>> > I've attempted to use:
>> >
>> > VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA),]
>> >
>> > and
>> >
>> > VPN_Sheet1 <- VPN_Sheet1[!is.integer(VPN_Sheet1$HVA),]
>> >
>> > Neither works, and neither throws an error.
>> >
>> > class(VPN_Sheet1$HVA) returns:
>> >
>> > [1] "list"
>> >
>> > So the HVA column is a list.
>> >
>> > The data look like the attached screen grab.
>> >
>> > The ONLY rows I need to delete are the rows where there is a numeric in the HVA column.
>> >
>> > There are some 5000+ rows in the actual data.
>> >
>> > I would be grateful for a solution to this problem.
>> >
>> > How can I get R to detect whether the value in column 1 is a number, so that the rows with numeric values can be deleted?
>> >
>> > Thanks in advance to any and all willing to help on this problem.
>> >
>> > Gregg Powell
>> >
>> > Sierra Vista, AZ
>>
>> --
>>
>> Sent from my phone. Please excuse my brevity.
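Jeff's point that an atomic column has only one mode can be seen directly in base R. The vector below is a hypothetical example. The last step, testing whether each string parses as a number, is an assumption for the case where the column arrives as character data; it is not something proposed in the thread.

```r
# Mixing text and numbers in an atomic vector coerces everything
# to a single mode, here character
x <- c("Eloisa Libas", 1, "Percival Esquejo", 2)
class(x)               # "character"
x[2]                   # "1", a string rather than a number
sapply(x, is.numeric)  # all FALSE: no element is numeric any more

# Assumption (not from the thread): if a column did arrive as character,
# one option is to test whether each string parses as a number
looks_numeric <- !is.na(suppressWarnings(as.numeric(x)))
x[!looks_numeric]      # "Eloisa Libas" "Percival Esquejo"
```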
______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.