[R] Row exclude

Sun Jan 30 18:00:24 CET 2022

Thank you David.

What if I want to list the excluded rows instead?
I tried this:
    (dat3 <- dat1[unique(c(BadName, BadAge, BadWeight)), ])

It did not work. The desired output is:
  Alex,  20,  13X
  John,  3BC, 175
  Jack3, 34,  140
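
One possible reason the attempt above looks wrong is row order rather than row selection. This is an assumption, since the thread does not show the actual output. With the sample data, c(BadName, BadAge, BadWeight) is 6, 4, 1, so the rows come back as Jack3, John, Alex. A minimal sketch that sorts the indices first, reusing the BadName, BadAge and BadWeight definitions from David's earlier message:

```r
# Rebuild the sample data from the thread.
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,   25,  142
  Carol, 24,  120
  John,  3BC, 175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE,
  stringsAsFactors = FALSE, strip.white = TRUE)

# Row numbers flagged in each column (same patterns as in the thread).
BadName   <- which(grepl("[[:digit:]]", dat1$Name))
BadAge    <- which(grepl("[[:alpha:]]", dat1$Age))
BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))

# Positive indexing returns rows in the order of the index vector,
# so sort() is used to restore the original file order.
bad  <- sort(unique(c(BadName, BadAge, BadWeight)))
dat3 <- dat1[bad, ]
print(dat3)
```

Here unique() does matter: with positive indices, a row flagged in two columns would otherwise be listed twice.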

Thank you,

On Sat, Jan 29, 2022 at 10:15 PM David Carlson <dcarlson using tamu.edu> wrote:

> It is possible for errors to occur on the same row in different columns.
> That does not happen in your example, but if row 4 were "John6, 3BC, 175X"
> then row 4 would be included three times, although we only need to remove
> it once. Removing the duplicates is not necessary, since R handles repeated
> negative indices without trouble, but
> length(unique(c(BadName, BadAge, BadWeight))) indicates how many lines are
> being removed.
>
> David
>
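
The point about duplicate indices can be checked on a small example. This is only a sketch: x is a hypothetical stand-in for dat1, and idx imitates a case where row 4 is flagged in all three columns.

```r
x   <- data.frame(id = 1:6)   # hypothetical stand-in for dat1
idx <- c(4, 4, 4, 1, 6)       # row 4 flagged three times, rows 1 and 6 once

with_unique    <- x[-unique(idx), , drop = FALSE]
without_unique <- x[-idx, , drop = FALSE]

# Repeated negative indices drop the row once, so both results agree.
print(identical(with_unique, without_unique))

# unique() is still useful for counting how many rows are removed.
print(length(idx))            # number of flags
print(length(unique(idx)))    # number of distinct rows removed
```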
> On Sat, Jan 29, 2022 at 8:32 PM Val <valkremk using gmail.com> wrote:
>
>> Thank you David for your help.
>>
>> I just have one question on this. What is the purpose of using the
>> "unique" function here?
>>   (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
>>
>> I got the same result without using it.
>>        (dat2 <- dat1[-(c(BadName, BadAge, BadWeight)), ])
>>
>> My concern is that when I apply this to a large data set, the "unique"
>> function may consume resources (time and memory).
>>
>> Thank you.
>>
>> On Sat, Jan 29, 2022 at 12:30 AM David Carlson <dcarlson using tamu.edu> wrote:
>>
>>> Given that you know which columns should be numeric and which should be
>>> character, finding characters in numeric columns or numbers in character
>>> columns is not difficult. Your data frame consists of three character
>>> columns so you can use regular expressions as Bert mentioned. First you
>>> should strip the whitespace out of your data:
>>>
>>> dat1 <- read.table(text="Name, Age, Weight
>>>   Alex,  20,  13X
>>>   Bob,  25,  142
>>>   Carol, 24,  120
>>>   John,  3BC,  175
>>>   Katy,  35,  160
>>>   Jack3, 34,  140", sep=",", header=TRUE, stringsAsFactors=FALSE,
>>>   strip.white=TRUE)
>>>
>>> Now check to see if all of the fields are character as expected.
>>>
>>> sapply(dat1, typeof)
>>> #        Name         Age      Weight
>>> # "character" "character" "character"
>>>
>>> Now identify character variables containing numbers and numeric
>>> variables containing characters:
>>>
>>> BadName <- which(grepl("[[:digit:]]", dat1$Name))
>>> BadAge <- which(grepl("[[:alpha:]]", dat1$Age))
>>> BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))
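
The "[[:alpha:]]" pattern flags letters only, so a value such as "4?" or an empty string would pass. If the numeric columns are expected to hold whole numbers (an assumption, not something stated in the thread), a stricter sketch is to flag everything that is not made up entirely of digits. The age vector below is hypothetical test data.

```r
age <- c("20", "25", "3BC", "4?", "")   # hypothetical values, not from the thread

# Pattern used in the thread: flags entries containing a letter.
letters_only <- which(grepl("[[:alpha:]]", age))

# Stricter alternative: flags entries that are not all digits.
not_integer <- which(!grepl("^[[:digit:]]+$", age))

print(letters_only)
print(not_integer)
```

Decimal values would need a wider pattern, so the stricter check only fits integer-like columns.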
>>>
>>> Next remove those rows:
>>>
>>> (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
>>> #    Name Age Weight
>>> #  2   Bob  25    142
>>> #  3 Carol  24    120
>>> #  5  Katy  35    160
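
One caveat with negative indices: if no row is flagged, which() returns integer(0), and dat1[-integer(0), ] selects zero rows instead of keeping all of them. A sketch of an alternative that avoids this by using a logical mask (the name bad is introduced here for illustration only):

```r
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,   25,  142
  Carol, 24,  120
  John,  3BC, 175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE,
  stringsAsFactors = FALSE, strip.white = TRUE)

# TRUE for every row with a problem in at least one column.
bad <- grepl("[[:digit:]]", dat1$Name) |
       grepl("[[:alpha:]]", dat1$Age)  |
       grepl("[[:alpha:]]", dat1$Weight)

dat2 <- dat1[!bad, ]          # rows kept
print(dat2)

# The empty-index pitfall, shown with an empty set of flagged rows.
none <- integer(0)
print(nrow(dat1[-none, ]))                   # negative empty index: no rows
print(nrow(dat1[!logical(nrow(dat1)), ]))    # all-FALSE mask negated: all rows
```

The mask also makes unique() unnecessary, because each row is either TRUE or FALSE regardless of how many columns flagged it.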
>>>
>>> You still need to convert Age and Weight to numeric, e.g. dat2$Age <-
>>> as.numeric(dat2$Age).
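
A short sketch of that conversion for both columns at once. dat2 is rebuilt by hand here so the block runs on its own, and num_cols is a name chosen for illustration. as.numeric() returns NA (with a warning) for strings it cannot parse, so checking for NA afterwards is a cheap way to confirm the cleaning step caught everything.

```r
# The cleaned data from the thread, entered directly.
dat2 <- data.frame(Name   = c("Bob", "Carol", "Katy"),
                   Age    = c("25", "24", "35"),
                   Weight = c("142", "120", "160"),
                   stringsAsFactors = FALSE)

num_cols <- c("Age", "Weight")
dat2[num_cols] <- lapply(dat2[num_cols], as.numeric)

print(sapply(dat2, typeof))
print(anyNA(dat2[num_cols]))   # FALSE means every value converted
```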
>>>
>>> David Carlson
>>>
>>>
>>> On Fri, Jan 28, 2022 at 11:59 PM Bert Gunter <bgunter.4567 using gmail.com>
>>> wrote:
>>>
>>>> As character 'polluted' entries will cause a column to be read in (via
>>>> read.table and relatives) as factor or character data, this sounds like a
>>>> job for regular expressions. If you are not familiar with this subject,
>>>> time to learn. And, yes, some heavy lifting will be required.
>>>> See ?regexp for a start maybe? Or the stringr package?
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jan 28, 2022, 7:08 PM Val <valkremk using gmail.com> wrote:
>>>>
>>>> > Hi All,
>>>> >
>>>> > I want to remove rows that contain a character string in an integer
>>>> > column or a digit in a character column.
>>>> >
>>>> > Sample data
>>>> >
>>>> > dat1 <-read.table(text="Name, Age, Weight
>>>> >  Alex,  20,  13X
>>>> >  Bob,   25,  142
>>>> >  Carol, 24,  120
>>>> >  John,  3BC,  175
>>>> >  Katy,  35,  160
>>>> >  Jack3, 34,  140",sep=",",header=TRUE,stringsAsFactors=F)
>>>> >
>>>> > If the Age or Weight column contains any letters, remove that row.
>>>> > If the Name column contains a digit, remove that row.
>>>> > Desired output:
>>>> >
>>>> >    Name   Age weight
>>>> > 1   Bob     25    142
>>>> > 2   Carol   24    120
>>>> > 3   Katy    35    160
>>>> >
>>>> > Thank you,
>>>> >
>>>> > ______________________________________________
>>>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> > https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-help__;!!KwNVnqRv!QW1WPKY5eSNT7sMW28dnAKV7IXWvIc4UwOwUHkJgJ8uuGUrIAXvRjZWVXhZB_0c$
>>>> > PLEASE do read the posting guide
>>>> > https://urldefense.com/v3/__http://www.R-project.org/posting-guide.html__;!!KwNVnqRv!QW1WPKY5eSNT7sMW28dnAKV7IXWvIc4UwOwUHkJgJ8uuGUrIAXvRjZWVRmZSfcI$
>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>> >
>>>>
>>>> 	[[alternative HTML version deleted]]
