[R] Cleaning
Sarah Goslee
sarah.goslee at gmail.com
Thu Nov 12 01:38:30 CET 2015
Please keep replies on the list so others may participate in the conversation.
If you have a character vector containing the potential values, you
might look at %in% for one approach to subsetting your data.
Var1 %in% myvalues
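
As a minimal sketch of that approach, assuming the data frame is called dat as in the quoted message below and that myvalues is a hypothetical character vector holding the valid codes (only two are shown here; the real one would list all 20 or so):

```r
# Rebuild the small example data frame quoted later in this thread.
dat <- data.frame(
  X = 1:7,
  Var1 = c(":", "]", "MSN", "YYZ", "\\\\", "+", "?>"),
  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L),
  stringsAsFactors = FALSE
)

# Hypothetical vector of valid values; extend it to cover every valid code.
myvalues <- c("YYZ", "MSN")

# Keep only the rows whose Var1 appears in myvalues.
test <- dat[dat$Var1 %in% myvalues, ]
test
```

One difference from chaining == with |: %in% returns FALSE rather than NA when Var1 is missing, so rows with NA in Var1 are simply dropped instead of coming back as all-NA rows.
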
Sarah
On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewashm at gmail.com> wrote:
> Thank you Sarah for your prompt response!
>
> I have the list of valid values of the variable Var1; there are around 20.
> How can I modify this one to include all 20 valid values?
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>
> Is there an efficient way of doing it?
>
> Thank you again
>
>
>
> On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee <sarah.goslee at gmail.com>
> wrote:
>>
>> Hi,
>>
>> On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewashm at gmail.com> wrote:
>> > Hi all,
>> >
>> > I have a data frame with a huge number of rows and columns.
>> >
>> > When I looked at the data, I found several garbage values that need to be
>> > cleaned. As a sample, here is the frequency distribution
>> > of one variable:
>> >
>> >   Var1 Freq
>> > 1    :    3
>> > 2    ]    6
>> > 3  MSN 1040
>> > 4  YYZ  300
>> > 5   \\    4
>> > 6    +    3
>> > 7   ?>   15
>>
>> Please use dput() to provide your data. I made a guess at what you had
>> in R, but could be wrong.
>>
>>
>> > and continues.
>> >
>> > I want to keep only those rows that contain a valid value of the variable,
>> > in this case MSN and YYZ. I tried the following:
>> >
>> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]
>> >
>> > but I am not getting the desired result.
>>
>> What are you getting? How does it differ from the desired result?
>>
>> > I have
>> >
>> > Any help or idea?
>>
>> I get:
>>
>> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\\\",
>> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
>> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>> >
>> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>> > test
>>   X Var1 Freq
>> 3 3  MSN 1040
>> 4 4  YYZ  300
>>
>> Which seems reasonable to me.
>>
>>
>> >
>> > [[alternative HTML version deleted]]
>>
>> Please don't post in HTML either: it introduces all sorts of errors to
>> your message.
>>
>> Sarah
>>
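
On the dput() request in the quoted message above: a short sketch of how it is typically used, assuming a small stand-in data frame (the object name and contents here are hypothetical; substitute your own):

```r
# Hypothetical stand-in for the real data frame.
dat <- data.frame(
  Var1 = c(":", "MSN", "YYZ"),
  Freq = c(3L, 1040L, 300L),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object, which can be pasted
# into a plain-text message.
dput(dat)

# For a large data frame, posting the first few rows is usually enough.
dput(head(dat, 2))

# Anyone on the list can rebuild the object from the posted text.
txt <- capture.output(dput(dat))
dat2 <- eval(parse(text = txt))
```

Posting the dput() output rather than a printed table means readers do not have to guess at column types or at how special characters were stored.
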