[R] How to group by and get distinct rows of of grouped rows based on certain criteria
William Dunlap
wdunlap at tibco.com
Thu Jul 14 22:53:51 CEST 2016
> txt <- "|ATP Group|Business Event|Category|
|02 |A |AC |
|02 |A |AD |
|02 |A |EQ |
|ZM |A |AU |
|ZM |A |AV |
|ZM |A |AW |
|02 |B |AC |
|02 |B |AY |
|02 |B |EQ |
"
> d <- read.table(sep="|", text=txt, header=TRUE, strip.white=TRUE,
check.names=FALSE)[,2:4]
> str(d)
'data.frame': 9 obs. of 3 variables:
$ ATP Group : Factor w/ 2 levels "02","ZM": 1 1 1 2 2 2 1 1 1
$ Business Event: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
$ Category : Factor w/ 7 levels "AC","AD","AU",..: 1 2 7 3 4 5 1 6 7
> unique(d[d[,"Category"]!="EQ", c("ATP Group", "Business Event")])
ATP Group Business Event
1 02 A
4 ZM A
7 02 B
> unique(d[d[,"Category"]=="EQ", c("ATP Group", "Business Event")])
ATP Group Business Event
3 02 A
9 02 B
Some folks prefer to use subset() instead of "[". The previous expression
is equivalent to:
> unique( subset(d, Category=="EQ", c("ATP Group", "Business Event")))
ATP Group Business Event
3 02 A
9 02 B
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Thu, Jul 14, 2016 at 12:43 PM, Satish Vadlamani <
satish.vadlamani at gmail.com> wrote:
> Hello All:
> I would like to get your help on the following problem.
>
> I have the following data and the first row is the header. Spaces are not
> important.
> I want to find out distinct combinations of ATP Group and Business Event
> (these are the field names that you can see in the data below) that have
> the Category EQ (Category is the third field) and those that do not have
> the category EQ. In the example below, the combinations 02/A and 02/B have
> EQ and the combination ZM/A does not.
>
> If I have a larger file, how to get to this answer?
>
> What did I try (with dplyr)?
>
> # I know that the below is not correct and not giving desired results
> file1_1 <- file1 %>% group_by(ATP.Group,Business.Event) %>%
> filter(Category != "EQ") %>% distinct(ATP.Group,Business.Event)
> # for some reason, I have to convert to data.frame to print the data
> correctly
> file1_1 <- as.data.frame(file1_1)
> file1_1
>
>
> *Data shown below*
> |ATP Group|Business Event|Category|
> |02 |A |AC |
> |02 |A |AD |
> |02 |A |EQ |
> |ZM |A |AU |
> |ZM |A |AV |
> |ZM |A |AW |
> |02 |B |AC |
> |02 |B |AY |
> |02 |B |EQ |
>
> --
>
> Satish Vadlamani
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list