[R] Filtering String Variables
Adams, Jean
jvadams at usgs.gov
Mon May 23 21:33:24 CEST 2016
Georg,
You are very close. Try this ...
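# make Debitor a character variable in the data frame
ds_example$Debitor <- as.character(ds_example$Debitor)
# flag the second and later occurrences of each Debitor ID
duplicates <- duplicated(ds_example$Debitor)
# the IDs that occur more than once
duplicated_debitors <- unique(ds_example$Debitor[duplicates])
# keep every row whose ID is in that set (note the comma: rows, not columns)
ds_duplicates <- ds_example[ds_example$Debitor %in% duplicated_debitors, ]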
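As for why your two attempts did not work: "Debitor" == filter_variable compares the literal string "Debitor" with each ID, so it is FALSE throughout and selects no columns, and ds_example["Debitor"] is a one-column data frame rather than the vector of IDs, so %in% does not test it element by element. If it helps, here is a minimal sketch of an alternative that skips the intermediate list of IDs by calling duplicated() in both directions. The small data frame ds is a made-up stand-in using a few of your IDs, not your full data set.

```r
# Hypothetical stand-in data: a handful of the IDs from the question
ds <- data.frame(
  Debitor = c("968691", "968691", "968691", "A04046", "A04046", "L0143", "4711"),
  var1    = c(11, 12, 13, 14, 14, 11, 15),
  stringsAsFactors = FALSE
)

# duplicated() marks the 2nd and later occurrences of a value;
# with fromLast = TRUE it marks all but the last occurrence.
# Combining both with | marks every row whose ID occurs more than once.
in_dup_group <- duplicated(ds$Debitor) | duplicated(ds$Debitor, fromLast = TRUE)

# Logical row index, followed by a comma to select rows rather than columns
ds_duplicates <- ds[in_dup_group, ]
ds_duplicates
```

Both approaches should select the same rows; the %in% version above is handy if you want to keep the vector of duplicated IDs for later use.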
Jean
On Mon, May 23, 2016 at 8:28 AM, <G.Maubach at weinwolf.de> wrote:
> # Hi All,
> #
> # I have the following data frame (example):
>
> Debitor <- c("968691", "968691", "968691",
> "A04046", "A04046",
> "L0006", "L0006", "L0006",
> "L0023", "L0023",
> "L0056", "L0056",
> "L0094", "L0094", "L0094",
> "L0124", "L0124",
> "L0143",
> "L0170",
> "13459",
> "473908",
> "394704",
> "4711",
> "4712",
> "4713")
> Debitor <- as.character(Debitor)
> var1 <- c(11, 12, 13,
> 14, 14,
> 12, 13, 14,
> 10, 11,
> 12, 12,
> 12, 12, 12,
> 15, 17,
> 11,
> 14,
> 12,
> 17,
> 13,
> 15,
> 16,
> 11)
> ds_example <- data.frame(Debitor, var1)
> ds_example$case_id <- 1:nrow(ds_example)
> ds_example <- ds_example[, sort(colnames(ds_example))]
> ds_example
>
> # I would like to generate a data frame that contains the duplicates AND the
> # corresponding non-duplicates to the duplicates.
> # For example, finding the duplicates will deliver cases 2 and 3, but the list
> # should also contain case 1, because case 1 is the corresponding case to the
> # duplicate cases 2 and 3.
> # For the whole example dataset that would be:
> needed <- c(1, 1, 1,
> 1, 1,
> 1, 1, 1,
> 1, 1,
> 1, 1,
> 1, 1, 1,
> 1, 1,
> 0, 0, 0, 0, 0, 0, 0, 0)
> needed <- as.logical(needed)
> ds_example <- data.frame(ds_example, needed)
> ds_example
>
> # To find the duplicates and the corresponding non-duplicates
> duplicates <- duplicated(ds_example$Debitor)
>
> list_of_duplicated_debitors <- as.character(ds_example[duplicates, "Debitor"])
>
> filter_variable <- unique(list_of_duplicated_debitors)
>
> ds_duplicates <- ds_example["Debitor" == filter_variable] # Result: dataset with 0 columns
>
> ds_duplicates <- ds_example["Debitor"] %in% filter_variable # Result: FALSE
>
> # How can I create a dataset like this
>
> ds_example <- ds_example[needed, ]
> ds_example
>
> # using the Debitor IDs?
>
> Kind regards
>
> Georg Maubach