[R] Reply: Re: Merging variables

Tue Jun 7 08:18:51 CEST 2016

Hi Michael,

yes, I was astonished by this behaviour too. I have worked with SPSS 
a lot, and it works differently.

I would like to share some of my data. Can you tell me how I can dump a 
dataset in a way that I can post it here as text?
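
A common way to do this in R is dput(), which prints an R expression 
that recreates an object when pasted into a session. The sketch below 
uses a hypothetical toy data frame (ds_example and its columns are 
made up for illustration, not taken from the real data):

```r
# Hypothetical toy data standing in for the real dataset
ds_example <- data.frame(
  customer = c("Miller", "Smith", NA),
  revenue  = c(100, 250, 75),
  stringsAsFactors = FALSE
)

# dput() prints an R expression that rebuilds the object
dput(ds_example)

# For a large dataset, posting only the first few rows is usually enough
dput(head(ds_example, 2))

# Round trip: capture the printed text, evaluate it, compare with the original
txt <- paste(capture.output(dput(ds_example)), collapse = "\n")
restored <- eval(parse(text = txt))
all.equal(restored, ds_example)
```

The text printed by dput() can be pasted into a mail as it is, and a 
reader can assign it to a name to get the same data frame back.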

Kind regards

Georg

From:    Michael Dewey <lists at dewey.myzen.co.uk>
To:      G.Maubach at weinwolf.de, r-help at r-project.org
Date:    06.06.2016 15:45
Subject: Re: [R] Merging variables

Dear Georg

I find it a bit surprising that you end up with customer.x and 
customer.y. Can you share with us a toy example of two data.frames which 
exhibit this behaviour?
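
For reference, a minimal sketch of when merge() adds the .x and .y 
suffixes. The data frames bw and zww and the id column are 
hypothetical and only illustrate the mechanism; they are not the 
datasets from the original post:

```r
# Hypothetical toy data: both frames carry an "id" and a "customer" column
bw  <- data.frame(id = c(1, 2, 3),
                  customer = c("Miller", "Smith", "Bird"),
                  stringsAsFactors = FALSE)
zww <- data.frame(id = c(1, 3, 4),
                  customer = c("Miller", "Fish", "Doe"),
                  stringsAsFactors = FALSE)

# Merging BY "customer": the key appears once, without a suffix
by_customer <- merge(x = bw, y = zww, by = "customer", all.x = TRUE)
names(by_customer)

# Merging by another key: "customer" exists in both inputs but is not
# a key, so merge() disambiguates it as customer.x and customer.y
by_id <- merge(x = bw, y = zww, by = "id", all.x = TRUE)
names(by_id)
```

So customer.x and customer.y would be expected if "customer" was not 
actually among the merge keys, which is one thing a toy example could 
confirm or rule out.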

On 06/06/2016 13:29, G.Maubach at weinwolf.de wrote:
> Hi All,
>
> I merged two datasets:
>
> ds_merge1 <- merge(x = ds_bw_customer_4_match, y =
> ds_zww_customer_4_match,
>   by.x = "customer", by.y = "customer",
>   all.x = TRUE, all.y = FALSE)
>
> R created a new dataset with the variables customer.x and customer.y. I
> would like to merge these two variables back together. I wrote a little
> function (code can be run) for it:
>
> -- cut --
>
> customer.x <- c("Miller", "Smith", NA,    "Bird", NA)
> customer.y <- c("Miller",  NA,     "Doe", "Fish", NA)
> ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)
>
> t_merge_variables <-
>   function(dataset,
>            var1,
>            var2,
>            merged_var) {
>
>     # Initialize
>     dataset[[merged_var]] = rep(NA, nrow(dataset))
>     dataset[["mismatch"]] = rep(NA, nrow(dataset))
>
>     for (i in 1:nrow(dataset)) {
>
>       # Check 1: var1 missing, var2 missing
>       if (is.na(dataset[[i, var1]]) &
>           is.na(dataset[[i, var2]])) {
>         dataset[["mismatch"]] <- 1  # var1 & var2 are missing
>
>       # Check 2: var1 filled, var2 missing
>       } else if (!is.na(dataset[[i, var1]]) &
>                  is.na(dataset[[i, var2]])) {
>         dataset[[i, merged_var]] <- dataset[[i, var1]]
>         dataset[["mismatch"]] <- 0
>
>       # Check 3: var1 missing, var2 filled
>       } else if (is.na(dataset[[i, var1]]) &
>                  !is.na(dataset[i, var2])) {
>         dataset[[i, merged_var]] <- dataset[[i, var2]]
>         dataset[["mismatch"]] <-  0
>
>       # Check 4: var1 == var2
>       } else if (dataset[[i, var1]] == dataset[[i, var2]]) {
>       dataset[[i, merged_var]] <- dataset[[i, var1]]
>       dataset[["mismatch"]] <- 0
>
>       # Leftover: var1 != var2
>       } else {
>         dataset[[i, merged_var]] <- NA
>         dataset[["mismatch"]] <- 2  # var1 != var2
>       }  # end if
>     }  # end for
>     return(dataset)
> }
>
> ds_var_merge1 <- t_merge_variables(dataset = ds_test,
>   var1 = "customer.x",
>   var2 = "customer.y",
>   merged_var = "customer")
>
> ds_var_merge1
>
> -- cut --
>
> It is executed without error but delivers the wrong values in the variable
> "mismatch". This variable is always 1 although it should be NA, 1 or 2
> respectively.
>
> Can you tell me why the variable is not correctly set?
>
> Kind regards
>
> Georg
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
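
A note on the posted function: every assignment of the form 
dataset[["mismatch"]] <- value replaces the whole column, not just 
row i, so after the loop the column holds whatever the last row 
produced. In the test data the last row has both values missing, hence 
1 everywhere. Writing dataset[[i, "mismatch"]] <- value inside the 
loop would address single rows. The following is a vectorised sketch 
of the same idea without a loop; merge_variables is a hypothetical 
name, and the codes 0, 1 and 2 follow the posted code:

```r
customer.x <- c("Miller", "Smith", NA,    "Bird", NA)
customer.y <- c("Miller",  NA,     "Doe", "Fish", NA)
ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

# Hypothetical vectorised variant of t_merge_variables
merge_variables <- function(dataset, var1, var2, merged_var) {
  x <- dataset[[var1]]
  y <- dataset[[var2]]

  both_missing <- is.na(x) & is.na(y)
  # TRUE only where both values are present and differ (never NA)
  conflict <- !is.na(x) & !is.na(y) & x != y

  # Take var1 where present, otherwise var2; blank out conflicting rows
  merged <- ifelse(is.na(x), y, x)
  merged[conflict] <- NA

  dataset[[merged_var]] <- merged
  # 1 = both missing, 2 = values differ, 0 = otherwise
  dataset[["mismatch"]] <- ifelse(both_missing, 1, ifelse(conflict, 2, 0))
  dataset
}

ds_var_merge1 <- merge_variables(ds_test,
  var1 = "customer.x",
  var2 = "customer.y",
  merged_var = "customer")

ds_var_merge1
```

Working on whole columns avoids the row index altogether, so the 
mistake of overwriting a full column inside the loop cannot occur.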

-- 
Michael
http://www.dewey.myzen.co.uk/home.html