[R] Replace NAs in split lists

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Mon Jan 8 09:12:42 CET 2018


Upon closer examination I see that you are not using the split version of 
df1 as I usually would, so here is a reproducible example:

#----
df1 <- read.table( text=
"ID ID_2 Firist Value
1  a   aa   TRUE     2
2  a   ab  FALSE    NA
3  a   ac  FALSE    NA
4  b   aa   TRUE     5
5  b   ab  FALSE    NA
", header=TRUE, as.is=TRUE )

sdf <- split( df1, df1$ID )
# note the extra [ 1 ]: it keeps only the first non-NA value, in case
# an ID has more than one
sdf2 <- lapply( sdf
               , function( z ) {
                  z$Value <- ifelse( is.na( z$Value )
                                   , z$Value[ !is.na( z$Value ) ][ 1 ]
                                   , z$Value
                                   )
                  z
                 }
               )
df2 <- do.call( rbind, sdf2 )
df2
#>     ID ID_2 Firist Value
#> a.1  a   aa   TRUE     2
#> a.2  a   ab  FALSE     2
#> a.3  a   ac  FALSE     2
#> b.4  b   aa   TRUE     5
#> b.5  b   ab  FALSE     5

# or using tidyverse methods

library(dplyr)
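df3 <- (   df1
        %>% group_by( ID )
        # mutate() on grouped data already works one ID at a time,
        # so wrapping it in the superseded do() is not needed
        %>% mutate( Value = ifelse( is.na( Value )
                                  , Value[ !is.na( Value ) ][ 1 ]
                                  , Value
                                  )
                  )
        %>% ungroup
        )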
df3
#> # A tibble: 5 x 4
#>   ID    ID_2  Firist Value
#>   <chr> <chr> <lgl>  <int>
#> 1 a     aa    T          2
#> 2 a     ab    F          2
#> 3 a     ac    F          2
#> 4 b     aa    T          5
#> 5 b     ab    F          5
#----
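
For the literal question in the quoted message below (how to put the lapply() results back into the pieces of the split list), here is a minimal base-R sketch, assuming the same df1 as above. Map() pairs each data frame in the split list with its vector of filled values, and do.call( rbind, ... ) reassembles a new data frame, leaving df1 untouched. The helper name fill_first() and the object name filled are illustrative only, not taken from the original posts.

```r
# Same example data as above
df1 <- read.table(text =
"ID ID_2 Firist Value
1  a   aa   TRUE     2
2  a   ab  FALSE    NA
3  a   ac  FALSE    NA
4  b   aa   TRUE     5
5  b   ab  FALSE    NA
", header = TRUE, as.is = TRUE)

sdf <- split(df1, df1$ID)

# Hypothetical helper: replace NAs with the first non-NA value in v
fill_first <- function(v) ifelse(is.na(v), v[!is.na(v)][1], v)

# Step 1: compute the filled vectors, as in the lapply() call in the question
filled <- lapply(sdf, function(z) fill_first(z$Value))

# Step 2: write each vector back into its matching data frame in the list
sdf <- Map(function(z, v) { z$Value <- v; z }, sdf, filled)

# Step 3: reassemble a new data frame (df1 itself is not modified)
df2 <- do.call(rbind, sdf)
df2
```

Because Map() keeps the names of the split list, the row names of df2 come out as a.1, a.2, ... just as in the lapply() version above.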

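If the goal is simply an updated copy of the original data frame, base R's replacement function `split<-` writes each group's result back into the original row positions, so neither a split list of data frames nor an rbind() step is needed. A minimal sketch, assuming the df1 from the example above; fill_first() is a hypothetical helper and df4 an illustrative name, and the change is made on a copy in keeping with the advice in the quoted reply below.

```r
# Same example data as above
df1 <- read.table(text =
"ID ID_2 Firist Value
1  a   aa   TRUE     2
2  a   ab  FALSE    NA
3  a   ac  FALSE    NA
4  b   aa   TRUE     5
5  b   ab  FALSE    NA
", header = TRUE, as.is = TRUE)

# Hypothetical helper: replace NAs with the first non-NA value in v
fill_first <- function(v) ifelse(is.na(v), v[!is.na(v)][1], v)

# Work on a copy so df1 stays as it was read in
df4 <- df1

# split() on the left of <- calls `split<-`: each element of the list on the
# right replaces the Value entries of the matching ID, in the original rows
split(df4$Value, df4$ID) <- lapply(split(df4$Value, df4$ID), fill_first)
df4
```

`df4$Value <- ave( df4$Value, df4$ID, FUN = fill_first )` gives the same result in one line; base R's ave() is implemented with this same split()/`split<-` pair.
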
On Sun, 7 Jan 2018, Jeff Newmiller wrote:

> Why do you want to modify df1?
>
> Why not just reassemble the parts as a new data frame and use that going 
> forward in your calculations? That is generally the preferred approach 
> in R so you can re-do your calculations easily if you find a mistake 
> later.
>
> On January 7, 2018 7:35:59 PM PST, Ek Esawi <esawiek at gmail.com> wrote:
>> I came up with a solution right after I posted the question, but I
>> figured there must be a better and shorter one than mine:
>> sdf1[[1]][, 4] <- lapplyresults[[1]]
>> sdf1[[2]][, 4] <- lapplyresults[[2]]
>>
>> EK
>>
>> On Sun, Jan 7, 2018 at 10:13 PM, Ek Esawi <esawiek at gmail.com> wrote:
>>> Hi all--
>>>
>>> I stumbled on this problem online. I did not like the solution given
>>> there which was a long UDF. I thought why cannot split and l/s apply
>>> work here. My aim is to split the data frame, use l/sapply, make
>>> changes on the split lists and combine the split lists to new data
>>> frame with the desired changes/output.
>>>
>>> The data frame shown below has a column named ID with two values,
>>> a and b. I want to replace the NAs in the Value column with 2 (the
>>> only non-NA entry) for ID = a, and with 5 for ID = b.
>>>
>>> I worked out the solution but could not put the results back into the
>>> split lists.
>>>
>>> Original data frame, df1:
>>>   ID ID_2 Firist Value
>>> 1  a   aa   TRUE     2
>>> 2  a   ab  FALSE    NA
>>> 3  a   ac  FALSE    NA
>>> 4  b   aa   TRUE     5
>>> 5  b   ab  FALSE    NA
>>>
>>> Split list, sdf1:
>>> $a
>>>   ID ID_2 Firist Value
>>> 1  a   aa   TRUE     2
>>> 2  a   ab  FALSE    NA
>>> 3  a   ac  FALSE    NA
>>>
>>> $b
>>>   ID ID_2 Firist Value
>>> 4  b   aa   TRUE     5
>>> 5  b   ab  FALSE    NA
>>>
>>> Desired results:
>>> $a
>>>   ID ID_2 Firist Value
>>> 1  a   aa   TRUE     2
>>> 2  a   ab  FALSE     2
>>> 3  a   ac  FALSE     2
>>>
>>> $b
>>>   ID ID_2 Firist Value
>>> 4  b   aa   TRUE     5
>>> 5  b   ab  FALSE     5
>>>
>>> My code
>>>
>>> sdf1 <- split(df1, df1$ID)
>>> lapply(sdf1, function(z)
>>>   ifelse(is.na(z$Value), z$Value[!is.na(z$Value)], z$Value))
>>> result:
>>> $ a: num [1:3] 2 2 2
>>> $ b: num [1:2] 5 5
>>>
>>> How could I put these two vectors back into the split data frames in
>>> sdf1? Then I could use do.call to reassemble a data frame from the
>>> split list.
>>>
>>> Thanks,
>>> EK
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


