[R] Combine two dataframe with different row number and interpolation between values

PIKAL Petr petr@p|k@| @end|ng |rom prechez@@cz
Thu Sep 1 12:29:09 CEST 2022


Hi

You could also use approxfun to fill NA values

fff <- approxfun(1:dim(df3)[1], df3$ws.x)
df3$ws2 <- fff(1:dim(df3)[1])
head(df3)
     y d h     ws.x ws.y      ws2
1 2010 1 1       NA   NA       NA
2 2010 1 2       NA   NA       NA
3 2010 1 3 20.46314   NA 20.46314
4 2010 1 4       NA   NA 20.74001
5 2010 1 5       NA   NA 21.01689
6 2010 1 6 21.29376   NA 21.29376

And fill starting or trailing NA by na.locf function from zoo package or
change starting or trailing NA to some value before calling approxfun.

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Jim Lemon
> Sent: Thursday, September 1, 2022 11:31 AM
> To: javad bayat <j.bayat194 using gmail.com>; r-help mailing list <r-help using r-
> project.org>
> Subject: Re: [R] Combine two dataframe with different row number and
> interpolation between values
> 
> Hi Javad,
> You seem to have the data frame join worked out, so here is a function
> to interpolate over sequences of NAs. I have found it quite useful.
> 
> # interpolate over sequences of NAs
> # NA sequences at the beginning of a file are replaced with the first
> nonNA value
> # NA sequences at the end are filled with the last nonNA value
> # sequences of all NAs are returned unaltered
> interpNA<-function(x) {
>  if(any(is.na(x))) {
>   xrle<-rle(is.na(x))
>   begin<-end<-1
>   # if the sequence begins with NA
>   # set the initial run of NAs to the 1st non-NA
>   if(is.na(x[1])) x[1:xrle$lengths[1]]<-x[xrle$lengths[1]+1]
>   for(i in 1:length(xrle$values)) {
>    # if this is a run of NA, fill it
>    if(xrle$values[i]) {
>     end<-begin+xrle$lengths[i]+(i>1)
>     if(is.na(x[end])) x[begin:(end-1)]<-x[begin]
>     else x[begin:end]<-seq(x[begin],x[end],length.out=1+end-begin)
>     begin<-end+xrle$lengths[i]-1
>   }
>    else {
>     # set begin to the end of the non-NA values
>     begin<-end+xrle$lengths[i]-1
>    }
>   }
>  }
>  return(x)
> }
> 
> Jim
> 
> On Thu, Sep 1, 2022 at 3:29 PM javad bayat <j.bayat194 using gmail.com> wrote:
> >
> >  Dear Tim;
> > The dplyr did not work for me. My data frames have exactly similar
columns
> > but different row numbers.
> > Mr Petr sent this code and the code worked but it copied the second
> > dataframe into the first one and did not replace the corresponding row.
> > > df3 = merge(df1, df2, all = TRUE)
> > Regarding filling the "NA" data, it does not matter for me to
interpolate
> > between numbers or put the mean of numbers.
> > Sincerely
> >
> >
> >
> >
> >
> >
> > On Wed, Aug 31, 2022 at 5:17 PM Ebert,Timothy Aaron <tebert using ufl.edu>
> wrote:
> >
> > > Can I interest you in the join functions in dplyr?
> > > https://www.datasciencemadesimple.com/join-in-r-merge-in-r/
> > >
> > > Filling in missing data is a useful practice when the fake (simulated)
> > > data is a small proportion of all data. When 2/3 of the data is fake
one
> > > must wonder if anything based on those numbers is real or an artifact
of
> > > assumptions made to generate the numbers.
> > >
> > > I deal with weather data and some weather stations are set up to
average
> > > measurements between recorded values and others take a single reading
at
> > > regular intervals. How you interpolate might depend on which option
> > > describes your data. The alternative is to use the method for
recording ws
> > > and apply it to the other data that will be merged with these data. I
> > > assume there is more data, otherwise I see little point in expanding
these
> > > values out.
> > >
> > > Tim
> > >
> > > -----Original Message-----
> > > From: R-help <r-help-bounces using r-project.org> On Behalf Of javad bayat
> > > Sent: Wednesday, August 31, 2022 2:09 AM
> > > To: r-help using r-project.org
> > > Subject: [R] Combine two dataframe with different row number and
> > > interpolation between values
> > >
> > > [External Email]
> > >
> > >  Dear all,
> > > I am trying to combine two large dataframe in order to make a
dataframe
> > > with exactly the dimension of the second dataframe.
> > > The first df is as follows:
> > >
> > > df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 2920), d
=
> > > rep(c(1:365,1:365,1:365,1:365,1:365),each=8),
> > >       h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by =
> > > 3),seq(3,24, by = 3),seq(3,24, by = 3)),365),
> > >       ws = rnorm(1:14600, mean=20))
> > > > head(df1)
> > >      y       d   h        ws
> > > 1  2010  1  3     20.71488
> > > 2  2010  1  6     19.70125
> > > 3  2010  1  9     21.00180
> > > 4  2010  1 12     20.29236
> > > 5  2010  1 15     20.12317
> > > 6  2010  1 18     19.47782
> > >
> > > The data in the "ws" column were measured with 3 hours frequency and I
> > > need data with one hour frequency. I have made a second df as follows
with
> > > one hour frequency for the "ws" column.
> > >
> > > df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 8760), d
=
> > > rep(c(1:365,1:365,1:365,1:365,1:365),each=24),
> > >       h = rep(c(1:24,1:24,1:24,1:24,1:24),365), ws = "NA")
> > > > head(df2)
> > >       y      d    h   ws
> > > 1  2010  1    1   NA
> > > 2  2010  1    2   NA
> > > 3  2010  1    3   NA
> > > 4  2010  1    4   NA
> > > 5  2010  1    5   NA
> > > 6  2010  1    6   NA
> > >
> > > What I am trying to do is combine these two dataframes so as to the
rows in
> > > df1 (based on the values of "y", "d", "h" columns) that have values
> > > exactly similar to df2's rows copied in its place in the new df (df3).
> > > For example, in the first dataframe the first row was measured at 3
> > > o'clock on the first day of 2010 and this row must be placed on the
third
> > > row of the second dataframe which has a similar value (2010, 1, 3).
Like
> > > the below
> > > table:
> > >       y      d    h   ws
> > > 1  2010  1    1   NA
> > > 2  2010  1    2   NA
> > > 3  2010  1    3   20.71488
> > > 4  2010  1    4   NA
> > > 5  2010  1    5   NA
> > > 6  2010  1    6   19.70125
> > >
> > > But regarding the values of the "ws" column for df2 that do not have
value
> > > (at 4 and 5 o'clock), I need to interpolate between the before and
after
> > > values to fill in the missing data of the "ws".
> > > I have tried the following codes but they did not work correctly.
> > >
> > > > df3 = merge(df1, df2, by = "y")
> > > Error: cannot allocate vector of size 487.9 Mb or
> > > > library(dplyr)
> > > > df3<- df1%>% full_join(df2)
> > >
> > >
> > > Is there any way to do this?
> > > Sincerely
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Best Regards
> > > Javad Bayat
> > > M.Sc. Environment Engineering
> > > Alternative Mail: bayat194 using yahoo.com
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >
> > >
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.et
> hz.ch%2Fmailman%2Flistinfo%2Fr-
> help&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d90
> 8da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C6379
> 75232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiL
> CJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&a
> mp;sdata=WigJVAmdLn%2FK7ZtJq28%2Buv4aDmUjNXu6QPabdt5h2iQ%3D&a
> mp;reserved=0
> > > PLEASE do read the posting guide
> > >
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-
> project.org%2Fposting-
> guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc
> 23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7
> C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw
> MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%
> 7C&sdata=m1VCPBpnYy%2FwqlOuf5froUVEMBDJKwDuAWS4cFNx1wI%3
> D&reserved=0
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> > --
> > Best Regards
> > Javad Bayat
> > M.Sc. Environment Engineering
> > Alternative Mail: bayat194 using yahoo.com
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


More information about the R-help mailing list