[R] [External] challenging data merging/joining problem
Rasmus Liland
jr@| @end|ng |rom po@teo@no
Mon Jul 6 18:15:00 CEST 2020
On 2020-07-06 12:03 +0300, Eric Berger wrote:
> On Mon, Jul 6, 2020 at 2:07 AM Richard M. Heiberger <rmh using temple.edu> wrote:
> > On Sun, Jul 5, 2020 at 2:51 PM Christopher W. Ryan <cryan using binghamton.edu> wrote:
> > >
> > > I've been conducting relatively simple
> > > COVID-19 surveillance for our
> > > jurisdiction.
> >
> > Have you talked directly to the designers
> > of the new database?
>
> Hi Christopher,
> This seems pretty standard and
> straightforward, unless I am missing
> something. You can do the "full join"
> without changing variable names. Here's a
> small code example with two tibbles, a and
> b, where the column 'x' in a corresponds to
> the column 'u' in b.
>
> library(tibble)  # provides tibble()
> a <- tibble(x=1:15,y=21:35)
> b <- tibble(u=c(1:10,51:55),z=31:45)
> # full join: keep unmatched rows from both a and b
> foo <- merge(a,b,by.x="x",by.y="u",all.x=TRUE,all.y=TRUE)
Perhaps something like
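# Map column names in dataSystemA (names) to those in dataSystemB (values)
new_names <-
  c("dob"="birthdate",
    "lastName"="last_name",
    "firstName"="first_name")
# Locate the old names in dataSystemA and rename them in place
idx <- match(x=names(new_names),
             table=colnames(dataSystemA))
colnames(dataSystemA)[idx] <- new_names
# Full join on the now-shared key columns
merge(
  x=dataSystemA,
  y=dataSystemB,
  by=new_names,
  all=TRUE)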
which yields
   birthdate  last_name first_name  onsetDate
1 2010-10-11   LOVEGOOD       luna       <NA>
2 2010-12-06   GRAINGER   hermione 2020-07-09
3 2011-01-25 LONGBOTTOM    neville 2020-07-10
4 2011-07-03     MALFOY      draco       <NA>
5 2011-07-14    WEASLEY        ron 2020-07-08
6 2011-10-04     POTTER      harry 2020-07-07
7 2012-02-13    DIGGORY     cedric       <NA>
  symptomatic date_of_onset symptoms_present
1          NA    2020-07-12            FALSE
2          NA    2020-07-09             TRUE
3          NA    2020-07-10             TRUE
4          NA    2020-07-11            FALSE
5       FALSE          <NA>               NA
6        TRUE          <NA>               NA
7          NA    2020-07-13             TRUE
?
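For anyone who wants to try this without the real data, here is a minimal, self-contained sketch. The two data frames are hypothetical stand-ins that I made up for illustration (the actual column sets of the two systems are not shown in this thread). It uses the by.x/by.y form from Eric's example, so no renaming is needed. The last step assumes that onsetDate and date_of_onset record the same thing, which may not hold for the real systems.

```r
# Hypothetical toy versions of the two systems (not the real data)
dataSystemA <- data.frame(
  dob = as.Date(c("2011-10-04", "2011-07-14")),
  lastName = c("POTTER", "WEASLEY"),
  firstName = c("harry", "ron"),
  onsetDate = as.Date(c("2020-07-07", "2020-07-08")),
  stringsAsFactors = FALSE
)
dataSystemB <- data.frame(
  birthdate = as.Date(c("2011-10-04", "2010-10-11")),
  last_name = c("POTTER", "LOVEGOOD"),
  first_name = c("harry", "luna"),
  date_of_onset = as.Date(c("2020-07-07", "2020-07-12")),
  stringsAsFactors = FALSE
)

# Full join without renaming: by.x and by.y pair up the key columns
# position by position; the result keeps the key names from x
merged <- merge(
  x = dataSystemA,
  y = dataSystemB,
  by.x = c("dob", "lastName", "firstName"),
  by.y = c("birthdate", "last_name", "first_name"),
  all = TRUE
)

# Assumption: both onset columns mean the same thing, so fill the gaps
# in one from the other (index assignment keeps the Date class)
merged$onset <- merged$onsetDate
missing <- is.na(merged$onset)
merged$onset[missing] <- merged$date_of_onset[missing]

print(merged)
```

With all=TRUE, people present in only one system are kept, with NA in the columns that come from the other system.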