[R] sqldf not joining all the fields
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Mar 12 18:01:33 CET 2010
Can you show the output of dput(x_data) and dput(y_data).
On Fri, Mar 12, 2010 at 11:56 AM, Newbie19_02 <nvanzuydam at gmail.com> wrote:
>
> Dear R users,
>
> I have two data frames that were read from text files as follows:
>
> x_data <- read.table("x.txt", header = TRUE, sep = "|", quote = "\"'",
> dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
> nrows = 3864284,
> skip = 0, check.names = TRUE,fill=TRUE,
> strip.white = TRUE, blank.lines.skip = TRUE,
> comment.char = "#", allowEscapes = FALSE, flush = FALSE,
> fileEncoding = "", encoding = "unknown")
>
> x_data
>
> prochi prescribed_date dataMonth item_code res_seqno quantity directions
> CAO0000713 22/06/2001 NULL 842752 NULL 60 1/D
> CAO0000713 28/04/2000 NULL 7800 NULL 100G A/TD
> CAO0000713 10/04/2000 NULL 842652 NULL 60 1/D
> CAO0000713 03/07/2000 NULL 842652 NULL 60 1/D
> CAO0000713 09/01/2001 NULL 842752 NULL 60 1/D
> CAO0000713 16/10/2001 NULL 842752 NULL 60 1/D
> CAO0000713 16/08/2001 NULL 842752 NULL 60 1/D
> CAO0000713 17/09/1993 NULL 39620 NULL 5ML NIL
> CAO0000713 01/05/2001 NULL 842752 NULL 60 1/D
> CAO0000713 05/03/2001 NULL 842752 NULL 60 1/D
>
>
>
> y_data
>
> item_code name formulation_code strength
> bnf_code
> 100 NEONACLEX K TABS NULL 2.2.8
> 110 NEONACLEX TABS 5MG 2.2.1
> 50 MESORB DRESS 10CMX10CM 20.3.1
> 160 ABSORBENT CELLULOSE MESO DRESS 10CMX10CM 20.3.1
> 161 ABSORBENT CELLULOSE MESO DRESS 10CMX15CM 20.3.1
> 164 ABSORBENT CELLULOSE MESO DRESS 20CMX25CM 20.3.1
> 200 SEPTRIN TABS 480MG 5.1.8
> 210 SEPTRIN PAED SF SUSP 240MG/5ML 5.1.8
> 212 SEPTRIN ADULT SUSP 480MG/5ML 5.1.8
> 220 SEPTRIN FORTE TABS 960MG 5.1.8
> etc....
>
>
> contains all the information for the item codes
> y was read in in the same way.
>
> I then used the following code:
>
> z <- sqldf("select * from x left join y using (code)")
>
> when I use this on my real data I get an output:
> prochi prescribed_date dataMonth item_code res_seqno quantity directions
> 1 CAO0000713 22/06/2001 NULL 842752 NULL 60
> 1/D
> 2 CAO0000713 28/04/2000 NULL 7800 NULL 100G
> A/TD
> 3 CAO0000713 10/04/2000 NULL 842652 NULL 60
> 1/D
> 4 CAO0000713 03/07/2000 NULL 842652 NULL 60
> 1/D
> 5 CAO0000713 09/01/2001 NULL 842752 NULL 60
> 1/D
> 6 CAO0000713 16/10/2001 NULL 842752 NULL 60
> 1/D
> 7 CAO0000713 16/08/2001 NULL 842752 NULL 60
> 1/D
> 8 CAO0000713 17/09/1993 NULL 39620 NULL 5ML
> NIL
> 9 CAO0000713 01/05/2001 NULL 842752 NULL 60
> 1/D
> 10 CAO0000713 05/03/2001 NULL 842752 NULL 60
> 1/D
> no_of_packs datasource scan_ref_no name formulation_code strength
> 1 NULL TSF NULL <NA> <NA> <NA>
> 2 NULL TSF NULL BETNOVATE RD OINT 0.025%
> 3 NULL TSF NULL <NA> <NA> <NA>
> 4 NULL TSF NULL <NA> <NA> <NA>
> 5 NULL TSF NULL <NA> <NA> <NA>
> 6 NULL TSF NULL <NA> <NA> <NA>
> 7 NULL TSF NULL <NA> <NA> <NA>
> 8 NULL TSF NULL GAMMABULIN INJ 320MG
> 9 NULL TSF NULL <NA> <NA> <NA>
> 10 NULL TSF NULL <NA> <NA> <NA>
> bnf_code
> 1 <NA>
> 2 13.4.1.2
> 3 <NA>
> 4 <NA>
> 5 <NA>
> 6 <NA>
> 7 <NA>
> 8 14.5
> 9 <NA>
> 10 <NA>
>
>
> There is absolutely no reason for there to be <NA> anywhere as the
> information for both the tables is complete.
>
> Not sure what the problem is?
>
> Thanks,
> Natalie
> --
> View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list