[R] Importing data coming from Splus into R.
William Dunlap
wdunlap at tibco.com
Fri Feb 5 20:32:36 CET 2010
> -----Original Message-----
> From: gerald.jean at dgag.ca [mailto:gerald.jean at dgag.ca]
> Sent: Friday, February 05, 2010 10:58 AM
> To: William Dunlap
> Cc: Uwe Ligges; r-help at r-project.org
> Subject: RE: [R] Importing data coming from Splus into R.
>
> Hello Bill,
>
> here is what I tried with the Splus built-in data set "claims".
>
> In Splus:
>
> apply(claims, 2, class)
>       age   car.age     type      cost    number
> "ordered" "ordered" "factor" "numeric" "numeric"
> dump(list = "claims",
>      fileout = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
>      oldStyle = T)  ## I tried both oldStyle = T and oldStyle = F,
>                     ## same results.
>
> In R:
>
> claims <- source("/home/jeg002/splus/R/Exemples/R/myclaims.txt")
> apply(claims$value, 2, class)  ## oldStyle = T this time.
>         age     car.age        type        cost      number
> "character" "character" "character" "character" "character"
>
Use lapply(claims$value, class) instead of
apply(claims$value, 2, class). In R, apply()
converts its first argument into a matrix,
which will be a character matrix if any
columns are factors, so every column then
reports class "character". In recent versions
of S+, apply(data.frame, MARGIN=2, ...) avoids
the convert-to-matrix step and works on the
columns of the data.frame.

In this example it looks like the Splus dump -> R source
route works.
R> lapply(claims$value, class)
$age
[1] "ordered" "factor"

$car.age
[1] "ordered" "factor"

$type
[1] "factor"

$cost
[1] "numeric"

$number
[1] "numeric"
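
As a small illustration of the difference, here is a sketch on a
made-up data frame (the values are invented and only mimic the
structure of "claims"). It should show apply() reporting "character"
for every column while lapply() and sapply() report the real classes.

```r
# Toy data frame standing in for the S+ "claims" data (values are made up).
d <- data.frame(
  age  = factor(c("17-20", "21-24", "17-20"),
                levels = c("17-20", "21-24"), ordered = TRUE),
  type = factor(c("A", "B", "A")),
  cost = c(289, 372, 189)
)

# apply() first coerces the data frame with as.matrix(); because factor
# columns are present the result is a character matrix, so every column
# is seen as a character vector.
via_apply <- apply(d, 2, class)

# lapply() visits the columns of the data frame itself.
via_lapply <- lapply(d, class)

# One string per column, handy for comparing against expected classes.
first_class <- sapply(d, function(x) class(x)[1])

print(via_apply)
print(via_lapply)
print(first_class)
```

The sapply() form gives one class name per column, which is easier to
compare against a vector of expected classes than the list from
lapply().
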
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> I must admit I had not tried using "write.table" from Splus. I did
> now, again with the "claims" data set. On the first attempt R
> complained that there was no method to convert the character variables
> to the "ordered" class. I made a copy of the data set in Splus, changed
> the class of two variables from "ordered" to "factor" and gave it
> another try. Here are the results:
>
> In Splus:
>
> new.claims <- claims
> class(new.claims$age) <- "factor"
> class(new.claims$car.age) <- "factor"
> apply(new.claims, 2, class)
>      age  car.age     type      cost    number
> "factor" "factor" "factor" "numeric" "numeric"
> write.table(data = new.claims,
>             file = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
>             sep = "@", append = F, quote.strings = T,
>             dimnames.write = T, na = NA, end.of.row = "\n",
>             justify.format = "decimal")
>
> In R:
>
> claims.classes <- c("character", "factor", "factor", "factor",
>                     "numeric", "numeric")
> ## The first "character" is for the row.names
> claims <-
>   read.table(file = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
>              header = TRUE, sep = "@", quote = "\"", as.is = FALSE,
>              strip.white = FALSE, comment.char = "", na.strings = "NA",
>              nrows = 200, colClasses = claims.classes)
> apply(claims, 2, class)
>   row.names         age     car.age        type        cost      number
> "character" "character" "character" "character" "character" "character"
>
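
One way around the "ordered" problem, sketched under assumptions: read
the column as a plain factor and restore the ordering afterwards with
factor(..., ordered = TRUE). The file contents and the level order
below are invented for illustration; the real level order would have to
be taken from the S+ side (for example from levels(claims$age)).

```r
# Hypothetical stand-in for the exported file: made-up values, "@" as
# separator, quoted strings, no row-names column.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  '"age"@"type"@"cost"',
  '"21-24"@"A"@289',
  '"17-20"@"B"@372',
  '"25-29"@"A"@189'
), tmp)

# Read the ordered column as an ordinary factor first.
d <- read.table(tmp, header = TRUE, sep = "@", quote = "\"",
                comment.char = "", na.strings = "NA",
                colClasses = c("factor", "factor", "numeric"))

# Restore the ordering explicitly. The level order here is an assumption.
age_levels <- c("17-20", "21-24", "25-29")
d$age <- factor(d$age, levels = age_levels, ordered = TRUE)

# Check the classes column by column (not with apply()).
classes <- sapply(d, function(x) class(x)[1])
print(classes)

unlink(tmp)
```

Giving the levels explicitly avoids relying on the alphabetical order
that a freshly read factor gets by default.
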
>
> I'd be more than happy to supply you with a small sample of my data
> set if the built-in "claims" doesn't do the job.
>
> Thanks for your support,
>
> Gérald Jean
> Senior Statistical Advisor,
> VP Planning and Market Development,
> Desjardins Groupe d'Assurances Générales
> telephone: (418) 835-4900 ext. 7639
> fax: (418) 835-6657
> e-mail: gerald.jean at dgag.ca
>
> "In God we trust, all others must bring data" W. Edwards Deming
>
>
> "William Dunlap" <wdunlap at tibco.com> wrote on 2010/02/05 12:37:25:
>
> > For a data.frame with only numeric and factor
> > columns using dump() on the S+ end and source()
> > on the R end ought to work. If you have timeDate
> > columns you will need to convert them to character
> > data before exporting and convert them to your
> > favorite R time/date class after importing them.
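
For the R end of the date conversion, a minimal sketch, assuming the
S+ timeDate column was exported as character strings of the form
"YYYY-MM-DD HH:MM:SS" in UTC. Both the format and the time zone are
assumptions; adjust them to whatever the export actually produced.

```r
# Hypothetical character timestamps as they might arrive from S+.
stamp_chr <- c("2010-02-05 12:37:25", "2010-02-05 20:32:36")

# Convert to POSIXct with an explicit format and time zone, so the
# result does not depend on the local settings of the R session.
stamp <- as.POSIXct(stamp_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# If only the calendar date matters, drop the time of day.
day <- as.Date(stamp, tz = "UTC")

print(stamp)
print(day)

# Strings that do not match the format become NA, so check for failures.
stopifnot(!anyNA(stamp))
```
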
> >
> > If you could send me a fairly small sample of your
> > data that shows the incompatibility between S+'s
> > write.table and R's read.table I could try to fix
> > things up so they were more compatible.
> >
> > Code that reads the S+ native binary format must
> > be 32/64 bit aware, since S+ integers are 32 bits
> > on 32-bit versions of S+ and 64 bits on 64-bit
> > versions.
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> > > -----Original Message-----
> > > From: r-help-bounces at r-project.org
> > > [mailto:r-help-bounces at r-project.org] On Behalf Of Uwe Ligges
> > > Sent: Friday, February 05, 2010 8:05 AM
> > > To: Gerald Jean
> > > Cc: r-help at r-project.org
> > > Subject: Re: [R] Importing data coming from Splus into R.
> > >
> > > 1. I am stuck with a copy of S-PLUS 4.x. At that time I used dump()
> > > in S-PLUS and source() to get things into R afterwards ...
> > >
> > > 2. Why do you think that 32-bit vs. 64-bit issues matter? The file
> > > format does not change (well, this is guessed since I do not have
> > > any 64-bit S-PLUS version available).
> > >
> > > Best,
> > > Uwe Ligges
> > >
> > >
> > > On 05.02.2010 16:35, gerald.jean at dgag.ca wrote:
> > > >
> > > > Hello there,
> > > >
> > > > I spent all day yesterday trying to get a small data set from
> > > > Splus into R, with no luck! Both Splus and R run on a 64-bit RedHat
> > > > Linux machine; both programs are 64-bit, and the versions are as
> > > > follows:
> > > >
> > > > Splus:
> > > > TIBCO Software Inc. Confidential Information
> > > > Copyright (c) 1988-2008 TIBCO Software Inc. ALL RIGHTS RESERVED.
> > > > TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 64-bit : 2008
> > > >
> > > > R:
> > > > R version 2.8.0 (2008-10-20)
> > > > Copyright (C) 2008 The R Foundation for Statistical Computing
> > > > ISBN 3-900051-07-0
> > > >
> > > > I know that the "foreign" package has a function to import Splus
> > > > data sets directly into R, but I also know that it works only for
> > > > 32-bit versions of the software, hence I didn't try that route.
> > > > Here is what I have done:
> > > >
> > > > In Splus:
> > > >
> > > > ttt <- exportData(data = FMD.CR.test,
> > > >                   file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > > >                   type = "ASCII", delimiter = "@", quote = T,
> > > >                   na.string = "NA")
> > > > ttt.class <- unlist(lapply(FMD.CR.test, class))
> > > >
> > > > ### I am using "@" as the delimiter since some factor levels
> > > > ### contain both "," and ";".
> > > >
> > > > In R:
> > > >
> > > > FMD.CR.test.fields <-
> > > >   count.fields(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > > >                sep = "@", quote = "\"", comment.char = "")
> > > > all(FMD.CR.test.fields == 327)
> > > > [1] TRUE  ## Hence all observations have the same number of fields;
> > > >           ## so far, so good!
> > > >
> > > > FMD.CR.test.classes <- c("factor", "character", "factor", "factor",
> > > >                          "factor", "factor", "factor", "factor",
> > > >                          "factor", "factor", "factor", "numeric",
> > > >                          "character", and so on)
> > > > names(FMD.CR.test.classes) <- c("RTA", "police", "mnt.rent.bnct",
> > > >                                 "mnt.rent.boni", "mnt.rent.cred.bnct",
> > > >                                 "mnt.rent.epar.bnct", "mnt.rent.snbn",
> > > >                                 "mnt.rent.trxl", "solde.eop",
> > > >                                 "solde.nenr.es", "solde.enr.es",
> > > >                                 "num.enreg", "trouve", and so on)
> > > > FMD.CR.test <-
> > > >   read.table(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > > >              header = TRUE, sep = "@", quote = "\"", as.is = FALSE,
> > > >              strip.white = FALSE, comment.char = "", na.strings = "NA",
> > > >              nrows = 65000, colClasses = FMD.CR.test.classes)
> > > > dim(FMD.CR.test)
> > > > [1] 64093 327  ## OK
> > > >
> > > > ### Testing if classes are the same as the Splus classes.
> > > >
> > > > FMD.CR.test.R.classes<- apply(FMD.CR.test, 2, FUN = class)
> > > > sum(FMD.CR.test.R.classes == FMD.CR.test.classes)
> > > > [1] 79 ## Not exactly what I was expecting!
> > > > all(FMD.CR.test.R.classes == "character")
> > > > [1] TRUE
> > > >
> > > > Hence all variables were imported as character, which I find very
> > > > inconvenient. Since the data set has a few hundred factor variables,
> > > > recoding them is a lot of work, and this work has already been done
> > > > in Splus; furthermore, the numeric variables would need conversion
> > > > as well.
> > > >
> > > > I tried all combinations of the arguments "as.is",
> > > > "stringsAsFactors" and "colClasses" to no avail. I also tried to
> > > > export the data set in SAS transport format from Splus and read it
> > > > through the foreign package's read.xport function; always the same
> > > > result, everything is imported as character. I searched the r-help
> > > > archives and found several messages relating to this problem but no
> > > > satisfactory solution!
> > > >
> > > > I am a long-time user of Splus and I am planning to use R more
> > > > often, mainly due to its wealth of packages and the convenience of
> > > > installing them. I hope to find a reliable and user-friendly way of
> > > > transferring data between the two cousin pieces of software.
> > > >
> > > > Thanks for any insights,
> > > >
> > > > Gérald Jean
> > > > Senior Statistical Advisor,
> > > > VP Planning and Market Development,
> > > > Desjardins Groupe d'Assurances Générales
> > > > telephone: (418) 835-4900 ext. 7639
> > > > fax: (418) 835-6657
> > > > e-mail: gerald.jean at dgag.ca
> > > >
> > > > "In God we trust, all others must bring data" W. Edwards Deming
> > > >
> > > >
> > > > This communication (and/or the attachments) is intended for named
> > > > recipients only and may contain privileged or confidential
> > > > information which is not to be disclosed. If you received this
> > > > communication by mistake please destroy all copies.
> > > >
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.