[R] Numeric class and sasxport.get

Frank E Harrell Jr f.harrell at vanderbilt.edu
Thu Feb 5 00:06:45 CET 2009


Sebastien Bihorel wrote:
> I also realized the flaw after testing the script on various datasets...
> 
> Following up on your last note:
> 1- Is that the reason why the class of integer and regular numeric 
> variable is solely "labelled" following sasxport.get?

Yes.  R gurus might correct me, but just creating a numeric vector 
doesn't create a 'hard' class, and adding your own class attribute equal 
to 'numeric' or 'integer' might cause a problem downstream.
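
A minimal sketch of what 'soft' means here, using only base R. The vector y is a hand-built stand-in for an imported column (an assumption: real sasxport.get output may carry more attributes), and the label text is made up.

```r
# A bare integer vector has an implicit class but no class attribute.
x <- 1:3
class(x)            # "integer"
oldClass(x)         # NULL: nothing is stored on the object itself

# Hypothetical stand-in for a variable returned by sasxport.get: only the
# "labelled" class and a "label" attribute are set by hand here.
y <- structure(1:3, label = "Visit number", class = "labelled")
class(y)            # "labelled": the explicit attribute hides the implicit class
typeof(y)           # "integer": the storage type is unchanged
class(unclass(y))   # "integer": the implicit class is still recoverable
```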

> 2- Can class be 'soft' for other 'kind' of variables?

Not that I can recall.

> 3- Would you anticipate the following wrapper function to generate 
> incompatibilities with other R functions?

I'm going to beg off on that.  I'm not enough of an expert on the impact 
of adding such classes.
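
For anyone who wants to try the idea without an .xpt file, here is a rough sketch of the class-adding step on a hand-built data frame. addNativeClass and the demo data are hypothetical names for illustration and are not part of Hmisc. The sketch only shows that the classes come out as intended, not that downstream functions will accept them.

```r
# Hypothetical helper (not part of Hmisc): for every column whose only class
# is "labelled", append the implicit class of the underlying vector.
addNativeClass <- function(df) {
  for (j in seq_along(df)) {
    cl <- class(df[[j]])
    if (identical(cl, "labelled")) {
      class(df[[j]]) <- c(cl, class(unclass(df[[j]])))
    }
  }
  df
}

# Hand-built demo data (assumption: mimics imported columns; no SAS file used).
demo <- structure(
  list(
    id   = structure(1:3, label = "Subject ID", class = "labelled"),
    dose = structure(c(0.5, 1, 2), label = "Dose (mg)", class = "labelled"),
    sex  = structure(factor(c("F", "M", "F")), label = "Sex",
                     class = c("labelled", "factor"))
  ),
  class = "data.frame", row.names = 1:3
)

out <- addNativeClass(demo)
lapply(unclass(out), class)   # columns that already had two classes are untouched
```

Using seq_along() rather than 1:length() keeps the loop from running when there are no columns to process.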

Frank

> 
> 
> SASxpt.get <- function(file, force.single = TRUE,
>                  method=c('read.xport','dataload','csv'), formats=NULL,
>                  allow=NULL, out=NULL, keep=NULL, drop=NULL, as.is=0.5,
>                  FUN=NULL) {
> 
>  foo <- sasxport.get(file=file, force.single=force.single, method=method,
>                      formats=formats, allow=allow, out=out, keep=keep,
>                      drop=drop, as.is=as.is, FUN=FUN)
> 
>  # For each variable of class "labelled" (and only "labelled"), add the
>  # native class as a second class argument
> 
>  sglClassVarInd <- which(lapply(lapply(unclass(foo),class),length)==1)
> 
>  for (i in seq_along(sglClassVarInd)){
>    x <- foo[,sglClassVarInd[i]]
>    if (class(x)=="labelled")
>      class(foo[,sglClassVarInd[i]]) <- c(class(x), class(unclass(x)))
>  }
>  return(foo)
> }
> 
> 
> *Sebastien Bihorel, PharmD, PhD*
> PKPD Scientist
> Cognigen Corp
> Email: sebastien.bihorel at cognigencorp.com 
> Phone: (716) 633-3463 ext. 323
> 
> 
> Frank E Harrell Jr wrote:
>> Sebastien Bihorel wrote:
>>> Thanks a lot Frank,
>>>
>>> One last question, though. I was tempted to remove all attributes of 
>>> my variables after the sasxport.get call using
>>> foo <- sasxport.get(...)
>>> foo <- as.data.frame(lapply(unclass(foo),as.vector))
>>> Since I have never worked with objects of class 'labelled', I was 
>>> wondering what I will lose by removing this attribute.
>>
>> Not a good idea, for many reasons including dates and other types.
>>
>> And the labelled type is needed if you subset the data, in order to keep 
>> the labels.
>>
>> Note that your original issue is related to "class" being "soft" for 
>> integers and regular numerics:
>>
>> > x <- 1:3
>> > attributes(x)
>> NULL
>> > class(x)
>> [1] "integer"
>> > x <- runif(3)
>> > class(x)
>> [1] "numeric"
>> > attributes(x)
>> NULL
>>
>> Frank
>>
>>>
>>>
>>> Frank E Harrell Jr wrote:
>>>> Sebastien.Bihorel at cognigencorp.com wrote:
>>>>> The problem is actually not related to a broken command but to an
>>>>> attempt at operational qualification of R. A few years ago, my company
>>>>> developed a set of scripts for the 'operational qualification' of
>>>>> Splus. We are switching to R, so I am currently trying to port the
>>>>> scripts to R.
>>>>> All Splus scripts imported SAS data using the importData function,
>>>>> which I replaced with sasxport.get. One particular script returns the
>>>>> class of each variable of the imported data frame; the output must
>>>>> match the expected values: numeric, factor, integer, etc. The R
>>>>> 'translation' with sasxport.get is thus problematic.
>>>>> If there is no easy tweak of the function, we will probably have to
>>>>> remove this script from our list of 'qualification' scripts.
>>>>>
>>>>> Although it would be nice
>>>>
>>>> Then my advice is to write your own wrapper function for 
>>>> sasxport.get that takes its output, looks for labelled variables, 
>>>> and adds a new class of your choosing depending on properties of the 
>>>> variable, making sure that you write methods needed for that class 
>>>> (if any).  Then test your new function, not sasxport.get explicitly.
>>>>
>>>> Frank
>>>>
>>>>>
>>>>>> Sebastien Bihorel wrote:
>>>>>>> Frank,
>>>>>>>
>>>>>>> It is a non-issue for me if the variables of class "labelled"
>>>>>>> (and only "labelled") can only be numerical variables (integer or
>>>>>>> numeric).
>>>>>>>
>>>>>>> Sebastien
>>>>>> 'labelled' can apply to any type of vector.  I'm not clear on the
>>>>>> problem this causes you.  Please provide a command that is broken by
>>>>>> this behavior.
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>>> Frank E Harrell Jr wrote:
>>>>>>>> Sebastien Bihorel wrote:
>>>>>>>>> Dear R-users,
>>>>>>>>>
>>>>>>>>> The sasxport.get function (from the Hmisc package) automatically
>>>>>>>>> defines the class of imported variables. I have noticed that the
>>>>>>>>> class of theoretically numeric variables is simply "labelled",
>>>>>>>>> although character variables might end up being defined as
>>>>>>>>> "labelled" "Date" or "labelled" "factor".
>>>>>>>>> Is there a way to tell sasxport.get to define numeric variables as
>>>>>>>>> "labelled" "integer" or "labelled" "numeric"?
>>>>>>>> Sebastien,
>>>>>>>>
>>>>>>>> If that would fix a problem you're having we could look into it.
>>>>>>>> Otherwise I'd tend to leave well enough alone.
>>>>>>>>
>>>>>>>> Frank
>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>>
>>>>>>>>> Sebastien
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> R-help at r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>>>


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



