[R] use of variable labels

Thomas Lumley tlumley at u.washington.edu
Wed Apr 9 00:19:06 CEST 2003


On Tue, 8 Apr 2003, janet rosenbaum wrote:
>
> The mean was just an example.  We have a 4000 line program that expects
> numbers.  I was hoping that there would be some way of dealing with this
> problem on the level of the data.frame.

as.data.frame(lapply(df,as.numeric))

would work if all your variables were either unlabelled or completely
labelled, but it doesn't seem any simpler than using convert.factors=FALSE.
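
To make that concrete, here is a minimal sketch on a made-up data frame
(the names df, sex and age are hypothetical, not from the data in
question). One thing to keep in mind: as.numeric() on a factor returns
the position of each level (1, 2, ...), which need not be the same as
the numeric codes stored in the Stata file.

```r
# Hypothetical data: one completely labelled variable (a factor)
# and one unlabelled numeric variable.
df <- data.frame(
  sex = factor(c("male", "female", "female"), levels = c("male", "female")),
  age = c(18, 30, 41)
)

# Convert every column to numeric. Factors become their level
# positions (1 = first level, 2 = second level, ...).
num <- as.data.frame(lapply(df, as.numeric))
str(num)
```

If the program needs the original Stata codes rather than level
positions, convert.factors=FALSE is probably the safer route.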

> I'm guessing I'm just going to have to throw out the labels since it's
> not practical to cast as a number every time and I also just noticed
> something strange about having convert.factors=TRUE:
>
> When I do
> read.dta("filename.dta")
> some of the variables which are numbers are read as NA:
>      age         educyrs
>       refuse:   0   refuse:   0
>       DK    :   0   DK    :   0
>       NA's  :1068   NA's  :1068
>
> When I do
> read.dta("filename.dta", convert.factors=FALSE)
> the variables are again treated like numbers:
>
>       age           educyrs
>  Min.   :18.00   Min.   : 0.00
>  1st Qu.:30.00   1st Qu.: 5.00
>  Median :41.00   Median : 9.00
>  Mean   :43.18   Mean   : 8.65
>  3rd Qu.:54.00   3rd Qu.:12.00
>  Max.   :88.00   Max.   :40.00
>  NA's   :18.00   NA's   :87.00
>
> I'm guessing that this means that by default -only- the labels are used
> when convert.factors=TRUE, and even variables without labels have to be
> cast as numbers.

No, that is not the case.  I suspect you have value labels
declared in Stata for these variables; it's just that the variables don't
take on the labelled values.

read.dta does assume that if any value of a variable has a label then all
values should. It doesn't, for example, handle labels for different types
of missing value on an otherwise numeric variable.
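
The effect can be imitated in plain R, without read.dta. This is only a
sketch of the mechanism, assuming the conversion behaves like a call to
factor() with the labelled codes as levels; the codes 98 and 99 and the
vector names are invented for illustration.

```r
# Hypothetical numeric variable with one missing value.
age <- c(18, 30, 41, NA)

# Hypothetical value labels: only two special codes are labelled.
value_labels <- c(refuse = 98, DK = 99)

# Treat the labelled codes as the complete set of levels. Any value
# that is not one of the levels becomes NA.
age_f <- factor(age, levels = value_labels, labels = names(value_labels))
summary(age_f)
```

None of the observed ages matches a labelled code, so every entry ends
up as NA, much like the summary quoted above.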

	-thomas


