[R] Converting factors back to numbers. Trouble with SPSS import data
Paul Johnson
pauljohn32 at gmail.com
Sun Feb 19 21:16:53 CET 2006
I'm using Fedora Core 4, R-2.2.
The basic question is: can one recover the numerical values used in
SPSS after importing data into R with read.spss from the foreign
package? Here's why I ask.
My colleague sent an SPSS data set. I must replicate some results she
calculated in SPSS, and one problem is that the numeric codes SPSS uses
for variable values are not easily recovered in R.
I'm comparing two imported data sets: "eldat" (read.spss without
conversion to factors) and "eldatfac" (read.spss with conversion to
factors).
If I bring in the data without conversion to factors:
library(foreign)
eldat <- read.spss("18CitySCBSsorted.sav", use.value.labels = FALSE,
                   to.data.frame = TRUE)
I can see that the variable HAPPY is coded 0, 1, 2, 3. Those are the
numbers SPSS uses as values when it runs a regression with HAPPY.
By contrast, suppose I let R convert the variables that have only a few
value labels into factors:
library(foreign)
eldatfac <- read.spss("18CitySCBSsorted.sav",
                      max.value.labels = 7, to.data.frame = TRUE)
Consider the first 50 observations of the variable HAPPY:
> f<- eldatfac$HAPPY[1:50]
> f
[1] Happy Happy Very happy Happy Very happy
[6] Very happy Happy Very happy Happy Very happy
[11] Happy Happy Not very happy Very happy Very happy
[16] Happy Happy Very happy Happy Happy
[21] Not very happy Happy Happy Very happy Happy
[26] Happy Happy Happy Happy Happy
[31] Happy Happy Happy Happy Happy
[36] Happy Very happy Very happy Happy Very happy
[41] Very happy Very happy Happy Very happy Very happy
[46] Happy Happy Happy Very happy Very happy
6 Levels: Not happy at all Not very happy Happy Very happy ... Refused
> levels(f)
[1] "Not happy at all" "Not very happy" "Happy" "Very happy"
[5] "Don't know" "Refused"
I need the numerical values back so I can run the same regression SPSS
ran. Isn't this what ?factor says one ought to do? Why are these all
missing?
> as.numeric(levels(f))[f]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
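A likely explanation, sketched on made-up toy factors rather than the SPSS file: the ?factor idiom as.numeric(levels(f))[f] is meant for factors whose level labels are themselves numbers (such as "10" or "20"). When the levels are text such as "Happy", as.numeric() cannot parse them, so every element comes back NA, while as.numeric(f) returns the factor's internal integer codes (positions in levels(f)).

```r
# Levels are numeric strings: the ?factor idiom recovers the numbers.
f_num <- factor(c("10", "20", "10", "30"))
as.numeric(levels(f_num))[f_num]   # 10 20 10 30

# Levels are text labels: as.numeric() on the labels cannot parse them,
# so every value becomes NA (R also warns about coercion, silenced here).
f_lab <- factor(c("Happy", "Very happy", "Happy"),
                levels = c("Not happy at all", "Not very happy",
                           "Happy", "Very happy"))
suppressWarnings(as.numeric(levels(f_lab))[f_lab])   # NA NA NA

# as.numeric(f) gives the 1-based positions of each value in levels(f).
as.numeric(f_lab)   # 3 4 3
```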
> as.numeric(f)
[1] 3 3 4 3 4 4 3 4 3 4 3 3 2 4 4 3 3 4 3 3 2 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 4 4
[39] 3 4 4 4 3 4 4 3 3 3 4 4
Comparing against as.numeric() applied to the unconverted variable, I
can see that the codes differ by exactly one:
> g <- eldat$HAPPY[1:50]
> as.numeric(g)
[1] 2 2 3 2 3 3 2 3 2 3 2 2 1 3 3 2 2 3 2 2 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 3 3
[39] 2 3 3 3 2 3 3 2 2 2 3 3
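For HAPPY the shift is constant because SPSS numbers the labels 0, 1, 2, 3 in the same order as the factor levels, while R's internal codes start at 1. A minimal sketch on a hypothetical toy vector (not the survey data), assuming that ordering holds, is to subtract one from the integer codes:

```r
# Toy stand-in for eldatfac$HAPPY (hypothetical data, not the SPSS file).
happy_labels <- c("Not happy at all", "Not very happy", "Happy", "Very happy")
f <- factor(c("Happy", "Happy", "Very happy", "Not very happy"),
            levels = happy_labels)

# Assumption: SPSS codes these labels 0, 1, 2, 3 in the same order as
# levels(f), so each R integer code is the SPSS code plus one.
spss_codes <- as.numeric(f) - 1
spss_codes   # 2 2 3 1
```

This shortcut only works when the SPSS codes are consecutive integers starting at 0 in level order. The output above doesn't show which codes "Don't know" and "Refused" carry, so those levels would need to be checked separately.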
I'm more worried about variables that SPSS codes irregularly, such as
1, 3, 7, 11, where the factor's codes 1, 2, 3, 4 do not differ from the
SPSS values by a constant.
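The most direct route is to run the regression on the unconverted column from eldat. When only the factor is at hand, one general approach is to look each label up in an explicit label-to-code map. The sketch below assumes such a map is available as a named numeric vector (names are the value labels, values are the SPSS codes); the labels, codes, and the helper labels_to_codes() are all hypothetical. My understanding is that read.spss with use.value.labels = FALSE may attach a map like this to each column as a "value.labels" attribute, but that is worth confirming with attributes(eldat$HAPPY) on your own installation.

```r
# Hypothetical label-to-code map with irregular SPSS codes 1, 3, 7, 11.
code_map <- c("Strongly agree" = 1, "Agree" = 3, "Disagree" = 7,
              "Strongly disagree" = 11)

x <- factor(c("Agree", "Strongly disagree", "Strongly agree", NA),
            levels = names(code_map))

# Hypothetical helper: match() finds each label's position in names(map);
# indexing the map by that position returns the SPSS code. Missing labels
# stay NA, and unname() drops the label names from the result.
labels_to_codes <- function(fac, map) {
  unname(map[match(as.character(fac), names(map))])
}

spss_x <- labels_to_codes(x, code_map)
spss_x   # 3 11 1 NA
```

Matching on the label text, rather than on level positions, keeps the result correct even if the factor's levels are ordered differently from the SPSS codes.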
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas