[R] Survey Design / Rake questions
Thomas Lumley
tlumley at u.washington.edu
Thu Aug 28 20:42:52 CEST 2008
On Mon, 25 Aug 2008, Farley, Robert wrote:
> I see a number of things that bother me.
> 1) str(ByEBNum$StnTraveld) says "int [1:12] 1 2 3 4 5 6 7 8 9 10 ..."
> Even though "StnTraveld <- c(as.factor(1:12))"
You don't want the c().
> a<-as.factor(1:12)
> str(a)
Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
> str(c(a))
int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
As the help for c() says, "all attributes except names are removed",
which includes the factor levels and class.
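
A minimal sketch of the corrected marginal table, reusing the names and totals from the quoted script below and simply dropping the c() wrapper. It uses base R only. One caveat, stated as far as I know rather than as something from this thread: R 4.1.0 and later give c() a factor method, so on a recent R the str(c(a)) output above may show a factor instead of an int vector. Building the factor directly works on either version.

```r
# Marginal table for the number of stations travelled, without c().
StnTraveld <- as.factor(1:12)
EBNumStn <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500, 400, 200, 50, 50)
ByEBNum <- data.frame(StnTraveld, Freq = EBNumStn)

# The column is now a factor with 12 levels rather than a bare int vector.
str(ByEBNum$StnTraveld)

# For comparison: what remains once the factor attributes are gone.
# as.integer() returns only the underlying integer codes.
codes <- as.integer(StnTraveld)
str(codes)
```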
> 2) ByEBOn$StnName[1:5] seems to imply I have extra spaces in the data. Where would they have come from?
No, that's just R padding the printed values so that they line up in columns.
> a<-factor(1:12, labels=c(1:11,"antidisestablishmentarianism"))
> a
 [1] 1                            2
 [3] 3                            4
 [5] 5                            6
 [7] 7                            8
 [9] 9                            10
[11] 11                           antidisestablishmentarianism
Levels: 1 2 3 4 5 6 7 8 9 10 11 antidisestablishmentarianism
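
To tell print padding from spaces that are really stored in the data, one option is to inspect the levels themselves. The sketch below uses a made-up factor (the labels and the stray space are hypothetical, not taken from the survey file) and the same sub(' +$', '', ...) idiom the quoted script applies to direction_.

```r
# Toy factor: the label "Tampa " carries a genuine trailing space.
x <- factor(c("De Soto", "Tampa ", "Warner Center"))

# nchar() counts the stored characters, independent of how print() pads.
nchar(levels(x))

# Flag any level that ends in one or more spaces.
has_trailing <- grepl(" +$", levels(x))
levels(x)[has_trailing]

# Strip trailing spaces from the levels in place.
levels(x) <- sub(" +$", "", levels(x))
```

If has_trailing is FALSE for every level, the gaps seen on screen are only column alignment.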
> 3) I'd like to verify that the order (value) of "EBSurvey$lineon"
> matches my definition in "StnName"
all(levels(EBSurvey$lineon)==StnName)
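
A sketch of how the check can be made to pass, in base R only. The small lineon vector is a hypothetical stand-in for EBSurvey$lineon; StnName and EBOnNewTots are copied from the quoted script. Naming the margin column lineon is an assumption on my part, based on the usual convention that the margin data frame carries the same variable name as the rake() formula. identical() is used instead of all(... == ...) because it also returns FALSE when the lengths differ, rather than recycling.

```r
StnName <- c("Warner Center", "De Soto", "Pierce College", "Tampa", "Reseda",
             "Balboa", "Woodley", "Sepulveda", "Van Nuys", "Woodman",
             "Valley College", "Laurel Canyon", "North Hollywood")
EBOnNewTots <- c(1000, 600, 1200, 500, 1000, 500, 200, 250, 1000, 300,
                 100, 123.65, 0)

# Hypothetical stand-in for EBSurvey$lineon, with levels in route order.
lineon <- factor(c("Pierce College", "Warner Center", "De Soto"),
                 levels = StnName)

# factor() without levels= sorts the labels, so the order no longer matches.
alpha <- factor(StnName)
identical(levels(alpha), levels(lineon))          # FALSE

# Passing levels= explicitly preserves the route order.
in_order <- factor(StnName, levels = StnName)
identical(levels(in_order), levels(lineon))       # TRUE

# Pull the levels straight from the survey variable to build the margin.
ByEBOn <- data.frame(
  lineon = factor(levels(lineon), levels = levels(lineon)),
  Freq   = EBOnNewTots
)
```

Taking the levels from the survey variable itself means the margin table cannot drift out of step with the data, which suits an automated run.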
-thomas
>
> Thanks for helping...
>
>
> ***************************************************************************
> ***************************************************************************
>> library(survey)
>> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
>> #===============================================================================
>> temp <- sub(' +$', '', SurveyData$direction_)
>> SurveyData$direction_ <- temp
>> #===============================================================================
>> SurveyData$NumStn=abs(as.numeric(SurveyData$lineon)-as.numeric(SurveyData$lineoff))
>> mean(SurveyData$NumStn)
> [1] 6.785276
>> ### Kludge
>> SurveyData$NumStn <- pmax(1,SurveyData$NumStn)
>> mean(SurveyData$NumStn)
> [1] 6.789877
>> SurveyData$NumStn <- as.factor(SurveyData$NumStn)
>> ###
>> EBSurvey <- subset(SurveyData, direction_ == "EASTBOUND" )
>> XTTable <- xtabs(~direction_ , EBSurvey)
>> XTTable
> direction_
> EASTBOUND
> 345
>> WBSurvey <- subset(SurveyData, direction_ == "WESTBOUND" )
>> XTTable <- xtabs(~direction_ , WBSurvey)
>> XTTable
> direction_
> WESTBOUND
> 307
>> #
>> EBDesign <- svydesign(id=~sampn, weights=~expwgt, data=EBSurvey)
>> # svytable(~lineon+lineoff, EBDesign)
>> StnName <- c( "Warner Center", "De Soto", "Pierce College", "Tampa", "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys", "Woodman", "Valley College", "Laurel Canyon", "North Hollywood")
>> EBOnNewTots <- c( 1000, 600, 1200, 500, 1000, 500, 200, 250, 1000, 300, 100, 123.65, 0 )
>> StnTraveld <- c(as.factor(1:12))
>> EBNumStn <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500, 400, 200, 50, 50 )
>> ByEBOn <- data.frame(StnName, Freq=EBOnNewTots)
>> ByEBNum <- data.frame(StnTraveld, Freq=EBNumStn)
>> RakedEBSurvey <- rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn, ByEBNum) )
> Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]], :
> Stratifying variables don't match
>>
>> str(EBSurvey$lineon)
> Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
>> EBSurvey$lineon[1:5]
> [1] Pierce College Warner Center Warner Center Warner Center De Soto
> 13 Levels: Warner Center De Soto Pierce College Tampa Reseda Balboa ... North Hollywood
>> str(ByEBOn$StnName)
> Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
>> ByEBOn$StnName[1:5]
> [1] Warner Center De Soto Pierce College Tampa Reseda
> 13 Levels: Balboa De Soto Laurel Canyon North Hollywood ... Woodman
>>
>> str(EBSurvey$NumStn)
> Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
>> EBSurvey$NumStn[1:5]
> [1] 10 12 4 12 8
> Levels: 1 2 3 4 5 6 7 8 9 10 11 12
>> str(ByEBNum$StnTraveld)
> int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
>> ByEBNum$StnTraveld[1:5]
> [1] 1 2 3 4 5
>>
> ***************************************************************************
>
> Robert Farley
> Metro
> www.Metro.net
>
>
> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley at u.washington.edu]
> Sent: Saturday, August 23, 2008 09:38
> To: Farley, Robert
> Cc: r-help at r-project.org
> Subject: Re: [R] Survey Design / Rake questions
>
> On Fri, 22 Aug 2008, Farley, Robert wrote:
>
>> I *think* I'm making progress, but I'm still failing at the same step. My rake call fails with:
>> Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]], :
>> Stratifying variables don't match
>>
>> To my naïve eyes, it seems that my factors are "in the wrong order". If so,
>> how do I "assert" an ordering in my survey dataframe, or copy an "image" from
>> the survey dataframe to my marginals dataframes? I'd prefer to "pull" the
>> original marginals dataframe(s) from the survey dataframe so that I can
>> automate that in production.
>
> It looks like a problem with the NumStn factor. One copy has been converted to character and then factor, giving levels in alphabetical order; the other copy has been converted directly to factor, giving levels in numerical order.
>
> If you use as.factor(1:12) rather than as.character(1:12) it should work.
>
> -thomas
>
>
>
>> If that's not my problem, where might I look for enlightenment? Neither "?why" nor ?whatamimissing return citations. :-)
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle