[R] Survey Design / Rake questions
Thomas Lumley
tlumley at u.washington.edu
Fri Aug 29 19:23:16 CEST 2008
On Thu, 28 Aug 2008, Farley, Robert wrote:
> I'm feeling like I just don't get it. My attempt at rake now fails
> with:
> Error in postStratify.survey.design(design, strata[[i]],
> population.margins[[i]], :
> Stratifying variables don't match
Ah. Now we have an easy one to fix. This means that the names of the
variables don't match, which they don't, because the variable names in the
formula are lineon and NumStn and the variable names in the population
tables are StnName and StnTraveld. You just need to rename the variables
in the population tables.
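A minimal sketch of that renaming, using the population tables from the
listing quoted below. The helper names margin_formulas, margin_tables and
names_match are illustrative and not part of the survey package, and the
rake() call is left commented out because it needs the survey package and
the EBDesign object from the original session:

```r
# Population margins as defined in the quoted listing.
StnName <- as.factor(c("Warner Center", "De Soto", "Pierce College",
                       "Tampa", "Reseda", "Balboa", "Woodley", "Sepulveda",
                       "Van Nuys", "Woodman", "Valley College",
                       "Laurel Canyon", "North Hollywood"))
ByEBOn <- data.frame(StnName,
                     Freq = c(1000, 600, 1200, 500, 1000, 500, 200, 250,
                              1000, 300, 100, 123.65, 0))
ByEBNum <- data.frame(StnTraveld = as.factor(1:12),
                      Freq = c(673.65, 800, 1000, 1000, 800, 700, 600, 500,
                               400, 200, 50, 50))

# Rename the stratifying column in each table to the variable name
# used in the corresponding rake formula.
names(ByEBOn)[names(ByEBOn) == "StnName"] <- "lineon"
names(ByEBNum)[names(ByEBNum) == "StnTraveld"] <- "NumStn"

# Illustrative check (hypothetical helper objects): every variable in a
# formula should appear as a column of the matching population table.
margin_formulas <- list(~lineon, ~NumStn)
margin_tables <- list(ByEBOn, ByEBNum)
names_match <- mapply(function(f, tab) all(all.vars(f) %in% names(tab)),
                      margin_formulas, margin_tables)
print(names_match)

# With library(survey) loaded and EBDesign built as in the listing:
# RakedEBSurvey <- rake(EBDesign, margin_formulas, margin_tables)
```

This only addresses the name mismatch reported in the error message; it
does not check anything else about the margins.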
-thomas
> The factors in the data frame looks fine. Should I have the same
> structure in the design?
>> str(EBDesign$lineon)
> NULL
>> str(EBSurvey$lineon)
> Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
>> str(ByEBOn$StnName)
> Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
>> all(levels(EBSurvey$lineon)==StnName)
> [1] TRUE
>> #
>> str(EBDesign$NumStn)
> NULL
>> str(EBSurvey$NumStn)
> Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
>> str(ByEBNum$StnTraveld)
> Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
>> all(levels(EBSurvey$NumStn)==StnTraveld)
> [1] TRUE
>
> A complete listing is below:
> **************************************************
> **************************************************
> **************************************************
>> sessionInfo() # List loaded packages
> R version 2.7.2 (2008-08-25)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] graphics grDevices utils datasets stats methods base
>
>
> other attached packages:
> [1] survey_3.8-1 fortunes_1.3-5 moonsun_0.1 prettyR_1.3-2
> foreign_0.8-29
>> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav",
> use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
>> #===============================================================================
>> temp <- sub(' +$', '', SurveyData$direction_)
>> SurveyData$direction_ <- temp
>> #===============================================================================
>> # Calc. # stations traversed from StnOn/StnOff
>> SurveyData$NumStn=abs(as.numeric(SurveyData$lineon)-as.numeric(SurveyData$lineoff))
>> #################################################### Kludge
>> mean(SurveyData$NumStn)
> [1] 6.785276
>> SurveyData$NumStn <- pmax(1,SurveyData$NumStn)
>> mean(SurveyData$NumStn)
> [1] 6.789877
>> ####################################################
>> SurveyData$NumStn <- as.factor(SurveyData$NumStn)
>> #===============================================================================
>> # Adjust one direction at a time. Start W/ EB {learn subsetting later}
>> EBSurvey <- subset(SurveyData, direction_ == "EASTBOUND" )
>> EBDesign <- svydesign(id=~sampn, weights=~expwgt, data=EBSurvey)
>> #===============================================================================
>> # New Marginals {start w/ 2 dimensions: StnOn X Distance}
>> StnName <- as.factor(c( "Warner Center", "De Soto", "Pierce College",
> "Tampa", "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys",
> "Woodman", "Valley College", "Laurel Canyon", "North Hollywood"))
>> EBOnNewTots <- c( 1000, 600, 1200,
> 500, 1000, 500, 200, 250, 1000, 300,
> 100, 123.65, 0 )
>> ByEBOn <- data.frame(StnName, Freq=EBOnNewTots)
>> #
>> StnTraveld <- as.factor(1:12)
>> EBNumStn <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500,
> 400, 200, 50, 50 )
>> ByEBNum <- data.frame(StnTraveld, Freq=EBNumStn)
>> #
>> RakedEBSurvey <- rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn,
> ByEBNum) )
> Error in postStratify.survey.design(design, strata[[i]],
> population.margins[[i]], :
> Stratifying variables don't match
>> #
>> str(EBDesign$lineon)
> NULL
>> str(EBSurvey$lineon)
> Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
>> str(ByEBOn$StnName)
> Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
>> all(levels(EBSurvey$lineon)==StnName)
> [1] TRUE
>> #
>> str(EBDesign$NumStn)
> NULL
>> str(EBSurvey$NumStn)
> Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
>> str(ByEBNum$StnTraveld)
> Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
>> all(levels(EBSurvey$NumStn)==StnTraveld)
> [1] TRUE
>> #
> **************************************************
> **************************************************
> **************************************************
>
> Robert Farley
> Metro
> www.Metro.net
>
>
> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley at u.washington.edu]
> Sent: Thursday, August 28, 2008 11:43
> To: Farley, Robert
> Cc: r-help at r-project.org
> Subject: Re: [R] Survey Design / Rake questions
>
> On Mon, 25 Aug 2008, Farley, Robert wrote:
>
>> I see a number of things that bother me.
>> 1) str(ByEBNum$StnTraveld) says "int [1:12] 1 2 3 4 5 6 7 8 9 10 ..."
>> Even though "StnTraveld <- c(as.factor(1:12))"
>
> You don't want the c()
>> a<-as.factor(1:12)
>> str(a)
> Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
>> str(c(a))
> int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
>
> As the help for c() says "all attributes except names are removed.",
> which includes the factor levels.
>
>> 2) ByEBOn$StnName[1:5] seems to imply I have extra spaces in the
>> data. Where would they have come from?
>
> No, that's just R printing things in columns
>> a<-factor(1:12, labels=c(1:11,"antidisestablishmentarianism"))
>> a
>  [1] 1                            2
>  [3] 3                            4
>  [5] 5                            6
>  [7] 7                            8
>  [9] 9                            10
> [11] 11                           antidisestablishmentarianism
> Levels: 1 2 3 4 5 6 7 8 9 10 11 antidisestablishmentarianism
>
>
>> 3) I'd like to verify that the order (value) of "EBSurvey$lineon"
>> matches my definition in "StnName"
>
> all(levels(EBSurvey$lineon)==StnName)
>
> -thomas
>
>
> Thomas Lumley Assoc. Professor, Biostatistics
> tlumley at u.washington.edu University of Washington, Seattle
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle