[R] Survey Design / Rake questions
Farley, Robert
FarleyR at metro.net
Mon Aug 25 18:33:37 CEST 2008
Still no joy. :-(
I see a number of things that bother me.
1) str(ByEBNum$StnTraveld) says "int [1:12] 1 2 3 4 5 6 7 8 9 10 ...", even though I defined it with "StnTraveld <- c(as.factor(1:12))".
2) ByEBOn$StnName[1:5] seems to imply I have extra spaces in the data. Where would they have come from?
3) I'd like to verify that the order (value) of "EBSurvey$lineon" matches my definition in "StnName"
Thanks for helping...
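A minimal base-R sketch of how the three points above might be checked. The vectors are small stand-ins rather than the survey data, and `survey_lineon` is a hypothetical substitute for EBSurvey$lineon. Whether c() keeps the factor class depends on the R version, so the sketch builds the factor with factor() directly instead of relying on either behaviour. The apparent extra spaces in point 2 may simply be padding added when the values are printed; counting characters is one way to tell.

```r
# Stand-in data; the real objects are EBSurvey$lineon and StnName from the post.
StnName <- c("Warner Center", "De Soto", "Pierce College", "Tampa", "Reseda")
survey_lineon <- factor(c("Pierce College", "Warner Center", "De Soto"),
                        levels = StnName)  # hypothetical stand-in for EBSurvey$lineon

# 1) Build the factor directly rather than wrapping it in c().
StnTraveld <- factor(1:12)
is_factor <- is.factor(StnTraveld)

# 2) Count characters per level. Print padding does not show up here,
#    but real trailing blanks would change the counts.
name_factor <- factor(StnName)
widths_raw <- nchar(levels(name_factor))
widths_trimmed <- nchar(sub(" +$", "", levels(name_factor)))
has_trailing_blanks <- any(widths_raw != widths_trimmed)

# 3) Compare the level order of the survey factor with the intended order.
same_order <- identical(levels(survey_lineon), StnName)

print(c(is_factor = is_factor,
        has_trailing_blanks = has_trailing_blanks,
        same_order = same_order))
```

With the real data, the same three checks would be run on ByEBNum$StnTraveld, ByEBOn$StnName, and EBSurvey$lineon.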
***************************************************************************
***************************************************************************
> library(survey)
> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
> #===============================================================================
> temp <- sub(' +$', '', SurveyData$direction_)
> SurveyData$direction_ <- temp
> #===============================================================================
> SurveyData$NumStn=abs(as.numeric(SurveyData$lineon)-as.numeric(SurveyData$lineoff))
> mean(SurveyData$NumStn)
[1] 6.785276
> ### Kludge
> SurveyData$NumStn <- pmax(1,SurveyData$NumStn)
> mean(SurveyData$NumStn)
[1] 6.789877
> SurveyData$NumStn <- as.factor(SurveyData$NumStn)
> ###
> EBSurvey <- subset(SurveyData, direction_ == "EASTBOUND" )
> XTTable <- xtabs(~direction_ , EBSurvey)
> XTTable
direction_
EASTBOUND
345
> WBSurvey <- subset(SurveyData, direction_ == "WESTBOUND" )
> XTTable <- xtabs(~direction_ , WBSurvey)
> XTTable
direction_
WESTBOUND
307
> #
> EBDesign <- svydesign(id=~sampn, weights=~expwgt, data=EBSurvey)
> # svytable(~lineon+lineoff, EBDesign)
> StnName <- c( "Warner Center", "De Soto", "Pierce College", "Tampa", "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys", "Woodman", "Valley College", "Laurel Canyon", "North Hollywood")
> EBOnNewTots <- c( 1000, 600, 1200, 500, 1000, 500, 200, 250, 1000, 300, 100, 123.65, 0 )
> StnTraveld <- c(as.factor(1:12))
> EBNumStn <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500, 400, 200, 50, 50 )
> ByEBOn <- data.frame(StnName, Freq=EBOnNewTots)
> ByEBNum <- data.frame(StnTraveld, Freq=EBNumStn)
> RakedEBSurvey <- rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn, ByEBNum) )
Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]], :
Stratifying variables don't match
>
> str(EBSurvey$lineon)
Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
> EBSurvey$lineon[1:5]
[1] Pierce College Warner Center Warner Center Warner Center De Soto
13 Levels: Warner Center De Soto Pierce College Tampa Reseda Balboa ... North Hollywood
> str(ByEBOn$StnName)
Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
> ByEBOn$StnName[1:5]
[1] Warner Center De Soto Pierce College Tampa Reseda
13 Levels: Balboa De Soto Laurel Canyon North Hollywood ... Woodman
>
> str(EBSurvey$NumStn)
Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
> EBSurvey$NumStn[1:5]
[1] 10 12 4 12 8
Levels: 1 2 3 4 5 6 7 8 9 10 11 12
> str(ByEBNum$StnTraveld)
int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
> ByEBNum$StnTraveld[1:5]
[1] 1 2 3 4 5
>
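One thing that may be worth checking, given the output above: the margin data frames use the column names StnName and StnTraveld, while the rake() formulas name lineon and NumStn, and the two factors for the station names carry their levels in different orders. The sketch below is an assumption about what the margins might need to look like, not a confirmed fix. It takes both the column name and the level order from the survey factors. `toy_survey` is a hypothetical stand-in for EBSurvey, and the totals are copied from the post on the assumption that they are listed in level order.

```r
# Toy stand-in for the survey data frame; only the factor levels matter here.
stations <- c("Warner Center", "De Soto", "Pierce College", "Tampa", "Reseda",
              "Balboa", "Woodley", "Sepulveda", "Van Nuys", "Woodman",
              "Valley College", "Laurel Canyon", "North Hollywood")
toy_survey <- data.frame(
  lineon = factor(c("Pierce College", "Warner Center", "De Soto"), levels = stations),
  NumStn = factor(c(10, 12, 4), levels = 1:12)
)

# Control totals copied from the post (assumed to be in level order).
EBOnNewTots <- c(1000, 600, 1200, 500, 1000, 500, 200, 250, 1000, 300, 100, 123.65, 0)
EBNumStn <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500, 400, 200, 50, 50)

# Take both the column name and the level order from the survey factors.
ByEBOn <- data.frame(
  lineon = factor(levels(toy_survey$lineon), levels = levels(toy_survey$lineon)),
  Freq = EBOnNewTots
)
ByEBNum <- data.frame(
  NumStn = factor(levels(toy_survey$NumStn), levels = levels(toy_survey$NumStn)),
  Freq = EBNumStn
)

str(ByEBOn)
str(ByEBNum)
```

Because the margins are derived from the survey factors themselves, the same construction could be reused for other directions or surveys without retyping the station names.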
********************************************************************************************************************************************************
Robert Farley
Metro
www.Metro.net
-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu]
Sent: Saturday, August 23, 2008 09:38
To: Farley, Robert
Cc: r-help at r-project.org
Subject: Re: [R] Survey Design / Rake questions
On Fri, 22 Aug 2008, Farley, Robert wrote:
> I *think* I'm making progress, but I'm still failing at the same step. My rake call fails with:
> Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]], :
> Stratifying variables don't match
>
> To my naïve eyes, it seems that my factors are "in the wrong order". If so,
> how do I "assert" an ordering in my survey dataframe, or copy an "image" from
> the survey dataframe to my marginals dataframes? I'd prefer to "pull" the
> original marginals dataframe(s) from the survey dataframe so that I can
> automate that in production.
It looks like a problem with the NumStn factor. One copy has been converted to character and then factor, giving levels in alphabetical order; the other copy has been converted directly to factor, giving levels in numerical order.
If you use as.factor(1:12) rather than as.character(1:12) it should work.
-thomas
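The difference in level order described in the reply can be seen in a small base-R sketch. The variable names here are illustrative only and do not come from the thread.

```r
# Converting through character sorts the levels as strings;
# converting the integers directly sorts them as numbers.
via_character <- factor(as.character(1:12))
direct <- as.factor(1:12)

print(levels(via_character))
print(levels(direct))

# Same labels, different order, so the underlying integer codes differ.
same_labels <- setequal(levels(via_character), levels(direct))
same_order <- identical(levels(via_character), levels(direct))
print(c(same_labels = same_labels, same_order = same_order))
```

If two factors need to agree, passing an explicit `levels =` argument to factor() removes the dependence on how the input happened to be sorted.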
> If that's not my problem, where might I look for enlightenment? Neither "?why" nor ?whatamimissing return citations. :-)
>