[R] spsurvey analysis

Law, Jason Jason.Law at portlandoregon.gov
Fri Nov 1 19:47:37 CET 2013

I use the spsurvey package a decent amount.  The cont.cdftest function bins the cdf in order to perform the test which I think is the root of the problem.  Unfortunately, the default is 3 which is the minimum number of bins.

I would contact Tom Kincaid or Tony Olsen at NHEERL WED directly to ask about this problem.

Another option would be to take a different analytical approach (e.g., a mixed effects model) which would allow you a lot more flexibility.

Jason Law
City of Portland
Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
jason.law at portlandoregon.gov

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Tim Howard
Sent: Friday, November 01, 2013 7:49 AM
To: r-help at r-project.org
Subject: [R] spsurvey analysis

I've used the excellent package, spsurvey, to create spatially balanced samples many times in the past. I'm now attempting to use the analysis portion of the package, which compares CDFs among sub-populations to test for differences in sub-population metrics. 
- My data (count data) have many zeros, following a negative binomial or even zero-inflated negative binomial distribution.
- Samples are within polygons of varying sizes
- I want to test whether a sample at time 1 is different from a sample at time 2. Essentially the same sample areas and number of samples.

The problem:
- cont.cdftest  throws a warning and does not complete for most (but not all) species sampled. Warning message: "The combined number of values in at least one class is less than five. Action: The user should consider using a smaller number of classes."

- There are plenty of samples in my two time periods (the dummy set below: Yr1=27, Yr2=31 non-zero values). 
My Question:
Why is it throwing this error and is there a way to get around it?

Reproduceable example (change path to spsurvey sample data), requires us to use spsurvey to generate sample points:

### R code tweaked from vignettes 'Area_Design' and 'Area_Analysis'
### Analysis set up
setwd("C:/Program Files/R/R-3.0.2/library/spsurvey/doc")
att <- read.dbf("UT_ecoregions")
shp <- read.shape("UT_ecoregions")


# Create the design list
Stratdsgn <- list("Central Basin and Range"=list(panel=c(PanelOne=25), seltype="Equal"),
                  "Colorado Plateaus"=list(panel=c(PanelOne=25), seltype="Equal"),
                  "Mojave Basin and Range"=list(panel=c(PanelOne=10), seltype="Equal"),
                  "Northern Basin and Range"=list(panel=c(PanelOne=10), seltype="Equal"),
                  "Southern Rockies"=list(panel=c(PanelOne=14), seltype="Equal"),
                  "Wasatch and Uinta Mountains"=list(panel=c(PanelOne=10), seltype="Equal"),
                  "Wyoming Basin"=list(panel=c(PanelOne=6), seltype="Equal"))

# Select the sample design for each year
Stratsites_Yr1 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
                   type.frame="area", src.frame="sp.object",
                   sp.object=shp, att.frame=att, stratum="Level3_Nam", shapefile=FALSE)

Stratsites_Yr2 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
                   type.frame="area", src.frame="sp.object",
                   sp.object=shp, att.frame=att, stratum="Level3_Nam", shapefile=FALSE)

#extract the core information, add year as a grouping variable, add a plot ID to link with dummy data
Yr1 <- cbind(pltID = 1001:1100, Stratsites_Yr1 at data[,c(1,2,3,5)], grp = "Yr1")
Yr2 <- cbind(pltID = 2001:2100, Stratsites_Yr2 at data[,c(1,2,3,5)], grp = "Yr2")         
sitedat <- rbind(Yr1, Yr2)

# create dummy sampling data. Lots of zeros!
bn.a <- rnbinom(size = 0.06, mu = 19.87, n=100) bn.b <- rnbinom(size = 0.06, mu = 20.15, n=100) dat.a <- data.frame(pltID = 1001:1100, grp = "Yr1",count = bn.a) dat.b <- data.frame(pltID = 2001:2100, grp = "Yr2",count = bn.b) dat <- rbind(dat.a, dat.b)

## Analysis begins here
data.cont <- data.frame(siteID = dat$pltID, Density=dat$count) sites <- data.frame(siteID = dat$pltID, Use=rep(TRUE, nrow(dat))) subpop <- data.frame(siteID = dat$pltID, 
                        Year = dat$grp)
design <- data.frame(siteID = sitedat$pltID,
                        wgt = sitedat$wgt,
                        xcoord = sitedat$xcoord,
                        ycoord = sitedat$ycoord)
framesize <- c("Yr1"=888081202000, "Yr2"=888081202000)                        
## There seem to be pretty good estimates CDF_Estimates <- cont.analysis(sites, subpop, design, data.cont, 
                        popsize = list(All_years=sum(framesize),
                        Year = as.list(framesize)))


## this test fails
CDF_Tests <- cont.cdftest(sites, subpop[,c(1,3)], design, data.cont,

## how many records have values greater than zero, by year?   Probably irrelevant!
notZero <- dat[dat$count > 0,]

### end

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

Thanks in advance.

R-help at r-project.org mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

More information about the R-help mailing list