[R] Percentage cover data with many zeros
Cade, Brian
cadeb at usgs.gov
Mon Jan 26 17:12:31 CET 2015
Ben: You have a statistical problem with a bounded response variable (0 to
100%, or 0.0 to 1.0) and thus might make use of a logistic quantile
regression model (see Bottai et al. 2010. Logistic quantile regression for
bounded outcomes. Statistics in Medicine 29: 309-317). This requires a
logit transformation, log((y - ymin)/(ymax - y)), of your percent cover
response variable, followed by estimation with linear quantile regression
using the rq() function in the quantreg package. There are details in
Bottai et al. that you will need to understand about back-transforming
your estimates, interpretations, etc., but it is fairly easy to use.
Because quantile regression models the conditional cumulative distribution
function, it readily accommodates the heterogeneous variance patterns that
typically occur with bounded outcomes. Depending on the pattern and mass of
zero values, you may still have lower regions of the cumulative
distribution function about which you are able to make no inferential
statements.
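
A minimal sketch of the steps described above, on simulated data rather
than Ben's. The data frame d, the variables cover and zone, the offset eps,
and the chosen quantiles are all assumptions made for illustration. The
offset widens the bounds slightly so that observations sitting exactly on
0 or 100 get a finite logit; check Bottai et al. for how to choose it.

```r
library(quantreg)

set.seed(1)
n <- 300

# Hypothetical data: three shore zones, percent cover with ~30% exact zeros
d <- data.frame(zone = factor(sample(c("High", "Mid", "Low"), n, replace = TRUE)))
d$cover <- ifelse(runif(n) < 0.3, 0,
                  round(100 * rbeta(n, 1 + as.integer(d$zone), 3)))

# Assumed small offset so boundary values (0 or 100) have a finite logit
eps  <- 0.001
ymin <- 0 - eps
ymax <- 100 + eps

# Logit transformation: log((y - ymin)/(ymax - y))
d$logit_cover <- log((d$cover - ymin) / (ymax - d$cover))

# Linear quantile regression on the logit scale at a few upper quantiles
fit <- rq(logit_cover ~ zone, tau = c(0.5, 0.75, 0.9), data = d)

# Predicted quantiles per zone, back-transformed to the percent scale.
# Quantiles are preserved under monotone transformations, so inverting
# the logit gives quantiles of cover itself.
newd <- data.frame(zone = factor(levels(d$zone), levels = levels(d$zone)))
pred_logit <- predict(fit, newdata = newd)
pred_cover <- ymin + (ymax - ymin) * plogis(pred_logit)
rownames(pred_cover) <- levels(d$zone)
pred_cover
```

With roughly 30% zeros in this simulation, quantiles below about 0.3 sit at
the lower bound, which is the situation described above where the lower
part of the distribution supports no inferential statements. Coefficients
are on the logit scale, so they need the interpretation discussed in
Bottai et al. rather than a direct reading as changes in percent cover.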
Brian
Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: cadeb at usgs.gov <brian_cade at usgs.gov>
tel: 970 226-9326
On Sat, Jan 24, 2015 at 6:26 AM, Ben Brooker <awe.ben at googlemail.com> wrote:
> Hi,
>
> I am new to R and have not had the most exposure to statistics.
> I have a dataset of percentage cover (so 0-100) for certain species in 3
> different shore zones (High, mid and low). The data was recorded for
> different protected areas as well (17 of them) and my number of obs is
> large (3358). I'm obviously interested in the difference in percentage
> cover of species between shore zones as well as between protected areas.
> The problem is that my data contains loads of zeros and I haven't dealt yet
> in statistics with how to manipulate the data so as to perform robust tests
> on it. I previously used Kruskal-Wallis ANOVAs to look at cover differences
> in shore zone but I am worried that it is inappropriate because of the
> large sample size that I have and because my variances are not equal.
>
> I've read a bit about using a zero-inflated negative binomial regression to
> fit to my data, but I'm not sure if that will work because it is for count
> data.
>
> I would very much appreciate it if someone could point me in the correct
> direction wrt a transformation that may help or an appropriate model to fit
> or test to use. I've searched quite a bit but I'm out of my depth.
>
> PS sorry if I sound like a halfwit
>
> Thanks a lot
>
> Ben