[R] NADA Data Frame Format: Wide or Long?

MacQueen, Don macqueen1 at llnl.gov
Sat Jul 7 02:03:45 CEST 2012

Hi Rich,

What you're faced with is that the cenros() function has no built-in
support for grouping or subsetting, unlike many other R functions: the
lattice plotting functions, for example, or modeling functions like lm()
that have a subset argument, or functions that employ a conditioning
syntax in their formulas [like  y ~ x | g ]. In effect, this means you
have to roll your own.

The wide format could help, but you would still probably end up writing
loops. Each parameter would then presumably be represented by two columns,
one for the result and one for the non-detection indicator. And they would
all have different names, such as ceneq1.ag, ceneq1.al, and so on. I think
you'd probably end up with more complicated scripts. This approach is
especially tricky if not all analytes and locations were sampled on the
same days (which is normally the case for my data).

You're probably aware that there are various functions for splitting a
dataframe into subsets and then applying the same function to every
subset, such as by() and aggregate(), and probably others. These may turn
out to be fairly simple to use with a NADA function such as cenros(), but
you won't really know until you start trying them.

One can also do it oneself with constructs like

tmpsub <- split( mydf, list(mydf$site, mydf$param) )
tmpss <- lapply(tmpsub, myfun)

where myfun is a wrapper function around, say, cenros().

This is obviously just an outline.
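
One way the outline above might be filled in is sketched below. The toy
data are made up, the column names (site, param, quant, ceneq1) are taken
from this thread, and min_detects is a hypothetical cutoff of my own, not
anything NADA defines. It also assumes mean() and median() work on the
object that cenros() returns.

```r
library(NADA)  # provides cenros(); assumed to be installed

# Made-up toy data in long format. quant is the result (or the reporting
# level for a non-detect); ceneq1 is TRUE for a non-detect.
base_vals <- c(0.5, 0.5, 0.7, 1.2, 2.5, 4.0)
mydf <- data.frame(
  site   = rep(c("S1", "S2"), each = 12),
  param  = rep(rep(c("Ag", "Al"), each = 6), times = 2),
  quant  = c(base_vals, 2 * base_vals, 3 * base_vals, 4 * base_vals),
  ceneq1 = rep(c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE), times = 4)
)

# Wrapper around cenros(). min_detects is a hypothetical cutoff:
# subsets with fewer detected values are skipped and return NULL.
myfun <- function(d, min_detects = 3) {
  if (sum(!d$ceneq1) < min_detects) return(NULL)
  cenros(d$quant, d$ceneq1)
}

# drop = TRUE discards site/param combinations that have no rows
tmpsub <- split(mydf, list(mydf$site, mydf$param), drop = TRUE)
tmpss  <- lapply(tmpsub, myfun)

# Keep only the subsets that were actually fitted, then summarize them
fits <- Filter(Negate(is.null), tmpss)
ros_summary <- data.frame(
  group  = names(fits),
  mean   = sapply(fits, mean),
  median = sapply(fits, median)
)
print(ros_summary)
```

Keeping the fitted objects in a list (tmpss) rather than reducing them
right away means the same fits can be reused later for plots or other
summaries without refitting.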


Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550

On 7/5/12 1:15 PM, "Rich Shepard" <rshepard at appl-ecosys.com> wrote:

>On Thu, 5 Jul 2012, MacQueen, Don wrote:
>> This example follows exactly the example in ?cenros.
>>    with( subset(yourdataframe, param=='Ag'),  cenros(quant,ceneq1) )
>> This should do a simple censored summary statistics calculation for
>> (assuming quant contains your reporting level for censored results,
>> appears to be the case).
>   That makes sense to me. I was hoping to avoid subsetting the data frame
>for each of the 37 chemical parameters, but ... I will review the use of
>> I'd also suggest you try to load your data so that site and param are
>> factors, though this could depend on your ultimate analysis.
>   I do need to differentiate results by site and chemical parameter.
>Many thanks,