[R] ZOO: Learning to apply it to my data
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Sep 14 01:22:13 CEST 2011
On Tue, Sep 13, 2011 at 2:07 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
> I have read ?zoo but am not sure how to relate the parameters (x,
> order.by, frequency, and style) to my data.frame. The structure of the
> data.frame is
>
> 'data.frame': 11169 obs. of 4 variables:
> $ stream : Factor w/ 37 levels "Burns","CIL",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ sampdate: Date, format: "1987-07-23" "1987-09-17" ...
> $ param : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ quant : num 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0 ...
>
> The numeric column ('x' in zoo, I believe) is associated with the unique
> combination of param, sampdate, and stream in each row. For example:
>
> tail(streamdata)
>        stream   sampdate param   quant
> 11164 Winters 2010-06-30   SO4 120.000
> 11165 Winters 2010-06-30    Zn   0.010
> 11166 Winters 2011-06-06    As   0.005
> 11167 Winters 2011-06-06    Cl   5.000
> 11168 Winters 2011-06-06   SO4 150.000
> 11169 Winters 2011-06-06    Zn   0.010
>
> I'm in the early exploratory stage of understanding these data, but want
> to produce time series plots and analyses by stream and param using zoo
> objects since the sampdate varies by both stream and chemical.
>
> I assume that order.by, the index, is sampdate. The frequency option is
> FALSE because these samples are not temporally regular. I've no idea what to
> do with the style option, if anything.
>
> Most of the examples I see on using R (including in the lattice book I'm
> now reading) have one or more numeric columns in the data.frame associated
> with a single factor. I have a single numeric column associated with two
> factors and a date.
>
> If there are other documents or books I should read to learn how to
> effectively use the zoo package for my project (in addition to zoo.pdf that
> lists the methods and is quite obtuse to me), please point me to them. I
> would greatly appreciate any and all help in getting up to speed with zoo.
>
As described in ?zoo, a zoo object is a numeric matrix, numeric vector
or factor together with an ordered time index whose values are unique.
It's not clear that that is what you have; however, if we can assume
that for each value of param the dates are unique, then quant could
form a multivariate zoo series with a Date index, one column per param.
We used text = Lines in read.zoo below to keep the example
self-contained, but in reality the first argument to read.zoo would be
something like "myfile.dat", referring to the file that holds the data.
The "NULL" entries in the colClasses argument of read.zoo cause the
respective columns (here the row names and stream) to be ignored.
Lines <- "stream sampdate param quant
11164 Winters 2010-06-30 SO4 120.000
11165 Winters 2010-06-30 Zn 0.010
11166 Winters 2011-06-06 As 0.005
11167 Winters 2011-06-06 Cl 5.000
11168 Winters 2011-06-06 SO4 150.000
11169 Winters 2011-06-06 Zn 0.010"
library(zoo)
packageVersion("zoo") # should be >= 1.7-4
z <- read.zoo(text = Lines, skip = 1, split = 2,
              colClasses = c("NULL", "NULL", NA, NA, NA))
which gives
> z
              As Cl SO4   Zn
2010-06-30    NA NA 120 0.01
2011-06-06 0.005  5 150 0.01
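
For comparison, the same kind of object can be built directly with
zoo(x, order.by), which may make the arguments in ?zoo easier to relate
to the data frame. This is only a minimal sketch: it assumes a data
frame laid out like the str() output in the question, re-creates just
the six rows shown there, and uses the made-up object names so4 and
z_so4.

```r
library(zoo)

# Re-create the six rows shown in the question
streamdata <- data.frame(
  stream   = "Winters",
  sampdate = as.Date(c("2010-06-30", "2010-06-30", "2011-06-06",
                       "2011-06-06", "2011-06-06", "2011-06-06")),
  param    = c("SO4", "Zn", "As", "Cl", "SO4", "Zn"),
  quant    = c(120, 0.01, 0.005, 5, 150, 0.01),
  stringsAsFactors = FALSE
)

# Take one stream and one param at a time so that the dates are unique
so4 <- subset(streamdata, stream == "Winters" & param == "SO4")
stopifnot(!anyDuplicated(so4$sampdate))  # a zoo index should have no duplicates

# x is the numeric column, order.by is the Date index
z_so4 <- zoo(so4$quant, order.by = so4$sampdate)
z_so4
```

For irregularly spaced samples like these, frequency can simply be left
at its default. As far as I can tell from the help pages, style belongs
to the print method rather than to zoo() itself, so it only affects how
the series is displayed.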
Read over ?zoo and ?read.zoo and also the 5 vignettes. The zoo-read
vignette is entirely about read.zoo. If you really do want to keep
all of that information (stream as well as param) then you might want
to stay with a data frame or possibly use several zoo objects, e.g.
one per stream.
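
One way the "several zoo objects" idea could look is a list holding one
multivariate zoo series per stream. The following is a sketch under
assumptions, not a tested recipe for the real data: streamdata mimics
the data frame in the question, the Winters rows are the ones shown
there, the Burns rows are partly made up for illustration (flagged in
the comments), and pieces and zlist are hypothetical names. It relies on
read.zoo accepting a data frame as its first argument.

```r
library(zoo)

streamdata <- data.frame(
  stream   = c(rep("Burns", 3), rep("Winters", 6)),
  sampdate = as.Date(c("1987-07-23", "1987-09-17",
                       "1987-09-17",              # made-up Burns row
                       "2010-06-30", "2010-06-30", "2011-06-06",
                       "2011-06-06", "2011-06-06", "2011-06-06")),
  param    = c("As", "As", "Cl",
               "SO4", "Zn", "As", "Cl", "SO4", "Zn"),
  quant    = c(0.01, 0.01, 4,                     # the Cl value is made up
               120, 0.01, 0.005, 5, 150, 0.01),
  stringsAsFactors = FALSE
)

# Drop stream, then split the remaining columns into one data frame per stream
pieces <- split(streamdata[c("sampdate", "param", "quant")],
                streamdata$stream)

# In each piece column 1 (sampdate) is the index and column 2 (param)
# defines the series, as with split = 2 in the read.zoo call above
zlist <- lapply(pieces, read.zoo, split = 2)

zlist$Winters
```

Each element of zlist can then be plotted or analysed on its own, e.g.
plot(zlist$Winters). This only works if, within each stream, every
param has at most one value per date.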
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com