[R] relation in aggregated data

Joris Meys jorismeys at gmail.com
Thu Jul 8 10:44:12 CEST 2010

Depending on the data and the research question, a meta-analytic
approach might be appropriate. You can see every campaign as a
"study". See the package metafor for example. You can only draw very
general conclusions, but at least your inference will be closer to
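
To make the "campaign as study" idea concrete, here is a minimal sketch that computes one correlation per campaign and pools them with inverse-variance weights on the Fisher z scale. It is a hand-rolled fixed-effect pooling in base R, not metafor's own interface. The simulated data, the `campaign` factor and all variable names are assumptions for illustration, and it presumes paired x/y values (e.g. daily averages) exist within each campaign.

```r
# Simulated stand-in data (assumption): 8 campaigns of 15 paired values each
set.seed(555)
x <- rnorm(120)
y <- 5 * x + 3 + rnorm(120)
campaign <- rep(1:8, each = 15)

# One correlation per campaign, treating each campaign as a "study"
r_k <- sapply(split(seq_along(x), campaign),
              function(i) cor(x[i], y[i]))
n_k <- as.vector(table(campaign))

# Fisher z transform; its approximate sampling variance is 1 / (n - 3)
z_k <- atanh(r_k)
w_k <- n_k - 3                       # inverse-variance weights

# Fixed-effect pooled estimate and its standard error on the z scale
z_pooled  <- sum(w_k * z_k) / sum(w_k)
se_pooled <- sqrt(1 / sum(w_k))

# Back-transform the estimate and a 95% interval to the correlation scale
r_pooled <- tanh(z_pooled)
ci <- tanh(z_pooled + c(-1, 1) * qnorm(0.975) * se_pooled)
r_pooled
ci
```

With only 4 to 6 campaigns per year the pooled interval will be wide, and a random-effects model may be more appropriate if the relation differs between campaigns.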


On Thu, Jul 8, 2010 at 10:03 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> Thank you
> Actually, when I do this myself I always try to make daily or weekly
> averages if possible. However, this was done by one of my colleagues, and
> the aggregation was done on the basis of campaigns. There are 4 to 6
> campaigns per year, and sometimes there is an apparent relationship in the
> aggregated data and sometimes there is not. My opinion is that I cannot say
> much about exact relations until I have other clues, such as the expected
> underlying laws of physics.
> Thanks again
> Best regards
> Petr
> Joris Meys <jorismeys at gmail.com> wrote on 07.07.2010 17:33:55:
>> Your examples are pretty extreme... Combining 120 data points into 4
>> points is of course never going to give a result. Try:
>> fac <- rep(1:8,each=15)
>> xprum <- tapply(x, fac, mean)
>> yprum <- tapply(y, fac, mean)
>> plot(xprum, yprum)
>> The relation is not obvious, but it is visible.
>> Yes, you lose information. Yes, your hypothesis changes. But in the
>> case you describe, averaging the x-values for every day (so you get an
>> average linked to 1 y value) seems like a possibility, provided you take
>> that into account when formulating the hypothesis. Ideally, you
>> should take the standard error of the average into account in the
>> analysis, but this is complicated and often not done, and in most cases
>> ignoring it does not influence the results enough to matter.
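
To show what "the standard error of the average" refers to, here is a small sketch that computes the daily mean of x together with its standard error. The data are simulated (three x readings per day for 40 days, which is an assumption), and the names `day` and `daily` are hypothetical. How to carry the standard errors into the later analysis, for instance through an errors-in-variables model, is left open here.

```r
# Simulated stand-in data (assumption): 3 readings of x per day, 40 days
set.seed(1)
day <- rep(1:40, each = 3)
x   <- rnorm(120)

# Daily mean, standard deviation and number of readings
x_mean <- tapply(x, day, mean)
x_sd   <- tapply(x, day, sd)
x_n    <- tapply(x, day, length)

# Standard error of each daily mean: sd / sqrt(n)
x_se <- x_sd / sqrt(x_n)

daily <- data.frame(day    = as.integer(names(x_mean)),
                    x_mean = as.vector(x_mean),
                    x_se   = as.vector(x_se))
head(daily)
```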
>> Cheers
>> On Wed, Jul 7, 2010 at 4:24 PM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
>> > Dear all
>> >
>> > My question is more about statistics than about R, but it can be
>> > demonstrated in R. It is about the pros and cons of trying to find a
>> > relationship from aggregated data. I have two variables which may be
>> > related, and I measure them regularly over some time (let's say a year),
>> > but I cannot measure them at the same time (i.e. I cannot measure x and
>> > the respective value of y; usually I have 3 or more values of x and only
>> > one value of y per day).
>> >
>> > I can compute aggregated values (let's say quarterly). My questions are:
>> >
>> > 1.      Is such an approach sound? Can I use it?
>> > 2.      What could the problems be?
>> > 3.      Is there any other method to inspect variables which may be
>> > related but cannot be measured directly at the same time?
>> >
>> > My opinion is that it is not very sound to inspect aggregated values, and
>> > there can be many traps, especially if there are only a few aggregated
>> > values. Below you can see my examples.
>> >
>> > If you have some opinion on this issue, please let me know.
>> >
>> > Best regards
>> > Petr
>> >
>> > Let us have a relation x/y
>> >
>> > set.seed(555)
>> > x <- rnorm(120)
>> > y <- 5*x+3+rnorm(120)
>> > plot(x, y)
>> >
>> > As you can see, there is a clear relation which is visible in the plot.
>> > Now I make a factor for aggregation.
>> >
>> > fac <- rep(1:4,each=30)
>> >
>> > xprum <- tapply(x, fac, mean)
>> > yprum <- tapply(y, fac, mean)
>> > plot(xprum, yprum)
>> >
>> > The relationship is completely gone. Now let us make some other fake data.
>> >
>> > xn <- runif(120)*rep(1:4, each=30)
>> > yn <- runif(120)*rep(1:4, each=30)
>> > plot(xn,yn)
>> >
>> > There is no visible relation; xn and yn are generated independently of
>> > each other, but both depend on the aggregation factor.
>> >
>> > xprumn <- tapply(xn, fac, mean)
>> > yprumn <- tapply(yn, fac, mean)
>> > plot(xprumn, yprumn)
>> >
>> > Here you can see a perfect relation which is due only to the aggregation
>> > factor.
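
The effect in this second example can be put into numbers. The sketch below, which regenerates similar data with its own seed (so the values need not match the plots above), compares the correlation of the raw values, of the four group means, and of the values after subtracting each group's mean. The names `cor_raw`, `cor_agg` and `cor_within` are hypothetical. Note that the raw correlation need not be exactly zero, because both variables share the group scale.

```r
# Same construction as the fake data above, regenerated for self-containment
set.seed(555)
fac <- rep(1:4, each = 30)
xn  <- runif(120) * fac
yn  <- runif(120) * fac

# Correlation of the raw, unaggregated values
cor_raw <- cor(xn, yn)

# Correlation of the four group means (what the aggregated plot shows)
cor_agg <- cor(tapply(xn, fac, mean), tapply(yn, fac, mean))

# Correlation within groups: subtract each group's mean first
xw <- xn - ave(xn, fac)   # ave() returns the group mean for every row
yw <- yn - ave(yn, fac)
cor_within <- cor(xw, yw)

c(raw = cor_raw, aggregated = cor_agg, within = cor_within)
```

If the aggregated correlation is high while the within-group correlation stays near zero, the apparent relation is carried by the grouping factor rather than by x and y themselves.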
>> >
>> --
>> Joris Meys
>> Statistical consultant
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>> tel : +32 9 264 59 87
>> Joris.Meys at Ugent.be
>> -------------------------------
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

