[R] Mean-Centering Question
David L Carlson
dcarlson at tamu.edu
Sun Dec 9 18:55:18 CET 2012
If you are willing to rethink the definition of your special function, the
process can be simplified. The function lmc() log-mean centers a single
grouped numeric vector. Then sapply() can be used to center a batch of them.
> lmc <- function(x, g) unsplit(lapply(split(log(x), g), scale,
scale=FALSE), g)
> dat2 <- data.frame(dat[,1:2], sapply(dat[,3:4], lmc, g=dat[,1]))
> dat2
Location X..TimePeriod Units AveragePrice
1 Los Angeles 5/1/11 0.213682659 0.071790268
2 Los Angeles 5/8/11 -0.005370907 -0.072872965
3 Los Angeles 5/15/11 -0.208311751 0.001082696
4 New York 5/1/11 0.235465925 0.101474328
5 New York 5/8/11 -0.090253520 -0.087116841
6 New York 5/15/11 -0.145212404 -0.014357487
7 Paris 5/1/11 0.219331999 0.117331641
8 Paris 5/8/11 -0.048703076 -0.049141723
9 Paris 5/15/11 -0.170628923 -0.068189918
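
For anyone who wants to run this from scratch, here is a minimal self-contained sketch of the same idea. It assumes the second column is named TimePeriod (without the stray characters that produced X..TimePeriod above). lmc_ave() is a hypothetical alternative built on ave(), included only as a cross-check and not something proposed elsewhere in this thread.

```r
# Rebuild the example data. The clean column name "TimePeriod" is an
# assumption made for this sketch.
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# lmc() as defined above: take logs, then center within each group.
lmc <- function(x, g) unsplit(lapply(split(log(x), g), scale, scale = FALSE), g)

# Hypothetical cross-check: ave() applies FUN within each group and
# returns a vector aligned with the input.
lmc_ave <- function(x, g) ave(log(x), g, FUN = function(v) v - mean(v))

centered     <- sapply(dat[, 3:4], lmc, g = dat[, 1])
centered_ave <- sapply(dat[, 3:4], lmc_ave, g = dat[, 1])

# The two versions should agree, and every group mean should be ~0.
print(all.equal(centered, centered_ave))
group_means <- aggregate(as.data.frame(centered_ave), dat["Location"], mean)
print(group_means)
```

Either function works on one numeric vector at a time, which is why sapply() is used to run it over the columns.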
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of arun
> Sent: Sunday, December 09, 2012 10:27 AM
> To: Ray DiGiacomo, Jr.
> Cc: R help
> Subject: Re: [R] Mean-Centering Question
>
> Hi,
>
> You could also use:
> newFunction1<-function(x) {t(t(log(x))-colMeans(log(x)))}
>
> res1<-
> by(dat1[c("Units","AveragePrice")],dat1["Location"],newFunction1)
> res1
> #Location: Los Angeles
> # Units AveragePrice
> #1 0.213682659 0.071790268
> #2 -0.005370907 -0.072872965
> #3 -0.208311751 0.001082696
> #------------------------------------------------------------
> #Location: New York
> # Units AveragePrice
> #4 0.23546592 0.10147433
> #5 -0.09025352 -0.08711684
> #6 -0.14521240 -0.01435749
> #------------------------------------------------------------
> #Location: Paris
> # Units AveragePrice
> #7 0.21933200 0.11733164
> #8 -0.04870308 -0.04914172
> #9 -0.17062892 -0.06818992
>
>
> newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-")
> }
> res<-by(dat1[c("Units","AveragePrice")],dat1["Location"],newFunction)
> res
> #Location: Los Angeles
> # Units AveragePrice
> #1 0.213682659 0.071790268
> #2 -0.005370907 -0.072872965
> #3 -0.208311751 0.001082696
> #------------------------------------------------------------
> #Location: New York
> # Units AveragePrice
> #4 0.23546592 0.10147433
> #5 -0.09025352 -0.08711684
> #6 -0.14521240 -0.01435749
> #------------------------------------------------------------
> #Location: Paris
> # Units AveragePrice
> #7 0.21933200 0.11733164
> #8 -0.04870308 -0.04914172
> #9 -0.17062892 -0.06818992
>
> # identical(res, res1) will be FALSE because the list elements of res are
> # data frames while those of res1 are matrices.
>
> A.K.
>
>
> ----- Original Message -----
> From: "Ray DiGiacomo, Jr." <rayd at liondatasystems.com>
> To: R Help <r-help at r-project.org>
> Cc:
> Sent: Saturday, December 8, 2012 11:11 PM
> Subject: Re: [R] Mean-Centering Question
>
> Hi David and Arun,
>
> Thanks for looking into this. I think I have found a solution.
>
> The "by" function will run without errors, but the values returned in the
> second row of the "Los Angeles" output are both incorrect. These incorrect
> values are shown below in red.
>
> I think my original custom function was causing the incorrect values
> because the subtraction inside it was operating on objects that had
> different dimensions, so some "recycling" was happening.
>
> Using the "sweep" function fixes the problem. This is what I did to fix
> things:
>
> # here is my "new" custom function
> newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }
>
> # this gives the correct values
> by(PullData[c("Units","AveragePrice")],
> PullData[c("StoreLocation")],
> newFunction)
>
> - Ray
>
>
>
>
>
> On Sat, Dec 8, 2012 at 7:12 PM, David Winsemius
> <dwinsemius at comcast.net> wrote:
>
> >
> > On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:
> >
> > Hello,
> >>
> >> I'm trying to create a custom function that "mean-centers" data and can
> >> be applied across many columns.
> >>
> >> Here is an example dataset, which is similar to my dataset:
> >>
> >>
> >> dat <- read.table(text="Location,**TimePeriod,Units,AveragePrice
> >
> > Los Angeles,5/1/11,61,5.42
> > Los Angeles,5/8/11,49,4.69
> > Los Angeles,5/15/11,40,5.05
> > New York,5/1/11,259,6.4
> > New York,5/8/11,187,5.3
> > New York,5/15/11,177,5.7
> > Paris,5/1/11,672,6.26
> > Paris,5/8/11,514,5.3
> > Paris,5/15/11,455,5.2", header=TRUE, sep=",")
> >
> >
> >> I want to mean-center the "Units" and "AveragePrice" Columns.
> >>
> >> So, I created this function:
> >>
> >> specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }
> >>
> >
> > I needed to modify this to avoid errors relating to how colMeans is
> > expecting its arguments:
> >
> > specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }
> >
> > aggregate(dat[3:4], dat[1], FUN=specialFunction2)
> >
> >      Location   Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2 AveragePrice.3
> > 1 Los Angeles 0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730      0.0010827
> > 2    New York 0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168     -0.0143575
> > 3       Paris 0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417     -0.0681899
> >
> >
> >
> >> If I use only "one" column in the first argument of the "by" function,
> >> everything is fine. For example, the following code will work fine:
> >>
> >> by(dat[c("Units")],
> >> dat["Location"],
> >> specialFunction)
> >>
> >> But the following code will "not" work, because I have "two" columns in
> >> the first argument...
> >>
> >> by(dat[c("Units", "AveragePrice")],
> >> dat["Location"],
> >> specialFunction)
> >>
> >
> > OK. So then I tried this with your function and was surprised to see that
> > it also works:
> >
> > > by(dat[c("Units", "AveragePrice")],
> > + dat["Location"],
> > + specialFunction)
> > Location: Los Angeles
> > Units AveragePrice
> > 1 0.21368 0.0717903
> > 2 *2.27351 -2.3517586*
> > 3 -0.20831 0.0010827
> > ------------------------------------------------------------------
> > Location: New York
> > Units AveragePrice
> > 4 0.23547 0.101474
> > 5 3.47628 -3.653655
> > 6 -0.14521 -0.014357
> > ------------------------------------------------------------------
> > Location: Paris
> > Units AveragePrice
> > 7 0.21933 0.11733
> > 8 4.52537 -4.62322
> > 9 -0.17063 -0.06819
> >
> >
> >
> >> Does anyone have any ideas as to what I am doing wrong?
> >>
> >
> > I guess I don't. Cannot reproduce, and my other methods worked as well.
> > This also works with your version and with mine, but I get the deprecation
> > message for `mean.data.frame` from mine:
> >
> > > lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
> > $`Los Angeles`
> > Units AveragePrice
> > 1 0.21368 0.0717903
> > 2 2.27351 -2.3517586
> > 3 -0.20831 0.0010827
> >
> > $`New York`
> > Units AveragePrice
> > 4 0.23547 0.101474
> > 5 3.47628 -3.653655
> > 6 -0.14521 -0.014357
> >
> > $Paris
> > Units AveragePrice
> > 7 0.21933 0.11733
> > 8 4.52537 -4.62322
> > 9 -0.17063 -0.06819
> >
> >
> >
> >> Please note that I'm trying to get the following results (for the "Los
> >> Angeles" group):
> >>
> >> Los Angeles "Units" variable (Mean-Centered)
> >> 0.213682659
> >> -0.005370907
> >> -0.208311751
> >>
> >> Los Angeles "AveragePrice" variable (Mean-Centered)
> >> 0.071790268
> >> -0.072872965
> >> 0.001082696
> >>
> >
> > --
> >
> > David Winsemius, MD
> > Alameda, CA, USA
> >
> >
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
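
The "recycling" Ray describes in the quoted message can be reproduced with a single group. The sketch below is only an illustration, assuming the Los Angeles values from the example data held in a plain matrix (the names x, lx, m, wrong and right are made up here). Subtracting a length-2 vector of column means from a 3 x 2 object reuses the vector element by element down the columns, so the second row ends up with the wrong mean subtracted in both columns, while sweep() matches each mean to its own column.

```r
# One group from the thread's data (Los Angeles), as a plain matrix.
x  <- cbind(Units = c(61, 49, 40), AveragePrice = c(5.42, 4.69, 5.05))
lx <- log(x)
m  <- colMeans(lx)   # length-2 vector: one mean per column

# Recycling: m is reused down the columns in storage order, so column 1
# gets m[1], m[2], m[1] and column 2 gets m[2], m[1], m[2].
wrong <- lx - m

# sweep() subtracts m[j] from every entry of column j.
right <- sweep(lx, 2, m, "-")

print(round(wrong, 5))
print(round(right, 5))
```

Rows 1 and 3 agree between the two results only because the recycled vector happens to line up there. With a different number of rows per group, the misaligned rows would differ.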