[R] Mean-Centering Question

David L Carlson dcarlson at tamu.edu
Sun Dec 9 18:55:18 CET 2012


If you are willing to rethink the definition of your special function, the
process can be simplified. The function lmc() takes the log of a single
numeric vector and mean-centers it within each level of a grouping variable.
sapply() can then apply it to several columns at once.

> lmc <- function(x, g) unsplit(lapply(split(log(x), g), scale,
+     scale=FALSE), g)
> dat2 <- data.frame(dat[,1:2], sapply(dat[,3:4], lmc, g=dat[,1]))
> dat2
     Location X..TimePeriod        Units AveragePrice
1 Los Angeles        5/1/11  0.213682659  0.071790268
2 Los Angeles        5/8/11 -0.005370907 -0.072872965
3 Los Angeles       5/15/11 -0.208311751  0.001082696
4    New York        5/1/11  0.235465925  0.101474328
5    New York        5/8/11 -0.090253520 -0.087116841
6    New York       5/15/11 -0.145212404 -0.014357487
7       Paris        5/1/11  0.219331999  0.117331641
8       Paris        5/8/11 -0.048703076 -0.049141723
9       Paris       5/15/11 -0.170628923 -0.068189918
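
For comparison, here is a rough sketch of the same per-group log-mean
centering written with ave() instead of split()/unsplit(). The names lmc_ave
and dat3 are made up for this illustration, and the data are re-entered with
a clean "TimePeriod" header, which is an assumption about what the original
file looked like. It is one possible variant, not a replacement for lmc().

```r
# Example data from the thread; the clean TimePeriod header is an assumption.
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Hypothetical helper: take logs, then subtract each group's mean of log(x).
# ave() returns a vector in the original row order, so no unsplit() is needed.
lmc_ave <- function(x, g) ave(log(x), g, FUN = function(v) v - mean(v))

# Apply the helper to both numeric columns, grouping by Location.
centered <- sapply(dat[, c("Units", "AveragePrice")], lmc_ave, g = dat$Location)
dat3 <- data.frame(dat[, 1:2], centered)

# Sanity check: within each location the centered values sum to about zero.
group_sums <- rowsum(centered, dat$Location)
print(dat3)
print(round(group_sums, 10))
```

The Los Angeles rows should agree with the target values quoted in the
original question (0.213682659, -0.005370907, -0.208311751 for Units).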

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of arun
> Sent: Sunday, December 09, 2012 10:27 AM
> To: Ray DiGiacomo, Jr.
> Cc: R help
> Subject: Re: [R] Mean-Centering Question
> 
> Hi,
> 
> You could also use:
> newFunction1<-function(x) {t(t(log(x))-colMeans(log(x)))}
> 
>  res1<-by(dat1[c("Units","AveragePrice")],dat1["Location"],newFunction1)
>  res1
> #Location: Los Angeles
> #         Units AveragePrice
> #1  0.213682659  0.071790268
> #2 -0.005370907 -0.072872965
> #3 -0.208311751  0.001082696
> #------------------------------------------------------------
> #Location: New York
> #        Units AveragePrice
> #4  0.23546592   0.10147433
> #5 -0.09025352  -0.08711684
> #6 -0.14521240  -0.01435749
> #------------------------------------------------------------
> #Location: Paris
> #        Units AveragePrice
> #7  0.21933200   0.11733164
> #8 -0.04870308  -0.04914172
> #9 -0.17062892  -0.06818992
> 
> 
>  newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }
>  res<-by(dat1[c("Units","AveragePrice")],dat1["Location"],newFunction)
>  res
> #Location: Los Angeles
> #         Units AveragePrice
> #1  0.213682659  0.071790268
> #2 -0.005370907 -0.072872965
> #3 -0.208311751  0.001082696
> #------------------------------------------------------------
> #Location: New York
> #        Units AveragePrice
> #4  0.23546592   0.10147433
> #5 -0.09025352  -0.08711684
> #6 -0.14521240  -0.01435749
> #------------------------------------------------------------
> #Location: Paris
> #        Units AveragePrice
> #7  0.21933200   0.11733164
> #8 -0.04870308  -0.04914172
> #9 -0.17062892  -0.06818992
> 
> # identical(res, res1) will be FALSE because the list elements of res are
> # data frames, while those of res1 are matrices.
> 
> A.K.
> 
> 
> ----- Original Message -----
> From: "Ray DiGiacomo, Jr." <rayd at liondatasystems.com>
> To: R Help <r-help at r-project.org>
> Cc:
> Sent: Saturday, December 8, 2012 11:11 PM
> Subject: Re: [R] Mean-Centering Question
> 
> Hi David and Arun,
> 
> Thanks for looking into this.  I think I have found a solution.
> 
> The "by" function will run ok without errors but the values returned in
> the second row of the "Los Angeles" output are both incorrect.  These
> incorrect values are shown below in red.
> 
> I think my original custom function was causing the incorrect values
> because the subtraction inside the original custom function was
> subtracting frames that had different dimensions and I think there was
> some "recycling" happening.
> 
> Using the "sweep" function fixes the problem.  This is what I did to
> fix things:
> 
> # here is my "new" custom function
> newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }
> 
> # this gives the correct values
> by(PullData[c("Units","AveragePrice")],
> PullData[c("StoreLocation")],
>         newFunction)
> 
> - Ray
> 
> 
> 
> 
> 
> On Sat, Dec 8, 2012 at 7:12 PM, David Winsemius
> <dwinsemius at comcast.net> wrote:
> 
> >
> > On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:
> >
> >  Hello,
> >>
> >> I'm trying to create a custom function that "mean-centers" data and
> >> can be applied across many columns.
> >>
> >> Here is an example dataset, which is similar to my dataset:
> >>
> >>
> >>  dat <- read.table(text="Location,**TimePeriod,Units,AveragePrice
> >
> > Los Angeles,5/1/11,61,5.42
> > Los Angeles,5/8/11,49,4.69
> > Los Angeles,5/15/11,40,5.05
> > New York,5/1/11,259,6.4
> > New York,5/8/11,187,5.3
> > New York,5/15/11,177,5.7
> > Paris,5/1/11,672,6.26
> > Paris,5/8/11,514,5.3
> > Paris,5/15/11,455,5.2", header=TRUE, sep=",")
> >
> >
> >> I want to mean-center the "Units" and "AveragePrice" Columns.
> >>
> >> So, I created this function:
> >>
> >> specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }
> >>
> >
> > I needed to modify this to avoid errors relating to how colMeans is
> > expecting its arguments:
> >
> > specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }
> >
> > aggregate(dat[3:4], dat[1], FUN=specialFunction2)
> >
> >      Location    Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2 AveragePrice.3
> > 1 Los Angeles  0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730      0.0010827
> > 2    New York  0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168     -0.0143575
> > 3       Paris  0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417     -0.0681899
> >
> >
> >
> >> If I use only "one" column in the first argument of the "by" function,
> >> everything is fine.  For example the following code will work fine:
> >>
> >> by(dat[c("Units")],
> >> dat["Location"],
> >> specialFunction)
> >>
> >> But the following code will "not" work, because I have "two" columns
> >> in the first argument...
> >>
> >> by(dat[c("Units", "AveragePrice")],
> >> dat["Location"],
> >> specialFunction)
> >>
> >
> > OK. So then I tried this with your function and was surprised to see
> > that it also works:
> >
> > > by(dat[c("Units", "AveragePrice")],
> > + dat["Location"],
> > + specialFunction)
> > Location: Los Angeles
> >      Units AveragePrice
> > 1  0.21368    0.0717903
> > 2  *2.27351   -2.3517586*
> > 3 -0.20831    0.0010827
> > ------------------------------------------------------------------
> > Location: New York
> >      Units AveragePrice
> > 4  0.23547     0.101474
> > 5  3.47628    -3.653655
> > 6 -0.14521    -0.014357
> > ------------------------------------------------------------------
> > Location: Paris
> >      Units AveragePrice
> > 7  0.21933      0.11733
> > 8  4.52537     -4.62322
> > 9 -0.17063     -0.06819
> >
> >
> >
> >> Does anyone have any ideas as to what I am doing wrong?
> >>
> >
> > I guess I don't. Cannot reproduce and my other methods worked as
> > well. This also works with your version and with mine but I get the
> > deprecation message for `mean.data.frame` from mine:
> >
> > > lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
> > $`Los Angeles`
> >      Units AveragePrice
> > 1  0.21368    0.0717903
> > 2  2.27351   -2.3517586
> > 3 -0.20831    0.0010827
> >
> > $`New York`
> >      Units AveragePrice
> > 4  0.23547     0.101474
> > 5  3.47628    -3.653655
> > 6 -0.14521    -0.014357
> >
> > $Paris
> >      Units AveragePrice
> > 7  0.21933      0.11733
> > 8  4.52537     -4.62322
> > 9 -0.17063     -0.06819
> >
> >
> >
> >> Please note that I'm trying to get the following results (for the
> >> "Los Angeles" group):
> >>
> >> Los Angeles "Units" variable (Mean-Centered)
> >> 0.213682659
> >> -0.005370907
> >> -0.208311751
> >>
> >> Los Angeles "AveragePrice" variable (Mean-Centered)
> >> 0.071790268
> >> -0.072872965
> >> 0.001082696
> >>
> >
> > --
> >
> > David Winsemius, MD
> > Alameda, CA, USA
> >
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
