[R] Mean-Centering Question
David Winsemius
dwinsemius at comcast.net
Sun Dec 9 04:12:20 CET 2012
On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:
> Hello,
>
> I'm trying to create a custom function that "mean-centers" data and
> can be applied across many columns.
>
> Here is an example dataset, which is similar to my dataset:
>
>
dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header=TRUE, sep=",")
>
> I want to mean-center the "Units" and "AveragePrice" Columns.
>
> So, I created this function:
>
> specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }
I needed to modify this because aggregate() passes each column to FUN as
a plain vector, while colMeans expects an array of at least two
dimensions:
specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = TRUE) }
aggregate(dat[3:4], dat[1], FUN=specialFunction2)
     Location    Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2 AveragePrice.3
1 Los Angeles  0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730      0.0010827
2    New York  0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168     -0.0143575
3       Paris  0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417     -0.0681899
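The aggregate() call above returns each group's centered values spread across columns (Units.1, Units.2, ...). If one row per observation is preferred, a minimal sketch using ave() might look like the following. The names center_log and centered are made up for this illustration, and the data frame is rebuilt from the example in the question so that the block runs on its own.

```r
# Rebuild the example data from the question.
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Hypothetical helper: log-transform a vector, then subtract the mean of the logs.
center_log <- function(v) log(v) - mean(log(v), na.rm = TRUE)

# ave() applies the helper within each Location and returns a vector
# in the original row order, so it can be written back column by column.
centered <- dat
for (col in c("Units", "AveragePrice")) {
  centered[[col]] <- ave(dat[[col]], dat$Location, FUN = center_log)
}
centered
```

The first three rows of centered should agree with the Los Angeles row of the aggregate() output.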
>
> If I use only "one" column in the first argument of the "by" function,
> everything is fine. For example, the following code will work fine:
>
> by(data[c("Units")],
> data["Location"],
> specialFunction)
>
> But the following code will "not" work, because I have "two" columns
> in the first argument...
>
> by(data[c("Units", "AveragePrice")],
> data["Location"],
> specialFunction)
OK. So then I tried this with your function and was surprised to see
that it also runs without error:
> by(dat[c("Units", "AveragePrice")],
+ dat["Location"],
+ specialFunction)
Location: Los Angeles
Units AveragePrice
1 0.21368 0.0717903
2 2.27351 -2.3517586
3 -0.20831 0.0010827
------------------------------------------------------------------
Location: New York
Units AveragePrice
4 0.23547 0.101474
5 3.47628 -3.653655
6 -0.14521 -0.014357
------------------------------------------------------------------
Location: Paris
Units AveragePrice
7 0.21933 0.11733
8 4.52537 -4.62322
9 -0.17063 -0.06819
>
> Does anyone have any ideas as to what I am doing wrong?
I guess I don't. I cannot reproduce the problem, and my other methods
worked as well. This also works with your version and with mine, but I
get the deprecation message for `mean.data.frame` from mine:
> lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
$`Los Angeles`
Units AveragePrice
1 0.21368 0.0717903
2 2.27351 -2.3517586
3 -0.20831 0.0010827
$`New York`
Units AveragePrice
4 0.23547 0.101474
5 3.47628 -3.653655
6 -0.14521 -0.014357
$Paris
Units AveragePrice
7 0.21933 0.11733
8 4.52537 -4.62322
9 -0.17063 -0.06819
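The second row of each group in the output above stands out: for Los Angeles it shows 2.27351 and -2.3517586, whereas the aggregate() result earlier gave -0.0053709 and -0.0728730 for the same observations. One likely explanation, offered as a reading of the numbers rather than something established in this thread, is vector recycling. colMeans() returns a length-2 vector, and subtracting it from a 3 x 2 data frame recycles it down the columns instead of matching one mean to each column. The sketch below checks that reading on the Los Angeles rows only; la, lx, m and recycled are illustrative names.

```r
# Los Angeles rows from the example data.
la <- data.frame(Units = c(61, 49, 40),
                 AveragePrice = c(5.42, 4.69, 5.05))

lx <- log(la)
m  <- colMeans(lx)  # length-2 vector: mean log Units, mean log AveragePrice

# Data frame minus a shorter vector: the vector is recycled to 6 values
# (m1, m2, m1, m2, m1, m2) and filled column by column, so
# Units gets m1, m2, m1 and AveragePrice gets m2, m1, m2.
recycled <- lx - m

# Row 2 of Units has had the AveragePrice mean subtracted:
recycled$Units[2]
log(49) - m[["AveragePrice"]]
```

If the two printed values agree, the odd middle rows come from subtracting the wrong column's mean rather than from the by() or lapply() call itself.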
>
> Please note that I'm trying to get the following results (for the "Los
> Angeles" group):
>
> Los Angeles "Units" variable (Mean-Centered)
> 0.213682659
> -0.005370907
> -0.208311751
>
> Los Angeles "AveragePrice" variable (Mean-Centered)
> 0.071790268
> -0.072872965
> 0.001082696
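One way to obtain these target values for several columns at once is to subtract each column's own mean with sweep(). This is a sketch that assumes the per-group input is an all-numeric data frame. The function name center_cols is hypothetical, and only the Los Angeles rows are used so that the result can be compared with the numbers quoted just above.

```r
# Hypothetical helper: log-transform every column, then subtract each
# column's mean from that same column. sweep() with MARGIN = 2 pairs the
# i-th mean with the i-th column, and its default FUN is subtraction.
center_cols <- function(x) {
  lx <- log(as.matrix(x))
  sweep(lx, 2, colMeans(lx, na.rm = TRUE))
}

# Los Angeles rows from the example data.
la <- data.frame(Units = c(61, 49, 40),
                 AveragePrice = c(5.42, 4.69, 5.05))

la_centered <- center_cols(la)
round(la_centered, 6)
```

The same function should be usable as the FUN argument of the by() or lapply(split(...)) calls shown earlier, since both hand it a data frame holding one group's rows.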
--
David Winsemius, MD
Alameda, CA, USA