[R] Using MASS::boxcox for a single variable gives different results than the original paper

peter dalgaard pdalgd at gmail.com
Mon Oct 12 15:43:19 CEST 2015


Two things:

A) x2 ~ zpoints(x2) is not right
B) granularity:
> a <- MASS::boxcox(x2 ~ 1,lambda=seq(-2,2,1e-5))
> mle(a)
[1] 0.40783

-pd

On 12 Oct 2015, at 11:17 , Tal Galili <tal.galili at gmail.com> wrote:

> Hello all,
> 
> Given a set of observations, I would like to find lambda for a boxcox
> transformation so to get a symmetric/normal result as possible.
> 
> I try to use MASS::boxcox, but get different results than when using the
> formula from the original box-cox paper (link
> <http://www.jstor.org/stable/2984418?seq=1#page_scan_tab_contents>).
> 
> I probably have made an error somewhere, but I can't figure out where.
> 
> Here is an example in which the lambda by MASS::boxcox is 0.42424, while by
> the formula from the paper I get 0.40782.
> 
> 
> 
> 
> 
> 
> 
> 
> # Toy data
> ################
> set.seed(13241089)
> x <- rnorm(1000, 10)
> x2 <- x**2 # we want to transform x2 to something more normal
> plot(density(x2))
> 
> # using MASS::boxcox
> ################
> 
> zpoints <- function(y) {
> n <- length(y)
> qnorm(ppoints(n))[order(order(y))]
> }
> mle <- function(BC) {
> with(BC, x[which.max(y)])
> }
> 
> a <- MASS::boxcox(x2 ~ zpoints(x2))
> mle(a)
> # lambda:
> # 0.42424
> 
> 
> 
> # using formula from the paper
> ################
> 
> loglik_lambda <- function(l, y) {
> GM <- exp(mean(log(y)))
> if(l==0) x <- log(y)*GM else x <- (y^l-1)/ (l * GM^(l-1) )
> #  if(l==0) x <- log(y) else x <- (y^l-1)/ (l )
> sd(x)
> }
> fo <- function(l) loglik_lambda(l, y = x2)
> V_fo <- Vectorize(fo)
> V_fo(2)
> curve(V_fo, -.5,1.5)
> optimize(V_fo, c(-3,3))
> # lambda:
> # 0.40782
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list