[R] Bootstrapping in R
Bryan Mac
bryanmac.24 at gmail.com
Mon Oct 3 09:24:50 CEST 2016
Hi all,
Here are the first six rows of my data. In total I have 1269 rows.
My goal is to conduct a nonparametric bootstrap with case resampling.
I would like to randomly select 100 of the 1269 rows. After that, I wish to bootstrap that random subsample of 100 rows.
I assume I need to set the seed so that this randomization is reproducible, since otherwise both the subsample and the bootstrap results change each time the code is run.
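A minimal sketch of the two steps described above. It assumes the data sit in a data frame called `n_data`, as in the boot() call further down. The real data were not posted, so the sketch simulates a hypothetical 1269-row stand-in; the simulated values, the seeds, and the helper `mean_nar` are illustrative only. The randomization happens with or without set.seed(). Calling set.seed() once before the random steps only makes the subsample and the bootstrap replicates reproducible.

```r
library(boot)

## Hypothetical stand-in for the real 1269-row data (values are simulated)
set.seed(2016)
NAR <- round(runif(1269, 1, 10), 1)
NIC <- round(runif(1269, 1, 10), 1)
n_data <- data.frame(NAR, SQRTNAR = sqrt(NAR), NIC, SQRTNIC = sqrt(NIC))

## One seed before the random steps makes steps 1 and 2 reproducible
set.seed(1007)

## Step 1: draw 100 of the 1269 rows at random, without replacement
rows <- sample(nrow(n_data), 100)
sub_data <- n_data[rows, ]

## Step 2: case-resampling bootstrap of the 100-row subsample;
## boot() supplies the resampled row indices and the statistic must use them
mean_nar <- function(df, indices) mean(df$NAR[indices])
res <- boot(sub_data, statistic = mean_nar, R = 1000)
res
```

With the real data, drop the simulation block and use the actual `n_data` and statistic (e.g. DataSummary) in its place.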
##   NAR  SQRTNAR NIC  SQRTNIC
## 1 2.6 1.612452 5.6 2.366432
## 2 8.1 2.846050 9.9 3.146427
## 3 5.7 2.387467 7.1 2.664583
## 4 8.3 2.880972 8.1 2.846050
## 5 7.3 2.701851 9.9 3.146427
## 6 4.9 2.213594 8.6 2.932576
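To make the question reproducible, as the posting guide asks, the six rows shown above can be typed in directly. The square-root columns are just `sqrt()` of the raw columns, and the printed values confirm this. The name `head_data` is hypothetical.

```r
## The six rows printed above; SQRTNAR and SQRTNIC are derived with sqrt()
NAR <- c(2.6, 8.1, 5.7, 8.3, 7.3, 4.9)
NIC <- c(5.6, 9.9, 7.1, 8.1, 9.9, 8.6)
head_data <- data.frame(NAR, SQRTNAR = sqrt(NAR), NIC, SQRTNIC = sqrt(NIC))
head_data
## With the real data, dput(head(n_data)) prints a paste-ready version
```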
Here is my definition of the DataSummary function.
DataSummary <- function(df, indices){
  # boot() passes the resampled row indices; use them to subset the data
  sample <- df[indices, ]

  # NAR: summary() (Min., 1st Qu., Median, Mean, 3rd Qu., Max.) plus SD
  sumry_for_NAR <- summary(sample$NAR)
  nms <- names(sumry_for_NAR)
  nms <- c(nms, 'std')
  out_for_NAR <- c(sumry_for_NAR, sd(sample$NAR))
  names(out_for_NAR) <- nms

  # SQRTNAR: same statistics
  sumry_for_SQRTNAR <- summary(sample$SQRTNAR)
  nms <- names(sumry_for_SQRTNAR)
  nms <- c(nms, 'std')
  out_for_SQRTNAR <- c(sumry_for_SQRTNAR, sd(sample$SQRTNAR))
  names(out_for_SQRTNAR) <- nms

  # NIC: same statistics
  sumry_for_NIC <- summary(sample$NIC)
  nms <- names(sumry_for_NIC)
  nms <- c(nms, 'std')
  out_for_NIC <- c(sumry_for_NIC, sd(sample$NIC))
  names(out_for_NIC) <- nms

  # SQRTNIC: same statistics
  sumry_for_SQRTNIC <- summary(sample$SQRTNIC)
  nms <- names(sumry_for_SQRTNIC)
  nms <- c(nms, 'std')
  out_for_SQRTNIC <- c(sumry_for_SQRTNIC, sd(sample$SQRTNIC))
  names(out_for_SQRTNIC) <- nms

  # 4 variables x 7 statistics = 28 values per bootstrap replicate
  OUT <- c(out_for_NAR, out_for_SQRTNAR, out_for_NIC, out_for_SQRTNIC)
  return(OUT)
}
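The four copy-pasted blocks in DataSummary can be collapsed into a single loop over the column names. Below is a sketch of an equivalent statistic. It assumes the four columns are named exactly as printed, and `DataSummary2` is a hypothetical name. Unlike the original, it prefixes each statistic with its variable name (e.g. `NAR.Mean`). The original `OUT` repeats `Min.`, `Median` and so on four times, which makes the 28 columns of the bootstrap output hard to tell apart.

```r
## Same statistics as DataSummary: summary() plus the standard deviation,
## for each of the four columns, computed on the rows chosen by boot()
DataSummary2 <- function(df, indices) {
  d <- df[indices, , drop = FALSE]
  vars <- c("NAR", "SQRTNAR", "NIC", "SQRTNIC")
  out <- lapply(vars, function(v) c(summary(d[[v]]), std = sd(d[[v]])))
  names(out) <- vars
  unlist(out)   # names such as "NAR.Mean", "NIC.std"
}

## Quick check on the six rows shown earlier, using all rows once
NAR <- c(2.6, 8.1, 5.7, 8.3, 7.3, 4.9)
NIC <- c(5.6, 9.9, 7.1, 8.1, 9.9, 8.6)
toy <- data.frame(NAR, SQRTNAR = sqrt(NAR), NIC, SQRTNIC = sqrt(NIC))
s <- DataSummary2(toy, seq_len(nrow(toy)))
s[c("NAR.Mean", "NAR.std")]
```

Passing `seq_len(nrow(df))` as the indices reproduces the full-sample values, which is what boot() stores in `t0`.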
Again, here is my attempt at bootstrapping.
library(boot)
result <- boot(n_data, statistic = DataSummary, R = 100)
result
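After boot() has run, the replicates are in `result$t`, with one row per replicate and one column per statistic. The full-sample values are in `result$t0`. A confidence interval for a single statistic comes from `boot.ci()` and its `index` argument. The sketch below is self-contained. Because the real `n_data` is not available, it uses simulated data and a two-statistic function; `sim` and `mean_sd` are hypothetical names.

```r
library(boot)

set.seed(1007)
sim <- data.frame(NAR = round(runif(100, 1, 10), 1))   # simulated stand-in

## Statistic: mean and SD of NAR on the resampled rows
mean_sd <- function(df, indices) {
  x <- df$NAR[indices]
  c(mean = mean(x), std = sd(x))
}

result <- boot(sim, statistic = mean_sd, R = 1000)
result$t0                 # statistics on the original data
dim(result$t)             # 1000 replicates x 2 statistics
apply(result$t, 2, sd)    # bootstrap standard errors
## Percentile interval for the mean (column 1 of result$t)
boot.ci(result, type = "perc", index = 1)
```

With DataSummary there are 28 columns, so `index` selects which of the 28 statistics the interval is for.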
Following the suggestions, should I use the code below to achieve my goal? And is the boot help page the best reference? I found code on various sites and got really confused because the examples were very different from each other.
> library(boot)
>
> set.seed(1007)
>
> x <- rnorm(100)
> y <- x + rnorm(100)
> dat <- data.frame(x, y)
> stat2 <- function(DF, f){
>   model <- lm(y ~ x, data = DF[f,])
>   coef(model)
> }
>
> boot(dat, stat2, R = 100)
Bryan Mac
bryanmac.24 at gmail.com
> On Oct 2, 2016, at 5:37 AM, ruipbarradas at sapo.pt wrote:
>
> Right.
> To see it in action just compare the results of the two calls to boot.
>
> library(boot)
>
> set.seed(1007)
>
> x <- rnorm(100)
> y <- x + rnorm(100)
> dat <- data.frame(x, y)
>
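> #Wrong: DF$y and DF$x refer to the full DF, so data = DF[f,] is ignored
> stat1 <- function(DF, f){
>   model <- lm(DF$y ~ DF$x, data = DF[f,]) #Doesn't bootstrap DF
>   coef(model)
> }
>
> #Correct: y and x are looked up in the resampled rows DF[f,]
> stat2 <- function(DF, f){
>   model <- lm(y ~ x, data = DF[f,])
>   coef(model)
> }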
>
> boot(dat, stat1, R = 100)
> boot(dat, stat2, R = 100)
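One way to make the comparison concrete is to look at the bootstrap standard errors. The block repeats `dat`, `stat1` and `stat2` from Rui's example so that it runs on its own. In `stat1`, the formula `DF$y ~ DF$x` finds `DF` in the function's environment rather than in `data = DF[f, ]`. Every replicate therefore refits the original data, and its standard errors are zero.

```r
library(boot)

set.seed(1007)
x <- rnorm(100)
y <- x + rnorm(100)
dat <- data.frame(x, y)

## Wrong: DF$y and DF$x refer to the full data, so data = DF[f, ] is ignored
stat1 <- function(DF, f) coef(lm(DF$y ~ DF$x, data = DF[f, ]))
## Correct: the formula variables are looked up in the resampled rows
stat2 <- function(DF, f) coef(lm(y ~ x, data = DF[f, ]))

b1 <- boot(dat, stat1, R = 100)
b2 <- boot(dat, stat2, R = 100)

## Bootstrap standard errors of intercept and slope
apply(b1$t, 2, sd)   # essentially zero: nothing was resampled
apply(b2$t, 2, sd)   # positive: genuine bootstrap variability
```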
>
>
> Rui Barradas
>
>
> Quoting peter dalgaard <pdalgd at gmail.com>:
>
>>> On 01 Oct 2016, at 16:11 , Daniel Nordlund <djnordlund at gmail.com> wrote:
>>>
>>> You haven't told us anything about the structure of your data, or the definition of the DataSummary function.
>>
>> Yes. Just let me add that a common error with boot() is not to pay attention to the required form of the statistic= function argument. It should depend on the data and a set of indices, and (for the nonparametric bootstrap) it is the indices that are random.
>>
>> Typical mistakes are to completely ignore the index argument, or to write clumsy code that ignores the data specification, as in
>> coef(lm(df$y~df$x, data=d[f, ])).
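Peter's point that the indices are the random part can be illustrated by writing an ordinary case-resampling bootstrap by hand. Each replicate draws n row indices with replacement and passes them to the same statistic(data, indices) function. This sketches the idea only and is not boot()'s actual code. `stat_mean`, `B` and the simulated data are hypothetical.

```r
set.seed(1007)
d <- data.frame(y = rnorm(50))            # simulated data

## Statistic in the form boot() expects: data first, then row indices
stat_mean <- function(data, indices) mean(data$y[indices])

B <- 500
n <- nrow(d)
reps <- numeric(B)
for (b in seq_len(B)) {
  idx <- sample(n, n, replace = TRUE)      # the random part: resampled row indices
  reps[b] <- stat_mean(d, idx)
}

t0 <- stat_mean(d, seq_len(n))            # identity indices give the full-sample value
c(estimate = t0, bootstrap_se = sd(reps))
```

A statistic that ignored `indices` would return `t0` on every pass of the loop, which is the first mistake Peter describes.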
>>
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.