[R] indexing??

Tue Feb 28 16:33:44 CET 2012

On Tue, Feb 28, 2012 at 05:59:24AM -0800, helin_susam wrote:
> Hello All,
> 
> My algorithm is as follows:
> y <- c(1,1,1,0,0,1,0,1,0,0)
> x <- c(1,0,0,1,1,0,0,1,1,0)
> 
> n <- length(x)
> 
> t <- matrix(cbind(y,x), ncol=2)
> 
> z = x+y
> 
> for(j in 1:length(x)) {
> out <- vector("list", )
> 
> for(i in 1:10) {
> 
> t.s <- t[sample(n,n,replace=T),]
> 
> y.s <- t.s[,1]
> x.s <- t.s[,2]
> 
> z.s <- y.s+x.s
> 
> out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j]))
> kk <- sapply(out, function(x) {x$finding})
> ff <- out[! kk]
> }
> 
> I tried to compute the sum of the two vectors as a statistic using the
> bootstrap. Finally, I want to get the resamples which do not contain each
> element of y. In the algorithm this is referred to as "ff". But I always get
> the same result:
> > ff
> list()
> > kk
>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> I think this is because my "y" vector contains only 2 distinct values, and
> probably all of the bootstrap resamples include "1" and all of them include
> "0". So I cannot find the matches I want. Can anyone help me with how to do this?

Hi.

First of all, there are some unclear points in your code.
In particular, I would expect a "}" between the line

  out[[i]] <- list(...

and

  kk <- sapply(...

Moreover, I do not see why the loop over j contains the
loop over i. I would expect these loops to be separate,
since the loop over i collects all the samples into a list.

I suggest the following modification as an alternative.

  y <- c(1:5, 1:5)
  x <- c(1,0,0,1,1,0,0,1,1,0)

  n <- length(x)

  t <- matrix(cbind(y,x), ncol=2)

  z = x+y

  # generate 10 bootstrap samples and keep z.s, y.s
  out <- vector("list", 10)
  for(i in 1:10) {
    t.s <- t[sample(n, n, replace=TRUE), ]
    y.s <- t.s[,1]
    x.s <- t.s[,2]
    z.s <- y.s+x.s
    out[[i]] <- list(zz = z.s, yy =y.s)
  }

  # check, which replications do not contain y[j] in their y.s,
  # and take the OR of these conditions over j
  ff <- rep(FALSE, times=length(out))
  for(j in seq_along(y)) {
     kk <- sapply(out, function(x) {any(x$yy == y[j])})
     ff <- ff | (! kk)
  }
  out[ff]
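The same selection can also be written without explicit loops. The
following is only a sketch, not part of the reply above: it assumes the
same x and y, draws the resampled row indices with replicate() and indexes
x and y directly (so the matrix t is not needed), and replaces the loop
over j with %in%. The name B for the number of replications is my own
choice. The last line computes which values of y are missing, i.e. the
"is missing" annotations in the example output.

```r
# Same data as in the reply
y <- c(1:5, 1:5)
x <- c(1,0,0,1,1,0,0,1,1,0)
n <- length(x)
B <- 10  # number of bootstrap replications (hypothetical name)

# Draw the resampled row indices once; x and y are resampled in pairs
idx <- replicate(B, sample(n, n, replace = TRUE), simplify = FALSE)
out <- lapply(idx, function(i) list(zz = x[i] + y[i], yy = y[i]))

# Keep a replication if at least one value of y does not occur in its yy;
# this is the OR over j of !any(yy == y[j]) from the loop version
ff <- vapply(out, function(o) !all(y %in% o$yy), logical(1))
out[ff]

# Which values of y are missing from each kept replication
lapply(out[ff], function(o) setdiff(unique(y), o$yy))
```

Using %in% checks all values of y at once, which gives the same
logical vector as accumulating ff over j in the loop.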

With the original y <- c(1,1,1,0,0,1,0,1,0,0), the probability
that a bootstrap sample contains only 1's or only 0's is
2 * (1/2)^10 = 1/512, so I replaced y with another vector, for
which it is more likely that some value is missing from a
resample. I obtained, for example

  [[1]]
  [[1]]$zz
   [1] 2 2 5 2 3 2 3 2 2 6

  [[1]]$yy
   [1] 1 1 5 1 3 2 3 2 1 5   # 4 is missing

  [[2]]
  [[2]]$zz
   [1] 5 5 5 5 3 5 2 5 6 4

  [[2]]$yy
   [1] 4 4 5 4 3 5 2 5 5 3  # 1 is missing

  [[3]]
  [[3]]$zz
   [1] 5 2 5 1 5 1 2 5 5 5

  [[3]]$yy
   [1] 4 2 5 1 5 1 1 4 5 4  # 3 is missing
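The probabilities mentioned above can be checked directly. This is a
sketch of my own, not part of the original reply: it uses
inclusion-exclusion over the sets of values that are assumed missing, and
the function name p_some_value_missing is hypothetical. For the original y
it gives 2 * (1/2)^10; for y <- c(1:5, 1:5) it gives about 0.477, so
roughly half of the replications are expected to miss some value. A short
simulation is included as a cross-check.

```r
# Probability that a bootstrap resample (size n, drawn with replacement
# from y) misses at least one distinct value of y, by inclusion-exclusion.
p_some_value_missing <- function(y) {
  n <- length(y)
  p <- as.vector(table(y)) / n  # probability of drawing each distinct value
  k <- length(p)
  total <- 0
  for (m in seq_len(k)) {
    sets <- combn(k, m)         # each column: one set of m values assumed missing
    for (s in seq_len(ncol(sets))) {
      total <- total + (-1)^(m + 1) * (1 - sum(p[sets[, s]]))^n
    }
  }
  total
}

y1 <- c(1,1,1,0,0,1,0,1,0,0)
y2 <- c(1:5, 1:5)
p_some_value_missing(y1)  # 0.001953125, i.e. 2 * (1/2)^10
p_some_value_missing(y2)  # about 0.477

# Monte Carlo check for y2 (approximate; depends on the seed)
set.seed(1)
sim <- mean(replicate(20000, !all(y2 %in% sample(y2, replace = TRUE))))
sim
```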

Hope this helps.

Petr Savicky.