[R] sampling rows with values never sampled before

C W tmrsg11 at gmail.com
Tue Jun 23 00:13:17 CEST 2015


Hi Jean,

Thanks!

Daniel,
Yes, you are absolutely right.  I want sampled vectors to be as different
as possible.

I added a little more to the earlier data set.
      x1  x2  x3
 [1,]  1 3.7 2.1
 [2,]  2 3.7 5.3
 [3,]  3 3.7 6.2
 [4,]  4 3.7 8.9
 [5,]  5 3.7 4.1
 [6,]  1 2.9 2.1
 [7,]  2 2.9 5.3
 [8,]  3 2.9 6.2
 [9,]  4 2.9 8.9
[10,]  5 2.9 4.1
[11,]  1 5.2 2.1
[12,]  2 5.2 5.3
[13,]  3 5.2 6.2
[14,]  4 5.2 8.9
[15,]  5 5.2 4.1

If I sampled rows 1, 6, and 11, solving the system of equations would not
be possible, because x1 and x3 are constant across those rows.  So I am
trying to avoid "similar vectors".
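
A quick way to see the problem, as a minimal sketch: rebuild the table
above (the cbind() call is a reconstruction of it) and compare the rank of
the 3 x 3 matrix formed by rows 1, 6, 11 with one formed by rows that share
no values.  Rows 1, 7, 13 are an arbitrary choice for illustration.

```r
# Rebuild the 15 x 3 example; x3 repeats with x1, as in the table above.
dat <- cbind(x1 = rep(1:5, 3),
             x2 = rep(c(3.7, 2.9, 5.2), each = 5),
             x3 = rep(c(2.1, 5.3, 6.2, 8.9, 4.1), 3))

bad  <- dat[c(1, 6, 11), ]   # x1 and x3 are constant across these rows
good <- dat[c(1, 7, 13), ]   # no value repeated within any column

# qr()$rank estimates the numerical rank; a square matrix is
# invertible only when the rank equals its dimension.
rank_bad  <- qr(bad)$rank
rank_good <- qr(good)$rank

rank_bad  < 3    # rank-deficient, so the system has no unique solution
rank_good == 3   # full rank, so the system can be solved
```

Checking the rank rather than det() avoids deciding how close to zero a
floating-point determinant has to be.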

Thanks,

Mike


On Mon, Jun 22, 2015 at 2:19 PM, Daniel Nordlund <djnordlund at frontier.com>
wrote:

> On 6/22/2015 9:42 AM, C W wrote:
>
>> Hello R list,
>>
>> I have a question about sampling unique coordinate values.
>>
>> Here's what my data looks like:
>>
>> > dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each=5))
>> > dat
>>        x1  x2
>>   [1,]  1 3.7
>>   [2,]  2 3.7
>>   [3,]  3 3.7
>>   [4,]  4 3.7
>>   [5,]  5 3.7
>>   [6,]  1 2.9
>>   [7,]  2 2.9
>>   [8,]  3 2.9
>>   [9,]  4 2.9
>> [10,]  5 2.9
>> [11,]  1 5.2
>> [12,]  2 5.2
>> [13,]  3 5.2
>> [14,]  4 5.2
>> [15,]  5 5.2
>>
>>
>> If I sample (1, 3.7), then I don't want (1, 2.9) or (2, 3.7).
>>
>> I want to avoid repeating either the first or the second coordinate,
>> because a repeat leads to a matrix that cannot be inverted.
>>
>> I thought of using sample(), but I am not sure how to apply it to a
>> data frame.
>>
>> Thanks in advance,
>>
>> Mike
>>
>>
> I am not sure you gave us enough information to solve your real world
> problem.  But I have a few comments and a potential solution.
>
> 1. In your example the unique values in x1 are completely crossed with
> the unique values in x2.
> 2. Since you don't want duplicates of either number, the maximum
> number of samples that you can take is the smaller of the two counts of
> unique values, x1 or x2 (in this case x2, with 3 unique values).
> 3. Sample without replacement from the smaller set of unique values first.
> 4. Sample without replacement from the larger set second.
>
> > x <- 1:5
> > xx <- c(3.7, 2.9, 5.2)
> > s2 <- sample(xx,2, replace=FALSE)
> > s1 <- sample(x,2, replace=FALSE)
> > samp <- cbind(s1,s2)
> >
> > samp
>      s1  s2
> [1,]  5 3.7
> [2,]  1 5.2
> >
>
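
The steps above can be wrapped in a small function.  This is only a sketch
for the fully crossed case; sample_crossed() and its argument names are
hypothetical, not something from the thread, and it assumes every value of
x occurs together with every value of xx in the data.

```r
# Hypothetical helper: draw k pairs with no repeated value in either
# coordinate, assuming every x value occurs with every xx value.
sample_crossed <- function(x, xx, k) {
  x  <- unique(x)
  xx <- unique(xx)
  # The smaller set of unique values caps the number of pairs.
  k_max <- min(length(x), length(xx))
  if (k > k_max) stop("at most ", k_max, " distinct pairs are available")
  # Sampling indices avoids sample()'s special case for a
  # length-one numeric vector.
  cbind(s1 = x[sample.int(length(x), k)],
        s2 = xx[sample.int(length(xx), k)])
}

set.seed(1)  # arbitrary seed, only to make the draw repeatable
samp <- sample_crossed(1:5, c(3.7, 2.9, 5.2), 3)
samp
```

Asking for more pairs than the smaller set allows stops with an error
instead of silently recycling values.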
> Your actual data is probably larger, and the unique values in each vector
> may not be completely crossed, in which case the task is a little harder.
> In that case, you could remove values from your data as you sample.  This
> may not be efficient, but it will work.
>
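> smpl <- function(dat, size){
>   mysamp <- numeric(0)
>   for(i in 1:size) {
>     # draw one row at random from the rows still available
>     s <- dat[sample(nrow(dat),1),]
>     mysamp <- rbind(mysamp,s, deparse.level=0)
>     # remove every row that shares x1 or x2 with the sampled row;
>     # drop=FALSE keeps dat a matrix when only one row is left
>     dat <- dat[!(dat[,1]==s[1] | dat[,2]==s[2]), , drop=FALSE]
>     }
>   mysamp
> }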
>
>
> This is just an example of how you might approach your real world
> problem.  There is no error checking, and for large samples it may not
> scale well.
>
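
For data with more than two columns, such as the three-column table at the
top of this message, the same remove-as-you-sample idea can be sketched
with a check across every column.  sample_distinct() is a hypothetical
variant, not the function quoted above; it assumes a numeric matrix (or
something as.matrix() can convert) and exact equality as the test for a
repeated value.

```r
# Hypothetical variant of smpl(): rejects rows that match an earlier
# pick in ANY column, and stops early when no eligible rows remain.
sample_distinct <- function(dat, size) {
  dat  <- as.matrix(dat)
  keep <- integer(0)                 # row indices chosen so far
  ok   <- rep(TRUE, nrow(dat))       # rows still eligible
  while (length(keep) < size && any(ok)) {
    cand <- which(ok)
    i <- cand[sample.int(length(cand), 1)]
    keep <- c(keep, i)
    # A row clashes if it equals row i in at least one column
    # (row i itself clashes too, so it is never drawn twice).
    clash <- rowSums(dat == matrix(dat[i, ], nrow(dat), ncol(dat),
                                   byrow = TRUE)) > 0
    ok <- ok & !clash
  }
  if (length(keep) < size)
    warning("only ", length(keep), " mutually distinct rows found")
  dat[keep, , drop = FALSE]
}

dat <- cbind(x1 = rep(1:5, 3),
             x2 = rep(c(3.7, 2.9, 5.2), each = 5),
             x3 = rep(c(2.1, 5.3, 6.2, 8.9, 4.1), 3))
set.seed(42)  # arbitrary seed for repeatability
picked <- sample_distinct(dat, 3)
picked
```

Because the picks are greedy, data that is not completely crossed can run
out of eligible rows before reaching the requested size; the warning
reports how many rows were actually found.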
>
> Hope this is helpful,
>
> Dan
>
> --
> Daniel Nordlund
> Bothell, WA USA
>
>
