[R] sampling rows with values never sampled before
C W
tmrsg11 at gmail.com
Tue Jun 23 00:13:17 CEST 2015
Hi Jean,
Thanks!
Daniel,
Yes, you are absolutely right. I want sampled vectors to be as different
as possible.
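One way to read "as different as possible" is that no two sampled rows share a value in any column. The following is only a minimal sketch under that assumption: the helper name `sample_distinct` is hypothetical, it expects a numeric matrix, and it may return fewer than `size` rows if the remaining candidates run out. The data is the three-column set shown below.

```r
# Hypothetical helper: greedily draw up to `size` rows so that no two
# sampled rows share a value in any column.
sample_distinct <- function(dat, size) {
  picked <- integer(0)
  candidates <- seq_len(nrow(dat))
  while (length(picked) < size && length(candidates) > 0) {
    # Index by position so a single remaining candidate is not
    # misread by sample() as the range 1:candidate.
    i <- candidates[sample.int(length(candidates), 1)]
    picked <- c(picked, i)
    # Keep only the rows that differ from row i in every column.
    clash <- vapply(candidates,
                    function(j) any(dat[j, ] == dat[i, ]),
                    logical(1))
    candidates <- candidates[!clash]
  }
  dat[picked, , drop = FALSE]
}

dat <- cbind(x1 = rep(1:5, 3),
             x2 = rep(c(3.7, 2.9, 5.2), each = 5),
             x3 = rep(c(2.1, 5.3, 6.2, 8.9, 4.1), 3))

set.seed(1)  # only to make the draw repeatable
samp <- sample_distinct(dat, 3)
samp
```

Because x2 has only three distinct values, three rows is the most this data can yield.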
I added a little more to the earlier data set.
      x1  x2  x3
 [1,]  1 3.7 2.1
 [2,]  2 3.7 5.3
 [3,]  3 3.7 6.2
 [4,]  4 3.7 8.9
 [5,]  5 3.7 4.1
 [6,]  1 2.9 2.1
 [7,]  2 2.9 5.3
 [8,]  3 2.9 6.2
 [9,]  4 2.9 8.9
[10,]  5 2.9 4.1
[11,]  1 5.2 2.1
[12,]  2 5.2 5.3
[13,]  3 5.2 6.2
[14,]  4 5.2 8.9
[15,]  5 5.2 4.1
If I sampled rows 1, 6, and 11, solving the system of equations would not be
possible, because x1 and x3 repeat in all three rows. So, I am avoiding
"similar vectors".
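To make the problem concrete, here is a rough check. It assumes the three sampled rows are used directly as a square coefficient matrix, which is my reading of the setup and not something spelled out above. The names `bad` and `good` are just illustrative.

```r
# Rebuild the 15-row data set from the table above.
dat <- cbind(x1 = rep(1:5, 3),
             x2 = rep(c(3.7, 2.9, 5.2), each = 5),
             x3 = rep(c(2.1, 5.3, 6.2, 8.9, 4.1), 3))

# Assumption: the sampled rows form a square coefficient matrix.
bad  <- dat[c(1, 6, 11), ]  # x1 and x3 are constant across these rows
good <- dat[c(1, 7, 13), ]  # every column takes three different values

# A rank below 3 means the 3 x 3 matrix is singular and cannot be inverted.
qr(bad)$rank
qr(good)$rank
```

In `bad`, the x1 and x3 columns are multiples of each other, so the rank drops below 3; `good` keeps full rank.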
Thanks,
Mike
On Mon, Jun 22, 2015 at 2:19 PM, Daniel Nordlund <djnordlund at frontier.com>
wrote:
> On 6/22/2015 9:42 AM, C W wrote:
>
>> Hello R list,
>>
>> I have a question about sampling unique coordinate values.
>>
>> Here's what my data looks like:
>>
>> dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each=5))
>> > dat
>>       x1  x2
>>  [1,]  1 3.7
>>  [2,]  2 3.7
>>  [3,]  3 3.7
>>  [4,]  4 3.7
>>  [5,]  5 3.7
>>  [6,]  1 2.9
>>  [7,]  2 2.9
>>  [8,]  3 2.9
>>  [9,]  4 2.9
>> [10,]  5 2.9
>> [11,]  1 5.2
>> [12,]  2 5.2
>> [13,]  3 5.2
>> [14,]  4 5.2
>> [15,]  5 5.2
>>
>>
>> If I sampled (1, 3.7), then, I don't want (1, 2.9) or (2, 3.7).
>>
>> I want to avoid repeating either the first or the second coordinate,
>> because a repeat makes the matrix inversion undefined.
>>
>> I thought of using sample(), but I am not sure how to apply it to a data
>> frame.
>>
>> Thanks in advance,
>>
>> Mike
>>
>>
> I am not sure you gave us enough information to solve your real world
> problem. But I have a few comments and a potential solution.
>
> 1. In your example, the unique values in x1 are completely crossed with
> the unique values in x2.
> 2. Since you don't want duplicates of either number, the maximum
> number of samples that you can take is the minimum number of unique values
> in either vector, x1 or x2 (in this case x2, with 3 unique values).
> 3. Sample without replacement from the smaller set of unique values first.
> 4. Sample without replacement from the larger set second.
>
> > x <- 1:5
> > xx <- c(3.7, 2.9, 5.2)
> > s2 <- sample(xx,2, replace=FALSE)
> > s1 <- sample(x,2, replace=FALSE)
> > samp <- cbind(s1,s2)
> >
> > samp
>      s1  s2
> [1,]  5 3.7
> [2,]  1 5.2
> >
>
> Your actual data is probably larger, and the unique values in each vector
> may not be completely crossed, in which case the task is a little harder.
> In that case, you could remove values from your data as you sample. This
> may not be efficient, but it will work.
>
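> smpl <- function(dat, size){
>   mysamp <- numeric(0)
>   for(i in seq_len(size)) {
>     if (nrow(dat) == 0) break  # no compatible rows left
>     # draw one of the remaining rows at random
>     s <- dat[sample(nrow(dat),1),]
>     mysamp <- rbind(mysamp,s, deparse.level=0)
>     # drop every row that shares its first or second coordinate with s;
>     # drop=FALSE keeps dat a matrix even when only one row remains
>     dat <- dat[!(dat[,1]==s[1] | dat[,2]==s[2]), , drop=FALSE]
>   }
>   mysamp
> }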
>
>
> This is just an example of how you might approach your real world
> problem. There is no error checking, and for large samples it may not
> scale well.
>
>
> Hope this is helpful,
>
> Dan
>
> --
> Daniel Nordlund
> Bothell, WA USA