[R] Conditional Random selection
ruipbarradas at sapo.pt
Sat Nov 21 22:40:55 CET 2015
Hello,
Is that a real question? As Bert said, you should spend some time with
an R tutorial. All you need to know is how to build a data.frame from the result.
tmp <- tapply(tab$S1, tab$time, function(x) length(unique(x)))
data.frame(time = names(tmp), S1 = tmp)
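An alternative sketch, not taken from the thread: `aggregate()` with a formula returns a data frame directly, so no separate `data.frame()` step is needed. It assumes the `tab` data from Ashta's message, re-created here with `read.table(text = ...)` instead of `textConnection()`.

```r
# Data from Ashta's message
tab <- read.table(text = "
time S1 rep
1 1 1
1 2 1
1 2 2
2 1 1
2 1 2
2 1 3
2 1 4
3 1 1
3 2 1
3 3 1", header = TRUE)

# Number of distinct S1 locations within each time period
res <- aggregate(S1 ~ time, data = tab, FUN = function(x) length(unique(x)))
print(res)
```

For this data the S1 column comes out as 2, 1, 3 for times 1, 2, 3, matching the table Ashta asked for, and `time` stays numeric rather than becoming character as it does with `names(tmp)`.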
Rui Barradas
Quoting Ashta <sewashm at gmail.com>:
> Hi Rui ,
>
> I tried that before I sent my original message.
> It gave me only this:
>
> tapply(tab$S1, tab$time, function(x) length(unique(x)))
> 1 2 3
> 2 1 3
>
> I am expecting output like this:
>
> time S1
> 1 2
> 2 1
> 3 3
>
> On Sat, Nov 21, 2015 at 2:38 PM, <ruipbarradas at sapo.pt> wrote:
>> Hello,
>>
>> Try
>>
>> tapply(tab$S1, tab$time, function(x) length(unique(x)))
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Quoting Ashta <sewashm at gmail.com>:
>>
>> Hi Bert and all,
>> I have a related question. In each time period there were different
>> locations where the samples were collected (S1). I want to count the
>> number of unique locations (S1) for each unique time period. So in
>> time 1 the samples were collected from two locations, in time 2 from
>> only one location, and in time 3 from three locations.
>>
>> tab <- read.table(textConnection(" time S1 rep
>> 1 1 1
>> 1 2 1
>> 1 2 2
>> 2 1 1
>> 2 1 2
>> 2 1 3
>> 2 1 4
>> 3 1 1
>> 3 2 1
>> 3 3 1 "),header = TRUE)
>>
>> What I want is:
>>
>> time S1
>> 1 2
>> 2 1
>> 3 3
>>
>> Thank you again.
>>
>> On Sat, Nov 21, 2015 at 1:30 PM, Ashta <sewashm at gmail.com> wrote:
>>
>> Thank you Bert!
>>
>> What I want is at least 500 samples, obtained by randomly sampling time
>> periods. This keeps samples collected in the same time period
>> together.
>>
>> Your script is doing what I wanted to do!!
>>
>> Many thanks
>>
>> On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>> David's "solution" is incorrect. It can also fail to give you time
>> periods that together contain 500 items to sample from.
>>
>> It is not entirely clear what you want. The solution below gives you a
>> random sample of time periods in which X1>0 and the total number of
>> samples among them is >= 500. It does not give you the fewest
>> periods that can do this. Is this what you want?
>>
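>> tab[with(tab, {
>>   rownums <- sample(seq_len(nrow(tab))[X1 > 0])  # eligible rows, in random order
>>   sz <- cumsum(X2[rownums])                      # running total of X2
>>   rownums[c(TRUE, sz < 500)]                     # keep rows until the total reaches 500
>> }), ]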
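A self-contained sketch of the same idea wrapped in a function, so it can be re-run and checked. The name `sample_periods` and its `target` argument are hypothetical. Two small changes are assumptions made for robustness, not part of the original: `which()` plus `sample.int()` avoids `sample()`'s behaviour of drawing from `1:n` when handed a single number, and `head(sz, -1)` keeps the logical index the same length as the row vector, so no NA row appears if the eligible total falls short of the target.

```r
# Hypothetical helper: randomly order the eligible periods (X1 > 0) and keep
# adding them until the running total of X2 reaches `target`.
sample_periods <- function(tab, target = 500) {
  idx <- which(tab$X1 > 0)                # eligible rows
  idx <- idx[sample.int(length(idx))]     # random order (safe if only one row qualifies)
  sz <- cumsum(tab$X2[idx])               # running total of X2 in that order
  keep <- c(TRUE, head(sz, -1) < target)  # include a row if the total before it is < target
  tab[idx[keep], ]
}

# David's example data
tab <- data.frame(time = 1:8,
                  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
                  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L))

set.seed(1)
picked <- sample_periods(tab)
print(picked)
print(sum(picked$X2))
```

Each call returns a different set of periods, all with X1 > 0, whose X2 total is at least 500 and where dropping the last period would bring the total below 500.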
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>> -- Clifford Stoll
>>
>> On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewashm at gmail.com> wrote:
>>
>> Thank you David!
>>
>> I reran your script and it is giving me the first three time periods.
>> Is it doing random sampling?
>>
>> tab.fan
>> time X1 X2
>> 2 2 5 230
>> 3 3 1 300
>> 5 5 2 10
>>
>> On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarlson at tamu.edu> wrote:
>>
>> Use dput() to send data to the list as it is more compact:
>>
>> dput(tab)
>>
>> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
>>     X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)),
>>     .Names = c("time", "X1", "X2"), class = "data.frame",
>>     row.names = c(NA, -8L))
>>
>> You can just remove the lines with X1 = 0 since you don't want to use them.
>>
>> tab.sub <- tab[tab$X1>0, ]
>>
>> Then the following gives you a sample:
>>
>> tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>>
>> Note that your "solution" of times 6, 7, and 8 will never appear, because
>> the sum of their X2 values is 586.
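Why `tab.sub[cumsum(sample(tab.sub$X2))<=500, ]` keeps returning the first rows, as Ashta observed: the running total is computed on a shuffled copy of X2, but the resulting logical vector is applied to `tab.sub` in its original row order. Because a cumulative sum of positive values only grows, that vector is always a run of TRUE followed by FALSE, so it selects the leading rows of `tab.sub`. A minimal corrected sketch shuffles the rows themselves before taking the running total. It keeps David's `<= 500` rule, which caps the total at 500 rather than reaching it.

```r
tab <- data.frame(time = 1:8,
                  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
                  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L))
tab.sub <- tab[tab$X1 > 0, ]

# The logical index is TRUE...TRUE FALSE...FALSE, so the rows it picks from
# tab.sub are always the leading ones, whatever order sample() chose.
set.seed(42)
flags <- cumsum(sample(tab.sub$X2)) <= 500
print(flags)

# Corrected: shuffle the rows, then take the running total in that order.
shuffled <- tab.sub[sample(nrow(tab.sub)), ]
picked <- shuffled[cumsum(shuffled$X2) <= 500, ]
print(picked)
```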
>>
>> David L. Carlson
>> Department of Anthropology
>> Texas A&M University
>>
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ashta
>> Sent: Saturday, November 21, 2015 11:53 AM
>> To: R help <r-help at r-project.org>
>> Subject: [R] Conditional Random selection
>>
>> Hi all,
>>
>> I have a data set that contains samples collected over time. In
>> each time period the total number of samples is given (X2). The goal
>> is to select 500 random samples. The selection should be based on
>> time (select time periods until I reach 500 samples). Also, the time
>> period should have a value greater than 0 for the X1 variable. X1 is
>> an indicator variable.
>>
>> Select time periods with X1 > 0 until the sum of X2 is > 500.
>>
>> tab <- read.table(textConnection(" time X1 X2
>> 1 0 251
>> 2 5 230
>> 3 1 300
>> 4 0 25
>> 5 2 10
>> 6 3 101
>> 7 1 300
>> 8 4 185 "),header = TRUE)
>>
>> In the above example, samples from times 1 and 4 will not be selected
>> (X1 is zero). So I could reach my target by selecting times 6, 7, and 8,
>> or times 2 and 3, and so on.
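A quick check of the two example selections, assuming the `tab` data above. The helper `total_for` is hypothetical and only sums X2 over the chosen eligible periods.

```r
tab <- read.table(text = "
time X1 X2
1 0 251
2 5 230
3 1 300
4 0 25
5 2 10
6 3 101
7 1 300
8 4 185", header = TRUE)

eligible <- tab[tab$X1 > 0, ]  # times 1 and 4 are dropped (X1 == 0)

# Hypothetical helper: total X2 for a set of eligible time periods
total_for <- function(times) sum(eligible$X2[eligible$time %in% times])

total_for(c(6, 7, 8))  # 101 + 300 + 185 = 586
total_for(c(2, 3))     # 230 + 300 = 530
```

Both totals exceed 500, so either set satisfies the stated goal.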
>>
>> Can anyone help with this?
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>