[R] help for a loop procedure
Petr Savicky
savicky at praha1.ff.cuni.cz
Thu Jan 27 17:30:15 CET 2011
On Thu, Jan 27, 2011 at 11:30:37AM +0100, Serena Corezzola wrote:
> Hello everybody!
>
>
>
> I'm trying to define the optimal number of surveys to detect the highest
> number of species within a monitoring season/session.
>
> To do this I want to run all the possible combinations between a set of
> samples and to calculate the total number of species for each combination of
> 2, 3, 4 ... n sample events, so that at the end I will be able to define which
> is the lowest number of samples that I need to obtain the best result.
>
>
>
> I've already done this operation manually, just to see if it works, but the
> point is that some of my datasets have more than 30 samples and more than 35
> species, so that the number of combinations will be HUGE!
>
> So here is the question: I need to find a way for R to make all possible
> combinations of samples automatically, and then to automatically return the
> total number of species in every combination.
>
> I've tried to search for a loop script, or something like that. However, I'm
> relatively new to R and I don't know what I need to do. Can anyone help me?
>
>
>
> Here I've written a simple example of the operations I need to do, just to
> make my problem clearer.
>
>
>
> My dataset (matrix) has sample events by rows (U1,U2,U3) and detected
> species by columns.
>
>
>
> U<-read.table("C:\\Documents
> \\tre_usc.txt",header=T,row.names=1,sep="\t",dec = ",")
Hello:
For simplicity of preparing a reply, let me include your data
as an R command.
U <- structure(list(Aadi = c(0L, 0L, 0L), Aagl = c(0L, 0L, 0L),
Apap = c(0L, 0L, 0L), Aage = c(0L, 0L, 0L), Bdia = c(7L, 4L, 0L),
Beup = c(0L, 2L, 0L), Crub = c(5L, 1L, 0L), Carc = c(0L, 0L, 0L),
Cpam = c(1L, 0L, 14L)), .Names = c("Aadi", "Aagl", "Apap", "Aage",
"Bdia", "Beup", "Crub", "Carc", "Cpam"), class = "data.frame",
row.names = c("U1", "U2", "U3"))
   Aadi Aagl Apap Aage Bdia Beup Crub Carc Cpam
U1    0    0    0    0    7    0    5    0    1
U2    0    0    0    0    4    2    1    0    0
U3    0    0    0    0    0    0    0    0   14
> First, I've created from this matrix all the subsets based on single
> samples,
>
>
>
> U1 <- U [c(1), ]
>
> U2 <- U [c(2), ]
>
> U3 <- U [c(3), ]
>
[...]
>
> then I've combined them, summing each time the values of the chosen lines
> (total no. of combinations = 4).
>
>
>
> U12<-U1+U2
>
> U13<-U1+U3
>
> U23<-U2+U3
>
> U123<-U1+U2+U3
>
[...]
>
>
> Then I've applied the command "length" to find the number of species for
> every new combination.
>
>
>
> length(U12[U12>0])
>
> [1] 4
>
>
>
> length(U13[U13>0])
>
> [1] 3
>
This can be partially automated as follows:
UM <- as.matrix(U)
A <- rbind(
c(1, 0, 0),
c(0, 1, 0),
c(0, 0, 1),
c(1, 1, 0),
c(1, 0, 1),
c(0, 1, 1),
c(1, 1, 1))
rownam <- rep("U", times=nrow(A))
for (i in 1:3) {
    rownam[A[, i] == 1] <- paste(rownam[A[, i] == 1], i, sep="")
}
dimnames(A) <- list(rownam, NULL)
C <- A %*% UM
C
     Aadi Aagl Apap Aage Bdia Beup Crub Carc Cpam
U1      0    0    0    0    7    0    5    0    1
U2      0    0    0    0    4    2    1    0    0
U3      0    0    0    0    0    0    0    0   14
U12     0    0    0    0   11    2    6    0    1
U13     0    0    0    0    7    0    5    0   15
U23     0    0    0    0    4    2    1    0   14
U123    0    0    0    0   11    2    6    0   15
rowSums(C != 0)
  U1   U2   U3  U12  U13  U23 U123
   3    3    1    4    3    4    4
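The indicator matrix A above was typed in by hand. For a small number of rows it could also be generated, for example with combn(). The following is only a sketch: the helper name subset_matrix is made up for this illustration, the data are retyped as a matrix so that the block runs on its own, and the row names become ambiguous once there are 10 or more rows.

```r
# The example data, retyped as a numeric matrix.
UM <- rbind(U1 = c(0, 0, 0, 0, 7, 0, 5, 0, 1),
            U2 = c(0, 0, 0, 0, 4, 2, 1, 0, 0),
            U3 = c(0, 0, 0, 0, 0, 0, 0, 0, 14))
colnames(UM) <- c("Aadi", "Aagl", "Apap", "Aage", "Bdia",
                  "Beup", "Crub", "Carc", "Cpam")

# Hypothetical helper: 0/1 indicator matrix of all nonempty subsets
# of n rows, ordered by subset size.
subset_matrix <- function(n) {
    # list of index vectors, one per subset
    subsets <- unlist(lapply(1:n, function(k) combn(n, k, simplify = FALSE)),
                      recursive = FALSE)
    # one row per subset, with a 1 in the positions of the selected rows
    A <- t(sapply(subsets, function(idx) as.numeric(1:n %in% idx)))
    rownames(A) <- sapply(subsets,
                          function(idx) paste(c("U", idx), collapse = ""))
    A
}

A <- subset_matrix(nrow(UM))
C <- A %*% UM                 # summed counts for every subset of rows
species <- rowSums(C != 0)    # number of species detected per subset
species
```

This only replaces the manual typing; the number of rows of A still grows as 2^n - 1, so it is usable for small n only.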
> Now I need to do this with 10 and 32 sample events... :(
If I understand you correctly, your real table U has 32 rows
and you want to consider all subsets of at most 10 rows. If this
is so, then the number of combinations is
sum(choose(32, 1:10))
# [1] 107594212
A matrix with this number of rows and 35 columns requires about
30 GB of memory when stored as double precision numbers. How do
you want to summarize the results? There may be a more efficient
way to compute the required parameters.
For example, the average number of species contained in the sum
of a random selection of k rows may be computed easily, since
we can consider the columns (species) individually. For each
column, the probability of getting a nonzero sum may be computed
without actually constructing all the subsets.
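One way to make this concrete: if a species occurs in m of the n rows, a random choice of k rows misses it with probability choose(n - m, k) / choose(n, k), and the expected number of species is the sum of the complementary probabilities over the columns. The sketch below assumes that all subsets of k rows are equally likely and that the counts are nonnegative; the function name expected_species is invented for this illustration, and the example data are retyped so that the block runs on its own.

```r
# The example data, retyped as a numeric matrix.
UM <- rbind(U1 = c(0, 0, 0, 0, 7, 0, 5, 0, 1),
            U2 = c(0, 0, 0, 0, 4, 2, 1, 0, 0),
            U3 = c(0, 0, 0, 0, 0, 0, 0, 0, 14))

# Hypothetical helper: expected number of species detected in a
# uniformly random subset of k rows, without enumerating the subsets.
expected_species <- function(UM, k) {
    n <- nrow(UM)
    m <- colSums(UM != 0)     # number of rows in which each species occurs
    # P(species detected) = 1 - P(all k chosen rows are among the n - m empty ones)
    sum(1 - choose(n - m, k) / choose(n, k))
}

avg <- sapply(1:nrow(UM), function(k) expected_species(UM, k))
avg
```

For the three-row example this agrees with averaging the entries of rowSums(C != 0) over the subsets of the same size, for instance (4 + 3 + 4) / 3 for k = 2.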
If you need a parameter that is harder to compute than the
average, simulation is an option. In this case, not all subsets
would be generated, but only a smaller number of randomly chosen
subsets of k rows for a given k.
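A simulation of this kind might look as follows. This is a minimal sketch: the function name simulate_species and the default of 1000 replications are arbitrary choices made for this illustration, and the example data are retyped so that the block runs on its own.

```r
# The example data, retyped as a numeric matrix.
UM <- rbind(U1 = c(0, 0, 0, 0, 7, 0, 5, 0, 1),
            U2 = c(0, 0, 0, 0, 4, 2, 1, 0, 0),
            U3 = c(0, 0, 0, 0, 0, 0, 0, 0, 14))

# Hypothetical helper: species counts for nsim random subsets of k rows.
simulate_species <- function(UM, k, nsim = 1000) {
    replicate(nsim, {
        idx <- sample(nrow(UM), k)    # k distinct rows chosen at random
        sum(colSums(UM[idx, , drop = FALSE]) != 0)
    })
}

set.seed(1)                    # only to make the run reproducible
sim <- simulate_species(UM, k = 2)
mean(sim)                      # estimate of the average number of species
table(sim)                     # estimated distribution of the species count
```

The vector sim can then be summarized in whatever way is needed, for example by a quantile or by the proportion of subsets that detect all species.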
Petr Savicky.