[R] a problem of approach
Petr Savicky
savicky at cs.cas.cz
Wed Jun 27 20:03:02 CEST 2012
On Wed, Jun 27, 2012 at 05:36:08PM +0300, Adrian Duşa wrote:
> Dear R-help list,
>
> Part of a program I wrote seem to take a significant amount of time,
> therefore I am looking for an alternative approach.
> In order to explain what is does:
>
> - the input is a sorted vector of integer numbers
> - some higher numbers may be derived (using a mathematical formula)
> from lower numbers, therefore they should be eliminated
> - at the end, the vector should contain only uniquely defined numbers
>
> Pet hypothetical example, input vector:
> - 2 3 4 5 6 7 8 9 10
> - number 2 generates 4, 7, 10
> - 2 3 5 6 8 9 (surviving vector)
> - number 3 generates 5 and 9
> - 2 3 6 8 (surviving vector)
> - number 6 generates 8
> - final surviving vector 2 3 6
>
> Function foo(x, ...) generates the numbers, my current approach being:
> ####
> index <- 0
> while ((index <- index + 1) < length(numbers)) {
> numbers <- setdiff(numbers, foo(numbers[index]))
> }
> ####
>
> This seem to take quite some time (but I don't know any other way of
> doing it), hence my question(s):
> - would there be another (quicker) implementation in R?
> - alternatively, should I go for a C implementation?
>
> (actually, I did create a C implementation, but it doesn't bring any
> more speed... it is actually a bit slower).
>
> A real-life pet example, using the function findSubsets() from the QCA
> package (our foo function above):
>
> ####
> library(QCA)
> testfoo <- function(x, y) {
> index <- 0
> while((index <- index + 1) < length(x)) {
> x <- setdiff(x, findSubsets(y, x[index], max(x)))
> }
> return(x)
> }
>
> nofl <- rep(3, 14)
> set.seed(12345)
> numbers <- sort(sample(seq(prod(nofl)), 1000000))
Hi.
How large the numbers are? If the bound 3^14 = 4782969 used
above apply also to the real situation, then it is possible
to represent the set using a logical vector of this length,
which has TRUE for the numbers present in the set and FALSE
for the other. Removing a number means to set its bit from
TRUE to FALSE, which is a very simple operation.
# prepare an example
numbers <- c(2, 3, 6, 8, 12, 17)
x <- rep(FALSE, times=max(numbers))
x[numbers] <- TRUE
which(x)
[1] 2 3 6 8 12 17
#remove 3, 8, 12 from the set
x[c(3, 8, 12)] <- FALSE
which(x)
[1] 2 6 17
Hope this helps.
Petr Savicky.
More information about the R-help
mailing list