[R] The 'subset matching' challenge
Detlef Steuer
detlef.steuer at hsu-hamburg.de
Thu Oct 29 16:53:20 CET 2009
If you look for "subset sum problem", you will find relevant
information.
A starter:
http://en.wikipedia.org/wiki/Subset_sum_problem
Detlef
On Thu, 29 Oct 2009 15:47:22 +0100
Yvonnick Noël <yvonnick.noel at uhb.fr> wrote:
> Dear all,
>
> The following problem just has been submitted to me by an accountant.
>
> In his new job, he has to close some old accounts. He has yearly
> amounts, and a list of products that have been bought over the years, at
> certain prices for which he has an exhaustive record. The problem is: He
> does not know what product was bought this or that year (don't ask). He
> does not want to find back the real story, but just write realistic
> accounts, for which the sum of a subset of product prices will give the
> exact yearly amount.
>
> Here is a real example from his data:
>
> # A list of 64 product prices
> products =
> c(30500,30500,30500,30500,42000,42000,42000,42000,42000,42000,42000,42000,42000,42000,71040,90900,76950,35100,71190,
> 53730,456000,70740,70740,533600,83800,59500,27465,28000,28000,28000,28000,28000,26140,49600,77000,123289,27000,27000,27000,
> 27000,27000,27000,80000,33000,33000,55000,77382,48048,51186,40000,35000,21716,63051,15025,15025,15025,15025,800000,1110000,
> 59700,25908,829350,1198000,1031655)
>
> # Global amount
> amount = 4748652
>
> Now he wants to find all subsets of the 'product' vector which sums to
> 'amount'.
>
> I wrote the following code, which is clearly not optimal:
>
> # Create a matrix of subsets of size r among the integer set 1:n
> subsets <- function(n, r, v = 1:n) {
> if(r <= 0) vector(mode(v), 0)
> else if(r >= n) v[1:n]
> else rbind(cbind(v[1], Recall(n-1, r-1, v[-1])),Recall(n-1, r, v[-1]))
> }
>
> # Main function
> find.amount = function(amount,products) {
>
> if(sum(products)<amount) {
> cat("There is no solution.\n")
> return()
> }
>
> l = length(products)
> cat("\nThere are",l,"product prices\n\n")
> names(products) = paste("Product",1:l,sep="")
> products = sort(products)
>
> for(i in 2:l) {
>
> # If the sum of the i smallest prices is greater than amount, then stop
> if(sum(products[1:i])>amount) break
>
> # Look for matching subsets only in the case when the sum of i
> largest prices is greater than amount
> if(sum(rev(products)[1:i])>=amount) {
> # Generates all subsets of i indicies in 1:l
> subs = subsets(l,i)
> nl = nrow(subs)
> nc = ncol(subs)
>
> # Compute sums of corresponding price subsets
> sums = rowSums(matrix(products[subs],nl,nc))
>
> # Which ones match the global amount ?
> w = which(sums == amount)
> how.many = length(w)
> if(how.many>0) {
> cat("\n-->> There are",how.many,"solutions with",nc,"products :\n")
> for(j in 1:how.many) {
> print(products[subs[w[j],]])
> }
> }
> else cat("\n-->> There is no solution with",nc,"products.\n")
> }
> else cat("\n-->> There is no solution with",i,"products.\n")
> }
> }
>
>
> Then I can use these functions on a smaller example:
>
> > find.amount(4,c(1,1,1,1,2,2))
>
> and a number of matching subsets are provided. The problem is: This
> approach creates a whole matrix of subsets of r integers among 1:n,
> which rapidly gives huge matrices, and this is clearly not optimal for
> the real data provided above.
>
> Would anyone have a suggestion as to an alternative and more efficient
> strategy?
>
> Good luck,
>
> Yvonnick Noel
> University of Brittany, Rennes 2
> France
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list