[R] A combinatorial optimization problem: finding the best permutation of a complex vector

Ravi Varadhan rvaradhan at jhmi.edu
Sat Nov 14 16:51:18 CET 2009


Hi,

I have solved the problem that I had posed before.  Here is a statement of the problem:

"I have a complex-valued vector X in C^n.  Given another complex-valued vector Y in C^n, I want to find a permutation of Y, say, Y*,  that minimizes ||X - Y*||, the distance between X and Y*. "

I was talking to Professor Moody T. Chu, who is a well-known numerical analyst from NC State Univ, and he pointed out that this problem is an instance of the classical "linear sum assignment problem (LSAP)" in discrete mathematics.  Once this was revealed to me, it didn't take me long to find out the existence of various algorithms (e.g. Hungarian algorithm) and codes (C, Fortran, Matlab) for solving this problem.  I also looked in the CRAN task view on optimization and found that the LSAP solver is present in the "clue" package.  Thanks to Kurt Hornik for this package.  

So, here is an illustration of the "amazing" power of mathematics:

n <- 1000

x <- rt(n, df=3) + 1i * rt(n, df=3)  # this is the target vector to be matched

y <- x[sample(n)]  # this is the vector to be permuted

# Note:  I have chosen a random permutation of the target so that I know the answer is "x" itself
# and the minimum distance is zero

Cmat <- outer(x, y, FUN=function(x,y) Mod(x - y))

require(clue)

ans <- solve_LSAP(Cmat, maximum=FALSE)  # We are minimizing the linear sum

dist <- function(x, y) sqrt(sum(Mod(x - y)^2))

dist(x, y[c(ans)])


This is remarkable.  It takes only about 0.3 seconds to solve this difficult combinatorial problem!


Best,
Ravi.

____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu


----- Original Message -----
From: "Charles C. Berry" <cberry at tajo.ucsd.edu>
Date: Thursday, November 12, 2009 2:20 pm
Subject: Re: [R] A combinatorial optimization problem: finding the best permutation of a complex vector
To: Ravi Varadhan <rvaradhan at jhmi.edu>
Cc: r-help at r-project.org


> On Thu, 12 Nov 2009, Ravi Varadhan wrote:
> 
> >
> > Hi,
> >
> > I have a complex-valued vector X in C^n.  Given another 
> complex-valued 
> > vector Y in C^n, I want to find a permutation of Y, say, Y*, that 
> > minimizes ||X - Y*||, the distance between X and Y*.
> >
> > Note that this problem can be trivially solved for "Real" vectors, 
> since 
> > real numbers possess the ordering property. Complex numbers, 
> however, do 
> > not possess this property.  Hence the challenge.
> >
> > The obvious solution is to enumerate all the permutations of Y and 
> pick 
> > out the one that yields the smallest distance.  This, however, is 
> only 
> > feasible for small n.  I would like to be able to solve this for n 
> as 
> > large as 100 - 1000, in which cases the permutation approach is 
> > infeasible.
> >
> > I am looking for algorithms, possibly iterative, that can provide a 
> 
> > "good" approximate solutions that are not necessarily optimal for 
> > high-dimensional vectors. I can do random sampling, but this can be 
> very 
> > inefficient in high-dimensional problems.  I am looking for 
> efficient 
> > algorithms because this step has to be performed in each iteration 
> of an 
> > "outer" algorithm.
> >
> > Are there any clever adaptive algorithms out there?
> >
> 
> I do not know.
> 
> But would you settle for a not-so-clever adaptive heuristic?
> 
> If so, see below.
> 
> 
> 
> > Here is an example illustrating the problem:
> >
> > require(e1071)
> >
> > n <- 8
> > x <- runif(n) + 1i * runif(n)
> > y <- runif(n) + 1i * runif(n)
> >
> > dist <- function(x, y) sqrt(sum(Mod(x - y)^2))
> >
> > perms <- permutations(n)
> > dim(perms)  # [1] 40320     8
> > tmp <- apply(perms, 1, function(ord) dist(x, y[ord]))
> > z <- y[perms[which.min(tmp), ]]  # exact solution
> > dist(x, z)
> >
> > # an aproximate random-sampling approach
> > nsamp <- 10000
> > perms <- t(replicate(nsamp, sample(1:n, size=n, replace=FALSE)))
> > tmp <- apply(perms, 1, function(ord) dist(x, y[ord]))
> > z.app <- y[perms[which.min(tmp), ]]  # approximate solution
> > dist(x, z.app)
> >
> 
> The heuristic is to use a stochastic greedy updates. Here is a very 
> simple 
> one:
> 
> swap.samp <-
>   function(index) {
>          sub.ind <- sample(seq(along=index),2)
>          index[sub.ind]<- rev(sub.ind)
>          index
>   }
> 
> 
> z.app <- y
> z.cand <- y
> 
> for (i in 1:100) z.cand <-
>  	if( dist(x,z.app) < dist(x,z.cand) ) {
> 
>          	z.app[swap.samp(1:8)]
> 
>  	} else {
>  	z.app <- z.cand
>  	z.cand[swap.samp(1:8)]
>  	}
> 
> On your toy example, this usually finds the min(dist(x,z.app))  in < 
> 100 
> trials.
> 
> Note that when
> 
>  	z.diff <- z.app != z.cand
> 
>  	dist(x[ z.diff ],z.app[ z.diff ])^2 - dist(x[ z.diff ],z.cand[ 
> z.diff ])^2
> 
> equals
> 
>  	dist(x,z.app)^2 - dist(x,z.cand)^2
> 
> so you could vectorize the above to randomly pair up all the points 
> (for n 
> even) then update the n%/%2 pairs all at once.
> 
> 
> For large problems, you might try to preferentially sample pairs of 
> points 
> with similar values of Mod ( x - z.app ) to begin with. The heuristic 
> 
> being that in two pairs of points that are both distant, swapping them 
> 
> might realize a bigger benefit.
> 
> 
> --
> 
> Another version is to sample k points, randomly permute, then do a 
> greedy 
> update:
> 
> sub.samp <-
>   function(index,n) {
>  	sub.ind <- sample(seq(along=index),n)
>  	index[sub.ind]<- sample(sub.ind)
>  	index
> }
> 
> 
> # k is 4 here
> 
> for (i in 1:100) z.cand <-
>  	if( dist(x,z.app) < dist(x,z.cand) ){
>          	z.app[sub.samp(1:8,4)]
>  	} else {
>          	z.app <- z.cand
>          	z.cand[sub.samp(1:8,4)]
>  	}
> 
> 
> On your toy problem, this takes in the low 100's to find the min.
> 
> I think you can do the similar vectorization trick here.
> --
> 
> I suppose another variant would repeatedly sample k points, then 
> enumerate 
> distances for all permutations of them, then choose the best one to 
> update z.app.
> 
> Again, in larger problems, one might weight those k points towards 
> higher 
> values of Mod( x - z.app ) at least to begin with.
> 
> HTH,
> 
> Chuck
> 
> 
> > Thanks for any help.
> >
> > Best,
> > Ravi.
> >
> >
> > ____________________________________________________________________
> >
> > Ravi Varadhan, Ph.D.
> > Assistant Professor,
> > Division of Geriatric Medicine and Gerontology
> > School of Medicine
> > Johns Hopkins University
> >
> > Ph. (410) 502-2619
> > email: rvaradhan at jhmi.edu
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > 
> > PLEASE do read the posting guide 
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> Charles C. Berry                            (858) 534-2098
>                                              Dept of Family/Preventive 
> Medicine
> E                     UC San Diego
>   La Jolla, San Diego 92093-0901
> 
>




More information about the R-help mailing list