Marco Cuturi - Sinkhorn Scaling for Optimal TransportSinkhorn Distances: Lightspeed Computation of Optimal TransportDescriptionWe propose in this work a way to approximate Optimal Transport (OT) distances (a.k.a Wasserstein or Earth Mover’s) using a regularized Optimal Transport (OT) problem. That problem can be solved with Sinkhorn’s algorithm. The accuracy of the approximation is parameterized by a regularization parameter . Computing this regularized OT problem results in two quantities: an upper bound on the actual OT distance, which we call the dual-Sinkhorn divergence, as well as a lower bound, which can be used for nearest neighbor search under the OT metric. Preliminary experimental evidence suggests that the upper bounds — the dual-Sinkhorn divergence — may, in fact, perform better than the OT distance in inference tasks. If you would like to use the EMD in your applications, but gave up doing so because of its computational cost, we suggest that you replace the EMD evaluations in your applications by the dual-Sinkhorn divergence, and benefit from its faster speed and nice computational features. To do so, you will need to tune the regularization parameter . As gets larger, the dual-Sinkhorn divergence converges to the OT distance. For large values, the computation of our bounds become more computationally expensive and numerically unstable as grows. Small values of have been shown to yield very good approximations (in the sense that our approximations perform better than the real OT distance). We suggest thus that you choose by cross-validation and start with a small value. A small value for means that the product of with (where is the ground metric) should not be too large. You need to make sure that which would be the threshold when numerical problems appear. Numerical instabilities arise when the elementwise exponential matrix has elements that are numerically because of insufficient numerical precision caused by a value that is too big. In that case Sinkhorn’s algorithm can blow up. ParallelizationA nice feature of these distances is that their computation is vectorized: the computation of a distances, whether from one histogram to many, or many to many, can be carried out simultaneously using elementary linear algebra operations. As such, these upper and lower bounds can be computed trivially on GPGPU’s. The implementation provided in Matlab below should also run on your GPU if you have the PCT toolbox. Simply instantiate all input variables on the GPU using the gpuArray command first. Implementation in Matlab [V0.2b, 21/04/15]The sinkhornTransport zip file contains an implementation that outputs the upper and lower bounds, as well as the smoothed optimal transport(s) if required. This zip file contains:
Here is what the output returned by this function should look like, with a screencapture below. >> testSinkhornTransport ***** Example when Computing distances of 1-vs-N histograms ****** Display Vector of Distances and Lower Bounds on EMD ***** Example when Computing N distances between N different pairs ****** Display Vector of Distances and Lower Bounds on EMD |