Optimal transport overview



Optimal transport is a mathematical framework for comparing probability distributions. It computes the minimal cost of transforming one distribution into another by moving mass around while preserving the total amount of mass. For suitable cost functions, this minimal cost defines the Wasserstein distance. The Earth Mover's Distance is the name commonly used for the discrete case with a distance cost, which corresponds to the first-order Wasserstein distance.
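To make "minimal cost of moving mass" concrete, here is a minimal sketch for the simplest case: two one-dimensional samples of equal size, each point carrying equal mass. In 1D with a cost of |x - y|^p (p >= 1), pairing the sorted source points with the sorted target points is an optimal plan, so the distance reduces to a sort and an average. The function name `wasserstein_1d` is illustrative only. For weighted or unequal-size 1D samples, SciPy's `scipy.stats.wasserstein_distance` computes the first-order case.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size 1D samples,
    each treated as a uniform (equal-mass) empirical distribution."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty samples of equal size")
    # In 1D with cost |x - y|**p (p >= 1), matching the i-th smallest source point
    # to the i-th smallest target point is an optimal transport plan.
    pairs = zip(sorted(xs), sorted(ys))
    # Every point carries mass 1/n, so the total cost is the mean pairwise cost.
    cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return cost ** (1.0 / p)

source = [0.0, 1.0, 3.0]
target = [5.0, 2.0, 4.0]
print(wasserstein_1d(source, target))  # pairs (0, 2), (1, 4), (3, 5): mean cost 7/3
```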

Optimal transport can also be used to generate data, because it describes how to match one distribution to another and how to move samples between them. One way to do this is data augmentation: creating new samples by transforming existing ones while keeping them consistent with the underlying distribution.

Here's an example of how optimal transport can be used for data augmentation:

Start with a set of training samples.

Define a cost function that measures the dissimilarity between two samples, such as the Euclidean distance, the squared Euclidean distance, or a more complex, task-specific metric.
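In practice the cost function is usually evaluated for every source/target pair at once, giving a cost matrix C where C[i, j] is the cost of moving mass from source sample i to target sample j. The sketch below assumes samples are rows of NumPy arrays and supports only the two Euclidean variants; the helper name `cost_matrix` is hypothetical. SciPy's `scipy.spatial.distance.cdist` covers the same ground with many more metrics.

```python
import numpy as np

def cost_matrix(X, Y, metric="sqeuclidean"):
    """Pairwise cost C[i, j] between source sample X[i] and target sample Y[j]."""
    diff = X[:, None, :] - Y[None, :, :]  # shape (n, m, d) via broadcasting
    sq = np.sum(diff ** 2, axis=-1)       # squared Euclidean distances, shape (n, m)
    if metric == "sqeuclidean":
        return sq
    if metric == "euclidean":
        return np.sqrt(sq)
    raise ValueError(f"unsupported metric: {metric}")

X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 1.0], [3.0, 4.0]])
C = cost_matrix(X, Y, metric="euclidean")
print(C)
```

The squared Euclidean cost is a frequent default because it leads to smooth transport maps; the plain Euclidean cost corresponds to the Earth Mover's Distance.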

Use optimal transport to match the distribution of the training samples to a desired target distribution. This means finding the optimal transport plan: the amount of mass to move from each training sample to each target point so that the total cost of transforming the training distribution into the target distribution is minimized.
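Finding the exact plan is a linear program. A widely used approximation, swapped in here for brevity, is entropy-regularized optimal transport solved with Sinkhorn iterations, which alternately rescale the rows and columns of a kernel matrix until the plan's marginals match the two distributions. This is a minimal sketch, assuming the weights `a` and `b` each sum to 1 and that the regularization strength `reg` (a hypothetical default) is not too small relative to the costs. Libraries such as POT (Python Optimal Transport) provide tested solvers, for example `ot.emd` for the exact plan and `ot.sinkhorn` for the regularized one.

```python
import numpy as np

def sinkhorn_plan(a, b, C, reg=0.1, n_iters=500):
    """Approximate OT plan between weights a (n,) and b (m,) for cost matrix C (n, m).

    Smaller `reg` gives a plan closer to the exact one, but exp(-C / reg)
    can underflow when costs are large relative to reg.
    """
    K = np.exp(-C / reg)             # Gibbs kernel derived from the cost
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale so row sums approach a
        v = b / (K.T @ u)            # rescale so column sums equal b
    return u[:, None] * K * v[None, :]  # plan P = diag(u) K diag(v)

# Toy example: two source and two target points on a line, equal weights.
X = np.array([0.0, 1.0])
Y = np.array([1.1, -0.1])
C = (X[:, None] - Y[None, :]) ** 2   # squared Euclidean cost
a = np.full(2, 0.5)
b = np.full(2, 0.5)
P = sinkhorn_plan(a, b, C)
print(np.round(P, 3))  # almost all mass on the pairs 0 -> -0.1 and 1 -> 1.1
```

The plan does not simply pair points by index: it sends each source point to the nearby target, which is exactly the cost-minimizing behavior described above.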

Generate new samples by transforming the existing ones according to the transport plan, for example by applying random perturbations or deformations that move each sample toward the target points the plan matches it with.

Add the newly generated samples to the training set.

Repeat the process until enough samples have been generated.
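The generate, add, and repeat steps can be sketched as follows. This is one possible choice among many, not a standard recipe: each selected source point is moved a random fraction of the way toward its barycentric projection (the plan-weighted average of the target points it is matched with), and Gaussian noise is added as the random perturbation. The function name `augment_with_plan`, the uniform choice of the fraction, and `noise_scale` are all hypothetical; the plan `P` is assumed to come from a solver such as the Sinkhorn iteration, with positive mass in every row.

```python
import numpy as np

def augment_with_plan(X, Y, P, n_new, noise_scale=0.05, rng=None):
    """Create n_new samples by moving source points part of the way along a plan.

    X: source samples (n, d); Y: target samples (m, d); P: transport plan (n, m)
    whose rows are assumed to carry positive mass.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Barycentric projection: the average target location each source point is sent to.
    T = (P @ Y) / P.sum(axis=1, keepdims=True)
    idx = rng.integers(0, len(X), size=n_new)       # source points to transform
    t = rng.uniform(0.0, 1.0, size=(n_new, 1))      # fraction of the path to travel
    moved = (1.0 - t) * X[idx] + t * T[idx]         # linear interpolation toward T(x)
    noise = rng.normal(0.0, noise_scale, size=moved.shape)  # random perturbation
    return moved + noise

# Toy plan: each source point is matched with the target point directly above it.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 2.0], [1.0, 2.0]])
P = np.array([[0.5, 0.0], [0.0, 0.5]])

rng = np.random.default_rng(0)
augmented = X.copy()
while len(augmented) < 20:  # repeat until enough samples have been generated
    new = augment_with_plan(X, Y, P, n_new=6, rng=rng)
    augmented = np.vstack([augmented, new])  # add the new samples to the training set
print(augmented.shape)
```

Keeping the fraction of the path small keeps new samples close to the training distribution; letting it approach 1 pushes them toward the target distribution.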

By using optimal transport to match and transform distributions, data augmentation can increase the size and diversity of a dataset, which can help improve the performance of machine learning models. Optimal transport can also be used to sample new data points from a given distribution, for example by mapping simple random noise onto the data distribution, which is useful for generative modeling tasks.
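Sampling by mapping noise onto data has an exact, very small instance in one dimension: a nondecreasing map is the optimal transport map (for costs such as |x - y|^2) between a distribution and its push-forward, and the quantile function is the nondecreasing map that pushes Uniform(0, 1) onto a target distribution. The sketch below assumes the target is a smoothed version of the data whose CDF linearly interpolates the sorted data points; the helper names `make_quantile_map` and `sample_like` are hypothetical. Generative models apply the same idea in many dimensions with learned maps.

```python
import random

def make_quantile_map(data):
    """Monotone map from Uniform(0, 1) onto a distribution whose CDF linearly
    interpolates the sorted data; in 1D this monotone map is the OT map."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")

    def transport(u):
        pos = u * (n - 1)          # position of u on the grid of order statistics
        i = min(int(pos), n - 2)   # left neighbour (u = 1 maps into the last interval)
        frac = pos - i
        return xs[i] + frac * (xs[i + 1] - xs[i])

    return transport

def sample_like(data, k, seed=None):
    """Draw k new points by pushing uniform noise through the quantile map."""
    rng = random.Random(seed)
    transport = make_quantile_map(data)
    return [transport(rng.random()) for _ in range(k)]

data = [2.0, 9.0, 4.0, 7.0, 5.0]
print(sample_like(data, 5, seed=42))
```

Every generated point lies between the smallest and largest observed values, and denser regions of the data receive more samples because the quantile map stretches uniform noise less there.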