• Gromov-Wasserstein Learning for Structured Data Modeling

    3 PM - 6 PM, Feb. 23, 2022, PST, Virtually with AAAI [Slides]

  • The last few years have seen the rapid development of machine learning methods for modeling structured data coming from biology, chemistry, network science, natural language processing, and computer vision. Recently-developed tools and cutting-edge methodologies coming from the theory of optimal transport, especially the models and the algorithms based on the Gromov-Wasserstein (GW) distance and its variants, have proved to be particularly successful for these tasks. An impressive feature of these works is the application of new theoretical and computational techniques of Gromovized optimal transport for comparing probability distributions defined on spaces with complex structures, such as graphs, sets, kernels, Riemannian manifolds, and more metric spaces.

    This tutorial aims to introduce the machine learning community at large to Gromov-Wasserstein learning (GWL) --- a new machine learning framework that has been proven to be very effective in structured data modeling but is not yet very well-known in the community. In this tutorial, I introduce (i) the theoretical fundamentals of GWL, including the definition of GW distance, its properties of GW distance and its connections to other optimal transport distance; (ii) the computational methods of GW distance and its variants, the GW-based machine learning models and algorithms; (iii) the innovative downstream applications of GWL and the challenges of optimal transport and structured data modeling. The content above relates to several areas within the AAAI community, such as machine learning and its applications, nonconvex optimization, stochastic algorithms, and graph modeling and analysis. More details can be found in the slides attached below.

    Part 1

    I will first provide an introduction to the basic theory of optimal transport, starting from Wasserstein distance, Wasserstein barycenters, and their power on distribution matching and averaging. Beyond the basic theory, I will focus on the optimal transport between the distributions defined on incomparable spaces, and accordingly, I will introduce the Gromov-Wasserstein distance for metric measure spaces (mm-spaces) and the corresponding Gromov-Wasserstein barycenters, showing their feasibility and rationality for such challenging scenarios. Furthermore, I will consider the metric measure spaces with complex structures and define the GW distance and barycenters for the corresponding structured data like graphs (e.g., molecules, networks, and meshes). Finally, I will show the potentials of GW distance to graph matching and partitioning.

    Part 2

    I will elaborate on GW distance and its typical variants proposed in recent years. For the classic GW distance, I will introduce its typical optimization algorithms, e.g., conjugated gradient, proximal gradient, Bregman ADMM, and so on. Additionally, the variants for the acceleration and the extension of GW distance are introduced, including low-rank GW distance, fused GW distance, hierarchical GW distance, sliced GW distance, unbalanced GW distance, etc.

    Part 3

    In this part, I will leverage the GW-related distance to reformulate some existing machine learning methods and propose some new models and algorithms. In particular, we will introduce the applications of GW diatance to generative modeling (e.g., the GW-GAN for coupled generative models and the relationally-regularized Wasserstein autoencoder), graph representation method (e.g., GW factorization model), and graph generation (e.g., GW-based graphon estimator and graphon autoencoder). Finally, I will discuss some ongoing research and interesting directions in the study of GW distance-based machine learning.