Advances in Optimal Transport-based Machine Learning
9:00 AM - 12:30 PM, Aug. 19, 2023, UTC+8, Macau, IJCAI [Slides]
The last few years have seen the rapid development of machine learning methods for natural language processing, computer vision, and scientific discovery.
Recently developed tools and cutting-edge methodologies from the theory of optimal transport (OT), especially models and algorithms based on the optimal transport distance and its variants, have been particularly successful in these applications.
An impressive feature of these works is the application of new theoretical results and computational techniques for various optimal transport distances to the comparison of probability measures defined on metric spaces with complex structures.
In this tutorial, I introduce the machine learning community to recent advances in computational optimal transport methods, their theoretical foundations, and the corresponding optimal transport-based learning paradigms, especially those that have proven very effective in challenging tasks but are not yet well-known. In particular, I will introduce (i) the theoretical and computational fundamentals of optimal transport, from the formulations of the classic optimal transport distance and its variants to their computation and acceleration methods; (ii) optimal transport-based approaches for high-dimensional data generation and structured data generation, respectively, covering the mainstream generative modeling paradigms (e.g., autoencoders and GANs) and classic nonparametric generative models (e.g., graphons); (iii) applications of optimal transport to privacy-preserving machine learning, including robust multi-modal learning and distributed domain adaptation. The content relates to several areas within the IJCAI community, such as statistical machine learning, optimization algorithms, and various cutting-edge machine learning applications, including multi-modal representation learning, structured data modeling (e.g., graphs and point clouds), and high-dimensional data generation.
Part 1: Computational Optimal Transport
I will first introduce the basic theory of optimal transport, starting from the Wasserstein distance, Wasserstein barycenters, and their power in distribution matching and averaging. Beyond the basic theory, I will introduce typical optimization algorithms for the Wasserstein distance and its variants, e.g., Sinkhorn scaling, proximal gradient, and Bregman ADMM. Additionally, I will introduce variants for accelerating and extending the Wasserstein distance, including the low-rank Sinkhorn distance, the sliced Wasserstein distance, etc.
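As a concrete illustration of Sinkhorn scaling, the following is a minimal NumPy sketch of entropic OT between two discrete measures; the function name and toy data are illustrative, not taken from the tutorial materials.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iters=500):
    """Entropic OT via Sinkhorn scaling.

    a, b: source/target marginals (1-D, each summing to 1)
    C:    cost matrix of shape (len(a), len(b))
    eps:  entropic regularization weight
    Returns the transport plan P and the transport cost <P, C>.
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                 # match column marginals
        u = a / (K @ v)                   # match row marginals
    P = u[:, None] * K * v[None, :]       # P = diag(u) K diag(v)
    return P, np.sum(P * C)

# Toy example: two identical discrete uniform measures on [0, 1]
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - y[None, :]) ** 2        # squared-Euclidean ground cost
a = np.full(5, 0.2)
b = np.full(5, 0.2)
P, cost = sinkhorn(a, b, C)
```

Note that smaller `eps` gives a sharper plan (closer to unregularized OT) but requires more iterations and, eventually, log-domain stabilization.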
Part 2: OT-based Generative Modeling
In this part, I will introduce mainstream generative modeling methods, including autoencoders (AEs) and generative adversarial networks (GANs), from the perspective of optimal transport. The Wasserstein autoencoder, the Wasserstein GAN, and their variants will be introduced and analyzed in depth. Beyond generative models of high-dimensional data, I will further introduce generative models of structured data like graphs, based on the Gromovization of optimal transport. In particular, the Gromov-Wasserstein (GW) distance and GW barycenters will be introduced. Based on the GW distance, I will introduce learning paradigms for nonparametric and parametric graph generators, respectively.
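To make the Wasserstein autoencoder objective concrete: a WAE minimizes a reconstruction cost plus a divergence between the aggregated posterior and the prior in latent space; in the MMD-based variant, that divergence is a kernel maximum mean discrepancy. Below is a minimal NumPy sketch of the (biased) squared-MMD estimator; the function name and toy data are illustrative assumptions, not part of the tutorial.

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X and Y (shape (n, d))
    under an RBF kernel; usable as the latent regularizer in an MMD-based WAE."""
    def k(A, B):
        # Pairwise squared distances, then RBF kernel values
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

# Toy check: identical samples give 0; well-separated samples give a large value
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))        # stand-in for encoded latent codes
Y = rng.normal(size=(30, 2)) + 3.0  # stand-in for prior samples, shifted away
mmd_same = rbf_mmd2(X, X)
mmd_diff = rbf_mmd2(X, Y)
```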
Part 3: OT-based Privacy-preserving Machine Learning
In this part, I will introduce recent advances in privacy-preserving machine learning achieved by optimal transport-based methods. Some representative OT-based learning methods will be shown, including the OT-based robust multi-modal learning paradigms for unaligned data, e.g., the Gromov-Wasserstein multi-modal clustering method and the hierarchical optimal transport (HOT) multi-modal classifier. These methods extend classic multi-kernel fusion (MKF) and canonical correlation analysis (CCA) to unaligned multi-modal scenarios. Because they do not require correspondence among different modalities' samples, these methods protect data privacy effectively. Finally, I will introduce a decentralized optimization method for entropic optimal transport and apply it to distributed domain adaptation. This work explores the possibility of approximating the Wasserstein distance in distributed systems without data sharing.
Gromov-Wasserstein Learning for Structured Data Modeling
3 PM - 6 PM, Feb. 23, 2022, PST, Virtually with AAAI [Slides]
The last few years have seen the rapid development of machine learning methods for modeling structured data coming from biology, chemistry, network science, natural language processing, and computer vision.
Recently developed tools and cutting-edge methodologies from the theory of optimal transport, especially the models and algorithms based on the Gromov-Wasserstein (GW) distance and its variants, have proved particularly successful for these tasks.
An impressive feature of these works is the application of new theoretical and computational techniques of Gromovized optimal transport to the comparison of probability distributions defined on spaces with complex structures, such as graphs, sets, kernels, Riemannian manifolds, and other metric spaces.
This tutorial aims to introduce the machine learning community at large to Gromov-Wasserstein learning (GWL) --- a new machine learning framework that has proven very effective in structured data modeling but is not yet well-known in the community. In this tutorial, I introduce (i) the theoretical fundamentals of GWL, including the definition of the GW distance, its properties, and its connections to other optimal transport distances; (ii) the computational methods for the GW distance and its variants, as well as GW-based machine learning models and algorithms; (iii) innovative downstream applications of GWL and the open challenges of optimal transport and structured data modeling. The content relates to several areas within the AAAI community, such as machine learning and its applications, nonconvex optimization, stochastic algorithms, and graph modeling and analysis. More details can be found in the slides attached above.
I will first provide an introduction to the basic theory of optimal transport, starting from the Wasserstein distance, Wasserstein barycenters, and their power in distribution matching and averaging. Beyond the basic theory, I will focus on optimal transport between distributions defined on incomparable spaces; accordingly, I will introduce the Gromov-Wasserstein distance for metric measure spaces (mm-spaces) and the corresponding Gromov-Wasserstein barycenters, showing their feasibility and rationality in such challenging scenarios. Furthermore, I will consider metric measure spaces with complex structures and define the GW distance and barycenters for the corresponding structured data like graphs (e.g., molecules, networks, and meshes). Finally, I will show the potential of the GW distance for graph matching and partitioning.
I will elaborate on the GW distance and its typical variants proposed in recent years. For the classic GW distance, I will introduce its typical optimization algorithms, e.g., conjugate gradient, proximal gradient, and Bregman ADMM. Additionally, I will introduce variants for accelerating and extending the GW distance, including the low-rank GW distance, fused GW distance, hierarchical GW distance, sliced GW distance, unbalanced GW distance, etc.
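As a rough illustration of such solvers, the following NumPy sketch computes an entropic GW coupling under the squared loss by repeatedly linearizing the quadratic objective and solving the resulting entropic OT subproblem with Sinkhorn iterations (a proximal-gradient-style scheme); the function name, hyperparameters, and toy data are illustrative assumptions.

```python
import numpy as np

def entropic_gw(C1, C2, p, q, eps=0.2, n_outer=30, n_inner=200):
    """Entropic Gromov-Wasserstein (squared loss) between mm-spaces (C1, p) and (C2, q).

    C1, C2: intra-space (dis)similarity matrices; p, q: marginals.
    Returns the coupling T and the GW objective value at T.
    """
    T = np.outer(p, q)                                     # independent coupling as init
    # Constant part of the linearized cost for the squared loss
    const = (C1**2 @ p)[:, None] + ((C2**2) @ q)[None, :]
    for _ in range(n_outer):
        grad = const - 2.0 * C1 @ T @ C2.T                 # linearization at current T
        K = np.exp(-(grad - grad.min()) / eps)             # shifted Gibbs kernel (stability)
        u = np.ones_like(p)
        for _ in range(n_inner):                           # Sinkhorn projection onto couplings
            v = q / (K.T @ u)
            u = p / (K @ v)
        T = u[:, None] * K * v[None, :]
    gw = np.sum((const - 2.0 * C1 @ T @ C2.T) * T)
    return T, gw

# Toy example: GW between a 4-node path graph (shortest-path costs) and itself
C = np.abs(np.arange(4)[:, None] - np.arange(4)[None, :]).astype(float)
p = np.full(4, 0.25)
T, gw = entropic_gw(C, C, p, p)
```

Mature implementations (e.g., with conjugate-gradient inner solvers, line search, or log-domain Sinkhorn) are considerably more robust than this sketch.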
In this part, I will leverage GW-related distances to reformulate some existing machine learning methods and propose new models and algorithms. In particular, I will introduce applications of the GW distance to generative modeling (e.g., the GW-GAN for coupled generative models and the relationally-regularized Wasserstein autoencoder), graph representation learning (e.g., the GW factorization model), and graph generation (e.g., the GW-based graphon estimator and the graphon autoencoder). Finally, I will discuss ongoing research and interesting directions in the study of GW distance-based machine learning.
Modeling and Applications for Temporal Point Processes
Junchi Yan, Hongteng Xu, Liangda Li
Real-world entities' behaviors, associated with their side information, are often recorded over time as asynchronous event sequences.
Such event sequences are the basis of many practical applications, including neural spike train analysis, earthquake prediction, crime analysis, infectious disease diffusion forecasting, condition-based preventative maintenance, information retrieval, and behavior-based network analysis and services.
The temporal point process (TPP) is a principled mathematical tool for modeling and learning asynchronous event sequences; it captures the instantaneous happening rate of events and the temporal dependency between historical and current events.
TPP provides us with an interpretable model to describe the generative mechanism of event sequences, which is beneficial for event prediction and causality analysis.
Recently, it has been shown that TPPs have potential in many machine learning and data science applications and can be combined with other cutting-edge machine learning techniques like deep learning, reinforcement learning, and adversarial learning.
In the first part of the tutorial, we will start with an elementary introduction to the TPP model, including its basic concepts and the simulation of event sequences; in the second part, we will introduce typical TPP models and their traditional learning methods; in the third part, we will discuss recent progress in the modeling and learning of TPPs, including neural network-based TPP models, generative adversarial networks (GANs) for TPPs, and deep reinforcement learning of TPPs; in the final part, we will cover practical applications of TPPs, including useful data augmentation methods for learning from imperfect observations, typical application examples such as healthcare and industrial maintenance, and existing open-source toolboxes.
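To make the "instantaneous happening rate" and event-sequence simulation concrete, here is a minimal sketch of a univariate Hawkes process with an exponential excitation kernel, simulated with Ogata's thinning algorithm; the function names and parameter values are illustrative, not from the tutorial materials.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    events = np.asarray(events, dtype=float)
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    """Ogata's thinning: sample candidates from a dominating Poisson process,
    then accept each with probability lambda(t) / lambda_bar."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while t < T:
        # Upper bound on the intensity after time t: the intensity just before t
        # plus the jump alpha (the intensity only decays between events).
        lam_bar = hawkes_intensity(t, events, mu, alpha, beta) + alpha
        t += rng.exponential(1.0 / lam_bar)            # candidate inter-arrival time
        if t >= T:
            break
        if rng.uniform() <= hawkes_intensity(t, events, mu, alpha, beta) / lam_bar:
            events.append(t)                           # accept the candidate event
    return np.array(events)
```

For a stationary process, the branching ratio `alpha / beta` must be below 1; e.g., `simulate_hawkes(mu=0.5, alpha=0.4, beta=1.0, T=50.0)` yields a self-exciting sequence on [0, 50].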
Invited Talks and Posters
Optimal Transport-driven Implicit Neural Network Design (Invited Talk at Chinese University of Hong Kong (Shenzhen), November 2023) [Slides]
Optimal Transport-driven Implicit Neural Network Design (Invited Talk at Shanghai Jiao Tong University, October 2023) [Slides]
Gromov-Wasserstein Factorization Model for Graph Representation (China OT-ML Seminar, August 2022) [Slides]
Gromov-Wasserstein Learning for Graph Modeling (RUC-KAUST Joint Workshop on Advances in AI, November 2021) [Slides]
Gromov-Wasserstein Factorization Model for Graph Clustering (OT-TDA Workshop, July 2020) [Slides]
Recent Developments in Learning Hawkes Processes (Indiana University-Purdue University Indianapolis, USA, November 2017) [Slides]
Learning Granger Causality for Hawkes Processes (Poster, ITA Workshop, February 2017)
Point Processes and Their Applications (Shanghai Jiao Tong University, December 2016) [Slides]
Active Manifold Learning via Gershgorin Circle Guided Sample Selection (ICRA, May 2015)