Tutorials

Advances in Optimal Transport-based Machine Learning
9:00 AM – 12:30 PM, Aug. 19, 2023, UTC+8, Macau, with IJCAI [Slides]
Hongteng Xu
The last few years have seen the rapid development of machine learning methods for natural language processing, computer vision, and scientific discovery.
Recently developed tools and cutting-edge methodologies from the theory of optimal transport (OT), especially models and algorithms based on the optimal transport distance and its variants, have been particularly successful in these applications.
An impressive feature of these works is the application of new theoretical results and computational techniques for various optimal transport distances to compare probability measures defined on metric spaces with complex structures.
In the proposed tutorial, I plan to introduce the machine learning community to recent advances in computational optimal transport methods, their theoretical foundations, and the corresponding optimal transport-based learning paradigms, especially those proven effective in challenging tasks but not yet well known. In particular, I will introduce (i) the theoretical and computational fundamentals of optimal transport, from the formulations of the classic optimal transport distance and its variants to their computation and acceleration methods; (ii) optimal transport-based approaches for high-dimensional data generation and structured data generation, respectively, covering all mainstream generative modeling paradigms (e.g., autoencoders and GANs) and classic nonparametric generative models (e.g., graphons); (iii) the applications of optimal transport to privacy-preserving machine learning, including robust multimodal learning and distributed domain adaptation. The content above relates to several areas within the IJCAI community, such as statistical machine learning, optimization algorithms, and various cutting-edge machine learning applications, including multimodal representation learning, structured data (e.g., graphs and point clouds) modeling, and high-dimensional data generation.
Part 1: Computational Optimal Transport
I will first introduce the basic theory of optimal transport, starting from the Wasserstein distance, Wasserstein barycenters, and their power for distribution matching and averaging. Beyond the basic theory, I will introduce typical optimization algorithms for the Wasserstein distance and its variants, e.g., Sinkhorn scaling, proximal gradient, Bregman ADMM, etc. Additionally, I will introduce variants that accelerate or extend the Wasserstein distance, including the low-rank Sinkhorn distance, the sliced Wasserstein distance, etc.
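To make the Sinkhorn scaling step concrete, here is a minimal NumPy sketch of entropy-regularized optimal transport between two small empirical measures. The toy point clouds, cost matrix, regularization strength, and iteration count are illustrative assumptions, not the tutorial's code:

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, n_iters=1000):
    """Entropy-regularized OT between histograms a and b with ground cost C."""
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale rows toward marginal a
        v = b / (K.T @ u)                # rescale columns toward marginal b
    P = u[:, None] * K * v[None, :]      # approximate transport plan
    return P, float((P * C).sum())       # plan and its transport cost

# toy example: two empirical measures on the real line
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5])
C = (x[:, None] - y[None, :]) ** 2       # squared-Euclidean ground cost
a = np.full(3, 1.0 / 3)                  # uniform weights on x
b = np.full(2, 1.0 / 2)                  # uniform weights on y
P, cost = sinkhorn(a, b, C)
```

Each iteration alternately rescales the rows and columns of the Gibbs kernel so the plan's marginals match a and b; a smaller reg approaches the unregularized Wasserstein cost but needs more iterations (and log-domain stabilization in practice).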
Part 2: OT-based Generative Modeling
In this part, I will introduce mainstream generative modeling methods, including autoencoders (AEs) and generative adversarial networks (GANs), from the perspective of optimal transport. The Wasserstein autoencoder, the Wasserstein GAN, and their variants will be introduced and analyzed in depth. Besides generative models of high-dimensional data, I will further introduce generative models of structured data like graphs based on the Gromovization of optimal transport. In particular, the Gromov-Wasserstein (GW) distance and barycenters will be introduced. Based on the GW distance, I will introduce learning paradigms for nonparametric and parametric graph generators, respectively.
Part 3: OT-based Privacy-preserving Machine Learning
In this part, I will introduce recent advances in privacy-preserving machine learning achieved by optimal transport-based methods. Some representative OT-based learning methods will be shown, including OT-based robust multimodal learning paradigms for unaligned data, e.g., the Gromov-Wasserstein multimodal clustering method and the hierarchical optimal transport (HOT) multimodal classifier. These methods extend classic multi-kernel fusion (MKF) and canonical correlation analysis (CCA) to unaligned multimodal scenarios. Because they do not require correspondences among different modalities' samples, these methods can protect data privacy efficiently. Finally, I will introduce a decentralized optimization method for entropic optimal transport and apply it to distributed domain adaptation. This work explores the possibility of approximating the Wasserstein distance in distributed systems without data sharing.

Gromov-Wasserstein Learning for Structured Data Modeling
3 PM – 6 PM, Feb. 23, 2022, PST, virtually with AAAI [Slides]
Hongteng Xu
The last few years have seen the rapid development of machine learning methods for modeling structured data coming from biology, chemistry, network science, natural language processing, and computer vision.
Recently developed tools and cutting-edge methodologies from the theory of optimal transport, especially models and algorithms based on the Gromov-Wasserstein (GW) distance and its variants, have proved particularly successful for these tasks.
An impressive feature of these works is the application of new theoretical and computational techniques of Gromovized optimal transport for comparing probability distributions defined on spaces with complex structures, such as graphs, sets, kernels, Riemannian manifolds, and other metric spaces.
This tutorial aims to introduce the machine learning community at large to Gromov-Wasserstein learning (GWL), a new machine learning framework that has proven very effective in structured data modeling but is not yet well known in the community. In this tutorial, I introduce (i) the theoretical fundamentals of GWL, including the definition of the GW distance, its properties, and its connections to other optimal transport distances; (ii) the computational methods for the GW distance and its variants, together with GW-based machine learning models and algorithms; (iii) innovative downstream applications of GWL and the open challenges of optimal transport and structured data modeling. The content above relates to several areas within the AAAI community, such as machine learning and its applications, nonconvex optimization, stochastic algorithms, and graph modeling and analysis. More details can be found in the slides linked above.
Part 1
I will first provide an introduction to the basic theory of optimal transport, starting from the Wasserstein distance, Wasserstein barycenters, and their power for distribution matching and averaging. Beyond the basic theory, I will focus on optimal transport between distributions defined on incomparable spaces; accordingly, I will introduce the Gromov-Wasserstein distance for metric measure spaces (mm-spaces) and the corresponding Gromov-Wasserstein barycenters, showing their feasibility and rationality in such challenging scenarios. Furthermore, I will consider metric measure spaces with complex structures and define the GW distance and barycenters for the corresponding structured data like graphs (e.g., molecules, networks, and meshes). Finally, I will show the potential of the GW distance for graph matching and partitioning.
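For reference, the squared GW distance between two metric measure spaces (following Mémoli's quadratic formulation) compares distances within each space rather than across spaces, which is why it remains well defined for incomparable spaces:

```latex
\mathrm{GW}_2^2\big((\mathcal{X}, d_X, \mu), (\mathcal{Y}, d_Y, \nu)\big)
= \min_{\pi \in \Pi(\mu, \nu)}
\iint_{(\mathcal{X} \times \mathcal{Y})^2}
\big| d_X(x, x') - d_Y(y, y') \big|^2
\, \mathrm{d}\pi(x, y)\, \mathrm{d}\pi(x', y')
```

Here $\Pi(\mu, \nu)$ denotes the set of couplings with marginals $\mu$ and $\nu$.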
Part 2
I will elaborate on the GW distance and its typical variants proposed in recent years. For the classic GW distance, I will introduce its typical optimization algorithms, e.g., conjugate gradient, proximal gradient, Bregman ADMM, and so on. Additionally, I will introduce variants that accelerate or extend the GW distance, including the low-rank GW distance, the fused GW distance, the hierarchical GW distance, the sliced GW distance, the unbalanced GW distance, etc.
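As a concrete illustration of such solvers, the following NumPy sketch runs entropic GW with proximal Sinkhorn projections, in the spirit of the projected-gradient schemes above; the squared loss, regularization strength, iteration counts, and toy 3-point metric spaces are all illustrative assumptions rather than the tutorial's exact algorithm:

```python
import numpy as np

def entropic_gw(C1, C2, p, q, reg=0.1, outer=30, inner=1000):
    """Entropic Gromov-Wasserstein (squared loss) via iterated Sinkhorn projections."""
    T = np.outer(p, q)                                    # initial coupling
    const = (C1**2 @ p)[:, None] + (C2**2 @ q)[None, :]   # constant part of the gradient
    for _ in range(outer):
        M = const - 2 * C1 @ T @ C2.T                     # gradient of GW objective at T
        K = np.exp(-M / reg)
        u = np.ones_like(p)
        for _ in range(inner):                            # Sinkhorn projection onto Pi(p, q)
            u = p / (K @ (q / (K.T @ u)))
        v = q / (K.T @ u)
        T = u[:, None] * K * v[None, :]
    M = const - 2 * C1 @ T @ C2.T
    return T, float((M * T).sum())                        # coupling and objective value

# two 3-point metric spaces: the same space with relabeled points
C1 = np.array([[0., 1., 2.],
               [1., 0., 1.],
               [2., 1., 0.]])
perm = [2, 0, 1]
C2 = C1[np.ix_(perm, perm)]
p = q = np.full(3, 1.0 / 3)
T, cost = entropic_gw(C1, C2, p, q)
```

Each outer step linearizes the quadratic GW objective at the current coupling and projects back onto the transport polytope with Sinkhorn; the entropic smoothing blurs the returned coupling, so the objective value is an upper bound on the exact GW cost.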
Part 3
In this part, I will leverage GW-related distances to reformulate some existing machine learning methods and propose new models and algorithms. In particular, I will introduce applications of the GW distance to generative modeling (e.g., the GW-GAN for coupled generative models and the relationally-regularized Wasserstein autoencoder), graph representation learning (e.g., the GW factorization model), and graph generation (e.g., the GW-based graphon estimator and the graphon autoencoder). Finally, I will discuss some ongoing research and interesting directions in the study of GW distance-based machine learning.

Modeling and Applications for Temporal Point Processes
8 AM – 11 AM, Aug. 4, 2019, Anchorage, Alaska, USA, with KDD [Slides, Videos]
Junchi Yan, Hongteng Xu, Liangda Li
Real-world entities' behaviors, associated with their side information, are often recorded over time as asynchronous event sequences.
Such event sequences are the basis of many practical applications: neural spike train analysis, earthquake prediction, crime analysis, infectious disease diffusion forecasting, condition-based preventive maintenance, information retrieval, behavior-based network analysis and services, etc.
The temporal point process (TPP) is a principled mathematical tool for modeling and learning asynchronous event sequences; it captures the instantaneous occurrence rate of events and the temporal dependency between historical and current events.
TPP provides us with an interpretable model to describe the generative mechanism of event sequences, which is beneficial for event prediction and causality analysis.
Recently, it has been shown that TPPs have potential in many machine learning and data science applications and can be combined with other cutting-edge machine learning techniques like deep learning, reinforcement learning, adversarial learning, and so on.
In the first part of the tutorial, we will start with an elementary introduction to TPP models, including basic concepts and the simulation of event sequences; in the second part, we will introduce typical TPP models and their traditional learning methods; in the third part, we will discuss recent progress on the modeling and learning of TPPs, including neural network-based TPP models, generative adversarial networks (GANs) for TPPs, and deep reinforcement learning of TPPs; in the final part, we will talk about practical applications of TPPs, including useful data augmentation methods for learning from imperfect observations, typical applications in healthcare and industrial maintenance, and existing open-source toolboxes.
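As a concrete example of the simulation methods covered in the first part, here is a minimal sketch of Ogata's thinning algorithm for a univariate Hawkes process with an exponential excitation kernel; the parameter values and function names are illustrative assumptions, not taken from the tutorial:

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, horizon, seed=None):
    """Simulate a univariate Hawkes process with conditional intensity
    lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))
    using Ogata's thinning algorithm."""
    rng = np.random.default_rng(seed)
    events = []
    t = 0.0
    while t < horizon:
        hist = np.asarray(events)
        # the intensity just after t upper-bounds lambda on (t, next event),
        # since the exponential kernel only decays between events
        lam_bar = mu + alpha * np.exp(-beta * (t - hist)).sum()
        t += rng.exponential(1.0 / lam_bar)        # candidate arrival time
        if t >= horizon:
            break
        lam_t = mu + alpha * np.exp(-beta * (t - hist)).sum()
        if rng.uniform() <= lam_t / lam_bar:       # accept w.p. lambda(t)/lam_bar
            events.append(t)
    return np.asarray(events)

seq = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=0)
```

Stationarity requires the branching ratio alpha/beta < 1 (here 2/3); the expected number of events over the horizon is roughly mu * horizon / (1 - alpha/beta).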
Invited Talks and Posters
Gromov-Wasserstein Factorization Model for Graph Representation (China OTML Seminar, August 2022) [Slides]
Gromov-Wasserstein Learning for Graph Modeling (RUC-KAUST Joint Workshop on Advances in AI, November 2021) [Slides]
Gromov-Wasserstein Factorization Model for Graph Clustering (OT-TDA Workshop, July 2020) [Slides]
Recent Developments in Learning Hawkes Processes (Indiana University-Purdue University Indianapolis, USA, November 2017) [Slides]
Learning Granger Causality for Hawkes Processes (Poster, ITA Workshop, February 2017)
Point Processes and Their Applications (Shanghai Jiao Tong University, December 2016) [Slides]
Active Manifold Learning via Gershgorin Circle Guided Sample Selection (ICRA, May 2015)