Tutorials

Gromov-Wasserstein Learning for Structured Data Modeling
3 PM - 6 PM, Feb. 23, 2022, PST, Virtually with AAAI [Slides]
Hongteng Xu
The last few years have seen the rapid development of machine learning methods for modeling structured data coming from biology, chemistry, network science, natural language processing, and computer vision.
Recently developed tools and cutting-edge methodologies from the theory of optimal transport, especially the models and algorithms based on the Gromov-Wasserstein (GW) distance and its variants, have proved particularly successful for these tasks.
An impressive feature of these works is the application of new theoretical and computational techniques of Gromovized optimal transport for comparing probability distributions defined on spaces with complex structures, such as graphs, sets, kernels, Riemannian manifolds, and other metric spaces.
This tutorial aims to introduce the machine learning community at large to Gromov-Wasserstein learning (GWL), a new machine learning framework that has proven very effective in structured data modeling but is not yet well known in the community. In this tutorial, I introduce (i) the theoretical fundamentals of GWL, including the definition of the GW distance, its properties, and its connections to other optimal transport distances; (ii) the computational methods for the GW distance and its variants, together with GW-based machine learning models and algorithms; (iii) innovative downstream applications of GWL and the open challenges in optimal transport and structured data modeling. The content above relates to several areas within the AAAI community, such as machine learning and its applications, nonconvex optimization, stochastic algorithms, and graph modeling and analysis. More details can be found in the slides attached below.
Part 1
I will first provide an introduction to the basic theory of optimal transport, starting from the Wasserstein distance and Wasserstein barycenters and their power in distribution matching and averaging. Beyond the basic theory, I will focus on optimal transport between distributions defined on incomparable spaces. Accordingly, I will introduce the Gromov-Wasserstein distance for metric measure spaces (mm-spaces) and the corresponding Gromov-Wasserstein barycenters, showing their feasibility and rationality in such challenging scenarios. Furthermore, I will consider metric measure spaces with complex structures and define the GW distance and barycenters for the corresponding structured data, such as graphs (e.g., molecules, networks, and meshes). Finally, I will show the potential of the GW distance for graph matching and partitioning.
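As a rough illustration of the quantity behind the GW distance, the sketch below evaluates the squared-loss GW objective for a given coupling between two small metric spaces. It is not taken from the tutorial slides: the function name `gw_objective`, the toy point sets, and the choice of squared loss are assumptions made for this example. The GW distance itself is the minimum of this objective over all couplings with the prescribed marginals, which the sketch does not compute.

```python
import numpy as np


def gw_objective(C1, C2, T):
    """Squared-loss GW objective for a fixed coupling T (hypothetical helper).

    C1: (n, n) pairwise distances in the first space.
    C2: (m, m) pairwise distances in the second space.
    T:  (n, m) coupling whose entries sum to 1.
    Returns sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 * T[i,j] * T[k,l].
    """
    # diff[i, j, k, l] = (C1[i, k] - C2[j, l]) ** 2, built by broadcasting.
    diff = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2
    return float(np.einsum("ijkl,ij,kl->", diff, T, T))


# Toy example: three points on a line, compared with a copy of themselves.
x = np.array([0.0, 1.0, 3.0])
C = np.abs(x[:, None] - x[None, :])

identity_coupling = np.eye(3) / 3.0            # matches each point to itself
independent_coupling = np.full((3, 3), 1 / 9)  # product of uniform marginals

loss_identity = gw_objective(C, C, identity_coupling)
loss_independent = gw_objective(C, C, independent_coupling)
```

The identity coupling preserves every pairwise distance, so its objective is zero, while the independent coupling ignores the structure and incurs a positive cost. The dense four-index tensor is only practical for tiny examples.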
Part 2
I will elaborate on the GW distance and its typical variants proposed in recent years. For the classic GW distance, I will introduce its typical optimization algorithms, e.g., conjugated gradient, proximal gradient, Bregman ADMM, and so on. Additionally, I will introduce variants that accelerate or extend the GW distance, including low-rank GW distance, fused GW distance, hierarchical GW distance, sliced GW distance, unbalanced GW distance, etc.
Part 3
In this part, I will leverage GW-related distances to reformulate some existing machine learning methods and propose some new models and algorithms. In particular, I will introduce the applications of the GW distance to generative modeling (e.g., the GW-GAN for coupled generative models and the relationally-regularized Wasserstein autoencoder), graph representation (e.g., the GW factorization model), and graph generation (e.g., the GW-based graphon estimator and graphon autoencoder). Finally, I will discuss some ongoing research and interesting directions in the study of GW distance-based machine learning.

Modeling and Applications for Temporal Point Processes
8 AM - 11 AM, Aug. 4, 2019, Anchorage, Alaska, USA, with KDD [Slides, Videos]
Junchi Yan, Hongteng Xu, Liangda Li
Real-world entities' behaviors, associated with their side information, are often recorded over time as asynchronous event sequences.
Such event sequences are the basis of many practical applications, such as neural spike train analysis, earthquake prediction, crime analysis, infectious disease diffusion forecasting, condition-based preventative maintenance, information retrieval, and behavior-based network analysis and services.
The temporal point process (TPP) is a principled mathematical tool for modeling and learning asynchronous event sequences, which captures the instantaneous occurrence rate of events and the temporal dependency between historical and current events.
TPP provides us with an interpretable model to describe the generative mechanism of event sequences, which is beneficial for event prediction and causality analysis.
Recently, it has been shown that TPP has potential in many machine learning and data science applications and can be combined with other cutting-edge machine learning techniques like deep learning, reinforcement learning, adversarial learning, and so on.
In the first part of the tutorial, we will start with an elementary introduction to the TPP model, including its basic concepts and the simulation methods for event sequences. In the second part, we will introduce typical TPP models and their traditional learning methods. In the third part, we will discuss recent progress on the modeling and learning of TPP, including neural network-based TPP models, generative adversarial networks (GANs) for TPP, and deep reinforcement learning of TPP. In the final part, we will talk about the practical application of TPP, including useful data augmentation methods for learning from imperfect observations, typical applications and examples like healthcare and industry maintenance, and existing open source toolboxes.
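To make the notions of intensity and simulation concrete, here is a minimal sketch of one standard approach: simulating a univariate Hawkes process with an exponential kernel by Ogata's thinning method. It is not taken from the tutorial materials. The function names, the exponential kernel, and the parameter values (`mu`, `alpha`, `beta`, `horizon`) are assumptions chosen for illustration, with `mu > 0` and `alpha / beta < 1` so that the process is stable.

```python
import math
import random


def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity at time t given past event times (hypothetical helper).

    lambda(t) = mu + sum over past events s < t of alpha * exp(-beta * (t - s)).
    """
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in history if s < t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) with Ogata's thinning method."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The intensity only decays between events, so its value right after
        # the current time (counting an event at t, if any) is an upper bound.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        # Propose the next candidate time from a Poisson process of rate lam_bar.
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events


# Assumed illustrative parameters; each event raises the rate by alpha.
events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=0)
```

Each accepted event temporarily raises the intensity, which is how the model expresses the dependency of current events on historical ones.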
Invited Talks and Posters
Gromov-Wasserstein Factorization Model for Graph Representation (China OTML Seminar, August 2022) [Slides]
Gromov-Wasserstein Learning for Graph Modeling (RUC-KAUST Joint Workshop on Advances in AI, November 2021) [Slides]
Gromov-Wasserstein Factorization Model for Graph Clustering (OTTDA Workshop, July 2020) [Slides]
Recent Developments in Learning Hawkes Processes (Indiana University-Purdue University Indianapolis, USA, November 2017) [Slides]
Learning Granger Causality for Hawkes Processes (Poster, ITA Workshop, February 2017)
Point Processes and Their Applications (Shanghai Jiao Tong University, December 2016) [Slides]
Active Manifold Learning via Gershgorin Circle Guided Sample Selection (ICRA, May 2015)