LLM and AGI: Model Architecture Design and Applications (2023-Now)
Large language models (LLMs) like the GPT and LLaMA series have shown strong generative power in NLP, providing a potential path toward artificial general intelligence. I am interested in the intrinsic mathematical mechanisms behind these models, focusing on the design of next-generation model architectures and on the compression, fusion, and adaptation of LLMs. On the application side, I am interested in AI4Math and AI4Science.
Computational Optimal Transport and Structured Data Modeling (2018-Now)
The theory of optimal transport yields a series of useful metrics on statistical distributions, such as the Wasserstein distance, the Gromov-Wasserstein distance, and their variants. Essentially, optimal transport defines a joint distribution over the entities in two domains, which can be used to solve many matching problems in machine learning, e.g., domain adaptation, graph matching, and recommendation systems. Currently, I am studying advanced modeling and learning methods to achieve scalable and robust optimal transport between deterministic and/or stochastic entities. A systematic introduction to the proposed Gromov-Wasserstein learning can be found here.
Hongteng Xu, Jiachang Liu, Dixin Luo, Lawrence Carin. Representing Graphs via Gromov-Wasserstein Factorization, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022 [paper]
Hongteng Xu, Dixin Luo, Lawrence Carin, Hongyuan Zha. Learning Graphons via Structured Gromov-Wasserstein Barycenters, AAAI Conference on Artificial Intelligence (AAAI), 2021 [paper, code]
Hongteng Xu. Gromov-Wasserstein Factorization Models for Graph Clustering, AAAI Conference on Artificial Intelligence (AAAI), 2020 [paper, code]
Hongteng Xu, Dixin Luo, Lawrence Carin. Scalable Gromov-Wasserstein Learning for Graph Partitioning and Matching, Conference on Neural Information Processing Systems (NeurIPS), 2019 [paper, code]
Hongteng Xu, Dixin Luo, Hongyuan Zha, Lawrence Carin. Gromov-Wasserstein Learning for Graph Matching and Node Embedding, The International Conference on Machine Learning (ICML), 2019 [paper, code]
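As a minimal illustration of the transport plans described above, the sketch below computes an entropically regularized optimal transport plan between two small discrete distributions via Sinkhorn iterations; the distributions, cost matrix, and parameters are illustrative choices, not taken from any specific paper.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iters=500):
    """Entropic optimal transport via Sinkhorn iterations.

    a, b: marginal distributions (each sums to 1); C: cost matrix.
    Returns a transport plan T whose row sums match a and column sums match b.
    """
    K = np.exp(-C / eps)            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)           # scale columns to match b
        u = a / (K @ v)             # scale rows to match a
    return u[:, None] * K * v[None, :]

# Transport mass between two 2-point distributions.
a = np.array([0.5, 0.5])
b = np.array([0.25, 0.75])
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
T = sinkhorn(a, b, C)
cost = np.sum(T * C)                # approximate transport cost
```

Smaller `eps` approaches the unregularized Wasserstein cost but slows convergence, so in practice `eps` trades accuracy for speed.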
Hypercomplex-based Machine Learning and Geometric Data Modeling (2018-Now)
For structured data like 3D point clouds, 3D meshes, and 3D molecules, we often want representations that are SE(3)-equivariant or SE(3)-invariant. To meet these requirements, I have developed a series of hypercomplex-based machine learning methods. In particular, I am interested in developing quaternion-based neural networks and leveraging their disentangled rotation-equivariance and rotation-invariance to model skeletons and molecules.
Shaofei Qin, Xuan Zhang, Hongteng Xu, Yi Xu. Fast Quaternion Product Units for Learning Disentangled Representations in SO(3), IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022 [paper]
Xuan Zhang, Shaofei Qin, Yi Xu, and Hongteng Xu. Quaternion Product Units for Deep Learning on 3D Rotation Groups, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020 [paper, code]
Xuanyu Zhu*, Yi Xu*, Hongteng Xu*, Changjian Chen. Quaternion Convolutional Neural Networks, The European Conference on Computer Vision (ECCV), 2018 [paper]
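As a toy illustration of the rotation structure these quaternion-based networks exploit, the sketch below rotates a 3D point with the Hamilton product q p q*; this is generic quaternion arithmetic, not code from the papers above.

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(p, axis, angle):
    """Rotate a 3D point p about a unit axis by angle (radians) via q p q*."""
    axis = axis / np.linalg.norm(axis)
    q = np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    p_quat = np.concatenate([[0.0], p])         # embed p as a pure quaternion
    return quat_mul(quat_mul(q, p_quat), q_conj)[1:]

# Rotating (1, 0, 0) by 90 degrees about the z-axis gives (0, 1, 0).
out = rotate(np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]), np.pi / 2)
```

The unit-quaternion parameterization covers SO(3) smoothly (each rotation twice, via q and -q), which is what makes it attractive for rotation-equivariant layers.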
Stochastic Point Processes on Graphs (2015-2023)
Real-world interactions among multiple entities, such as user behaviors in social networks, job hunting and hopping, and diseases and their complications, often exhibit self-triggering and mutually-triggering patterns. Temporal point processes, especially Hawkes processes and self-correcting processes, can capture these triggering patterns quantitatively. I focus on developing cutting-edge modeling and learning techniques for point process-based sequential data analysis, e.g., Granger causality analysis and clustering of event sequences, the combination of deep learning and point processes, and robust predictive learning of point processes from imperfect observations.
Qingmei Wang, Minjie Cheng, Shen Yuan, Hongteng Xu. Hierarchical Contrastive Learning for Temporal Point Processes, AAAI Conference on Artificial Intelligence (AAAI), 2023
Hongteng Xu, Dixin Luo, Xu Chen and Lawrence Carin. Benefits from Superposed Hawkes Processes, The 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 2018 [paper]
Hongteng Xu and Hongyuan Zha. A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering, Annual Conference on Neural Information Processing Systems (NeurIPS), 2017 [paper, code]
Hongteng Xu, Mehrdad Farajtabar, Hongyuan Zha. Learning Granger Causality for Hawkes Processes, International Conference on Machine Learning (ICML), 2016 [paper, code]
Hongteng Xu. PoPPy: A Point Process Toolbox Based on PyTorch, arXiv [paper, code]
Hongteng Xu, Hongyuan Zha. THAP: A Matlab Toolkit for Learning with Hawkes Processes, arXiv [paper, code]
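To make the self-triggering pattern concrete, here is a minimal sketch of a univariate Hawkes process with an exponential kernel, simulated with Ogata's thinning algorithm; the parameters (mu, alpha, beta) are illustrative, not from the toolkits above.

```python
import numpy as np

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity lambda(t) = mu + sum_i alpha * exp(-beta (t - t_i))
    over past events t_i < t; each event raises the rate of future events."""
    past = history[history < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def simulate_hawkes(T=50.0, mu=0.2, alpha=0.8, beta=1.0, seed=0):
    """Simulate event times on [0, T] via Ogata's thinning algorithm."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while t < T:
        # Upper bound on the intensity until the next event (kernel decays).
        lam_bar = hawkes_intensity(t, np.array(events), mu, alpha, beta) + alpha
        t += rng.exponential(1.0 / lam_bar)          # candidate event time
        if t < T and rng.uniform() * lam_bar <= hawkes_intensity(
                t, np.array(events), mu, alpha, beta):
            events.append(t)                         # accept the candidate
    return np.array(events)

events = simulate_hawkes()
```

With branching ratio alpha / beta < 1 the process is stable; increasing it toward 1 produces longer, burstier cascades of events.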
Manifold Landmarking and Denoising (2013-2016)
In many practical applications, high-dimensional data often have a low-dimensional structure that can be represented effectively in a low-dimensional latent space. Manifold learning aims to learn the mapping between the ambient space and the latent space, either explicitly or implicitly. My research in this direction includes manifold-based data synthesis, clustering, and landmarking methods, as well as their applications to computer vision.
Hongteng Xu, Licheng Yu, Mark Davenport, Hongyuan Zha. A Unified Framework for Manifold Landmarking, IEEE Transactions on Signal Processing (TSP), 2018 [paper]
Hongteng Xu, Yang Zhou, Weiyao Lin, Hongyuan Zha. Unsupervised Trajectory Clustering via Adaptive Multi-Kernel-based Shrinkage, International Conference on Computer Vision (ICCV), 2015 [paper, code]
Hongteng Xu, Hongyuan Zha, Ren-Cang Li, Mark A. Davenport. Active Manifold Learning via Gershgorin Circle Guided Sample Selection, The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI), 2015 [paper, code]
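A minimal sketch of the ambient-to-latent mapping idea, using classical multidimensional scaling as a basic baseline (not the landmarking methods above): points lying on a line in 3D are recovered exactly in one latent dimension.

```python
import numpy as np

def classical_mds(D2, k=2):
    """Embed n points into k dimensions from an n x n matrix of squared
    pairwise distances D2, via the double-centered Gram matrix."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # Gram matrix of centered points
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]            # top-k eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Three collinear points in 3D; their 1D embedding preserves all distances.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
Y = classical_mds(D2, k=1)
```

Nonlinear methods (Isomap, Laplacian eigenmaps, etc.) replace the Euclidean distances here with manifold-aware ones, but the embedding step follows the same spectral recipe.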
Fractal-based Image Modeling and Analysis (2012-2017)
A fractal is a "mathematical monster" that is not measurable in the measure space of its underlying geometry. Fractal analysis has been widely used in computer vision, especially in texture image processing and texture analysis. The key concept in fractal-based image modeling is the fractal dimension, which is invariant to bi-Lipschitz transformations of an image and thus robustly represents an image's intrinsic structural information. I developed an image model based on local fractal analysis and applied it to low-level and mid-level vision tasks.
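To make the fractal-dimension concept concrete, here is a sketch of the standard box-counting estimator on a binary image; this is the generic textbook method, not the local fractal model from my work.

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8)):
    """Estimate the box-counting dimension of a binary image.

    Counts occupied boxes N(s) at each box size s and fits
    log N(s) = -d * log s + c; the slope magnitude d is the estimate.
    """
    counts = []
    for s in sizes:
        h, w = mask.shape
        n = 0
        # Count boxes of side s containing at least one foreground pixel.
        for i in range(0, h, s):
            for j in range(0, w, s):
                if mask[i:i + s, j:j + s].any():
                    n += 1
        counts.append(n)
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope

# Sanity checks: a filled square has dimension 2; a straight line has dimension 1.
square = np.ones((16, 16), dtype=bool)
line = np.zeros((16, 16), dtype=bool)
line[8, :] = True
d_square = box_counting_dimension(square)
d_line = box_counting_dimension(line)
```

For genuinely fractal patterns the estimate falls strictly between these integer values, which is what makes it a robust texture descriptor.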