2026 3rd International Conference on Advanced Image Processing Technology (AIPT 2026)

Prof. Zhonglong Zheng

Zhejiang Normal University, China

Zhonglong Zheng, PhD, Professor and Doctoral Supervisor, serves as Dean of the College of Computer Science (College of Artificial Intelligence), Zhejiang Normal University. He received his doctoral degree from Shanghai Jiao Tong University in 2005, and has been a visiting scholar at Zhejiang University and the University of California, USA. His research interests mainly include machine learning, computer vision, intelligent Internet of Things, and blockchain technology. He has presided over more than 20 research projects, including programs funded by the National Natural Science Foundation of China as well as key and major provincial and ministerial projects. He has published over 100 papers in IEEE/ACM transactions and CCF Rank A/B conferences. He has won one First Prize of the Zhejiang Provincial Science and Technology Progress Award and one First Prize of the Zhejiang Provincial Higher Education Teaching Achievement Award.

Speech Title: Robust Object Detection for UAV Remote Sensing Images in Complex Environments

Abstract: Visible-Infrared (RGB-IR) Unmanned Aerial Vehicle (UAV) object detection integrates complementary cues from visible and infrared sensors, offering broad application potential. However, due to sensor parallax, it still faces the challenge of weak spatial misalignment, which significantly limits its performance in UAV-based object detection. Existing methods emphasize strict alignment, overlooking spectral heterogeneity under varying illumination. To address these issues, we propose the Illumination Guided Implicit Alignment Network (IGIANet) to mitigate modality heterogeneity without explicit alignment. Specifically, we integrate three novel modules. First, we propose an illumination-guided frequency modulation module that adaptively allocates fusion weights to visible and infrared features based on global illumination estimation, effectively alleviating modality imbalance under varying lighting conditions. Second, we introduce a frequency-guided cross-modality differential enhancement module, which computes differential cues across frequency domains to enhance complementary information and highlight weakly aligned and low-contrast regions. Finally, we introduce an implicit alignment-driven dynamic fusion module that actively estimates offsets and generates dynamic, position-adaptive fusion kernels to align and fuse modalities. Extensive experiments demonstrate that IGIANet outperforms state-of-the-art models on various benchmarks.

Prof. Jie Yang

Shanghai Jiao Tong University, China

Jie Yang received a bachelor’s degree in Automatic Control from Shanghai Jiao Tong University (SJTU), where he earned a master’s degree in Pattern Recognition & Intelligent Systems three years later. In 1994, he received his Ph.D. from the Department of Computer Science, University of Hamburg, Germany. He is now Professor and Director of the Institute of Image Processing and Pattern Recognition at Shanghai Jiao Tong University. He has been the principal investigator of more than 30 national and ministerial scientific research projects in image processing, pattern recognition, data mining, and artificial intelligence. He has published six books and more than five hundred articles in national and international academic journals and conferences, with over 27,500 Google Scholar citations and an H-index of 85. To date, he has supervised 5 postdoctoral researchers, 46 doctoral students, and 70 master’s students, and has received six research achievement prizes from the Ministry of Education of China and Shanghai Municipality. He holds 48 patents. Three Ph.D. dissertations he supervised were evaluated as “National Best Ph.D. Dissertation” in 2009, 2017, and 2019. He has served as chair and keynote speaker at more than 10 international conferences. He is included in the 2025 list of World Top 2% Career-long Impact Scientists issued by Stanford University and Elsevier.

Title: Researches on the Defenses and Out-of-Distribution Detection in Trustworthy Deep Learning

Abstract: The rapid advancement of deep learning has had a transformative effect on the development of technology and society across a multitude of sectors. In safety-critical contexts, the potential for neural network models to produce unreliable outputs in response to “malicious” or “unanticipated” inputs poses a severe risk. This talk examines the output reliability of neural network models within the domain of trustworthy deep learning, for two kinds of inputs: 1) inputs that involve pixel perturbations, exemplified by adversarial examples, with respect to the tasks of adversarial and certified robustness; 2) inputs that represent distribution shifts, exemplified by Out-of-Distribution (OoD) data, with respect to the task of out-of-distribution detection. We introduce a novel model augmentation strategy, adopt a multi-head neural network structure, and impose diversity constraints related to adversarial robustness on the model parameters. Using an ensemble of multiple heads in place of an ensemble of multiple neural networks significantly reduces the computational load in both the training and certification phases. We argue that the non-linearity in in-distribution (InD) and OoD data hinders PCA from learning a subspace that fully embodies their diversities. We propose a mode ensemble method that not only enhances detection performance but also significantly reduces the performance variance among independent modes. We also propose performing linear dimension reduction on the gradient using a designated subspace that comprises principal components.

Papers on the above topics have been published in TPAMI, IJCV, PR, and NIPS in recent years.

Researcher Hehe Fan

Zhejiang University, China

Hehe Fan is a ZJU100 Young Professor at the School of Artificial Intelligence, Zhejiang University, and a recipient of the National Young Talent Program (2022). He received his Ph.D. from the University of Technology Sydney. Previously, he served as a Postdoctoral Research Fellow at the School of Computing, National University of Singapore, and as a Research Assistant at Carnegie Mellon University.

His research interests include computer vision, large models, embodied AI, and AI for Science & Engineering. He has served as an Area Chair for ACM MM and IEEE ICIP. He has published over 70 papers in top-tier journals and conferences such as TPAMI, IJCV, NeurIPS, ICLR, ICML, CVPR, and ICCV. Additionally, he has won three world championships in computer vision-related competitions and was named an Honorary Scholar of the 2024 Intel China Academic Talent Program.

Speech Title: Realizing General Embodied Intelligence Based on Hierarchical Architecture and Unified Protocols

Abstract: Current Embodied AI (EI) is trapped in a dilemma of "reinventing the wheel," high costs, and ecological fragmentation resulting from the widespread adoption of the "vertical integration" model. To overcome technical barriers where the incompatibility rate of key module interfaces exceeds 90%, this study draws on the successful experience of computer system abstraction layers to propose a layered decoupling architecture. This framework consists of an Embodied Instruction Compiler for task planning based on GPT-4, a Unified Embodied Machine Instruction Set (EMIS) serving as a standardized software-hardware interaction protocol, and a diffusion-based motion executor for generating robust motion sequences. Practical demonstrations with the BiBo prototype system and its InfiniEVA simulation evaluation system show that establishing unified protocols and hierarchical systems can effectively decouple software from hardware, thereby accelerating the large-scale deployment of general-purpose and reusable Embodied AI systems.

Prof. Mang Ye

Wuhan University, China

Mang Ye is a professor at the School of Computer Science, Wuhan University, and the chair of the Department of Intelligent Science. He was selected for the National Overseas High-Level Young Talent Program in 2021 and has been recognized as a Highly Cited Researcher by Clarivate. His long-term research focuses on multimodal computing, medical artificial intelligence, and related areas. As first or corresponding author, he has published more than 100 CCF-A papers, with over 15,000 citations on Google Scholar and a single paper cited more than 2,500 times. He serves as an editorial board member for CCF-A journals such as IEEE TIP and IEEE TIFS, and has held academic roles including area chair for conferences such as CVPR, ICLR, NeurIPS, ICML, and AAAI. He has led more than 10 research projects, including the NSFC–Hong Kong Joint Fund and key R&D programs of the Ministry of Science and Technology. He has been continuously listed among Stanford’s “World’s Top 2% Scientists” and has received honors such as Baidu AI Young Chinese Scholar.

Title: Multimodal Large Language Models: Continual Learning and Safe Tuning

Abstract: As Multimodal Large Language Models (MLLMs) demonstrate exceptional capabilities in understanding content across various modalities such as text and images, they have become a focal point of cutting-edge research in artificial intelligence. However, two primary challenges constrain their deployment and expansion in the dynamic real world. On one hand, models often forget previously learned knowledge when acquiring new information, which calls for a human-like ability to learn continually. On the other hand, the vast number of parameters and substantial computational resource requirements present significant obstacles to adapting these models to different application scenarios, highlighting the critical importance of efficient tuning. This talk will introduce our latest advances in addressing these two challenges and provide an outlook on future research directions, aiming to offer insights for building the next generation of more flexible, scalable, and cost-effective artificial intelligence systems.