Ziyang Xu (徐子扬)

I am a first-year Ph.D. student in the School of Electronic Information and Communications (EIC) at Huazhong University of Science and Technology (HUST, 华中科技大学), advised by Professors Xinggang Wang and Wenyu Liu. Before that, I received my M.E. degree in Information and Communication Engineering from EIC, HUST in 2025, and my B.E. degree in Information Engineering from the School of Information Engineering (IE), Wuhan University of Technology (WHUT, 武汉理工大学) in 2022. My previous research covers image inpainting, multi-frame generation, video object detection, and representation learning and memory modeling of visual features. My current research interests focus on generative AI.

Email  |  Google Scholar  |  Github  |  Bilibili  |  X  |  RedNote

Latest Update: June 24, 2026

Highlight

As of June 24, 2026, my open-source code on GitHub has received 900+ stars in total, with a single project reaching 570+ stars; the highest impact factor among the journals I have published in is 52.5.
Both works in my image inpainting research series, PixelHacker and Moebius, reached No. 1 in the daily ranking on Hugging Face.
An early quantized image inpainting algorithm that I explored with VIVO AI Lab before Moebius/PixelHacker has been deployed in the photo albums of the VIVO X200 and iQOO 11 series phones, under AI Retouching → AI Eraser. Feel free to try it when taking photos outdoors ~ 😉
My multi-frame generation × medical DSA (Digital Subtraction Angiography) research series, which comprises MoSt-DSA, GaraMoSt, GenDSA, and its representative work GenDSAv2, has been approved as part of a National Key R&D Program of China (China's highest-level R&D program) and has entered large-scale clinical validation (Chinese Clinical Trial Registry: ChiCTR2400084789). More than 100,000 patients worldwide undergo DSA procedures every day; a rigorous randomized controlled trial has shown that GenDSAv2 can help reduce radiation exposure for doctors and patients by two-thirds.

Research

My published research works revolve around the fields of Visual Generation (Image Inpainting, Multi-Frame Generation, Video Generation), Visual Perception (Video Object Detection), and AI for Science. Representative papers are highlighted. There are also some unpublished papers related to representation learning and memory modeling of visual features, so stay tuned. Currently, my research interests focus on generative AI. From a long-term and idealistic perspective, I'm very interested in Artificial General Intelligence.

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

European Conference on Computer Vision (ECCV), 2026
Paper / Code / Project Page

Kangsheng Duan*, Ziyang Xu*,†, Wenyu Liu, Xiaohu Ruan, Xiaoxin Chen, Xinggang Wang
* Equal Contribution, † Project Leader
This work was done when Ziyang Xu and Kangsheng Duan were interning at VIVO AI Lab.

Moebius continues our previous exploration in PixelHacker and asks a simple question beyond the scaling-law race: is making models larger the only path toward high-quality image inpainting? Targeting general-purpose object removal and image inpainting, Moebius jointly optimizes compact architecture design and knowledge distillation to break the usual trade-off among small parameter count, fast inference, and high-fidelity generation. With only 0.22B parameters, it matches or surpasses 10B-level industrial SOTA models such as FLUX.1-Fill-Dev across six natural and portrait benchmarks, while delivering more than 15x faster total inference. This work suggests that when the task is well defined, inpainting models can be smarter, lighter, and faster.

Generative AI-based Low-Dose Digital Subtraction Angiography for Intraoperative Radiation Dose Reduction: a Randomized Controlled Trial

Nature Medicine, 2025 (IF 52.5)
Paper / Code

Huangxuan Zhao*, Yaowei Bai*, Lei Chen*, Jinqiang Ma*, Yu Lei*, Tao Sun, Linxia Wu, Ruiheng Zhang, Ziyang Xu, Xiaoyun Liang, Yi Li, Yan Huang, Yun Feng, Cheng Hong, Zhongrong Miao, Lin Long, Haidong Zhu, Jiahe Zheng, Lin Fan, Zhuting Fang, Peng Dong, Lefei Zhang, Xiaoyu Han, Bin Wang, Bin Liang, Xiangwen Xia, Xuefeng Kan, Chengcheng Zhu, Bo Du, Xinggang Wang, Chuansheng Zheng
* Equal Contribution

Through retrospective analyses and iterative system upgrades, we developed GenDSA-v2, which is, to the best of our knowledge, the largest and most comprehensive low-dose imaging system to date. Leveraging a multi-center cohort of 50,000 patients from 86 hospitals worldwide, collected in collaboration with major international vendors (GE, Philips, and Siemens), we demonstrate that GenDSA-v2 achieves a threefold reduction in radiation dose without compromising diagnostic or interventional performance. Its clinical effectiveness and safety were further validated through animal studies and prospective evaluations, including cross-over observational trials and randomized controlled studies, confirming that GenDSA-v2 maintains sufficient image fidelity to visualize lesions as small as 1 mm.

Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency

Conference on Neural Information Processing Systems (NeurIPS), 2025
Paper / Code / Project Page

Xiangyu Guo*, Zhanqian Wu*, Kaixin Xiong*, Ziyang Xu, Lijun Zhou, Gangwei Xu, Shaoqing Xu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wenyu Liu, Xinggang Wang
* Equal Contribution

Genesis is a unified framework for joint generation of multi-view driving videos and LiDAR sequences with spatio-temporal and cross-modal consistency. Genesis employs a two-stage architecture that integrates a DiT-based video diffusion model with 3D-VAE encoding, and a BEV-aware LiDAR generator with NeRF-based rendering and adaptive sampling. Both modalities are directly coupled through a shared latent space, enabling coherent evolution across visual and geometric domains. Extensive experiments on the nuScenes benchmark demonstrate that Genesis achieves SOTA performance across video and LiDAR metrics (FVD 16.95, FID 4.24, Chamfer 0.611), and benefits downstream tasks including segmentation and 3D detection, validating the semantic fidelity and practical utility of the generated data.

PixelHacker: Image Inpainting with Structural and Semantic Consistency

arXiv preprint, 2025
Paper / Code / Project Page

Ziyang Xu, Kangsheng Duan, Xiaolei Shen, Zhifeng Ding, Wenyu Liu, Xiaohu Ruan, Xiaoxin Chen, Xinggang Wang
This work was done when Ziyang Xu and Kangsheng Duan were interning at VIVO AI Lab.

PixelHacker is a purely visual, general-purpose inpainting framework. Its core is a new paradigm named Latent Categories Guidance (LCG), which introduces a category-level latent semantic distribution to guide the latent space toward structurally and semantically consistent denoising. PixelHacker has 0.86B parameters, only 7% and 11% of the sizes of Flux.1-Fill-dev and SD3.5 Large-Inpainting, respectively. It runs at 47 ms/step (512×512, single L40S GPU), over 3× faster than these baselines, while maintaining SOTA performance across seven benchmarks on both natural and portrait scenes. Representatively, PixelHacker achieves FID 0.82 / LPIPS 0.088 on Places2 with 13% and 11% improvements, and FID 6.35 / LPIPS 0.229 on FFHQ with 43% and 15% improvements. This work establishes a principled and interpretable foundation for efficient, consistent, and general-purpose image inpainting.

GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images

AAAI Conference on Artificial Intelligence (AAAI), 2025
Paper / Code

Ziyang Xu, Huangxuan Zhao, Wenyu Liu, Xinggang Wang

Compared to our previous work MoSt-DSA, GaraMoSt adds multi-granularity motion and structural feature modeling and restructures the overall pipeline into a highly parallel design. This greatly improves accuracy and reduces both high-frequency and low-frequency noise at nearly the same inference time (for interpolating 3 frames, it increases by only 0.005s). GaraMoSt comprehensively surpasses SOTA methods designed for natural scenes as well as those designed for DSA scenes.

XS-VID: An Extremely Small Video Object Detection Dataset

arXiv preprint, 2024
Paper / Code(YOLOFT) / Dataset(XS-VID)

Jiahao Guo, Ziyang Xu, Lianjun Wu, Fei Gao, Wenyu Liu, Xinggang Wang

The XS-VID dataset comprises aerial data from various periods and scenes, and extensively collects three types of objects with small pixel areas: extremely small (0-12^2), relatively small (12^2-20^2), and generally small (20^2-32^2). XS-VID offers unprecedented breadth and depth in covering and quantifying minuscule objects, significantly enriching the scene and object diversity of the dataset. YOLOFT enhances local feature associations and integrates temporal motion features, significantly improving the accuracy and stability of small video object detection.
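The three size ranges above can be read as a simple binning rule on bounding-box area. The sketch below is only an illustration under assumptions: the function name `size_category`, the choice to treat each upper bound as inclusive, and the "other" label for larger objects are hypothetical and not taken from the XS-VID paper or code.

```python
def size_category(width: float, height: float) -> str:
    """Bin a bounding box by pixel area, following the three XS-VID ranges.

    Assumption: upper bounds are inclusive; the source does not say which
    side of each boundary a box with area exactly 12^2, 20^2, or 32^2 falls on.
    """
    area = width * height
    if area <= 0:
        raise ValueError("bounding box must have positive area")
    if area <= 12 ** 2:
        return "extremely small"
    if area <= 20 ** 2:
        return "relatively small"
    if area <= 32 ** 2:
        return "generally small"
    # Hypothetical label for anything above the three listed ranges.
    return "other"


if __name__ == "__main__":
    for w, h in [(8, 8), (16, 16), (30, 30), (64, 64)]:
        print(f"{w}x{h} px -> {size_category(w, h)}")
```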

Large-scale Pretrained Frame Generative Model Enables Real-Time Low-Dose DSA Imaging: an AI System Development and Multicenter Validation Study

Med (Cell Press), 2024 (IF 17)
Paper / Code

Huangxuan Zhao*, Ziyang Xu*, Linxia Wu*, Lei Chen*, Ziwei Cui, Jinqiang Ma, Tao Sun, Yu Lei, Nan Wang, Hongyao Hu, Yiqing Tan, Wei Lu, Wenzhong Yang, Kaibing Liao, Gaojun Teng, Xiaoyun Liang, Yi Li, Congcong Feng, Xiaoyu Han, P.Matthijs van der Sluij, Charles B.L.M. Majoie, Wim H. van Zwam, Yun Feng, Theo van Walsum, Aad van der Lugt, Wenyu Liu, Xuefeng Kan, Ruisheng Su, Weihua Zhang, Xinggang Wang, Chuansheng Zheng
* Equal Contribution

GenDSA is a real-time, low-dose DSA imaging system based on a large-scale pretrained multi-frame generative model, which was pre-trained, fine-tuned, and tested on ten million images from 35 hospitals. Suitable for most DSA scanning protocols, GenDSA can reduce the DSA frame rate (i.e., radiation dose) to 1/3 and generate videos that are virtually identical to those from clinically available protocols. Videos generated by GenDSA reach a level comparable to the fully sampled videos, both in overall quality (4.905 vs 4.935) and in lesion assessment (4.825 vs 4.860), demonstrating the potential of GenDSA for clinical applications.

MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images

European Conference on Artificial Intelligence (ECAI), 2024
Paper / Code

Ziyang Xu, Huangxuan Zhao, Ziwei Cui, Wenyu Liu, Chuansheng Zheng, Xinggang Wang

MoSt-DSA is the first work that uses deep learning for DSA frame interpolation, comprehensively achieving SOTA in accuracy, speed, visual effect, and memory usage. MoSt-DSA is also the first method that directly interpolates any number of frames at any time steps with just one forward pass during both training and testing. If applied clinically, MoSt-DSA can significantly reduce the DSA radiation dose received by doctors and patients, lowering it by 50%, 67%, and 75% when interpolating 1, 2, and 3 frames, respectively.
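The dose figures above follow from simple arithmetic. This is a minimal sketch, assuming radiation dose scales linearly with the number of acquired frames: if k frames are interpolated between each pair of acquired frames, only 1 in every k + 1 displayed frames is actually acquired, so the dose falls by k / (k + 1). The function name `dose_reduction` is hypothetical and used only for illustration.

```python
from fractions import Fraction


def dose_reduction(interpolated_frames: int) -> Fraction:
    """Fraction of dose saved when k frames are interpolated per acquired frame.

    Assumption: dose is proportional to the number of acquired frames.
    """
    if interpolated_frames < 0:
        raise ValueError("number of interpolated frames must be non-negative")
    return Fraction(interpolated_frames, interpolated_frames + 1)


if __name__ == "__main__":
    for k in (1, 2, 3):
        saved = float(dose_reduction(k))
        print(f"interpolating {k} frame(s): about {saved:.0%} lower dose")
```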

Internship

Research Intern, VIVO AI Lab
2024.10~2025.08, Hangzhou, China
Research on on-device multi-modal content understanding and generation

Competition Experience

🚩: Team leader; ranking first
🚩 The 20th Innovation Cup College Students Extracurricular Academic Science and Technology Works Competition (创新杯大学生课外学术科技作品竞赛), Special Prize (特等奖) in the Mathematical Information Group (数理信息组), 2020.
🏆 We won the only Special Prize (唯一特等奖) in the Mathematical Information Group.
🚩 Huawei Cup 19th China Graduate Mathematical Modeling Competition (华为杯中国研究生数学建模竞赛), National Second Prize (国家级二等奖), 2022.
🥈 Top 13% (前13%) of all contestants.
🚩 Artificial Intelligence Track of the 14th China College Student Computer Design Competition (中国大学生计算机设计大赛人工智能赛道), National Third Prize (国家级三等奖), 2021.
🚩 National College Student Mathematical Modeling Competition (全国大学生数学建模竞赛), Second Prize in Hubei Province (湖北省二等奖), 2020.
National College Student Integrated Circuit Innovation and Entrepreneurship Competition (全国大学生集成电路创新创业大赛), Third Prize in Hubei Province (湖北省三等奖), 2021.

Selected Honors & Awards

National Scholarship for Master's Students (国家硕士生奖学金), 2024.
🏆 The most prestigious honor for university students in China, awarded to only 0.2% of candidates nationwide.
Served as team leader and led a provincial undergraduate innovation and entrepreneurship training project, "Research and Implementation of a Video-Based Vehicle-Assisted Driving Device"; received Excellent Completion (优秀结题) in 2021.
Outstanding Master's Graduate (优秀硕士毕业生), 2025.
Excellent Master's Student (三好硕士生), 2024.
Outstanding Undergraduate Graduate (优秀本科毕业生), 2022.
Excellent Undergraduate Student (三好本科生), 2021.

Academic Services

Reviewer for ECCV 2024 and 2026.
Reviewer for CVPR 2026.
Reviewer for IJCV, 2024.
Reviewer for Image and Vision Computing, 2026.


Thanks to jonbarron's website template.