Hello, I am Yushen Zuo. I am currently a research intern in the TACO Group at Texas A&M University (TAMU), working on agentic image restoration under the guidance of Prof. Zhengzhong Tu and in close collaboration with Renjie Li.

Before that, I was a research assistant at The Hong Kong Polytechnic University (PolyU), working under the guidance of Prof. Kenneth K. M. Lam and in close collaboration with Jun Xiao.

Earlier, I was an Applied Scientist at Microsoft, and before that I interned at Microsoft Research Asia (MSRA) and Tencent Youtu Lab. I hold a master’s degree from Tsinghua University and a bachelor’s degree from Xidian University.

I am actively seeking PhD and research job opportunities worldwide. My research interests include image/video generation, vision-language models, low-level vision, agentic AI, object detection and segmentation, and 3D vision.

Here is my CV. My email: zuoyushen12@gmail.com

🔥 News

2025.09: 🎉🎉 Our paper 4KAgent: Agentic Any Image to 4K Super-Resolution was accepted to NeurIPS 2025 (Code, Project Page, and Hugging Face Page).
2025.06: 🎉🎉 Our paper Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks was accepted to ICCV 2025 (Code and Hugging Face Page).
2025.03: 🎉🎉 Won 1st place in the NTIRE 2025 Challenge on Short-form UGC Image Super-Resolution (4×) at CVPR 2025.
2025.01: 🎉🎉 Joined Texas A&M University (TAMU) as a research intern under the supervision of Prof. Zhengzhong Tu.
2025.01: 🎉🎉 Our paper See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization was accepted to ICASSP 2025.
2024.08: 🎉🎉 Won 2nd place in the AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content at ECCV 2024; our method, Fast Sequential Motion Diffusion (FSMD), was selected for presentation in the challenge summary paper.
2024.08: 🎉🎉 Our paper Towards Multi-View Consistent Style Transfer with One-Step Diffusion via Vision Conditioning was accepted to the AI for Visual Arts Workshop and Challenges (AI4VA) at ECCV 2024.
2024.04: 🎉🎉 Joined The Hong Kong Polytechnic University (PolyU) as a research assistant under the supervision of Prof. Kenneth K. M. Lam.
2022.07: 🎉🎉 Joined Microsoft as an applied scientist, focusing on recommendation systems and large language model applications in Bing.
2022.06: 🎉🎉 Received the ‘Stars-of-tomorrow’ award of the MSRA Internship Program.
2021.07: 🎉🎉 Joined Microsoft Research Asia (MSRA) as a research intern in the Multi-Modal Interaction (MMI) Group (directed by Dr. Qiang Huo), collaborating with the Azure OCR team on multi-directional table detection in PDF images.
2021.03: 🎉🎉 Ranked 10 / 60 in the NTIRE 2021 Challenge on Image Deblurring at CVPR 2021; our method, Visual Token Transformer for Image Restoration, was selected for presentation in the challenge summary paper.
2019.07: 🎉🎉 Our paper Low-resolution palmprint image denoising by generative adversarial networks was accepted by Neurocomputing.

📝 Publications

NeurIPS 2025

4KAgent: Agentic Any Image to 4K Super-Resolution

Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, Xiaoyu Wang, Ming-Hsuan Yang, Zhengzhong Tu

Project Page

ICCV 2025

Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks

Jiawei Wang*, Yushen Zuo*, Yuanjun Chai, Zhendong Liu, Yicheng Fu, Yichun Feng, Kin-Man Lam

(* denotes equal contribution)

ICASSP 2025

See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization

Zongqi He, Zhe Xiao, Kin-Chung Chan, Yushen Zuo, Jun Xiao, Kin-Man Lam

AI4VA@ECCV 2024

Towards Multi-View Consistent Style Transfer with One-Step Diffusion via Vision Conditioning

Yushen Zuo, Jun Xiao, Kin-Chung Chan, Rongkang Dong, Cuixin Yang, Zongqi He, Hao Xie, Kin-Man Lam

AIM@ECCV 2024

AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content

Marcos V. Conde, Zhijun Lei, Wen Li, Christos Bampis, Ioannis Katsavounidis, Radu Timofte, Yushen Zuo et al.

CVPRW 2021

NTIRE 2021 Challenge on Image Deblurring

Seungjun Nah, Sanghyun Son, Suyoung Lee, Radu Timofte, Kyoung Mu Lee, Yushen Zuo et al.

Neurocomputing 2019

Low-resolution palmprint image denoising by generative adversarial networks

Shengjie Chen, Shuo Chen, Zhenhua Guo, Yushen Zuo

💻 Work and Research Experience

2024.04 - 2025.05, Research Assistant, The Hong Kong Polytechnic University (PolyU)
- Artificial Intelligence and Signal Processing Laboratory
  - Accelerated Diffusion for Image Processing (e.g., Style Transfer, Image Translation)
    - Focus on stylizing multi-view images of 3D scenes; propose OSDiffST, a novel style transfer method based on a one-step diffusion model.
    - Incorporate LoRA adapters to rapidly adapt the pre-trained diffusion model for style transfer. Propose a vision condition module for efficient style information extraction and injection.
    - Introduce two additional loss functions that align color distributions and improve structural similarity, enhancing visual quality and maintaining multi-view consistency across viewpoints after stylization.
    - The paper was accepted by the AI for Visual Arts Workshop and Challenges (AI4VA) at ECCV 2024.
    - We are extending the approach by designing new adapters and applying the framework to more image processing tasks (e.g., image translation).
  - Efficient Video Super-Resolution
    - Focus on real-time video super-resolution.
    - Propose Fast Sequential Motion Diffusion (FSMD) to achieve real-time video super-resolution.
    - Won 2nd place in the AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content at ECCV 2024.
  - Sparse-View Novel View Synthesis with 3D Gaussian Splatting
    - Focus on enhancing sparse-view novel view synthesis with 3D Gaussian Splatting through local depth and semantic regularization.
    - The paper was accepted by ICASSP 2025.
  - Image Processing and Diffusion for Vision-Language Model Safety and Defense
    - Explore vision-language model (VLM) safety and the effectiveness of diffusion-based defenses.
    - The paper was planned for submission to CVPR 2025.
2022.08 - 2024.03, Applied Scientist, Microsoft
- Bing News - Recommendation system
  - Explainable AI
    - Use SHAP to compute each feature’s contribution to the ranking score, providing a better explanation of the model’s output.
    - Use a mapping method based on the recall path to show users why they see each recommended item.
    - Deployed across all Bing News channels (e.g., the Edge homepage), with user feedback collected to refine the mapping method.
  - Dynamic quota allocation
    - Train a classification model that uses request features and the corresponding user’s engagement features to determine whether a news recommendation request is triggered by a user or by prerendering/other backend tasks (result: AUC > 0.8 on a test set built from the Bing News Recommendation database).
    - Based on the classifier’s predictions, reduce the quota of each recall path in the Ranker for requests predicted as ‘Not User-trigger’, lowering computational cost.
    - Product performance: 20% reduction in computing resource usage with no loss in performance.
- Bing Whole Page - Large Language Model Application
  - Answer triggering in Bing Search - Real Estate Related
    - Use an LLM (GPT-3.5) to label challenging samples from web results, yielding 1.3M new training samples.
    - Train the answer triggering model on a training set augmented with the LLM-labeled samples.
    - Recall on the test dataset improved from 0.54 to 0.73.
    - Product performance: 3% increase in the answer trigger rate (how often the answer triggers the Bing real estate application) in Bing Search, and a 4.1K gain in DAU (Daily Active Users) of the Bing real estate application.
2021.07 - 2022.07, Research Intern, Multi-Modal Interaction (MMI) Group, Microsoft Research Asia
- Rotated object detection (multi-directional table detection in PDF images)
  - Design an anchor-free two-stage detector for rotated object detection.
  - Design a sequence-invariant loss and relative offsets for training the rotated object detector.
  - Achieve stable performance across image rotation angles on the production dataset (F-score fluctuation < 0.02).
  - Achieve state-of-the-art performance on the production dataset and contribute to the Azure OCR API (3B monthly active users).
  - Received the ‘Stars-of-tomorrow’ award of the MSRA Internship Program.
2020.10 - 2021.05, Research Intern, Tencent Youtu Lab
- UniInst: Detection-free and NMS-free instance segmentation
  - Instance-aware One-to-one Assignment: use Hungarian matching to assign each target its best-matching feature point as the positive point, based on the classification score and segmentation mask accuracy.
  - MaskIoU Branch: during training, learn to predict the IoU of each generated mask; during inference, multiply the predicted IoU by the classification score to obtain the final confidence.
  - SOTA mask AP on the COCO test-dev 2017 and OCHuman datasets.
  - Patent: CN114332457A[P]
2020.05 - 2021.06, Postgraduate, Tsinghua University
- Visual Token Transformer for Image Restoration
  - First attempt to use a visual token-based transformer for image restoration.
  - The network learns to divide images into groups and map them to visual tokens without hand-crafted rules.
  - Design a visual token-based transformer block to capture non-local/multi-scale self-similarity in images.
  - Compared with a vanilla transformer, the token-based transformer reduces computational cost from $O(n^{2})$ to $O(n)$ while achieving comparable image restoration performance.
  - Included in the NTIRE 2021 Challenge on Image Deblurring report at CVPR 2021 (ranked 10 / 60).
  - Project report (applied to various low-level vision tasks): Visual Token Transformer for Image Restoration.pdf.
2019.01 - 2019.06, Postgraduate, Tsinghua University
- Low Resolution Palmprint Image Denoising
  - Palmprint recognition methods are sensitive to image noise and need an effective denoising algorithm.
  - First attempt at end-to-end denoising of low-resolution palmprint images using neural networks.
  - Design a generative adversarial network (GAN)-based model that handles multiple types of noise in palmprint images and preserves more orientation information by using a Gabor loss during training.
  - Collect data from the PolyU and IITD palmprint databases to build training/test sets, and generate noisy images by adding different types of noise.
  - Outperform existing state-of-the-art methods in both denoising quality and palmprint recognition accuracy on test sets with different noise types; the average EER (equal error rate) of palmprint recognition decreased from 10.841% to 1.532% after denoising.

🎖 Honors and Awards

2025.03 CVPR 2025 NTIRE Challenge on Short-form UGC Image Super-Resolution - 1st place
2024.08 AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content - 2nd place
2022.06 ‘Stars-of-tomorrow’ award of Microsoft Research Asia Internship Program
2021.03 CVPR 2021 NTIRE Image Deblurring Challenge - Track 1: Low Resolution (10 / 60)
2021.01 Kaggle NFL 1st and Future - Impact Detection, Silver medal (23 / 459)
2020.12 Champion of the 1st Ocean Target Detection International Challenge (1 / 151)
2018.05 Meritorious winner in Interdisciplinary Contest in Modeling (ICM)

📖 Education

2019.06 - 2022.06, Master’s degree, Tsinghua University.
2015.09 - 2019.06, Bachelor’s degree, Xidian University.