Hello, I am Yushen Zuo, currently a research assistant at The Hong Kong Polytechnic University (PolyU), working under the guidance of Prof. Kenneth K. M. Lam and in close collaboration with Jun Xiao.
Concurrently, I am a remote intern in the TACO group at TAMU, working on Agentic Image Restoration under the guidance of Zhengzhong Tu and in close collaboration with Renjie Li.
Prior to this, I was an Applied Scientist at Microsoft. Before that, I interned at Microsoft Research Asia and Tencent Youtu Lab. I hold a master’s degree from Tsinghua University and a bachelor’s degree from Xidian University.
I am actively seeking PhD opportunities worldwide. My research areas include low-level vision; image generation; object detection and segmentation; 3D vision; vision-language model safety; agentic AI; and test-time scaling in text-to-image frameworks.
Here is my CV.
🔥 Ongoing Research Project
[1] Agentic Image Restoration: Leveraging agentic systems to address complex image restoration tasks. (Collaborators: Zhengzhong Tu, Renjie Li, TACO group, TAMU)
[2] Test-Time Scaling in Advanced Text-to-Image Frameworks: Exploring test-time scaling strategies within cutting-edge text-to-image generation frameworks. (Collaborators: Zhimin Li, Hunyuan-DiT team, Tencent)
🔥 News
- 2025.03: 🎉🎉 1st place in NTIRE 2025 Challenge on Short-form UGC Image Super-Resolution (4x) in CVPR 2025.
- 2025.03: 🎉🎉 Our paper Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks is submitted to ICCV 2025.
- 2025.01: 🎉🎉 Our paper on enhancing 3D Gaussian splatting for novel view synthesis is submitted to IEEE Transactions on Visualization and Computer Graphics (TVCG).
- 2025.01: 🎉🎉 Our paper See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization is accepted by ICASSP 2025.
- 2024.08: 🎉🎉 2nd place in AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content in ECCV 2024, and our method Fast Sequential Motion Diffusion (FSMD) is selected for presentation in the summary paper.
- 2024.08: 🎉🎉 Our paper Towards Multi-View Consistent Style Transfer with One-Step Diffusion via Vision Conditioning is accepted by AI for Visual Arts Workshop and Challenges (AI4VA) in ECCV 2024.
- 2024.04: 🎉🎉 Joined The Hong Kong Polytechnic University (PolyU) as a research assistant.
- 2022.07: 🎉🎉 Joined Microsoft as an applied scientist, focusing on recommendation systems and large language model applications in Bing.
- 2022.06: 🎉🎉 ‘Stars-of-tomorrow’ award of MSRA Internship Program.
- 2021.07: 🎉🎉 Joined Microsoft Research Asia (MSRA) as a research intern in the Multi-Modal Interaction (MMI) Group (directed by Dr. Qiang Huo), cooperating with the Azure OCR team on multi-directional table detection in PDF images.
- 2021.03: 🎉🎉 Ranked 10 / 60 in NTIRE 2021 Challenge on Image Deblurring in CVPR 2021, and our method Visual Token Transformer for Image Restoration is selected for presentation in the summary paper.
- 2019.07: 🎉🎉 Our paper Low-resolution palmprint image denoising by generative adversarial networks is accepted by Neurocomputing 2019.
📝 Publications

Jiawei Wang*, Yushen Zuo*, Yuanjun Chai, Zhendong Liu, Yicheng Fu, Yichun Feng, Kin-Man Lam
(* denotes equal contribution)

Zongqi He, Zhe Xiao, Kin-Chung Chan, Yushen Zuo, Jun Xiao, Kin-Man Lam

Towards Multi-View Consistent Style Transfer with One-Step Diffusion via Vision Conditioning
Yushen Zuo, Jun Xiao, Kin-Chung Chan, Rongkang Dong, Cuixin Yang, Zongqi He, Hao Xie, Kin-Man Lam

AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content
Marcos V. Conde, Zhijun Lei, Wen Li, Christos Bampis, Ioannis Katsavounidis, Radu Timofte, Yushen Zuo et al.

NTIRE 2021 Challenge on Image Deblurring
Seungjun Nah, Sanghyun Son, Suyoung Lee, Radu Timofte, Kyoung Mu Lee, Yushen Zuo et al.

Low-resolution palmprint image denoising by generative adversarial networks
Shengjie Chen, Shuo Chen, Zhenhua Guo, Yushen Zuo
💻 Work and Research Experience
- 2024.04 - Now, Research Assistant, The Hong Kong Polytechnic University (PolyU)
- Artificial Intelligence and Signal Processing Laboratory
- Accelerated Diffusion for Image Processing (e.g., Style Transfer, Image Translation)
- Focus on the stylization of multi-view images in 3D scenes and propose OSDiffST, a novel style transfer method based on a one-step diffusion model.
- Incorporate LoRA adapters to rapidly adapt the pre-trained diffusion model for style transfer. Propose a vision condition module for efficient style information extraction and injection.
- Use two additional loss functions that align color distribution and improve structural similarity, enhancing visual quality and maintaining multi-view consistency across stylized images from different viewpoints.
- The research paper is accepted by the AI for Visual Arts Workshop and Challenges (AI4VA) in ECCV 2024.
- We are now expanding our approach by designing new adapters and applying our framework to more image processing tasks (e.g., image translation).
- Efficient Video Super-Resolution
- Focus on real-time video super-resolution.
- Propose Fast Sequential Motion Diffusion (FSMD) to achieve real-time video super-resolution.
- 2nd place in AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content in ECCV 2024.
- Novel view synthesis under sparse views with 3D Gaussian Splatting
- Focus on enhancing 3D Gaussian Splatting for novel view synthesis under sparse views, based on local depth and semantic regularization.
- Our research paper is accepted by ICASSP 2025.
- Image Processing and Diffusion in vision-language model safety and defense
- Explore VLM safety and the effectiveness of diffusion in defense.
- We plan to submit the research paper to CVPR 2025.
- 2022.08 - 2024.03, Applied Scientist, Microsoft
- Bing News - Recommendation system
- Explainable AI
- Use SHAP to calculate each feature’s contribution to the ranking score, giving a better explanation of the model’s output.
- Show users why they see a piece of recommended content, based on the recall path with a mapping method.
- Applied to all Bing News channels (e.g. Edge homepage), while collecting user feedback to refine the mapping method.
- Dynamic quota allocation
- Train a classification model to determine whether a news recommendation request is triggered by a user or by prerender/other backend tasks, based on the request’s features and the corresponding user’s engagement features. (Result: AUC > 0.8 on a test dataset built from the Bing News Recommendation database)
- Based on the classification model’s result, reduce the quota of each recall path in Ranker for requests predicted to be ‘Not User-trigger’ to reduce computational cost.
- Product performance: Reduced computing resource usage by 20% without losing performance.
- Bing Whole Page - Large Language Model Application
- Answer triggering in Bing Search - Real Estate Related
- Use an LLM (GPT-3.5) to label challenging samples from web results and obtain 1.3M new training samples.
- Train the answer triggering model on the new training set augmented with LLM-labeled samples.
- Recall on the test dataset improved from 0.54 to 0.73.
- Product performance: 3% increase in answer trigger rate (answer triggers Bing real estate application) in Bing search, and 4.1K gain in DAU (Daily Active Users) of Bing real estate application.
- 2021.07 - 2022.07, Research Intern, Multi-Modal Interaction (MMI) Group, Microsoft Research Asia
- Rotated object detection (multi-directional table detection in PDF images)
- Design an anchor-free two-stage detector for rotated object detection.
- Design a sequence-invariant loss and relative offsets for rotated object detector training.
- Stable performance under different image rotation angles on the production dataset (F-score fluctuation < 0.02).
- Achieve state-of-the-art performance on the production dataset and contribute to the Azure OCR API (3B monthly active users).
- ‘Stars-of-tomorrow’ award of MSRA Internship Program.
- 2020.10 - 2021.05, Research Intern, Tencent Youtu Lab
- UniInst: Detection-free and NMS-free instance segmentation
- Instance-aware One-to-one Assignment: Use Hungarian matching to assign the best-matching feature point to each target as its positive point, according to the classification score and segmentation mask accuracy.
- MaskIOU Branch: During training, learn to predict the IOU of the generated mask. During inference, multiply the predicted IOU of each generated mask by the classification score to obtain the final confidence.
- SOTA mask AP on COCO test-dev 2017 dataset and OCHuman dataset.
- Patent: CN114332457A[P]
- 2020.05 - 2021.06, Postgraduate, Tsinghua University
- Visual Token Transformer for Image Restoration
- First attempt to use a visual token-based transformer in image restoration.
- The neural network learns to divide images into different groups and map them to visual tokens without manual rules.
- Design a transformer block based on visual tokens to extract the non-local/multi-scale self-similarity of images.
- The token-based transformer reduces computation cost from $O(n^{2})$ to $O(n)$ compared to a vanilla transformer, with comparable image restoration performance.
- Included in NTIRE 2021 Challenge on Image Deblurring in CVPR 2021. (10 / 60)
- Project report (Applied in various low-level vision tasks): Visual Token Transformer for Image Restoration.pdf.
- 2019.01 - 2019.06, Postgraduate, Tsinghua University
- Low Resolution Palmprint Image Denoising
- Palmprint recognition methods are sensitive to image noise and need an effective denoising algorithm.
- First attempt at end-to-end denoising of low-resolution palmprint images by neural networks.
- Design a generative adversarial network (GAN)-based model to address multiple types of noise in palmprint images and preserve more orientation information with a Gabor loss in training.
- Collect data from the PolyU palmprint database and the IITD database to build train/test datasets, and generate noisy images by adding different types of noise.
- Outperforms existing state-of-the-art methods in both image denoising quality and palmprint recognition accuracy on test datasets with different types of noise. Average EER (equal error rate) of palmprint recognition decreased from 10.841% to 1.532% after denoising.
🎖 Honors and Awards
- 2024.08 AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content - 2nd place.
- 2022.06 ‘Stars-of-tomorrow’ award of Microsoft Research Asia Intern Program.
- 2021.03 CVPR 2021 NTIRE Image Deblurring Challenge - Track1. Low Resolution (10 / 60)
- 2021.01 Kaggle NFL 1st and Future - Impact Detection, Silver medal (23 / 459)
- 2020.12 Champion of the 1st Ocean Target Detection International Challenge (1 / 151)
- 2018.05 Meritorious winner in Interdisciplinary Contest in Modeling (ICM)
📖 Education
- 2019.06 - 2022.06, Tsinghua University.
- 2015.09 - 2019.06, Xidian University.