Hello, I am Yushen Zuo, currently a research assistant at The Hong Kong Polytechnic University (PolyU), working under the guidance of Prof. Kenneth K. M. Lam and in close collaboration with Jun Xiao.

Concurrently, I am a remote intern in the TACO group at TAMU, working on agentic image restoration under the guidance of Zhengzhong Tu and in close collaboration with Renjie Li.

Prior to this, I was an Applied Scientist at Microsoft. Before that, I interned at Microsoft Research Asia and Tencent Youtu Lab. I hold a master’s degree from Tsinghua University and a bachelor’s degree from Xidian University.

I am actively seeking PhD opportunities worldwide. My research areas include low-level vision, image generation, object detection and segmentation, 3D vision, vision-language model safety, agentic AI, and test-time scaling in text-to-image frameworks.

Here is my CV.

Google citation:

🔥 Ongoing Research Projects

[1] Agentic Image Restoration: Leveraging agentic systems to address complex image restoration tasks. (Collaborators: Zhengzhong Tu, Renjie Li, TACO group, TAMU)

[2] Test-Time Scaling in Advanced Text-to-Image Frameworks: Exploring test-time scaling strategies within cutting-edge text-to-image generation frameworks. (Collaborators: Zhimin Li, Hunyuan-DiT team, Tencent)

🔥 News

📝 Publications

ICCV 2025 Submission

Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks

Jiawei Wang*, Yushen Zuo*, Yuanjun Chai, Zhendong Liu, Yicheng Fu, Yichun Feng, Kin-Man Lam

(* denotes equal contribution)

ICASSP 2025

See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization

Zongqi He, Zhe Xiao, Kin-Chung Chan, Yushen Zuo, Jun Xiao, Kin-Man Lam

AI4VA@ECCV 2024

Towards Multi-View Consistent Style Transfer with One-Step Diffusion via Vision Conditioning

Yushen Zuo, Jun Xiao, Kin-Chung Chan, Rongkang Dong, Cuixin Yang, Zongqi He, Hao Xie, Kin-Man Lam

AIM@ECCV 2024

AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content

Marcos V. Conde, Zhijun Lei, Wen Li, Christos Bampis, Ioannis Katsavounidis, Radu Timofte, Yushen Zuo et al.

CVPRW 2021

NTIRE 2021 Challenge on Image Deblurring

Seungjun Nah, Sanghyun Son, Suyoung Lee, Radu Timofte, Kyoung Mu Lee, Yushen Zuo et al.

Neurocomputing 2019

Low-resolution palmprint image denoising by generative adversarial networks

Shengjie Chen, Shuo Chen, Zhenhua Guo, Yushen Zuo

💻 Work and Research Experience

  • 2024.04 - Now, Research Assistant, The Hong Kong Polytechnic University (PolyU)
    • Artificial Intelligence and Signal Processing Laboratory
      • Accelerated Diffusion for Image Processing (e.g., Style Transfer, Image Translation)
        • Focus on the stylization of multi-view images in 3D scenes and propose OSDiffST, a novel style transfer method based on a one-step diffusion model.
        • Incorporate LoRA adapters to rapidly adapt the pre-trained diffusion model for style transfer. Propose a vision condition module for efficient style information extraction and injection.
        • Use two additional loss functions to align color distribution and improve structural similarity, enhancing visual quality and maintaining multi-view consistency across stylized images from different viewpoints.
        • The research paper was accepted by the AI for Visual Arts Workshop and Challenges (AI4VA) at ECCV 2024.
        • We are now expanding our approach by designing new adapters and applying our framework to more image processing tasks (e.g., image translation).
      • Efficient Video Super-Resolution
      • Novel view synthesis under sparse views with 3D Gaussian Splatting
        • Focus on enhancing 3D Gaussian Splatting for novel view synthesis under sparse views, based on local depth and semantic regularization.
        • Our research paper was accepted by ICASSP 2025.
      • Image Processing and Diffusion in vision-language model safety and defense
        • Explore VLM safety and the effectiveness of diffusion in defense.
        • The research paper is planned for submission to CVPR 2025.
  • 2022.08 - 2024.03, Applied Scientist, Microsoft
    • Bing News - Recommendation system
      • Explainable AI
        • Use SHAP to calculate each feature’s contribution to the ranking score, giving a better explanation of the model’s output.
        • Show users why they see a given piece of recommended content, based on the recall path and a mapping method.
        • Applied to all Bing News channels (e.g., the Edge homepage), while collecting user feedback to refine the mapping method.
      • Dynamic quota allocation
        • Train a classification model to determine whether a news recommendation request is triggered by a user or by prerender/other backend tasks, based on the request’s features and the corresponding user’s engagement features. (Result: AUC > 0.8 on a test dataset built from the Bing News Recommendation database)
        • Based on the classification model’s result, reduce the quota of each recall path in the Ranker for requests predicted to be ‘Not User-trigger’, lowering computational cost.
        • Product performance: Reduced computing resource usage by 20% without losing performance.
    • Bing Whole Page - Large Language Model Application
      • Answer triggering in Bing Search - Real Estate Related
        • Use an LLM (GPT-3.5) to label challenging samples from web results, yielding 1.3M new training samples.
        • Train the answer triggering model on a new training set augmented with the LLM-labeled samples.
        • Recall on the test dataset improved from 0.54 to 0.73.
        • Product performance: 3% increase in answer trigger rate (the answer triggers the Bing real estate application) in Bing search, and a 4.1K gain in DAU (Daily Active Users) of the Bing real estate application.
  • 2021.07 - 2022.07, Research Intern, Multi-Modal Interaction (MMI) Group, Microsoft Research Asia
    • Rotated object detection (multi-directional table detection in PDF image)
      • Design an anchor-free two-stage detector for rotated object detection.
      • Design a sequence-invariant loss and relative offset for rotated object detector training.
      • Stable performance under different image rotation angles on the production dataset (F-score fluctuation < 0.02).
      • Achieve state-of-the-art performance on the production dataset and contribute to the Azure OCR API (3B monthly active users).
      • ‘Stars-of-tomorrow’ award of MSRA Internship Program.
  • 2020.10 - 2021.05, Research Intern, Tencent Youtu Lab
    • UniInst: Detection-free and NMS-free instance segmentation
      • Instance-aware One-to-one Assignment: Use Hungarian matching to assign the best-matching feature point to each target as the positive point, according to the classification score and segmentation mask accuracy.
      • MaskIOU Branch: During training, learn to predict the IOU of the generated mask. During inference, multiply the predicted IOU of each generated mask by the classification score to obtain the final confidence.
      • SOTA mask AP on the COCO test-dev 2017 dataset and the OCHuman dataset.
      • Patent: CN114332457A[P]
  • 2020.05 - 2021.06, Postgraduate, Tsinghua University
    • Visual Token Transformer for Image Restoration
      • First attempt to use a visual token-based transformer in image restoration.
      • The neural network learns to divide images into different groups and map them to visual tokens without manual rules.
      • Design a transformer block based on visual tokens to extract the non-local/multi-scale self-similarity of images.
      • The token-based transformer reduces computation cost from $O(n^{2})$ to $O(n)$ compared to a vanilla transformer, with comparable image restoration performance.
      • Included in the NTIRE 2021 Challenge on Image Deblurring at CVPR 2021. (10 / 60)
      • Project report (Applied in various low-level vision tasks): Visual Token Transformer for Image Restoration.pdf.
  • 2019.01 - 2019.06, Postgraduate, Tsinghua University
    • Low Resolution Palmprint Image Denoising
      • Palmprint recognition methods are sensitive to image noise and need an effective denoising algorithm.
      • First attempt at end-to-end denoising of low-resolution palmprint images by neural networks.
      • Design a generative adversarial network (GAN)-based model to address multiple types of noise in palmprint images and preserve more orientation information with a Gabor loss in training.
      • Collect data from the PolyU palmprint database and the IITD database to build train/test datasets, and generate noisy images by adding different types of noise.
      • Outperforms existing state-of-the-art methods in both image denoising quality and palmprint recognition accuracy on the test dataset with different types of noise. Average EER (equal error rate) of palmprint recognition decreased from 10.841% to 1.532% after denoising.

🎖 Honors and Awards

  • 2024.08 AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content - 2nd place.
  • 2022.06 ‘Stars-of-tomorrow’ award of Microsoft Research Asia Intern Program.
  • 2021.03 CVPR 2021 NTIRE Image Deblurring Challenge - Track 1: Low Resolution (10 / 60)
  • 2021.01 Kaggle NFL 1st and Future - Impact Detection, Silver medal (23 / 459)
  • 2020.12 Champion of the 1st Ocean Target Detection International Challenge (1 / 151)
  • 2018.05 Meritorious winner in Interdisciplinary Contest in Modeling (ICM)

📖 Education

  • 2019.06 - 2022.06, Tsinghua University.
  • 2015.09 - 2019.06, Xidian University.