Sheng Cheng

I'm a PhD student at ASU, advised by Yezhou Yang and co-advised by Yi Ren. I closely collaborate with Maitreya Patel, Changhoon Kim, and Deqian Kong. Previously, I received my M.Eng. in Electrical Engineering from the University of Illinois at Urbana-Champaign, working with Ruoyu Sun, and my B.S. from Huazhong University of Science and Technology.

Research

I'm interested in computer vision and machine learning. My focus lies in vision & language (particularly text-to-image generation), domain generalization & robustness, and AI for science.

Publications

TripletCLIP
TripletCLIP: Improving Compositional Reasoning of CLIP via Vision-Language Negatives

Maitreya Patel, Abhiram Kusumba, Sheng Cheng, Changhoon Kim, Tejas Gokhale, Chitta Baral, Yezhou Yang

NeurIPS 2024

Project Page / arXiv / Code

We enhance CLIP models by generating "hard" negative captions and images to improve their compositional reasoning ability.

Precision or Recall?
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model

Sheng Cheng, Maitreya Patel, Yezhou Yang

EMNLP 2024, Findings

arXiv / Code

We analyze how the precision and recall of human-annotated and synthetic captions affect the training of text-to-image generation models.

Latent Space Energy-based Neural ODEs
Latent Space Energy-based Neural ODEs

Sheng Cheng*, Deqian Kong*, Jianwen Xie, Kookjin Lee, Ying Nian Wu‡, Yezhou Yang‡

Preprint, 2024

arXiv

We integrate an energy-based prior model with neural ODEs for continuous-time sequence modeling in latent space, training by MLE with MCMC sampling instead of an inference network.

Revising Text-to-Image Prior
Revising Text-to-Image Prior for Improved Text Conditioned Image Generations

Maitreya Patel, Changhoon Kim, Sheng Cheng, Chitta Baral, Yezhou Yang

CVPR 2024

Project Page / Demo / arXiv / Code / Media coverage (Twitter of AK, MarkTechPost, MultiPlatformAI, Video discussion, Paper Digest)

We improve the parameter and data efficiency of text-to-image priors for unCLIP-family models using a contrastive loss.

WOUAF
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models

Changhoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, Yezhou Yang

CVPR 2024

Project Page / Demo / arXiv

We enable the integration of up to 32-bit fingerprints (~4 billion unique identifiers) into text-to-image diffusion models without loss in image quality.

Self-supervised Learning
Self-supervised Learning to Discover Physical Objects and Predict Their Interactions from Raw Videos

Sheng Cheng, Yezhou Yang, Yang Jiao, Yi Ren

NeurIPS AI4Science workshop, 2023

arXiv

We jointly learn to discover physical objects and predict their dynamics from raw videos of physical environments.

Adversarial Bayesian Augmentation
Adversarial Bayesian Augmentation for Single-Source Domain Generalization

Sheng Cheng, Tejas Gokhale, Yezhou Yang

ICCV 2023

arXiv / Code

Adversarial learning combined with Bayesian neural networks for single-source domain generalization.

SSR-GNNs
SSR-GNNs: Stroke-based Sketch Representation with Graph Neural Networks

Sheng Cheng, Yi Ren, Yezhou Yang

CVPR Sketch workshop, 2022

arXiv / Code

Transformation-invariant sketch recognition by decomposing sketches into strokes and composing them with a graph neural network.

Data-Driven Learning
Data-Driven Learning of Three-Point Correlation Functions as Microstructure Representations

Sheng Cheng, Yang Jiao, Yi Ren

Acta Materialia, 2022

arXiv / Code

Learning microstructure representations via three-point correlation functions.

Evaluating Robustness
Evaluating the Robustness of Bayesian Neural Networks Against Different Types of Attacks

Yutian Pang, Sheng Cheng, Jueming Hu, Yongming Liu

CVPR Adversarial Machine Learning workshop, 2021

arXiv

Evaluating the robustness gains of Bayesian neural networks against different attack types on image classification tasks.

Work Experience

Service & Honors