Publications

2025

Conference paper

AtomNet: Designing Tiny Models from Operators Under Extreme MCU Constraints

Proceedings of the AAAI Conference on Artificial Intelligence
Conference paper

Tool Playgrounds: A Comprehensive and Analyzable Benchmark for LLM Tool Invocation

ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Journal article

Robust long-tailed recognition with distribution-aware adversarial example generation

Neural Networks
Journal article

Pushing the Limit of Post-Training Quantization

IEEE Transactions on Pattern Analysis and Machine Intelligence
Conference paper

ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding

Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design
Conference paper

Past-Future Scheduler for LLM Serving under SLA Guarantees

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
Conference paper

OMNIBAL: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance

Proceedings of the 42nd International Conference on Machine Learning (ICML)
Conference paper

HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration

Proceedings of the 42nd International Conference on Machine Learning (ICML)
Conference paper

DA-KD: Difficulty-Aware Knowledge Distillation for Efficient Large Language Models

Proceedings of the 42nd International Conference on Machine Learning (ICML)

2024

Conference paper

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Conference paper

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper

Selective Focus: Investigating Semantics Sensitivity in Post-training Quantization for Lane Detection

Proceedings of the AAAI Conference on Artificial Intelligence
Conference paper

Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes

Proceedings of the AAAI Conference on Artificial Intelligence
Journal article

Rectify representation bias in vision-language models for long-tailed recognition

Neural Networks
Conference paper

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

The Twelfth International Conference on Learning Representations
Conference paper

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models

ACM Multimedia 2024
Conference paper

PRoof: A Comprehensive Hierarchical Profiling Framework for Deep Neural Networks with Roofline Analysis

Proceedings of the 53rd International Conference on Parallel Processing
Conference paper

Compressing Large Language Models by Joint Sparsification and Quantization

Forty-first International Conference on Machine Learning

2023

Conference paper

Outlier Suppression+: Accurate quantization of large language models by equivalent and effective shifting and scaling

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Conference paper

Lossy and Lossless (L2) Post-training Model Size Compression

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
Conference paper

Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper

Annealing-Based Label-Transfer Learning for Open World Object Detection

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper

SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency

Proceedings of Machine Learning and Systems
Conference paper

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs

Proceedings of the 52nd International Conference on Parallel Processing
Journal article

Discrepant Semantic Diffusion Boosts Transfer Learning Robustness

Electronics

2022

Journal article

Distribution-Sensitive Information Retention for Accurate Binary Neural Network

International Journal of Computer Vision
Conference paper

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models

Thirty-Sixth Conference on Neural Information Processing Systems
Conference paper

QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization

International Conference on Learning Representations
Conference paper

NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database

51st International Conference on Parallel Processing - ICPP
Conference paper

Generating Transferable Adversarial Examples against Vision Transformers

Proceedings of the 30th ACM International Conference on Multimedia

2021

Conference paper

Once Quantization-Aware Training: High Performance Extremely Low-Bit Architecture Search

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
Conference paper

MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
Conference paper

MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks
Conference paper

A Free Lunch From ANN: Towards Efficient, Accurate Spiking Neural Networks Calibration

Proceedings of the 38th International Conference on Machine Learning
Conference paper

Diversifying Sample Generation for Accurate Data-Free Quantization

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper

RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

arXiv
Conference paper

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

International Conference on Learning Representations

2020

Conference paper

Towards Unified INT8 Training for Convolutional Neural Network

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper

Rotation Consistent Margin Loss for Efficient Low-bit Face Recognition

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper

Forward and Backward Information Retention for Accurate Binary Neural Networks

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper

Balanced Binary Neural Networks with Gated Residual

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Conference paper

DMS: Differentiable Dimension Search for Binary Neural Networks

ICLR 2020 NAS Workshop
Conference paper

Extremely Low-Bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures

49th International Conference on Parallel Processing - ICPP
Manuscript

Efficient Bitwidth Search for Practical Mixed Precision Neural Network

Journal article

Binary neural networks: A survey

Pattern Recognition

2019

Conference paper

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

The IEEE International Conference on Computer Vision (ICCV)