Publications
2025
Conference paper
AtomNet: Designing Tiny Models from Operators Under Extreme MCU Constraints
Proceedings of the AAAI Conference on Artificial Intelligence
Conference paper
Tool Playgrounds: A Comprehensive and Analyzable Benchmark for LLM Tool Invocation
ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Journal article
Robust long-tailed recognition with distribution-aware adversarial example generation
Neural Networks
Journal article
Pushing the Limit of Post-Training Quantization
IEEE Transactions on Pattern Analysis and Machine Intelligence
Conference paper
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design
Conference paper
Past-Future Scheduler for LLM Serving under SLA Guarantees
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
Conference paper
OMNIBAL: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance
Proceedings of the 42nd International Conference on Machine Learning (ICML)
Conference paper
HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration
Proceedings of the 42nd International Conference on Machine Learning (ICML)
Conference paper
DA-KD: Difficulty-Aware Knowledge Distillation for Efficient Large Language Models
Proceedings of the 42nd International Conference on Machine Learning (ICML)
2024
Conference paper
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Conference paper
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper
Selective Focus: Investigating Semantics Sensitivity in Post-training Quantization for Lane Detection
Proceedings of the AAAI Conference on Artificial Intelligence
Conference paper
Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes
Proceedings of the AAAI Conference on Artificial Intelligence
Journal article
Rectify representation bias in vision-language models for long-tailed recognition
Neural Networks
Conference paper
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
The Twelfth International Conference on Learning Representations
Conference paper
PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models
ACM Multimedia 2024
Conference paper
PRoof: A Comprehensive Hierarchical Profiling Framework for Deep Neural Networks with Roofline Analysis
Proceedings of the 53rd International Conference on Parallel Processing
Conference paper
Compressing Large Language Models by Joint Sparsification and Quantization
Forty-first International Conference on Machine Learning
2023
Conference paper
Outlier Suppression+: Accurate quantization of large language models by equivalent and effective shifting and scaling
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Conference paper
Lossy and Lossless (L2) Post-training Model Size Compression
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
Conference paper
Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper
Annealing-Based Label-Transfer Learning for Open World Object Detection
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency
Proceedings of Machine Learning and Systems
Conference paper
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs
Proceedings of the 52nd International Conference on Parallel Processing
2022
Journal article
Distribution-Sensitive Information Retention for Accurate Binary Neural Network
International Journal of Computer Vision
Conference paper
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Thirty-Sixth Conference on Neural Information Processing Systems
Conference paper
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization
International Conference on Learning Representations
Conference paper
NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database
51st International Conference on Parallel Processing - ICPP
Conference paper
Generating Transferable Adversarial Examples against Vision Transformers
Proceedings of the 30th ACM International Conference on Multimedia
2021
Conference paper
Once Quantization-Aware Training: High Performance Extremely Low-Bit Architecture Search
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
Conference paper
MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
Conference paper
MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks
Conference paper
A Free Lunch From ANN: Towards Efficient, Accurate Spiking Neural Networks Calibration
Proceedings of the 38th International Conference on Machine Learning
Conference paper
Diversifying Sample Generation for Accurate Data-Free Quantization
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper
BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
International Conference on Learning Representations
2020
Conference paper
Towards Unified INT8 Training for Convolutional Neural Network
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper
Rotation Consistent Margin Loss for Efficient Low-bit Face Recognition
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper
Forward and Backward Information Retention for Accurate Binary Neural Networks
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference paper
Balanced Binary Neural Networks with Gated Residual
International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Conference paper
DMS: Differentiable Dimension Search for Binary Neural Networks
ICLR 2020 NAS Workshop
Conference paper
Extremely Low-Bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures
49th International Conference on Parallel Processing - ICPP
2019
Conference paper
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
The IEEE International Conference on Computer Vision (ICCV)