Paper-Conference

Compressing Large Language Models by Joint Sparsification and Quantization

Jan 1, 2024

Outlier Suppression+: Accurate Quantization of Large Language Models by Equivalent and Effective Shifting and Scaling

Dec 1, 2023

Lossy and Lossless (L²) Post-training Model Size Compression

Oct 1, 2023

Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization

Jun 1, 2023

Annealing-Based Label-Transfer Learning for Open World Object Detection

Jun 1, 2023

SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency

Jan 1, 2023

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs

Jan 1, 2023

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models

Sep 27, 2022

QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization

Jan 1, 2022

NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database

Jan 1, 2022