Compressing Large Language Models by Joint Sparsification and Quantization
Jan 1, 2024 · Jinyang Guo, Jianyu Wu, Zining Wang, Jiaheng Liu, Ge Yang, Yifu Ding, Ruihao Gong, Haotong Qin, Xianglong Liu
Type: Conference paper
Publication: Forty-first International Conference on Machine Learning