Compressing Large Language Models by Joint Sparsification and Quantization
Jan 1, 2024 · Jinyang Guo, Jianyu Wu, Zining Wang, Jiaheng Liu, Ge Yang, Yifu Ding, Ruihao Gong, Haotong Qin, Xianglong Liu
Type: Conference paper
Publication: Forty-first International Conference on Machine Learning