Towards Efficient LLM Inference via Collective and Adaptive Speculative Decoding

Jan 1, 2025
Siqi Wang, Hailong Yang, Xuezhu Wang, Tongxuan Liu, Pengbo Wang, Yufan Xu, Xuning Liang, Kejie Ma, Tianyu Feng, Xin You, Ruihao Gong, Rui Wang, Zhongzhi Luan, Yi Liu, Depei Qian
Publication
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis