Towards Efficient LLM Inference via Collective and Adaptive Speculative Decoding
Jan 1, 2025
Siqi Wang
Hailong Yang
Xuezhu Wang
Tongxuan Liu
Pengbo Wang
Yufan Xu
Xuning Liang
Kejie Ma
Tianyu Feng
Xin You
Ruihao Gong
Rui Wang
Zhongzhi Luan
Yi Liu
Depei Qian
Publication
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis