Towards Efficient LLM Inference via Collective and Adaptive Speculative Decoding

Jan 1, 2025

Siqi Wang, Hailong Yang, Xuezhu Wang, Tongxuan Liu, Pengbo Wang, Yufan Xu, Xuning Liang, Kejie Ma, Tianyu Feng, Xin You, Ruihao Gong, Rui Wang, Zhongzhi Luan, Yi Liu, Depei Qian


Type: Conference paper

Publication: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Last updated on Jan 1, 2025

