Paper-Conference

TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems

Jan 1, 2026

PiLLM: Resource-efficient LLM Inference Using Workload Prediction

Jan 1, 2026

Pre^3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation

Aug 1, 2025

AtomNet: Designing Tiny Models from Operators Under Extreme MCU Constraints

Apr 1, 2025

Towards Efficient LLM Inference via Collective and Adaptive Speculative Decoding

Jan 1, 2025

Tool Playgrounds: A Comprehensive and Analyzable Benchmark for LLM Tool Invocation

Jan 1, 2025

Robust Long-Tailed Recognition with Distribution-Aware Adversarial Example Generation

Jan 1, 2025

ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding

Jan 1, 2025

Past-Future Scheduler for LLM Serving under SLA Guarantees

Jan 1, 2025

OMNIBAL: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance

Jan 1, 2025