Paper-Conference

MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping

Jun 1, 2026

LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation

Jun 1, 2026

TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems

Jan 1, 2026

SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation

Jan 1, 2026

QVGen: Pushing the Limit of Quantized Video Generative Models

Jan 1, 2026

Post-Training Quantization for Video Matting

Jan 1, 2026

PiLLM: Resource-efficient LLM Inference Using Workload Prediction

Jan 1, 2026

Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals

Jan 1, 2026

OmniFit: Bridging Modalities via Layer-Adaptive Token Compression for Omnimodal Large Language Models

Jan 1, 2026

Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

Jan 1, 2026