Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

Jan 1, 2026 · Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang
Type: Conference paper
Publication: International Conference on Machine Learning (ICML)
Last updated on Jan 1, 2026


© 2026 Ruihao Gong. This work is licensed under CC BY-NC-ND 4.0.