HyC-LoRA: Memory Efficient LoRA Fine-tuning with Hybrid Activation Compression
Yujin Wang, Shunan Dong, Yichen You, Huazhong Yang, Yongpan Liu, Hongyang Jia
Conference on Machine Learning and Systems 2025 · Day 3 · Session 5: LLM Training and Fine-Tuning
The talk "HyC-LoRA: Memory Efficient LoRA Fine-tuning with Hybrid Activation Compression" by Yujin Wang and colleagues from Tsinghua University addresses a critical bottleneck in the on-device fine-tuning of large language models (LLMs): the prohibitive memory consumption of buffer activations. While parameter-efficient fine-tuning (PEFT) methods like **LoRA** (Low-Rank Adaptation) and **QLoRA** have significantly reduced memory overhead associated with model weights and optimizer states, the memory footprint of activations required for backward propagation has emerged as the new dominant burden. HyC-LoRA proposes a novel, systematic approach to compress these buffer activations through a **hybrid compression mechanism** tailored to different activation types and distributions.
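To make the bottleneck concrete, here is a minimal sketch (not HyC-LoRA's actual implementation, which uses a hybrid quantization scheme with structured outlier handling and LoRA-aware reorder computing) of the general idea the talk builds on: during LoRA fine-tuning the base weights are frozen, so the dominant memory cost is the input activation buffered for the adapter's backward pass, and that buffer can be stored in a compressed (here, naive per-tensor int8) form. All class and function names below are hypothetical illustrations.

```python
import torch
import torch.nn as nn

class _QuantBufferedLoRAFn(torch.autograd.Function):
    """LoRA branch that buffers its input activation as int8 for backward.
    Sketch only: per-tensor symmetric quantization stands in for the
    hybrid mechanism described in the talk."""

    @staticmethod
    def forward(ctx, x, A, B, scaling):
        # Quantize the activation that must be kept for the backward pass.
        scale = x.abs().amax().clamp(min=1e-8) / 127.0
        x_q = torch.round(x / scale).clamp(-127, 127).to(torch.int8)
        ctx.save_for_backward(x_q, A, B)
        ctx.scale, ctx.scaling = scale, scaling
        return (x @ A.t()) @ B.t() * scaling

    @staticmethod
    def backward(ctx, grad_out):
        x_q, A, B = ctx.saved_tensors
        # Dequantize the buffered activation on demand.
        x = x_q.to(grad_out.dtype) * ctx.scale
        grad_out = grad_out * ctx.scaling
        grad_B = grad_out.flatten(0, -2).t() @ (x @ A.t()).flatten(0, -2)
        grad_A = (grad_out @ B).flatten(0, -2).t() @ x.flatten(0, -2)
        grad_x = (grad_out @ B) @ A
        return grad_x, grad_A, grad_B, None

class QuantBufferedLoRALinear(nn.Module):
    """Frozen base linear plus a LoRA adapter with compressed activation buffering."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen: no activation kept for its weight grad
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + _QuantBufferedLoRAFn.apply(x, self.A, self.B, self.scaling)
```

In this toy version the 16- or 32-bit activation that LoRA would normally retain between forward and backward is replaced by an int8 copy plus one scale factor, roughly a 2-4x reduction of that buffer; HyC-LoRA's contribution is choosing the compression per activation type and distribution rather than applying one uniform scheme.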
AI review
HyC-LoRA tackles a real and underappreciated problem — buffer activation memory during LoRA fine-tuning — with a technically coherent solution involving hybrid quantization, structured outlier handling, and LoRA-aware reorder computing. The engineering is legitimate and the results are credible. But this write-up reads like an expanded abstract rather than a talk review: it's all claims and architecture diagrams described in prose, with no code, no reproducibility path, and no clear signal that anyone outside the Tsinghua lab can actually run this. Solid systems work, but the gap between 'we…