Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization
Massive activation spikes in Large Language Models (LLMs) severely degrade quantization accuracy by stretching the dynamic range. While prior hypotheses characterize these as high-level scalar …
y.-c.-chen