Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

Jun 1, 2026 · Y.-C. Chen, C. P. Lee, Ze-Wei Liou, N. Verma
Abstract
Massive activation spikes in Large Language Models (LLMs) severely degrade quantization by stretching dynamic ranges. While prior hypotheses characterize these as high-level scalar biases, we argue that they are merely the scalar intermediates of rigid, structural vector biases in the spike-carrying tokens. We show that, after normalization, these tokens converge to constant vectors that drive the attention sink and value-state drain mechanisms. We geometrically substantiate this by analyzing the coordination of projection weights: $W_K$ contrastively amplifies the vector, $W_Q$ aligns semantic tokens toward it, and $W_V$ projects it into the spectral null-space. Furthermore, we reveal that the model actively preserves these structural biases against Rotary Positional Embedding (RoPE) perturbations by localizing them in zones of rotational stability, using low-frequency bands and coherent channel pairs. Leveraging this, we propose INSERTQUANT, a post-training quantization (PTQ) framework that clamps spikes and restores their function via pre-computed template vectors.
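To make the dynamic-range claim concrete, here is a minimal NumPy sketch of how a single outlier can degrade symmetric per-tensor (absmax) quantization. It is an illustration under stated assumptions, not the paper's method: the activations are synthetic Gaussian samples, the spike magnitude of 1000 is arbitrary, and the names `absmax_quantize`, `rmse_ordinary`, and `clamp` are hypothetical.

```python
import numpy as np

def absmax_quantize(x, n_bits=8):
    """Symmetric per-tensor quantization: the scale is set by the largest magnitude."""
    qmax = 2 ** (n_bits - 1) - 1                 # 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                             # dequantized values

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=4096)           # synthetic "ordinary" activations (assumption)
spiked = acts.copy()
spiked[0] = 1000.0                               # one synthetic massive spike (arbitrary value)

def rmse_ordinary(x):
    """Quantization RMSE measured on the non-spike entries only."""
    deq = absmax_quantize(x)
    return float(np.sqrt(np.mean((deq[1:] - acts[1:]) ** 2)))

err_clean = rmse_ordinary(acts)                  # no spike: tight range, small error
err_spiked = rmse_ordinary(spiked)               # spike sets the scale: ordinary values collapse

# Clamp the spike to the range of the ordinary entries before quantizing.
# The threshold choice here is a hypothetical stand-in, not the paper's rule.
clamp = np.max(np.abs(acts[1:]))
err_clamped = rmse_ordinary(np.clip(spiked, -clamp, clamp))

print(f"clean: {err_clean:.4f}  spiked: {err_spiked:.4f}  clamped: {err_clamped:.4f}")
```

With the spike present, the quantization step becomes larger than most ordinary activations, so they round to zero. Clamping recovers the tight range but discards whatever the spike was doing. The abstract says INSERTQUANT restores that function via pre-computed template vectors, a step this sketch does not attempt to reproduce.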
Publication
arXiv preprint arXiv:2606.02288
Ze-Wei Liou
Authors
PhD student @ Princeton

📢 Check out my new blog post! ReplaySSM

I am a first-year PhD student at Princeton, advised by Prof. Tri Dao. My research focuses on ML Systems.

My official name is Ze-Wei Liou, but I also go by the name “Johnny.” I did my undergrad in EE at National Taiwan University. Feel free to reach out!