KV Cache Optimization

Token selection, post-hoc compression, and architectural redesigns, including multi-head latent attention (MLA), grouped-query attention (GQA), and multi-query attention (MQA)