The Market Just Sold Micron on a Paper It Did Not Read
Eight days ago, Micron beat EPS estimates by 43%. Revenue nearly tripled year over year. Then Google published a research paper about compressing one subcategory of memory, and the stock dropped 5.7% in a single session. The market did not do the arithmetic. We took a deeper dive into this, and into Micron’s earnings, in our AI/Semiconductor Research Community (Get 33% Off With This Link).
Google’s TurboQuant paper, presented at ICLR 2026, describes a method to compress the KV cache, the working memory AI models use during inference, down to 3 bits per value. That yields a 6x reduction in KV cache memory. Financial media ran with it, and some outlets even framed it as a 50% reduction in total memory costs. Memory stocks sold off across the board: Micron, Western Digital, SanDisk, even Seagate, a hard drive company with zero connection to KV cache.
The problem is that the paper never made that claim. The KV cache is one slice of total GPU memory. Compressing it does not touch model weights. It does not touch training. It does not touch activations. And the paper tested only on small models, 8 billion parameters or fewer. No results exist for the 70B+ frontier models or mixture-of-experts architectures that drive the bulk of hyperscaler spending. The headline wrote a check that the paper cannot cash.
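To put rough numbers on that slice, here is a back-of-envelope sketch in Python. Every figure in it is an illustrative assumption, a generic 8B-class decoder with grouped-query attention served in fp16 at a hypothetical batch size and context length, not a number from the paper or from Micron; the 6x factor is the headline compression claim taken at face value.

```python
# Back-of-envelope: a 6x compression of the KV cache shrinks only
# the KV slice, never the weights. All dimensions are illustrative
# assumptions (8B-class decoder, grouped-query attention, fp16),
# not figures from the TurboQuant paper or from Micron.

GiB = 1024**3
layers, kv_heads, head_dim, bytes_fp16 = 32, 8, 128, 2

# K and V each store one head_dim vector per layer, per KV head, per token.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_fp16

batch, seq_len = 8, 8_192                        # assumed serving load
kv_cache = batch * seq_len * kv_bytes_per_token  # fp16 KV cache
weights = 8e9 * bytes_fp16                       # 8B parameters, untouched

before = weights + kv_cache
after = weights + kv_cache / 6                   # headline 6x on the KV slice only

print(f"weights:          {weights / GiB:5.1f} GiB")
print(f"KV cache, fp16:   {kv_cache / GiB:5.1f} GiB")
print(f"total before:     {before / GiB:5.1f} GiB")
print(f"total after:      {after / GiB:5.1f} GiB")
print(f"total reduction:  {1 - after / before:.0%}")
```

Under these assumptions the total footprint drops about 29%, not 6x, and that share swings entirely on batch size and context length. The 50%-of-total-memory framing only works if you assume the KV cache dominates the whole accelerator, which the paper never claims.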
Here is where the math breaks the narrative. Micron CEO Sanjay Mehrotra, speaking on the Q2 2026 earnings call on March 18, said:
“Without more memory, without faster memory, AI just cannot scale up. Just look at from last year to this year, the DRAM requirement in the advanced AI accelerators has now doubled.”
DRAM content per accelerator is doubling every year. A one-time 6x compression of one memory subcategory does not offset 2x annual content growth that compounds year after year.
Think about that for a second. Even if you compress the KV cache by 6x today, the total DRAM each accelerator needs will be larger next year than it is right now. The compression buys you headroom. It does not reverse the trajectory.
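The arithmetic fits in a few lines. The 40% KV-cache share of accelerator DRAM below is a made-up number for illustration, not anything Micron disclosed; the conclusion holds for almost any share you pick.

```python
# Illustrative only: a one-time 6x compression of the KV-cache slice
# versus DRAM content that doubles every year. The 40% KV share is
# an assumption for this sketch, not a disclosed figure.

kv_share = 0.40      # assumed fraction of accelerator DRAM spent on KV cache
compression = 6      # one-time compression factor on that slice

# Footprint today with compression applied, relative to today's
# uncompressed footprint of 1.0:
compressed = (1 - kv_share) + kv_share / compression   # ~0.67

for year in range(4):
    demand = compressed * 2**year   # content per accelerator doubles yearly
    print(f"year {year}: {demand:.2f}x today's uncompressed footprint")
```

Even with the full 6x landing on day one, demand is back above today’s uncompressed footprint within a year and past 5x of it within three. A one-time divisor loses to a compounding doubling, every time.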
Micron VP Satya Kumar added that systems shipping this year will have
“triple the amount of LPDDR compared to what we had last year.”
Cutting time to first token by 98%, he said, requires adding more memory, not less. Meanwhile, all three major HBM suppliers (Micron, SK Hynix, and Samsung) independently confirmed that HBM is sold out through 2026, with demand exceeding supply. Samsung told investors:
“Major customer demand for HBM in 2026 still exceeds available supply from us.”
That is three companies controlling nearly 100% of the market, all saying the same thing.
Lam Research CFO Douglas Bettinger put a dollar figure on it at the Cantor Fitzgerald Conference in March:
“WFE this year, $135 billion, up from $110 billion last year. It’s KV cache driving the need for DRAM.”
The company supplying the equipment for those factories sees KV cache as the primary demand driver. Not a risk. A driver. We read 24 management quotes from 11 companies over the past 90 days. Zero described memory compression as a risk to demand. Not one. The market invented a narrative that no industry participant supports. That should tell you something. We track all of it in our AI/Semiconductor Research Community (Get 33% Off With This Link).
For investors watching the memory sector, the question is not whether KV cache compression works. It probably does, at small scale. The question is whether it changes the supply-demand picture for 2026 and 2027. On a 2026 basis, it does not. HBM capacity is committed under contract, and the $135 billion in wafer fab equipment spending is already locked in.
Even Nvidia CEO Jensen Huang, speaking at GTC 2026, said this about KV cache:
"The large language models are going to get larger and larger and larger. It's going to generate more and more tokens more quickly, so it could think more quickly, but it also has to access memory. It's going to pound on memory really hard. KV Cache, structured data, cuDF, unstructured data, cuVS. It's going to be pounding on the storage system really, really hard, which is the reason why we reinvented the storage system."
A company reinventing the storage system is still placing massive orders for HBM and other memory solutions.
SK Hynix called KV cache offloading
“essential to ensure smooth inference services.”
Astera Labs reported that customers are exploring CXL-based memory expansion specifically for KV cache applications. KV cache is creating new memory markets, not destroying existing ones. No hyperscaler, not Google, not Meta, not Microsoft, has told investors that KV cache compression reduces their infrastructure spend. Google created TurboQuant and has not mentioned it in a single investor forum.
The real signal for Micron investors is not TurboQuant. It is the capex raise to $25 billion or more, up from roughly $20 billion prior; that is real money funding HBM expansion. It is also the Summit Insights downgrade from Buy to Hold, one analyst against a consensus of 38 buys versus 2 sells. The capex question deserves more attention than it got.
Here is what could make this wrong. If independent researchers replicate TurboQuant’s zero-accuracy-loss results on frontier-scale models, 70B parameters and above, the timeline to production adoption accelerates. What our synthesis estimates at 6 to 12 months could compress to 3 to 6 months. If hyperscalers deploy rapidly and cut HBM orders for 2027, the “sold out through 2026” defense has a built-in expiration date. Layer that on top of aggressive supply expansion from all three memory makers, and 2027 could look very different. The capex cycle turning is a real risk. TurboQuant on its own is not.
The market reacted to a headline about memory compression without checking what the paper actually compresses. The arithmetic says DRAM content growth absorbs it. Every management team in the supply chain says demand exceeds supply. That disconnect between narrative and data is the kind of gap this analysis exists to close.
---
If you found this useful, this is just the surface.
[Join the community, 33% off at checkout](whatthechiphappened.com)
Get 15% off Fiscal.ai, where charts like this can be found.