
Google's TurboQuant AI algorithm could slash memory usage by 6X

sleepeeg3

Supreme [H]ardness
Joined
Mar 4, 2004
Messages
5,797
Micron’s stock came under heavy selling pressure after Google Research unveiled TurboQuant, a quantization algorithm that can reduce large language model memory requirements by up to six times without accuracy loss. For a company thriving on AI-driven demand for high-bandwidth memory, this efficiency leap is seen as potentially undermining future sales volumes.

Why efficiency gains could reshape demand outlook


Analysts warn that if AI developers can achieve the same performance with one-sixth the hardware, Micron’s pricing power and the ‘memory wall’ that has fueled its growth could erode. The risk is a cooling in the current AI infrastructure build-out phase, leading to oversupply if capacity expansion plans remain unchanged. While SK Hynix’s planned U.S. listing could divert investor funds in the short term, the larger threat is structural demand compression from such technological advances.
https://www.msn.com/en-us/news/other/google-ai-breakthrough-slams-memory-stocks/gm-GM5705FF34

Memory prices to decrease?
 
Or an effective context window of 6-10 million tokens (and the lower price) creates new use cases and enough new demand to make up for it... (from one-shot generation of a whole short video scene to programming agents that can hold the APIs/headers of every part of a large project in context at once)
 
Are you telling me the prices were hiked so high that Google decided to invest R&D into reducing reliance on memory?
They always have been (same for Nvidia). Lower-bit quantisation was a big one, of course: a lot of work went into going from 16 bits to lower (8 bits, then mixed precision that can go very low). Google is very often the one behind these techniques, for example:

https://arxiv.org/pdf/2305.13245, https://www.ibm.com/think/topics/grouped-query-attention
https://www.searchenginejournal.com/google-infini-attention/514869/

They rely a lot on caching, paging and a long list of other things. The price of running models did not go down by a factor of 1,000 without a lot of those (~200x in just the last 3 years).
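
To make the "16 bits to lower" point concrete, here is a minimal sketch of plain symmetric round-to-nearest quantization. It is the textbook baseline, not TurboQuant or any specific Google method, and the sample values and function names are made up for illustration.

```python
def quantize(values, bits):
    """Symmetric round-to-nearest quantization to signed `bits`-bit integer codes."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8 bits, 7 for 4 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0        # one shared scale for the whole list
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# Hypothetical values standing in for a few weights or cache entries.
weights = [0.12, -0.90, 0.33, 0.05, -0.41, 0.77]

for bits in (8, 4):
    codes, scale = quantize(weights, bits)
    restored = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit: codes={codes} max error={worst:.4f}")
```

Fewer bits means a coarser grid and a larger worst-case error, which is why the lower-bit schemes need extra tricks to keep accuracy.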
 
So even if this works out it won't get us cheaper DRAM.
Because DRAM supply is shared between GPU memory and system RAM (same wafers, same lines), it is all quite interconnected. But this saves memory on what holds the KV cache, which can spill into system RAM (and even flash on BlueField DPUs, https://www.blocksandfiles.com/2026...ntext-memory-extension-infrastructure/4090541)

They use a tiered memory system because of how big contexts get and how hot or cold parts of the context get:
4090661.jpg


This is for supercomputers.
 
Oh, you think this means they use 6x less memory? No, it means 6x more slop produced!
I'm being optimistic here. :) You're probably right. Now you will be able to run the latest and greatest models on a quantized RTX 6000 Pro.
 
Now you will be able to run the latest and greatest models on a quantized RTX 6000 Pro.
It only reduces the KV-cache memory footprint, not the weights/model itself or how much memory bandwidth you need for speed at first (you can also hold the speed better as the context grows). It will mostly let you run models similar to today's but with a more interesting context size; people were often running very small context windows locally.
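
A rough back-of-the-envelope sketch of that point: KV-cache size grows with context length, while the weights stay the same size whatever happens to the cache. The model shape below is hypothetical and picked only to give round numbers, and the formula ignores any per-block scale or metadata overhead a real quantization scheme would add.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits):
    """Approximate KV-cache size in bytes for one sequence."""
    # 2x because a key vector and a value vector are stored per token, layer and KV head.
    elements = 2 * layers * kv_heads * head_dim * tokens
    return elements * bits / 8


# Hypothetical model shape, not taken from any particular model.
cfg = dict(layers=32, kv_heads=8, head_dim=128)

for tokens in (8_192, 131_072):
    fp16 = kv_cache_bytes(tokens=tokens, bits=16, **cfg)
    q3 = kv_cache_bytes(tokens=tokens, bits=3, **cfg)
    print(f"{tokens:>7} tokens: {fp16 / 2**30:.2f} GiB at 16-bit, "
          f"{q3 / 2**30:.2f} GiB at 3-bit ({fp16 / q3:.2f}x smaller)")
```

Going from 16 to 3 bits per element is about 5.3x on this simple arithmetic alone. Either way, the saving lets a longer context fit in the same memory; it does not shrink the model.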
 

DRAM Manufacturer Stock Prices Dip Over Google TurboQuant Announcement

by btarunr Today, 07:30
Stock prices of DRAM manufacturers dipped by as much as 19% over the past 5 days, over the March 24 announcement of Google TurboQuant, a new technology that Google claims will reduce the memory footprint of AI models by a factor of 6, and improve inference speeds by a factor of 8. As of this writing, Micron Technology (NASDAQ: MU) dipped 19.5% over the last 5 days. Over in Korea, SK Hynix saw its stock price drop by 6%, while Samsung Electronics saw a dip by 5%.

TurboQuant is an advanced quantization algorithm developed by Google that delivers massive data compression for LLMs and vector search engines. It effectively tackles memory bottlenecks in the key-value cache and accelerates similarity lookups without sacrificing model accuracy. TurboQuant achieves this efficiency by combining two novel techniques: PolarQuant, which simplifies data geometry using polar coordinates to eliminate traditional memory overhead, and Quantized Johnson-Lindenstrauss (QJL), a 1-bit mathematical error-checker. Capable of compressing the key-value cache to just 3 bits without requiring fine-tuning, TurboQuant enables up to 8x faster runtimes on GPUs, establishing a new standard for AI efficiency.
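
For anyone wondering what a "1-bit" Johnson-Lindenstrauss step can look like, here is a minimal sketch of the general sign-bit random-projection idea on its own. It is not Google's TurboQuant pipeline and leaves out PolarQuant entirely; the dimensions, seed and function names are assumptions for illustration. The sketch size is deliberately oversized so the estimate is visibly stable, so this toy run shows the estimator working rather than any actual compression.

```python
import math
import random


def gaussian_matrix(rows, cols, rng):
    """Random projection matrix with independent standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(matrix, vec):
    """Matrix-vector product, one output coordinate per matrix row."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def encode_key(matrix, key):
    """Keep one sign bit per projected coordinate, plus the key's norm."""
    signs = [1 if p >= 0 else -1 for p in project(matrix, key)]
    norm = math.sqrt(sum(k * k for k in key))
    return signs, norm


def estimate_dot(matrix, query, signs, norm):
    """Estimate <query, key> from the key's sign bits; the query stays full precision."""
    proj_q = project(matrix, query)
    total = sum(p * s for p, s in zip(proj_q, signs))
    # For Gaussian rows s, E[(s.q) * sign(s.k)] = sqrt(2/pi) * <q, k> / |k|,
    # so rescaling by sqrt(pi/2) * |k| makes the average an unbiased estimate.
    return math.sqrt(math.pi / 2) * norm * total / len(signs)


rng = random.Random(0)
dim, sketch_bits = 16, 4000        # oversized sketch, for a stable demo only
S = gaussian_matrix(sketch_bits, dim, rng)
key = [rng.uniform(-1, 1) for _ in range(dim)]
query = [rng.uniform(-1, 1) for _ in range(dim)]

exact = sum(q * k for q, k in zip(query, key))
signs, norm = encode_key(S, key)
approx = estimate_dot(S, query, signs, norm)
print(f"exact={exact:.3f} estimate={approx:.3f}")
```

The point is that attention scores are inner products, so you can get away with storing only sign bits for the keys as long as the estimate of the inner product stays unbiased.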
 
Been pretty hyped about this tech, glad to see it making progress. I'm hoping we see neural texture compression for games soon, it's the only way I see the Steam Machine being relevant with only 8GB VRAM.
 
Been pretty hyped about this tech, glad to see it making progress. I'm hoping we see neural texture compression for games soon, it's the only way I see the Steam Machine being relevant with only 8GB VRAM.

One issue with the Steam Machine could be that it is RDNA 3, without official support for the latest cooperative vectors, and with RDNA 5 apparently being a big departure, I would not be surprised if they do not backport those things.

https://wccftech.com/amd-unveils-ra...st announced three,On Future RDNA GPUs & SoCs

Universal compression seems to be next-gen RDNA/PS6 talk for the moment.
 