
Ahead of the highly anticipated launch of its v4 model, DeepSeek has published research that could fundamentally reshape how large language models handle knowledge, and potentially sidestep the hardware constraints hampering Chinese AI development.
The paper, co-authored by DeepSeek CEO Liang Wenfeng, introduces “Engram”, a method that lets language models retrieve knowledge through direct lookup rather than wasteful computation.
DeepSeek’s work matters because US export controls on GPUs leave Chinese AI labs with little room to scale through brute-force compute, pushing them towards algorithmic efficiency instead.
The paper explains that much of the GPU budget is spent reconstructing information that could be retrieved directly from memory or caches. As the authors put it, “Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation.”
Mixture-of-Experts (MoE) models, such as Mistral’s Mixtral 8x7B, DeepSeek’s R1, and OpenAI’s GPT-4, achieve efficiency by activating only a subset of the network for each input. DeepSeek’s approach adds conditional memory, which lets models handle local, stereotyped patterns instantly. To process a static fact such as the entity “New York City”, a standard LLM has to simulate retrieval through multiple layers of computation; Engram instead fetches the city’s representation directly from a lookup table in real time.
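To make the mechanism concrete, here is a minimal sketch in Python of that lookup. The table size, vector width, hash function, token ids, and function names are illustrative assumptions, not details from DeepSeek’s paper; the point is simply that a stereotyped n-gram maps deterministically to a stored vector.

```python
import numpy as np

DIM = 256             # width of a stored memory vector (illustrative)
TABLE_SIZE = 1 << 16  # number of memory slots (illustrative)

# Conditional-memory table: static, precomputed representations.
# In a real system these entries would be learned during training.
rng = np.random.default_rng(0)
memory_table = rng.standard_normal((TABLE_SIZE, DIM), dtype=np.float32)

def ngram_slot(token_ids: tuple[int, ...]) -> int:
    """Deterministically hash a token n-gram to a slot in the table (FNV-1a style)."""
    h = 14695981039346656037
    for t in token_ids:
        h = ((h ^ t) * 1099511628211) & 0xFFFFFFFFFFFFFFFF
    return h % TABLE_SIZE

def lookup(token_ids: tuple[int, ...]) -> np.ndarray:
    """Fetch a stereotyped pattern's representation in one memory read,
    instead of simulating retrieval through stacks of transformer layers."""
    return memory_table[ngram_slot(token_ids)]

# Hypothetical token ids for the n-gram "New York City"
new_york_city = (17243, 3920, 6511)
rep = lookup(new_york_city)   # a direct lookup, no matrix multiplications
print(rep.shape)              # (256,)
```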
“Language modelling entails two qualitatively different sub-tasks: compositional reasoning and knowledge retrieval,” the authors write. “While the former demands deep, dynamic computation, a substantial portion of text—such as named entities and formulaic patterns—is local, static, and highly stereotyped.”
Conditional Memory Changes the Efficiency Equation
This architectural shift improves efficiency by offloading static patterns to lookup tables, freeing up high-bandwidth memory (HBM) that would otherwise be occupied by neural parameters performing redundant reconstruction.
The memory tables themselves can be stored in cheaper host memory and prefetched asynchronously. Because hash-based lookups are deterministic, the system knows in advance exactly which entries will be needed and can hide the transfer latency behind ongoing computation.
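A rough PyTorch sketch of that prefetch pattern: rows are gathered from a table in ordinary host RAM, staged in pinned memory, and copied to the GPU on a side stream so the transfer overlaps with compute. The table, function names, and slot values here are hypothetical, illustrating the general technique rather than DeepSeek’s implementation.

```python
import torch  # assumes a CUDA device is available

DIM, TABLE_SIZE = 256, 1 << 16

# The table sits in ordinary host RAM rather than scarce GPU HBM.
memory_table = torch.randn(TABLE_SIZE, DIM)

copy_stream = torch.cuda.Stream()

def prefetch(slots: list[int]) -> torch.Tensor:
    """Kick off an asynchronous host-to-GPU copy of the rows an upcoming
    step will need, overlapping the transfer with ongoing GPU compute."""
    idx = torch.tensor(slots, dtype=torch.long)
    staged = memory_table.index_select(0, idx).pin_memory()  # pinned staging buffer
    with torch.cuda.stream(copy_stream):
        return staged.to("cuda", non_blocking=True)

# Slot indices are pure functions of the token stream, so they are known
# before the forward pass reaches the layer that consumes them.
rows_on_gpu = prefetch([42, 1337, 9001])
torch.cuda.current_stream().wait_stream(copy_stream)  # sync before use
```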
Compared to an identically sized MoE baseline model, Engram delivers solid gains on several knowledge, reasoning, and coding benchmarks.

These results suggest that reducing redundant computation does not merely preserve reasoning performance; it can also enhance it by reallocating compute towards genuinely dynamic tasks.
Why China’s Compute Gap Makes Engram Urgent
In an interview with Bloomberg, Justin Lin, head of Alibaba’s Qwen series, said there was less than a 20% chance that any Chinese company would leapfrog OpenAI or Anthropic with fundamental breakthroughs over the next three to five years. The report also quoted Tang Jie, founder and chief AI scientist of Z.ai, who warned that the gap between China and the US may be widening.
Both cited limited computing resources, US export controls on chips, and restrictions on adjacent equipment and software used in chip design and manufacturing as key constraints.
Gavin Leech, an AI researcher and co-author of The Scaling Era: An Oral History of AI, 2019–2025, put numbers to the problem. “This year, at the country level, they had 5–10x less compute than the Western labs, and so their models are probably undertrained,” he tells AIM.
Leech also points to growing gaps between headline benchmark scores and real-world robustness in Chinese LLMs. He argues that many perform well on older benchmarks where training overlap may exist, but struggle on refreshed tests that better probe generalisation—an issue he attributes to compute scarcity rather than architectural weakness.
Architecture as the Remaining Lever
Chinese AI companies have pursued three broad responses to the compute constraint: acquiring more foreign chips, developing domestic alternatives, and redesigning architectures. DeepSeek is betting on the third.
However, access to foreign GPUs remains contested.
At CES 2026, NVIDIA CEO Jensen Huang said customer demand in China was “high, quite high, and very high.” A Bloomberg report suggested Alibaba and ByteDance have explored orders exceeding two lakh H200 GPUs, though these plans remain subject to regulatory approval from Beijing, which reportedly issues permits only under “special circumstances,” such as academic research.
Domestic chips offer limited relief. While vendors such as Huawei, Cambricon, MetaX, and Iluvatar CoreX have made progress, their accelerators still lag NVIDIA’s high-end GPUs in raw compute throughput and memory bandwidth—constraints that most directly determine large-model training efficiency.
These gaps are compounded by Chinese fabs’ lack of access to advanced lithography equipment.
That leaves architectural innovation. Former OpenAI executive Yao Shunyu, now at Tencent, has urged the industry to focus on bottlenecks such as long-term memory and self-learning rather than scale alone.
Engram responds directly to this challenge by reducing reliance on expensive HBM through efficient, lookup-based memory systems.
It is among several core research initiatives DeepSeek has published over the past year, reflecting a deliberate focus on architectural and training innovations rather than incremental scaling alone.
DeepSeek R1 not only disrupted benchmarks and NVIDIA’s market cap; it also brought GRPO (Group Relative Policy Optimisation), a reinforcement-learning algorithm that improves reasoning capabilities more efficiently, effectively opening a new training paradigm for reasoning models.
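GRPO’s efficiency comes from scoring a group of sampled answers against one another rather than training a separate value (critic) model, as PPO requires. A simplified sketch of that group-relative advantage step, with placeholder rewards:

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """GRPO's core trick: baseline each sample against its own group,
    so no separate value (critic) network has to be trained."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# e.g. 8 answers sampled for one prompt, scored by a rule-based reward
rewards = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
advantages = group_relative_advantages(rewards)
# Answers that beat the group average get a positive advantage and are
# reinforced; the rest are pushed down in the usual clipped policy update.
```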
The clearest sign that DeepSeek’s research has paid off is its dominance of open-weights model usage today; running an open-weights model means downloading a pre-trained model’s parameters and running them locally for customised applications.

Last November, DeepSeek published research on a model that achieved gold-medal-level performance at the International Math Olympiad 2025. The work addressed a growing concern in reasoning and math benchmarks, namely that many models arrive at correct answers without sound or inspectable reasoning. DeepSeek trained a dedicated verifier that scored proof quality rather than answers, and used it to guide a separate proof generator. The generator was rewarded only when it identified and corrected its own mistakes.
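One way to read that incentive structure is as a reward that only fires on verified self-correction. The sketch below is an interpretation, with all names and the scoring scheme hypothetical; the paper’s actual reward design may differ.

```python
def self_correction_reward(draft_score: float,
                           flagged_own_error: bool,
                           revised_score: float) -> float:
    """Hypothetical reading of the incentive described above: the generator
    is rewarded only when it flags a flaw in its own proof and the revision
    earns a higher quality score from the verifier."""
    if flagged_own_error and revised_score > draft_score:
        return revised_score  # verifier scores proof quality, not the final answer
    return 0.0
```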
Earlier, the company also introduced V3.2-Exp, an experimental model designed to push long-context capabilities while keeping efficiency central, with 3.5x lower prefill costs and up to 10x cheaper decoding during inference for a 128k context window.
The pattern is consistent: DeepSeek is building architectures that extract more from less.