Recently, researchers from the School of Computer Science & Technology at Soochow University released a research paper titled ‘MemLong: Memory-Augmented Retrieval for Long Text Modeling’, in which they extended an LLM’s context window from 2k to 80k tokens on a two-year-old, desktop-grade NVIDIA 3090 GPU.
This opens new horizons for users with limited hardware who still want to run AI applications locally on their computers.
Another highlight of the study is that fine-tuning a 3-billion-parameter version of MemLong on 0.5 billion tokens requires only eight 3090 GPUs for eight hours, showcasing how efficiently it uses resources. This makes it useful not only for users who want to run AI applications but also for developers who want to train their models on midrange hardware.
Bensen Hsu, the founder of OpenRead, mentioned that the method was designed to enhance the capabilities of long-context language modelling by utilising an external retriever for historical information retrieval.
“The key idea is to store past contexts and knowledge in a non-trainable memory bank,” he added. Because the memory bank is not updated during training, the stored contexts and knowledge remain consistent over time.
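As a rough sketch of that idea, the memory bank can be pictured as an append-only store of detached past-context representations that the optimiser never touches. The class name, method names and tensor shapes below are illustrative assumptions for this article, not the paper’s actual implementation.

```python
import torch


class FrozenMemoryBank:
    """Append-only store for past context representations.

    Nothing in here is a trainable parameter, so the stored tensors are
    never altered by gradient updates and stay consistent over training.
    """

    def __init__(self):
        self.keys = []    # one tensor of shape (chunk_len, dim) per stored chunk
        self.values = []

    @torch.no_grad()
    def write(self, k: torch.Tensor, v: torch.Tensor) -> None:
        # Detach and move to CPU so no gradient ever flows back into the memory.
        self.keys.append(k.detach().cpu())
        self.values.append(v.detach().cpu())

    def __len__(self) -> int:
        return len(self.keys)


# Toy usage: store the representations of two processed chunks.
bank = FrozenMemoryBank()
for _ in range(2):
    k = torch.randn(128, 64)   # (chunk_len, head_dim), illustrative sizes
    v = torch.randn(128, 64)
    bank.write(k, v)
print(len(bank), "chunks stored")
```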
What Does MemLong Do Differently?
By freezing the lower layers of the model and only fine-tuning the upper layers, MemLong reduces computational costs. This approach allows for efficient training, requiring fewer resources while maintaining high performance.
“Keeping the lower layers fixed ensures that the foundational features remain consistent, which stabilises the training process. This stability is crucial for MemLong as it fine-tunes only the upper layers, ensuring that the model’s performance remains robust and reliable,” said Tiya Vaj, a research scholar in NLP, in her recent Medium post, suggesting why freezing lower layers is helpful.
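For readers curious what this looks like in practice, here is a minimal PyTorch sketch of the general recipe: freeze the lower transformer layers so their weights stay fixed, and hand only the upper layers to the optimiser. The layer count, split point and layer type are illustrative assumptions, not MemLong’s actual configuration.

```python
import torch
import torch.nn as nn

# Stand-in decoder stack: 12 identical transformer layers
# (illustrative sizes, not MemLong's actual architecture).
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True) for _ in range(12)]
)

n_frozen = 8  # keep the lower 8 layers fixed, fine-tune only the top 4

for layer in layers[:n_frozen]:
    for p in layer.parameters():
        p.requires_grad = False  # frozen weights receive no gradient updates

trainable = sum(p.numel() for l in layers for p in l.parameters() if p.requires_grad)
total = sum(p.numel() for l in layers for p in l.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")

# Hand only the still-trainable upper-layer parameters to the optimiser.
optimizer = torch.optim.AdamW(
    (p for l in layers for p in l.parameters() if p.requires_grad), lr=1e-4
)
```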
Unlike previous models that suffer from distribution shifts when storing information in memory, MemLong maintains a consistent information distribution, ensuring reliable performance across different contexts.
A Reddit user noted that deep learning models that achieve impressive results on benchmarks can exhibit surprisingly poor real-world performance, usually because of distribution shifts. They added that dealing with distribution shifts is a huge problem and that DL models often end up learning spurious correlations.
This is why maintaining a consistent information distribution across different contexts matters.
The research paper also mentioned that MemLong outperforms state-of-the-art models in long-context tasks, achieving up to a 10.2 percentage point improvement over models like OpenLLaMA in retrieval-augmented in-context learning tasks.
Another key part of this research is operating on semantic-level relevant chunks, enabling more coherent long-text modelling. By processing text at the chunk level, MemLong can maintain semantic coherence across long sequences, which is essential for tasks like document summarisation and dialogue systems.
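As a rough illustration of chunk-level retrieval, the sketch below splits a document into fixed-size word chunks, embeds each one with a toy hashing encoder, and returns the chunks most similar to a query. The chunking scheme, embedding function and chunk size are stand-ins for illustration, not the paper’s method.

```python
import numpy as np


def chunk_text(text: str, chunk_size: int = 20) -> list[str]:
    """Split text into fixed-size word chunks (a crude stand-in for semantic chunking)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(chunk: str, dim: int = 64) -> np.ndarray:
    """Toy hashing embedding; a real system would use a trained text encoder."""
    vec = np.zeros(dim)
    for word in chunk.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the chunks most similar to the query by cosine similarity."""
    q = embed(query)
    scores = [float(q @ embed(c)) for c in chunks]
    order = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in order]


document = "MemLong stores past context in an external memory and retrieves it at chunk level during generation"
chunks = chunk_text(document, chunk_size=8)
print(retrieve("how is past context retrieved", chunks))
```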
Are We Still Lost in the Middle?
In 2023, a research paper titled ‘Lost in the Middle: How Language Models Use Long Contexts’ suggested that the performance of LLMs can degrade significantly when relevant information sits in the middle of long contexts. So, how does MemLong solve this problem?
Instead of directly handling large text inputs, MemLong uses a retrieval mechanism to fetch historical information as key-value (K-V) pairs.
Apart from that, MemLong maintains a consistent information distribution by using a fixed pre-trained model and a frozen retriever. This consistency prevents the distribution shifts that can occur when models are trained to handle long contexts, which was a concern with previous models.
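Below is a minimal sketch of what fetching history as K-V pairs can look like, assuming single-head attention and a simple per-chunk summary vector for retrieval. The shapes, the retrieval scoring and the way retrieved keys are concatenated with local ones are illustrative assumptions, not MemLong’s exact mechanism.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only; real models use many heads and much longer chunks.
dim, chunk_len, n_stored = 64, 16, 10

# Pretend these K-V pairs were written to the memory bank while reading earlier text.
mem_keys = torch.randn(n_stored, chunk_len, dim)
mem_values = torch.randn(n_stored, chunk_len, dim)
chunk_summaries = mem_keys.mean(dim=1)  # crude per-chunk summary used for retrieval


def fetch_kv(query_summary: torch.Tensor, top_k: int = 2):
    """Pick the top-k stored chunks whose summaries are most similar to the query."""
    scores = F.cosine_similarity(query_summary.unsqueeze(0), chunk_summaries, dim=-1)
    idx = scores.topk(top_k).indices
    return mem_keys[idx].reshape(-1, dim), mem_values[idx].reshape(-1, dim)


# Current step: local queries/keys/values plus the retrieved history.
q = torch.randn(8, dim)          # 8 current tokens
local_k = torch.randn(8, dim)
local_v = torch.randn(8, dim)
ret_k, ret_v = fetch_kv(q.mean(dim=0))

k = torch.cat([ret_k, local_k], dim=0)   # attend over retrieved + local keys
v = torch.cat([ret_v, local_v], dim=0)
attn = torch.softmax(q @ k.T / dim ** 0.5, dim=-1)
out = attn @ v
print(out.shape)  # torch.Size([8, 64])
```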
While increasing a context window from 2k to 80k is a big deal, especially on a years-old desktop-grade GPU, it will be interesting to see how MemLong performs on enterprise hardware.