‘Next Token Prediction’ Might Make PyTorch-like Frameworks Redundant


LLMs are becoming the default solution for many of the problems that businesses and researchers face. Even in domains outside language and text, people have been experimenting with LLMs for predicting the next token, which sparks an interesting conversation about whether tools like PyTorch will still be needed if LLMs can eventually do the whole job.

Interestingly, according to Andrej Karpathy, these models are more like tools designed to predict the next piece of a sequence, whether that’s words, images, or other types of information. This next token prediction framework can be a universal tool for solving a wide variety of problems, beyond text.

“If that is the case, it’s also possible that deep learning frameworks (e.g. PyTorch and friends) are way too general for what most problems want to look like over time,” said Karpathy.

It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; It's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.
They…

— Andrej Karpathy (@karpathy) September 14, 2024

LLMs are not really just “language experts” anymore. According to Karpathy, the “language” part is historical: these models were first trained to predict the next word in a sentence, but in reality they can work on any kind of data that is broken down into little pieces, called tokens.

Think of LLMs as a super-smart guessing game. If you build a car, a house, or an animal out of Legos, you are just putting blocks together. LLMs work the same way: they don’t care whether the tokens (blocks) represent words, images, or even molecules; they just predict what the next block should be based on what is already there.
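The token-agnostic point can be illustrated with a toy sketch. The following is a minimal bigram-style next-token predictor (a stand-in for a real autoregressive model, not how LLMs are actually implemented): it counts which token tends to follow which, and the integer token ids could stand for words, image patches, or molecule fragments; the model never knows the difference.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent successor of `token`, or None if unseen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

# The ids below could represent anything tokenisable -- the model
# only ever sees a stream of integers.
stream = [1, 2, 3, 1, 2, 3, 1, 2]
model = train_bigram(stream)
print(predict_next(model, 2))  # 3 -- the token that always follows 2
```

A real LLM replaces the frequency table with a transformer, but the interface is the same: a stream of token ids in, a prediction of the next id out.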

Another example is protein prediction models like AlphaFold and ESMFold, which are built on top of generative language models. Calling such intricate models just LLMs seems unjust given what they are capable of.

“What I’ve seen though is that the word “language” is misleading people to think LLMs are restrained to text applications,” said Karpathy in another thread.

“I don’t think this is true but I think it’s half true” – Karpathy

Probably, the name should change. “Definitely needs a new name. ‘Multimodal LLM’ is extra silly, as the first word contradicts the third word,” replied Elon Musk.

Meanwhile, Yann LeCun is more concerned with why this approach does not make sense for every type of problem. “It only works with discretized outputs (discrete symbols) and only makes sense with symbol sequences with a natural order (not images). Text, DNA, proteins, musical scores, etc. are discrete or easily discretized,” said LeCun.

For something like images, which are continuous and don’t naturally have a strict sequence of discrete symbols (each pixel doesn’t follow a clear ‘order’ like text), LLMs don’t work as naturally. To use an LLM for images, you would first need to somehow convert the image into discrete chunks (like dividing the image into small patches), but this doesn’t follow the same natural order that exists in text or DNA.
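That conversion step can be sketched concretely. The toy code below (an illustrative simplification; real systems use learned codebooks such as VQ-VAE) splits a tiny image into patches in raster order and assigns each distinct patch an integer token id. The raster order itself is an imposed convention, which is exactly LeCun’s objection: nothing about the pixels demands that sequence.

```python
def patchify(image, p):
    """Split a 2D image (list of rows) into p x p patches, raster order."""
    h, w = len(image), len(image[0])
    patches = []
    for i in range(0, h, p):
        for j in range(0, w, p):
            patch = tuple(image[i + di][j + dj]
                          for di in range(p) for dj in range(p))
            patches.append(patch)
    return patches

def tokenize(patches):
    """Assign each distinct patch an integer id (a toy 'codebook')."""
    codebook, tokens = {}, []
    for patch in patches:
        if patch not in codebook:
            codebook[patch] = len(codebook)
        tokens.append(codebook[patch])
    return tokens, codebook

# A 4x4 "image" of pixel values, cut into 2x2 patches.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [2, 2, 0, 0],
       [2, 2, 0, 0]]
tokens, cb = tokenize(patchify(img, 2))
print(tokens)  # [0, 1, 2, 0] -- the image is now a token sequence
```

Once discretized like this, the image is just another token stream an autoregressive model can consume, but the ordering was chosen, not discovered.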

Agreeing with Karpathy, and a little with LeCun, Gary Marcus said that statistical modelling of token streams works well if reasoning or planning isn’t required.

Last month, Eliezer Yudkowsky also said that predicting the next token can solve almost any well-posed problem. “literally any well-posed problem is isomorphic to ‘predict the next token of the answer’,” he said.

“Throwing an LLM at it”

The idea that many problems can be reduced to a token-stream prediction model is intriguing, especially since domains like images, audio, and even molecules can be broken down into sequences of tokens. This suggests that a unified approach like LLMs could handle diverse tasks, reducing the need for highly specialised architectures and, by extension, the general-purpose frameworks, such as PyTorch, used to build them.

However, frameworks like PyTorch provide more than just flexibility in creating neural network models. They allow for a variety of deep learning operations that aren’t necessarily relevant for LLMs but are critical for other areas like reinforcement learning, generative models, and non-sequential tasks.

While it’s true that LLMs could dominate many applications, not every problem is best framed as “next token prediction.” We may see a simplification or specialisation of deep learning frameworks to accommodate the increasing dominance of LLM-based models. Still, the complete redundancy of frameworks like PyTorch might be too extreme a prediction.

OpenAI’s newest model, o1, gives a sense of why LLMs may be able to solve many problems outside the realm currently considered achievable. With reasoning tokens in place, the model can go beyond just ‘predicting’ the next token and give reasons for why it did so.

PyTorch and similar frameworks may not become redundant but could evolve to become more focused on token-based models, while still offering tools for more diverse problems outside that paradigm.

Though this reasoning currently happens only in language or text, the capability might extend beyond it soon. Calling LLMs ‘language models’ might underrepresent their capabilities. Moreover, “it just predicts the next token” is a thought-terminating cliché.

The post ‘Next Token Prediction’ Might Make PyTorch-like Frameworks Redundant appeared first on AIM.

