Cerebras, the AI hardware and inference provider, has announced a new technique called CePO (Cerebras Planning and Optimization) that it says 'drastically' improves the reasoning capabilities of Meta's Llama models.
Cerebras applies the much-coveted test-time computation technique to the Llama 3.3 70B model, outperforming the Llama 3.1 405B model across several benchmarks while 'maintaining interactive speeds of 100 tokens per second'. Cerebras has also released detailed technical documentation outlining the capabilities of CePO.
“While models like OpenAI o1 and Alibaba QwQ have demonstrated the power of additional computation at inference time, CePO brings these capabilities to Llama – the world’s most popular open-source LLM family,” said Cerebras in the announcement.
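The article doesn't spell out CePO's internals, but a common way to spend extra computation at inference time is best-of-N sampling: draw several candidate answers and keep the one a scoring or verification step rates highest. The sketch below is illustrative only; `generate_candidate` and `score` are hypothetical stand-ins for an LLM sampler and a verifier, not part of CePO.

```python
import random

def generate_candidate(prompt: str, seed: int) -> str:
    # Stand-in for one sampled LLM completion (temperature > 0).
    random.seed(seed)
    return f"{prompt} -> answer-{random.randint(0, 9)}"

def score(candidate: str) -> float:
    # Stand-in for a verifier / self-evaluation pass over the candidate.
    return float(candidate[-1])  # toy heuristic: last digit as "quality"

def best_of_n(prompt: str, n: int = 8) -> str:
    # Extra inference-time compute: n forward passes instead of one,
    # then keep the highest-scoring candidate.
    candidates = [generate_candidate(prompt, seed) for seed in range(n)]
    return max(candidates, key=score)

print(best_of_n("What is 2+2?"))
```

Techniques like CePO layer planning and optimization on top of this basic idea, but the trade-off is the same: more tokens generated per query in exchange for better answers, which is why Cerebras emphasises sustaining 100 tokens per second.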
Cerebras also compared its technique with GPT-4 Turbo and Claude 3.5 Sonnet, achieving 'comparable performance' on most benchmarks. However, no comparison was made with the industry-leading reasoning model, OpenAI's o1.
For example, the Llama 3.3 70B model scored 53.3% on the GPQA benchmark, whereas the o1 model scored a higher 76%. While OpenAI hasn't revealed the number of parameters in the o1 model, it almost certainly has significantly more than 70B.
“By bringing these capabilities to the Llama family of models, we’re democratizing access to sophisticated reasoning techniques previously limited to closed commercial systems,” said Andrew Feldman, CEO and Co-founder of Cerebras Systems.
Cerebras also plans to open-source the CePO framework. In addition, the company aims to develop more 'advanced prompting frameworks that leverage comparative reasoning', along with synthetic datasets optimised for inference-time computing.
Cerebras is using Llama 3.3, the latest version of Meta's Llama, which Meta announced only a few days ago. According to Meta, the model delivers 'leading performance' in synthetic data generation and supports an expanded context length of 128k tokens.
A few days ago, Meta also unveiled a new 'Chain of Continuous Thought' technique, or COCONUT, which overcomes a limitation of the Chain of Thought (CoT) technique, in which the explicit reasoning process is generated as natural-language tokens.
Instead of making the model convert its internal thinking into words after each step, COCONUT uses its internal thinking as a starting point for the subsequent step.
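The contrast above can be sketched with a toy model. This is not Meta's implementation: the "hidden state" here is a short list of floats and the "transformer step" is a fixed nonlinear map, so all names and numbers are illustrative. The point is the loop structure: CoT collapses the state to a discrete token and re-embeds it after every step, while COCONUT-style reasoning keeps the continuous state.

```python
import math

EMBED = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # token-id -> embedding vector

def model_step(h):
    # Stand-in for one transformer forward pass over hidden state h.
    a, b = h
    return [math.tanh(0.6 * a - 0.4 * b), math.tanh(0.5 * a + 0.3 * b)]

def decode(h):
    # Nearest token id by dot product: "put the thought into words".
    return max(EMBED, key=lambda t: sum(x * y for x, y in zip(EMBED[t], h)))

# CoT-style: collapse to a token and re-embed after every step.
# The decode/re-embed round trip discards everything the continuous
# state carried beyond the single chosen token.
h = EMBED[0]
for _ in range(3):
    h = EMBED[decode(model_step(h))]

# COCONUT-style: feed the continuous hidden state straight into the
# next step, skipping the round trip through language.
h_cont = EMBED[0]
for _ in range(3):
    h_cont = model_step(h_cont)

print(decode(h), decode(h_cont))
```

Running this, the two loops can end up decoding different tokens even from the same start state, because the CoT loop snaps back to a grid of embeddings each step while the latent loop accumulates information continuously.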
Reasoning models are the next big thing in the ecosystem today. While OpenAI just unveiled the full version of the o1 model, it also faces strong competition from the East: China's DeepSeek R1 Lite supposedly offers better reasoning capability than o1 and is also available as an open-source model.
The post Cerebras CePo Brings Test Time Computation to Llama appeared first on Analytics India Magazine.