OpenAI Finally Cracks the Math Code

OpenAI, whose models have long struggled with mathematics, today announced a new technique called ‘process supervision,’ which improves mathematical reasoning.

The new method rewards each step of reasoning, as opposed to rewarding only the correct final answer, as is done in ‘outcome supervision.’ Process supervision is said to boost performance and also improve alignment, since the model is trained to produce a ‘chain of thought’ closer to human thinking.
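The difference between the two reward schemes can be sketched in a few lines. This is a toy illustration, not OpenAI's implementation: the step-validity checker and the example solution are assumptions made for the sketch. It shows how outcome supervision can fully reward a solution whose intermediate reasoning is wrong, while process supervision flags the faulty step.

```python
def outcome_reward(final_answer, correct_answer):
    """Outcome supervision: a single reward based only on the final answer."""
    return 1.0 if final_answer == correct_answer else 0.0

def process_rewards(steps, step_is_valid):
    """Process supervision: a separate reward for every reasoning step."""
    return [1.0 if step_is_valid(s) else 0.0 for s in steps]

# Example: computing 12 * 3 + 4, with an arithmetic slip in step 2
# that a later (invented) step happens to cancel out.
steps = ["12 * 3 = 36", "36 + 4 = 41", "41 - 1 = 40"]

def step_is_valid(step):
    # Toy checker for simple arithmetic steps (assumption for this sketch).
    lhs, rhs = step.split(" = ")
    return eval(lhs) == int(rhs)

print(outcome_reward(final_answer=40, correct_answer=40))  # → 1.0
print(process_rewards(steps, step_is_valid))               # → [1.0, 0.0, 1.0]
```

The final answer is right, so outcome supervision gives full reward despite the broken middle step; the per-step rewards expose exactly where the reasoning went wrong.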

Reduces Hallucinations

With process supervision, OpenAI believes hallucinations can be reduced to a meaningful extent: because each step in the process receives precise supervision, errors are caught where they occur, leading to better outcomes.

Further, OpenAI said that the approach of rewarding aligned processes leads to more ‘interpretable reasoning,’ as the model is encouraged to follow a human-approved process. Simply put, interpretable reasoning focuses on creating transparent models with clear explanations for their outputs.

Outcome supervision, by contrast, may reward unaligned processes whose flaws are hard to detect; process supervision avoids this problem. The company has also said that addressing hallucinations is a critical step towards building aligned AGI.

ChatGPT is the closest model we have that looks like AGI, but still suffers from hallucination and tends to write plausible-looking bs with uncanny confidence. It could be misleading and even dangerous. Can we fix it? There is a simple & intuitive remedy:🧵

— Jim Fan (@DrJimFan) December 5, 2022

Improves Math Results

Source: OpenAI

In its blog post, OpenAI said it used the MATH test set to evaluate both process-supervised and outcome-supervised reward models. Multiple solutions were generated for each problem, and the solution ranked highest by each reward model was selected. Not only did process-supervised reward models outperform outcome-supervised ones, but the performance gap widened as more solutions per problem were considered, indicating the greater reliability of process-supervised models.
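The evaluation described above is a best-of-N selection: sample many candidate solutions, score each with a reward model, and keep the top-ranked one. A minimal sketch follows; the function names, the stand-in generator, and the fixed scores are assumptions for illustration, not OpenAI's code.

```python
import random

def best_of_n(problem, generate, reward_model, n=16):
    """Generate n candidate solutions and return the one ranked highest."""
    candidates = [generate(problem) for _ in range(n)]
    return max(candidates, key=reward_model)

# Toy stand-ins for the solution generator and the reward model.
def generate(problem):
    return random.choice(["solution A", "solution B", "solution C"])

def reward_model(solution):
    # A process-supervised reward model would aggregate per-step scores;
    # here we assign fixed scores purely for illustration.
    return {"solution A": 0.2, "solution B": 0.9, "solution C": 0.5}[solution]

print(best_of_n("some MATH problem", generate, reward_model, n=16))
```

As n grows, selection quality depends increasingly on how well the reward model ranks solutions, which is why the gap between process- and outcome-supervised reward models widens with more samples.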

By reducing hallucinations and improving the alignment of its models, OpenAI hopes this method will bring its chatbots closer to perfection. However, whether these results generalise to domains beyond mathematics is still unknown. OpenAI believes that if process supervision holds up in other domains too, it may offer a method that is both superior to and more aligned than outcome supervision.

The post OpenAI Finally Cracks the Math Code appeared first on Analytics India Magazine.
