After sucking at math forever, OpenAI today announced that it has come up with a new technique called ‘process supervision,’ which improves mathematical reasoning.
This new method involves rewarding each step of reasoning as opposed to rewarding the correct final answer as that happens in ‘outcome supervision.’ Process supervision is said to boost performance, and also bring in alignment by training the model to produce a ‘chain-of-thought’ model closer to human thinking.
Reduces Hallucinations
With process supervision, OpenAI believes that hallucinations would be minimised to an extent. It said that each step in the process receives precise supervision which will lead to better outcomes.
Further, OpenAI said that the approach of rewarding aligned processes will lead to more ‘interpretable reasoning’ as the model is encouraged to follow a human-approved process. Simply put, interpretable reasoning, focuses on creating transparent models with clear explanations for their outputs.
As opposed to outcome supervision which may reward unaligned processes that are hard to detect, process supervision will not face such problems. The company has also said that addressing hallucinations is a critical step towards building aligned AGI.
Improves Math Results
Source: OpenAI
OpenAI, in its blog post said that it used the MATH test set to evaluate process-supervised and outcome-supervised reward models. Further, it said that multiple solutions for each problem were generated and the solution ranked highest by each reward model was picked. It was observed that not only did process-supervised reward models outperform outcome-supervised ones, but the performance gap widened as more solutions to a problem were considered indicating the reliability of the model.
By reducing hallucinations and bringing more alignment in their models, OpenAI is striving to inch closer to near perfection of chatbots with this method. However, the scope of these results for applying to other domains beyond math is still unknown. OpenAI believes that if process supervision is applied to other domains as well, we may get a method that is superior and more aligned than outcome supervision.
The post OpenAI Finally Cracks the Math Code appeared first on Analytics India Magazine.