Stanford University recently published a paper titled ‘Can LLMs Generate Novel Research Ideas?’ The study found that ideas generated by LLMs (large language models) were rated significantly more novel than those from human experts.
To reach this conclusion, over 100 NLP researchers were asked to come up with new ideas and to review both LLM-generated and human-generated ideas without knowing their source. The results showed that LLM ideas were considered more innovative (with statistical significance, p < 0.05), although they were rated slightly lower in terms of feasibility.
The approach is similar to Japanese AI startup Sakana AI’s AI Scientist, which automates the entire research lifecycle. It generates novel research ideas, writes necessary code, executes experiments, summarises results, visualises data, and presents its findings in a complete scientific manuscript.
Interestingly, the startup claimed that each idea is implemented and developed into a full paper at a cost of approximately $15 per paper.
Generating new ideas is relatively easy for LLMs, thanks to their extensive training on large datasets and ability to combine various concepts. However, they continue to face challenges with advanced reasoning.
Meanwhile, OpenAI is preparing to release its new model, Strawberry, which is expected to offer improved reasoning capabilities.
Chai Discovery, a biology startup founded by a former OpenAI employee, recently introduced Chai-1, an advanced foundation model that predicts molecular structures crucial for drug discovery. Innovations like these show that LLMs are close to driving significant research breakthroughs.
“The ability of LLMs to combine concepts from vast datasets in ways not typically thought of by humans can lead to ideas that are considered more novel. This might be because LLMs aren’t constrained by the same cognitive biases or conventional thinking patterns that humans have,” said DigitalVibes.ai founder Anthony Scaffeo.
He added that LLMs can make connections across different fields or unrelated data points, which might not be intuitive or immediately obvious to human experts.
“My student’s comment on the paper about LLMs generating more novel research ideas than humans is making the rounds. I think this says more about NLP researchers than about LLMs. Ouch,” joked Subbarao Kambhampati, professor of computer science at Arizona State University.
“I am not gonna let no LLM beat me in generating novel NLP research ideas!” he quipped. Interestingly, Kambhampati has been quite vocal about LLMs being bad at reasoning and planning.
He said that models like GPT-3, GPT-3.5, and GPT-4 are poor at planning and reasoning, which he believes involve time and action. According to him, these models struggle with transitive and deductive closure, with the latter involving the more complex task of deducing new facts from existing ones.
Can They Really Do Research?
Several researchers have not yet experimented much with LLMs for generating novel ideas; instead, they have predominantly used them to review research papers.
Meta AI chief Yann LeCun argues that while LLMs cannot reason and plan, they are still a good tool for reviewing papers. “Reviewers (as in human reviewers) should be able to use the tools they want to help them write reviews. The quality of their reviews should be assessed based on the result, not the process,” he said.
Meta AI launched Galactica, an LLM for research, in November 2022, just weeks before ChatGPT. However, it was taken down after three days due to criticism over generating misleading or offensive information. LeCun remains unhappy about it to this day.
However, not everyone agrees with LeCun.
“AI-generated reviews of scientific papers are increasing, vacuous, and need to be stopped quickly. They reduce the author’s trust in the review process. Proposal: someone who is judged to have submitted such a review is banned from submitting to the same conference/journal for two years,” said Michael Black, director, Max Planck Institute for Intelligent Systems.
Adding to this perspective, Mukur Gupta, an applied scientist at Apple, recounted his frustrating experience with an LLM-generated review.
“I love AI as an assistant. But after getting an LLM-generated review for my NeurIPS paper last month (which was total crap and useless), I’m a little skeptical about AI discovering true novelty,” said Gupta.
He explained that LLMs could be a game-changer for interdisciplinary research or for uncovering new problems in fields where human experts, limited by their working memory and attention span, may struggle to grasp more than a handful of domains.
“LLMs, with their ever-expanding knowledge base, offer the potential for cross-pollination of ideas.”
“But when it comes to deep, niche, and fundamental breakthroughs, I’m not buying it—hence my disappointment with that NeurIPS review,” he added.
Use AI to Brainstorm, Not Write
Lately, there has been a growing trend of researchers using LLMs to write papers. According to recent data, the use of the term ‘delve’ in paper abstracts gradually increased through 2022, jumped noticeably in 2023 (when ChatGPT became widely available), and has continued to rise in 2024.
The future of research should be a collaboration between humans and LLMs to generate truly innovative ideas. According to Stanford’s paper, human researchers often prioritise feasibility and effectiveness over novelty and excitement, which can limit the creativity of their ideas.
On the other hand, LLMs struggle to judge the quality of ideas. By combining the strengths of both humans and LLMs, we can pave the way for exciting research.
The post After Software Engineers, LLMs Are Coming After AI Researchers appeared first on AIM.