In March this year, Cognition Labs released Devin, dubbed the ‘world’s first AI software engineer’. In short, Devin is a tool that writes code and builds projects from natural-language requests. It sounds similar to what Cursor and GitHub Copilot would do if they functioned as agents.
Over the last few months, we’ve seen a rise in AI code companion tools, from big names like Anthropic and GitHub to startups like Cursor. Devin, however, has stayed quiet and done little to give competitors a run for their money.
This makes one wonder whether Devin was wrongly branded as the world’s first ‘AI software engineer’, a label that sparked a competitive frenzy among developers.
Developers have already borne the brunt of several unfortunate consequences of the market, like layoffs, hiring freezes, and startups filing for bankruptcy. An AI tool openly intended to replace them is the last thing they need.
If You’re Good Enough, You Have Got to Show Up
While platforms like GitHub Copilot and Cursor are available for anyone to use, Devin requires users to submit an access request to its team. It is still not open to the public. Putting up a barrier before users could even try the tool may not have worked in Devin’s favour.
Cursor offers a free preview of its AI tool and unlocks the full version with a paid subscription. Developers worldwide were quick to adopt the platform, and Cursor soon garnered widespread praise from the AI community.
Cursor is a commendable example of a startup sending shivers down the spine of some of the well-established names in the industry, even if it was just for a short while.
Moreover, the bar for these companies is rising with every new development. In a podcast episode with computer scientist Lex Fridman, Cursor co-founder Michael Truell said, “You can wax poetic about moats and brand that, and this is our advantage. But I think, in the end, if you stop innovating on the product, you will lose.”
While the odds of success are already low in an ultra-competitive market, Cognition AI will have to release Devin soon if it is to stand any chance.
Devin is Capable, But Let Down by False Promises
Devin’s capabilities are still quite impressive, even if they fall short of the initial claims about its potential. However, just a few months after its launch, users were quick to debunk those claims.
Carl Brown, of the YouTube channel Internet of Bugs, identified several misleading aspects of Devin’s demo, particularly its claim of successfully completing tasks on Upwork.
The Upwork task asked for detailed instructions to set up a machine-learning model on AWS EC2. However, Devin’s output did not match what the client had actually requested.
“All I had to do to replicate Devin’s results was get an environment set up on a cloud instance with the right hardware and run two commands with the right paths,” said Carl.
Referring to Devin’s workflow, in which it fixes errors inside files it created itself rather than in the original repository, he added, “All of this stuff makes it look like Devin did a bunch of work.”
Setting aside the false promises, we’ve seen time and again that it is incredibly hard for startups to compete against the giants in the world of AI.
For example, GitHub’s latest updates to Copilot and Copilot Workspace are very likely to eat into the competition. GitHub offers top-notch AI coding features through Copilot, built into a platform that is already home to a massive community of developers.
If you are coding at a hobbyist level, OpenAI’s Canvas and Claude’s Artifacts also get the job done without the need for another AI app. To persuade users to leave their existing environments, a new tool needs to offer an extraordinary advantage, not just similar features.
While this may be one of the reasons interest in Devin has gradually faded, another key factor is that users can’t actually use it.
o1 Doesn’t Mean Much for Coding
In its latest update, posted in September, Cognition AI published Devin’s test results with OpenAI’s o1-preview and o1-mini. The results show impressive evaluation scores, and the company expects Devin’s capabilities to improve once it is integrated with o1.
That said, OpenAI’s o1 isn’t necessarily the best choice for coding and programming tasks. The models can certainly reason, but Claude 3.5 Sonnet is widely regarded as the best model for coding.
GitHub’s announcement that it will integrate Claude 3.5 Sonnet into GitHub Copilot points the same way. If Cognition AI wants to increase its chances of making a huge impact, turning towards Claude 3.5 Sonnet may be a good bet. The model can be used inside Cursor as well.
If there’s one thing we’ve learnt about the AI ecosystem, it is never to write anyone off. For what it is worth, Devin may yet prove to be an impressive tool once it is available to the public.
Moreover, the man behind Cognition AI is a modern-day genius. Harvard graduate Scott Wu is a three-time gold medal winner at the International Olympiad in Informatics and finished third in the 2021 edition of the prestigious Google Code Jam competition.
If Cognition AI plays its cards right, it might shake up some of today’s established AI code companions, and that is worth hoping for. All things considered, the real winner would be the user.
The post Where the Hell is Devin? appeared first on Analytics India Magazine.