Why OpenAI’s Codex is Not as Good as Devin or Replit

If you’re a software engineer, indie hacker, or startup founder who has spent the last year tooling around with AI agents like Replit’s Ghostwriter, Cognition’s Devin, or Lovable’s smart terminals, well, OpenAI just entered the game, again.

Over the weekend, OpenAI rolled out Codex, a cloud-based software engineering agent that looks suspiciously like the future of dev work. It’s available first to ChatGPT Pro (at $200 a month), Team, and Enterprise users, while Plus users may have to wait a while for access.

Greg Brockman, co-founder of OpenAI, said during the live research preview that Codex is their bet on vibe coding. This comes just days after OpenAI announced its acquisition of Windsurf for $3 billion. Windsurf, an artificial intelligence-assisted coding tool formerly known as Codeium, is also a direct competitor to Cursor, which was also backed by OpenAI.

OpenAI is Vibing, But Without Internet

Codex isn’t another glorified autocomplete. It’s a multi-agent dev assistant that runs coding tasks in parallel, inside sandboxed environments preloaded with your repo, which sounds similar to Devin, but OpenAI argues that it’s not.

During the launch preview with Brockman, Katy Shi, one of the researchers at OpenAI, said, “Codex is as reliable, if not more reliable than my coworkers.” Shi added that she could access her coworkers’ logs without needing to talk to them.

Shi meant that with Codex, developers can do work like writing new features, debugging, writing tests, or proposing pull requests, and it will do all of that while showing you terminal logs, test outputs, and commit history, so you don’t have to trust it blindly.

This essentially means GitHub PRs can be drafted, tested, and explained by a bot that lives inside ChatGPT, making it possibly better than Devin.

But while Codex acts as an agent running coding tasks in the background in the cloud, Replit allows developers to deploy apps, and Devin is an end-to-end software engineer.

Codex still has other limitations, and in this case, pretty big ones. It’s not connected to the internet, which makes it a less than ideal choice over Devin. This is currently the biggest criticism of the release and the reason developers are not adopting it in their workflow. Devin is also in early access.

It also needs well-scoped tasks. It sometimes fails tests or gets confused. And it won’t yet handle sprawling architectural decisions on its own. But for repeatable engineering chores, it’s surprisingly capable and transparent.

OpenAI conveniently calls this a research preview. Maybe the team will connect it to the internet soon. The ambitions are anything but modest.

Codex is powered by codex-1, a variant of OpenAI’s o3 model explicitly tuned for software engineering. It was trained with reinforcement learning on hundreds of real coding tasks, making it eerily good at mimicking human dev styles, coding conventions, and PR etiquette.

Devin, Cursor, Replit—Watch Your Backs

“Codex increases the value of being technical. If you can describe precisely what you want to build, you can get a large amount done in parallel,” posted Josh Tobin from OpenAI. “That’s basically a technical skill.”

But Cognition recently announced an update to Devin, offering a new agent-native IDE experience. Devin 2.0 supports multiple parallel instances, each with an interactive cloud-based IDE.

Furthermore, the latest update lets developers take control while offering both collaborative and fully automated approaches. It also lets developers refine code and run tests within the IDE.

Cognition AI also announced more features for Devin, including Interactive Planning, Devin Search, and Devin Wiki. This is where OpenAI’s Codex falls behind.

Inside ChatGPT, Codex is accessed via a sidebar. You create tasks with prompts, then click “Code” to generate changes or “Ask” to query your codebase. That is very different from Cursor’s “tab tab tab” models, but similar to Lovable and Replit.

Each task gets its own isolated environment, where Codex can edit files and run linters, test harnesses, and type checkers. Depending on the complexity, completing a task can take anywhere from 1 to 30 minutes. You can monitor its progress in real time.
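To make the idea of per-task isolation concrete, here is a minimal Python sketch. It is not OpenAI’s implementation: the function names `run_task` and `run_all` are hypothetical, a temporary directory stands in for a real sandbox, and a plain syntax check via `compile()` stands in for the linters, test harnesses, and type checkers mentioned above.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_task(name: str, source: str) -> dict:
    """Run one hypothetical task inside its own throwaway directory."""
    with tempfile.TemporaryDirectory(prefix=f"task-{name}-") as workdir:
        target = Path(workdir) / "module.py"
        target.write_text(source)  # the "edit files" step
        log = [f"wrote {target.name}"]
        try:
            # Stand-in for a linter or type checker: does the file parse?
            compile(target.read_text(), target.name, "exec")
            log.append("syntax check passed")
            ok = True
        except SyntaxError as exc:
            log.append(f"syntax check failed: {exc.msg}")
            ok = False
    # The directory is deleted here, so tasks cannot affect each other.
    return {"task": name, "ok": ok, "log": log}


def run_all(tasks: dict[str, str]) -> list[dict]:
    """Run every task in parallel and return results in submission order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_task, n, s) for n, s in tasks.items()]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for result in run_all({"good": "x = 1\n", "bad": "def broken(:\n"}):
        print(result["task"], result["ok"], result["log"])
```

Each result carries its own log, which mirrors the article’s point that the work can be inspected rather than trusted blindly.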

It’s no coincidence that Codex seems to want to eat the lunches of agents like Devin, Cursor, and Replit’s AI tools. All these startups have been vying to become the default AI coding companion. But with Codex, OpenAI is using its distribution advantage: ChatGPT is already in hundreds of thousands of developers’ workflows.

As Santiago Valdarrama joked: “Literally everyone is freaking out over Codex like they didn’t do the exact same thing for Devin, Cursor, DeepSeek, and every GPT drop since 2.0… VCs will congratulate themselves and write posts about how Codex will enable the next trillion-dollar market… until the next shitty autocomplete drops.”

Codex is Good Enough for Now

Despite the sarcasm, there’s truth to the cycle. But Codex is not autocomplete. At OpenAI itself, engineers are using Codex to offload annoying chores like renaming variables, writing tests, and fixing bugs. “By reducing context-switching and surfacing forgotten to-dos, Codex helps engineers ship faster and stay focused on what matters most,” the company writes.

“This guy literally shows how to build an AI business with Codex in 12 mins,” posted Aadit Sheth (@aaditsh) on May 17, 2025. pic.twitter.com/rQP3my0v6R

Codex isn’t being built in a vacuum. Early testers like Cisco, Temporal, Superhuman, and Kodiak Robotics are already using it.

Cisco is testing it across its engineering teams to accelerate product development. Temporal uses it to debug, scaffold features, and stay in flow by offloading background work.

Superhuman has even let product managers use Codex to write code, with engineers stepping in only for reviews. Kodiak, which builds autonomous driving tech, is using it to improve test coverage, debug tools, and apparently navigate obscure parts of its stack.

Codex isn’t just stuck in ChatGPT either. Last month, OpenAI quietly launched Codex CLI, a terminal-based coding agent you can run locally. It brings the same models (o3 and o4-mini) into your dev environment.

Now, they’ve added codex-mini-latest, a lightweight version of codex-1 optimised for snappier Q&A and faster editing inside the CLI. OpenAI is handing out $5–$50 in free API credits for Codex CLI to Plus and Pro users. No excuses not to try it.

“We imagine a future where developers drive the work they want to own and delegate the rest to agents,” OpenAI wrote. Developers still need to know what they want to build, but they may never have to write boilerplate again.

Codex doesn’t kill Replit, Devin, or Lovable overnight. But it does something much more dangerous: it sets a new standard, albeit without the internet. Multi-agent, cloud-based, verifiable, and integrated into ChatGPT.

It’s the baseline now. Everybody else must catch up.

The post Why OpenAI’s Codex is Not as Good as Devin or Replit appeared first on Analytics India Magazine.