I tested DeepSeek's R1 and V3 coding skills, and we're not all doomed (yet)

DeepSeek exploded into the world's consciousness this past weekend. It stands out for three powerful reasons:

It's an AI chatbot from China, rather than the US.
It's open source.
It uses vastly less infrastructure than the big AI tools we've been looking at.

Given the US government's concerns over TikTok and possible Chinese government involvement in that code, a new AI emerging from China is bound to generate attention. ZDNET's Radhika Rajkumar did a deep dive into those issues in her article Why China's DeepSeek could burst our AI bubble.

Also: The best AI for coding in 2025 (and what not to use)

In this article, we're avoiding politics. Instead, I'm putting both DeepSeek V3 and DeepSeek R1 through the same set of AI coding tests I've thrown at 10 other large language models. According to DeepSeek itself:

Choose V3 for tasks requiring depth and accuracy (e.g., solving advanced math problems, generating complex code).
Choose R1 for latency-sensitive, high-volume applications (e.g., customer support automation, basic text processing).

You can choose between R1 and V3 by clicking the little button in the chat interface. If the button is blue, you're using R1.

The short answer is this: impressive, but clearly not perfect. Let's dig in.

Test 1: Writing a WordPress plugin

This test was actually my first test of ChatGPT's programming prowess, way back in the day. My wife needed a plugin for WordPress that would help her run an involvement device for her online group.

Additionally: How to use ChatGPT to write code: What it does well and what it doesn't

Her needs were fairly simple. It needed to take in a list of names, one name per line. It then had to sort the names, and if there were duplicate names, separate them so they weren't listed side-by-side.
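To make the requirement concrete, here is a minimal sketch of one way the "separate the duplicates" step could work. It is written in Python for brevity rather than as WordPress PHP, it is not the plugin's actual code, and the function name `sort_and_separate` is hypothetical. It also assumes one particular reading of the requirement: names come out roughly alphabetically, but avoiding adjacent duplicates takes priority over strict sort order.

```python
import heapq
from collections import Counter
from typing import List, Optional, Tuple


def sort_and_separate(raw: str) -> List[str]:
    """Order names so that identical names are not adjacent where possible."""
    # One name per line; ignore blank lines and surrounding whitespace.
    names = [line.strip() for line in raw.splitlines() if line.strip()]
    counts = Counter(names)

    # Min-heap of (negative remaining count, name): the most frequent name
    # comes out first, and ties are broken alphabetically.
    heap: List[Tuple[int, str]] = [(-n, name) for name, n in counts.items()]
    heapq.heapify(heap)

    result: List[str] = []
    held: Optional[Tuple[int, str]] = None  # name just used, kept out for one turn
    while heap:
        neg, name = heapq.heappop(heap)
        result.append(name)
        if held is not None:
            heapq.heappush(heap, held)
        held = (neg + 1, name) if neg + 1 < 0 else None

    # Anything still held cannot be separated (for example, a list of one
    # repeated name), so append the leftovers as they are.
    if held is not None:
        result.extend([held[1]] * (-held[0]))
    return result
```

Calling `sort_and_separate("Bob\nAnn\nBob")` places Ann between the two Bobs. A version that keeps strict alphabetical order and only nudges duplicates apart would be an equally valid reading of the prompt.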

I didn't really have time to code it for her, so I decided to give the AI the challenge on a whim. To my huge surprise, it worked.

Since then, it's been my first test for AIs when evaluating their programming skills. It requires the AI to know how to set up code for the WordPress framework and follow prompts clearly enough to create both the user interface and program logic.

Only about half of the AIs I've tested can fully pass this test. Now, however, we can add one more to the winner's circle.

DeepSeek V3 created both the user interface and program logic exactly as specified. As for DeepSeek R1, well, that's an interesting case. The "reasoning" aspect of R1 caused the AI to spit out 4502 words of analysis before sharing the code.

The UI looked different, with much wider input areas. However, both the UI and logic worked, so R1 also passes this test.

So far, DeepSeek V3 and R1 have both passed one of four tests.

Test 2: Rewriting a string function

A user complained that he was unable to enter dollars and cents into a donation entry field. As written, my code only allowed dollars. So, the test involves giving the AI the routine that I wrote and asking it to rewrite it to allow for both dollars and cents.

Also: My favorite ChatGPT feature just got way more powerful

Usually, this results in the AI generating some regular expression validation code. DeepSeek did generate code that works, although there's room for improvement. The code that DeepSeek V3 wrote was unnecessarily long and repetitious, while the reasoning before generating the code in R1 was also very long.

My biggest concern is that both models' validation code checks for up to two decimal places, but if a number with many decimal places is entered (like 0.30000000000000004), the use of parseFloat applies no explicit rounding. The R1 model also used JavaScript's Number conversion without checking for edge case inputs. If bad data comes back from an earlier part of the regular expression or a non-string makes it into that conversion, the code would crash.

It's odd, because R1 did present a very nice list of tests to validate against.

So here, we have a split decision. I'm giving the point to DeepSeek V3 because neither of the issues in its code would cause the program to break when run by a user, and it would generate the expected results. On the other hand, I have to give a fail to R1 because if something that's not a string somehow gets into the Number function, a crash will ensue.

And that gives DeepSeek V3 two wins out of four, but DeepSeek R1 only one win out of four so far.

Test 3: Finding an annoying bug

This is a test created when I had a very annoying bug that I had difficulty tracking down. Once again, I decided to see if ChatGPT could handle it, which it did.

The challenge is that the answer isn't obvious. Actually, the challenge is that there is an obvious answer, based on the error message. But the obvious answer is the wrong answer. This not only caught me, but it regularly catches some of the AIs.

Additionally: Are ChatGPT Plus or Pro worth it? Here's how they compare to the free version

Solving this bug requires understanding how specific API calls within WordPress work, being able to see beyond the error message to the code itself, and then knowing where to find the bug.

Both DeepSeek V3 and R1 passed this one with nearly identical answers, bringing us to three out of four wins for V3 and two out of four wins for R1. That already puts DeepSeek ahead of Gemini, Copilot, Claude, and Meta.

Will DeepSeek score a home run for V3? Let's find out.

Test 4: Writing a script

And another one bites the dust. This is a challenging test because it requires the AI to understand the interplay between three environments: AppleScript, the Chrome object model, and a Mac scripting tool called Keyboard Maestro.

I would have called this an unfair test because Keyboard Maestro is not a mainstream programming tool. But ChatGPT handled the test easily, understanding exactly what part of the problem is handled by each tool.

Also: How ChatGPT scanned 170k lines of code in seconds, saving me hours of work

Unfortunately, neither DeepSeek V3 nor R1 had this level of knowledge. Neither model knew that it needed to split the task between instructions to Keyboard Maestro and Chrome. It also had fairly weak knowledge of AppleScript, writing custom routines for functions that are native to the language.

Weirdly, the R1 model failed as well because it made a bunch of incorrect assumptions. It assumed that a front window always exists, which is definitely not the case. It also assumed that the frontmost running program would always be Chrome, rather than explicitly checking to see if Chrome was running.

This leaves DeepSeek V3 with three correct tests and one fail, and DeepSeek R1 with two correct tests and two fails.

Final thoughts

I found that DeepSeek's insistence on using a public cloud email address like gmail.com (rather than my normal email address with my corporate domain) was annoying. It also had a number of responsiveness fails that made doing these tests take longer than I would have liked.

I wasn't sure I'd be able to write this article because, for most of the day, I got this error when trying to sign up:

DeepSeek's online services have recently faced large-scale malicious attacks. To ensure continued service, registration is temporarily limited to +86 phone numbers. Existing users can log in as usual. Thanks for your understanding and support.

Then, I got in and was able to run the tests.

DeepSeek seems to be overly loquacious in terms of the code it generates. The AppleScript code in Test 4 was both wrong and excessively long. The regular expression code in Test 2 was correct in V3, but it could have been written in a way that made it much more maintainable. It failed in R1.

Additionally: If ChatGPT produces AI-generated code for your app, who does it really belong to?

I'm definitely impressed that DeepSeek V3 beat out Gemini, Copilot, and Meta. But it appears to be at the old GPT-3.5 level, which means there's definitely room for improvement. I was disappointed with the results for the R1 model. Given the choice, I'd still choose ChatGPT as my programming code helper.

That said, for a brand-new tool running on much lower infrastructure than the other tools, this could be an AI to watch.

What do you think? Have you tried DeepSeek? Are you using any AIs for programming support? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
