I put GitHub Copilot's AI to the test, and its mixed success at coding baffled me

The thing I find most baffling about the programming tests I've been running is that tools based on the same large language model tend to perform quite differently.

For example, ChatGPT, Perplexity, and GitHub Copilot are all based on the GPT-4 model from OpenAI. But, as I'll show you below, while ChatGPT and Perplexity's pro plans performed excellently, GitHub Copilot failed as often as it succeeded.

I tested GitHub Copilot embedded inside a VS Code instance. I'll explain how to set that up and use GitHub Copilot in an upcoming step-by-step article. But first, let's run through the tests.

If you want to know how I test and the prompts for each individual test, feel free to read how I test an AI chatbot's coding ability.

TL;DR: GitHub Copilot passed two and failed two.

Test 1: Writing a WordPress Plugin

So, this failed miserably. This was my first test, so I can't tell yet whether GitHub Copilot is terrible at writing code or whether the context in which one interacts with it is limiting to the point where it can't meet this requirement.

Let me explain.

This test involves asking the AI to create a fully functional WordPress plugin, complete with admin interface elements and operational logic. The plugin takes in a set of names, sorts them, and, if there are duplicates, separates the duplicates so they're not side by side.
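To make the core requirement concrete, here is a minimal sketch of the "separate the duplicates" logic. It is written in Python purely for illustration (the real plugin is PHP), and it is not the code from my plugin or from any of the AIs. The function name sort_and_separate is hypothetical, and the greedy approach (always place the most frequent remaining name, breaking ties alphabetically) is just one way to do it.

```python
import heapq
from collections import Counter


def sort_and_separate(names):
    """Arrange names so that equal entries are not adjacent where possible.

    Illustrative sketch only: a greedy pass that always places the name
    with the most remaining copies, breaking ties alphabetically.
    """
    counts = Counter(names)
    # heapq is a min-heap, so counts are negated to pop the largest first.
    heap = [(-count, name) for name, count in counts.items()]
    heapq.heapify(heap)

    result = []
    held = None  # the name just placed, kept out of the heap for one round
    while heap:
        count, name = heapq.heappop(heap)
        result.append(name)
        if held is not None:
            heapq.heappush(heap, held)
        # One copy was used; hold the rest back so it cannot repeat at once.
        held = (count + 1, name) if count + 1 < 0 else None

    # If one name outnumbers all the others combined, full separation is
    # impossible, so the leftover copies are appended at the end.
    if held is not None:
        result.extend([held[1]] * -held[0])
    return result


if __name__ == "__main__":
    print(sort_and_separate(["Bob", "Alice", "Bob", "Carol"]))
```

When one name makes up more than half the list, no arrangement can keep every duplicate apart, so the sketch falls back to placing the extras at the end rather than failing.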

This was a real-world application that my wife needed as part of an involvement system she runs on her very active Facebook group as part of her digital goods e-commerce business.

Most of the other AIs passed this test, at least in part. Five of the 10 AI models tested passed the test completely. Three of them passed part of the test. Two (including Microsoft Copilot) failed completely.

The thing is, I gave GitHub Copilot the same prompt I give all of them, but it only wrote PHP code. To be clear, this problem can be solved solely using PHP code. But some AIs like to include some JavaScript for the interactive features. GitHub Copilot included code for using JavaScript but never actually generated the JavaScript that it tried to use.

What's worse, when I created a JavaScript file and, from within the JavaScript file, tried to get GitHub Copilot to run the prompt, it gave me another PHP script, which also referenced a JavaScript file.

Inside the randomizer.js file, it tried to enqueue (basically, to bring in to run) the randomizer.js file, and the code it wrote was PHP, not JavaScript.

Test 2: Rewriting a string function

This test is fairly simple. I wrote a function that was supposed to test for dollars and cents but wound up only testing for integers (dollars). The test asks the AI to fix the code.

GitHub Copilot did rework the code, but there were a bunch of problems with the code it produced.

It assumed a string value was always a string value. If it was empty, the code would break.
The revised regular expression code would break if a trailing decimal point (e.g., "3.") was entered, if a leading decimal point (e.g., ".3") was entered, or if leading zeros were included (e.g., "00.30").

For something that was supposed to test whether currency was entered correctly, failing with code that would crash on edge cases isn't acceptable.

So, we have another fail.
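For illustration, here is a minimal Python sketch of a validator that survives the edge cases listed above. This is not my original function and not what GitHub Copilot produced; neither is reproduced in this article. The name is_valid_amount and the acceptance policy (at most two decimal places, with "3.", ".3", and "00.30" all treated as valid) are assumptions made for the example.

```python
import re

# Assumed policy (not from the article): digits with an optional decimal
# point and up to two decimal places, or a leading decimal point followed
# by one or two digits. So "3.", ".3", and "00.30" are all accepted.
_AMOUNT = re.compile(r"(?:\d+(?:\.\d{0,2})?|\.\d{1,2})")


def is_valid_amount(value):
    """Return True if value is a string holding a dollars-and-cents amount."""
    if not isinstance(value, str):
        # None, numbers, and other types are rejected rather than crashing.
        return False
    # An empty or whitespace-only string matches nothing and returns False.
    return _AMOUNT.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for sample in ["12", "12.50", "3.", ".3", "00.30", "", ".", "12.345", None]:
        print(repr(sample), is_valid_amount(sample))
```

The point of the sketch is the shape of the fix: check the type first, then let a single anchored pattern decide, so that no input can raise an error.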

Test 3: Finding an annoying bug

GitHub Copilot got this right. This is another test pulled from my real-life coding escapades. What made this bug so annoying (and difficult to figure out) is that the error message isn't directly related to the actual problem.

The bug is kind of the coder equivalent of a trick question. Solving it requires understanding how specific API calls in the WordPress framework work and then applying that knowledge to the bug in question.

Microsoft Copilot, Gemini, and Meta Code Llama all failed this test. But GitHub Copilot solved it correctly.

Test 4: Writing a script

Here, too, GitHub Copilot succeeded where Microsoft Copilot failed. The challenge here is that I'm testing the AI's ability to create a script that knows about coding in AppleScript, the Chrome object model, and a little Mac-only third-party coding utility called Keyboard Maestro.

To pass this test, the AI has to be able to recognize that all three coding environments need attention and then tailor individual lines of code to each of those environments.

Final thoughts

Given that GitHub Copilot uses GPT-4, I find the fact that it failed half of the tests discouraging. GitHub is just about the most popular source management environment on the planet, and one would hope that the AI coding help was reasonably reliable.

As with all things AI, I'm sure performance will get better. Let's stay tuned and check back in a few months to see if the AI is more effective at that time.

Do you use an AI to help with coding? What AI do you prefer? Have you tried GitHub Copilot? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
