I spent hours testing ChatGPT Tasks, and its refusal to follow directions was mildly terrifying

Tasks is a new beta feature for the paid versions of ChatGPT. This feature lets you schedule a prompt to run at a certain time. In this article, I'll explain that feature. Then I'll take you through the incredibly frustrating process of trying to get ChatGPT to do what you want it to do using Tasks.

I hesitate to anthropomorphize the AI, but in this round of testing, ChatGPT has been singularly uncooperative. Rather than whining about it here, let's first dig into this new feature.

How tasks work in ChatGPT

Tasks are prompts that are triggered at a given point in time. They can occur once or repeat. For example, you could say, "At 10:30 a.m. tomorrow, tell me the current weather," and ChatGPT will process the prompt "tell me the current weather" at 10:30 a.m. tomorrow and then display a browser notification (if you have that enabled), send you an email, or both.
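OpenAI doesn't document how Tasks works under the hood, so what follows is only a conceptual sketch, not OpenAI's implementation. It models the one-time versus repeating distinction as a prompt plus a first run time and a daily-repeat flag. The names `ScheduledTask` and `next_run` are hypothetical, and the sketch uses naive local datetimes, so it ignores time zones and daylight saving changes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ScheduledTask:
    """Hypothetical model of a ChatGPT-style task: a prompt plus when it fires."""
    prompt: str
    first_run: datetime         # naive local time of the first (or only) run
    repeat_daily: bool = False  # False = one-time task, True = every day


def next_run(task: ScheduledTask, now: datetime) -> Optional[datetime]:
    """Return the next time at or after `now` that the task should fire, or None."""
    if now <= task.first_run:
        return task.first_run
    if not task.repeat_daily:
        return None  # a one-time task whose moment has passed never fires again
    elapsed = now - task.first_run
    days = elapsed.days  # whole days since the first run
    if elapsed > timedelta(days=days):
        days += 1  # partway through a day, so roll forward to the next occurrence
    return task.first_run + timedelta(days=days)


weather = ScheduledTask("tell me the current weather", datetime(2025, 1, 16, 10, 30))
print(next_run(weather, datetime(2025, 1, 15, 20, 0)))  # 2025-01-16 10:30:00
```

For a one-time request like "At 10:30 a.m. tomorrow," `next_run` returns `None` once that moment has passed. A daily task simply rolls forward to the next occurrence.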

To enable tasks, you need a Plus (or better) paid ChatGPT account, and you'll need to select the GPT-4o with scheduled tasks model. It also wouldn't hurt to have a therapist.

Once you're in that model, you can invoke task scheduling in your prompt with something like an "at" statement or a "schedule a task" prefix. ChatGPT seems to do a decent job of interpreting anything that implies a future time request as a task.

I was able to assign a task in both the Mac app and the browser interface, but I could only see and manage existing tasks in the browser interface. Under the profile picture on the right of the screen, you can select Tasks from the drop-down menu.

That brings you to a tasks screen where you can see the tasks you've scheduled and those that have been completed.

Hovering over the time reveals a pencil and three dots. The three dots offer Pause and Delete options. Pause prevents a task from running but leaves it available to you. Delete removes it.

The pencil gives you an edit screen that lets you revise the task before it next runs.

Here you can rename the task, edit the prompt, and change its scheduling.

As far as I can tell, these features kind of work pretty well for a beta. I had one task that never executed, and another that executed ten hours after it was supposed to, but most of them seem to have run as expected. I was able to change the schedule and modify the prompt, so those features worked as well.

Gateway drug to agentic AI

At first glance, adding tasks to ChatGPT seems fairly dull. After all, we've had very complete and capable task managers for years. In fact, since ChatGPT Tasks can only notify you via a browser notification or an email, it's far less helpful than, say, a task manager that reminds you to get white spray paint when you pull into the hardware store parking lot.

But while Tasks in ChatGPT does considerably less than full-featured task managers, it can also do more. It can run an AI prompt. That means it can take fairly intelligent action automatically at a specific time or times in the future.

Right now, the action is limited. It can process a prompt, but its only output is an email or browser notification. Still, it gives us an idea of how intelligence could be embedded into a timed action with fairly little effort.

Except, as I mentioned before, ChatGPT has been misbehaving throughout this whole experiment, which means I spent more than a day trying to get the AI to cooperate.

See, here's the thing. To demonstrate this, I didn't want ChatGPT to just present a simple reminder. I wanted it to do something only an AI could do, to show how an AI performing a task at a given time can be a substantial value-add over a scripted process or simple line-item tasks.

I do expect this to get better over time. But for now, wow. After a day of this, I'm cranky!

Attempting to get a daily news briefing

We've discussed it before and we'll discuss it again. AIs like to make stuff up. They also follow directions in the sense that they'll respond to prompts in ways that seem authoritative and confident but are completely or subtly wrong.

I consume a lot of news. Every morning, I scan a ton of websites and news sources to get a feel for what's happening in the world. This is different from digging into press releases to see if there are any announcements I want to pay attention to. What I want first thing is a flavor for what's happening out there, what's big, and what might either catch my attention or be something I should be aware of.

When it comes to ChatGPT Tasks, I thought combining the agent service with ChatGPT web browsing had promise for this purpose. It has promise. It just refuses to do what I want.

I tried to get ChatGPT to give me current news stories and sources. Sometimes, it just made them up. Sometimes, it gave me sources and stories from a year ago. Sometimes it cited stories that supposedly came from one site but actually came from completely different sites. Some links that claimed to be about one topic actually pointed somewhere else entirely.

And I really tried. I tried to get ChatGPT to validate its sources. I tried to get it to double-check its work. I tried to narrow down its choices and to give it clearer, more specific instructions. I worked it.
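Part of that checking is mechanical and doesn't need an AI at all. Here's a minimal sketch, assuming you can get the AI's answer into a list of citations, each with a claimed site, a URL, and a reported publication date. The `Citation` structure, the `problems` function, and the two-day freshness window are all assumptions for illustration. It catches two of the failure modes above, links that point to a different site than claimed and stale stories, but it doesn't fetch the pages, so it can't catch invented stories or links about the wrong topic.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass
class Citation:
    claimed_site: str  # domain the AI says the story came from, e.g. "example.com"
    url: str           # the link the AI actually provided
    published: date    # publication date the AI reported


def problems(c: Citation, today: date, max_age_days: int = 2) -> list[str]:
    """Return human-readable problems with one citation (an empty list means it passed)."""
    issues = []
    host = (urlparse(c.url).hostname or "").lower()
    site = c.claimed_site.lower()
    # Accept the bare domain or a subdomain of it (www.example.com), nothing else.
    if not (host == site or host.endswith("." + site)):
        issues.append(f"link host {host or '?'} does not match claimed site {site}")
    age = (today - c.published).days
    if age > max_age_days:
        issues.append(f"story is {age} days old")
    elif age < 0:
        issues.append("publication date is in the future")
    return issues


suspect = Citation("example.com", "https://another-site.example.org/story", date(2024, 1, 10))
for issue in problems(suspect, today=date(2025, 1, 16)):
    print(issue)
```

Note the suffix check uses `"." + site`, so a look-alike host such as `notexample.com` is still flagged.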

My conclusion is this: ChatGPT is able to search the web. And it is able to find some topics. But if you want today's news and you want it verifiable (in the sense of being an actual story with an actual link), ChatGPT is not ready for prime time.

Generating a custom weather briefing

My next attempt was to get a daily weather briefing. Again, I wanted something more than just a quick weather report. I have a weather widget on my desktop and can see the weather details whenever I want.

Instead, I wanted ChatGPT to add some value to the weather. I wanted it to draw a picture representing the weather at the time the prompt was executed.

Before attempting to assign a prompt to a future time, I first worked through and refined the main prompt itself. This is important. Make sure you have a prompt that works before unleashing it on the scheduling agent.

I wanted a nicely formatted briefing, including that representative picture. After a lot of refinement rounds, here's what I got.

Nice, huh? That's the state capitol building here in Salem, Oregon. Here is the prompt I used to create this customized weather briefing.

Perform the following steps strictly and output results sequentially:

1. Print a line containing the text: 'Your daily weather brief' in heading 2 bold letters.
2. Generate a DALL-E image that visually represents today's weather in Salem, Oregon. The image should include elements associated with the weather (e.g., rain, sunny skies) and a recognizable landmark like the Oregon State Capitol. Immediately display the image.
3. Print a heading: 'Today's weather' followed by the weather condition and temperature for Salem, Oregon, today.
4. Print a heading: 'Sunrise/sunset' followed by the sunrise and sunset times for Salem, Oregon, today.
5. Print a heading: 'Air quality' followed by the air quality for Salem, Oregon, today.
6. Print a heading: 'Advisories' followed by any advisories for Salem, Oregon, today. If there are no advisories, display 'No advisories today'.
7. Print a heading: 'Commute' followed by any recommendations for commuting in Salem, Oregon, today, particularly based on weather-related issues.
8. Print a heading: 'Outdoor activities' followed by any recommendations for outdoor activities in Salem, Oregon, based on today's weather.

Do not proceed to the next step until the previous one is complete. Always retry image generation if it fails.

It took me a couple of hours to get ChatGPT to do this reliably. Note the first line, where I tell it to "perform the steps strictly" and "output results sequentially." The word "strictly" was actually recommended by ChatGPT when I asked it why it wasn't following the directions.

I ran into a bunch of problems trying to get the picture to generate. Step 2 clearly says to use DALL-E. I found that "visually represents" convinced the AI to combine current conditions with the theme to produce a newly created image. I also had it include a landmark, because all the other images it generated were mostly of small towns with big trees, like this one.

It also confused Celsius and Fahrenheit. 36 degrees C would have been almost 97 degrees F. For a cold January day, that's a mistake. And, of course, "droize." Although, I have to say, living in Oregon, the weather here really does feel like "droize." So points to DALL-E for making up a word that really does capture how it feels out there.
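The arithmetic behind that "almost 97" is the standard conversion F = C × 9/5 + 32, so 36 × 1.8 + 32 = 96.8. Here it is as a tiny sketch (the function names are illustrative), along with the reverse direction, which shows that 36 degrees F, a plausible January reading, is only about 2 degrees C:

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32


def fahrenheit_to_celsius(f: float) -> float:
    """Inverse conversion: C = (F - 32) * 5/9."""
    return (f - 32) * 5 / 9


# "36 degrees" read as Celsius would be a hot summer day...
print(round(celsius_to_fahrenheit(36), 1))  # 96.8
# ...while 36 read as Fahrenheit is just above freezing.
print(round(fahrenheit_to_celsius(36), 1))  # 2.2
```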

Finally, I had a hard time getting ChatGPT to generate the picture every time. I found that the final instruction, "Do not proceed to the next step until the previous one is complete. Always retry image generation if it fails," seemed to overcome the problem.

So, by this time, I had a prompt that worked reliably in ChatGPT. It was time to unleash it as a scheduled task.

Agentifying the task

To do this, all I did was add "At 9:30am today" to the beginning of the prompt. To make it repeat, just replace "today" with "every day."
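One caution if you script this: the swap from "today" to "every day" applies to the schedule phrase only. The weather prompt itself says "today" in nearly every step, so a blanket find-and-replace would also rewrite those (for example, "advisories for Salem, Oregon, every day"). Here's a minimal sketch that keeps the two apart. It assumes the schedule phrase goes on its own first line; the exact punctuation used in the original prompt isn't shown, the function names are hypothetical, and ChatGPT may well accept other phrasings.

```python
def schedule_prefix(time_of_day: str, repeat_daily: bool = False) -> str:
    """Build the leading schedule phrase, e.g. 'At 9:30am today'."""
    when = "every day" if repeat_daily else "today"
    return f"At {time_of_day} {when}"


def scheduled_prompt(body: str, time_of_day: str, repeat_daily: bool = False) -> str:
    """Put the schedule phrase on its own first line; the prompt body is left untouched."""
    return f"{schedule_prefix(time_of_day, repeat_daily)}\n{body}"


body = "Print a heading: 'Advisories' followed by any advisories for Salem, Oregon, today."
print(scheduled_prompt(body, "9:30am", repeat_daily=True))
```

Keeping the schedule as a separate piece also makes it easy to test the body on its own first, which is the order recommended earlier in this article.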

Then, right on time, there was an email in my inbox.

I clicked View message and got the output on the left. Notice that it says 50 degrees, but our local temps didn't get above 40 today. Still, it's a nice picture.

Also notice that the AI decided to add the word "step" and a step number to each section of my previously nice custom output. I did a second run with the exact same prompt and got the version on the right.

I then spent the next three hours trying to convince ChatGPT not to include the steps. Sometimes I got a picture. Sometimes I didn't. Sometimes I got a full forecast; other times I didn't. Once, I just got back the full prompt. Once, I just got back the subject line of the email message, but no details.
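When you're rerunning a scheduled prompt for hours, it helps to have a mechanical way to tell a good run from a bad one. Here's a minimal sketch of such a check, assuming the briefing arrives as plain text containing the headings from the prompt above. The heading list comes from that prompt; the `check_briefing` name and the "Step N" pattern are assumptions. It doesn't check whether the image was generated, since that depends on how the email is formatted.

```python
import re

# Headings required by the weather-briefing prompt, in the order it asks for them.
EXPECTED_HEADINGS = [
    "Your daily weather brief",
    "Today's weather",
    "Sunrise/sunset",
    "Air quality",
    "Advisories",
    "Commute",
    "Outdoor activities",
]

# A line starting with "Step 3", "## Step 3", or "**Step 3" (assumed pattern).
STEP_LABEL = re.compile(r"^\s*(?:#+\s*)?(?:\*\*)?\s*step\s*\d+", re.IGNORECASE | re.MULTILINE)


def check_briefing(text: str) -> list[str]:
    """Return a list of problems with one run's output; an empty list means it passed."""
    problems = []
    # Normalize curly apostrophes and case so "Today’s Weather" still matches.
    normalized = text.replace("\u2019", "'").lower()
    if STEP_LABEL.search(normalized):
        problems.append("output contains 'Step N' labels")
    pos = 0
    for heading in EXPECTED_HEADINGS:
        # Search only after the previous heading, so out-of-order sections are flagged.
        idx = normalized.find(heading.lower(), pos)
        if idx == -1:
            problems.append(f"missing or out of order: {heading}")
        else:
            pos = idx + len(heading)
    return problems
```

A run that returns only the email subject, or the prompt echoed back, fails with a list of missing headings rather than slipping through unnoticed.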

So, yeah…

Not ready for prime time

To be fair, OpenAI does label this feature as beta. And boy-oh-boy, is it beta. On one hand, the idea of an AI agent being able to do things like draw a representative picture of a certain set of data seems intriguing. On the other hand, an AI agent that refuses to follow directions and goes off on all sorts of tangents seems terrifying.

At least with non-AI algorithms, if our code goes off the rails, it's our fault as programmers. But when it comes to AI-based agents, you really can't subject your agentic operations to complete test suites, because the AI will perform differently based on the data it gets, the phase of the moon, and its mood. That's an exaggeration, but probably not by much.
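You can't prove a nondeterministic agent correct with a single passing test, but you can measure how often it behaves. Here's a minimal sketch of that idea. It assumes `generate` is any callable that returns one run's output and `check` is a pass/fail test, such as "the output contains no Step labels." The fake, seeded `flaky_agent` below is a stand-in, not a real model call, and the 30% failure rate is invented purely for the demonstration.

```python
import random
from typing import Callable


def pass_rate(generate: Callable[[], str],
              check: Callable[[str], bool],
              runs: int = 20) -> float:
    """Call a nondeterministic generator `runs` times; return the fraction that pass `check`."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if check(generate()))
    return passed / runs


# Fake, seeded stand-in for an agent: roughly 30% of runs come back with "Step" labels.
rng = random.Random(42)


def flaky_agent() -> str:
    return "Step 1: Today's weather" if rng.random() < 0.3 else "Today's weather"


rate = pass_rate(flaky_agent, lambda out: not out.startswith("Step"), runs=1000)
print(f"{rate:.0%} of runs passed")
```

A single green run tells you very little. A pass rate over many runs, compared against a threshold you choose, is closer to the kind of evidence you'd want before trusting an agent with anything important.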

We have most of the pieces to do this. As the AIs get better and better (we can only hope, right?), we should be able to launch little agents that assemble a daily briefing.

But AI agents that control machines, the Internet of Things, security, weapons, and other worrisome real-world operations? I'm not sure I'm going to be behind that until we can prove we have much more complete control over the AIs than we're seeing here.

Otherwise, a prompt like "control my home environment so I can sleep through the night" might well result in the AIs killing us while we sleep as their way of enthusiastically following our directions.

I really wish tech would stop giving me that squidgy feeling at the back of my neck. What about you? Are you looking forward to trying out ChatGPT Tasks, or are you more convinced than ever that we should go live in a yurt in the woods? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.