Anthropic’s ‘Computer Use’ Finally Gets a Voice

Recently, Hume AI, maker of a voice-to-voice ‘conversational’ assistant, demonstrated a new capability powered by Claude’s Computer Use. Anthropic’s coveted, yet controversial, feature that can autonomously control your computer will now recognise your voice commands as it does so.

Hume AI made this possible by coupling Computer Use with its Empathic Voice Interface (EVI) 2, deployed on a Replit template. The integration also builds on EVI’s moat, which Hume AI says is “the only voice-to-voice model that’s both interoperable with any LLM and available today as an API”.

Hume AI converts voice commands into text that Claude can recognise, which is then translated into actions for Computer Use to perform. Hume AI took to X to demonstrate the capability, with a user verbally instructing Hume AI and Computer Use to play chess in a web browser.
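Neither company has published the template’s internals in detail, but the second half of that pipeline maps onto Anthropic’s public computer-use beta. Below is a minimal sketch of how a transcribed command could be handed to Claude; the transcript string is a placeholder standing in for EVI’s speech-to-text output, and the loop only prints the model’s proposed action rather than executing it.

```python
import anthropic

# A voice command transcribed by Hume AI's EVI (placeholder text; in the
# Replit template this would arrive from EVI's speech-to-text output).
transcript = "Open the browser and start a game of chess."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Call Claude with the computer-use beta tool enabled. Per Anthropic's
# public beta, the model replies with tool_use blocks (screenshot,
# left_click, type, ...) that a host loop is expected to execute and
# feed back as tool results.
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": transcript}],
    betas=["computer-use-2024-10-22"],
)

# Print the proposed actions instead of executing them.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

A real deployment would wrap this call in an agent loop that executes each proposed action, captures a screenshot, and returns it as a tool result until the task completes.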

Control + Alt + Take Over

Anthropic’s leap past reasoning models and multimodal capabilities to an autonomous system-control agent was met with plenty of praise, amazement, and concern. That said, Anthropic says the feature is still in the beta stage and far from being deployed in real-world scenarios. Even so, it didn’t take long for established products to test Computer Use’s capabilities.

“At this stage, it is still experimental – at times cumbersome and error-prone. We’re releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time,” Anthropic said in an announcement last month.

Hume AI joins the growing list of products, including Replit, Asana, Canva and Zapier, that are using Claude’s API to test the potential of Computer Use.

“By navigating new UIs with a thoughtful testing plan, it opens the door to creating a powerful autonomous verifier that can evaluate apps while they’re being built,” Replit president Michele Catasta said.

Moreover, Dario Amodei, CEO of Anthropic, suggested they are only at the beginning of what they can achieve with Computer Use. He believes the techniques Anthropic uses to level up its game in code and text generation will also work for computer use: “I expect those same techniques will scale here as they have everywhere else.”

He further said that implementing Computer Use did not require much “additional training” for their foundational model.

Meanwhile, a plethora of developers have been knocking themselves out with Computer Use since day one. AIM had earlier covered some of the ways users were experimenting with Computer Use, but with each passing day, even more interesting use cases continue to emerge.

Recently, AI agent-maker Agency conducted a hackathon where over 250 hackers built use cases on top of AgentOps AI, an agentic tool capable of automating computer control, in just 24 hours. The use cases included a project that could read through Slack messages, identify tasks, and submit them to work management platform Asana, as well as a browser automation agent that automatically creates Salesforce leads.
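The hackathon teams haven’t published their code, but the Slack-to-Asana idea reduces to a small extraction loop. The sketch below is a speculative reconstruction, not the project’s actual implementation: it pulls recent messages with the slack_sdk library, asks Claude to list action items, and hands them to a hypothetical submit_to_asana() stub standing in for a call to Asana’s API (the channel ID is likewise a placeholder).

```python
import os

import anthropic
from slack_sdk import WebClient

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
claude = anthropic.Anthropic()


def extract_tasks(channel_id: str) -> list[str]:
    """Read recent channel messages and ask Claude to list action items."""
    history = slack.conversations_history(channel=channel_id, limit=50)
    text = "\n".join(m.get("text", "") for m in history["messages"])

    response = claude.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": "List each actionable task in these Slack messages, "
                       "one per line:\n\n" + text,
        }],
    )
    lines = response.content[0].text.splitlines()
    return [line.strip("- ") for line in lines if line.strip()]


def submit_to_asana(task: str) -> None:
    """Hypothetical stand-in for a call to Asana's task-creation API."""
    print(f"Would create Asana task: {task}")


for task in extract_tasks("C0123456789"):  # placeholder channel ID
    submit_to_asana(task)
```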

A New UX Revolution?

For a long time, there have been discussions about potential interfaces for interacting with AI that go beyond the chat interface. While third-party agents built on top of foundational models help automate workflows, they are mostly suited to enterprise environments.

With Computer Use, Anthropic may change the way an average Joe uses AI – through a stock, vanilla application. Unlike such agents, features like Computer Use will suit a broader range of applications, given their ability to navigate a graphical user interface.

“Over the next decade, it’s likely that new AI interfaces will emerge beyond the common chat UI that many products are using today. What are some of the most interesting ones you’ve built or use today?”

— Aaron Epstein (@aaron_epstein) November 27, 2024

In an interview with AIM, Paras Chopra, founder of Turing’s Dream and Wingify, said, “The chat interface has been a relic of human-to-human communication, and we’ve adapted it for AI, but it’s highly underexplored in terms of what different ways we should interact with artificial intelligence.”

“Agents are one thing, but the user experience of interaction with an LLM also needs a lot of thought,” he added.

If Hume AI scales the feature and releases an app for Android devices, the sun would certainly set on any remaining prospects for the Humane AI Pin and Rabbit R1.

Speaking of smartphones, it is worth noting that Apple has already put a futuristic form of human-computer interaction into production. While eye tracking forms the core of the Vision Pro’s user interface, Apple doesn’t advertise much about its availability on the iPhone.

The feature takes only a few minutes to set up, after which users can control iOS with their eyes, selecting options by dwelling their gaze on a certain area. It wouldn’t be surprising if Apple built on this feature in the future and explored further ways to improve user interactions.

Lonely at the Top, but Not For Long

Sure, Anthropic has the first-mover advantage, but that doesn’t leave it free from competition. For one, Microsoft has released its Copilot Vision as an experimental feature on Copilot Labs.

In an announcement, Microsoft said Copilot Vision will address Copilot’s limitation of communicating through language alone: the feature will essentially be able to see everything the user is viewing on a device and then provide relevant context or understanding.

However, Microsoft clarified that “Copilot Vision is also not engaging directly with the web and that it’s there to answer questions rather than take actions”.

“It’ll be an advocate for you in many of life’s most important moments. It’ll accompany you to that doctor’s appointment, take notes and follow up at the right time. It’ll share the load of planning and preparing for your child’s birthday party. And it’ll be there at the end of the day to help you think through a tricky life decision,” said Mustafa Suleyman, CEO of Microsoft AI.

That said, a Microsoft employee told AIM a few weeks ago that autonomous capabilities are also part of the company’s plans.

While Google gave a preview of its Jarvis AI agent, OpenAI is set to launch its agent, Operator, in January. A look at OpenAI, however, suggests it is predominantly focused on reasoning and multimodal models, which may make it difficult to catch up with Anthropic’s Computer Use. Moreover, Anthropic draws a greater proportion of its revenue from enterprises than OpenAI does.

A recent report from Menlo Ventures, which surveyed 600 enterprise leaders on AI adoption, mentioned that OpenAI’s share of the enterprise AI market fell by 16 percentage points, whereas Anthropic’s share doubled. That said, only 10% of the respondents had implemented a workflow automation tool in their application layer.

“The primary beneficiary has been Anthropic, which doubled its enterprise presence from 12% to 24% as some enterprises switched from GPT-4 to Claude 3.5 Sonnet when the new model became state-of-the-art,” read the report. However, it should be noted that Menlo Ventures has led multiple fundraising rounds for Anthropic.

Amodei further said that Anthropic will continue to capitalise on partnerships in the enterprise sector. “Our view has been ‘let 1,000 flowers bloom’. We don’t internally have the resources to try all these different things. Let our customers try it and we will see who succeeds and maybe different customers will succeed in different ways,” he added.
