Anthropic has announced the release of an upgraded Claude 3.5 Sonnet model and the new Claude 3.5 Haiku, along with a public beta for an experimental feature called “computer use.”
Unlike traditional AI models that rely on specific APIs or tools for task completion, Claude 3.5 Sonnet is now able to navigate computer interfaces in a manner similar to human users. This means the AI can view a screen, move a cursor, click buttons, and type text, allowing it to perform tasks like filling out forms, navigating websites, and interacting with a wide range of software programs designed for human users.
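The loop described above, in which the model looks at the screen, picks an action, and then sees the result, can be sketched in a few lines of Python. This is a toy illustration only: `FakeDesktop`, `apply_action`, and the action dictionaries are hypothetical names made up for this example, and the article does not specify the schema Anthropic's API uses. A real harness would capture actual screenshots and drive a real mouse and keyboard.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen; records what was done to it."""
    cursor: tuple = (0, 0)
    typed: list = field(default_factory=list)
    clicks: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return image data; this returns a text summary.
        return f"cursor at {self.cursor}, typed so far: {''.join(self.typed)!r}"


def apply_action(desktop: FakeDesktop, action: dict) -> str:
    """Carry out one requested action and return what the model would observe."""
    kind = action["action"]
    if kind == "screenshot":
        return desktop.screenshot()
    if kind == "mouse_move":
        desktop.cursor = tuple(action["coordinate"])
        return "moved"
    if kind == "left_click":
        desktop.clicks.append(desktop.cursor)
        return "clicked"
    if kind == "type":
        desktop.typed.append(action["text"])
        return "typed"
    raise ValueError(f"unsupported action: {kind}")


# Assumed sequence a model might request to fill in one form field.
requested = [
    {"action": "screenshot"},
    {"action": "mouse_move", "coordinate": [320, 240]},
    {"action": "left_click"},
    {"action": "type", "text": "hello@example.com"},
]

desktop = FakeDesktop()
observations = [apply_action(desktop, a) for a in requested]
```

In this sketch the model never touches the machine directly: it only requests actions, and the surrounding code decides whether and how to carry them out.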
“Entering a new era with ‘computer use’. It’s like FSD for your computer!” said Groq’s Sunny Madra.
Early adopters like Asana, Canva, Replit, and The Browser Company have already begun exploring its potential. Replit, for instance, is leveraging the feature to build a tool that evaluates apps during their development process.
However, the feature is not without its limitations. Tasks that are simple for humans—such as scrolling, dragging, or zooming—can be cumbersome for Claude at this stage. Despite these challenges, Anthropic believes that computer use has the potential to open up new possibilities for automation and AI-driven software development.
The upgraded Claude 3.5 Sonnet, along with the computer use beta, is available now through Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI.
Claude 3.5 Haiku will follow on the same platforms later this month. Future updates to Haiku are expected to add further enhancements, such as image input.
The computer use feature resembles the LLM OS envisioned by OpenAI co-founder Andrej Karpathy. The LLM OS proposes using a large language model as the “kernel”, or central processing unit, of a new type of operating system. It is envisioned as a broader, more modular architecture for agentic behavior, going beyond just a chat interface.
Claude 3.5 Sonnet
The upgraded Claude 3.5 Sonnet model offers substantial improvements in key areas, most notably coding. Anthropic reports that the model has made significant progress in agentic coding tasks, which involve AI autonomously generating and manipulating code.
On the widely recognised SWE-bench Verified benchmark, Claude 3.5 Sonnet’s performance increased from 33.4% to 49.0%, outperforming several major AI systems, including OpenAI’s o1-preview and other coding-focused models.
Claude 3.5 Sonnet has also improved in tasks requiring tool use. For example, the model achieved higher scores on the TAU-bench tool-use benchmark, improving its performance in the retail domain from 62.6% to 69.2%, and in the airline domain from 36.0% to 46.0%.
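To put the reported scores side by side, the short Python sketch below computes each gain in percentage points and as a relative improvement over the earlier score. The figures are the ones quoted above; the `gains` helper and the choice to round to one decimal place are assumptions made for this illustration.

```python
# Before/after scores (in percent) as reported in the article.
results = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gains(before: float, after: float) -> tuple:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


summary = {name: gains(*scores) for name, scores in results.items()}

for name, (points, relative) in summary.items():
    print(f"{name}: +{points} points ({relative}% relative)")
```

The largest jump is on SWE-bench Verified, where a gain of 15.6 points is a relative improvement of roughly 47% over the previous score.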
These advancements make Claude 3.5 Sonnet a strong contender for developers needing AI support in complex, multi-step tasks such as software development, autonomous AI evaluations, and problem-solving.
Claude 3.5 Haiku
Anthropic also introduced Claude 3.5 Haiku, a next-generation model that combines speed with affordability. Designed for real-time applications, Claude 3.5 Haiku improves on the previous Haiku across multiple benchmarks and even surpasses Claude 3 Opus, the largest model in Anthropic’s previous generation, in several areas, including coding tasks.
Claude 3.5 Haiku is built for tasks requiring low latency and accurate tool use, making it well-suited for user-facing applications, specialised sub-agent tasks, and handling large datasets like inventory records or purchase histories. This makes the model ideal for industries that rely on real-time data-driven decisions.
What are Microsoft and OpenAI up to?
Microsoft recently announced autonomous agents in Copilot Studio, with a public preview set for next month. These agents will automate tasks across sales, finance, and supply chain to streamline operations.
Microsoft has introduced ten new autonomous agents in Dynamics 365. These agents are built to help organisations drive business value by automating processes like lead generation, customer service, and supplier communication.
Meanwhile, OpenAI has introduced a new approach for creating and deploying multi-agent AI systems, called the Swarm framework. It simplifies the process of creating and managing multiple AI agents that can work together seamlessly to accomplish complex tasks.
The post Anthropic’s Claude 3.5 Now Controls Your Computer Like You Do appeared first on AIM.