Alibaba Cloud Launches its New Qwen2.5 Series

On 11 November, Alibaba Cloud released Qwen2.5-Coder-32B-Instruct, the latest version of its open foundation Qwen model. The company describes it as a ‘family of coder models’, not just a single model that can code.

This model is a significant upgrade from its predecessor, CodeQwen1.5. Ahsen Khaliq, ML growth lead at Hugging Face, took to LinkedIn to share that he had built a tic-tac-toe game with the model, noting that the experience was similar to Claude’s Artifacts.

Alibaba, in its official blog post, said the series will help promote the development of open code LLMs. Qwen2.5-Coder, the code-specific model series based on Qwen2.5, was recently uploaded to Hugging Face.

The Qwen2.5-Coder family spans six model sizes: 0.5B, 1.5B, 3B, 7B, 14B, and 32B parameters. While all sizes share the same core architecture, including a common attention head size, they differ in other key aspects such as depth and hidden dimensions.
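Since the checkpoints are published on Hugging Face, any of the six sizes can be loaded with the transformers library. Below is a minimal sketch; the repository name follows Qwen’s published naming on the Hub, and the prompt is our own illustrative example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any of the six sizes can be swapped in, e.g. "Qwen/Qwen2.5-Coder-7B-Instruct".
model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a completion and decode only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```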

Binyuan Hui, core maintainer at Alibaba Qwen, took to X to share an interesting game he created with Qwen2.5-Coder.

I created something interesting with Qwen2.5-Coder-32B…
I didn’t write a single line of code; it did everything on its own… pic.twitter.com/hBRG6ltLQF

— Binyuan Hui (@huybery) November 11, 2024

Training and Data

According to the official research published earlier this year, high-quality, large-scale, and diverse data is necessary for building pre-trained models. With this in mind, the Qwen team at Alibaba Group developed a dataset called Qwen2.5-Coder-Data, which includes five primary data types: Source Code Data, Text-Code Grounding Data, Synthetic Data, Math Data, and Text Data.

Following the file-level pretraining, the company proceeded to repo-level pretraining to improve the model’s long-context capabilities. In this phase, the context length was expanded from 8,192 tokens to 32,768 tokens, and the base frequency of RoPE was raised from 10,000 to 1,000,000.
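To see why raising the RoPE base supports longer contexts, consider the rotation frequencies it defines. The sketch below (a minimal illustration; the head dimension of 128 is our assumption, not a figure from this article) compares the wavelength of the slowest-rotating dimension pair at both base values.

```python
import numpy as np

def rope_inv_freq(base: float, head_dim: int = 128) -> np.ndarray:
    """Per-dimension-pair rotation frequencies used by RoPE."""
    return 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))

# Wavelength (in tokens) of the slowest-rotating pair: positions farther
# apart than this begin to alias, which limits usable context length.
for base in (10_000, 1_000_000):
    slowest_wavelength = 2 * np.pi / rope_inv_freq(base)[-1]
    print(f"base={base:>9,}: slowest wavelength ≈ {slowest_wavelength:,.0f} tokens")
```

With base 10,000 the slowest pair completes a cycle after roughly 54,000 tokens, while base 1,000,000 pushes that past five million, comfortably beyond the new 32,768-token window.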

Results

The Qwen2.5-Coder series has set a new standard in open-source coding models, particularly with its flagship, Qwen2.5-Coder-32B-Instruct. This model excels in code generation, matching the capabilities of GPT-4o on benchmarks like EvalPlus, LiveCodeBench, and BigCodeBench.

Source: Qwen official blog

Beyond generating code, Qwen2.5-Coder-32B-Instruct is adept at code repair, helping developers identify and fix errors efficiently. On the Aider benchmark, it achieved a score of 73.7, comparable to GPT-4o’s performance.

The model’s strength lies in its understanding of code execution, enabling accurate predictions of inputs and outputs. It supports over 40 programming languages, scoring 65.9 on the McEval benchmark, and leads in code repair tasks, scoring 75.2 on the MdEval benchmark.

Additionally, to see how well Qwen2.5-Coder-32B-Instruct matches human coding preferences, the team ran an evaluation called Code Arena, comparing its outputs head to head with GPT-4o’s and measuring which model’s response was preferred in each example. The results show that Qwen2.5-Coder-32B-Instruct is strongly aligned with human preferences.

Source: Qwen official blog

Qwen2.5-Coder with Cursor

Even though code assistants are now widely utilised, most still depend on closed-source models. Alibaba and the Qwen team aim for Qwen2.5-Coder to offer developers a robust, open-source alternative. Below is an example of Qwen2.5-Coder in action within Cursor.
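As a sketch of what that can look like in practice (our assumption about the setup, not a workflow documented in the Qwen post), the model can be served behind an OpenAI-compatible endpoint, for example with vLLM, and then queried from any client or editor that accepts a custom base URL:

```python
from openai import OpenAI

# Assumes the model is served locally behind an OpenAI-compatible endpoint,
# e.g. started with:  vllm serve Qwen/Qwen2.5-Coder-32B-Instruct
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    messages=[
        {"role": "user",
         "content": "Refactor a nested for-loop that flattens a list of lists "
                    "into a single Python list comprehension."}
    ],
)
print(response.choices[0].message.content)
```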
