Hugging Face has introduced SafeCoder, an enterprise-focused code assistant that aims to improve software development efficiency through a secure, self-hosted pair-programming solution. Hugging Face positions SafeCoder as a complete, security-first commercial offering in which code never leaves the customer’s virtual private cloud (VPC) during training or inference. It is designed for on-premises deployment, and customers own the resulting code large language model (LLM), much like a personalised GitHub Copilot.
Hugging Face has also partnered with VMware to offer SafeCoder on the VMware Cloud platform. VMware is using SafeCoder internally and is sharing a blueprint for deploying it quickly on VMware infrastructure.
But Why Is SafeCoder Needed?
Code assistants like GitHub Copilot, built on OpenAI Codex, boost productivity. Enterprises can go further by customising LLMs on their own code: Google, for example, reached a 25-34% completion rate after training on internal code. However, building an in-house assistant on a closed-source LLM poses security risks both during training, when sensitive code is exposed, and during inference, when code may leak.
Hugging Face’s SafeCoder addresses this by letting customers build their own proprietary LLMs on open models, fine-tuned on internal code that is never shared externally. It also supports secure, on-premises deployment so that code stays private.
From StarCoder to SafeCoder
In May 2023, Hugging Face and ServiceNow collaborated on the BigCode project, releasing StarCoder, an open-source language model tailored for code. StarCoder was fine-tuned from StarCoderBase on 35B Python tokens. It performed strongly on benchmarks such as HumanEval, outperforming PaLM, LaMDA, and LLaMA, and it matched or surpassed closed models such as OpenAI’s code-cushman-001, which formerly powered GitHub Copilot. The model has 15.5B parameters and an 8192-token context window, and it was trained on more than 1T tokens of GitHub data covering 80+ programming languages, commits, issues, and notebooks. StarCoder powers SafeCoder, which is optimized for self-hosted enterprise use with efficient inference, adaptability, and ethical data sourcing.
Training Method
The SafeCoder model supports over 80 programming languages. Customers work with Hugging Face to train it on their own code so that its suggestions are adapted to their users. Proprietary data stays secure throughout, and the customer ends up with a personalized code generation model that promotes self-sufficiency, vendor independence, and control over its AI capabilities.
SafeCoder’s inference runs on a range of hardware, including NVIDIA Ampere GPUs, AMD Instinct GPUs, Habana Gaudi2, AWS Inferentia 2, and Intel Xeon Sapphire Rapids CPUs, giving customers a broad choice of deployment targets.
The post After StarCoder, Hugging Face Launches Enterprise Code Assistant SafeCoder appeared first on Analytics India Magazine.