Microsoft Research has released BitNet b1.58 2B4T, a new 2-billion-parameter language model that uses just 1.58 bits per weight instead of the usual 16 or 32. Despite its compact size, it matches the performance of full-precision models and runs efficiently on both GPUs and CPUs.
The model was trained on a large dataset of 4 trillion tokens and performs well across a wide range of tasks, including language understanding, math, coding, and conversation. Microsoft has released the model weights on Hugging Face, along with open-source code for running it.
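As a quick illustration of how the released weights might be used, here is a minimal sketch with the Hugging Face transformers library. The repository ID below is an assumption based on the public release, and the official model card may require a specific transformers version or Microsoft's dedicated bitnet.cpp runtime to realise the claimed CPU efficiency.

```python
# Hedged sketch: loading the released BitNet weights via transformers.
# The repo ID is assumed; check the official model card before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("What is 1.58-bit quantisation?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```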
In the technical report, Microsoft said that “BitNet b1.58 2B4T achieves performance on par with leading open-weight, full-precision LLMs of similar size, while offering significant advantages in computational efficiency, including substantially reduced memory footprint, energy consumption, and decoding latency.”
The model’s architecture is “derived from the standard Transformer model… incorporating significant modifications based on the BitNet framework”. The central innovation is “replacing the standard full-precision linear layers with custom BitLinear layers”, where “model weights are quantised to 1.58 bits during the forward pass”. This quantisation uses an “absolute mean (absmean) quantisation scheme, which maps weights to ternary values {-1, 0, +1}.”
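The report does not include reference code for the quantiser, but the absmean scheme it describes can be sketched in a few lines of PyTorch (function names here are illustrative, not taken from the BitNet codebase):

```python
# Minimal sketch of absmean ternary weight quantisation, as used conceptually
# in BitLinear layers (illustrative only; the real kernels are more involved).
import torch

def absmean_quantise_weights(w: torch.Tensor, eps: float = 1e-5):
    """Map full-precision weights to ternary values {-1, 0, +1}.

    The scale is the mean absolute value of the weight matrix; each weight is
    divided by that scale, rounded, and clipped to the ternary range.
    """
    scale = w.abs().mean().clamp(min=eps)          # absmean scaling factor
    w_ternary = (w / scale).round().clamp(-1, 1)   # values in {-1, 0, +1}
    return w_ternary, scale                        # keep scale for dequantisation

# Example: quantise a random linear-layer weight matrix
w = torch.randn(4, 8)
w_q, s = absmean_quantise_weights(w)
print(w_q.unique())   # typically tensor([-1., 0., 1.])
```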
Activations are quantised to 8-bit integers with an “absolute maximum (absmax) quantisation strategy, applied per token”. SubLN normalisation is included to further improve training stability. The feed-forward network (FFN) sub-layers employ squared ReLU (ReLU²) activation.
Rotary Position Embeddings (RoPE) inject positional information. Consistent with architectures like LLaMA, all bias terms are removed from the linear layers and normalisation layers. The model uses the tokeniser developed for LLaMA 3, which implements a byte-level Byte-Pair Encoding (BPE) scheme with a vocabulary size of 128,256 tokens.
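As with the weights, per-token absmax activation quantisation and the ReLU² activation are simple to sketch (a simplified illustration under the assumption of a symmetric 8-bit range, not the model's actual kernels):

```python
# Minimal sketch: per-token absmax quantisation of activations to 8-bit
# integers, plus the squared-ReLU (ReLU²) activation used in the FFN layers.
import torch

def absmax_quantise_activations(x: torch.Tensor, eps: float = 1e-5):
    """Quantise activations per token, scaling by the max absolute value."""
    scale = 127.0 / x.abs().amax(dim=-1, keepdim=True).clamp(min=eps)
    x_q = (x * scale).round().clamp(-127, 127)   # symmetric 8-bit range (assumed)
    return x_q, scale

def squared_relu(x: torch.Tensor) -> torch.Tensor:
    """ReLU² activation: relu(x) squared."""
    return torch.relu(x).pow(2)

x = torch.randn(2, 4, 16)                    # (batch, tokens, hidden)
x_q, s = absmax_quantise_activations(x)
print(x_q.min().item(), x_q.max().item())    # stays within [-127, 127]
```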
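For context, RoPE encodes position by rotating pairs of channels in the query and key vectors by a position-dependent angle. A minimal, generic sketch (not drawn from the BitNet codebase) looks like this:

```python
# Minimal, generic sketch of Rotary Position Embeddings (RoPE); the model's
# actual implementation may differ in layout and detail.
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply RoPE to x of shape (seq_len, dim), with dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies and per-position angles
    inv_freq = 1.0 / (base ** (torch.arange(0, half, dtype=torch.float32) / half))
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) channel pair by its angle
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(5, 8)       # 5 positions, 8-dim attention head
q_rot = rope(q)
print(q_rot.shape)          # torch.Size([5, 8])
```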
The training process for BitNet b1.58 2B4T consists of three phases: pre-training, supervised fine-tuning (SFT), and direct preference optimisation (DPO).
BitNet b1.58 2B4T demonstrates that it is possible to dramatically reduce the computational requirements of large language models without giving up performance. With its compact architecture and competitive results, it represents a significant step forward in making AI models more efficient and accessible.