Hugging Face is one of the more prominent platforms for hosting AI models. Whether an AI model is developed by ByteDance, Google, or a startup, one is likely to find it listed there.
In 2024, Hugging Face acquired XetHub, a Seattle-based company that offered a platform for building and deploying GenAI applications. The aim was to use XetHub's technology to replace Git Large File Storage (LFS) with a better storage backend for the Hub's repositories.
Fast-forward to 2025, and Hugging Face has begun migrating its first model and dataset repositories from LFS to Xet storage.
The Limitations of Git LFS Storage for AI Repositories
Git LFS is an open source Git extension for versioning large files. It replaces files such as audio, videos, datasets, and graphics with small text pointers inside Git, while storing the file contents separately on a remote server.
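To make the pointer idea concrete, here is a minimal Python sketch that builds and parses the text of a Git LFS pointer (a version line, a SHA-256 object ID, and a size in bytes). The helper names `make_lfs_pointer` and `parse_lfs_pointer` are hypothetical and used only for illustration; in practice the Git LFS client generates these pointers itself.

```python
import hashlib


def make_lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS pointer for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a pointer into a dictionary."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    return {
        "version": fields["version"],
        "oid": fields["oid"],
        "size": int(fields["size"]),
    }


# Stand-in for a large binary file such as model weights.
weights = b"\x00" * 1_000_000
pointer = make_lfs_pointer(weights)

print(pointer)
print(len(pointer.encode()), "bytes of pointer for", len(weights), "bytes of content")
```

Only the small pointer lives in the Git history; the full content is addressed by its hash and fetched from remote storage when needed.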
At the time of writing this report, Hugging Face still uses the technology together with Amazon S3, a cloud storage service, for remote storage. By September 20, 2024, the total amount of data hosted by Hugging Face had reached an impressive 29 petabytes.

However, the company clarified that the repositories on the Hugging Face Hub differ from those on software development platforms. It acknowledges that while LFS was designed for large files, the files used in AI are significantly larger. As a result, the company always planned to transition to its own optimised storage and versioning backend at some point.
“LFS deduplicates at the file level. Even tiny edits create a new revision to upload in full; painful for the multi-gigabyte files found in many Hub repositories,” Hugging Face explained in a blog post.
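The cost of file-level deduplication can be illustrated with a toy content-addressed store keyed by the SHA-256 of the whole file. This is a simplified sketch, not Hugging Face's or Git LFS's implementation; the class `FileLevelStore` is hypothetical.

```python
import hashlib


class FileLevelStore:
    """Toy store that deduplicates only on whole-file hashes (hypothetical)."""

    def __init__(self):
        self.blobs = {}
        self.bytes_uploaded = 0

    def upload(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key not in self.blobs:
            # Unknown hash: the entire file has to be transferred.
            self.blobs[key] = content
            self.bytes_uploaded += len(content)
        return key


store = FileLevelStore()
v1 = bytes(1_000_000)                        # 1 MB of zeros
v2 = v1[:500_000] + b"\x01" + v1[500_001:]   # same file with one byte changed

store.upload(v1)
store.upload(v1)  # identical file: deduplicated, nothing uploaded
store.upload(v2)  # one byte differs: the whole file is uploaded again

print(store.bytes_uploaded)  # 2000000
```

An exact duplicate costs nothing, but a single-byte edit doubles the stored and transferred data, which is the behaviour the quote describes.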
Introducing Xet Storage and its Use Cases for Hugging Face
To overcome the limitations of Git LFS highlighted above, Hugging Face started to implement Xet storage.
“When a file backed by Xet storage is updated, only the modified data is uploaded to remote storage, significantly saving on network transfers. For many workflows, like incremental updates to model checkpoints or appending/inserting new data into a dataset, this improves iteration speed for yourself and your collaborators,” Hugging Face pointed out.
Xet storage uses content-defined chunking (CDC) to deduplicate at the level of bytes rather than whole files. A rolling hash algorithm is used to decide where each chunk ends, so chunk boundaries depend on the content itself rather than on fixed offsets. When a small piece of metadata in a GGUF model is changed, only the altered chunks are transmitted. Xet storage also provides backwards compatibility with Git LFS.
With these technical merits, Hugging Face recognised future use cases where users wouldn’t need to re-upload a 10 GB data file after adding a single row. Instead, they could simply re-upload the few chunks that have changed, including the new row.
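The chunking idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Xet's actual algorithm: the gear-style rolling hash, the lookup table, and the roughly 1 KB average chunk size are all choices made for this example, and production systems typically also enforce minimum and maximum chunk sizes.

```python
import hashlib
import random

MASK64 = (1 << 64) - 1
_table_rng = random.Random(42)
# Fixed pseudo-random table mapping each byte value to a 64-bit number.
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes, avg_bits: int = 10) -> list:
    """Cut data wherever the top `avg_bits` bits of a rolling hash are zero."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        # Each left shift ages out older bytes, so h depends only on the
        # most recent 64 bytes, not on the absolute position in the file.
        h = ((h << 1) + GEAR[b]) & MASK64
        if h >> (64 - avg_bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def digests(chunks) -> set:
    """Identify each chunk by its SHA-256 digest."""
    return {hashlib.sha256(c).digest() for c in chunks}


data_rng = random.Random(0)
v1 = bytes(data_rng.getrandbits(8) for _ in range(200_000))
# Insert a new "row" in the middle, shifting every byte after it.
v2 = v1[:100_000] + b"one new row of data\n" + v1[100_000:]

known = digests(chunk(v1))
new_chunks = [c for c in chunk(v2) if hashlib.sha256(c).digest() not in known]
new_bytes = sum(len(c) for c in new_chunks)

print(f"{len(new_chunks)} new chunk(s), {new_bytes} of {len(v2)} bytes to upload")
```

Because boundaries follow the content, the chunks after the insertion line up again with the old ones even though every byte has shifted, so only the chunks around the edit are new. Fixed-size blocks would not have this property: an insertion would shift and change every block after it.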
The company shared an example in which a Xet-backed version of the gemma-2-9b-it-GGUF repository totalled 97 GB instead of the original 191 GB, a saving of roughly 94 GB. That points to nearly 50% savings in storage, which should also make the repository quicker for everyone to download.
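The savings figure follows directly from the two sizes quoted above, as this short check shows (the variable names are illustrative):

```python
original_gb = 191  # size of the repository as reported for Git LFS
xet_gb = 97        # size of the Xet-backed version

saved_gb = original_gb - xet_gb
saved_pct = 100 * saved_gb / original_gb

print(f"saved {saved_gb} GB ({saved_pct:.1f}%)")  # saved 94 GB (49.2%)
```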
Hugging Face’s Migration Success
On March 18, Hugging Face shared a proof of concept for the first step of its repository migration.
The company stated that the migration shifted roughly 6% of the Hub’s download traffic to its Xet infrastructure. In the process, Hugging Face transferred all the target repositories, totalling 4.5 TB, into Xet storage.

While the team faced challenges such as unexpected load imbalance and download overhead (as shown in the image above) on its storage system, the initial migration was successful, and Xet is now on the Hugging Face Hub.
Users of the Hugging Face platform can experience its benefits through less waiting on uploads and downloads and faster iteration on big files.
The company encourages upgrading to hf_xet to get the benefits, though legacy clients will remain compatible through the LFS Bridge.
The post Hugging Face is Replacing Git LFS With Xet Storage: Here’s Why appeared first on Analytics India Magazine.