Google has introduced ‘localllm’, which lets developers build next-gen AI apps on local CPUs. ‘localllm’ is a set of tools and libraries that provides easy access to quantised models from Hugging Face through a command-line utility.
This eliminates the need for GPUs, offering a seamless and efficient path for application development. ‘localllm’ is built around quantised models optimised for local devices with limited computational resources. These models, hosted on Hugging Face and tailored for compatibility with the quantisation method, run smoothly on Cloud Workstations without depending on GPUs.
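For a sense of what running such a quantised model on a plain CPU involves, here is a minimal sketch. It assumes the llama-cpp-python library and a GGUF-format checkpoint; the repository id and filename are illustrative placeholders, not something ‘localllm’ itself prescribes.

```python
# Illustrative sketch: running a 4-bit quantised model from Hugging Face on a CPU.
# Assumes the llama-cpp-python package and a GGUF-format model; the repo id and
# filename below are placeholders chosen for illustration.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantised weights (a few GB instead of tens of GB at full precision).
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",   # hypothetical model choice
    filename="llama-2-7b-chat.Q4_K_M.gguf",    # 4-bit quantised variant
)

# Load the model entirely on the CPU -- no GPU required.
llm = Llama(model_path=model_path, n_ctx=2048, n_threads=8)

response = llm("Q: What does quantisation do to a model? A:", max_tokens=64)
print(response["choices"][0]["text"])
```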
Quantised models offer improved performance by employing lower-precision data types, reducing memory footprint and enabling faster inference. Combining quantised models with Cloud Workstations enhances flexibility, scalability, and cost-effectiveness.
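The memory saving is easy to see with back-of-the-envelope arithmetic; the sketch below uses a 7-billion-parameter model purely as an example figure.

```python
# Back-of-the-envelope illustration of why lower-precision weights shrink the
# memory footprint. The 7B parameter count is just an example figure.
params = 7_000_000_000

bytes_fp16 = params * 2      # 16-bit floats: 2 bytes per weight
bytes_int4 = params * 0.5    # 4-bit quantised weights: 0.5 bytes per weight

print(f"FP16 weights : ~{bytes_fp16 / 2**30:.1f} GiB")   # ~13.0 GiB
print(f"4-bit weights: ~{bytes_int4 / 2**30:.1f} GiB")   # ~3.3 GiB
```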
The approach aims to overcome the limitations of relying on remote servers or cloud-based GPU instances, addressing concerns related to latency, security, and dependency on third-party services.
Key features and benefits include GPU-free LLM execution, enhanced productivity, cost efficiency through reduced infrastructure costs, improved data security by running LLMs locally, and seamless integration with various Google Cloud services. To get started with localllm, visit the GitHub repository at https://github.com/googlecloudplatform/localllm, and see the sketch below for what calling a locally served model can look like.
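Once a model is being served locally, an application can talk to it over a local port. The snippet below is only a sketch: the port number and the OpenAI-style /v1/completions path are assumptions about the local server, so check the repository’s README for the exact commands and endpoints.

```python
# Minimal sketch of calling a model served on a local port.
# The port and the /v1/completions path are assumptions, not confirmed
# details of the localllm tool.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",   # assumed local endpoint
    json={
        "prompt": "Summarise what quantised models are in one sentence.",
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])
```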
Google recently partnered with Hugging Face to enable companies to build their own AI with the latest open models from Hugging Face and the latest cloud and hardware features from Google Cloud.