6 Open-Source LLMs That Can Run on Smartphones 


Large language models (LLMs) demand substantial computational resources, which have traditionally confined them to powerful servers. However, a new generation of compact models is making it possible to run these models directly on your smartphone, and you won’t even need an internet connection to use them.

Here are six open-source LLMs that can be optimised to run on your smartphone.

  1. Gemma 2B: Google’s compact, high-performance LLM for mobile language tasks.
  2. Phi-2: Microsoft’s tiny model outperforming giants up to 25 times its size.
  3. Falcon-RW-1B: Efficient 1B-parameter model for resource-constrained mobile devices.
  4. StableLM-3B: Stability AI’s balanced model for diverse language tasks on phones.
  5. TinyLlama: Compact Llama variant delivering impressive results on cell phones.
  6. LLaMA-2-7B: Meta’s powerful 7B model for advanced tasks on high-end smartphones.

1. Gemma 2B

Google’s Gemma 2B is a compact language model that delivers impressive performance despite its small size. It utilises a multi-query attention mechanism, which helps reduce memory bandwidth requirements during inference.

This is particularly advantageous for on-device scenarios where memory bandwidth is often limited. With just 2 billion parameters, Gemma 2B achieves strong results on academic benchmarks for language understanding, reasoning, and safety.

It outperformed similarly sized open models on 11 out of 18 text-based tasks.
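
To see why multi-query attention helps on-device, here is a back-of-the-envelope comparison of the KV cache with one shared key/value head versus one per query head. The shape below approximates Gemma 2B’s published configuration (18 layers, 8 query heads, head dimension 256); the sequence length is an arbitrary example.

```python
# Rough KV-cache sizes for multi-head vs multi-query attention.
# The factor of 2 covers keys and values; fp16 assumed (2 bytes per value).
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_val=2):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val

seq_len = 2048
mha = kv_cache_bytes(layers=18, kv_heads=8, head_dim=256, seq_len=seq_len)  # one KV head per query head
mqa = kv_cache_bytes(layers=18, kv_heads=1, head_dim=256, seq_len=seq_len)  # single shared KV head
print(f"MHA: {mha / 2**20:.0f} MiB vs MQA: {mqa / 2**20:.0f} MiB")  # ~288 vs ~36 MiB
```

An 8x smaller cache means far less data moving through memory on every decoding step, which is exactly the bandwidth saving described above.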

2. Phi-2

With 2.7 billion parameters, Phi-2 has been shown to outperform models up to 25 times larger on certain benchmarks. It excels in tasks involving common sense reasoning, language understanding, and logical reasoning.

Phi-2 can be quantised to lower bit-widths like 4-bit or 3-bit precision, which shrinks the model to around 1.17-1.48 GB and lets it run efficiently on mobile devices with limited memory and computational resources.
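
As a sanity check on those figures, a quantised model’s size is roughly parameters × bits per weight; the sketch below ignores format overhead (quantisation scales, tensors kept at higher precision), which is why real files land slightly above the raw estimate.

```python
# Rough size of a quantised checkpoint: params * bits -> bytes -> GB.
def quantised_size_gb(n_params: float, bits: float) -> float:
    return n_params * bits / 8 / 1e9

for bits in (3, 4):
    print(f"{bits}-bit Phi-2 (2.7B params): ~{quantised_size_gb(2.7e9, bits):.2f} GB")
# ~1.01 GB at 3-bit and ~1.35 GB at 4-bit, before format overhead
```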

One of the key strengths of Phi-2 is its ability to perform common sense reasoning. The model has been trained on a large corpus of web data, allowing it to understand and reason about everyday concepts and relationships.

3. Falcon-RW-1B

Falcon-RW-1B is part of the Falcon family of language models, known for their efficiency and performance. The ‘RW’ stands for RefinedWeb, a training dataset curated for quality over quantity.

Falcon-RW-1B’s architecture is adapted from GPT-3 but incorporates techniques like ALiBi (Attention with Linear Biases) and FlashAttention to enhance computational efficiency. These optimisations make Falcon-RW-1B well-suited for on-device inference on resource-constrained devices like smartphones.
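
For intuition, ALiBi drops learned positional embeddings entirely and instead adds a fixed, head-specific linear penalty to attention scores that grows with the query-key distance. Below is a minimal PyTorch sketch of those biases; the head count of 8 is purely illustrative, not Falcon-RW-1B’s actual configuration.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes form a geometric sequence: 2^-1, 2^-2, ... for 8 heads.
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = j - i: zero on the diagonal, negative for past tokens.
    distance = pos[None, :] - pos[:, None]
    # (heads, seq, seq) bias added to raw attention scores before softmax;
    # future positions (positive distance) are removed by the causal mask anyway.
    return slopes[:, None, None] * distance[None]

print(alibi_bias(n_heads=8, seq_len=4)[0])
```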

The Falcon-RW-1B-Chat variant builds on Falcon-RW-1B-Instruct-OpenOrca to add conversational capabilities, aiming to improve user engagement, expand use cases, and keep the model accessible in resource-constrained environments like smartphones.

4. StableLM-3B

StableLM-3B, developed by Stability AI, is a 3-billion-parameter model that strikes a balance between performance and efficiency. Notably, despite being trained on fewer tokens, it has outperformed some 7-billion-parameter models on certain benchmarks.

StableLM-3B can be quantised to lower bit-widths like 4-bit precision, shrinking the model to roughly 1.5-1.8 GB (depending on the format) so it runs efficiently on smartphones. One user reported that StableLM-3B outperformed Stability AI’s own 7B StableLM-Base-Alpha-v2.
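
Below is a minimal sketch of 4-bit loading with Hugging Face transformers and bitsandbytes. This kind of quantisation typically happens on a desktop GPU before the weights are converted for mobile, and the checkpoint name follows Stability AI’s Hugging Face naming but should be treated as an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the weights in 4-bit precision, computing in fp16.
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tok = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t",
    quantization_config=quant,
    device_map="auto",
)

ids = tok("The capital of France is", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**ids, max_new_tokens=16)[0]))
```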

5. TinyLlama

TinyLlama, a 1.1-billion-parameter model, leverages optimisations like FlashAttention and rotary positional embeddings (RoPE) to enhance computational efficiency while maintaining strong performance. It is compatible with the Llama architecture and can be integrated into existing Llama-based mobile apps with minimal changes.
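
For reference, here is a minimal sketch of RoPE. Rather than adding position vectors to the input, it rotates consecutive pairs of query/key dimensions by position-dependent angles, so relative position emerges naturally from the attention dot product. The pairing convention below follows the original RoPE paper; implementations differ in detail.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, dim) with an even dim, e.g. one attention head's queries.
    seq_len, dim = x.shape
    pos = torch.arange(seq_len, dtype=torch.float32)[:, None]              # (seq, 1)
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)  # (dim/2,)
    angles = pos * freqs                                                   # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]     # consecutive pairs (x0,x1), (x2,x3), ...
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin  # 2D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

print(rope(torch.randn(6, 8)).shape)  # torch.Size([6, 8])
```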

TinyLlama can be quantised to lower bit-widths like 4-bit or 5-bit precision, significantly reducing the model size to around 550-637 MB. One user reported that, on a mid-range phone like the Asus ROG, TinyLlama generated 6-7 tokens per second.

Ladies and gentlemen we have tinyllama running locally on mobile ( my dad's cause I am having a broke af low powered phone ) using termux. It aint even hard to do. Here a few pics of me playing with it and me documenting the steps on a notepad pic.twitter.com/lIsYiBiHh9

— Govind-S-B (@violetto96) December 23, 2023

6. LLaMA-2-7B

The LLaMA-2-7B model has been quantised to 4-bit weights and 16-bit activations, making it suitable for on-device deployment on smartphones. This quantisation reduces the model size to 3.6GB, making it feasible to load and run on mobile devices with sufficient RAM.

Running the LLaMA-2-7B model on mobile requires a device with at least 6GB of RAM. During inference, peak memory usage ranges from 316MB to 4785MB on the Samsung Galaxy S23 Ultra. This suggests that while the model can run on devices with 6GB+ RAM, having more RAM allows for better performance and reduces the risk of out-of-memory errors.

While it requires devices with sufficient RAM and may not match the speed of cloud-based models, it offers an attractive option for developers looking to create intelligent language-based features that run directly on smartphones.
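
A minimal sketch of running a 4-bit LLaMA-2-7B build on-device with llama-cpp-python (which installs under Termux) might look like the following; the GGUF file name and thread count are assumptions for illustration.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b-chat.Q4_0.gguf",  # hypothetical local 4-bit file, ~3.6 GB
    n_ctx=1024,   # a smaller context window keeps peak RAM down on a phone
    n_threads=4,  # match the phone's performance cores
)

out = llm("Q: Why does quantisation shrink a model? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```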

Running llama-2-7b on Replit on my phone thanks to expandable storage. It’s kinda decent on CPU with 8 toks/s. https://t.co/r9yqvWayL5 pic.twitter.com/XeijnENM9E

— Amjad Masad (@amasad) July 21, 2023
