What the Lacklustre Performance of AI Wearables Says About AI Hype

In recent months, we’ve seen the introduction of many new-age AI wearable devices. These devices were meant to propel humanity into a future where AI wearables become the norm, potentially eclipsing the era of smartphones.

When Rabbit Inc. and Humane launched the Rabbit R1 and the Ai Pin, the devices awed the internet. As usual, social media discussions revolved around AI wearables killing smartphones.

However, reality struck pretty soon this time. Marques Brownlee, probably the most popular tech reviewer in the world, called the Rabbit R1 “barely reviewable”.

Moreover, some developers decided to take a closer look and found that the Rabbit R1 is just an application that can run on any smartphone.

According to Android Authority, the Rabbit R1 in fact runs Android under the hood, and its team even managed to get the R1’s software running on a Pixel 6a.

Reviewers were not kind to Humane’s Ai Pin either. Founded by former Apple employees, the company positioned its gadget as something that would help humanity take the first step into a post-smartphone world.

But it was merely seen as a smartphone without a screen: everything the Ai Pin could do could already be done on the smartphone itself. To make matters worse, recent reports suggest the founders are seeking potential buyers for the company.

Hype vs Reality

If you thought these gadgets would replace smartphones anytime soon, you fell for the AI hype. When ChatGPT arrived in November 2022, it was expected to change search engines forever, and many labelled it the Google Killer.

However, nearly a year and a half later, Google search remains the most dominant and widely used search engine in the world. While OpenAI did rock Google’s boat to some extent, the latter brought the same technology that powers ChatGPT to Search.

Microsoft, which has invested over $10 billion in OpenAI, was also quick to bring LLMs to its own search engine, Bing. However, did it manage to gain significant market share from Google in the search space? No.

Similarly, it’s unlikely that AI wearables will replace smartphones anytime soon. Replacing smartphones would require a cultural shift. Just as Google has become a household name, for many of us it’s hard to imagine a day without a smartphone.

Modern humans have become accustomed to smartphones, relying on them for a myriad of things in their day-to-day lives.

Besides regular functions like calling, using a camera, sending emails, texting, and e-shopping, smartphones let executives conduct meetings on the go and even allow developers to write code.

Above all, smartphones today offer convenience. You can order medicines, watch a new episode of your favourite series on Netflix, and email your boss to say you’re sick.

Additionally, smartphones provide an integrated ecosystem, a polished interface, security, and versatility. Unless AI wearable devices can significantly outperform smartphones on these fronts and offer additional benefits, convincing users to make the transition will be challenging for these companies.

Smartphones will Make AI Consumable

The reality is that AI will come to smartphones first, and phones will make AI accessible to consumers. On average, a billion smartphones are sold every year.

Android phone makers such as Google, Samsung, and Vivo are already shipping AI features with their high-end smartphones.

While LLMs mostly run in the cloud, phone makers are building smaller versions which can run locally on the phone. For example, Google’s Gemini Nano runs locally on its flagship smartphone, the Pixel 8.

By next year, we can expect these brands to introduce more AI features, even on their mid-range smartphones.

Recently, at the Worldwide Developers Conference (WWDC) 2024, Apple announced Apple Intelligence, showing how AI can run locally on the phone. By next year, running small language models locally could be possible on most devices.

According to Qualcomm CEO Cristiano Amon, AI could even create a new upgrade cycle for phones.

AI Wearables will Have their Day, Someday

Having said that, it would not be right to write off these devices completely. First of all, the founders need a big shoutout for taking up something so bold.

It’s also important to remember that AI wearables are relatively new compared to smartphones. What we have today are the first iterations.

In the future, we could see entirely new AI wearables or better versions of the existing devices.

Moreover, not all AI wearables have failed to impress critics and consumers. For instance, Meta’s Ray-Ban AI glasses received positive reviews from multiple sources. Gadgets 360 called them “versatile and practical”.

Interestingly, AI glasses are not new. Google launched its glasses in 2014. Meta, too, has been working on its glasses for a few years.

Meta did not project its glasses as a replacement for smartphones but rather as an advanced AI-powered accessory. Here, it’s important to note that Meta’s chief AI scientist, Yann LeCun, did say AI wearables could replace smartphones in the next 10-15 years.

Is the AI Hype Real?

This is not the first time someone in the AI space has made a forward-looking statement. Over time, we have seen many prominent figures in the space make such claims, be it about AI wearables, AI models, or even artificial general intelligence (AGI).

Tesla chief Elon Musk has been promising Level 5 autonomous vehicles for many years. He even claimed AI will be smarter than every human by next year.

Interestingly, most of the bold claims are made by companies or founders who are selling AI. Over time, many of these claims have also been put into question.

For instance, when OpenAI released GPT-4, it did not reveal any details about the model, but did claim that it scored in the 90th percentile on the Uniform Bar Exam.

However, a recent Massachusetts Institute of Technology report questions OpenAI’s claims, calling them misleading.

Last year, Google DeepMind claimed its AI tool GNoME found 2.2 million new crystals, including 380,000 stable materials that could power future technologies.

Yet, in a perspective paper featured in Chemistry of Materials, researchers from the University of California analysed a random subset of the 380,000 structures. They contended that the substances identified are, in fact, crystalline inorganic compounds and should be classified as such, rather than being broadly termed ‘materials’.

What these developments tell us is that, oftentimes, more is promised than the technology can deliver. Much of what is being promised now rests on the hoped-for capabilities of future iterations of these technologies, not on what they can do today.

We are witnessing the same with AI wearables. MKBHD rightly pointed this out on X: “This is the pinnacle of a trend that’s been annoying for years: Delivering barely finished products to win a ‘race’ and then continuing to build them after charging full price. Games, phones, cars, now AI in a box.”

LinkedIn Premium subscribers get more AI-powered job hunt tools. Here’s what’s new


Navigating the job market can be difficult, whether you want to land your first role, make a career switch, or move to a new company. LinkedIn has helped working professionals navigate the job search with a host of tools, and now, Premium users are getting AI enhancements to help even more.

On Thursday, the business-focused social platform unveiled new AI-powered features that LinkedIn Premium members can use to help them find a job, perfect their application materials, and upskill through courses, coaching, feedback, and more.

When job hunting, you must consider and track numerous factors, including location, pay, and actual role. Looking through listings that match all your needs can be exhausting. For that reason, LinkedIn implemented a new conversational job search.

Also: How to use ChatGPT to write a cover letter (and why you should)

Now, Premium users can ask LinkedIn for help finding a job that meets their criteria, and LinkedIn will automatically populate relevant roles. For example, according to LinkedIn, a possible prompt could be, "Find me a remote marketing job in Detroit that pays at least $110,000."

The feature's demo shows that the chat interface can also help you with other questions, including offering advice on how to land your next role, such as "How can I stand out in my job search?"

Once you find the role you'd like to apply to, the next phase is submitting your application, which includes the dreaded resume and cover letter. LinkedIn added features that can help with both.

First, it can provide immediate feedback on your application and resume so you can tailor them to be a better fit for a specific role. LinkedIn will also provide subscribers with personalized cover letter recommendations, which they can review and edit.

LinkedIn Learning also received a few upgrades for professionals interested in upskilling opportunities. These include real-time, AI-powered coaching while watching a LinkedIn course, letting users chat about the course's contents and get clarity on topics, examples, and more.

Through LinkedIn Learning, users will also participate in an interactive experience where they can chat with an AI chatbot trained by an expert. AI-powered one-on-one advice allows users to chat indirectly with experts who might not otherwise be available. The initial pilot includes Alicia Reece, Anil Gupta, Dr Gemma Leigh Roberts, Lisa Gates, and more.

To help users understand how their content performs, Premium subscribers will also receive actionable insights on their posts, articles, and newsletters. These post takeaways will help users tailor their content to achieve their goals.

Also: How to use ChatGPT to build your resume

Finally, LinkedIn added enhancements for businesses, including expanding Recruiter 2024 to global availability; new features in Accelerate, LinkedIn's AI campaign-creating offering, such as the addition of Microsoft Designer; and Premium Company Pages for small businesses.

If you want to try LinkedIn Premium but don't want to commit to the $30 per month cost, there are a couple of ways you can do it for free, including a one-month free trial.


Dubai Launches ‘One Million AI Prompters’, an AI Prompt Engineering Training Initiative

Dubai is emerging as a prominent global hub for AI. In a pioneering initiative, the Gulf metropolis has committed to training one million individuals in AI skills over the next three years, marking the world’s first program of its kind.

‘One Million AI Prompters’ is an initiative that aims to build expertise and competencies in AI prompt engineering.

“We want to be the most future-ready city and continue to prepare for the AI era by developing expertise and skills that support global technological change and put Dubai at the forefront of innovation,” said the Chairman of the Board of Trustees of the Dubai Future Foundation (DFF), Crown Prince Sheikh Hamdan bin Mohammed bin Rashid Al Maktoum, at the launch of the ‘One Million AI Prompters’ initiative in Dubai.

The United Arab Emirates, with Dubai and Abu Dhabi as its key emirates, is focusing on transitioning from an oil-dependent state to a leading AI power. By 2031, the UAE aims to have 40% of its GDP generated through AI.

To achieve this, the nation is investing billions of dollars and has appointed the world’s first Minister of State for AI.

Further, it is attracting scientists to the Gulf and is heavily supporting startups.

“We want to show people that there is a full spectrum of use cases. Whether you are technical or non-technical, you can utilise these tools,” said Omar Al Olama, the UAE’s Minister of State for Artificial Intelligence, Digital Economy and Remote Work Applications.

UAE on the AI Global Map

The UAE is increasingly being sought after as a strategic partner by global tech giants. Microsoft, Google, and IBM participated in the initiative’s first championship with workshops, and other companies and countries have also shown interest in AI collaborations.

Since 2019, the UAE has had an AI university, and the sovereign wealth fund Mubadala has established a $100 billion AI fund.

Recently, Microsoft invested $1.5 billion in G42, the leading UAE-based AI technology holding company. The investment aims to fortify the partnership between G42 and Microsoft, focusing on expanding AI technologies and skilling initiatives in the UAE and across the globe.

Under this collaboration, G42 will run its AI applications and services on Microsoft Azure, facilitating the delivery of advanced AI solutions to global public sector clients and large enterprises.

How Retrieval Augmented Generation (RAG) makes LLMs smarter than before


Ideal generative AI versus reality

Foundational LLMs have read every byte of text they could find, and their chatbot counterparts can be prompted to hold intelligent conversations and perform specific tasks. Access to comprehensive information has been democratized: no more figuring out the right keywords to search or picking sites to read. However, LLMs are prone to rambling, and they generally respond with the statistically most probable answer you’d want to hear (sycophancy), an inherent result of the transformer model. Extracting 100% accurate information from an LLM’s knowledge base doesn’t always yield trustworthy results.

Chat LLMs are infamous for making up citations to scientific papers or court cases that don’t exist. Lawyers filing a suit against an airline even included citations to court cases that never actually happened. A 2023 study reported that, when ChatGPT was prompted to include citations, it provided references that actually exist only 14% of the time. Falsifying sources, rambling, and delivering inaccuracies to appease the prompt are dubbed hallucination, a huge obstacle to overcome before AI is fully adopted and trusted by the masses.

One counter to LLMs making up bogus sources or coming up with inaccuracies is retrieval-augmented generation, or RAG. Not only can RAG decrease the tendency of LLMs to hallucinate, but it offers several other advantages as well.

These advantages include access to an updated knowledge base, specialization (e.g. by providing private data sources), empowering models with information beyond what is stored in the parametric memory (allowing for smaller models), and the potential to follow up with more data from legitimate references.

What is RAG (Retrieval Augmented Generation)?

Retrieval-Augmented Generation (RAG) is a deep learning architecture implemented in LLMs and transformer networks that retrieves relevant documents or other snippets and adds them to the context window to provide additional information, aiding an LLM to generate useful responses. A typical RAG system would have two main modules: retrieval and generation.


The main reference for RAG is a paper by Lewis et al. from Facebook. In the paper, the authors use a pair of BERT-based document encoders to transform queries and documents by embedding the text in a vector format. These embeddings are then used to identify the top-k (typically 5 or 10) documents via a maximum inner product search (MIPS). As the name suggests, MIPS is based on the inner (or dot) product of the encoded vector representations of the query and those in a vector database pre-computed for the documents used as external, non-parametric memory.
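To make the retrieval step concrete, here is a minimal sketch of maximum inner product search using plain PyTorch tensors; the embeddings and documents are made up for illustration, standing in for the BERT-based encoders and the pre-computed vector database from the paper.

import torch

# Hypothetical pre-computed embeddings for four documents (dimension 8),
# standing in for the output of a BERT-based document encoder.
doc_embeddings = torch.randn(4, 8)
documents = ["doc A", "doc B", "doc C", "doc D"]

# Hypothetical query embedding from the query encoder.
query_embedding = torch.randn(8)

# MIPS: score every document by its inner (dot) product with the query,
# then keep the top-k highest-scoring documents.
scores = doc_embeddings @ query_embedding
top_k = torch.topk(scores, k=2)
retrieved = [documents[int(i)] for i in top_k.indices]
print(retrieved)  # the k snippets that would be added to the context window

At the scale of a corpus like wiki_dpr, an exhaustive dot product over every document would be slow, which is why the real implementation delegates this search to a FAISS index.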

As described in the piece by Lewis et al., RAG was designed to make LLMs better at knowledge-intensive tasks which “humans could not reasonably be expected to perform without access to an external knowledge source”. Consider the difference between taking an open-book and a closed-book exam, and you’ll have a good indication of how RAG might supplement LLM-based systems.

RAG with the Hugging Face 🤗 Library

Lewis et al. open-sourced their RAG models on the Hugging Face Hub, so we can experiment with the same models used in the paper. A new Python 3.8 virtual environment, created with virtualenv, is recommended.

virtualenv my_env --python=python3.8
source my_env/bin/activate

After activating the environment, we can install dependencies using pip: transformers and datasets from Hugging Face, the FAISS library from Facebook that RAG uses for vector search, and PyTorch for use as a backend.

pip install transformers
pip install datasets
pip install faiss-cpu==1.8.0
# see https://pytorch.org/get-started/locally/
# to match the pytorch version to your system
pip install torch

Lewis et al. implemented two different versions of RAG: rag-sequence and rag-token. Rag-sequence uses the same retrieved document to augment the generation of an entire sequence whereas rag-token can use different snippets for each token. Both versions use the same Hugging Face classes for tokenization and retrieval, and the API is much the same, but each version has a unique class for generation. These classes are imported from the transformers library.

import torch  # used later to score retrieved documents with torch.bmm

from transformers import RagTokenizer, RagRetriever
from transformers import RagTokenForGeneration
from transformers import RagSequenceForGeneration

The first time the RagRetriever model is instantiated with the default “wiki_dpr” dataset, it will initiate a substantial download (about 300 GB). If you have a large data drive and want Hugging Face to use it (instead of the default cache folder in your home drive), you can set a shell variable, HF_DATASETS_CACHE.

# in the shell:
export HF_DATASETS_CACHE="/path/to/data/drive"
# ^^ add to your ~/.bashrc file if you want to set the variable

Ensure the code is working before downloading the full wiki_dpr dataset. To avoid the big download until you’re ready, you can pass use_dummy_dataset=True when instantiating the retriever. You’ll also instantiate a tokenizer to convert strings to integer indices (corresponding to tokens in a vocabulary) and vice-versa. Sequence and token versions of RAG use the same tokenizer. RAG sequence (rag-sequence) and RAG token (rag-token) each have fine-tuned (e.g. rag-token-nq) and base versions (e.g. rag-token-base).

tokenizer = RagTokenizer.from_pretrained(
    "facebook/rag-token-nq")
token_retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq",
    index_name="compressed",
    use_dummy_dataset=False)
seq_retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq",
    index_name="compressed",
    use_dummy_dataset=False)
dummy_retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq",
    index_name="exact",
    use_dummy_dataset=True)
token_model = RagTokenForGeneration.from_pretrained(
    "facebook/rag-token-nq",
    retriever=token_retriever)
seq_model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq",
    retriever=seq_retriever)
dummy_model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq",
    retriever=dummy_retriever)

Once your models are instantiated, you can provide a query, tokenize it, and pass it to the model’s “generate” function. We’ll compare results from rag-sequence, rag-token, and RAG using a retriever with the dummy version of the wiki_dpr dataset. Note that these RAG models are case-insensitive.

query = "what is the name of the oldest tree on Earth?"
input_dict = tokenizer.prepare_seq2seq_batch(
query, return_tensors="pt")
token_generated = token_model.generate(**input_dict) token_decoded = token_tokenizer.batch_decode(
token_generated, skip_special_tokens=True)
seq_generated = seq_model.generate(**input_dict)
seq_decoded = seq_tokenizer.batch_decode(
seq_generated, skip_special_tokens=True)
dummy_generated = dummy_model.generate(**input_dict)
dummy_decoded = seq_tokenizer.batch_decode(
dummy_generated, skip_special_tokens=True)
print(f"answers to query '{query}': ")
print(f"t rag-sequence-nq: {seq_decoded[0]},"
f" rag-token-nq: {token_decoded[0]},"
f" rag (dummy): {dummy_decoded[0]}")

>> answers to query 'what is the name of the oldest tree on Earth?':
>> rag-sequence-nq: prometheus, rag-token-nq: prometheus, rag (dummy): 4862

(Prometheus was the oldest tree discovered until 2012, with its innermost, extant rings exceeding 4862 years of age.)

In general, rag-token is correct more often than rag-sequence (though both are often correct), and rag-sequence is more often right than RAG using a retriever with a dummy dataset.

“What sort of context does the retriever provide?” you may wonder. To find out, we can deconstruct the generation process. Using the seq_retriever and seq_model instantiated above, we query “what is the name of the oldest tree on Earth?”

query = "what is the name of the oldest tree on Earth?"
inputs = tokenizer(query, return_tensors="pt")
input_ids = inputs["input_ids"]
question_hidden_states = seq_model.question_encoder(input_ids)[0]
docs_dict = seq_retriever(input_ids.numpy(),
question_hidden_states.detach().numpy(),
return_tensors="pt")
doc_scores = torch.bmm(
question_hidden_states.unsqueeze(1),
docs_dict["retrieved_doc_embeds"]
.float().transpose(1, 2)).squeeze(1)
generated = model.generate(
context_input_ids=docs_dict["context_input_ids"],
context_attention_mask=
docs_dict["context_attention_mask"],
doc_scores=doc_scores)
generated_string = tokenizer.batch_decode(
generated,
skip_special_tokens=True)
contexts = tokenizer.batch_decode(
docs_dict["context_input_ids"],
attention_mask=docs_dict["context_attention_mask"],
skip_special_tokens=True)
best_context = contexts[doc_scores.argmax()]

We can print the best_context variable to see what was captured:

print(f" based on the retrieved context"
f":nnt {best_context}: n")

based on the retrieved context:

Prometheus (tree) / In a clonal organism, however, the individual clonal stems are not nearly so old, and no part of the organism is particularly old at any given time. Until 2012, Prometheus was thus the oldest “non-clonal” organism yet discovered, with its innermost, extant rings exceeding 4862 years of age. In the 1950s dendrochronologists were making active efforts to find the oldest living tree species in order to use the analysis of the rings for various research purposes, such as the evaluation of former climates, the dating of archaeological ruins, and addressing the basic scientific question of maximum potential lifespan. Bristlecone pines // what is the name of the oldest tree on earth?

print(f" rag-sequence-nq answers '{query}'"
f" with '{generated_string[0]}'")

We can also print the answer via the generated_string variable: rag-sequence-nq answers ‘what is the name of the oldest tree on Earth?’ with ‘ prometheus’.

What Can You Do with RAG?

In the last year and a half, there has been a veritable explosion in LLMs and LLM tools. The BART base model used in Lewis et al. was only 400 million parameters, a far cry from the current crop of LLMs, which typically start in the billion parameter range for “lite” variants. Also, many models being trained, merged, and fine-tuned today are multimodal, combining text inputs and outputs with images or other tokenized data sources. Combining RAG with other tools can build complex capabilities, but the underlying models won’t be immune to common LLM shortcomings. The problems of sycophancy, hallucination, and reliability in LLMs all remain and run the risk of growing just as LLM use grows.

The most obvious applications for RAG are variations on conversational semantic search, but they might also incorporate multimodal inputs or image generation as part of the output. For example, RAG in LLMs with domain knowledge can make software documentation you can chat with. Or RAG could be used to keep interactive notes in a literature review for a research project or thesis.

By incorporating a ‘chain-of-thought’ reasoning capability, you could take a more agentic approach, empowering your models to query a RAG system and assemble more complex lines of inquiry or reasoning.
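As a rough sketch of that agentic pattern, the loop below lets a model decide whether to issue follow-up queries against a RAG system; query_rag and llm_generate are hypothetical placeholders for your own retrieval pipeline and LLM call, not real library functions.

def llm_generate(prompt: str) -> str:
    # Placeholder: call whichever LLM you use for reasoning.
    raise NotImplementedError

def query_rag(question: str) -> str:
    # Placeholder: a retrieval-augmented answer, e.g. from the
    # rag-sequence pipeline shown earlier.
    raise NotImplementedError

def answer_with_followups(question: str, max_steps: int = 3) -> str:
    # Let the model iteratively refine what it asks the RAG system.
    notes = []
    query = question
    for _ in range(max_steps):
        notes.append(query_rag(query))
        query = llm_generate(
            "Reply DONE if the notes answer the question; otherwise "
            f"propose one follow-up query.\nNotes: {notes}\n"
            f"Question: {question}")
        if query.strip() == "DONE":
            break
    return llm_generate(f"Answer '{question}' using these notes: {notes}")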

It is also very important to keep in mind that RAG does not solve the common LLM pitfalls (hallucination, sycophancy, etc.); it serves only as a means to alleviate them or guide your LLM toward a more niche response. The endpoints that ultimately matter are specific to your use case, the information you feed your model, and how the model is fine-tuned.

Databricks Looks to Dominate Enterprise AI With Open-Source

Data and AI giant Databricks announced a host of new generative AI capabilities and a major push to its open-source strategy at the annual Data + AI Summit. The new offerings, such as Mosaic AI Model Training, Mosaic AI for RAG, and Mosaic AI Gateway, in addition to open-sourcing their Unity Catalog, aim to help enterprises build high-quality, domain-specific AI applications.

“We want to help people get the best quality possible in their domain for their GenAI application,” said CTO and co-founder Matei Zaharia in an exclusive interview with AIM. “And to do that, we see a lot of companies are building what we call compound AI systems.”

These compound AI systems involve multiple components, such as calls to different models, retrieval of relevant data, use of external APIs and databases, and breaking problems into smaller steps. At the same time, Databricks is also focusing on open-source models.
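To give a feel for what a compound AI system looks like in code, here is a minimal hypothetical sketch chaining retrieval, an external API call, and a model call; all three helper functions are illustrative stand-ins, not Databricks APIs.

def retrieve_documents(query: str) -> list:
    # Stand-in for a vector-search call over enterprise data.
    return ["relevant snippet 1", "relevant snippet 2"]

def lookup_order_status(order_id: str) -> dict:
    # Stand-in for an external API or database lookup.
    return {"order_id": order_id, "status": "shipped"}

def call_llm(prompt: str) -> str:
    # Stand-in for a call to any hosted or open model.
    return f"(model answer grounded in: {prompt[:60]}...)"

def compound_answer(question: str, order_id: str) -> str:
    # Break the problem into steps: retrieve context, fetch structured
    # facts, then hand both to the model as grounding.
    docs = retrieve_documents(question)
    status = lookup_order_status(order_id)
    prompt = f"Question: {question}\nDocs: {docs}\nOrder: {status}"
    return call_llm(prompt)

print(compound_answer("Where is my order?", "A-1001"))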

Why is Databricks Betting Big on Open-Source?

While acknowledging the rapid advancements in closed-source models, Zaharia noted that Databricks is definitely betting big on an open-source strategy. He also believes that the performance gap between closed-source and open-source models is rapidly narrowing.

This is evidenced by recent open-source models like DBRX, Mixtral 8x22B, and Llama 3 approaching the quality of the best closed-source models. “They’re all quite good, and they’re all in that space, getting really close to the best closed models. Meanwhile, the best closed models haven’t gotten that much better.”

Acknowledging the possibility that significantly higher investments could lead to superior closed models, Zaharia believes that open-source development will continue to thrive as companies seek to share development costs.

While consumer AI applications may stagnate, Zaharia predicts that the most exciting advances in generative AI will come from open, customisable models in the B2B world, applied to complex industry use cases.

“I actually think the most exciting sort of advances in GenAI will next be in the B2B world with custom AI for challenging mission-critical domains,” said Zaharia.

“That’s another reason that we’re betting on open models,” he added.

He drew parallels to how open-source big data technologies initially powered consumer applications but later had a transformative impact on the enterprise.

He elaborated, “Let’s say you build a model for chemistry. That’s really good. Even if it’s not as good at chatting about random topics as GPT-4, it’s still extremely valuable.”

Mosaic AI for Training Cost-Effective Models & Quality Monitoring

The new offerings in Mosaic AI are designed to address major hurdles like quality, cost, governance and security that organisations face in building and deploying generative AI applications.

“If you don’t get the right kind of quality for your application, then you’re stuck,” Zaharia emphasised.

One key offering is the RAG framework in Mosaic AI, which provides a quick way to deploy and manage an entire generative AI application, including the vector database, data pipeline, and serving layer.

Databricks is also introducing quality monitoring capabilities for compound AI applications. This includes the ability to see detailed traces, review results, and even use LLMs as automated judges to score outputs.
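The LLM-as-judge idea itself is easy to sketch: a second model grades the application's output against a rubric. The snippet below is a generic illustration of the pattern, not Databricks' monitoring API; judge_llm is a placeholder for whichever model does the scoring.

def judge_llm(prompt: str) -> str:
    # Placeholder: call the model acting as the automated judge.
    raise NotImplementedError

def score_response(question: str, answer: str) -> str:
    # Ask the judge to grade an output for accuracy and relevance.
    rubric = ("Rate the answer from 1 (poor) to 5 (excellent) for "
              "factual accuracy and relevance. Reply with the number only.")
    return judge_llm(f"{rubric}\nQuestion: {question}\nAnswer: {answer}")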

“So, of course, you can do prompt engineering, you can try to tell the model to do different things, but at some point, if you have examples of data that you can label and give it, you’ll do a lot better,” explained Zaharia.

“And we actually packaged up all the stuff that we used to train DBRX, which our research team had just developed. It’s now behind a very simple serverless API,” he added.

Additionally, Mosaic AI Training enables organisations to fine-tune models using their own labelled data, resulting in higher-quality outputs.

Fine-Tuning and Cost Reduction

Zaharia underscored the significance of fine-tuning foundation models on organisations’ own data with Mosaic AI Model Training.

He cited the example of FactSet, a financial data vendor that initially built an application using GPT-4, which was, at best, 55% accurate and took 10 seconds per user query. By switching to a multi-step AI system with Databricks, FactSet achieved 87% accuracy, reduced query time to three seconds, and lowered costs by approximately five times compared to GPT-4 calls.

Mosaic AI Training is an optimised software stack that makes training LLMs cost-effective. Through system-level optimisations, tuned parallelism strategies, and model training science, it can reduce training costs by up to 10x.

Zaharia emphasised that another factor to consider for cost-efficiency benefits is using custom and open-source models. “You might have something that works, but it’s very expensive, very slow.

“This is where custom models and open-source models provide a huge benefit because you can often take something that works well with a very expensive model, collect a bunch of examples of it and then fine-tune a small, low-cost model to do it well,” he explained.

Databricks’ DBRX model, for example, surpasses GPT-3.5 in quality while being faster and more cost-effective to serve, with costs similar to a 13 billion parameter model.

By incorporating Databricks Mosaic AI into their data strategy, organisations can experience reduced training time and costs, improved model performance, increased developer productivity, enhanced scalability, and democratised AI.

How we test phones at ZDNET in 2024


The way the smartphone fits into our daily lives has changed dramatically over the past decades, from being solely a communication device to now connecting us to the vast internet. Today, the definition of the smartphone is being altered again, with AI slowly but surely taking center stage in the mobile experience. It might even replace apps one day.

Also: The best phones to buy in 2024

No matter the outcome, the value of smartphones in modern society is immeasurable; it's a must-have gadget. So, to help readers like you find the best handset for your needs and preferences, ZDNET's team of mobile experts tests just about every phone that hits the market throughout the year, from Androids to iPhones. We even test the devices that claim they'll replace smartphones.

If you've ever wondered how we evaluate the latest smartphones to decide if they're worth recommending, here's a breakdown of the various aspects we consider.

How we test phones in 2024

For starters, the phones we review at ZDNET are mostly provided by manufacturers shortly before they launch to the public. That means our initial hands-on reviews are typically based on a week's time (or longer) with the unreleased devices.

Within the embargoed time frame, ZDNET reviewers can test the latest features (ideally on the latest software patch), ask follow-up questions to manufacturers, and evaluate the devices without any influence from other reviewers. There are also moments when we purchase phones to test or review a device provided by a mobile carrier, not the manufacturer. In the latter case, we'll explicitly credit the carrier in the coverage, though it'll have no editorial influence.

While ZDNET primarily covers smartphone releases in the US market, we also evaluate international handsets to understand the competitive landscape better and have a frame of reference when making recommendations to international readers. We also attend trade shows, including CES and Mobile World Congress, to connect with industry experts and analysts.

What makes a phone ZDNET recommended?

For hands-on testing, five aspects determine whether or not a phone gets recommended by ZDNET: design, performance, cameras, battery life, and special features. The importance of each aspect will vary across users; some will value camera quality over battery life, and others just want a phone that's unique and different. Generally, the order of importance is cameras, battery life, design, performance, and then special features.

To be included in our buying guides, the best smartphones must achieve above-average marks on all five criteria (with a review score above 3.5 out of 5), especially when compared to other devices priced similarly. Reviewers also consider the key differences between the latest phone models and their predecessors during the grading process.

Design and ergonomics

How a phone looks and feels can greatly influence the overall experience. There's a reason why Apple stores meticulously arrange iPhones the way they do, with the most colorful options front and center. Does the latest A17 Pro processor on the iPhone 15 Pro Max truly matter if you're already mesmerized by the adorably sized, blue-colored iPhone 15? (Just me?)

But also, how does the phone feel when it's tucked in your tight jeans or lightweight shorts? When testing and recommending phones, we consider design and ergonomics heavily, understanding that not everyone wants the biggest and most premium-feeling option out there. For example, a device with a plastic casing will serve you better than an all-glass build if you're a construction worker or someone who's often outdoors.

To truly test the real-world experience of using the latest iPhones and Androids, ZDNET reviewers often don't accessorize the handsets with silicone or rubberized cases; instead, we browse, take pictures, and roam around with them as is. Phones get brownie points if they're rated IP68, the industry standard for water and dust resistance.

Performance

Several factors affect a phone's performance, including LTE/5G signal, battery life, and background tasks. Therefore, we typically begin our evaluations with a fully charged handset, all background tasks closed, and as stable a mobile connection as possible. I'm based in New York City, so I typically test the performance capacity of phones across various signal areas, such as the subway (where LTE signal can range from poor to non-existent) and back home in Staten Island (where LTE signal is richer due to the lack of skyscrapers and congestion).

Performance testing also includes putting phones through varying levels of graphics-intensive tasks, including importing and exporting spreadsheets, photo-editing in Adobe Lightroom, and playing mobile games like Genshin Impact and Asphalt 9. I'll oftentimes have a music player app running in the background or YouTube Picture-in-Picture just to push the mobile processor a little more.

Of course, reviewers also consider the price of the tested devices, adjusting their standards and expectations accordingly.

Cameras

Arguably the most valuable aspect of today's smartphones, built-in cameras have improved so much over the past few years that they're now our most convenient (and reliable) tool to capture life's most important moments. Testing phone cameras at ZDNET includes capturing hundreds of photos and videos of various subjects and in various lighting conditions. The list of subjects ranges from flower petals (for macro shots) to people (for portrait shots) to the moon (for zoom/periscope shots).

Also: The best camera phone of 2024: Expert tested and reviewed

Having a larger sample size to reference and compare with images from other phone models gives us the most accurate assessment of which phone camera is best at preserving details, colors, contrast, and more. Whether we're comparing the latest Samsung Galaxy phones with each other or with the latest iPhone, ZDNET reviewers can typically be found with more than one device in their pockets, both for comparison purposes and because we're simply tech geeks.

Battery life and charging

It's also important for us to evaluate how long phones last under light, moderate, and heavy usage, how long they take to recharge, and how they do it (wired, wireless, or both). We typically judge the endurance of phones based on screen-on time (SOT); that's the total amount of time the screen is turned on, whether you're scrolling through TikTok or typing an email. The higher the SOT, the longer the phone lasts.

On average, phones can score from three hours of SOT to upwards of nine hours of SOT, with the value resetting after 24 hours or when the phone is fully recharged. However, remember that a high SOT value is not always correlated with top-tier battery life; being able to play a Netflix video at full brightness for four hours straight is more impressive, endurance-wise, than leaving a text document on the screen for nine hours. Therefore, when speaking to the battery life of phones, we also describe it in a more practical sense — mentioning if a device can last one full day of usage, more or less.

Special features

Beyond the traditional testing pillars, we also consider phones' unique and special features as we finalize our buying advice. Devices like the Samsung Galaxy S24 Ultra have a built-in S Pen stylus, the Nothing Phone 2a has a light-up back cover, and the OnePlus Open can fold and expand into a handheld tablet. Such features distinguish these devices in a bustling smartphone market, bringing added value to users. Of course, they're judged on a practicality scale, and only the most useful gimmicks will earn our reviewers' approval.


Meet TORAX, Google DeepMind’s Breakthrough in Open-Source Nuclear Fusion Simulation

Google DeepMind researchers have released TORAX, a new open-source differentiable tokamak core transport simulator implemented in Python using the JAX framework. TORAX essentially simulates the transport of particles, energy, and momentum within the core of a tokamak fusion reactor.

According to the new paper, TORAX solves coupled equations for ion heat, electron heat, particle transport, and current diffusion. It incorporates modular physics-based and machine-learning models, leverages JAX for fast runtimes via just-in-time compilation and automatic differentiation, enables gradient-based optimisation workflows and Jacobian-based PDE solvers, and facilitates coupling to machine-learning surrogate models of physics.

TORAX has been verified against the established RAPTOR code, demonstrating excellent agreement in simulated plasma profiles at stationary state. For an ITER L-mode scenario, the normalised root-mean-square deviation between TORAX and RAPTOR temperature and density profiles was around 1%.

A key innovation is TORAX’s use of the JAX framework, allowing just-in-time compilation for speed and automatic differentiation for advanced algorithms like gradient-based optimisation. JAX also simplifies the integration of machine learning surrogate models like the QLKNN neural network trained on gyrokinetic turbulence simulations.
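TORAX's transport equations are far more involved, but a toy example hints at why JAX matters here: a jitted finite-difference step can be differentiated end to end with jax.grad, yielding the kind of sensitivities that gradient-based optimisation workflows and Jacobian-based solvers rely on. This sketch is purely illustrative and unrelated to TORAX's actual code.

import jax
import jax.numpy as jnp

@jax.jit
def diffuse(profile, diffusivity, dt=1e-3, dx=1e-1):
    # One explicit finite-difference step of 1D heat diffusion,
    # with periodic boundaries via jnp.roll.
    lap = (jnp.roll(profile, -1) - 2 * profile + jnp.roll(profile, 1)) / dx**2
    return profile + dt * diffusivity * lap

def midpoint_temperature(diffusivity):
    # Evolve a Gaussian temperature profile and read out one value.
    x = jnp.linspace(0.0, 1.0, 64)
    profile = jnp.exp(-((x - 0.5) ** 2) / 0.01)
    for _ in range(100):
        profile = diffuse(profile, diffusivity)
    return profile[32]

# Automatic differentiation through the whole simulation: the
# sensitivity of an output to a physics input, for free.
print(jax.grad(midpoint_temperature)(0.5))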

“TORAX offers a powerful and versatile tool for accelerating fusion energy research,” said Google DeepMind research scientist and lead author Jonathan Citrin. “Its differentiability and ability to leverage machine learning models are game-changers.”

The open-source TORAX code aims to foster collaboration and rapid progress in tokamak modelling for fusion reactor design and operation.

Simulation Training

Google DeepMind has a history of open-sourcing simulators for this purpose. Back in 2020, it released a scalable environment simulator that helped researchers create 2D environments for AI and machine learning work. Simulated training is also the most commonly adopted technique to equip general-purpose robots for the real world.

When Ola Krutrim Attempted UPSC

When ChatGPT was launched, it was said to have all the answers in the world. AIM took the test of that promise and made ChatGPT attempt the Union Public Service Commission (UPSC) examination. And as we all know, it did not clear UPSC.

Now that Indian language models are all the hype, we wanted to make at least one of them attempt the country’s most prestigious and one of the toughest examinations in the world. Since the challenge was so tough, AIM decided to test out Krutrim, which is touted as the most indigenous and culturally aware model of India.

To make Krutrim realise the tough ordeal it was about to go through, we let it know what lay ahead and asked if it thought it could clear the exam. Unfortunately, instead of accepting its fate, it decided to wish us ‘Good luck!’.

Not So Smart and Aware

We made Krutrim attempt the 100 questions from Question Paper 1 (Set A) of UPSC Prelims 2023. It got only 41 of them correct. Since the cut-off for the general category this year was 75.41 marks (each question carries two marks, and wrong answers attract negative marking), Krutrim failed the UPSC exam miserably.

To compare, ChatGPT answered 54 of them correctly when we took the test in 2022.

The questions ranged from subjects such as geography, economy, history, ecology, and general science to current events of national and international importance, social development, and polity.

Strong at Small Questions, Weak at Reasoning

When it came to geography and general science, Krutrim was able to answer several questions correctly. But when it came to history and economy, the chatbot fared poorly at even understanding the questions. All of this, however, seemed to depend on its mood.

Moreover, if the given questions had longer contexts, Krutrim failed to correctly answer almost all of them, showing its weak reasoning skills.

Since Krutrim is not connected to the internet, it was not able to answer any questions on current affairs. To be fair, it is still in beta; with future updates, the model may be able to fetch real-time information, and perhaps hallucinate less too.

Its responses were at times difficult to understand.

The Context Window Problem

Another problem Krutrim faces, which is worse compared to other AI models, is that users cannot insert all the text from a single question in one go. At most, Krutrim can take an input of about 500 characters, which is roughly 80 words. Many questions from the paper were longer than that, so Krutrim could not process them hassle-free.

Moreover, although Krutrim claims to support multiple languages, pasting questions from the paper in those languages was impractical because the input box counted their characters as more than the English equivalents.

Plus, there is no option to upload a PDF or even scan images on Krutrim yet, which could have made things a lot easier. Nonetheless, attempting the paper in Hindi or other Indian languages is for another time.

Not All is Lost

This just clearly points to the fact that Indian language models, in this case Krutrim, are not nearly as smart as, say, ChatGPT or Perplexity. Krutrim struggles to find the right answer, and even when it does, it is hard to assess whether it was a fluke, since there is no concrete explanation for the answer.

The attempt was also made in the browser, not in the app released by Ola Krutrim, since the app’s text input method is far worse and also lacks voice input.

Though Bhavish Aggarwal, the CEO of Krutrim, is making big strides in the country with a lot of announcements to make Krutrim the best Indian AI model, such as creating its own cloud and offering its API to developers, there is still a lot that needs to be improved.

Meanwhile, when we informed Krutrim that it had failed the UPSC exam, it told us that we had failed the exam and offered to assist us with study materials!

Well played, Krutrim, well played!

Don’t wait for iOS 18’s AI. ChatGPT offers these same features now


After ChatGPT's launch in November 2022, nearly every company joined in on the AI craze — except Apple. A year and a half later, at its annual Worldwide Developer Conference (WWDC), Apple unveiled a collection of AI features known as Apple Intelligence. While the new features look impressive (and certainly are), many of them have been done before by OpenAI's ChatGPT.

This fall, Apple's software updates will bring generative AI capabilities to iPhones, iPads, and Macs. However, OpenAI already unveiled major upgrades to the free version of ChatGPT in May, and the similarities to what Apple Intelligence will do are worth noting.

Also: Five iOS 18 features that Android users already have

The Apple Intelligence updates will be free, but the full experience will only be available on iPhones with the A17 Pro chip (currently only the iPhone 15 Pro and iPhone 15 Pro Max) and on iPads and Macs with M-series chips.

Instead of spending thousands of dollars rushing to upgrade to the newest Apple devices, you should check out ChatGPT first. OpenAI's free chatbot has many of the same features coming to Apple Intelligence.

1. Writing tools

With Apple Intelligence, users will be able to access a variety of writing tools that can help with rewriting, proofreading, and summarizing text. Apple says the tools will be accessible "everywhere" users write, including in Mail, Keynote, third-party apps, and more.

ChatGPT's advanced natural language processing (NLP) makes it a great writing tool as well. It can generate new text from scratch, proofread, co-edit, rewrite, and more. While the ChatGPT experience may not live natively within Apple devices the way Apple Intelligence will, users can easily access ChatGPT in a browser tab and copy and paste its output wherever they write.

Also: How ChatGPT (and other AI chatbots) can help you write an essay

Another option for Apple users is to take advantage of the ChatGPT app for iPads and iPhones. There's even a ChatGPT app for MacOS, which allows users to access the chatbot quickly via a keyboard shortcut. The Mac app is available now for ChatGPT Plus subscribers, but OpenAI is planning to release access to all users in the coming months.

2. Image generator

Apple also unveiled its first text-to-image generator, Image Playground. This generator will be built into iOS 18, iPadOS 18, and MacOS Sequoia as part of Apple Intelligence, and will also live as a stand-alone app.

Also: The best AI image generators to try right now

It's unclear without testing it, but Image Playground's functionality will likely be similar to OpenAI's image generator, DALL-E 3, which can be accessed via ChatGPT Plus. Even though DALL-E 3 requires a $20 per month subscription, it offers users a wider variety of options, since it can render images in any style; Image Playground is limited to three: animation, illustration, and sketch.

3. On-screen awareness

At Apple's event, the company shared that with Apple Intelligence, Siri will have on-screen awareness, making asking it for help with certain tasks easier. During OpenAI's Spring Launch event, the company also showed a demo indicating that ChatGPT will have on-screen awareness as well — though it didn't clarify when we'll see this feature.

The value of having an AI voice assistant that can see what you are working on and use it as context for your query is evident, and it will likely be the future of all assistants.

4. Advanced conversational capabilities

Another update coming to Siri is better NLP, meaning it will be able to understand you even if you stutter or pause. OpenAI indicated at its Spring Launch that the improved Voice Mode for ChatGPT will have the same capabilities, such as stopping when a user interrupts it, understanding queries better, and more. The improved Voice Mode will be rolling out in alpha in the coming weeks, and ChatGPT Plus users will get early access as the company rolls it out more broadly.

Also: Everything to know about Apple's AI features for iPhones, Macs, and iPads

5. Type and chat

With Apple Intelligence, Siri will be upgraded to accept typed and voice queries — a significant change considering that it has only ever functioned as a voice assistant. However, as discussed above, ChatGPT can also take text and voice inputs.

6. And, of course…access to ChatGPT

Apple also announced that Siri will have access to ChatGPT, which you can also access by going directly to the source.


Bengaluru-Based Startup rampp.ai Raises $500,000

Bengaluru-based startup rampp.ai announced on Thursday that it had successfully bagged $500,000 in angel funding.

The startup, which is currently powered by OpenAI and partnered with Microsoft, built the RADI Navigator Platform. The platform delivers contextual, customised insights to enterprises, allowing them to strategise better across business domains and functions.

The company was founded by tech entrepreneurs Ajay Agrawal and Huzefa Saifee in 2023. This is the fourth such venture by the duo, in which they leverage technology to provide highly tailored business solutions to organisations.

“We see early investor support as a validation of our core strategy to leverage GenAI for tackling transformative challenges. Constant market & technology shifts, coupled with external confounding factors, make enterprise-wide transformation daunting, and our solution provides leaders with crucial insights to tread with confidence,” said Agrawal, who is also the CEO of the startup.

According to the company, the funding has come from a group of seasoned executives and investors, a majority of whom come from India and North America.

In particular, rampp.ai targets companies that are aiming to grow and pivot, helping them leverage AI to navigate these intricacies. “This results in agile adaptation and scalable pivoting with measurable outcomes, no matter the distinct challenges a business faces,” the company said.

Described as a “GenAI-based enterprise-grade business transformation solutions” provider, the company’s major product is the RADI Navigator Platform.

Short for rampp.ai Artificial Digital Intelligence, the platform makes use of a deep industry and technology knowledge base to create an effective roadmap for companies in real time. This includes incorporating stakeholder inputs and offering tools to ensure that a business can accelerate its processes.

While not much is known about the capabilities of the startup that has partnered with OpenAI and Microsoft, this is not the first time either major company has partnered with smaller startups. However, a partnership with a Bengaluru-based startup is particularly interesting, as OpenAI has started to put more focus on its operations in India.

For one, OpenAI only recently hired its first India-based employee. Further, it is also hoping to establish a local team in the country.