How this retailer uses machine learning and computer vision to keep its shelves full


The Home Depot staff use computer vision to find items on shelves.

When you're a home improvement specialist with thousands of outlets throughout the US, it can be tough to keep track of products across stores and warehouses. Add in the complication of Black Friday and a busy holiday period and the challenge seems almost intractable.

Yet The Home Depot is meeting this test head-on by using a mix of machine learning (ML) and computer vision technology to help staff find products for customers quickly and effectively.


Hari Ramamurthy, technology fellow at The Home Depot, explains to ZDNET in a video interview how this deployment of emerging technology is very much par for the course for the retail giant.

"We are very much a technology-focused company," he says. "We look for ways we can leverage the latest and best technologies to materially improve the experience for our staff and ultimately our customers."

Ramamurthy says The Home Depot has developed an ML-powered app, known as Sidekick, to boost staff productivity.

The app, which also uses computer vision, is installed on "hdPhones", which are mobile devices used by The Home Depot's staff. These devices have been developed in collaboration with Zebra Technologies, HPE, and Aruba.

Sidekick went live at the beginning of 2023 and Ramamurthy says the app is just the latest stage in a range of data-led initiatives across the business.

"Technologies like machine learning or artificial intelligence clearly have tremendous potential in terms of unlocking the right outcomes for our associates and customers," he says.


When it comes to the development of Sidekick, The Home Depot created a bespoke system that uses a cloud-enabled ML algorithm to help staff, whom Ramamurthy refers to as associates, prioritize important tasks.

The app ensures that associates focus their attention on the most in-demand products and helps them locate items in hard-to-find locations, such as overhead shelves.

"We wanted to make sure our associates were always given the highest value task related to where they are, so they can be productive in the tasks they perform," he says. "We're using multiple signals generated from internal data sources to inform our algorithm."

The ML model takes data from transactional systems, including point-of-sale technologies and inventory management platforms.

However, the model goes beyond traditional structured sources of retail data and draws insight from semi-structured sources, such as video camera feeds that demonstrate the flow of shoppers within stores.

The app also uses computer vision, where images are captured by associates in the Sidekick app on their hdPhones.


Staff members take pictures of locations across the store. The Home Depot uses the data to discover more details about which products are available on the shelves.

"Computer vision is a good example of data that's coming from a non-transactional system and informing our algorithms," says Ramamurthy.

"It's a very exciting technique because we can see there's a lot of information that comes through this stream to augment our data sources. It means we can build a more complete set of signals, and get the appropriate tasks generated and delivered to our associates."

While the app is a data-heavy tool that requires input from staff to work effectively and productively, Ramamurthy says the aim has been to ensure that any demands on staff are not overly onerous — and that their inputs produce big benefits in terms of outputs.

"Our goal is to make the technologies fade into the background and to be as seamless as possible," he says. "The associates don't really have to understand all the factors that went into play in ensuring that a task was generated. Our objective is simply to try and prioritize the appropriate tasks."


In his position as technology fellow at The Home Depot, Ramamurthy is always looking for ways to both hone the Sidekick app and find other sources of data-led innovation.

"My role is to bridge across our various product teams, business partners, and our technology team," he says. "We're constantly looking for ways to optimize how we perform certain tasks, as well as challenge the way we are thinking. That means considering the introduction of technologies and experimenting in many cases to develop next-generation experiences that make a dent in our customers' problems."

The Home Depot has experimented with various ML and artificial intelligence (AI) techniques for several years, including the home-grown Sidekick app.

Going down the bespoke development route for emerging technology might seem like a significant risk to some digital leaders.

Avivah Litan, distinguished VP analyst at Gartner, has previously told ZDNET that emerging technologies, such as ML and AI, promise big productivity increases, yet there are significant challenges to be overcome before the tools can reap big rewards in an enterprise context.


In the case of The Home Depot, Ramamurthy says the company had the in-house talent and proof-of-concept studies to show that ML and computer vision could make a big difference.

The message for other digital and business leaders when it comes to exploiting emerging technology is to focus on testing and honing your approach.

"Our experience has been very iterative. Internally, we think of this as a 'crawl, walk, run approach' to delivering value. We've made tactical improvements and had challenges that we've had to overcome along the way," he says.

"But the iterative approach that we have taken has really helped us ensure we are able to deliver on the expectations. And at this point, we're happy with the results in terms of the performance and the overall experience for the associates."

Ramamurthy and his team continue to look for small iterations that will create big improvements to the Sidekick app.

He believes there's a lot more the company can do to not only ensure it generates the appropriate tasks for staff, but to focus on factors across every facet of the store, whether that's analyzing data from sales locations or considering the layout of the sales floor.


"Those are all areas for further exploration," he says. "In addition, we continue to look at how we can improve our statistical ML models and the quality of some of the tasks that we generate, especially when they're augmented with other signals that come through."

Ramamurthy says he's also keen to use the insight they glean from the Sidekick app to ensure associates have the right skills and resources as they complete their tasks.

"I think those are areas, both in terms of task generation and in terms of task delivery, where there's opportunity for further refinement," he says.


Will December bring startup winter?

Haje Jan Kamps

Welcome to Startups Weekly. Sign up here to get it in your inbox every Friday.

Borrowing has become more expensive, and profits are harder to come by, which means that 2023 has been a savage year in startup land. PitchBook data suggests that around 3,200 startups — representing a total of $27.2 billion in venture funding — have gone under, with a significant number of startups being in zombie mode: Unable to grow, unable to raise money, but just about limping by well enough to avoid shutdown. Layoffs are happening — also in December — and over the next couple of weeks, a bunch more startups will shut down, so as to not drag out the inevitable into a new tax year. I’ll be looking into this more deeply over the next couple of weeks, so stay tuned.

Also: I was supposed to be writing this newsletter but instead ended up playing the TechCrunch pub quiz for way too long. My score was . . . embarrassingly poor, considering that I’ve literally read every story on the site for the past year to write this newsletter. Still, it was fun — give it a whirl!

When the AIs come marching in


I love it when my colleagues are going super deep into nerd land. That’s definitely one of the hallmarks of Devin’s work from time to time; in this case, he talks about how “Star Trek: Deep Space Nine” fans are using AI to make the old show look better because there’s no official high-quality version. They’re using AI to add details to the original episodes, which is tough and takes a lot of effort — but it’s showing a bunch of promise. Devin concludes that the tech could be a cool way for companies to upgrade old shows, but there are some legal and technical hurdles to figure out. Don’t miss his 3,000-word ode to de-grainification.

The other AI-related nerd-out this week comes courtesy of Ron, who dug into the continued relevance of traditional AI models in enterprises, despite the rise of large language models (LLMs). That makes sense: LLMs are kind of the Leatherman of AI tools: They sort of do everything. I never leave the house without my Leatherman, and it has helped me out of many a knotty situation, but if I’m building a house or repairing a car, I break out the more specialized tool kit.

More startuppy AI news this week:

This really moved me: Just when you thought your online photos were safe, here comes Animate Anyone turning them into eerily lifelike, video deepfakes — because regular old photo fakes weren’t unsettling enough.

G-oops-le: Google’s new AI model Gemini isn’t exactly hitting it out of the park, with early users finding more bloopers than brilliance in its answers. Turns out, even Google can have an “oops” moment in the AI world.

The Pokémon approach to startups: Elon Musk, seemingly never tired of starting new ventures, is now chasing a cool $1 billion for his latest AI escapade — xAI — because why settle for running just a few companies when you can add another AI startup to your collection?

This week in Elon Times


Look, I’m as bored of Elon Musk as everyone else, but gotta give the guy credit for one thing: He doesn’t half attract some attention. Rarely for good reasons, recently, it must be said.

Darrell summarizes the situation in his piece “The end of Elon,” where he — tongue firmly planted in cheek and with the snark meter turned to 11 — dissects the Tesla Cybertruck launch (spoiler: It was a bit of a nothingburger; there’s still much unknown about the truck) and Musk’s, er, unique approach to managing his various ventures — including telling X (formerly Twitter) advertisers to go do something anatomically improbable.

Of course, there was (much) more Musk-related news this week, and if you want it all, give our Elon Musk tag a quick scroll.

What goes up . . . : SpaceX drops $2.2 million on a parachute company, because apparently making parachutes that don’t buckle in space is harder than rocket science.

Keep on truckin’: The Tesla Cyberbeast: Heavy, quick, and falling a bit short in towing compared to its high-priced electric rivals — but hey, who’s counting when you’re driving an angular beast?

Show me the money: X has scored licenses for payment processing in 12 U.S. states, inching closer to Musk’s vision of turning the platform into an “everything app.” With recent advertiser exits and controversies, it seems there’s more drama than dollars in Musk’s grand plan — for now.

Shutdown City


After the heyday of 2021, a bunch of startups are crashing to the ground after failing to meet their goals. Let’s have a moment of silence for some of our fallen-from-grace brethren:

To its final zesting place: Going from a zesty $450 million valuation to shutting down, even Goldman Sachs’ backing couldn’t spice up ZestMoney’s survival.

So close: Edtech company Doubtnut learns the hard way that a bird in the hand is worth two in the bush, selling for $10 million after passing up a $150 million deal from Byju’s.

Now, not so fab: From unicorn to extinct: Prefab home builder Veev proves that soaring to billion-dollar status doesn’t guarantee a sturdy foundation.

Top reads on TechCrunch this week

That not enough for ya? Fine, here’s a collection of the most-loved, most-read articles from the past week:

Is it a bird? Is it a plane?: Anduril’s new fighter jet weapon, Roadrunner, lands with the grace of a Falcon 9.

Pour me another one: MIT spinoff Liquid AI thinks it’s time for a change in the AI game with their new “liquid neural network,” because who needs another GPT clone when you can have AI inspired by worm brains and run on a Raspberry Pi . . .

Yeah, but will it wear a beanie hat?: Ex-SpaceX engineers are now saving the planet with a “vegetarian rocket engine,” because apparently shooting stuff into space wasn’t cool enough. Also, were previous rockets full of bacon? I’m confused.

It’s electrifying: GM and Toyota, welcome to the Oops, We Missed the EV Bus club!

Breaking kneecaps, and YouTube records: Grand Theft Auto VI just stole MrBeast’s YouTube crown, racking up more views in a day than a money-giving philanthropist could dream of.

AWS Unveils Major Bedrock Upgrade: More AI Models and Enhanced User Flexibility

December 8, 2023 by Drew Jolly

As the generative AI landscape continually evolves with new use cases emerging, Amazon Web Services (AWS) is keeping pace by enhancing its Bedrock platform. This upgrade significantly broadens the range of AI models available, offering users more choices and greater flexibility for their AI-driven applications.

The latest updates to Amazon Bedrock include an expanded selection of AI models from AI21 Labs, Anthropic, Cohere, Meta, and Stability AI, along with Amazon's in-house models. Additionally, Amazon has introduced advanced customization options, enabling users to precisely adjust existing models using their own proprietary data. This is complemented by new tools designed for efficient evaluation and comparison of models, which assists in pinpointing the most suitable model for specific requirements.

Commenting at AWS re:Invent 2023, Adam Selipsky, CEO of AWS, emphasized the cloud giant's comprehensive approach to AI model deployment and development. Selipsky highlighted the collaboration with Hugging Face, a leader in the AI research space, to deploy their models on Amazon SageMaker. This partnership has led to the creation of a Hugging Face AWS deep learning container designed to accelerate the training and deployment of foundation models using SageMaker, along with AWS's Trainium and Inferentia chips.

Selipsky stressed AWS's commitment to providing the resources necessary for building custom models. "The best chips, the most advanced virtualization, powerful petabyte-scale networking capabilities, hyperscale clustering and the right tools to help you build," he said.

Addressing the needs of organizations looking to quickly leverage powerful models, Selipsky acknowledged the challenges they face in selecting the right model for their specific applications. Questions about model selection, deployment speed, data security, and accuracy are top concerns for these organizations.

In response, AWS is investing significantly in "that middle layer in the stack," as Selipsky says. This investment aims to simplify the process of accessing and utilizing various foundation models, thereby enabling organizations to rapidly experiment, test, and deploy generative AI applications while ensuring data security and integrity.

Hype aside, generative AI is becoming integral to a few key business processes. AWS points out that industries such as customer service, content creation, and data analysis are increasingly relying on AI technologies to enhance efficiency and innovate services. AWS says that the Bedrock platform's expanded capabilities and model variety can be crucial to providing businesses with the tools to develop more sophisticated, AI-driven solutions that can adapt to their evolving needs.

With the increasing capabilities of AI models, ethical considerations and the responsible use of AI have become paramount. AWS says it is addressing these concerns by embedding robust security and privacy features into Bedrock, ensuring that users can innovate with AI while adhering to ethical standards and regulations.

In short, the Bedrock platform enhancements emphasize a key theme: choice in model selection and the freedom to experiment. By broadening the array of available AI models, AWS is empowering users with the flexibility to explore and select the most fitting AI solutions for their unique needs. This approach not only fosters a more tailored use of AI technology but also encourages innovative applications across different industries. As users navigate through the diverse options within Bedrock, they are better positioned to discover and leverage AI models that align with their specific goals and challenges.


Editor's note: This article originally appeared on Datanami.


UltraFastBERT: Exponentially Faster Language Modeling


Language models and generative AI, renowned for their capabilities, are a hot topic in the AI industry. Researchers around the world are working to improve their efficacy and capability. These systems, typically deep learning models, are pre-trained on extensive text data and incorporate neural networks with self-attention. They use various layers (feedforward, recurrent, embedding, and attention) to process input text and produce relevant outputs.

In large language models, the feedforward layers typically hold most of the parameters. Studies show that these models use only a fraction of the available neurons to compute the output during inference.

This article introduces UltraFastBERT, a BERT-based framework matching the efficacy of leading BERT models but using just 0.3% of neurons during inference, specifically 12 out of 4095 in each layer. We'll explore UltraFastBERT's architecture, functionality, and results. Let’s begin.

UltraFastBERT : An Introduction to Exponentially Faster Language Modeling

Traditionally, a language model employs different components to equip itself with content generation capabilities, including feedforward layers, recurrent layers, embedding layers, and attention layers. These components are responsible for learning to recognize patterns during training, and ultimately for generating accurate output on the basis of the input text. Each of these components has some parameters, and in language models, the bulk of these parameters is held by the feedforward layers. However, these feedforward layers do not use 100% of the neurons available to them to generate output for every input at inference time, which wastes resources and increases complexity, computation time, and computational costs.

At its core, the UltraFastBERT framework is a variant of the BERT framework that builds on this observation and replaces the feedforward layers in its architecture with fast feedforward networks. As a result, the UltraFastBERT framework uses only 0.3% of the available neurons while delivering results comparable to BERT models of a similar size and training process, especially on downstream tasks. Due to its design, the intermediate layers in the UltraFastBERT framework are exponentially faster.

Consider a fast feedforward (FFF) network and a feedforward (FF) network, each with n neurons. The time complexity of a forward pass is O(n) for the feedforward network but O(log2 n) for the fast feedforward network. The difference arises because the neurons in a fast feedforward network are organized into a balanced binary tree, and when an input is provided, the network conditionally executes only one branch of the tree. Performing inference on a fast feedforward network amounts to CMM, or Conditional Matrix Multiplication, in which the input rows are dotted with the neural weight columns one at a time, and the output of the previous dot-product operation determines which weight column to proceed with. Consequently, each neuron is used only by some of the inputs, and no input needs more than a handful of neurons to be handled by the network. CMM contrasts with DMM, or Dense Matrix Multiplication, which computes the dot product of all inputs with all the weight columns.
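To make the tree traversal more concrete, here is a minimal sketch of a conditional forward pass in plain Python. It assumes a single balanced binary tree stored in breadth-first order, a GeLU activation on every node, and a branch chosen by the sign of each node's dot product; the function names, the random weights, and the small input width of 8 are illustrative assumptions, not the framework's actual implementation.

```python
import math
import random

def gelu(z):
    """Exact GeLU via the Gaussian error function."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fff_forward(x, w_in, w_out, depth):
    """Conditional forward pass over one balanced binary tree of neurons.

    Nodes are stored in breadth-first order, so the children of node k
    sit at indices 2k + 1 and 2k + 2. Only depth + 1 nodes are evaluated.
    """
    y = [0.0] * len(w_out[0])
    node = 0
    visited = []
    for _ in range(depth + 1):
        visited.append(node)
        logit = dot(x, w_in[node])            # one row-column dot product
        act = gelu(logit)
        y = [yi + act * wo for yi, wo in zip(y, w_out[node])]
        # The sign of the previous dot product picks the next weight column.
        node = 2 * node + 1 + (1 if logit > 0 else 0)
    return y, visited

# Hypothetical sizes: depth 11 gives 2**12 - 1 = 4095 neurons.
random.seed(0)
DEPTH, WIDTH = 11, 8
N_NODES = 2 ** (DEPTH + 1) - 1
w_in = [[random.gauss(0, 1) for _ in range(WIDTH)] for _ in range(N_NODES)]
w_out = [[random.gauss(0, 1) for _ in range(WIDTH)] for _ in range(N_NODES)]
x = [random.gauss(0, 1) for _ in range(WIDTH)]

y, visited = fff_forward(x, w_in, w_out, DEPTH)
print(len(visited), "of", N_NODES, "neurons evaluated")
```

A dense layer would evaluate all 4095 dot products for the same input, whereas this loop touches one node per tree level.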

To sum it up, UltraFastBERT is a BERT-based framework that provides results comparable to state-of-the-art BERT language models and that

  1. Uses only 0.3% of the available neurons during inference, engaging just 12 neurons out of a total of 4095 in each layer.
  2. Delivers strong performance comparable to state-of-the-art BERT models by implementing fine-tuning strategies on downstream tasks.
  3. Provides a native implementation of CMM, or Conditional Matrix Multiplication, which forms the base of the fast feedforward network and ultimately leads to a 78x speedup when compared to native optimized DMM, or Dense Matrix Multiplication.

Feed Forward Neural Networks

A feedforward neural network is one of the most straightforward artificial neural networks: it moves information in only the forward direction, from the input nodes to the output nodes via hidden nodes. One of the main highlights of a feedforward neural network is that it contains no loops or cycles, and it is simpler to construct than an RNN (Recurrent Neural Network) or a CNN (Convolutional Neural Network). The architecture of a feedforward neural network comprises three components, namely input layers, hidden layers, and output layers. Every layer consists of units called neurons, and each layer is connected to the next by weights.

The neurons in the input layer receive the inputs and forward them to the next layer. The number of neurons in the input layer is determined by the dimension of the input data. Next up, we have the hidden layers, which are not exposed to either the input or the output and are responsible for the necessary computations. The neurons in each hidden layer take the weighted sum of the outputs of the previous layer, apply an activation function, and pass the result to the next layer, and the process repeats all over again. Finally, we have the output layer, which produces the output for the given inputs. Each neuron in every layer of a feedforward network is connected to every neuron in the next layer, which makes feedforward neural networks fully connected. Weights represent the strength of the connection between neurons, and the network learns patterns by updating these weights on the basis of the error in the output.

Moving forward, there are two key stages in the working of a feedforward neural network: the feedforward phase and the backpropagation phase.

Feedforward Phase

In the feedforward phase, the input is fed to the network and then propagates forward. The hidden layers compute the weighted sum of their inputs and introduce non-linearity into the model by passing that sum through an activation function such as ReLU, sigmoid, or tanh. The process repeats until the signal reaches the output layer and the model makes a prediction.

Backpropagation Phase

Once the model makes a prediction, it computes the error between the generated output and the expected output. The error is then backpropagated through the network, and the network uses a gradient descent optimization algorithm to adjust the weights in an attempt to minimize the error.
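To see the two phases side by side, here is a minimal sketch in plain Python. It assumes a toy network with two inputs, three hidden ReLU neurons, one linear output, a squared-error loss, and a single made-up training example; every size, value, and name here is an illustrative assumption and has nothing to do with the UltraFastBERT code.

```python
import random

random.seed(0)

N_IN, N_HID = 2, 3
# Small random weights; biases start at zero.
W1 = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HID)]
b1 = [0.0] * N_HID
W2 = [random.uniform(-0.5, 0.5) for _ in range(N_HID)]
b2 = [0.0]  # one-element list so train_step can update it in place

def forward(x):
    """Feedforward phase: weighted sums, ReLU, then a linear output."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    hid = [max(0.0, p) for p in pre]
    out = sum(w * h for w, h in zip(W2, hid)) + b2[0]
    return pre, hid, out

def train_step(x, target, lr=0.1):
    """Backpropagation phase followed by one gradient descent update."""
    pre, hid, out = forward(x)
    err = out - target  # derivative of 0.5 * err**2 with respect to out
    # Push the error back through the output weights and the ReLU.
    grad_hid = [err * w * (1.0 if p > 0 else 0.0) for w, p in zip(W2, pre)]
    for j in range(N_HID):
        W2[j] -= lr * err * hid[j]
        for i in range(N_IN):
            W1[j][i] -= lr * grad_hid[j] * x[i]
        b1[j] -= lr * grad_hid[j]
    b2[0] -= lr * err
    return 0.5 * err * err  # loss measured before this update

x, target = [0.5, -0.3], 1.0
losses = [train_step(x, target) for _ in range(50)]
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

Each call runs one forward pass and one weight update, so the recorded loss should shrink as the prediction moves toward the target.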

UltraFastBERT : Model Architecture and Working

The UltraFastBERT framework is built on the crammedBERT architecture and employs all the components of the crammedBERT framework except for the nature of the intermediate layers. Instead, the UltraFastBERT framework replaces the feedforward networks contained in the intermediate layers of the crammedBERT transformer encoder with fast feedforward networks. The UltraFastBERT framework makes the following changes to the original fast feedforward networks.

  1. The framework removes the difference between leaf and non-leaf nodes by using the GeLU activation function across all nodes, equipping all nodes with output weights, and removing output biases entirely. After this, the framework fixes the leaf size to 1.
  2. Finally, the framework allows multiple fast feedforward network trees in parallel by jointly computing the intermediate layer outputs. It does this by summing the outputs of the individual trees and presenting the sum as the intermediate layer output.

Moving along, in training, the UltraFastBERT framework follows the training procedure employed by the crammedBERT framework, which includes disabling dropout in pretraining and using the 1-cycle triangular learning rate schedule. The model is then fine-tuned for a total of 5 epochs to maximize its performance on a wide array of tasks, mainly from the GLUE benchmark.

Inference

Feedforward layers account for a major share of the parameters in large language models, which gives fast feedforward networks exceptional acceleration potential at inference time. To understand this potential, let's take the example of one of the most advanced language models, GPT-3, in which the feedforward networks in every transformer layer consist of over 49,100 neurons. If trainable, a fast feedforward network with a maximum depth of 15 could replace the original feedforward network. The introduced fast feedforward network would have over 65,000 neurons, but it would use only 16 of them for inference, which amounts to roughly 0.03% of the neurons available to GPT-3.
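The neuron counts quoted here follow from the shape of a balanced binary tree, and a short calculation can reproduce them. The sketch below assumes a tree of depth d has 2^(d+1) - 1 neurons and evaluates d + 1 of them per input, and it assumes GPT-3's feedforward width is 49,152 (four times a hidden size of 12,288); the helper names are hypothetical.

```python
def fff_usage(depth):
    """Return (total neurons, neurons used per input) for one tree."""
    total = 2 ** (depth + 1) - 1
    used = depth + 1
    return total, used

def min_depth_covering(n_neurons):
    """Smallest depth whose tree holds at least n_neurons neurons."""
    depth = 0
    while 2 ** (depth + 1) - 1 < n_neurons:
        depth += 1
    return depth

# Assumed feedforward width of GPT-3: 4 x 12288.
GPT3_FF_NEURONS = 49152

bert_total, bert_used = fff_usage(11)
gpt3_depth = min_depth_covering(GPT3_FF_NEURONS)
gpt3_total, gpt3_used = fff_usage(gpt3_depth)

print(f"depth 11: {bert_used} of {bert_total} neurons "
      f"({bert_used / bert_total:.2%})")
print(f"depth {gpt3_depth}: {gpt3_used} of {gpt3_total} neurons, "
      f"{gpt3_used / GPT3_FF_NEURONS:.3%} of GPT-3's {GPT3_FF_NEURONS}")
```

Under these assumptions, depth 11 yields the 12-of-4095 figure, and depth 15 is the shallowest tree that covers GPT-3's feedforward width.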

Algorithm and Compatibility

The UltraFastBERT framework makes use of a recursive algorithm, given as pseudocode, for fast feedforward inference, and the algorithm is depicted in the image below.

Here, B represents the batch size, H represents the width of the input layers, and M refers to the columns of the weight matrix. A major concern with the Conditional Matrix Multiplication approach is whether it makes fast feedforward networks incompatible with the processes already in use for Dense Matrix Multiplication and with existing deep learning frameworks. Fortunately, the use of CMM neither affects the performance nor introduces incompatibility, although it does increase the caching complexity.

It's vital to note that, as a part of feedforward inference, single-threaded Dense Matrix Multiplication relies on executing MAC (Multiplication and Accumulation) instructions. Replacing DMM with the CMM approach therefore benefits CPUs, because fewer MAC instructions are needed to compute the layer output per element. Moreover, despite employing a conditionality that is usually associated with branching, the “neural branching” acts only as an addition to the memory offset of the relevant pointers in the framework. As a result, instruction branch prediction is never fully engaged to facilitate the conditionality of the CMM, and the framework only loads the relevant columns of the weight matrix individually. Furthermore, as the framework performs row-column dot products, SIMD (single instruction, multiple data) vector parallel processing is still a good option for speeding up the inference implementations on specific devices.
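As a rough illustration of why fewer MAC instructions are needed, the sketch below counts multiply-accumulate operations per input for a dense layer and for a conditional pass. It assumes a hidden width of 768 (the usual BERT-base size, assumed here rather than taken from the article), a tree of depth 11, and one input and one output projection per neuron; the function names are hypothetical, and the count ignores activation functions, memory access, and caching.

```python
def dense_macs(hidden_width, n_neurons):
    """MACs for one input when every neuron is evaluated (DMM)."""
    # Input projection plus output projection for each neuron.
    return 2 * hidden_width * n_neurons

def conditional_macs(hidden_width, depth):
    """MACs for one input when only one tree path is evaluated (CMM)."""
    return 2 * hidden_width * (depth + 1)

H = 768                      # assumed hidden width
DEPTH = 11                   # tree depth matching 4095 neurons
N = 2 ** (DEPTH + 1) - 1     # total neurons in the tree

ratio = dense_macs(H, N) / conditional_macs(H, DEPTH)
print(f"dense: {dense_macs(H, N):,} MACs, "
      f"conditional: {conditional_macs(H, DEPTH):,} MACs, "
      f"ratio: {ratio:.2f}x")
```

This is an idealized operation count, so it is an upper bound on the gain rather than a prediction; measured speedups depend on how well the conditional column loads map onto the hardware.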

UltraFastBERT : Performance and Results

We will look at the performance of the UltraFastBERT framework on fine-tuning as well as on inference tasks to analyze how the framework fares against state-of-the-art language models.

Fine-Tuning Results

The following figure shows the performance of various models on the GLUE-dev datasets. Here, N represents the number of neurons available to the frameworks for training, and “Avg” represents the average score across all tasks.

As can be clearly seen, the UltraFastBERT framework, which was trained on an A6000 GPU for over 24 hours, manages to retain almost 96% of the predictive performance on GLUE downstream tasks when compared to the original BERT framework. It can also be seen that performance declines as the depth of the fast feedforward networks increases, although the majority of the degradation occurs only on the CoLA task. If the CoLA task is set aside, the UltraFastBERT framework returns a predictive performance score of about 98.6%.

Inference Results

In this section, we will compare the performance of several feedforward and fast feedforward networks on inference implementations, which are spread across three levels.

  1. In Level 1, the implementation is constructed using BLAS Level 1 routines, namely scalar-vector products and vector-vector dot products.
  2. In Level 2, the implementations make use of BLAS Level 2 routines, namely batched scalar-vector products and batched matrix-vector dot products.
  3. In Level 3, the implementations employ the non-batched BLAS Level 3 matrix-matrix multiplication approach. Although it is the fastest implementation available for feedforward networks, such implementations are not available for fast feedforward networks because the library does not support the vector-level sparsity of Conditional Matrix Multiplication.

Additionally, the UltraFastBERT framework deploys GPU implementations by using either custom CUDA or PyTorch kernels.

The above table compares the performance of the UltraFastBERT framework with its predecessors, the BERT-based frameworks, in terms of feedforward and fast feedforward layers. Every column contains the relative inference speedup of the fast feedforward implementation over the feedforward implementation when both use the same linear-algebraic routine primitives.

However, it is worth noting that the speedups reported in the above table are meant as “fair comparisons,” i.e., both the fast feedforward and feedforward implementations use identical linear-algebraic routine primitives. Furthermore, on Level 1 and Level 2, the fast feedforward implementations perform inference 48x and 78x faster, respectively, than the fastest feedforward implementation.

Final Thoughts

In this article, we have talked about the UltraFastBERT, a variant of the BERT framework, builds on the concept that feedforward layers do not utilize 100% of the neurons available to them to generate output for every input at interference time which leads to wastage of resources that increases complexity, computation time, and computational costs, and replaces feedforward layers with faster feedforward networks in its architecture that ultimately results in the UltraFastBERT framework utilizing only 0.3% of the available neurons while delivering results comparable to BERT models with a similar size and training process, especially on the downstream tasks.

Due to its design, the intermediate layers in the UltraFastBERT framework are exponentially faster. Furthermore, the strong performance delivered by the UltraFastBERT framework is evidence that LLMs can perform well while engaging only a fraction of their parameters for individual inferences: the framework uses only 0.3% of the available neurons during inference, and yet achieves a 78x speedup in inference time.

Google’s NotebookLM gets a dozen new features, including a Gemini Pro upgrade

In July, Google launched NotebookLM, a notetaking software with a large language model at its core to act as a personalized AI collaborator for all of your notetaking needs. Now, Google is expanding the platform's offerings with a dozen new features, including an upgrade to Google's most advanced large language model (LLM).

On Friday, Google announced that NotebookLM is starting to use Gemini Pro, a version of Google's most advanced LLM, Gemini, which was released earlier this week.

Gemini was released in three different sizes, Ultra, Pro, and Nano, to make Gemini suitable for different tasks. Gemini Pro is being used for NotebookLM because it is the best size for scaling across a wide range of tasks, according to Google.

Alongside the LLM upgrade, Google added a dozen new features to optimize the platform based on early tester feedback, with the most notable being a new noteboard space, suggested actions, and formats for different writing projects.

The new noteboard space gives users a place to save excerpts from their sources, their own written notes, or interesting exchanges with NotebookLM, such as quotes from chats.

The suggested actions feature dynamically suggests actions to users based on what they are currently doing. For example, if a user selects a passage while reading a source, NotebookLM will automatically offer to summarize the text, according to Google.

NotebookLM also has new formats that can be used to transform users' notes into structured documents. All the user has to do is select a set of notes and ask NotebookLM to create something new. To make the process even easier, the platform will automatically suggest formats such as "create a study guide" or "create an outline."

For the full list of the new features, you can visit the latest release notes here.

Since July, NotebookLM has been available as an experimental product in Labs, and now Google is expanding its availability to users in the US ages 18 and up. If you want to try it for yourself, you can visit NotebookLM or Google Labs to get started.

X’s AI chatbot Grok now ‘rolled out to all’ US Premium+ subscribers, English language users are next

Sarah Perez @sarahintampa

Yesterday, X began rolling out Grok, the “rebellious” AI chatbot developed by Elon Musk’s xAI startup, to Premium+ subscribers on X’s platform. Today, Musk says that Grok’s rollout to all U.S. Premium+ subscribers is now complete, but cautioned that the beta would face many issues, though it would be steadily improved. He also offered a timeframe for Grok reaching other markets beyond the U.S., noting that all English language users (who subscribe to Premium+) would gain access to Grok in “about a week or so.”

Japanese users, who make up X’s second-largest user base, would then follow, with the aim of bringing Grok to “hopefully” all languages by “early 2024,” the X owner said.

Grok AI (beta) is now rolled out to all 𝕏 Premium+ subscribers in the US.

There will be many issues at first, but expect rapid improvement almost every day. Your feedback is much appreciated.

Will expand to all English language users in about a week or so. Japanese is next…

— Elon Musk (@elonmusk) December 8, 2023

Of course, Musk’s timeframes for when things will happen don’t always come to pass; just ask any longtime Tesla watcher who spent time waiting on full self-driving (FSD). However, with Grok, Musk has only been a little behind in terms of his launch estimates. On November 22nd, for instance, Musk said that xAI’s Grok would launch to Premium+ subscribers “next week,” which would have been the week of Nov. 27, but the chatbot actually launched this week instead, on Dec. 7.

Whether or not the chatbot is a success in driving subscription revenue for X still remains to be seen. For now, Grok is only part of X’s top-tier subscription offering — the $16 per month Premium+. That’s much pricier than X’s Basic ($3/mo) and Premium ($8/mo) options, and it’s not clear it will appeal to casual AI dabblers who can use rival chatbots like ChatGPT or Google’s Bard for free.

The Premium+ subscription comes with access to other features to broaden its appeal, including the benefit of seeing no advertisements in the For You and Following timelines on X. Premium+ users also have their replies boosted the most, in addition to all of Premium’s features like ads revenue sharing for creators, ID verification, a verified checkmark, access to Media Studio, and more.

But so far, advertising, not subscriptions, has driven the majority of revenue at X, formerly Twitter.

Yet now, it’s unclear what X’s ad-supported future may look like as Musk has been alienating X’s advertisers — even telling them to “fuck yourself” for leaving X over concerns of antisemitic content on the site. For X to be sustainable, it may need a larger number of users to subscribe to Premium+ for Grok to help shore up the loss of ad dollars as brands like Apple, Disney, IBM, Paramount, Walmart, and others flee the platform.

Of note, X had its largest-ever month for subscription revenue in November, pulling in $6.2 million in net revenue, after app store fees, according to one estimate by app intelligence provider Apptopia. However, that’s still less than a third of what Snapchat made for its own in-app subscription, which topped $20 million for the first time last month.

In other words, there’s still plenty of room for X to grow subscribers, given it reportedly has more than 500 million monthly active users. Whether or not it can, of course, is another story.

Revolutionizing Healthcare: Exploring the Impact and Future of Large Language Models in Medicine

Large Language Models in Medicine

The integration and application of large language models (LLMs) in medicine and healthcare has been a topic of significant interest and development.

As noted at the Healthcare Information and Management Systems Society (HIMSS) global conference and other notable events, companies like Google are leading the charge in exploring the potential of generative AI within healthcare. Their initiatives, such as Med-PaLM 2, highlight the evolving landscape of AI-driven healthcare solutions, particularly in areas like diagnostics, patient care, and administrative efficiency.

Google's Med-PaLM 2, a pioneering LLM in the healthcare domain, has demonstrated impressive capabilities, notably achieving an “expert” level in U.S. Medical Licensing Examination-style questions. This model, and others like it, promise to revolutionize the way healthcare professionals access and utilize information, potentially enhancing diagnostic accuracy and patient care efficiency.

However, alongside these advancements, concerns about the practicality and safety of these technologies in clinical settings have been raised. For instance, the reliance on vast internet data sources for model training, while beneficial in some contexts, may not always be appropriate or reliable for medical purposes. As Nigam Shah, PhD, MBBS, Chief Data Scientist for Stanford Health Care, points out, the crucial questions to ask are about the performance of these models in real-world medical settings and their actual impact on patient care and healthcare efficiency.

Dr. Shah's perspective underscores the need for a more tailored approach to utilizing LLMs in medicine. Instead of general-purpose models trained on broad internet data, he suggests a more focused strategy where models are trained on specific, relevant medical data. This approach resembles training a medical intern – providing them with specific tasks, supervising their performance, and gradually allowing for more autonomy as they demonstrate competence.

In line with this, the development of Meditron by EPFL researchers presents an interesting advancement in the field. Meditron, an open-source LLM specifically tailored for medical applications, represents a significant step forward. Trained on curated medical data from reputable sources like PubMed and clinical guidelines, Meditron offers a more focused and potentially more reliable tool for medical practitioners. Its open-source nature not only promotes transparency and collaboration but also allows for continuous improvement and stress testing by the wider research community.

MEDITRON-70B achieves an accuracy of 70.2% on USMLE-style questions in the MedQA (4 options) dataset

The development of tools like Meditron, Med-PaLM 2, and others reflects a growing recognition of the unique requirements of the healthcare sector when it comes to AI applications. The emphasis on training these models on relevant, high-quality medical data, and on ensuring their safety and reliability in clinical settings, is crucial.

Moreover, the inclusion of diverse datasets, such as those from humanitarian contexts like the International Committee of the Red Cross, demonstrates a sensitivity to the varied needs and challenges in global healthcare. This approach aligns with the broader mission of many AI research centers, which aim to create AI tools that are not only technologically advanced but also socially responsible and beneficial.

The paper titled “Large language models encode clinical knowledge,” recently published in Nature, explores how large language models (LLMs) can be effectively utilized in clinical settings. The research presents new insights and methodologies, shedding light on the capabilities and limitations of LLMs in the medical domain.

The medical domain is characterized by its complexity, with a vast array of symptoms, diseases, and treatments that are constantly evolving. LLMs must not only understand this complexity but also keep up with the latest medical knowledge and guidelines.

The core of this research revolves around a newly curated benchmark called MultiMedQA. This benchmark amalgamates six existing medical question-answering datasets with a new dataset, HealthSearchQA, which comprises medical questions frequently searched online. This comprehensive approach aims to evaluate LLMs across various dimensions, including factuality, comprehension, reasoning, possible harm, and bias, thereby addressing the limitations of previous automated evaluations that relied on limited benchmarks.

MultiMedQA, a benchmark for answering medical questions spanning medical exam

Key to the study is the evaluation of the Pathways Language Model (PaLM), a 540-billion parameter LLM, and its instruction-tuned variant, Flan-PaLM, on the MultiMedQA. Remarkably, Flan-PaLM achieves state-of-the-art accuracy on all the multiple-choice datasets within MultiMedQA, including a 67.6% accuracy on MedQA, which comprises US Medical Licensing Exam-style questions. This performance marks a significant improvement over previous models, surpassing the prior state of the art by more than 17%.

MedQA

The MedQA dataset features questions styled after the USMLE, each with four or five answer options. It includes a development set with 11,450 questions and a test set comprising 1,273 questions.

Format: question and answer (Q + A), multiple choice, open domain.

Example question: A 65-year-old man with hypertension comes to the physician for a routine health maintenance examination. Current medications include atenolol, lisinopril, and atorvastatin. His pulse is 86 min−1, respirations are 18 min−1, and blood pressure is 145/95 mmHg. Cardiac examination reveals end diastolic murmur. Which of the following is the most likely cause of this physical examination?

Answers (correct answer: D): (A) Decreased compliance of the left ventricle, (B) Myxomatous degeneration of the mitral valve, (C) Inflammation of the pericardium, (D) Dilation of the aortic root, (E) Thickening of the mitral valve leaflets.

The study also identifies critical gaps in the model's performance, especially in answering consumer medical questions. To address these issues, the researchers introduce a method known as instruction prompt tuning. This technique efficiently aligns LLMs to new domains using a few exemplars, resulting in the creation of Med-PaLM. The Med-PaLM model, though it performs encouragingly and shows improvement in comprehension, knowledge recall, and reasoning, still falls short compared to clinicians.

A notable aspect of this research is the detailed human evaluation framework. This framework assesses the models' answers for agreement with scientific consensus and potential harmful outcomes. For instance, while only 61.9% of Flan-PaLM’s long-form answers aligned with scientific consensus, this figure rose to 92.6% for Med-PaLM, comparable to clinician-generated answers. Similarly, the potential for harmful outcomes was significantly reduced in Med-PaLM's responses compared to Flan-PaLM.

The human evaluation of Med-PaLM's responses highlighted its proficiency in several areas, aligning closely with clinician-generated answers. This underscores Med-PaLM’s potential as a supportive tool in clinical settings.

The research discussed above delves into the intricacies of enhancing Large Language Models (LLMs) for medical applications. The techniques and observations from this study can be generalized to improve LLM capabilities across various domains. Let's explore these key aspects:

Instruction Tuning Improves Performance

  • Generalized Application: Instruction tuning, which involves fine-tuning LLMs with specific instructions or guidelines, has been shown to significantly improve performance across various domains. This technique could be applied to other fields such as legal, financial, or educational domains to enhance the accuracy and relevance of LLM outputs.

Scaling Model Size

  • Broader Implications: The observation that scaling the model size improves performance is not limited to medical question answering. Larger models, with more parameters, have the capacity to process and generate more nuanced and complex responses. This scaling can be beneficial in domains like customer service, creative writing, and technical support, where nuanced understanding and response generation are crucial.

Chain of Thought (COT) Prompting

  • Diverse Domains Utilization: The use of COT prompting, although not always improving performance in medical datasets, can be valuable in other domains where complex problem-solving is required. For instance, in technical troubleshooting or complex decision-making scenarios, COT prompting can guide LLMs to process information step-by-step, leading to more accurate and reasoned outputs.

Self-Consistency for Enhanced Accuracy

  • Wider Applications: The technique of self-consistency, where multiple outputs are generated and the most consistent answer is selected, can significantly enhance performance in various fields. In domains like finance or legal where accuracy is paramount, this method can be used to cross-verify the generated outputs for higher reliability.
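As a rough illustration of the voting step in self-consistency, the sketch below takes several answers sampled for the same question and keeps the most frequent one. The sampled answers are hard-coded stand-ins for model outputs, and `self_consistent_answer` is a hypothetical helper name, not something from the paper.

```python
from collections import Counter

def self_consistent_answer(sampled_answers):
    """Return the most frequent answer and the share of samples that agree."""
    # Normalize so that "d", "D " and "D" count as the same vote.
    counts = Counter(a.strip().upper() for a in sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)

# Stand-in for seven independently sampled answers to one multiple-choice question.
samples = ["D", "d", "A", "D ", "E", "D", "A"]
answer, agreement = self_consistent_answer(samples)
print(answer, round(agreement, 2))
```

In practice each sample would come from a separate decoding pass at a non-zero temperature, so that different reasoning paths get a chance to vote.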

Uncertainty and Selective Prediction

  • Cross-Domain Relevance: Communicating uncertainty estimates is crucial in fields where misinformation can have serious consequences, like healthcare and law. Using LLMs' ability to express uncertainty and selectively defer predictions when confidence is low can be a crucial tool in these domains to prevent the dissemination of inaccurate information.
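One simple way to sketch selective prediction is to treat agreement among several sampled answers as a confidence proxy and defer when it falls below a cutoff. The 0.6 threshold, the helper name `answer_or_defer`, and the hard-coded samples are all assumptions made for illustration.

```python
from collections import Counter

def answer_or_defer(sampled_answers, threshold=0.6):
    """Return (answer, confidence), with answer set to None when deferring."""
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    confidence = votes / len(sampled_answers)
    if confidence < threshold:
        # Too little agreement: hand the question to a human instead.
        return None, confidence
    return answer, confidence

confident = answer_or_defer(["D"] * 8 + ["A"] * 2)
unsure = answer_or_defer(["A", "B", "C", "D", "A"])
print(confident, unsure)
```

Raising the threshold trades coverage for accuracy: fewer questions get an automatic answer, but those that do are backed by stronger agreement.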

The real-world application of these models extends beyond answering questions. They can be used for patient education, assisting in diagnostic processes, and even in training medical students. However, their deployment must be carefully managed to avoid reliance on AI without proper human oversight.

As medical knowledge evolves, LLMs must also adapt and learn. This requires mechanisms for continuous learning and updating, ensuring that the models remain relevant and accurate over time.

Apple Offers Developers MLX Framework for Machine Learning

While mostly staying out of the generative AI competition, Apple has released an open source array framework on GitHub for building machine learning transformer language models and text generation AI on the company’s own silicon.

What is Apple’s MLX framework?

MLX is a set of tools for developers who are building AI models, including transformer language model training, large-scale text generation, text fine-tuning, generating images and speech recognition on Apple silicon. Apple machine learning research scientist Awni Hannun announced the MLX machine learning framework on X (formerly Twitter) on Dec. 5.

MLX uses Meta’s LLaMA for text generation and low-rank adaptation (LoRA) for text fine-tuning. MLX’s image generation is based on Stability AI’s Stable Diffusion, while MLX’s speech recognition hooks up to OpenAI’s Whisper.

MLX is intended to be familiar to deep learning researchers

MLX was inspired by NumPy, PyTorch, Jax and ArrayFire, but unlike its inspirations it is intended to keep arrays in shared memory, according to the MLX page on GitHub. Currently supported devices, which are CPUs and GPUs for now, can run MLX on-device without creating data copies.

MLX’s Python API should be familiar to developers who already know how to use NumPy, the Apple team said on GitHub; developers can also use MLX through a C++ API that mirrors the Python API. Other APIs similar to those used in PyTorch aim to simplify building complex machine learning models. Composable function transformations are built in, Apple said, meaning differentiation, vectorization and computation graph optimization can be done automatically. Computations in MLX are lazy as opposed to eager, meaning arrays only materialize when needed. Apple claims computation graphing and debugging are “simple and intuitive.”
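The difference between lazy and eager computation can be shown with a toy example in plain Python. This is not MLX code and does not use MLX's API; the `Lazy` class and its methods are hypothetical and only illustrate how an expression can build a graph first and compute values later.

```python
class Lazy:
    """A node in a tiny computation graph; nothing runs until eval() is called."""

    def __init__(self, fn, *parents):
        self.fn = fn
        self.parents = parents
        self.value = None
        self.evaluated = False

    @staticmethod
    def constant(values):
        # Leaf node that already holds its data.
        node = Lazy(None)
        node.value = list(values)
        node.evaluated = True
        return node

    def __add__(self, other):
        return Lazy(lambda a, b: [x + y for x, y in zip(a, b)], self, other)

    def __mul__(self, other):
        return Lazy(lambda a, b: [x * y for x, y in zip(a, b)], self, other)

    def eval(self):
        # Materialize this node, evaluating its inputs first if needed.
        if not self.evaluated:
            args = [p.eval() for p in self.parents]
            self.value = self.fn(*args)
            self.evaluated = True
        return self.value

a = Lazy.constant([1.0, 2.0, 3.0])
b = Lazy.constant([4.0, 5.0, 6.0])
c = a * b + a            # builds a graph; no arithmetic has happened yet
pending = c.evaluated    # still False at this point
result = c.eval()        # the multiply and add run only now
print(pending, result)
```

Deferring the work this way gives a framework the chance to inspect or optimize the whole graph before running it, which is the general motivation for lazy designs.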

“The framework is intended to be user-friendly, but still efficient to train and deploy models,” the Apple developers wrote on GitHub. “The design of the framework itself is also conceptually simple. We intend to make it easy for researchers to extend and improve MLX with the goal of quickly exploring new ideas.”

NVIDIA AI research scientist Jim Fan wrote on LinkedIn on Dec. 6: “The release did an excellent job on designing an API familiar to the deep learning audience, and showing minimalistic examples on OSS models that most people care about: Llama, LoRA, Stable Diffusion, and Whisper.”

Apple’s place in the competitive AI landscape

Apple – which has had its artificial intelligence assistant Siri since well before the generative AI craze – seems to be focused on the tools to make large language models instead of producing the models themselves and the chatbots that can be built with them. However, Bloomberg’s Mark Gurman reported on Oct. 22, 2023 that “…Apple executives were caught off guard by the industry’s sudden AI fever and have been scrambling since late last year to make up for lost time,” and that Apple is working on upcoming generative AI features for iOS and Siri. Compare Apple to Google, which recently released its powerful Gemini large language model on the Pixel 8 Pro and in the Bard conversational AI. Google is still lagging behind its rival OpenAI in terms of widespread generative AI functionality.

Note: TechRepublic has reached out to Apple for more information about MLX. This article will be updated with more information based on Apple’s response.

5 Super Cheat Sheets to Master Data Science

Data science is a vast field, combining elements of statistics, machine learning, and data analysis. To navigate this complex domain, having a set of handy cheat sheets can be immensely helpful.

The cheat sheets can also serve as a valuable resource for preparing for technical interviews, reviewing key concepts, and providing an overview for beginners starting their careers in data science.

Here are five super cheat sheets that every data science professional and enthusiast should have:

1. Data Science Max Pro Cheat Sheet

Link: Data-Science-Cheatsheet/data-science-cheatsheet.pdf

This comprehensive 9-page reference covers the basics of probability, statistics, statistical learning, machine learning, big data frameworks, and SQL. Ideal for those with a basic understanding of statistics and linear algebra, it's a great starting point for anyone diving into data science.

2. Probability and Statistics Cheat Sheet by Stanford

Link: CME 106 (stanford.edu)

This cheat sheet is a concise summary of key concepts in probability and statistics. It includes topics like random samples, estimators, the Central Limit Theorem, confidence intervals, hypothesis testing, regression analysis, correlation coefficients, and more. It's perfect for understanding the foundational statistical concepts that are crucial in data science.

3. Data Science Cheat Sheet 2.0

Link: aaronwangy/Data-Science-Cheatsheet

This cheat sheet is a condensed version of data science knowledge, encompassing over a semester's worth of introductory machine learning based on MIT's Machine Learning courses 6.867 and 15.072. It covers topics such as linear and logistic regression, decision trees, SVM, K-Nearest Neighbors, and more. The cheat sheet is a valuable resource for exam reviews, interview preparation, and a quick refresher on key machine learning concepts.

4. Super Machine Learning Cheat Sheet

Link: afshinea/stanford-cs-229-machine-learning

This cheat sheet summarizes the key concepts covered in Stanford's CS 229 Machine Learning course. It includes refreshers on related topics (Probabilities and Statistics, Algebra, and Calculus), detailed cheat sheets for each machine learning field, and an ultimate compilation of important concepts. It's an essential resource for anyone interested in delving deeper into machine learning. It's designed for experts and provides a quick reference for basic concepts.

5. Super Deep Learning Cheat Sheet

Link: afshinea/stanford-cs-230-deep-learning

If you're interested in deep learning, the CS 230 course from Stanford has an excellent collection of materials that cover everything you need to know about convolutional neural networks and recurrent neural networks and offers tips for training deep learning models. This resource is invaluable for anyone focusing on the deep learning aspect of data science, and it is FREE.

Conclusion

These cheat sheets offer a concise and effective way to review and strengthen your understanding across data science disciplines. From the basics of statistics to the intricacies of machine learning and deep learning, these resources are invaluable for students, professionals, and enthusiasts alike. Refer to them often to solidify foundational concepts or brush up on the latest methodologies.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.

How Redis Finds Moat in the Indian Market

India accounts for approximately 12 million Redis downloads per day, making it the third-largest adopter in the world, close behind the United States and China.

Within this user base, Bengaluru (often referred to as the Silicon Valley of India) stands out as a hub for Redis adoption, with the highest adoption rate globally, followed by Mumbai and other cities.

The recent surge in Redis adoption across India has been the result of the boom of generative AI. The country actively engages in advancing its capabilities in this domain, recognising the pivotal role that the database plays in shaping these foundational models.

During Redis’ cofounder and chief tech officer Yiftach Shoolman’s recent visit to India, AIM caught up with the visionary tech veteran to understand the application of Redis by Indian companies, shedding light on the future plans of the Redis team, addressing the prevailing AI hype, and more.

How Indian Firms are Using Redis

“Every month, half a million people use Redis Enterprise in India alone,” Shoolman told AIM.

Adding to this, Dishank Nagpal, country manager at Redis, used case studies to explain how a diverse set of companies, ranging from media to trading, are leveraging Redis. Groww, Purplle, Apna, AngelOne, and Zee Entertainment are some of its customers.

For example, job search platform Apna which caters to tier-two and tier-three markets manages the creation and storage of job seekers’ profiles and provides real-time feeds of relevant job opportunities based on factors like location, experience, and availability with Redis. This real-time approach has proven effective in connecting blue-collar workers with suitable employment, addressing the challenge of finding relevant jobs in these markets.

Many blue-collar workers lack internet experience but possess specific skills. So using Redis, Apna has developed an AI model that automatically generates sentences describing the abilities of these workers, aiding in articulating the capabilities of first-time internet users in a fully automated manner, leveraging Redis for search, personalisation, scalability, and augmentation.

Another customer, the media house Zee Entertainment, faced significant challenges operating across five regions. It employed Redis Enterprise’s active-active solution for distributed caching, allowing updates made in one region to be automatically reflected in the others. This not only streamlined operations but also resulted in a substantial cost saving of around 70%. Redis Enterprise’s capabilities proved essential for Zee in eliminating manual tasks and achieving seamless operation across diverse regions.

Redis is also working with a Mumbai-based full-service retail brokerage firm. The company’s system encountered delays while accessing their investment portfolio, around 480 milliseconds to a second for each share, disrupting the seamless experience of exploring and making decisions on shares.

By transitioning from open source to Redis Enterprise, they experienced a remarkable improvement, reducing latency from 480 milliseconds to just 20 milliseconds, even with a billion operations per second. This transformation showcases the tangible benefits and performance enhancements achieved by adopting Redis Enterprise for critical applications.

“This issue showed how important real-time solutions are, emphasising the contrast between a sluggish, reminiscent-of-the-80s experience and the efficient, responsive user experiences facilitated by Redis,” commented Shoolman.

Backbone of Generative AI

Databases form the backbone of large language models. “We have been working with vector databases even before generative AI came into action,” Shoolman noted.

Redis is not only providing real-time data to fuel the generative AI wave but has also collaborated with LangChain to release OpenGPT, an open-source model that offers a flexible approach to generative AI, allowing users to select models, control data retrieval, and manage data storage.

“If you observe, GPT has a somewhat limited window function. While it’s fantastic to work with, the accuracy of its output may be less than optimal, depending on the task,” said Shoolman.

And this is what OpenGPT is trying to solve. It allows for the selection of different models, extending beyond the confines of GPT. Furthermore, OpenGPT facilitates interaction with data across multiple domains.

“LangChain is known for its flexibility but requires a higher skill level from developers. It goes beyond simple file uploads, allowing developers to decide on the structure of knowledge bases, whether they are based on general information or user prompts,” he added. Developers can create their knowledge bases of prompts and add relevant content to requests associated with these prompts.

What’s Next for Redis

Shoolman believes the future appears positive and promising for the company.

“Examining trends impacting real-time aspects such as network agility, device performance evaluation, three-dimensional interfaces, immersive experiences, and the pervasive influence of AI, it’s evident that the factors driving the need for a real-time database are converging rapidly,” he concluded.

The post How Redis Finds Moat in the Indian Market appeared first on Analytics India Magazine.