NVIDIA’s AI Supremacy is All About CUDA

By now, it is clear that no matter who wins the AI race, the biggest beneficiary is NVIDIA. It is common knowledge that the company leads the hardware category, with its GPUs widely used by virtually every AI-focused company in the world. That’s not all. NVIDIA, the world’s most valuable chipmaker, is leading the battle from the software side of things as well, with its CUDA (Compute Unified Device Architecture) platform.

CUDA, in essence, is the magic wand that connects software to NVIDIA GPUs. It is the handshake that lets your AI algorithms tap into the computing power of these graphical beasts. But to NVIDIA’s advantage, CUDA isn’t just any ordinary enchantment: it is a closed-source, low-level API that ties software to NVIDIA’s GPUs, creating an ecosystem for parallel computing. It is so potent that even formidable competitors such as AMD and Intel struggle to match it.

While contenders such as Intel and AMD excel at either hardware or software, NVIDIA has mastered the art of both. Its GPUs are sleek, powerful, and coveted, and it is no coincidence that the company has also laid down the software foundations that make the most of these machines.

Software companies can’t just waltz in and claim the crown from NVIDIA; they lack the hardware prowess. On the flip side, hardware manufacturers can’t wade into software territory without struggling. This has made CUDA the winning ingredient for NVIDIA in AI.

Undisputed but vulnerable

NVIDIA introduced CUDA in 2006 to enable general-purpose parallel computing on its GPUs. Before that, developers had to repurpose graphics APIs such as Microsoft’s Direct3D or the cross-platform OpenGL for computation on GPUs, which were never designed for that job. After the launch of CUDA, businesses began tailoring their strategies around the software. OpenCL, released by the Khronos Group in 2009, was the only potential competitor, but by then most companies had already standardised on CUDA, leaving little room or need for it.
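For most ML developers, CUDA is something they touch only indirectly, through a framework. As a rough illustration (assuming a PyTorch build with CUDA support and an NVIDIA GPU available), this is all it takes to push a heavily parallel workload onto the GPU:

```python
import torch

# CUDA is the layer that lets frameworks like PyTorch hand work to an NVIDIA GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two large matrices; the multiplication below runs as thousands of parallel GPU
# threads when a CUDA device is present (and falls back to the CPU otherwise).
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b  # dispatched to a CUDA kernel under the hood

print(device, c.shape)
```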

NVIDIA’s current strategy sounds great, but it has some major drawbacks as well. Though CUDA is a moat for NVIDIA, the company’s pursuit of an upmarket strategy, focused on high-priced data centre offerings, might give other companies room to catch up on the software front.

Moreover, the market is gripped by a GPU shortage that feels almost mythical, yet few are willing to forsake NVIDIA’s wares for alternatives from AMD or Intel. It’s almost as if tech aficionados would rather gnaw on cardboard than consider a GPU from another company.

Part of NVIDIA’s current dominance rests on the RAM constraints it places on its consumer-grade GPUs, which push serious workloads towards its data centre parts. This situation is likely to change as necessity drives the development of software that efficiently exploits consumer-grade GPUs, potentially aided by open-source solutions or offerings from competitors like AMD and Intel.

Both Intel and AMD stand a chance at challenging NVIDIA’s supremacy, provided they shift away from mimicking NVIDIA’s high-end approach and instead focus on delivering potent yet cost-effective GPUs and building open-source software. Crucially, they should differentiate themselves by avoiding the artificial constraints on GPU capabilities that NVIDIA employs to steer users towards its pricier data centre GPUs.

Even with these constraints, many developers choose NVIDIA’s consumer-grade GPUs over Intel or AMD for ML development, and recent improvements in these smaller GPUs have led more people to use them for deploying models.

Another competitor is emerging

Interestingly, OpenAI’s Triton is emerging as a disruptive force against NVIDIA’s closed-source stronghold with CUDA. Triton, which is fed by Meta’s PyTorch 2.0 through the PyTorch Inductor compiler, carves a path by sidestepping NVIDIA’s closed-source CUDA libraries in favour of open-source alternatives like CUTLASS.

While CUDA is an accelerated-computing mainstay, Triton broadens the horizon. It bridges languages, enabling high-level ones to match the performance of lower-level counterparts. Triton’s legible kernels empower ML researchers by automating much of the memory management and scheduling, and they have proved invaluable for complex operations like Flash Attention.
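To make the idea of legible kernels concrete, here is the classic vector-add example written in Triton’s style: plain Python decorated with @triton.jit, with the compiler handling most of the low-level bookkeeping. This is an illustrative sketch that assumes the triton and torch packages and an NVIDIA GPU:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)                  # which block this program instance handles
    offsets = pid * BLOCK + tl.arange(0, BLOCK)  # element indices covered by this block
    mask = offsets < n_elements                  # guard against the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)  # one program instance per block of 1024 elements
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
```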

Triton currently runs only on NVIDIA GPUs, but its open-source reach might soon extend beyond them, marking the start of a shift. Numerous hardware vendors are set to join the Triton ecosystem, reducing the effort needed to compile for new hardware.

NVIDIA, with all its might, overlooked a critical aspect: usability. This oversight allowed OpenAI and Meta to craft a software stack that is portable across hardware, and it raises the question of why NVIDIA didn’t simplify CUDA for ML researchers. Its absence from initiatives like Flash Attention also raises eyebrows.

NVIDIA has indeed had the upper hand when it comes to product supremacy. But let’s not underestimate the giants of tech. Cloud providers have rolled up their sleeves, designing their own chips that could give NVIDIA’s GPUs a run for their transistors.

Still, all of this is just wishful thinking as of now.


Top Posts July 31 – August 6: Forget ChatGPT, This New AI Assistant Is Leagues Ahead

Most Popular Posts Last Week

  1. Forget ChatGPT, This New AI Assistant Is Leagues Ahead and Will Change the Way You Work Forever by Abid Ali Awan
  2. 3 Ways to Access GPT-4 for Free by Abid Ali Awan
  3. ChatGPT Code Interpreter: Do Data Science in Minutes by Natassha Selvaraj
  4. 7 Steps to Mastering Data Cleaning and Preprocessing Techniques by Eugenia Anello
  5. Introduction to Statistical Learning, Python Edition: Free Book by Bala Priya C

Most Popular Posts Past 30 Days

  1. Forget ChatGPT, This New AI Assistant Is Leagues Ahead and Will Change the Way You Work Forever by Abid Ali Awan
  2. ChatGPT Dethroned: How Claude Became the New AI Leader by Ignacio de Gregorio Noblejas
  3. 3 Ways to Access GPT-4 for Free by Abid Ali Awan
  4. Free From Google: Generative AI Learning Path by Eugenia Anello
  5. ChatGPT Code Interpreter: Do Data Science in Minutes by Natassha Selvaraj
  6. Why is DuckDB Getting Popular? by Abid Ali Awan
  7. 7 Best Platforms to Practice SQL by Bala Priya
  8. Introduction to Statistical Learning, Python Edition: Free Book by Bala Priya C
  9. 3 Ways to Access Claude AI for Free by Abid Ali Awan
  10. Falcon LLM: The New King of Open-Source LLMs by Nisha Arya

Microsoft is expanding Bing AI to more browsers — but there’s a catch

Bing AI app

Microsoft's Bing AI is moving further beyond its roots. Previously available only in Microsoft Edge and the Bing mobile app, Bing Chat is heading toward more third-party browsers. In a blog post published on Monday, the company said that people will be able to experience "the new AI-powered Bing in third-party browsers on web and mobile soon."

Microsoft didn't reveal which specific browsers on desktop and mobile would support Bing AI. At this point, the chatbot is already available for select users of Chrome on Windows and Safari on macOS.


In my testing, Bing AI also worked in Chrome on iOS (but not iPadOS) and Android. The news could mean that it will expand to other browsers and platforms, such as Chrome on MacOS, Safari on iOS/iPadOS, and even Firefox across the board.

Of course, Microsoft still wants you to stick with its own browser rather than rely on the competition. As such, chatting with Bing AI in Edge offers certain advantages, including longer conversations and a history of your chats.

With Bing AI in Chrome, for example, you're restricted to five messages per chat compared to 30 in Edge. Some browsers, such as Safari, limit you to 2,000 characters per request as opposed to 4,000 in Edge. Plus, a popup window keeps appearing, prompting you to head to Edge to chat with Bing.


Otherwise, Bing AI works the same in other browsers as it does in Edge and the Bing app. Choose a conversation style — More Creative, More Balanced, or More Precise. Enter your question or request. In response, Bing answers your question or creates content for you. You can then submit further queries about the same topic or start a new subject. Beyond generating text, Bing can also devise an image based on your description.

In its blog post, Microsoft also touted recent enhancements to Bing AI. With the visual search feature, you can add a photo or other image to your request and ask Bing to describe or interpret it or answer questions about it. Dark Mode now works with Bing Chat both on the website and in the Bing mobile app. There's even an enterprise version of Bing AI for large organizations that want to offer but also manage the use of the chatbot among their employees.


Google launches Project IDX, a new AI-enabled browser-based development environment

By Frederic Lardinois (@fredericl)

Google today announced the launch of Project IDX, its foray into offering an AI-enabled browser-based development environment for building full-stack web and multiplatform apps. It currently supports frameworks like Angular, Flutter, Next.js, React, Svelte and Vue, and languages like JavaScript and Dart, with support for Python, Go and others in the works.

Google did not build a new IDE (integrated development environment) when it created IDX. Instead, it is using Visual Studio Code — Open Source as the basis of its project. This surely allowed the team to focus on the integration with Codey, Google’s PaLM 2–based foundation model for programming tasks. Thanks to Codey, IDX supports smart code completion; a ChatGPT/Bard-like chatbot that can help developers with general coding questions as well as questions about the code they are working on (including the ability to explain it); and contextual code actions like “add comments.”


“We spend a lot of time writing code, and recent advances in AI have created big opportunities to make that time more productive,” the IDX team explains in today’s announcement. “With Project IDX, we’re exploring how Google’s innovations in AI — including the Codey and PaLM 2 models powering Studio Bot in Android Studio, Duet in Google Cloud and more — can help you not only write code faster, but also write higher-quality code.”

As a cloud-based IDE, it’s no surprise that Project IDX integrates with Google’s own Firebase Hosting (and Google Cloud Functions) and allows developers to bring in existing code from their GitHub repositories. Every workspace has access to a Linux-based VM (virtual machine) and, soon, embedded Android and iOS simulators right in the browser.


I had a chance to test out Project IDX for a couple of days before today’s launch. The IDX chatbot works as expected but didn’t feel all that tightly coupled with the source code. It can’t directly manipulate the code, for example (which, to be fair, is also true for most of Google’s competitors), and it doesn’t seem to be aware of which code you have selected in the editor. It’s still very early days, though, and Google notes that the team is “just at the beginning of this journey” and plans to add new capabilities over time.

While GitHub’s Copilot, Amazon’s CodeWhisperer and others offer similar AI coding features, Google’s focus on full-stack development puts a slightly different twist on this theme. With Codespaces and AWS Cloud9, GitHub and Amazon also offer cloud-based development environments. In addition, Google offers its Cloud Code IDE plugins, which it could use to bring Codey to virtually every popular IDE as well. Project IDX makes for a nice sandbox for Google to show off some of its AI capabilities for coders, but it remains to be seen if it will turn into a full-blown IDE that developers will want to use for their projects.

Generative AI: The Idea Behind ChatGPT, DALL-E, Midjourney and More

Generative AI - Midjourney Prompt

The world of art, communication, and how we perceive reality is rapidly transforming. If we look back at the history of human innovation, we might consider the invention of the wheel or the discovery of electricity as monumental leaps. Today, a new revolution is taking place—bridging the divide between human creativity and machine computation. That is Generative AI.

Generative models have blurred the line between humans and machines. With the advent of models like GPT-4, which employs transformer modules, we have stepped closer to natural and context-rich language generation. These advances have fueled applications in document creation, chatbot dialogue systems, and even synthetic music composition.

Recent Big-Tech decisions underscore its significance. Microsoft is already discontinuing its Cortana app this month to prioritize newer Generative AI innovations, like Bing Chat. Apple has also dedicated a significant portion of its $22.6 billion R&D budget to generative AI, as indicated by CEO Tim Cook.

A New Era of Models: Generative Vs. Discriminative

The story of Generative AI is not only about its applications but fundamentally about its inner workings. In the artificial intelligence ecosystem, two broad classes of models exist: discriminative and generative.

Discriminative models are what most people encounter in daily life. These algorithms take input data, such as a text or an image, and pair it with a target output, like a word translation or medical diagnosis. They're about mapping and prediction.

Generative models, on the other hand, are creators. They don't just interpret or predict; they generate new, complex outputs from vectors of numbers that often aren't even related to real-world values.

Generative AI Types: Text to Text, Text to Image (GPT, DALL-E, Midjourney)

The Technologies Behind Generative Models

Generative models owe their existence to deep neural networks, sophisticated structures designed to mimic the human brain's functionality. By capturing and processing multifaceted variations in data, these networks serve as the backbone of numerous generative models.

How do these generative models come to life? Usually, they are built with deep neural networks, optimized to capture the multifaceted variations in data. A prime example is the Generative Adversarial Network (GAN), in which two neural networks, the generator and the discriminator, compete with and learn from each other in an adversarial game. From paintings to style transfer, from music composition to game-playing, these models are evolving and expanding in ways previously unimaginable.

This doesn't stop with GANs. Variational Autoencoders (VAEs) are another pivotal player in the generative model field. VAEs stand out for their ability to create photorealistic images from seemingly random numbers. How? By decoding a latent vector, a compressed numerical representation of the data, into an image that mirrors the complexities of human aesthetics.
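As a rough illustration of the adversarial setup described above, here is a toy GAN on made-up 2-D data (a sketch assuming PyTorch; real GANs use much larger convolutional networks and far longer training):

```python
import torch
import torch.nn as nn

# Generator maps random noise to fake samples; discriminator scores samples as real or fake.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 2) * 0.5 + 2.0        # stand-in "real" data distribution
    fake = G(torch.randn(64, 8))                 # generated samples

    # Discriminator step: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator score fakes as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```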

Generative AI Types: Text to Text, Text to Image

Transformers & LLM

The paper “Attention Is All You Need” by Google Brain marked a shift in the way we think about text modeling. Instead of complex architectures like Recurrent Neural Networks (RNNs) or Convolutional Neural Networks (CNNs), the Transformer model introduced the concept of attention, which essentially meant focusing on different parts of the input text depending on the context. One of the main benefits of this was the ease of parallelization. Unlike RNNs, which process text sequentially and are therefore harder to scale, Transformers can process parts of the text simultaneously, making training faster and more efficient on large datasets.

Transformer-model architecture

In a long text, not every word or sentence you read has the same importance. Some parts demand more attention based on the context. This ability to shift our focus based on relevance is what the attention mechanism mimics.

To understand this, think of a sentence: “Unite AI publishes AI and Robotics news.” Now, predicting the next word requires an understanding of what matters most in the previous context. The term ‘Robotics' might suggest the next word relates to a specific advancement or event in the robotics field, while ‘publishes' might indicate that the following context delves into a recent publication or article.

Self-Attention Illustration

Attention mechanisms in Transformers are designed to achieve this selective focus. They gauge the importance of different parts of the input text and decide where to “look” when generating a response. This is a departure from older architectures like RNNs that tried to cram the essence of all input text into a single ‘state' or ‘memory'.

The workings of attention can be likened to a key-value retrieval system. In trying to predict the next word in a sentence, each preceding word offers a ‘key' suggesting its potential relevance, and based on how well these keys match the current context (or query), they contribute a ‘value' or weight to the prediction.
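A toy version of this key-value scoring, written out in plain NumPy (an illustrative sketch of scaled dot-product attention, not any particular library's implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query scores every key; softmax turns the scores into weights over the values."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (n_queries, n_keys) relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return weights @ V                                 # weighted mix of the values

# 4 tokens with 8-dimensional embeddings; in self-attention Q, K and V come from the same tokens.
tokens = np.random.randn(4, 8)
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape)  # (4, 8)
```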

These advanced AI deep learning models have seamlessly integrated into various applications, from Google's search engine enhancements with BERT to GitHub’s Copilot, which harnesses the capability of Large Language Models (LLMs) to convert simple code snippets into fully functional source code.

Large Language Models (LLMs) like GPT-4, Bard, and LLaMA are colossal constructs designed to decipher and generate human language, code, and more. Their immense size, ranging from billions to trillions of parameters, is one of their defining features. These LLMs are fed with copious amounts of text data, enabling them to grasp the intricacies of human language. A striking characteristic of these models is their aptitude for “few-shot” learning. Unlike conventional models, which need vast amounts of specific training data, LLMs can generalize from a very limited number of examples (or “shots”).

State of Large Language Models (LLMs) as of mid-2023

| Model Name | Developer | Parameters | Availability and Access | Notable Features & Remarks |
| --- | --- | --- | --- | --- |
| GPT-4 | OpenAI | 1.5 trillion | Not open source, API access only | Impressive performance on a variety of tasks; can process images and text; maximum input length of 32,768 tokens |
| GPT-3 | OpenAI | 175 billion | Not open source, API access only | Demonstrated few-shot and zero-shot learning capabilities; performs text completion in natural language |
| BLOOM | BigScience | 176 billion | Downloadable model, hosted API available | Multilingual LLM developed by a global collaboration; supports 13 programming languages |
| LaMDA | Google | 173 billion | Not open source, no API or download | Trained on dialogue; could learn to talk about virtually anything |
| MT-NLG | Nvidia/Microsoft | 530 billion | API access by application | Utilizes the transformer-based Megatron architecture for various NLP tasks |
| LLaMA | Meta AI | 7B to 65B | Downloadable by application | Intended to democratize AI by offering access to those in research, government, and academia |

How Are LLMs Used?

LLMs can be used in multiple ways, including:

  1. Direct Utilization: Simply using a pre-trained LLM for text generation or processing. For instance, using GPT-4 to write a blog post without any additional fine-tuning (a minimal sketch of this option follows below).
  2. Fine-Tuning: Adapting a pre-trained LLM for a specific task, a method known as transfer learning. An example would be customizing T5 to generate summaries for documents in a specific industry.
  3. Information Retrieval: Using LLMs, such as BERT or GPT, as part of larger architectures to develop systems that can fetch and categorize information.
ChatGPT Fine-Tuning Architecture
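As a minimal sketch of the first option above (direct utilization): load a pretrained model from the Hugging Face Hub and generate text with no task-specific fine-tuning. The small open "gpt2" checkpoint is used here purely as a stand-in; any text-generation model on the Hub works the same way:

```python
from transformers import pipeline

# Direct utilization: a pretrained model, no fine-tuning.
generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are useful because", max_new_tokens=40)
print(result[0]["generated_text"])
```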

Multi-head Attention: Why One When You Can Have Many?

However, relying on a single attention mechanism can be limiting. Different words or sequences in a text can have varied types of relevance or associations. This is where multi-head attention comes in. Instead of one set of attention weights, multi-head attention employs multiple sets, allowing the model to capture a richer variety of relationships in the input text. Each attention “head” can focus on different parts or aspects of the input, and their combined knowledge is used for the final prediction.
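A stripped-down sketch of the idea in PyTorch (the learned per-head projections and the output projection of a real Transformer are omitted for brevity):

```python
import torch

def multi_head_attention(x, num_heads=4):
    """Split the embedding into heads, attend within each head, then concatenate."""
    n_tokens, d_model = x.shape
    d_head = d_model // num_heads
    # (num_heads, n_tokens, d_head): each head sees a slice of every token's embedding.
    q = k = v = x.view(n_tokens, num_heads, d_head).transpose(0, 1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # per-head relevance scores
    weights = scores.softmax(dim=-1)                   # per-head attention weights
    out = weights @ v                                  # per-head mixes of the values
    return out.transpose(0, 1).reshape(n_tokens, d_model)  # concatenate the heads

x = torch.randn(6, 32)                # 6 tokens, 32-dimensional embeddings
print(multi_head_attention(x).shape)  # torch.Size([6, 32])
```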

ChatGPT: The Most Popular Generative AI Tool

The original GPT, introduced in 2018, was built on a foundation of 12 layers, 12 attention heads, and roughly 120 million parameters, primarily trained on a dataset called BookCorpus. This was an impressive start, offering a glimpse into the future of language models.

GPT-2, unveiled in 2019, boasted a four-fold increase in layers and attention heads. Significantly, its parameter count skyrocketed to 1.5 billion. This enhanced version derived its training from WebText, a dataset enriched with 40GB of text from various Reddit links.

GPT-3, launched in May 2020, had 96 layers, 96 attention heads, and a massive parameter count of 175 billion. What set GPT-3 apart was its diverse training data, encompassing CommonCrawl, WebText, English Wikipedia, book corpora, and other sources, combining for a total of around 570 GB.

The intricacies of ChatGPT's workings remain a closely guarded secret. However, a process termed ‘reinforcement learning from human feedback' (RLHF) is known to be pivotal. Originating from OpenAI's earlier InstructGPT work, this technique was instrumental in honing the GPT-3.5 model to better follow written instructions.

ChatGPT's training comprises a three-tiered approach:

  1. Supervised fine-tuning: Involves curating human-written conversational inputs and outputs to refine the underlying GPT-3.5 model.
  2. Reward modeling: Humans rank various model outputs based on quality, helping train a reward model that scores each output considering the conversation's context (a toy sketch of this step follows the list).
  3. Reinforcement learning: The conversational context serves as a backdrop where the underlying model proposes a response. This response is assessed by the reward model, and the process is optimized using an algorithm named proximal policy optimization (PPO).
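To make step 2 concrete, here is a toy sketch of the pairwise ranking loss commonly used for reward modelling (random embeddings stand in for model outputs, and this illustrates the general technique rather than OpenAI's actual implementation):

```python
import torch
import torch.nn as nn

# A tiny reward model: response embedding in, scalar score out. It is trained so that
# responses humans preferred score higher than the ones they rejected.
reward_model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

preferred = torch.randn(32, 128)   # stand-in embeddings of higher-ranked responses
rejected = torch.randn(32, 128)    # stand-in embeddings of lower-ranked responses

for _ in range(100):
    margin = reward_model(preferred) - reward_model(rejected)
    loss = -torch.nn.functional.logsigmoid(margin).mean()  # pairwise ranking loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```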

For those just dipping their toes into ChatGPT, a comprehensive starting guide can be found here. If you're looking to delve deeper into prompt engineering with ChatGPT, we also have an advanced guide that covers the latest state-of-the-art prompt techniques, available at ‘ChatGPT & Advanced Prompt Engineering: Driving the AI Evolution‘.

Diffusion & Multimodal Models

While models like VAEs and GANs generate their outputs in a single pass, and are hence locked into whatever they produce, diffusion models introduced the concept of ‘iterative refinement'. Through this method, they circle back, refining mistakes from previous steps and gradually producing a more polished result.

Central to diffusion models is the art of “corruption” and “refinement”. In their training phase, a typical image is progressively corrupted by adding varying levels of noise. This noisy version is then fed to the model, which attempts to ‘denoise' or ‘de-corrupt' it. Through multiple rounds of this, the model becomes adept at restoration, understanding both subtle and significant aberrations.

Image generated with Midjourney

The process of generating new images post-training is intriguing. Starting with a completely randomized input, it's continuously refined using the model's predictions. The intent is to attain a pristine image with the minimum number of steps. Controlling the level of corruption is done through a “noise schedule”, a mechanism that governs how much noise is applied at different stages. A scheduler, as seen in libraries like “diffusers“, dictates the nature of these noisy renditions based on established algorithms.
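A toy numerical sketch of the corruption half of this process, on vectors rather than images (assumes PyTorch; the schedule values are illustrative, and the denoising network itself is only described in the comments):

```python
import torch

T = 50
betas = torch.linspace(1e-4, 0.05, T)            # noise schedule: later steps add more noise
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative fraction of signal retained

def corrupt(x0, t):
    """Forward process: blend clean data with Gaussian noise according to step t."""
    noise = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * noise, noise

x0 = torch.randn(16)               # a stand-in "clean" sample
x_noisy, eps = corrupt(x0, t=40)   # a heavily corrupted version, paired with the noise added
# During training, a network learns to predict `eps` from (x_noisy, t). At generation time,
# the loop starts from pure noise and repeatedly subtracts the predicted noise, step by step.
```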

An essential architectural backbone for many diffusion models is the UNet—a convolutional neural network tailored for tasks requiring outputs mirroring the spatial dimension of inputs. It's a blend of downsampling and upsampling layers, intricately connected to retain high-resolution data, pivotal for image-related outputs.
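A minimal UNet-shaped network, just to show the down-then-up structure with a skip connection so the output keeps the input's spatial size (an illustrative PyTorch sketch, far smaller than the UNets used in real diffusion models):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Downsample, upsample, and skip-connect the input so output size matches input size."""
    def __init__(self, channels=16):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),  # halve resolution
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),  # restore it
        )
        self.out = nn.Conv2d(channels + 3, 3, 3, padding=1)  # mix upsampled features with the input

    def forward(self, x):
        h = self.up(self.down(x))
        return self.out(torch.cat([h, x], dim=1))  # skip connection keeps high-resolution detail

x = torch.randn(1, 3, 64, 64)
print(TinyUNet()(x).shape)  # torch.Size([1, 3, 64, 64])
```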

Delving deeper into the realm of generative models, OpenAI's DALL-E 2 emerges as a shining example of the fusion of textual and visual AI capabilities. It employs a three-tiered architecture:

  1. Text Encoder: It transforms the text prompt into a conceptual embedding within a latent space. This model doesn't start from ground zero. It leans on OpenAI's Contrastive Language–Image Pre-training (CLIP) model as its foundation. CLIP serves as a bridge between visual and textual data by learning visual concepts using natural language. Through a mechanism known as contrastive learning, it identifies and matches images with their corresponding textual descriptions (a toy sketch of this matching objective follows below).
  2. The Prior: The text embedding derived from the encoder is then converted into an image embedding. DALL-E 2 tested both autoregressive and diffusion methods for this task, with the latter showcasing superior results. Autoregressive models, as seen in Transformers and PixelCNN, generate outputs in sequences. On the other hand, diffusion models, like the one used in DALL-E 2, transform random noise into predicted image embeddings with the help of text embeddings.
  3. The Decoder: The climax of the process, this part generates the final visual output based on the text prompt and the image embedding from the prior phase. DALL-E 2's decoder owes its architecture to another model, GLIDE, which can also produce realistic images from textual cues.
Simplified architecture of the DALL-E model
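A toy sketch of the contrastive matching that CLIP relies on: given a batch of image embeddings and caption embeddings, each image is pushed towards its own caption and away from everyone else's. Real CLIP learns the encoders that produce these embeddings; here random vectors stand in for them:

```python
import torch
import torch.nn.functional as F

image_emb = F.normalize(torch.randn(8, 256), dim=-1)   # 8 image embeddings (stand-ins)
text_emb = F.normalize(torch.randn(8, 256), dim=-1)    # their 8 caption embeddings, same order

logits = image_emb @ text_emb.T / 0.07                  # pairwise similarities, temperature 0.07
targets = torch.arange(8)                               # the i-th image matches the i-th caption
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
print(loss)
```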


Applications of Generative AI

Textual Domains

Beginning with text, Generative AI has been fundamentally altered by chatbots like ChatGPT. Relying heavily on Natural Language Processing (NLP) and large language models (LLMs), these entities are empowered to perform tasks ranging from code generation and language translation to summarization and sentiment analysis. ChatGPT, for instance, has seen widespread adoption, becoming a staple for millions. This is further augmented by conversational AI platforms, grounded in LLMs like GPT-4, PaLM, and BLOOM, that effortlessly produce text, assist in programming, and even offer mathematical reasoning.

From a commercial perspective, these models are becoming invaluable. Businesses employ them for a myriad of operations, including risk management, inventory optimization, and demand forecasting. Some notable examples include Bing AI, Google's Bard, and the ChatGPT API.

Art

The world of images has seen dramatic transformations with Generative AI, particularly since DALL-E 2's introduction in 2022. This technology, which can generate images from textual prompts, has both artistic and professional implications. For instance, Midjourney has leveraged this tech to produce impressively realistic images. This recent post demystifies Midjourney in a detailed guide, elucidating both the platform and its prompt engineering intricacies. Furthermore, platforms like Alpaca AI and Photoroom AI utilize Generative AI for advanced image editing functionalities such as background removal, object deletion, and even face restoration.

Video Production

Video production, while still in its nascent stage in the realm of Generative AI, is showcasing promising advancements. Platforms like Imagen Video, Meta Make A Video, and Runway Gen-2 are pushing the boundaries of what's possible, even if truly realistic outputs are still on the horizon. These models offer substantial utility for creating digital human videos, with applications like Synthesia and SuperCreator leading the charge. Notably, Tavus AI offers a unique selling proposition by personalizing videos for individual audience members, a boon for businesses.

Code Creation

Coding, an indispensable aspect of our digital world, hasn’t remained untouched by Generative AI. Although ChatGPT is a favored tool, several other AI applications have been developed for coding purposes. These platforms, such as GitHub Copilot, Alphacode, and CodeComplete, serve as coding assistants and can even produce code from text prompts. What's intriguing is the adaptability of these tools. Codex, the driving force behind GitHub Copilot, can be tailored to an individual's coding style, underscoring the personalization potential of Generative AI.

Conclusion

Blending human creativity with machine computation, Generative AI has evolved into an invaluable tool, with platforms like ChatGPT and DALL-E 2 pushing the boundaries of what's conceivable. From crafting textual content to sculpting visual masterpieces, its applications are vast and varied.

As with any technology, ethical implications are paramount. While Generative AI promises boundless creativity, it's crucial to employ it responsibly, being aware of potential biases and the power of data manipulation.

With tools like ChatGPT becoming more accessible, now is the perfect time to test the waters and experiment. Whether you're an artist, coder, or tech enthusiast, the realm of Generative AI is rife with possibilities waiting to be explored. The revolution is not on the horizon; it's here and now. So, dive in!

Nvidia Breaks Gaming Tradition With RTX, Turning GPUs Into AI Powerhouses

August 8, 2023, by Agam Shah

Nvidia's RTX GPUs were largely known for gaming and graphics, but are being configured and repackaged for enthusiasts interested in trying out AI on desktops. The new GPUs are a part of Nvidia’s approach to make GPUs available wherever and whenever customers need them.

The company announced new RTX GPUs, which can be used for AI inferencing and training. The GPUs are based on the Ada Lovelace architecture, which is different from the Hopper architecture used in the red-hot H100 GPUs that are in short supply.

Enthusiasts are already using GPUs on gaming laptops to run AI-powered applications, such as text-to-text or text-to-image models. At this week’s SIGGRAPH conference, Nvidia announced new desktop and workstation designs with RTX GPUs.

Computer makers including Dell, Lenovo and Boxx will announce workstations that can pack up to four RTX 6000 Ada Generation GPUs in a chassis. Nvidia said the suggested retail price for the GPU was $6,000, though vendors such as Dell are selling it for in excess of $9,000, including tax.

Each of the RTX 6000 GPUs, which are based on the Ada Lovelace design, has 48GB of GDDR6 memory and a 200Gbps network-interface card. The GPU draws 300 watts of power and is based on the older PCIe 4.0 interconnect standard.

Nvidia also announced the L40S Ada GPU, which is more like a poor man’s version of the H100, as it is faster than previous-generation A100 GPUs in AI training and inference. The new product is a variant of the L40 server GPU announced a year ago.

Nvidia's new L40S GPU. (Source: Nvidia)

The L40S also has 48GB GDDR6 memory and will be in systems based on the OVX reference server design for metaverse applications.

The L40S is up to four times faster for AI and graphics workloads over the previous generation A40 GPU, which is based on the previous generation Ampere architecture. The AI training is 1.7 times faster than the A100 GPU, and inference is 1.5 times faster. The L40S has faster clock speeds and more tensor and graphics rendering performance.

Nvidia’s RTX systems for enterprises are built for the metaverse and AI markets, and the new hardware will include licenses for the Omniverse and AI Enterprise software. The company also announced AI Enterprise 4.0, which will include the NeMo framework for large language models.

There should be no struggle acquiring the L40S GPU, which will ship later this year.

"These will not be as constrained as we've been in some of our highest-end GPUs," said Bob Pette, vice president for Pro Visualization at Nvidia, during a press briefing.

Nvidia’s low-end RTX 4000 GPU will become available in September for $1,250. The RTX 4500 will be available for $2,250 starting in October.

AI is as important as gaming to Nvidia. The company wants to make GPUs a commodity on which enthusiasts can create their own programs, and then run where the closest GPU is available. Nvidia’s H100 GPUs are hard to find and have become an asset for companies. A startup called CoreWeave has put up its Nvidia GPUs as collateral to fund its growth. Cryptocurrency miners are also repurposing their GPUs in data centers to run AI.


AMD’s Security Flaws, Revenue Dips, Fuel Anxiety

A group of security researchers from Technical University (TU) Berlin have identified a vulnerability in the AMD-based Media Control Unit in modern Tesla vehicles, allowing them to unlock paid features and gain access to other subsystems. The researchers exploited a known flaw in the AMD processor that controls Tesla’s MCU. The attack targeted the third-generation MCU (MCU-Z), which is based on a custom AMD Ryzen SoC.

The researchers employed voltage fault injection (or voltage glitching) and attacked the AMD Ryzen SoC used in MCU-Z‘s Platform Security Processor. This attack granted them root permissions, enabling them to make persistent changes to the vehicle’s Linux system and decrypt data stored in the Trusted Platform Module (TPM). This access could potentially allow an attacker to unlock features that are typically locked behind paywalls, such as vehicle upgrades that Tesla offers for a fee.

One of the notable consequences of this exploit is that it’s considered “unpatchable,” meaning Tesla currently lacks a known solution to mitigate it. Furthermore, the exploit could extract a hardware-bound RSA key used for authenticating and authorizing a car within Tesla’s internal service network. This could potentially allow salvage-titled vehicles, which are not eligible for certain Tesla services due to damage, to access services like the Supercharging network.

It seems like AMD is in a whirlwind of issues, and it doesn’t just end there.

Smitten by Vulnerabilities

Recently, security researchers also discovered a new bug or vulnerability—’Zenbleed’ in AMD CPUs using the Zen 2 architecture. Unlike previous exploits, Zenbleed allows remote exploitation without physical hardware access. This poses a significant security risk, particularly for enterprises that use these chips. AMD has released an update for EPYC 7002 series chips, but a comprehensive firmware fix is pending.

Similar to Intel’s Meltdown and Spectre vulnerabilities, the exploit leverages the CPU’s internal mechanisms to extract sensitive data. Zenbleed’s impact could be substantial because enterprises tend to be slow in adopting security updates, and addressing it will require changes in cybersecurity approaches.

While the vulnerability itself is common, the challenge lies in effectively deploying fixes, as prolonged exposure to such vulnerabilities can lead to more potent hacking strategies in the future.

Not just that, AMD discreetly disclosed 31 new CPU vulnerabilities through a January update, affecting both its consumer-oriented Ryzen chips and EPYC data centre processors. The company collaborated with researchers from Google, Apple, Oracle, and others in a coordinated disclosure to develop mitigations before public disclosure. These vulnerabilities, affecting Ryzen desktop, HEDT, Pro, and Mobile processors, can be exploited via BIOS manipulation or attacks on the AMD Secure Processor bootloader.

AMD has detailed AGESA revisions for OEMs to patch the vulnerabilities, with BIOS patches’ availability varying by the vendor. The vulnerabilities also impact EPYC processors, with four high-severity variants enabling arbitrary code execution and data integrity issues. With AMD gaining market share, scrutiny of its architectures for security gaps is increasing. Recent vulnerability disclosures include Meltdown-like variants, Hertzbleed, and Take A Way.

As the chipmaker looks to handle these issues, it also faces growing demands and stiff competition.

Pursuit of NVIDIA

With Generative AI set to grow by 31% annually, reaching a substantial $152 billion market by 2032, AMD’s growth potential is contingent on capturing more market share. However, this requires adept product development amidst competition from NVIDIA, which is also ready to supply AI chips.

To bolster its standing, AMD is channelling resources into AI-related research and development, aiming to provide customers with specialised AI chips and tailored software solutions. CEO Lisa Su envisions AI as a potent growth driver, expected to gain momentum as Microsoft and other major software providers incorporate generative AI into their offerings, thereby igniting demand for PCs.

In a bid to close the gap with NVIDIA, AMD plans to launch chips that directly compete with Nvidia’s offerings in the fourth quarter of 2023. Reuters reports that AMD aims to significantly ramp up production of its flagship MI300 artificial intelligence chips to challenge NVIDIA’s H100 chips.

Despite robust demand for its MI300 series chips, AMD faces hurdles in the Chinese market due to performance constraints imposed by export controls. These restrictions have led to the inability to sell MI300 chips in China. In contrast, NVIDIA and Intel have developed specialized chips to meet these performance limits. CEO Su reassured investors of AMD’s commitment to full compliance with US export controls.

In this evolving landscape, analysts offer diverse perspectives on AMD’s trajectory. Jenny Hardy, a portfolio manager at GP Bullhound, who holds both NVIDIA and AMD stocks, holds an optimistic view. Hardy believes that if AMD effectively ramps up production and launches MI300 chips in the fourth quarter, it could help address supply constraints, filling the gap left by the scarcity of NVIDIA chips.

Morningstar maintains a favourable outlook on AMD’s potential to effectively compete with NVIDIA in the Generative AI realm, holding a target stock price of $130 following the second-quarter earnings report.

Senior Director Brian Colello echoed similar optimism, pointing to the exponential growth in AI customer engagements as a promising indicator. He foresees a surge in AI graphics processing units from late 2023 into 2024, positioning AMD as a viable secondary source to Nvidia for GPUs in AI training and inference.

Revenue Plummets

Lately, AMD has experienced a phase of decline spanning the past two quarters. Beyond that, earnings have fallen for four consecutive quarters, accompanied by a drop in sales due to reduced demand for traditional personal computers and servers.

Earlier this month, AMD announced results that outperformed expectations. However, these positive outcomes were juxtaposed against a decline in revenue compared to the previous year, alongside a less-than-stellar projection for the upcoming quarter.

Considering AMD’s recent financial performance, the figures for the second quarter reveal revenue of $5.36 billion, an 18% decrease from the prior year, and a 19% decrease in profits, while still surpassing estimates by $50 million. Adjusted earnings per share for Q2 reached 58 cents, slightly exceeding expectations. Looking ahead to the third quarter, revenue guidance of $5.7 billion reflects 2.5% growth from the previous year but falls short of analysts’ forecasts by $110 million.

AMD’s growth prospects are concentrated in its client and data centre segments. CFO Jean Hu, in the second quarter 2023 earnings call transcript, anticipates a year-over-year revenue increase in the client segment and stable performance in the data centre segment. Sequentially, both segments are projected to experience double-digit growth, while the gaming and embedded segments might face declines.

As AMD tackles challenges and pursues its AI ambitions, its journey in the technology domain assumes critical significance for both the industry and investors.


Chegg is getting a generative AI makeover just in time for back-to-school season

Chegg logo on a phone on a desk

ChatGPT's advanced abilities have the potential to shake up many industries, and online learning services are an early example of this. With a free, competent tool like ChatGPT that can help with homework and studying, why would students pay for an online learning service?

As a result, Chegg became an early victim of ChatGPT, with its stock price cut in half in May. On Tuesday, Chegg announced its quarterly earnings report, which exceeded expectations and caused the company's shares to surge by more than 25%.

The company is trying to keep up that momentum by leveraging generative AI and announcing a partnership with Scale AI.


Scale AI will develop proprietary large language models (LLMs) for the online learning company to help support the development of a personalized learning assistant. The AI company has partnered with other noteworthy companies in the past, including the US Department of Defense.

The new experience is set to combine Chegg's robust proprietary content with generative AI to provide students with a conversational interface and personalized learning, including practice tests, study guides, and flash cards.


"With Chegg's Large Language Models trained with our unique data sets, specifically for education, and with the help of our 150,000 subject matter experts, we will deliver a significantly enhanced and differentiated learning experience for students," said Nathan Schultz, Chief Operating Officer of Chegg.

The new Chegg experience is slated to roll out over the course of the next two semesters.

In April, the company revealed CheggMate, which is meant to combine the power of GPT-4 with Chegg's content to create personalized learning experiences for users on the platform. The integration with Scale AI might be just what the company needs to continue its upward trajectory.


Nvidia teams up with Hugging Face to offer cloud-based AI training

By Kyle Wiggers

Nvidia is partnering with Hugging Face, the AI startup, to expand access to AI compute.

Timed to coincide with the annual SIGGRAPH conference this week, Nvidia announced that it’ll support a new Hugging Face service, called Training Cluster as a Service, to simplify the creation of new and custom generative AI models for the enterprise.

Set to roll out in the coming months, Training Cluster as a Service will be powered by DGX Cloud, Nvidia’s all-inclusive AI “supercomputer” in the cloud. DGX Cloud includes access to a cloud instance with eight Nvidia H100 or A100 GPUs and 640GB of GPU memory, as well as Nvidia’s AI Enterprise software to develop AI apps and large language models and consultations with Nvidia experts.

Companies can subscribe to DGX Cloud on its own — pricing starts at $36,999 per instance for a month. But Training Cluster as a Service integrates DGX Cloud infrastructure with Hugging Face’s platform of more than 250,000 models and over 50,000 datasets — a helpful starting point for any AI project.

“People around the world are making new connections and discoveries with generative AI tools, and we’re still only in the early days of this technology shift,” Hugging Face co-founder and CEO Clément Delangue said. “Our collaboration will bring Nvidia’s most advanced AI supercomputing to Hugging Face to enable companies to take their AI destiny into their own hands with open source to help the open-source community easily access the software and speed they need to contribute to what’s coming next.”

Hugging Face’s tie-up with Nvidia comes as the startup reportedly looks to raise fresh funds at a $4 billion valuation. Founded in 2014 by Delangue, Julien Chaumond and Thomas Wolf, Hugging Face has expanded rapidly over the past nearly-decade, evolving from a consumer app to a repository for all things related to AI models. Delangue claims that more than 15,000 organizations are using the platform today.

The collaboration makes sense for Nvidia, which in recent years has made bigger pushes into cloud services for training, experimenting with and running AI models as the demand for such services grows. Just in March, the company launched AI Foundations, a collection of components that developers can use to build custom generative AI models for particular use cases.

Tech market research firm Tractica forecasts that AI will account for as much as 50% of total public cloud services revenue by 2025. Demand is so high for AI cloud training infrastructure, in fact, that it’s causing hardware shortages, forcing cloud providers like Microsoft to curb investors’ expectations around growth.

After GTC, NVIDIA Rides The Generative AI Wave at SIGGRAPH

Amid the ongoing generative AI revolution, and after GTC 2023 in March, NVIDIA CEO Jensen Huang once again took to the stage, this time at the Los Angeles Convention Center for SIGGRAPH (Special Interest Group on Computer Graphics and Interactive Techniques), to lay out the future of interactive entertainment and technology.

During the hour-long keynote, he revealed new NVIDIA products and research focusing on generative AI, computer graphics, and the company’s role in OpenUSD development. The Alliance for OpenUSD was recently formed to standardize the open Universal Scene Description framework for building 3D-enabled products and services.

This year, ACM celebrated the 50th SIGGRAPH, and the program reflected on half a century of advancement as well as the current generative AI wave. Here are all the major announcements and updates unveiled at the conference.

AI Workbench

The software leader unveiled NVIDIA AI Workbench, a unified workspace for developers to effortlessly sculpt and fine-tune pretrained generative AI models on a PC or workstation — then scale them to virtually any data center, public cloud or NVIDIA DGX Cloud.

The workspace, accessed through a simplified interface running on a local system, removes the complexity of getting started with an enterprise AI project. With Workbench, users can customise and run generative AI in just a few clicks. It also allows them to pull together all the necessary enterprise-grade models, frameworks, SDKs and libraries from open-source repositories like Hugging Face, GitHub and NVIDIA NGC, combine them with custom data, and work with them on the NVIDIA AI platform in a unified developer workspace.

NVIDIA AI Enterprise 4.0 Upgraded

Jensen also proudly introduced the latest iteration of its enterprise software platform, NVIDIA AI Enterprise 4.0. This release grants businesses access to the essential tools for seamlessly integrating generative AI. Simultaneously, it promises the security measures and API stability that underpin dependable production deployments.

New features include NVIDIA NeMo, NVIDIA Triton Management Service and NVIDIA Base Command Manager Essentials. NVIDIA AI Enterprise 4.0 will be integrated into partner marketplaces, including Google Cloud and Microsoft Azure. Furthermore, it finds a natural abode with NVIDIA cloud partner Oracle Cloud Infrastructure.

But the spotlight falls on NeMo — an application framework that helps companies curate their training datasets, build and customise large language models (LLMs), and run them in production on a grand scale. With organisations from Korea to Sweden using it to customise LLMs for their local languages, the framework will become the go-to solution for industries.

“Before NeMo, it took us four and a half months to build a new billion-parameter model. Now we can do it in 16 days — this is mind blowing,” Waseem Alshikh, CTO of Writer, stated in the press release.

Hugging Face — Training Cluster as a Service

The collaboration hub for the machine learning community, Hugging Face, will give developers access to NVIDIA DGX Cloud AI supercomputing to train and tune advanced AI models. The collaboration will bring one-click access to NVIDIA’s multi-node AI supercomputing platform.

As part of the strategic alliance, HF will offer a new service — called Training Cluster as a Service. This avant garde service will simplify creating new and custom generative AI models for enterprises. Powered by NVIDIA DGX Cloud, the service will be available in the coming months.

Omniverse

The company touted major strides with new releases of its NVIDIA Omniverse platform, which lets developers enhance 3D pipelines with the Universal Scene Description (OpenUSD) framework and generative AI. Cesium, Convai, Move AI, SideFX Houdini and Wonder Dynamics are now connected to Omniverse through OpenUSD.

NVIDIA’s vision extends beyond software: it is also collaborating with global systems manufacturers to forge RTX workstations optimally configured for the Omniverse experience. Powered by NVIDIA RTX 6000 Ada Generation GPUs and fortified with NVIDIA AI Enterprise and Omniverse Enterprise software, these workstations are designed for developing and creating content.

The release is now available to download for free in beta, and is coming soon to Omniverse Enterprise.

With the software ripple extending, the company also introduced three new desktop workstation Ada Generation GPUs—RTX 5000, RTX 4500, and RTX 4000—each bearing the signature NVIDIA prowess.

Within this panoramic overhaul, other updates have been made to Omniverse Kit, Omniverse Audio2Face, and Omniverse USD Composer. This follows the recent announcement of NVIDIA joining Pixar, Adobe, Apple and Autodesk to form the Alliance for OpenUSD.

Users can get early access to OpenUSD resources through the NVIDIA OpenUSD Developer Program.

Shutterstock x NVIDIA’s Picasso – A New Stroke

Building on their existing partnership, the visual-content provider Shutterstock is bringing new features to Picasso, part of NVIDIA AI Foundation Models, to let artists enhance and light 3D scenes based on simple text or image prompts, all with AI models built using fully licensed, rights-reserved data.

The companies have introduced an additional layer for artists — 360 HDRi Video. This feature empowers artists to create and personalize environment maps as per their creative visions (even in panorama!).

Renewed GPU Prowess

The technology will power professional visualization workflows, like real-time rendering, product design, and 3D content creation. With the fusion of OVX servers and the state-of-the-art L40S GPU, NVIDIA sets the stage for immersive experiences.
