Google AI in Workspace Adds New Zero-Trust and Digital Sovereignty Controls

Man using both his laptop and mobile phone with Google on display. — Image: Urupong/Adobe Stock

At a Google Cloud press event on Tuesday, the company announced new AI-powered data security tools that bring zero-trust features and data sovereignty controls to Workspace, Drive and Gmail, rolling out over the course of this year. The enhancements to Google Drive, Gmail and the company’s tools for IT and security center teams are designed to help global companies keep their data under lock and encrypted key, and to help security operators outrun advancing threats.

Google Cloud’s enhancements align with CISA’s zero-trust model

The event was kicked off by Jeanette Manfra, senior director of global risk and compliance for Google Cloud and former assistant director for the Cybersecurity and Infrastructure Security Agency. Noting last year’s 38% increase in cyberattacks and an average $4.35 million cost to organizations due to data breaches, she said Google’s ambition behind many of its security innovations is to align capabilities with CISA’s Zero Trust Maturity Model.

“At Google, zero-trust is much more than a buzzword — it’s a core part of our organization,” said Manfra. “I’m a big fan of what CISA is trying to do. We are mapping our capabilities against that, including adding ways to improve how users classify and label data — specifically, using AI in Google Drive to do so automatically.”

With zero-trust in mind, Google enhances data loss prevention and access

Google said the roster of improvements is designed to enhance security teams’ control over data loss prevention and context-aware access, capabilities that give security operations granular control of who and what digitally enters and leaves an organization. The improvements will also help organizations accelerate their zero-trust adoption and meet standards articulated in CISA’s Zero Trust Maturity Model and other industry frameworks, according to the company.

Google AI for Google Drive

The new capabilities for Google Drive center on a slew of zero-trust aligned, AI-powered enhancements to its cloud-native architecture, according to Google, which said AI will drive automated data labeling and classification to defend against exfiltration attempts by threat actors.

In essence, administrators can use customizable confidentiality-preserving AI models to automatically classify and label new and existing files in Google Drive. Administrators can then apply granular data protection controls such as data loss prevention and context-aware access, which allow control over who can access an application depending on such factors as user location, IP address or their device (Figure A).

Figure A

Google AI-powered automatic data classification and labeling in Google Drive. Image: Google
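
To make the idea of context-aware access more concrete, here is a minimal sketch of how a policy check based on user location, IP address and device state might look. It is not Google's implementation or policy schema: the class names, field names and the sample policy are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class RequestContext:
    """Signals attached to an access request (hypothetical field names)."""
    user_country: str
    source_ip: str
    device_managed: bool


@dataclass(frozen=True)
class AccessPolicy:
    """An illustrative policy, not an actual Workspace policy format."""
    allowed_countries: frozenset
    allowed_networks: tuple
    require_managed_device: bool


def is_access_allowed(ctx: RequestContext, policy: AccessPolicy) -> bool:
    """Grant access only if every contextual condition is satisfied."""
    if ctx.user_country not in policy.allowed_countries:
        return False
    ip = ip_address(ctx.source_ip)
    if not any(ip in ip_network(net) for net in policy.allowed_networks):
        return False
    if policy.require_managed_device and not ctx.device_managed:
        return False
    return True


# Hypothetical policy: two countries, one documentation-range network,
# and company-managed devices only.
policy = AccessPolicy(
    allowed_countries=frozenset({"CH", "DE"}),
    allowed_networks=("203.0.113.0/24",),
    require_managed_device=True,
)

print(is_access_allowed(RequestContext("CH", "203.0.113.7", True), policy))
print(is_access_allowed(RequestContext("CH", "203.0.113.7", False), policy))
```

The point of the sketch is that access stops being a single yes-or-no switch: the same user can be allowed from a managed laptop on a known network and refused from an unmanaged device elsewhere.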

Tim Ehrhart, domain lead for information security at pharma company Roche, extolled the virtues of context-aware access, saying the granular controls CAA allows helped the company shift away from VPNs and office network connections. “Context-aware access has helped us manage our risks by not making access a binary choice, but allowing for more flexibility in access policies and allowing them to be applied to the right people, applications and data,” he said in a statement.

This new AI application for Google Drive is now available in preview.

Enforcing DLP controls in Google Drive

Google is also incorporating data loss prevention into Workspace, a feature that the company said will include the ability for admins to put guardrails around how someone shares data by enabling settings based on criteria such as device location and user security status. A user would only be able to share sensitive content on Google Drive if they met specific requirements. Google said the new capability provides more granular controls to help prevent unintended data loss (Figure B).

Figure B

Data loss prevention enhancements for Google Drive. Image: Google

Enhanced Data Loss Prevention for Workspace will be available later this year in preview.

Extending enhanced DLP controls to Gmail

Google said it will also extend data loss prevention to Gmail, letting administrators control the flow of data into and out of an organization based on the sensitivity of emails. This feature, already in Google Chat, Drive and Chrome, will be added to Gmail initially in preview later this year.

Google’s new sovereignty controls in Workspace

Google is also adding controls to Workspace that can provide a step change in attestable digital sovereignty with secure-by-default infrastructure, technical data access controls and industry certifications all in a single cloud instance.

Andy Wen, Google Cloud’s director of product for Workspace security and compliance, explained that the company’s digital sovereignty controls are enabling a nuanced approach to how organizations control the use of data they own, and how they tailor these priorities to meet such regulatory frameworks as the European General Data Protection Regulation, or GDPR. He said new sovereignty controls improve upon such tactics as data residency, when it comes to how an organization controls the movement of its information across borders.

“By itself, data residency in a given country does not prevent unintended data transfer due to things like law enforcement requests,” Wen said. He added that an organization using on-premises solutions to prevent data transfer may still inadvertently transfer data in, say, email notifications, because parts of the email such as subject lines can carry content. “Customers implementing data transfer limitations might not realize this is happening and therefore are countermanding sovereignty.”

Google adds keys to data encryption

Among the announcements Google Cloud made at the press event was a new client-side encryption program that lets administrators thwart third-party access to sensitive data. The third parties include foreign governments and Google.

The involvement of security firms Thales, Stormshield and FlowCrypt speaks to the program’s focus on securing transnational data flow from the prying eyes of threat actors, government entities and others. Google said CSE customers will be able to securely store their encryption keys with trusted partners in the country of their choice in order to make the local regulatory compliance process easier.

In June 2023, Google launched an open beta feature that allows individuals and organizations to log in to Workspace with passkeys, which rely on public and private cryptographic key pairs. This feature enhances identity access management for users.

Other encryption-focused enhancements Google Cloud said it is rolling out include the following:

Support for mobile apps in Google Calendar, Gmail and Meet. This is generally available.
The ability to set CSE as default for select organizational units. This will be available in preview later this year.
Guest-access support in Meet. This will be available in preview later this year.
Comments support in Docs. This will be available in preview later this year.
The ability for users to view, edit or convert Microsoft Excel files. This is available in preview.

“We started work on client-side encryption in 2021; today, we’re launching an expansion of coverage to our mobile apps for Gmail, Calendar and Meet so that our enterprise and public sector customers can get the benefit of CSE on-the-go instead of just their desktops,” said Wen. “It protects data by encrypting it browser to browser, so even Google doesn’t see the content. We think this is not only a great control for sovereignty but a helpful control for security.”

Adding AI to Google Cloud SOC support

Google Cloud spokespeople said the company will incorporate new and sometimes mandatory identity access management protocols into its Workspace tools for IT and security operations.

Google this year will phase in two-step verification for reseller administrator accounts and make 2SV mandatory for its biggest enterprise customers.
The company will, later this year, require multi-party approval for sensitive administrator actions such as changing a user’s 2SV settings.
AI-powered automated email filtering or forwarding to screen for potential phishing content. This is available in preview.
The ability for Workspace administrators to export Workspace logs into Google’s Chronicle SIEM, using AI to identify anomalies and help improve their response time to threats. This is available in preview.

“Most security administrators are overwhelmed with alerts,” said Wen, adding that the ability to move Workspace logs into Chronicle reduces the workload on security teams. “There are lots of scenarios that our Chronicle investigation tool can help identify. It can even detect insider threats, where a trusted insider has downloaded data and is potentially looking for data leaks. This type of detection is particularly handy amid ongoing resource constraints in the security industry.”

Hugging Face Embraces $235 Mn Funding Hug from Google, Amazon, NVIDIA, and More

AI startup Hugging Face has secured $235 million at a $4.5 billion valuation in a Series D funding round, initially revealed by The Information and seemingly confirmed by Salesforce CEO Marc Benioff on X (formerly known as Twitter).

The funding round received contributions from major players including Google, Amazon, NVIDIA, Salesforce, AMD, Intel, IBM, and Qualcomm, as announced by the company. Hugging Face CEO Clement Delangue stated that the raised funds will primarily be allocated towards talent acquisition to enhance competitiveness in the field of artificial intelligence.

Established in 2016, Hugging Face had secured a cumulative funding of $160 million before the latest investment, including its most recent series C round of $100 million announced in 2022.

Hugging Face operates a platform that enables AI developers to collaborate on code, models, and data sets, using the company’s developer tools to simplify the deployment of open-source AI models. One of its key offerings is hosting model weights, the large lists of numbers that are essential components of contemporary AI models.

According to Delangue, AI developers are utilizing Hugging Face continuously throughout the day. He anticipates a significant rise in the count of software developers engaging with AI models in the upcoming years.

“Maybe in five years, you’re going to have like 100 million AI builders. And if all of them use Hugging Face all day, every day, we’ll obviously be in a good position,” he said.

Hugging Face also recently introduced IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), a GPT-styled multimodal, open-access visual language model that accepts arbitrary sequences of images and text and produces text.

Thanks to the new funding, Delangue mentioned that Hugging Face intends to focus more on supporting different areas like research, business, and startups. The company, which currently has 170 employees, also plans to hire more people in the next few months.

The post Hugging Face Embraces $235 Mn Funding Hug from Google, Amazon, NVIDIA, and More appeared first on Analytics India Magazine.

The Best Courses for AI from Universities with YouTube Playlists

When you’re looking to start something new, it’s really hard to work out which resource is the best. There are a multitude of AI courses available online, and it can get very tiresome to wade through them and try to determine which is the right one.

Creating a study plan can be exhausting in itself and lead to decision fatigue. So I’ve decided to take the work off your shoulders and have created a list of the best courses for learning about machine learning and artificial intelligence, with YouTube playlists.

This list consists of courses that have been created by trusted universities. You may find that some of the courses are similar, which is normal. I have provided a range because everybody learns differently: some people like presentations, and some are particular about the voice they are learning from. I hope this list caters to everybody and that you find it helpful.

Let’s get straight into it!

Stanford University Courses

Stanford University is well known for being one of the world's leading research and teaching institutions and providing insightful courses. Below is a list of courses with their respective YouTube playlists.

CS221 — Artificial Intelligence: Principles and Techniques — YouTube Playlist
CS224U: Natural Language Understanding — YouTube Playlist
CS224n — Natural Language Processing with Deep Learning — YouTube Playlist
CS224w — Machine Learning with Graphs — YouTube Playlist
CS229 — Machine Learning — YouTube Playlist
CS230 — Deep Learning — YouTube Playlist
CS231n — Convolutional Neural Networks for Visual Recognition — YouTube Playlist
CS234 — Reinforcement Learning — YouTube Playlist
CS330 — Deep Multi-task and Meta-Learning — YouTube Playlist
CS25 — Transformers United — YouTube Playlist

Carnegie Mellon University Courses

Carnegie Mellon University aims to create problem solvers, drivers of innovation and pioneers in technology and the arts. It provides a good range of courses that can help you kickstart and elevate your career in artificial intelligence. Given the developments in Large Language Models (LLMs) and the role NLP plays in them, below is a list of courses that will help you better understand the theory behind LLMs and how they are built.

CS 10-708: Probabilistic Graphical Models — YouTube Playlist
CS/LTI 11-711: Advanced NLP — YouTube Playlist
CS/LTI 11-737: Multilingual NLP — YouTube Playlist
CS/LTI 11-747: Neural Networks for NLP — YouTube Playlist
CS/LTI 11-777: Multimodal Machine Learning — YouTube Playlist
CS/LTI 11-785: Introduction to Deep Learning — YouTube Playlist
CS/LTI 11-785: Neural Networks — YouTube Playlist
CS/LTI Low Resource NLP Bootcamp 2020 — YouTube Playlist

Massachusetts Institute of Technology Courses

The Massachusetts Institute of Technology (MIT) is another well-known private research university, focused on the advancement of knowledge and education in areas such as science and technology. These courses have a heavier focus on deep learning.

6.006 — Introduction to Algorithms — YouTube Playlist
6.S191 — Introduction to Deep Learning — YouTube Playlist
6.S094 — Deep Learning — YouTube Playlist
6.S192 — Deep Learning for Art, Aesthetics, and Creativity — YouTube Playlist

DeepMind x UCL

DeepMind researchers collaborate with University College London (UCL) to offer students comprehensive courses to better understand AI.

COMP M050 — Introduction to Reinforcement Learning — YouTube Playlist
Deep Learning Series — YouTube Playlist

Wrapping it up

This list was curated to cater to all levels of experience and understanding. You may be completely new to AI, hearing a lot of this terminology for the first time. Or you may be a machine learning engineer who wants to learn more about LLMs and NLP.

I hope this list has taken a bit of the weight off your shoulders, and provided you with an easier study plan and guide to learning more about AI.
Nisha Arya is a Data Scientist, Freelance Technical Writer and Community Manager at KDnuggets. She is particularly interested in providing Data Science career advice, tutorials and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence can benefit the longevity of human life. She is a keen learner, seeking to broaden her tech knowledge and writing skills while helping guide others.

Amazon brings new AI-driven features to Thursday Night Football

By Lauren Forristal

As Amazon’s Prime Video gears up for its second year as the exclusive rights holder to the NFL’s Thursday Night Football (TNF), the streaming service hopes to give fans an enhanced viewing experience with a slew of new AI-driven features.

During a demo with Prime Video executives, TechCrunch learned about the AI elements coming to TNF this season, as well as the first Black Friday NFL game and when viewers can expect HDR video quality.

AI is changing how sports content is consumed

If we had asked someone about sports broadcasting a few years ago, we would have bet that artificial intelligence (AI) and machine learning (ML) would be the last things on their mind. However, the use of AI and ML models in sports has drastically altered how hardcore fans watch games, allowing them to dive deeper into the analytics.

“We don’t want to just put math on the screen,” Betsy Riley, senior coordinating producer at Prime Video, told us. “It’s about using data to tell a deeper story and to bring our fans insights so that they understand the game better. We think doing that lets people understand the chess match that’s unfolding on the field. For us as a tech company, it’s been really fun to dig into the numbers and to think about ways we can innovate and use tech to tell the story of the game in deeper and more meaningful ways.”

Amazon introduced AI features to TNF last year, among them X-Ray, which gives fans real-time access to live statistics and data, and Rapid Recap, which generates up to 13 two-minute-long highlights so viewers can catch up on plays during a game. And after winning its first Sports Emmy award in May, it’s safe to say the tech behemoth isn’t easing up on the gas.

All the new AI features will live within Prime Vision with Next Gen Stats, TNF’s weekly alternate stream that features various graphic overlays on the screen during plays so fans can see stats and analysis in real time.

Note that Amazon will internally test the features during tonight’s preseason game at 8 p.m. ET. However, fans won’t be able to experience them just yet. The features roll out on September 14, when the 2023 season begins. (Fans can find the complete TNF schedule on Amazon’s website).

Defensive Alerts

What if we said that AI can predict blitzes? Defensive Alerts is Amazon’s in-house ML neural network that recognizes when defensive players are about to rush the opposing quarterback. A red orb will appear around the players of interest so fans know exactly who to focus on.

“It’s able to look at all players XY coordinate data, their relationship to each other, as well as their acceleration; where are they moving and how fast are they moving directionally to predict who’s going to blitz,” explained Sam Schwartzstein, TNF Analytics Expert at Prime Video.

The ML model was trained on 35,000 plays and will continue to get smarter, Schwartzstein told TechCrunch, adding that it’s identifying blitzes and situations better than offensive linemen. He also said the team has a panel of NFL experts who are former quarterbacks, coaches and offensive linemen that help annotate the plays.
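
As a rough illustration of the inputs Schwartzstein describes (positions, relationships between players, speed and acceleration), the sketch below derives simple motion features from a series of XY coordinates. It is not Amazon's model: the function names, the one-second sampling interval and the sample tracks are assumptions made up for this example, and the neural network that would consume such features is not shown.

```python
import math


def kinematics(track, dt):
    """Estimate speeds and accelerations from (x, y) samples taken every dt seconds.

    Uses simple finite differences between consecutive samples.
    """
    speeds = [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]
    accels = [(s1 - s0) / dt for s0, s1 in zip(speeds, speeds[1:])]
    return speeds, accels


def closing_speed(defender, target, dt):
    """Rate at which the defender-to-target distance shrank over the last step.

    A positive value means the defender is closing in on the target.
    """
    d_prev = math.dist(defender[-2], target[-2])
    d_now = math.dist(defender[-1], target[-1])
    return (d_prev - d_now) / dt


# Hypothetical tracks: a defender accelerating straight at a stationary quarterback.
defender_track = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]
quarterback_track = [(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]

speeds, accels = kinematics(defender_track, dt=1.0)
print(speeds, accels)
print(closing_speed(defender_track, quarterback_track, dt=1.0))
```

Features like these, computed for every player on every frame, are the kind of numeric input a model could learn from when predicting who is about to rush the quarterback.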

“Having this as an in-house neural network can only expand the kind of features that we can do in the future,” he said.

Prime Targets

Prime Targets (featured in the first image at the very top of the page) works similarly in that a green orb will light up a player who is open for a pass. The feature automatically tracks when a quarterback drops back to get ready to throw a pass, and the receiver (lit up by the green orb) runs out and creates separation between himself and the defenders.

This feature was previously called Open Receiver, which tracked which players would most likely convert the first down. Amazon tested it during last season’s games.

“This is the first statistic that is measuring the process of the play,” Schwartzstein noted. “Everything that we do on Prime Vision is predictive… This is all in real-time.”

Fourth Down Territory

Amazon is also launching a feature that may help fans understand how fourth-down decisions are made while potentially helping teams prepare for fourth downs.

The fourth down territory is an area on the field that offensive players use in an attempt to tie or win the game. Historically, coaches usually opt to punt the ball away since it feels less risky. However, as years go by, more and more teams are going for the fourth-down conversion.

Instead of putting analytics on the screen after the play happened, Fourth Down Territory operates like a real NFL analytics coordinator does; it shows viewers exactly when a team should try a fourth down and what the probability is.

Field Goal Target Zones

NFL fans are accustomed to seeing field goal target lines on broadcasts—the digital line that appears at the end of half or end of the game, where if a team gets to it, they can kick a field goal. Amazon’s Field Goal Target Zones feature will have multiple lines on the screen that tell viewers the likelihood that a kicker will make a field goal at each point.

Key Plays

Key Plays gives fans the ability to view in-game highlights and critical moments, whether they’re already watching the game live or streaming on demand afterward. Much like Rapid Recap ensures fans never miss the action, Key Plays leverages AI and machine learning to offer viewers a full rundown of what’s happening on the field.

The first Black Friday NFL game will stream for free on Prime Video

What can viewers expect for the NFL’s first Black Friday game?

Amazon and the NFL announced last fall that Prime Video would exclusively stream the NFL’s first Black Friday game on November 24, with the Miami Dolphins playing against the New York Jets and an expected kick-off at 3 p.m. ET.

Notably, the game will be free to watch for non-subscribers. The e-commerce giant will also use this as an opportunity to promote exclusive shopping deals to viewers.

During a press call earlier this week, Prime Video’s global head of sports, Jay Marine, hinted that Amazon has some “interesting things” planned for Black Friday that will be “additive” for fans.

While the company declined to share more details, Riley told TechCrunch that we can expect to hear more in the coming weeks. It’s our guess that Amazon will roll out interactive shopping elements in order to take advantage of the busiest shopping day of the year. Last year, the company launched a dedicated fan store page, TNF Central, offering TNF-related items, NFL-branded merch and Amazon devices.

We also spoke with Eric Orme, who serves as director of live events at Prime Video. Orme oversees global product, engineering, and operations for live sports events, including Thursday Night Football, the Premier League, UEFA Champions League, MLB, NBA, and US Open tennis, among others.

Prime Video will likely see a surge in viewership during the Black Friday game since everyone will be home for the holiday weekend. However, Orme is confident it will be a smooth streaming experience for all viewers.

“We work really closely with the retail teams and everybody’s leveraging AWS, so we spend a lot of time around traffic projections,” Orme said. “We ran a bunch of scenarios and are really confident where we think that number is going to be.”

Prime Video brings HDR video quality to TNF

While the streamer already tested HDR streaming last season, it will officially be available to all TNF viewers this year. Many fans will likely appreciate the visual upgrade because it provides a more compelling experience, with greater contrast and vivid colors.

The company said it would be available on any HDR-enabled device, and subscribers don’t need to change anything in settings as the livestream will automatically be upgraded.

GPT-3: Few Shot Learning for Language Model?

In the past few years, the AI and ML industry has witnessed a meteoric rise in the development & application of NLP systems, as researchers have been able to apply NLP methods in highly flexible and task-agnostic ways that transfer to downstream tasks.

Initially, single-layer representations were learned using word vectors and then fed to task-specific architectures. Next, RNN architectures used multi-layer representations & contextual state to form better representations. Most recently, transfer language models and pre-trained recurrent models have been fine-tuned directly, entirely removing the need for task-specific architectures.

The transfer language models have proved to be a major turning point in the NLP industry as they have resulted in tremendous progress on challenging tasks like question answering, reading comprehension, textual entailment, and much more.

However, despite their advantages, transfer language models have a major limitation: they require task-specific fine-tuning on a task-specific dataset to achieve the desired performance on a task. In practice, that means developers need a dataset of up to hundreds of thousands of examples specific to each particular task.

It goes without saying that removing the requirement for task-specific dataset, and task-specific finetuning will be highly desirable, and beneficial for the NLP industry for numerous reasons.

Issues with Existing Pre-Trained Transfer Language Models or Recurrent Models

Limiting the Practicality & Applicability

First and foremost, the requirement of a large dataset with labeled data for each task limits the applicability & practicality of the language models. Language models find their applications in a wide variety of tasks ranging from generating a short story, to correcting grammatical errors, to generating examples on a concept. At times, it is a challenging task to collect a large supervised dataset with labeled data, especially when the process needs to be repeated for every individual task.

Exploiting Spurious Correlations in Training Data

The potential to exploit spurious correlations in training data grows fundamentally with the expressiveness of the model and the narrowness of the training distribution. This creates problems for the pre-training plus fine-tuning paradigm, because transfer language models are designed to be large so that they absorb a great deal of information during pre-training, but are then fine-tuned on very narrow task distributions.

Furthermore, prior work has indicated that larger models do not necessarily generalize better out of distribution. It has also been indicated that the generalization achieved under such a paradigm can be poor, primarily because the model becomes highly specific to the training data and cannot perform well in situations beyond its scope.

Comparison with Human Learning

Finally, when compared to transfer language models, humans do not require a large training dataset to learn most language tasks. Most often, a brief directive in a person’s natural language or a small demonstration of the language task is adequate for a human to understand and perform the task with a reasonable level of competence.

Humans’ ability to adapt has numerous practical advantages, as it allows them to switch between different skill sets or mix them together to perform better during a dialogue, something that’s beyond the capabilities of current NLP systems.

Tackling the Issues with Meta Learning & GPT-3

A possible solution to the above challenges is the use of meta learning, a concept in modern ML in which a model develops a broad set of skills & pattern recognition abilities during training, and then uses these learned abilities during inference to adapt rapidly to, or recognize, the required task.

Meta Learning is being implemented in language model architecture via a technique called “in-context learning” that uses text input of a pre-trained language model as a task specification. In the process, the model conditions on a natural language instruction, and might even use a few demonstrations, and the model is then expected to complete the rest of the task by predicting the next steps.

The only major issue with meta learning is that, although it has shown positive potential, it’s still inferior to the fine-tuning approach, and it needs further improvement in order to become a practical method for solving language tasks.

In addition to meta learning, another method that’s gaining popularity is increasing the capacity of transformer language models. In the past few years, these models have witnessed a substantial increase in their capacity. Using the citation labels from the GPT-3 paper, the progression runs from the RNSS18 model with 100 million parameters, to the DCLT18 model with 300 million parameters, the RWC19 model with 1.5 billion parameters, the SSP19 model with 8 billion parameters, the RSR19 model with 11 billion parameters, and the TUR20 model with 17 billion parameters.

Increasing the capacity of the model, that is, increasing the number of parameters, has historically resulted in improvements in text synthesis, and there’s been an indication that log loss, which correlates with performance on downstream tasks, also follows a smooth trend of improving with scale.

That brings us to the GPT-3 model, which has 175 billion parameters and was, when it launched, the transfer language model with the highest capacity. Let’s now talk about the GPT-3 model.

An Introduction to the GPT-3 Model

GPT-3 is an autoregressive language model with 175 billion parameters that was released by OpenAI in 2020. Like its predecessor, GPT-2, it is a large language model built as a decoder-only deep learning transformer, and it uses an attention-based architecture to generate text.

To measure its in-context learning abilities, the GPT-3 model is evaluated on over two dozen NLP datasets and multiple novel tasks. For every individual task, the GPT-3 model is evaluated under three conditions:

Few Shot Learning or In-Context Learning: In few shot learning, the GPT-3 model is given as many demonstrations as will fit into the model’s context window.
One Shot Learning: In one shot learning, the model allows only one demonstration.
Zero Shot Learning: In zero shot learning, there are no demonstrations, and there’s only an instruction in natural language that’s fed to the model.
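
The few shot condition is bounded by how many demonstrations fit in the context window. The following is a minimal sketch of that constraint, assuming a crude whitespace word count in place of a real tokenizer; the function names, the sample strings and the window sizes are hypothetical and chosen only for illustration.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def select_demonstrations(demos, instruction, query, context_window):
    """Keep demonstrations, in order, while the whole prompt still fits.

    The instruction and the final query are always included, so they are
    charged against the budget first.
    """
    budget = context_window - count_tokens(instruction) - count_tokens(query)
    chosen = []
    for demo in demos:
        cost = count_tokens(demo)
        if cost > budget:
            break
        chosen.append(demo)
        budget -= cost
    return chosen


instruction = "Translate English to French:"
query = "plush giraffe =>"
demos = [
    "sea otter => loutre de mer",
    "cheese => fromage",
    "peppermint => menthe poivrée",
]

# A 16-"token" window leaves room for the first two demonstrations only.
print(select_demonstrations(demos, instruction, query, context_window=16))
# A 7-"token" window leaves no room for demonstrations: the zero shot case.
print(select_demonstrations(demos, instruction, query, context_window=7))
```

Seen this way, zero shot, one shot and few shot are the same procedure with a different number of demonstrations admitted into the prompt.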

Broadly speaking, the GPT-3 model achieves promising results in the zero-shot and one-shot settings, and in the few-shot setting it is sometimes competitive with, or even surpasses, state-of-the-art fine-tuned models. Furthermore, the GPT-3 model performs well in the one-shot and zero-shot settings on natural language tasks designed to test on-the-fly reasoning or rapid adaptation, such as using a novel word in a sentence, unscrambling words, or performing arithmetic operations. In the few-shot setting, the GPT-3 model can also generate synthetic news articles that human evaluators find hard to distinguish from human writing.

GPT-3 Model: Approach

The GPT-3 model uses a conventional pre-training approach that comprises model, data, and training, and it resembles the pre-training process followed by the RWC19 transfer language model. Compared with that model, GPT-3 scales up the model size, the dataset size and diversity, and the length of training.

The model also uses an in-context learning approach that once again resembles the RWC19 model’s approach, but goes further by systematically exploring different settings for learning within the context.

So, let’s start by exploring these settings, and evaluate how the GTP-3 model performs on different settings.

Fine Tuning

Fine-tuning has been the conventional approach for transformer language models. It involves updating the weights of a pre-trained model by training it on a supervised dataset that’s specific to the desired task, typically using thousands to hundreds of thousands of labeled examples.

The main advantage of fine-tuning is strong performance across numerous benchmarks. Its main limitations are that it requires a new, large dataset for every individual task, can generalize poorly out of distribution, and can exploit spurious features of the training data, which may result in an unfair comparison with human performance.

The GPT-3 work does not fine-tune the model because its focus is on task-agnostic performance, although fine-tuning can be applied to the GPT-3 model in the future.

Few Shot

Few Shot is a term that refers to the setting where the GPT-3 model is given a few demonstrations of the task during inference as conditioning, but the weights of the model are not updated. In the few shot setting, each example typically has a context and a desired completion (for example, a French sentence and its English translation). The model is given K examples of context and completion, followed by one final context, and is expected to provide the completion.

The major advantage of the few shot setting is that it significantly reduces the need for task-specific data, and also reduces the potential to learn an overly narrow distribution from a large but narrow fine-tuning dataset. The major disadvantage is that results in the few shot setting have so far been considerably worse than those of state-of-the-art fine-tuned models.

One Shot

In the one shot setting, the model is given only a single demonstration; otherwise the setting is the same as few shot. One shot is treated separately because, of the three settings, it most closely resembles the way tasks are communicated to humans. For most tasks, it's common to give one demonstration, since otherwise it can be difficult to understand the content or format of the task.

Zero Shot

In the zero shot setting, there are no demonstrations, and the model is given a natural language instruction that describes the task. The zero shot method offers maximum convenience and potential for robustness, and it avoids spurious correlations, but it’s also the most challenging of the three settings. That’s because, in some cases, it’s difficult even for us humans to understand the format of a task without seeing a demonstration first.

Regardless, for some tasks, the zero-shot setting is the one that most closely resembles how humans perform natural language tasks.

The above figure compares the few shot, the one shot, and the zero shot setting when performing a natural language task of taking an English sentence, and translating it into French.
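The three settings differ only in how many demonstrations are placed in the prompt before the final context. A minimal sketch of that idea follows; the `build_prompt` helper, the "=>" separator, and the example word pairs are illustrative assumptions, not the exact format used for GPT-3.

```python
def build_prompt(instruction, demonstrations, query, sep="\n"):
    """Assemble a zero-, one-, or few-shot prompt as plain text.

    The setting is decided only by len(demonstrations):
    0 -> zero shot, 1 -> one shot, K > 1 -> few shot.
    No model weights are involved; everything is conditioning text.
    """
    lines = [instruction]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    # The final context is left open for the model to complete.
    lines.append(f"{query} =>")
    return sep.join(lines)


instruction = "Translate English to French:"
demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]  # hypothetical pairs

zero_shot = build_prompt(instruction, [], "peppermint")
one_shot = build_prompt(instruction, demos[:1], "peppermint")
few_shot = build_prompt(instruction, demos, "peppermint")
print(few_shot)
```

The same function covers all three settings, which mirrors the point that the model itself is unchanged between them.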

GPT-3: Model Architecture

The GPT-3 model uses the same architecture as the GPT-2 model, including the pre-normalization, modified initialization, and reversible tokenization used there, with the exception that it uses alternating dense and locally banded sparse attention patterns in the transformer layers, similar to the Sparse Transformer.

To study how performance depends on model size, the developers trained 8 different model sizes spanning three orders of magnitude, from 125 million to 175 billion parameters, the last of which is the model called GPT-3. Prior work on large language models indicates that, with a sufficient amount of training data, validation loss should scale approximately as a smooth power law in model size. Training models of many different sizes allows the developers to test this hypothesis both for validation loss and for downstream language tasks.

The above figure compares the size & architecture of the 8 different models used for development of GPT-3. Here, n(params) is the total number of trainable parameters, n(layers) is the total number of layers in the model, d(model) is the number of units in each bottleneck layer, and d(head) is the dimension of each attention head. All models use the same context window of 2048 tokens.

Furthermore, to minimize the transfer of data between nodes, each model is partitioned across GPUs along both the depth & the width dimensions. The precise architectural parameters for each model were chosen on the basis of computational efficiency & load-balancing in the layout of the models across GPUs.
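To make the power-law hypothesis concrete: a power law loss = a * size^(-b) is a straight line in log-log space, so it can be fitted with ordinary least squares on the logarithms. The sketch below uses synthetic losses generated from made-up constants (a = 10, b = 0.05); these are not measurements from GPT-3, and the four sizes are only illustrative points in the range the text mentions.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-b) by least squares on log(size), log(loss)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


# Synthetic data from assumed constants, for illustration only.
sizes = [1.25e8, 1.3e9, 1.3e10, 1.75e11]
true_a, true_b = 10.0, 0.05
losses = [true_a * n ** (-true_b) for n in sizes]

a, b = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real validation losses, the interesting question is how far the measured points fall from the fitted line as size grows.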
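As a rough check on how n(layers) and d(model) relate to n(params), a common rule of thumb counts about 12 * d(model)^2 weights per transformer layer. The sketch below applies it; the layer counts, widths, and vocabulary size are assumptions taken from the published GPT-3 configuration rather than from this article, and the formula ignores biases and layer-norm parameters.

```python
def approx_params(n_layers, d_model, vocab_size=0, n_ctx=0):
    """Approximate trainable parameter count of a decoder-only transformer."""
    # Per layer: about 4*d^2 attention weights (query, key, value, output
    # projections) plus 8*d^2 feed-forward weights, assuming a hidden
    # size of 4*d. Biases and layer norms are ignored.
    per_layer = 12 * d_model ** 2
    # Token embeddings plus learned position embeddings.
    embeddings = (vocab_size + n_ctx) * d_model
    return n_layers * per_layer + embeddings


# Assumed configurations for the smallest and largest of the 8 models.
smallest = approx_params(n_layers=12, d_model=768, vocab_size=50257, n_ctx=2048)
largest = approx_params(n_layers=96, d_model=12288, vocab_size=50257, n_ctx=2048)

print(f"smallest: {smallest / 1e6:.0f}M parameters")
print(f"largest:  {largest / 1e9:.1f}B parameters")
```

Under these assumptions the estimates land close to the 125 million and 175 billion figures quoted above.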

Training Datasets

Datasets for large language models have expanded rapidly with recent developments, culminating in the Common Crawl dataset, which consists of nearly a trillion words. The dataset is large enough to train the GPT-3 model without updating on the same sequence multiple times. However, studies & performance analysis indicate that unfiltered or lightly filtered versions of the Common Crawl dataset have lower quality than more curated datasets.

To tackle the issue of the average quality of the dataset, developers took 3 steps to boost the quality of the dataset.

Developers downloaded & filtered a version of the Common Crawl dataset based on its similarity to a range of high-quality reference corpora.
Developers performed fuzzy deduplication at the document level, within and across datasets, to prevent redundancy and to preserve the integrity of their held-out validation set as an accurate measure of overfitting.
Developers also added known high-quality reference corpora to the training mix to augment the Common Crawl dataset and to further increase the diversity of the data.
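The deduplication step above can be sketched with a simple near-duplicate filter. This is a minimal illustration using Jaccard similarity over word 3-grams; it is an assumed stand-in chosen for readability, not the method the GPT-3 developers actually used, and the 0.8 threshold is arbitrary. The pairwise comparison would not scale to a web-sized corpus.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(docs, threshold=0.8):
    """Keep a document only if it is not a near-duplicate of one already kept."""
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


docs = [
    "the quick brown fox jumps over the lazy dog near the river bank",
    "the quick brown fox jumps over the lazy dog near the river bank today",
    "language models are trained on large text corpora",
]
unique = deduplicate(docs)
print(len(unique))
```

The second document differs from the first by one trailing word, so it is dropped, while the unrelated third document is kept.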

The following figure shows the final mixture of datasets used for training the GPT-3 model. The Common Crawl data consisted of 45 TB of compressed plaintext before filtering, which was reduced to 570 GB after filtering, roughly equivalent to 400 billion byte-pair-encoded tokens. It's worth noting that datasets are not sampled in proportion to their size during training; datasets viewed as higher quality are sampled more frequently. As a result, datasets like Books2 & Common Crawl are sampled less than once during training, whereas the other datasets are sampled multiple times. This accepts a small amount of overfitting in exchange for higher-quality training data.
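The effect of quality-weighted sampling can be expressed as the number of epochs each dataset is seen for: its share of the training mix times the total tokens trained on, divided by the dataset's own size. The mixture weights, dataset sizes, and training budget below are hypothetical round numbers chosen for illustration, not the actual GPT-3 mixture.

```python
def epochs_seen(mix_weight, dataset_tokens, total_training_tokens):
    """How many passes over a dataset a given sampling weight implies."""
    return mix_weight * total_training_tokens / dataset_tokens


# Hypothetical mixture: (sampling weight, dataset size in tokens).
mixture = {
    "large_web_crawl": (0.60, 400e9),
    "small_curated_corpus": (0.03, 3e9),
}
total_training_tokens = 300e9  # assumed training budget

epochs = {
    name: epochs_seen(weight, size, total_training_tokens)
    for name, (weight, size) in mixture.items()
}
for name, value in epochs.items():
    print(f"{name}: {value:.2f} epochs")
```

In this made-up mix, the large crawl is seen less than once while the small curated corpus is repeated three times, which is the trade-off described above.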

A significant concern with large language models that are pre-trained on a large amount of internet data with the capacity to memorize & learn a large amount of content is the potential contamination of downstream tasks by having their development or test sets seen during the pre-training process. To reduce such potential contamination, the developers searched for any overlaps with the test & development sets of the benchmarks studied for GPT-3, and attempted to remove these overlaps.
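One simple way to search for such overlaps is to flag any test example that shares a long word n-gram with the training data. The sketch below assumes that approach; the function names, the example strings, and the n-gram length are illustrative choices, and the actual overlap analysis for GPT-3 may differ in its details.

```python
def ngrams(text, n):
    """Return the set of word n-grams in a text (lowercased)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_contaminated(test_examples, training_docs, n=8):
    """Return test examples that share at least one n-gram with training data."""
    train_ngrams = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)
    return [ex for ex in test_examples if ngrams(ex, n) & train_ngrams]


training_docs = ["the capital of france is paris and it lies on the seine"]
test_examples = [
    "Q: the capital of france is paris",
    "what is the boiling point of water",
]
# A short n is used here only because the toy strings are short.
flagged = find_contaminated(test_examples, training_docs, n=4)
print(flagged)
```

Flagged examples can then be removed from the benchmark, or results can be reported separately on the clean subset.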

The above image shows the total compute used during training. Guided by the analysis in Scaling Laws for Neural Language Models, the developers train much larger models on many fewer tokens than is typical. As a result, although the GPT-3 variant with roughly 3 billion parameters is almost 10x larger than the RoBERTa-Large model, both took roughly 50 petaflop/s-days of compute during the pre-training process.
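A rough way to see how a larger model trained on fewer tokens can cost about the same as a smaller model trained on more tokens is the common approximation that training takes about 6 floating-point operations per parameter per token. The parameter and token counts below are assumptions recalled from published figures, not values stated in this article, so treat the output as an illustration only.

```python
SECONDS_PER_DAY = 86_400


def petaflop_s_days(n_params, n_tokens):
    """Approximate training compute using the 6 * N * D rule of thumb.

    One petaflop/s-day is 1e15 operations per second sustained for a day.
    """
    total_flops = 6 * n_params * n_tokens
    return total_flops / (1e15 * SECONDS_PER_DAY)


# Assumed figures: a ~2.65B-parameter GPT-3 variant trained on 300B tokens,
# and RoBERTa-Large (355M parameters) trained on roughly 2T tokens.
gpt3_small_variant = petaflop_s_days(2.65e9, 300e9)
roberta_large = petaflop_s_days(355e6, 2e12)

print(f"{gpt3_small_variant:.1f} vs {roberta_large:.1f} petaflop/s-days")
```

Under these assumptions both runs come out at around 50 petaflop/s-days despite the difference in model size.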

Evaluation

For few shot learning, each example in the evaluation set is evaluated by randomly drawing K examples from that task’s training set as conditioning, delimited by 1 or 2 newlines depending upon the task. For StoryCloze and LAMBADA, which have no supervised training set, the conditioning examples are drawn from the development set and evaluation is done on the test set. For Winograd, there exists only one dataset, and so the conditioning examples are drawn directly from it.

K can be any value from 0 to the maximum allowed by the model's context window, which is n_ctx = 2048 for all the models and typically fits about 10 to 100 examples. Larger values of K usually, but not always, give better results, which is why, when a separate development set and test set are available, the developers experiment with a few values of K on the development set and then run the best value on the test set.

Furthermore, on tasks that require selecting a correct completion from multiple options, the developers provide K examples of context plus correct completion, followed by one example of context only, and then compare the LM likelihood of each candidate completion. For tasks that involve binary classification, the options are given more semantically meaningful names (for example, "True" or "False" rather than 0 or 1), and the task is then treated as multiple choice; sometimes the task is also framed similarly to what is done in RSR (the T5 work).

For tasks that require free-form completion, the developers use beam search with the same parameters as in the RSR framework: a beam width of 4 and a length penalty of 0.6. The model is then scored using F1 similarity, BLEU, or exact match, depending on what is standard for the dataset.
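The limit on K follows directly from the context window: demonstrations are added until the token budget runs out. A minimal sketch follows, assuming whitespace splitting as a crude stand-in for the model's real tokenizer; `fit_demonstrations` is a hypothetical helper, and a real setup would also reserve room for the generated completion.

```python
def fit_demonstrations(demos, query, n_ctx=2048, count_tokens=None):
    """Greedily keep demonstrations, in order, while they fit in the window."""
    if count_tokens is None:
        # Crude stand-in for a real tokenizer: count whitespace-separated words.
        count_tokens = lambda text: len(text.split())
    budget = n_ctx - count_tokens(query)
    chosen = []
    for demo in demos:
        cost = count_tokens(demo)
        if cost > budget:
            break
        chosen.append(demo)
        budget -= cost
    return chosen


demos = ["a b c d"] * 10   # ten demonstrations of 4 "tokens" each
query = "x y"              # the final context, 2 "tokens"
chosen = fit_demonstrations(demos, query, n_ctx=20)
print(len(chosen))         # the effective K for this tiny window
```

With a 20-token window, only 4 of the 10 demonstrations fit, so K is capped at 4 regardless of how many examples are available.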
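Choosing among options by LM likelihood can be sketched as scoring every candidate completion and keeping the highest-scoring one. In the sketch below, `toy_logprob` is a deliberately trivial, hypothetical stand-in for a real language model (it ignores earlier completion tokens), and dividing by the number of tokens is one optional normalization choice assumed here.

```python
import math


def score_completion(token_logprob, context, completion, normalize=True):
    """Sum the log-probabilities of the completion's tokens given the context."""
    tokens = completion.split()
    total = sum(token_logprob(context, tok) for tok in tokens)
    # Optional per-token normalization so longer options are not penalized.
    return total / len(tokens) if normalize else total


def pick_answer(token_logprob, context, options):
    """Return the option with the highest likelihood score."""
    return max(options, key=lambda opt: score_completion(token_logprob, context, opt))


def toy_logprob(context, token):
    # Hypothetical scorer: words already seen in the context are "likely".
    return math.log(0.5) if token in context.split() else math.log(0.01)


context = "on a clear day the sky is blue . the color of the sky is"
answer = pick_answer(toy_logprob, context, ["blue", "green"])
print(answer)
```

With a real model, `token_logprob` would be replaced by the model's conditional log-probability for each token of the candidate completion.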
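Two of the scoring metrics mentioned above are simple enough to sketch directly. The version below assumes whitespace tokenization and case-insensitive comparison; benchmark-specific scorers usually add further normalization, such as stripping punctuation and articles.

```python
from collections import Counter


def exact_match(prediction, reference):
    """True if the two strings match after trimming and lowercasing."""
    return prediction.strip().lower() == reference.strip().lower()


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


em = exact_match("Paris ", "paris")
f1 = token_f1("the eiffel tower", "eiffel tower")
print(em, round(f1, 2))  # True 0.8
```

Exact match rewards only a perfect answer, while F1 gives partial credit when the prediction contains extra or missing words.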

Results

The above figure displays the training curves for the 8 models described in the previous sections. Consistent with the results from KMH (the scaling-laws work), language modeling performance follows a power law when training compute is used efficiently. When the trend is extended by two more orders of magnitude, there is only a slight departure from the power law. One might suspect that these improvements in cross-entropy loss come merely from modeling spurious details of the training corpus. However, the improvements in cross-entropy loss lead to consistent performance gains across a broad spectrum of NLP tasks.

For evaluation, the datasets are grouped into 8 different categories that represent roughly similar tasks. These categories are:

Evaluation on traditional language modeling tasks, and tasks that resemble language modeling like Cloze tasks, or sentence/paragraph completion tasks.
Evaluation on “closed-book” question answering tasks.
Evaluating the model’s ability to translate between languages (especially one-shot and few-shot).
Evaluating the model’s performance on Winograd Schema-like tasks.
Evaluating on datasets that involve commonsense reasoning or question answering.
Evaluating on reading comprehension tasks.
Evaluating on the SuperGLUE benchmark suite.
Exploring NLI.

Language Modeling, Completion, and Cloze Tasks

In this section, the GPT-3 model’s performance is evaluated on the traditional language modeling tasks as well as tasks that require the prediction of a single word of interest, or completing a paragraph or a sentence, or completing a piece of a text. Let’s discuss them in brief detail.

Language Modeling

The GPT-3 model's zero-shot perplexity is calculated on the Penn Tree Bank (PTB) dataset. Wikipedia-related tasks are omitted because Wikipedia is already included in the model’s training data, and the one billion word benchmark is also omitted because a significant fraction of that dataset is contained in the training data. The PTB dataset avoids these issues because it predates the modern internet. The largest GPT-3 model sets a new SOTA on the PTB dataset by a noteworthy margin of 15 points, achieving a perplexity of 20.50.
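For reference, perplexity is the exponential of the average negative log-likelihood per token, so a lower value means the model was less "surprised" by the text. A minimal sketch, assuming natural-log token probabilities are already available from some model (the uniform example below is synthetic):

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-probability per token (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Synthetic example: a model that assigns probability 1/20 to every token
# behaves like a uniform choice among 20 words, so its perplexity is 20.
uniform_logprobs = [math.log(1 / 20)] * 50
ppl = perplexity(uniform_logprobs)
print(round(ppl, 2))  # 20.0
```

Read this way, a perplexity of about 20 means the model is, on average, as uncertain as if it were choosing uniformly among about 20 candidate tokens at each step.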

LAMBADA

The LAMBADA dataset tests the modeling of long-range dependencies in text: the model is asked to predict the last word of a sentence after reading a paragraph of context. It has also been suggested that the continued scaling of language models is yielding diminishing returns on this benchmark.

The GPT-3 model achieves 76% accuracy on LAMBADA in the zero-shot setting, a gain of 8% over the previous best models. Furthermore, LAMBADA demonstrates the flexibility of few-shot learning, which offers a way to address a problem that classically occurs with this dataset. The completion in LAMBADA is always the last word of a sentence, but a standard language model has no way of knowing that, so it assigns probability not only to the correct ending, but also to other valid continuations of the paragraph.

Furthermore, when the task is instead framed as fill-in-the-blank examples in the few-shot setting, the model reaches an accuracy of over 86%, an increase of over 18% over previous models. The results also indicate that few-shot performance improves strongly with model size. Although this framing reduces the accuracy of the smallest model in the GPT-3 family by 20%, it improves the accuracy of the primary GPT-3 model with 175 billion parameters by 10%.
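One way to modify the examples, assumed here, is a fill-in-the-blank format in which each demonstration ends with a blank followed by the single missing word, signaling to the model that exactly one word is expected. The helper name, the arrow separator, and the example sentences below are illustrative.

```python
def fill_in_blank_prompt(demos, passage, blank="____"):
    """Build a few-shot cloze prompt: 'text ____. -> word' per demonstration."""
    lines = [f"{text} {blank}. -> {answer}" for text, answer in demos]
    # The final passage ends with an open blank for the model to fill.
    lines.append(f"{passage} {blank}. ->")
    return "\n".join(lines)


demos = [("Alice was friends with Bob. Alice went to visit her friend", "Bob")]
passage = "George bought some baseball equipment, a ball, a glove, and a"
cloze_prompt = fill_in_blank_prompt(demos, passage)
print(cloze_prompt)
```

Because every demonstration answers with one word, the model is steered toward a one-word completion instead of continuing the paragraph.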

Closed Book Question Answering

Closed Book Question Answering measures the GPT-3 model’s ability to answer questions about broad factual knowledge. Because the number of possible queries is immense, the task is normally approached with an information retrieval system that finds relevant text, combined with a model that learns to generate an answer given the question and the retrieved text. In the closed-book setting, the model must answer directly, without access to such a retrieval system.

The above image compares the results for the GPT-3 model with those of other models on different datasets. On the TriviaQA dataset, the model achieves an accuracy score of 64.3% in the zero-shot setting, while it achieves accuracy scores of 68% and 71.2% in the one-shot and few-shot settings respectively.

The GPT-3 model in the zero-shot setting thus outperforms the fine-tuned T5-11B model by over 14%.

The above figure shows the performance of the GPT-3 model grows smoothly with an increase in the model size. The performance suggests that the language models continue to learn from the dataset as their capacity increases.

Final Thoughts

It would be safe to say that GPT-3 was a revolutionary step for the LLM industry, as it pushed the limits of what a language model could do. The developments made and obstacles overcome by GPT-3 paved the way for the most advanced and accurate large language model to date, GPT-4.

Modular Raises $100M to Develop AI Models, Aims to Challenge NVIDIA

Modular, a startup for simplifying the development and optimisation of AI systems, secured a substantial $100 million in funding through a round led by General Catalyst, with participation from Google Ventures, SV Angel, Greylock, and Factory.

The infusion brings Modular’s total raised capital to $130 million for the startup which was co-founded in 2022 by Chris Lattner, a former Google employee, and Tim Davis, a colleague from Google’s research division. The funds will be directed towards key initiatives, including product expansion, hardware support, and the enhancement of its programming language, Mojo, as emphasised by CEO Chris Lattner.

“This new funding will enable us to scale to the incredible customer demand we are seeing, continue to hire world-class talent and scale the release of our AI Engine and Mojo. We’re incredibly excited for the future,” Lattner said.

Lattner and Davis shared the conviction that the potential of AI was being hindered by intricate and fragmented technical infrastructures. Modular’s foundation was thus laid with a mission to simplify the complexities of creating and managing AI systems at a larger scale.

Modular introduces an engine designed to enhance the inferencing performance of AI models on CPUs, and soon on GPUs as well. This initiative aligns with the increasing demand for AI capabilities, which has put pressure on GPU supply, affecting companies from Microsoft to smaller AI startups. Nvidia, a dominant GPU supplier, also controls CUDA, a prominent software platform for building machine-learning applications, which is exclusive to Nvidia chips. In contrast, Modular’s software aims to make it easier for AI developers to train and run their models on chips produced by various companies, such as AMD, Intel, and Google.

Modular’s flagship products encompass its engine for improved AI model performance and the Mojo programming language, designed to combine the usability of Python with features like caching and adaptive compilation techniques. These initiatives aim to tackle the issues of complexity and inefficiency commonly faced by developers in the AI landscape.

While challenges related to complexity and demand are palpable, Modular’s endeavours exhibit considerable ambition. Lattner acknowledged that AI’s compute power requirements are becoming unsustainable, leading to compute capacity shortages in certain instances. Modular seeks to address this issue, making AI technology more accessible, affordable, and sustainable for enterprises beyond just the large tech companies.

“We have been able to achieve tremendous momentum in only 20 months,” said Tim Davis, Modular co-founder and President.

“The financing will allow us to accelerate our momentum even more, scaling to meet the incredible demand we have seen since our launch. We now have a community of more than 120K+ developers, including many of the world’s leading technology companies, and 10K’s of enterprises that are excited to deploy Modular infrastructure,” he added.

Challenges

However, there’s a potential challenge in driving the widespread adoption of Mojo, given Python’s entrenched status in the machine-learning community. Lattner believes in the potential of Mojo’s unique benefits, asserting that AI applications entail more than just high-performance acceleration; they encompass end-to-end data processes. He believes that Mojo can unify these processes, enhancing performance and scalability.

Modular’s endeavours have garnered attention and support. The company’s community has expanded rapidly, and leading tech companies have already embraced its infrastructure. With an ongoing commitment to simplify and revolutionize the AI landscape, Modular’s journey is off to an auspicious start.

The post Modular Raises $100M to Develop AI Models, Aims to Challenge NVIDIA appeared first on Analytics India Magazine.

Singapore's Smart Nation and Digital Government Group (SNDGG) has signed up as the first global customer of the AWS Dedicated Local Zones. The government agency is responsible for the public sector's digital transformation and engineering capabilities.

The country's government chief digital technology officer Chan Cheow Hoe said SNDGG had worked with AWS to define and build Dedicated Local Zones. "[These] meet our stringent data isolation and security requirements, enabling Singapore to run more sensitive workloads in the cloud securely," Chan said.

The Singapore government in June said it had carved out dedicated cloud resources for the public sector to deploy artificial intelligence (AI) applications more efficiently and securely. Called the AI Government Cloud Cluster, the platform runs within a dedicated environment on Google Cloud and offers builder tools for generative AI applications, which developers with varying coding skillsets and limited technical knowledge can use to build chatbots and search platforms.

The AI cluster can be accessed via the Government on Commercial Cloud (GCC) platform, which provides a central infrastructure for local government agencies to deploy commercial cloud services. Running on AWS, Google Cloud, and Microsoft Azure, the GCC is part of a five-year roadmap to move the public sector's on-premise IT systems to commercial cloud platforms.

AWS has cloud regions and local zones, the latter of which are provided to meet latency and data sovereignty requirements. The new Dedicated Local Zones offer similar local benefits, with the exception that they are exclusively used by a single customer or community.

With Dedicated Local Zones, customers also have the ability to configure their own private zones with the security and governance capabilities they need to adhere to their local regulatory needs, according to AWS. These features allow customers to monitor and control access and operations to their dedicated zones, including access auditing and restrictions.

Matt Garman, AWS' senior vice president of sales and marketing, said in a post: "Our public sector and regulated industry customers have told us they want dedicated infrastructure for their most critical workloads to help meet regulatory or other compliance requirements. Many of these customers manage their own infrastructure on premises for workloads that require isolation. [However], this forgoes the performance, innovation, elasticity, scalability, and resiliency benefits of the cloud."

According to Garman, the dedicated zones, operated by local AWS staff, provide these cloud benefits with added security and governance features, including options to apply security clearance or other criteria on local AWS operating personnel.

AWS has five local zones across the Asia-Pacific region, including in Singapore, Sydney, Tokyo, and Mumbai.

OpenAI Trying Hard to Woo Enterprises

Looks like OpenAI has had enough. And now, it wants to give it back to Meta’s Llama 2 and its very own partner Microsoft who thought it could play OpenAI by forging multiple partnerships with Meta and Databricks.

OpenAI recently announced that fine-tuning for GPT-3.5 Turbo is now available and fine-tuning for GPT-4 is coming this fall. In their blog post, OpenAI stated that a fine-tuned variant of GPT-3.5 Turbo has the potential to match, and in some cases even surpass, the capabilities of the base GPT-4 model on certain narrow tasks.

The company has stated that fine-tuning of GPT-3.5 Turbo is aimed specifically at businesses and developers who want to customize the model for their use case, as it lets them train the model on their company’s data and run it at scale.

With this development, OpenAI has shown that it does care about enterprises. The creator of ChatGPT came up with distinct reasons why businesses should adopt the GPT-3.5 Turbo API. OpenAI elaborated on how fine-tuning is a great way to hone the qualitative feel of the model output, such as its tone, so it better fits the voice of businesses’ brands.

All of OpenAI’s efforts sound great, but the question is, will fine-tuning for GPT-3.5 Turbo effectively address the issues of cost and security for enterprises?

Fine-tuning is essential for broader AI application adoption but I think @OpenAI got this wrong.
1. Data privacy: No company will feel comfortable uploading their data to fine-tune the model.
2. ROI vs Cost: Companies/individuals are paying for training and inference while…

— Dr Ahmed Zaidi (@_ahmedzaidi) August 23, 2023

Breaking Down the Cost

With GPT-3.5 Turbo API, OpenAI has made genuine efforts to cut down on the prices as compared to GPT-4 API. Moreover, the upcoming integration of fine-tuning is expected to lead to even lower costs. Fine-tuning on GPT-3.5 Turbo helps users make prompts shorter, which means using fewer words to get the same results. This can lower the overall cost of using the API. OpenAI found that early testers were able to make prompts much shorter, up to 90% less, by tweaking how they instructed the model. This not only made the API calls faster but also saved money.

According to OpenAI’s blog, fine-tuning expenses are divided into two categories: the initial training cost and the usage cost. The training cost is $0.008 for every 1,000 tokens, while the input usage cost is $0.012 per 1,000 tokens, and the output usage cost is $0.016 per 1,000 tokens. For example, if a fine-tuning task involves the GPT-3.5-Turbo model, utilizing a training file of 100,000 tokens, and undergoes training for 3 epochs, the anticipated cost would amount to $2.40.
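The arithmetic behind that example can be sketched as follows, using the per-1,000-token prices quoted above as defaults. The function names are illustrative, and the prices are those reported in this article at the time of writing, so they may have changed since.

```python
def fine_tuning_cost(training_tokens, epochs, price_per_1k=0.008):
    """Training cost: tokens in the file, times epochs, at the per-1K price."""
    return training_tokens / 1000 * epochs * price_per_1k


def usage_cost(input_tokens, output_tokens,
               input_price_per_1k=0.012, output_price_per_1k=0.016):
    """Cost of calling the fine-tuned model for given input/output volumes."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


training = fine_tuning_cost(training_tokens=100_000, epochs=3)
per_call = usage_cost(input_tokens=1_000, output_tokens=1_000)

print(f"training: ${training:.2f}")                    # training: $2.40
print(f"1K in + 1K out per call: ${per_call:.3f}")     # $0.028
```

The training charge is a one-time cost, while the usage charge recurs on every call, so for high-volume workloads the usage prices dominate the total.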

What’s important to remember is not to mix up fine-tuning costs with general API expenses. Specifically for the GPT-3.5 Turbo API, the charges are $0.003 per 1,000 tokens for input and $0.004 per 1,000 tokens for output with a 16K context length.

However, it’s still uncertain if it can compete with Llama 2 in terms of pricing. Several users on Hacker News expressed that GPT-3.5 Turbo is better than Llama 2. “The Llama 2 70B performance is probably between GPT-3.5 and GPT-4. But running it personally isn’t cheap. The cheapest I found is about $4/hr to run the whole thing. I only spend around $3 on average a month on GPT-3.5 API for my personal stuff,” a Hacker News user said.

The new update can handle 4k tokens, double that of the previous fine-tuned models. OpenAI said that fine-tuning is most powerful when combined with other techniques like prompt engineering, information retrieval, and function calling.

One user on X, praising the availability of GPT-3.5 fine-tuning, said that a fine-tuned GPT-3.5-Turbo has trivial training cost and that inference costs about half as much as GPT-4. “I can see this being the best play in a wide variety of scenarios, especially low-intelligence agentic flows w/ idiosyncratic tool usage and low latency requirements.”

OpenAI Cares About Data Privacy

Enterprises possess valuable data that they handle with great care and are cautious about sharing with external entities. After the fine-tuning announcement for GPT-3.5 Turbo, some users on social media platforms questioned whether OpenAI would use the data that enterprises provide for training. To this, OpenAI’s Logan Kilpatrick replied, “No data is used for training, it’s the same across all our endpoints.” He backed up the statement with a link to an updated blog post on OpenAI’s API data privacy measures.

A few days back, Sam Altman had also clarified on X saying that OpenAI does not use any API data to train its models putting all speculations that arose regarding security issues to rest.

Despite all the claims by OpenAI, users are still sceptical about the security of data on the platform. They would rather trust well-established players such as IBM, Azure, AWS, and Databricks as these platforms offer a range of LLMs and provide customers with the capability to train and personalise these models on their respective platforms. It’s clear that if OpenAI plans to establish its position in the enterprise segment, it must establish itself against these companies.

Interestingly, Microsoft’s Azure OpenAI service also permits customers to customise their models by fine-tuning them on their own datasets. It remains to be seen whether enterprises will opt to approach OpenAI directly or choose the route through Azure.

The post OpenAI Trying Hard to Woo Enterprises appeared first on Analytics India Magazine.

Modular secures $100M to build tools to optimize and create AI models

Kyle Wiggers

Modular, a startup creating a platform for developing and optimizing AI systems, has raised $100 million in a funding round led by General Catalyst with participation from GV (Google Ventures), SV Angel, Greylock and Factory.

Bringing Modular’s total raised to $130 million, the proceeds will be put toward product expansion, hardware support and the expansion of Modular’s programming language, Mojo, CEO Chris Lattner says.

“Because we operate in a deeply technical space that requires highly specialized expertise, we intend to use this funding to support the growth of our team,” Lattner said in an email interview with TechCrunch. “This funding will not be primarily spent on AI compute, but rather improving our core products and scaling to meet our incredible customer demand.”

Lattner, an ex-Googler, co-founded Palo Alto-based Modular in 2022 with Tim Davis, a former Google colleague in the tech giant’s Google Brain research division. Both Lattner and Davis felt that AI was being held back by an overly complicated and fragmented technical infrastructure, and founded Modular with a focus on removing the complexity of building and maintaining AI systems at large scale.

Modular provides an engine that tries to improve the inferencing performance of AI models on CPUs — and beginning later this year, GPUs — while delivering on cost savings. Compatible with existing cloud environments, machine learning frameworks like Google’s TensorFlow and Meta’s PyTorch and even other AI accelerator engines, Modular’s engine, currently in closed preview, lets developers import trained models and run them up to 7.5 times faster versus on their native frameworks, Lattner claims.

Modular’s other flagship product, Mojo, is a programming language that aims to combine the usability of Python with features like caching, adaptive compilation techniques and metaprogramming. Currently available in preview to “hundreds” of early adopters, Modular plans to release Mojo in general availability early next month.

“Our developer platform enables our customers, and the world’s developers, to defragment their AI technology stacks — pushing more innovations into production faster and realizing more value from their investment in AI,” Lattner said. “We’re attacking the complexity that slows AI development today by solving the fragmentation issues that plague the AI stack, starting with where AI software meets AI hardware.”

Ambitious much? Perhaps. But none of what roughly-70-employee Modular’s proposing is out of the realm of possibility.

Deci, backed by Intel, is among the startups offering tech to make trained AI models more efficient — and performant. Another in that category is OctoML, which automatically optimizes, benchmarks and packages models for an array of different hardware.

In any case, to Lattner’s point, AI demand is fast approaching the limits of sustainability — making any tech to cut down on its compute requirements hugely desirable. The generative AI models in vogue today are 10 to 100 times bigger than older AI models, as a recent piece in The Wall Street Journal points out, and much of the public cloud infrastructure wasn’t built for running these systems — at least not at this scale.

It’s already had an impact. Microsoft is facing a shortage of the server hardware needed to run AI so severe that it might lead to service disruptions, the company warned in an earnings report. Meanwhile, the sky-high appetite for AI inferencing hardware — mainly GPUs — has driven GPU provider Nvidia’s market cap to $1 trillion. But Nvidia’s become a victim of its own success; the company’s best-performing AI chips are reportedly sold out until 2024.

For these reasons and others, more than half of AI decision makers in top companies report facing barriers to deploying the latest AI tools, according to a 2023 poll from S&P Global.

“The compute power needed for today’s AI programs is massive and unsustainable under the current model,” Lattner said. “We’re already seeing instances where there is not enough compute capacity to meet demand. Costs are skyrocketing and only the big, powerful tech companies have the resources to build these types of solutions. Modular solves this problem, and will allow for AI products and services to be powered in a way that is far more affordable, sustainable and accessible for any enterprise.”

Modular’s Mojo programming language, a ‘fast superset’ of Python.

That’s reasonable. But I’m less convinced that Modular can drive widespread adoption of its new programming language, Mojo, when Python is so entrenched in the machine learning community. According to one survey, as of 2020, 87% of data scientists used Python on a regular basis.

But Lattner argues that Mojo’s benefits will drive its growth.

“One thing that is commonly misunderstood about AI applications is that they are not just a high-performance accelerator problem,” he said. “AI today is an end-to-end data problem, which involves loading and transforming data, pre-processing, post-processing and networking. These auxiliary tasks are usually done in Python and C++, and only Modular’s approach with Mojo can bring all these components together to work in a single unified technology base without sacrificing performance and scalability.”

He might be right. The Modular community grew to over 120,000 developers in the four months since Modular’s product keynote in early May, Lattner claims, and “leading tech companies” are already using the startup’s infrastructure, with 30,000 on the waitlist.

“The most important enemy of Modular is complexity: complexity in software layers that only work in special cases, software that’s tied to specific hardware and complexity driven by the low-level nature of high-performance accelerators,” he said. “The very thing that makes AI such a powerful and transformative technology is the reason it requires so much effort to reach scale, so much talent invested in building bespoke solutions and so much compute power to deliver consistent results. The Modular engine and Mojo together level the playing field, and this is just the start.”

And — at least from a funding standpoint — what an auspicious start it is.