Indian Manufacturers Trail in Digital Transformation Despite Investment

Perceptions aside, the numbers tell the real story. According to the 8th annual ‘State of Smart Manufacturing Report’ released by Rockwell Automation in March this year, India has the largest number of manufacturing firms investing in technology. The survey covered 1,350 manufacturers across 13 major manufacturing countries, including India, China, the US, Germany, Japan and the UK. The report stated that Indian manufacturers were spending 35% of their operating budgets on technology, far more than the global mean of 23%.

India leads investment in manufacturing

But the money hasn’t yet translated into tangible results. India still lags behind in digital maturity. A report released by Lenovo just last week found that 48% of Indian businesses are still stuck at the first stage of digital maturity. While 87% of businesses believe digital infrastructure is critical to their revenue, only 33% of Indian companies are adequately prepared. For companies to reach stage 3 or 4 of digital maturity, their IT teams must have moved to the hybrid cloud.

And yet falls behind in digital transformation?

But it would seem as though Indian companies have long existed in this state of limbo. Arun Bajaj, Director of smart TV manufacturer Videotex, said that a large share of companies were still plagued by age-old issues. “Lack of digital infrastructure, knowledge and expertise are significant obstacles preventing Indian businesses from being digitally mature. Progress is also hampered by out-of-date hardware and software and cybersecurity risks,” he said.

He admitted that while money was being spent in the right direction, there were still gaps. “Although ongoing spending is crucial, it is not the only factor influencing digital maturity. We believe that manufacturers should invest in digital skills training, infrastructure improvements and increased awareness of the advantages of digital transformation in order to assist more Indian firms in reaching higher levels of digital maturity and competing in the global economy. Collaboration between businesses, governmental organisations and educational institutions is needed to get past these obstacles,” he stated.

But from what Bajaj described, cloud transformation was still a distant reality for most companies here. “Adopting cloud-based technologies and software solutions first will increase operational agility and efficiency. Businesses should also think about partnering with knowledgeable technology providers who can create solutions specifically for them. They should also continuously evaluate their progress towards achieving their objectives for digital transformation and develop a clear digital strategy that is in line with their business goals,” he noted.

Aniruddha Banerjee, co-founder of AI services startup SwitchOn, explained that the issues had more to do with the mindset of company leaders and could be divided cleanly into three parts. “Firstly, most company leaders face the issue of short-term thinking. Most objectives are driven by a short-term outlook and rarely lead to organisation-wide changes due to obstacles with scale. Companies should instead focus on long-term business outcomes rather than just trying to ‘just do something’.

“Secondly, there’s a constant reliance on global products. Products that succeed globally rarely succeed in India due to a lack of Indian context. This is almost never understood well. Companies need to trust products that have worked in the Indian context and have been built in India. They know Indian users the best.

“And thirdly, Indian companies have an unnecessary focus on minor details rather than overall strategy when digitising. This leads to long adoption times and a lack of interest or focus. Companies should look at the big picture, and minor issues will resolve themselves automatically,” he explained.

Baby steps in Smart Manufacturing

But even as India works through these middling changes, the concept of smart manufacturing has walked in through the doors. Smart manufacturing is the full realisation of the digital dream in manufacturing. Smart factories combine advanced sensors, embedded software and robotics with technologies like IoT, cloud computing and analytics to optimise operations and improve decision-making.

The goal of smart manufacturing is to push automation, enable predictive maintenance and address customer concerns in real time. India’s first smart factory, backed by Boeing, is still under construction at the Indian Institute of Science’s (IISc) Centre for Product Design and Manufacturing (CPDM) in Bangalore.

Despite the massive push in India, there are many lessons still to be learned from countries like Japan and South Korea, which are far ahead on their journey of modernising manufacturing. “We need to learn from Japan. This country literally transformed the way of working of the industries and the way of doing business. The academia-institution-industry collaboration in Japan helped in achieving this. It’s time India too went for such collaborative efforts, which will help us grow fast,” Shyam Singh, Tata Motors Plant Head, said in his inaugural speech for a week-long Faculty Development Program on smart manufacturing at the DY Patil International University, Pune.

Raghav Gupta, co-founder and CEO of Futurense Technologies, reiterated this in a panel discussion at this year’s DES conference. For smart manufacturing to become a solid reality, India too would have to start young. “The challenge is that the number of those people is, firstly, very low. And secondly, when it comes to re-training people from the industry, the challenge is un-learning and then learning. There has to be an understanding that education needs to move towards domain-specific courses from a very young age,” Gupta stated.

The post Indian Manufacturers Trail in Digital Transformation Despite Investment appeared first on Analytics India Magazine.

Google’s new Labs page lets you sign up for its AI experiments

By Ivan Mehta

This year’s Google IO was all about AI. The company announced a number of AI-powered features across its products on Wednesday. The search giant also launched a new page called Google Labs to let people sign up for these experiments and test out AI-powered features for feedback before the wider release.

Currently, you can look at four projects on the Google Labs home page: AI-powered Google Search features; AI in Google Workspace; Tailwind, the company’s new project for smarter note-taking; and MusicLM, a new tool that lets you generate music through text prompts. You can learn more about each project or sign up for the waitlist to try them out.

Google said that it will soon roll out some limited-time experiments. These will include search enhancements under the Search Generative Experience (SGE), which will give people a summary of search topics along with prompts to explore more about them; Code Tips, which will help users with coding problems directly from the search bar for languages like Java, Go, Python, JavaScript, C++, Kotlin, shell, Docker and Git; and “Add to Sheets,” which will let you embed search results like vacation suggestions directly into a Google Sheet.

At the moment, Search Labs is available only in the US and only in English. Plus, most experiments require users to be 18 or above to join the waitlist. Google Labs can be accessed through the website or via the Labs icon in the Google app.

Apart from opening a new Labs page for AI experiments, Google also lifted the waitlist on the Bard chatbot and made it available to users in 180 countries along with support for Japanese and Korean.

Read more about Google I/O 2023 on TechCrunch

Can You Build Large Language Models Like ChatGPT At Half Cost?

Large Language Models (LLMs) like GPT-3 and ChatGPT have revolutionized AI by offering natural language understanding and content generation capabilities. But their development comes at a hefty price, limiting accessibility and further research. Researchers estimate that training GPT-3 cost OpenAI around $5 million. Nevertheless, Microsoft recognized the potential and invested $1 billion in 2019 and $10 billion in 2023 in OpenAI’s GPT-3 and ChatGPT venture.

LLMs are machine learning models trained on extensive textual data for NLP applications. They are based on transformer architecture and utilize attention mechanisms for NLP tasks like question-answering, machine translation, sentiment analysis, etc.

The question arises: can the efficiency of these large models be increased while simultaneously reducing computational cost and training time?

Several approaches, like Progressive Neural Networks, Network Morphism, intra-layer model parallelism and knowledge inheritance, have been developed to reduce the computational cost of training neural networks. The novel LiGO (Linear Growth Operator) approach discussed here sets a new benchmark: it halves the computational cost of training LLMs.

Before discussing this technique, examining the factors contributing to the high price of making LLMs is essential.

Cost of Building Large Language Models

Three major expenses for developing LLMs are as follows:

1. Computational Resources

Building LLMs requires massive computational resources to train on large datasets. The models must process billions of parameters and learn complex patterns from massive textual data.

Investment in specialized hardware such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) is required for building and training LLMs to achieve state-of-the-art performance.

For instance, GPT-3 was trained on a supercomputer with 10,000 enterprise-grade GPUs and 285,000 CPU cores.

2. Energy Consumption

The intensive computational resources required for building LLMs result in significant energy consumption. For instance, training the 175-billion-parameter GPT-3 took 14.8 days using 10,000 V100 GPUs, equivalent to 3.55 million GPU hours. Such a high level of energy consumption also has significant environmental effects.
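The GPU-hours figure follows directly from the quoted numbers; a quick arithmetic check:

```python
# Sanity check of the quoted figure: 10,000 V100 GPUs running for 14.8 days.
gpus, days = 10_000, 14.8
gpu_hours = gpus * days * 24  # 24 hours per day
print(f"{gpu_hours / 1e6:.2f} million GPU hours")  # 3.55
```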

3. Data Storage & Management

LLMs are trained on large datasets. For instance, GPT-3 was trained on a vast corpus of textual data, including Common Crawl, WebText2, Books1, Books2, and Wikipedia, among other sources. Significant infrastructure investment is required to collect, curate and store these datasets.

Also, cloud storage is required for the data, along with human expertise for data preprocessing and version control. Moreover, ensuring that your data strategy complies with regulations like GDPR adds to the cost as well.

LiGO Technique: Reduce the Cost of Building Large Language Models to Half

LiGO (Linear Growth Operator) is a novel technique developed by researchers at MIT to reduce the computational cost of training LLMs by 50%. The method involves initializing the weights of larger models from those of smaller pre-trained models, enabling efficient scaling of neural networks.
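LiGO's actual growth operator is learned from data and combines depth and width growth. As a loose illustration of the underlying idea of function-preserving growth, here is a toy Net2Net-style width expansion in plain Python; the one-layer "network" and all weights are invented for illustration, not taken from the paper:

```python
# Toy, function-preserving width growth in the spirit of LiGO / Net2Net.
# All numbers and the single linear hidden layer are invented for illustration.

def widen_layer(w_in, w_out, dup_idx):
    """Duplicate hidden unit `dup_idx`, preserving the network's output."""
    new_w_in = w_in + [w_in[dup_idx]]           # copy the incoming weight
    new_w_out = w_out[:]
    new_w_out[dup_idx] = w_out[dup_idx] / 2.0   # split the outgoing weight so the
    new_w_out.append(w_out[dup_idx] / 2.0)      # duplicated pair sums to the original
    return new_w_in, new_w_out

def forward(x, w_in, w_out):
    # A single linear hidden layer keeps the arithmetic easy to check.
    return sum(wo * (wi * x) for wi, wo in zip(w_in, w_out))

w_in, w_out = [1.0, 2.0], [0.5, -1.0]
big_in, big_out = widen_layer(w_in, w_out, dup_idx=1)
print(forward(3.0, w_in, w_out) == forward(3.0, big_in, big_out))  # True
```

Because the widened network starts out computing exactly the same function as the smaller one, training can pick up where the smaller model left off instead of restarting from random weights, which is where the savings come from.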

Image from the Paper: Learning to Grow Pretrained Models For Efficient Transformer Training

Yoon Kim, the senior author of the paper, says:

“It’s been estimated that training models at the scale of what ChatGPT is hypothesized to run on could take millions of dollars just for a single training run. Can we improve the efficiency of these training methods, so we can still get good models in less time and for less money? We propose to do this by leveraging smaller language models that have previously been trained.”

This method maintains the performance benefits of larger models with reduced computational cost and training time compared to training a large model from scratch. LiGO utilizes a data-driven linear growth operator that combines depth and width operators for optimum performance.

The paper utilized various datasets to conduct text-based experiments, including the English Wikipedia corpus for training BERT and RoBERTa models and the C4 dataset for training GPT2.

The LiGO technique experimentation included growing BERT-Small to BERT-Base, BERT-Base to BERT-Large, RoBERTaSmall to RoBERTa-Base, GPT2-Base to GPT2-Medium, and CaiT-XS to CaiT-S.

The researchers compared their approach with several other baselines, including training from scratch, progressive training, bert2BERT, and KI.

The LiGO technique offered 44.7% savings in FLOPs (floating-point operations) and 40.7% savings in wall time compared to training BERT-Base from scratch by reusing the BERT-Small model. The LiGO growth operator outperforms StackBERT, MSLT, bert2BERT, and KI in efficient training.

Benefits of Using a Training Optimization Technique Like LiGO

LiGO is an efficient neural network training method with several benefits:

1. Faster Training

As stated earlier, faster training is the main advantage of the LiGO technique. It trains LLMs in half the time, increasing productivity and reducing costs.

2. Resource Efficient

LiGO is resource-efficient since it minimizes wall time and FLOPs, leading to a more cost-effective and eco-friendly approach to training large transformer models.

3. Generalization

The LiGO technique has improved the performance of both language and vision transformers, suggesting that it is a generalizable technique that can be applied across tasks.

Building commercial AI products is just one facet of the overall expenses associated with AI systems. Another significant component of costs comes from daily operations. For instance, it costs OpenAI about $700,000 every day to answer queries using ChatGPT. Researchers are expected to continue exploring approaches that make LLMs cost-effective to train and more accessible at runtime.

ChatGPT’s Code Interpreter May Make Data Scientists Obsolete

In March this year, OpenAI announced that they would be adding plugins to ChatGPT, while teasing the launch of a code interpreter and web browser plugin. Last week, the company started rolling out the code interpreter plugin, which has already caused concern among data scientists with just a sneak peek.

The plugin replaces many of the common workflows of a data scientist, including visualisation, trend analysis, and even data transformation. When looking at the code interpreter in tandem with the other advancements in the data science field, the question remains — will data scientists become obsolete?

Data scientist on steroids?

Put simply, the code interpreter is a plugin for ChatGPT that provides a sandboxed and firewalled execution environment for Python code. For security reasons, the interpreter only runs for the duration of the chat session, and is also hosted on ephemeral disk space, meaning the data is cleared after the conversation is closed.

The interpreter also supports the upload of certain files to the plugin, with outputs from the bot being available to download. In the blog post announcing its launch, OpenAI compared the code interpreter to a “very eager junior programmer working at the speed of your fingertips”, further stating that it is good at solving mathematical problems, converting files between different formats, and conducting data analysis and visualisation. The interpreter also has access to a variety of Python libraries, including an OCR library and Matplotlib.

People across the Internet have put ChatGPT to the task, asking it to analyse a variety of datasets, from Netflix’s shows to crime data in San Francisco. In these applications, the plugin was able to identify trends, clean the data, and even generate insights.
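OpenAI hasn't published the scripts the interpreter writes behind the scenes, but a minimal standard-library sketch of the kind of trend analysis it generates might look like this; the mini-dataset and column names are invented for illustration:

```python
# Invented mini-dataset illustrating the sort of throwaway analysis script
# the interpreter produces when asked "is there a trend in this CSV?".
import csv
import io

raw = """month,titles_added
2023-01,40
2023-02,44
2023-03,51
2023-04,55
"""

counts = [int(row["titles_added"]) for row in csv.DictReader(io.StringIO(raw))]

# Least-squares slope over the month index: a positive slope means growth.
n = len(counts)
xs = range(n)
mean_x, mean_y = sum(xs) / n, sum(counts) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts)) \
        / sum((x - mean_x) ** 2 for x in xs)
print(f"average monthly change: {slope:+.1f} titles")  # +5.2
```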

In addition to this, the chatbot was also able to generate visualisations for the derived insights, presenting the information in an easy-to-understand format. For example, here is a visualisation of every lighthouse in the United States, generated from a simple CSV file of lighthouse locations.

This was kind of delightful: I uploaded a CSV file of every lighthouse location in the US.
"ChatGPT Code Interpreter: Create a gif of a map of the lighthouse locations, where the map is very dark but each lighthouse twinkles." A couple seconds later… pic.twitter.com/f14JLWQCyB

— Ethan Mollick (@emollick) May 2, 2023

Instead of wrestling with spreadsheets and complex visualisation software, anyone can simply prompt the code interpreter to give them the result they want.

This set of roles and responsibilities closely describes the job description of an average data scientist, except that ChatGPT does it way faster. So, what is the value proposition for a data scientist? For many, it might just be about trusting the data.

Not human after all

A data scientist’s responsibilities go beyond wrangling data and visualising it. An expert data scientist understands the importance of storytelling through data and the value of finding hidden nuggets of insight through the human touch. ChatGPT’s code interpreter is not capable of this, given its lack of logical reasoning, and the plugin comes with another set of problems: hallucinations.

While the bot may be able to fulfil some of the roles of a data scientist, it is still based on an LLM, which is prone to hallucinations. Users on Hacker News had this to say about some of the visualisations created by the chatbot.

“Current chatbot AIs have impressive capabilities but are also prone to getting important details wrong. There are also plenty of “obvious” glitches in the graphics simulations, but those concern me less – precisely because they’re obvious.”

It seems that hallucinations follow ChatGPT wherever it goes, and the code interpreter is no different. However, it does seem that these hallucinations are largely restricted to the visualisations created by the code interpreter. In addition to this, there is also the problem of data contamination in ChatGPT’s dataset.

Common visualisations, like plotting a graph from a CSV, are relatively easy for the LLM to carry out. This is likely because these kinds of projects are well-documented all over the Internet, making it more likely for ChatGPT to know about them. However, an actual data scientist in a big organisation is likely to face visualisation problems that go beyond simple graphs or map plots, which the code interpreter cannot handle reliably.

Horace He showed an example of this contamination on Twitter. Picking up the example of Codeforces problems, he found that GPT-4 was able to solve 10/10 of the problems posted pre-2021, but completely failed at solving any of the problems posted after that date.

While these examples don’t show the whole picture, it is clear that ChatGPT’s code interpreter is not going to replace data scientists any time soon. However, it is quite close to being a ‘personal data analyst’ of sorts for those unfamiliar with data science as a field. It could also grow into a reliable pairing partner for data scientists to work alongside.

The post ChatGPT’s Code Interpreter May Make Data Scientists Obsolete appeared first on Analytics India Magazine.

Google Photos to gain a new ‘Magic Editor’ feature powered by generative AI

By Sarah Perez (@sarahintampa)

Google Photos is expanding its use of AI to help users edit and enhance their photos. While the company has already leveraged AI for tools like the distraction-removing Magic Eraser and corrective Photo Unblur features in Photos, it’s now turning to AI for more complex edits with the introduction of Magic Editor. The new tool will combine AI techniques, including generative AI, for editing and reimagining photos, says Google.

The company offered a sneak peek at the new experimental feature at this week’s Google I/O developer conference to show off its capabilities.

With Magic Editor, users will be able to make edits to specific parts of the photos — like the foreground or background — as well as fill in gaps in the photo or even reposition the subject for a better-framed shot.

For example, Google showed off how Magic Editor could be used to improve a shot of a person standing in front of a waterfall.

In a demo of the technology, a user is able to first remove the other people from the background of the photo, then remove a bag strap from the subject’s shoulder for a cleaner look. While these types of edits were previously available in Google Photos via Magic Eraser, the ability to reposition the subject is new. Here, the AI “cuts out” the subject in the foreground of the photo, allowing the user to then reposition the person elsewhere in the photo by dragging and dropping.

This is similar to the image cutout feature Apple introduced with iOS 16 last year, which also could isolate the subject from the rest of the photo in order to do things like copy and paste part of the image into another app, grab the subject from images found through Safari search, or position the subject of the photo in front of the clock on the iOS Lock Screen, among other things.

In Google Photos, however, the feature is meant to help users create better photos.

Another demo showed off how Magic Editor’s ability to reposition a subject could also be combined with its ability to fill in the gaps in an image using AI techniques.

In this example, a boy is sitting on a bench holding a bunch of balloons, but the bench is shifted off to the left side of the photo. Magic Editor allows you to pull the boy and bench closer to the photo’s center and, while doing so, it uses generative AI to create more of the bench and the balloons to fill in the rest of the photo. As a final touch, you can brighten the sky behind the photo so it’s a brighter blue with white fluffy clouds, rather than the gray, overcast sky of the original.

The sky-filling feature is similar to what various other photo-editing apps can do, like Lensa or Lightricks’ Photoleap, to name a couple. But in this case, it’s included with users’ main photo organizing app, instead of requiring an additional download of a third-party tool.

The result of the edits, at least in the demos, is that of natural-looking, well-composed images, not those that look like they’ve been heavily edited or AI-created, necessarily.

Google says it will release Magic Editor as an experimental feature later this year, warning that there will be times when it doesn’t quite work correctly. The tests and user feedback will help the feature to improve over time, as users now edit 1.7 billion photos each month using Google Photos, the company said.

It’s unclear if Google will eventually charge for this feature, however, or perhaps make it a Pixel exclusive. Possibly, it will make Magic Editor a Google One subscription perk, as it did with Magic Eraser earlier this year.

The feature will initially become available to “select” Pixel devices, but Google declined to share which phones will receive it first.

The company said it also plans to share more about the AI tech under the hood when it gets closer to the early access release of the feature, but won’t go into detail now.

Google unveils advanced AI model to beef up Bard, Search, Maps and more

Google’s focus on artificial intelligence took center stage at its I/O 2023 conference with the unveiling of its PaLM 2 learning language model and AI-based improvements to Bard, Search, Maps, Workspace and its other products. On Tuesday, the search giant demonstrated new capabilities in many of its core products, all designed to capitalize on the growing AI trend.

Jump to:

  • PaLM 2 large language model
  • Improvements to Bard AI
  • Enhancements to Google Search
  • Enhancements to Google Workspace
  • Improvements to Maps and Photos
  • How and where to access the new features

PaLM 2 large language model

Driving the new and improved products will be PaLM 2 (short for Pathways Language Model), a new large language model designed to be Google’s most advanced AI platform. Built to handle a wide range of tasks, PaLM 2 will be the AI engine behind developments across 25 different Google products and services. Though Google didn’t reveal many details about PaLM 2, the company did boast that the model can handle more than 100 languages with the ability to translate between them.

The PaLM 2 LLM is already up for testing by Google Cloud customers in the medical field, where its offshoot, Med-PaLM 2, will answer questions from doctors and other health care professionals. Beyond deploying PaLM 2 in medicine, Google touted how the LLM will be used in areas such as security, math, computer coding and more. As one example, the AI will be able to watermark photos and other files as a way to distinguish real ones from fake ones.

Improvements to Bard AI

Launched this past February, Google’s Bard AI was designed as an alternative and rival to OpenAI’s ChatGPT and Microsoft’s Bing AI. But Bard has received a mixed reception as it seemed underpowered and underdeveloped compared with the competition. Now Google is repositioning Bard to be smarter and more capable.

First, Google has removed the Bard waitlist and is expanding the service from just early adopters in the U.S. and U.K. to everyone across more than 180 countries and regions. Beyond supporting just English, Bard is now available in Japanese and Korean with plans to handle 40 languages in total.

Second, Bard will now be powered by PaLM 2, a move that will help it tackle a range of tasks, especially in the areas of math, reasoning and programming. Google said that Bard will be able to generate and debug code in more than 20 different programming languages. Further, the AI can help users understand the generated code by explaining how and why it’s being used. As one example shown at I/O 2023, Google demonstrated how Bard could program a specific chess move using the Python language.
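Google didn't share the demo code, so as a hedged illustration, here is the shape of a self-contained "program a chess move" answer in plain Python; the helper function and square notation are invented here, not taken from the I/O demo:

```python
# Invented helper illustrating a self-contained chess-programming answer:
# compute the legal knight destinations from a square in algebraic notation.
FILES = "abcdefgh"

def knight_moves(square):
    """Return the legal knight destinations from a square like 'g1'."""
    f, r = FILES.index(square[0]), int(square[1]) - 1
    jumps = [(1, 2), (2, 1), (2, -1), (1, -2),
             (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
    return sorted(FILES[f + df] + str(r + dr + 1)
                  for df, dr in jumps
                  if 0 <= f + df < 8 and 0 <= r + dr < 8)

print(knight_moves("g1"))  # ['e2', 'f3', 'h3']
```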

Google also explained how Bard ties into other tools and services from Google and third parties. Users can ask Bard to create an email or document and then export the content directly into Gmail or Google Docs. Bard will also work with Google Lens as users are able to upload an image to the AI for analysis and then ask it to provide a caption or other content.

Bard will also adopt a more visual style by displaying images, tables and other types of formatting in its responses. Users can ask Bard to pinpoint a specific place or landmark cited in a response, and the AI will show them its location via Google Maps. If a table is created as part of a response, users can move that table to a program like Google Sheets where it retains its formatting.

Further, Bard will be able to communicate with third-party products, apps and services via supported extensions. As one example, Bard can create an image using the Adobe Firefly image generator.

Enhancements to Google Search

Google’s core Search page will also benefit from AI enhancements courtesy of PaLM 2. The new search page will integrate AI-based information with the usual results. To help users focus on the key details, Search will summarize its findings in a single snapshot. The snapshot will contain pointers and links that users can follow to drill down to more details.

The new search is also geared to be more efficient, according to Google. Instead of trying to figure out how to phrase a search query or breaking it down into several different questions, users may be able to type a more complex and detailed query. Google Search will then be better equipped to parse it for them and deliver more accurate results right off the bat.

In some cases, Search will prompt users to ask a follow-up question or display potential questions that they might ask. Choosing a question will then bring them into conversation mode where they will be able to chat with the Search tool to continue to narrow down the information they need.

Enhancements to Google Workspace

Google Workspace is another product getting an AI infusion via PaLM 2. A new feature called Help Me Write available in Gmail and Google Apps will automatically create emails and other content based on requests and descriptions. This option will be available for early testers in June and then will roll out later this year for business users as part of a new AI feature called Duet AI for Workspace.

One tool that will aim to help people better use AI is Sidekick. Writing the correct request, or prompt, when working with an AI can be challenging. The right type of prompt can make a big difference in the response. To help in this area, Sidekick will analyze and summarize a document and then suggest prompts users may want to send to improve the content. As one example shown at I/O 2023, Sidekick suggested adding speaker notes to a presentation in Google Slides.

Improvements to Maps and Photos

With Maps providing directions and information on places around the world, it will also benefit from AI. Maps is taking on a new option called Immersive View for Routes. If a user is mapping out a route to take by walking, driving or biking, Immersive View will visually display the route from start to finish, even providing weather and traffic forecasts along the way. This option will roll out over the summer with support for several major cities, including New York and San Francisco.

Google Photos is yet another product being enhanced with AI via a tool called Magic Editor. Expanding on the current Magic Eraser tool that can erase items in a photo, Magic Editor goes a few steps further. It not only erases but actually moves people and objects in a photo. And if a user moves an object that’s near the edge of the photo closer to the center, Magic Editor will use AI to fill in the missing area.

AI will also play a role in other products unveiled at I/O 2023, including the new Pixel 7a, Pixel Fold and Pixel Tablet. To personalize devices, users can create their own wallpaper based on prompts that they send to an AI.

How and where to access the new features

So how can people tap into Google’s new AI-based products and services? Though most of these won’t be officially available for several months, anyone who’d like a sneak peek can try them out through Google Labs. By signing up to be an early tester with Labs, users can check out the new Search and the new Google Workspace as well as two other tools: an AI-based note taker called Project Tailwind and a tool called MusicLM that turns text into music.

Google makes its text-to-music AI public

By Kyle Wiggers

Google today released MusicLM, a new experimental AI tool that can turn text descriptions into music. Available in the AI Test Kitchen app on the web, Android or iOS, MusicLM lets users type in a prompt like “soulful jazz for a dinner party” or “create an industrial techno sound that is hypnotic” and have the tool create several versions of the song.

Users can specify instruments like “electronic” or “classical,” as well as the “vibe, mood, or emotion” they’re aiming for, as they refine their MusicLM-generated creations.

When Google previewed MusicLM in an academic paper in January, it said that it had “no immediate plans” to release it. The coauthors of the paper noted the many ethical challenges posed by a system like MusicLM, including a tendency to incorporate copyrighted material from training data into the generated songs.

But in the intervening months, Google says it’s been working with musicians and hosting workshops to “see how [the] technology can empower the creative process.” One of the outcomes? The version of MusicLM in AI Test Kitchen won’t generate music with specific artists or vocals. Make of that what you will.

It seems unlikely, in any case, that the broader challenges around generative music will be easily remedied.

In 2020, Jay-Z’s record label filed copyright strikes against a YouTube channel, Vocal Synthesis, for using AI to create Jay-Z covers of songs like Billy Joel’s “We Didn’t Start the Fire.” After initially removing the videos, YouTube reinstated them, finding the takedown requests were “incomplete.”

But deepfaked music still stands on murky legal ground.

Image: Google MusicLM. Image Credits: Google

Increasingly, homemade tracks that use generative AI to conjure familiar sounds that can be passed off as authentic, or at least close enough, have been going viral. Music labels have been quick to flag them to streaming partners, citing intellectual property concerns. And they’ve generally been victorious, in contrast to the Jay-Z case — Spotify removed tens of thousands of AI-generated songs from startup Boomy over the past month following a complaint from Universal Music Group.

A whitepaper authored by Eric Sunray, now a legal intern at the Music Publishers Association, argues that AI music generators like MusicLM violate music copyright by creating “tapestries of coherent audio from the works they ingest in training, thereby infringing the United States Copyright Act’s reproduction right.” Indeed, AI like MusicLM “learns” from existing music to produce similar effects, as alluded to in the paper — a fact with which not all artists are comfortable.

It might not be long before there’s some clarity on the matter. Several lawsuits making their way through the courts will likely have a bearing on music-generating AI, including one pertaining to the rights of artists whose work is used to train AI systems without their knowledge or consent.

Time will tell.

Read more about Google I/O 2023 on TechCrunch

How to join the Google Search Labs waitlist to access its new AI search engine early

At Google I/O, the tech giant announced some long-awaited generative AI upgrades to its platforms, including Google Search. The new AI-infused Search, known as the Search Generative Experience (SGE), will include AI-powered snapshots useful for everyday queries, shopping and more.

Also: Every major AI feature announced at Google I/O 2023

Before you get too excited, it hasn't been released to the public yet. However, Google did announce that it will be made available in Search Labs, a new program to access early experiments, in the coming weeks.

How to join the Search Labs waitlist

If interested, you can sign up for Search Labs' waitlist starting today. Here's how.

Google brings new generative models to Vertex AI, including Imagen

By Kyle Wiggers

To paraphrase Andreessen Horowitz, generative AI, particularly on the text-to-art side, is eating the world. At least, investors believe so — judging by the billions of dollars they’ve poured into startups developing AI that creates text and images from prompts.

Not to be left behind, Big Tech is investing in its own generative AI art solutions, whether through partnerships with the aforementioned startups or in-house R&D. (See: Microsoft teaming up with OpenAI for Image Creator.) Google, leveraging its robust R&D wing, has decided to go the latter route, commercializing its work in generative AI to compete with the platforms already out there.

Today at its annual I/O developer conference, Google announced new AI models heading to Vertex AI, its fully managed AI service, including a text-to-image model called Imagen. Imagen, which Google previewed via its AI Test Kitchen app last November, can generate and edit images as well as write captions for existing images.

“Any developer can use this technology using Google Cloud,” Nenshad Bardoliwalla, director of Vertex AI at Google Cloud, told TechCrunch in a phone interview. “You don’t need to be a data scientist or developer.”

Imagen in Vertex

Getting started with Imagen in Vertex is, indeed, a relatively straightforward process. A UI for the model is accessible from what Google calls the Model Garden, a selection of Google-developed models alongside curated open source models. Within the UI, similar to generative art platforms such as Midjourney and NightCafe, customers can enter prompts (e.g. “a purple handbag”) to have Imagen generate a handful of candidate images.

Editing tools and follow-up prompts refine the Imagen-generated images, for example adjusting the color of the objects depicted in them. Vertex also offers upscaling for sharpening images, in addition to fine-tuning that allows customers to steer Imagen toward certain styles and preferences.

As alluded to earlier, Imagen can also generate captions for images, optionally translating those captions leveraging Google Translate. To comply with privacy regulations like GDPR, generated images that aren’t saved are deleted within 24 hours, Bardoliwalla says.

“We make it very easy for people to start working with generative AI and their images,” he added.

Of course, there’s a host of ethical and legal challenges associated with all forms of generative AI — no matter how polished the UI. AI models like Imagen “learn” to generate images from text prompts by “training” on existing images, which often come from data sets that were scraped together by trawling public image hosting websites. Some experts suggest that training models using public images, even copyrighted ones, will be covered by the fair use doctrine in the U.S. But it’s a matter that’s unlikely to be settled anytime soon.

Image: Google’s Imagen model in action, in Vertex AI. Image Credits: Google

To wit, two companies behind popular AI art tools, Midjourney and Stability AI, are in the crosshairs of a legal case that alleges they infringed on the rights of millions of artists by training their tools on web-scraped images. Stock image supplier Getty Images has taken Stability AI to court, separately, for reportedly using millions of images from its site without permission to train the art-generating model Stable Diffusion.

I asked Bardoliwalla whether Vertex customers should be concerned that Imagen might’ve been trained on copyrighted materials. Understandably, they might be deterred from using it if that were the case.

Bardoliwalla didn’t say outright that Imagen wasn’t trained on copyrighted images — only that Google conducts broad “data governance reviews” to “look at the source data” inside its models to ensure that they’re “free of copyright claims.” (The hedged language doesn’t come as a massive surprise considering that the original Imagen was trained on a public data set, LAION, known to contain copyrighted works.)

“We have to make sure that we’re completely within the balance of respecting all of the laws that pertain to copyright information,” Bardoliwalla continued. “We’re very clear with customers that we provide them with models that they can feel confident they can use in their work, and that they own the IP generated from their trained models in a completely secure fashion.”

Owning the IP is another matter. In the U.S. at least, it isn’t clear whether AI-generated art is copyrightable.

One solution — not to the problem of ownership, per se, but to questions around copyrighted training data — is allowing artists to “opt out” of AI training altogether. AI startup Spawning is attempting to establish industry-wide standards and tools for opting out of generative AI tech. Adobe is pursuing its own opt-out mechanisms and tooling. So is DeviantArt, which in November launched an HTML-tag-based protection to prohibit software robots from crawling pages for images.
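Opt-out schemes like DeviantArt’s work by embedding machine-readable directives in a page’s markup. As a rough sketch (the `noai` and `noimageai` directive names follow DeviantArt’s published convention, but this checker itself is hypothetical, not any crawler’s real implementation), a scraper that chose to honor such tags might do the following:

```python
from html.parser import HTMLParser

class RobotsDirectiveParser(HTMLParser):
    """Collects the content tokens of <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            for token in (attrs.get("content") or "").split(","):
                self.directives.add(token.strip().lower())

def allows_ai_training(html: str) -> bool:
    """Return False if the page opts out via a 'noai'-style directive."""
    parser = RobotsDirectiveParser()
    parser.feed(html)
    return not ({"noai", "noimageai"} & parser.directives)

page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(allows_ai_training(page))  # False: the page opts out
```

The catch, of course, is that such directives are purely advisory: nothing forces a crawler to check for them.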


Google doesn’t offer an opt-out option. (To be fair, neither does one of its chief rivals, OpenAI.) Bardoliwalla didn’t say whether this might change in the future, only that Google is “inordinately concerned” with making sure that it trains models in a way that’s “ethical and responsible.”

That’s a bit rich, I think, coming from a company that canceled an outside AI ethics board, forced out prominent AI ethics researchers and is curtailing publishing AI research to “compete and keep knowledge in house.” But interpret Bardoliwalla’s words as you will.

I also asked Bardoliwalla about steps Google’s taking, if any, to limit the amount of toxic or biased content that Imagen creates — another problem with generative AI systems. Just recently, researchers at AI startup Hugging Face and Leipzig University published a tool demonstrating that models like Stable Diffusion and OpenAI’s DALL-E 2 tend to produce images of people that look white and male, especially when asked to depict people in positions of authority.

Bardoliwalla had a more detailed answer prepped for this question, claiming that every API call to Vertex-hosted generative models is evaluated for “safety attributes” including toxicity, violence and obscenity. Vertex scores models on these attributes and, for certain categories, blocks the response or gives customers the choice as to how to proceed, Bardoliwalla said.
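The per-call evaluation Bardoliwalla describes can be pictured as a simple threshold policy over safety scores. Everything below (attribute names, threshold values, the three-way outcome) is an illustrative sketch, not Google’s actual configuration:

```python
# Toy threshold-based safety filter. Attribute names and thresholds
# are invented for illustration.
BLOCK_THRESHOLDS = {"toxicity": 0.8, "violence": 0.8, "obscenity": 0.8}
REVIEW_THRESHOLDS = {"toxicity": 0.5, "violence": 0.5, "obscenity": 0.5}

def moderate(scores: dict) -> str:
    """Map per-attribute safety scores (0.0-1.0) to an action."""
    if any(scores.get(a, 0.0) >= t for a, t in BLOCK_THRESHOLDS.items()):
        return "block"            # response never reaches the caller
    if any(scores.get(a, 0.0) >= t for a, t in REVIEW_THRESHOLDS.items()):
        return "customer_choice"  # surface scores, let the customer decide
    return "allow"

print(moderate({"toxicity": 0.9}))   # block
print(moderate({"violence": 0.6}))   # customer_choice
print(moderate({"toxicity": 0.1}))   # allow
```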

“We have a very good sense from our consumer properties of the type of content that may not be the kind of content that our customers are looking for these generative AI models to produce,” he continued. “This is an area of significant investment as well as market leadership for Google — for us to make sure that our customers are able to produce the results that they’re looking for that doesn’t harm or damage their brand value.”

To that end, Google is launching reinforcement learning from human feedback (RLHF) as a managed service offering in Vertex, which it claims will help organizations maintain model performance over time and deploy safer — and measurably more accurate — models in production. RLHF, a popular technique in machine learning, trains a “reward model” directly from human feedback, like asking contract workers to rate responses from an AI chatbot. It then uses this reward model to optimize a generative AI model along the lines of Imagen.
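The reward-model idea at the heart of RLHF can be shown in miniature. The sketch below uses the standard pairwise preference loss, -log σ(r_chosen − r_rejected), with a toy linear reward model and a single hand-made preference pair; it illustrates the general technique only, not Google’s managed service:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss for reward-model training: -log sigmoid(r_c - r_r).
    Small when the model already scores the preferred response higher."""
    return -math.log(sigmoid(r_chosen - r_rejected))

def reward(weights, features):
    """Toy linear reward model over a feature vector."""
    return sum(w * f for w, f in zip(weights, features))

# One human preference pair: 'chosen' was rated better than 'rejected'.
chosen, rejected = [1.0, 0.0], [0.0, 1.0]
w, lr = [0.0, 0.0], 0.1

before = preference_loss(reward(w, chosen), reward(w, rejected))

# One gradient step: d(loss)/dw = -(1 - sigmoid(dr)) * (chosen - rejected)
dr = reward(w, chosen) - reward(w, rejected)
step = lr * (1.0 - sigmoid(dr))
w = [wi + step * (c - r) for wi, c, r in zip(w, chosen, rejected)]

after = preference_loss(reward(w, chosen), reward(w, rejected))
assert after < before  # the update moved scores toward the human preference
```

In a full RLHF pipeline, a reward model trained this way is then used to optimize the generative model itself.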


Bardoliwalla says that the amount of fine-tuning needed through RLHF will depend on the scope of the problem a customer’s trying to solve. There’s debate within academia as to whether RLHF is always the right approach — AI startup Anthropic, for one, argues that it isn’t, in part because RLHF can entail hiring scores of low-paid contractors that are forced to rate extremely toxic content. But Google feels differently.

“With our RLHF service, a customer can choose a modality and the model and then rate responses that come from the model,” Bardoliwalla said. “Once they submit those responses to the reinforcement learning service, it tunes the model to generate better responses that are aligned with … what an organization is looking for.”

New models and tools

Beyond Imagen, two other generative AI models are now available to select Vertex customers, Google announced today: Codey and Chirp.

Codey, Google’s answer to GitHub’s Copilot, can generate code in over 20 languages including Go, Java, JavaScript, Python and TypeScript. Codey can suggest the next few lines based on the context of code entered into a prompt or, like OpenAI’s ChatGPT, answer questions about debugging, documentation and high-level coding concepts.


As for Chirp, it’s a speech model trained on “millions” of hours of audio that supports more than 100 languages and can be used to caption videos, offer voice assistance and generally power a range of speech tasks and apps.

In a related announcement at I/O, Google launched the Embeddings API for Vertex in preview, which can convert text and image data into representations called vectors that map specific semantic relationships. Google says that it’ll be used to build semantic search and text classification functionality like Q&A chatbots based on an organization’s data, sentiment analysis and anomaly detection.
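Semantic search over such vectors typically reduces to nearest-neighbor lookup by cosine similarity. A minimal sketch, with hard-coded toy vectors standing in for embeddings an API would produce (the documents and numbers are invented):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical document embeddings; a real system would fetch these
# from an embeddings endpoint rather than hard-coding them.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "office dog photos": [0.0, 0.1, 0.9],
}

def semantic_search(query_vec, corpus):
    """Rank documents by cosine similarity to the query vector."""
    return sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]),
                  reverse=True)

# Stand-in for the embedding of "how do I get my money back?"
query = [0.8, 0.2, 0.0]
print(semantic_search(query, docs)[0])  # refund policy
```

The same ranking primitive underpins Q&A chatbots over an organization’s data: retrieve the nearest documents, then answer from them.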

Codey, Imagen, the Embeddings API for images and RLHF are available in Vertex AI to “trusted testers,” Google says. Chirp, the Embeddings API and Generative AI Studio, a suite for interacting with and deploying AI models, meanwhile, are accessible in preview in Vertex to anyone with a Google Cloud account.


KDnuggets News, May 10: HuggingChat Python API: Your No-Cost Alternative • Exploratory Data Analysis Techniques for Unstructured Data

We have extended our blog writing contest!

Submit a blog and you could win an NVIDIA RTX 3080 Ti!

Features

  • HuggingChat Python API: Your No-Cost Alternative by Matthew Mayo
  • Exploratory Data Analysis Techniques for Unstructured Data by Aryan Garg
  • Stop Doing this on ChatGPT and Get Ahead of the 99% of its Users by Josep Ferrer

This Week's Posts

  • ChatGPT as a Personalized Tutor for Learning Data Science Concepts by Cornellius Yudha Wijaya
  • The Ultimate Open-Source Large Language Model Ecosystem by Abid Ali Awan
  • How to transition into Data Science from a different background? by Yash Gupta
  • 3 Ways to Access GPT-4 for Free by Abid Ali Awan
  • Vector and Matrix Norms with NumPy Linalg Norm by Bala Priya C
  • ChatGPT in Education: Friend or Foe? by Shannon Flynn
  • Can ChatGPT Be Trusted as an Educational Resource? by April Miller
  • From Data Analyst to Data Strategist: The Career Path for Making an Impact by Ben Farrell
  • Build a ChatGPT-like Chatbot with These Courses by Sara Metwalli
  • Managing Model Drift in Production with MLOps by Youssef Rafaat
  • Monitor Model Performance in the MLOps Pipeline with Python by Cornellius Yudha Wijaya
