GenAI Will Fundamentally Change How We Use Smartphones 

Smartphone innovation has plateaued and sales have flattened. The world was waiting for the next technological breakthrough in smartphones until very recently, when a few Qualcomm engineers managed to deploy the text-to-image AI model Stable Diffusion on a smartphone. It was a significant moment in the world of AI.

In a demonstration video, the engineers ran the text-to-image model on a Sony Xperia 5 II handset, which comes with a Qualcomm Snapdragon 865 (Adreno 650), 8GB RAM and more than 30 GB of storage, to generate a 512 x 512 pixel image in less than 15 seconds.

Foundational models require significant compute to run and are generally deployed on the cloud. Enabling generative AI models to operate on smartphones has the potential to herald the next significant breakthrough in AI. This achievement addresses a fundamental challenge: cost-effectiveness.

Moreover, the generative AI innovation could give the smartphone market a significant boost. Qualcomm CEO Cristiano Amon believes generative AI could breathe new life into smartphones.

AI will change how we use smartphones

While smartphones already incorporate AI to some extent, the implementation of generative AI models on these devices has the potential to profoundly transform how we interact with and utilise smartphones. It could let developers explore app development in ways that were not possible before. Running a model on the device could mean that all the apps on your smartphone, including your camera, could be powered by a foundational model.

Jim Yang, senior AI scientist at NVIDIA, stated that LLMs could change how smartphone keyboards function. LLMs are good at predicting the next word in a sentence, and this very capability could be a game changer, according to him. Interestingly, three months after Yang tweeted about this, Apple revealed something similar at its Worldwide Developers Conference 2023. The phonemaker said autocorrect will now use a transformer language model, run with on-device ML, to improve its corrections.
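To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It is not a transformer and not what Apple or any keyboard vendor ships: it is a toy bigram counter, and the corpus, the function names `train_bigrams` and `suggest`, and the parameter `k` are all invented for illustration. A real on-device model would learn far richer context, but the interface (previous words in, ranked suggestions out) is the same.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the corpus."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def suggest(follows, word, k=3):
    """Return up to k of the most frequent followers of `word`."""
    counts = follows.get(word.lower(), Counter())
    return [candidate for candidate, _ in counts.most_common(k)]


# Hypothetical typing history standing in for real training data.
corpus = "see you soon . see you later . see you soon . thank you very much"
model = train_bigrams(corpus)

print(suggest(model, "see"))
print(suggest(model, "you"))
```

A keyboard would call something like `suggest` after every keystroke, which is why running the model locally, without a network round trip, matters for this use case.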

Moreover, generative AI can analyse user behaviour, preferences, and patterns to create personalised user interfaces on smartphones. This can result in customised app layouts, recommendations, and user experiences tailored to individual users. This advancement could effectively transform your smartphone into a personalised assistant. Picture your phone autonomously composing and responding to emails on your behalf. With AI processing taking place directly on your device, the realisation of this scenario becomes increasingly plausible.

Democratising AI

LLMs, if deployed locally on smartphones, have the potential to democratise AI on a global scale, considering there are more than 6.5 billion smartphone users worldwide. With LLMs deployed on a smartphone, users can access AI-powered language capabilities even without an internet connection. This is particularly valuable in areas with limited connectivity or when travelling. Processing AI tasks locally on a smartphone significantly reduces latency compared to cloud-based solutions. This results in quicker responses, making real-time applications more efficient.

LLMs running on a smartphone also consume less data because they don’t rely on constant communication with cloud servers. This is cost-effective, especially in regions with expensive or limited data plans. Moreover, local AI processing can lead to a smoother and more responsive user experience in applications such as virtual assistants, chatbots, and predictive text input.

Making AI economical

To keep ChatGPT running, and free to use, OpenAI could be burning as much as USD 700,000 per day, according to SemiAnalysis’ Chief Analyst Dylan Patel. Today, ChatGPT, even though available as a mobile application, still runs on the cloud. For AI companies, the next big breakthrough could be to run these models locally on devices, as it would lead to a significant reduction in cost.

This is because running AI models on smartphones involves no additional cost: unlike with cloud deployments, users have already paid for the hardware upfront. “You need to make the AI hybrid, which means running it both on the data centres as well as locally otherwise it will cost too much money,” Amon told the Financial Times.

Similar to Qualcomm, Google too, earlier this year, revealed that it successfully ran a version of PaLM 2, its latest LLM, on a Samsung Galaxy smartphone. Mediatek, a Taiwanese fabless chipmaker, has also partnered with Meta (previously Facebook) to bring generative AI to smartphones. Interestingly, Qualcomm has also partnered with Meta to make Llama 2 implementations available on-device, harnessing the capabilities of the new AI-enabled Snapdragon chips.

Emad Mostaque not only believes it could be the next big leap in AI, he has also said it could happen within a year. “I believe you will see a ChatGPT level language model (on at least some metrics) on a mobile phone next year and GPT-4 level year after.”

Is Qualcomm the new NVIDIA?

To run foundational models on smartphones, suitable hardware support is imperative. Currently, several startups are dedicated to developing edge AI chips. For instance, Sima.ai is developing chips to run AI models on the edge.

However, Qualcomm is renowned for its smartphone processors, which power multiple Android smartphones in the market developed by Samsung, OnePlus, Oppo, Vivo, Realme, Redmi, Xiaomi, Lenovo, and Motorola among others.

At the Qualcomm Summit in Hawaii this October, the San Diego-based company is poised to unveil new chips capable of running foundational generative AI models. It firmly maintains that the best way to alleviate the considerable costs of cloud-based generative AI is to move these models to edge devices. Just as NVIDIA has capitalised on the AI revolution through the development of AI hardware, particularly GPUs, Qualcomm stands to gain substantially by creating chips that enable AI processing on smartphones.

The post GenAI Will Fundamentally Change How We Use Smartphones appeared first on Analytics India Magazine.

Scikit-learn for Machine Learning Cheat Sheet

ML Tools for Your Kit

You want to get started with machine learning. You have a foundational understanding of machine learning concepts. You know Python. What do you do?

The most obvious answer is to get up and running with Scikit-learn. Scikit-learn is an open-source Python library for all kinds of predictive data analysis. You can perform classification, regression, clustering, dimensionality reduction, model tuning, and data preprocessing tasks.

Scikit-learn's unified API interface makes learning how to implement a variety of algorithms and tasks much easier than it would otherwise be. Once you learn the pattern of how to make Scikit-learn calls, you are off and running. The only thing you need after this, beyond your imagination and determination, is a handy reference.

KDnuggets has put together just the thing you need. This cheat sheet covers the basics of what is needed to learn how to use Scikit-learn for machine learning, and provides a reference for moving ahead with your machine learning projects. Much of the most common functionality that you will be using over and over again is covered. Have a look below for confirmation.

You can download the cheatsheet here.

In the cheat sheet you will find handy references for the following common Scikit-learn tasks:

  • Loading data
  • Splitting the dataset into train and test sets
  • Preprocessing data
  • Performing supervised machine learning tasks
  • Performing unsupervised machine learning tasks
  • Model fitting
  • Prediction
  • Evaluation
  • Cross validation
  • Model tuning
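The tasks listed above can be sketched end to end in one short script. This is an illustrative example rather than an excerpt from the cheat sheet: the choice of the bundled Iris dataset, logistic regression, k-means, and the small parameter grid are all assumptions made to keep the sketch compact.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Loading data: a small dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Splitting the dataset into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing + supervised learning, chained in one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model fitting and prediction.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation on the held-out test set.
test_accuracy = accuracy_score(y_test, y_pred)

# Cross validation on the training set (5 folds).
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Model tuning: search over the regularisation strength C.
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X_train, y_train)

# Unsupervised learning: cluster the same features without labels.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("test accuracy:", round(test_accuracy, 3))
print("mean CV accuracy:", round(cv_scores.mean(), 3))
print("best C:", search.best_params_["logisticregression__C"])
print("clusters found:", len(set(cluster_labels)))
```

Every estimator here follows the same `fit` / `predict` pattern, which is the unified API the article refers to; swapping `LogisticRegression` for another classifier changes one line.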

There's no need to wait another minute to become proficient with one of the most-used tools in the machine learning practitioner's toolkit. Once you have Scikit-learn installed, it's simply a matter of following the relevant code snippets in the cheat sheet to get started. Just don't forget to keep it handy while you progress.

Check it out now, and check back soon for more.

What Is an AI Art Generator? Features, Benefits and More

AI art generators are tools that use artificial intelligence algorithms and technologies to create visual artwork. These solutions have become increasingly popular among users of all experience levels, from hobbyists to professional artists. The vast array of creative features and editing options offered by AI art generators has also drawn the attention of businesses for many purposes, including strengthening brands through visual content generation.

This article addresses the top questions people have about AI art generators, including how they work, their features, benefits and challenges. So read on to learn more about AI art generation and how these tools harness artificial intelligence to make users’ visions a reality.

Jump to

  • How do AI art generators work?
  • What are the key features of AI art generators?
  • What are some business use cases of AI art generators?
  • Benefits of using an AI art generator
  • Challenges of using an AI art generator

How do AI art generators work?

AI art generators utilize machine learning techniques to generate unique and original artwork or enhance the creative process for users. While the process of generating a desired output is simple for users, often involving inputting information in the form of a text prompt depending on the AI art generator, it’s a little more complicated on the back end.

Before an AI art generator can take simple input and transform it into a unique image, it must be trained on specific datasets during its development. AI generators use machine learning algorithms and deep neural networks to learn from the training data and thereby become able to generate new output using this knowledge.

In the case of AI art generation, an AI art generator is trained using provided data in the form of existing artwork and images. Through deep learning techniques, the software becomes able to recognize the relationships within the data and identify patterns. It can then use this knowledge to produce the desired output based on a text prompt or other querying method.

What are the key features of AI art generators?

The following features and capabilities are included in most AI art generators and enable them to produce visually compelling content for users.

Realistic rendering

Advanced AI art generators can produce outputs that are extremely realistic and can even closely resemble specific artistic mediums (Figure A). The secret is their sophisticated, realistic rendering techniques. Generators with algorithms that use these techniques can take AI-generated artwork to the next level by replicating aspects of existing artwork to generate incredibly lifelike output.

Figure A

DALL·E 2 AI-generated image and the corresponding text prompt.
DALL·E 2 can generate photo-realistic images based on input from simple text queries.

Generative AI algorithms

AI art generators can produce one-of-a-kind artwork based on input data, which is made possible through advanced generative AI algorithms. This involves training the algorithm on existing art data and using this knowledge to generate fresh output in the form of original images.

Creative tools

AI art generators can offer an extensive range of artistic capabilities and options for their users to experiment with creatively. In doing so, these tools encourage creative exploration and freedom for users by letting them modify and adjust their creations every step of the way as they work to achieve their desired outcomes (Figure B).

Figure B

Screencapture of NightCafe AI art generator interface.
Even after the image has been generated, NightCafe provides tools and capabilities for users to enhance and edit their artwork.

Interactive interfaces

To encourage creativity and personalization throughout the art generation process, many AI art generators offer intuitive and user-friendly interfaces. This aspect enables artists to engage with and manipulate their artistic generations as they create them (Figure C).

For instance, interfaces with real-time interactive properties allow users to apply prompt changes, adjust their artwork through unique effects and filters and instantly preview the finished product. Interactive and intuitive abilities like these can encourage users to express their creativity and develop their skills in art composition.

Figure C

Screencapture of getimg.ai AI art generator interface.
getimg.ai lets users experiment with the interactive interface to enhance their creative process and sharpen their artistic skills.

Style transferring

A common feature of AI art generators is style transferring, which grants users the ability to apply aspects of existing artworks and styles to their art creations. Users can choose specific stylistic characteristics or aesthetic designs and apply them to the AI art generator’s output. This functionality empowers users to produce artwork that emulates the distinctive look and atmosphere of renowned artists or artistic movements (Figure D).

Figure D

A collection of AI-generated images from Pikazo.
Pikazo is an AI art generator with a focus on style transferring, as it can transform images into artworks in similar styles to those of famous artists.

Inclusive aspects

An advanced AI art generator should have an interface that is designed to promote inclusivity by providing opportunities for users of all experience levels to express their creativity. This can be achieved in various ways. For example, Midjourney has a Community Showcase feature on its platform where members can view other users’ artwork.

NightCafe also lets users interact through comments and messages and even sends out daily challenges and hosts competitions among users (Figure E). AI art generators like these foster inclusivity through community by creating ways for users to interact with one another and bond over their shared interest in AI art.

Figure E

Screencapture of NightCafe Daily Challenges feature.
NightCafe lets users participate in daily challenges and vote on winners, encouraging interaction among members of the community.

What are some business use cases of AI art generators?

AI art generators can provide various advantages to businesses when applied to professional and occupational purposes. By streamlining certain elements within a business’s creative workflows, AI art generation can save the organization time and resources by generating impressive visuals that meet their specific needs in just a few moments and keystrokes.

Advertising and marketing

Businesses in and outside the creative industries can benefit from the competitive advantages offered by AI art generators, especially regarding advertising. Companies can leverage AI-generated images and artwork to create compelling visual advertising content.

Marketing efforts for digital ads, social media posts and other commercial resources can integrate AI-generated graphics, and companies can even gain inspiration for brand logos through these tools. Using these tools for marketing image generation can result in faster content output and a higher likelihood of gaining consumer attention in a crowded marketplace.

Entertainment

Creative sectors like the entertainment industry can use AI art generators to develop concept art, allowing them to convey their ideas through visuals. AI art puts the creative power into the user’s hands so their vision can become a reality. In this case, a creative can use these tools to generate visual graphics representing their desired outcomes. These AI art generator outputs can help people express ideas and streamline collaboration when developing films, video games, comic books and other media forms.

Non-creative industries

Many industries that are not considered creative can improve their processes with these tools and their ability to quickly produce detailed and impressive visuals. For example, e-commerce companies and other sales-centric organizations can query impressive product images to display to potential customers. The graphics generated by these tools can represent existing merchandise prototypes and even show examples of customized products for consumers.

Benefits of using an AI art generator

AI art generators are designed to produce the user’s desired output quickly and efficiently. In doing so, these tools can offer a range of benefits for all users, whether they be artists, companies or anybody looking to express their creativity through AI.

  • Accessibility: These tools allow anyone to participate in the creative process and produce artwork, regardless of their artistic skills, training or capabilities.
  • Speed and efficiency: By automating certain aspects of the creative process, such as editing images, applying filters, generating new designs and performing other creative tasks, AI art generators save users effort and time to reach their desired outcomes.
  • Creative inspiration: AI art generator tools provide unlimited inspiration through their capabilities and can be the perfect artistic outlet for users that are interested in the creative side of AI.

With the future steadily evolving, these solutions can enable people to incorporate art alongside technology with limitless capabilities. These tools let users experience their envisioned image generated in mere moments before their very eyes.

Challenges of using an AI art generator

AI bias

AI art generators are trained on a range of existing artwork and, therefore, learn from content created by humans. Unfortunately, this means it is not uncommon for AI generators to learn from existing content that can include negative representations, damaging stereotypes or other harmful biases.

While most AI art generators are designed to filter out and ignore this content, it is possible that they may produce offensive artwork as an unintended result of their training data.

Legal considerations

Another primary challenge for AI art generators involves the legality surrounding these solutions. While generating artwork through these solutions is currently legal, the use of this artwork is still under debate, with concerns over who would receive credit for the artwork’s creation.

For example, the U.S. has no copyright protections for works generated by AI art generators. In August, a federal judge in Washington, D.C., ruled that AI-created artwork is not eligible for copyright protection. This is due to the artwork’s lack of “human involvement.”

Therefore, businesses looking to take advantage of the benefits of AI art generators will want to consider the legal aspects surrounding the protection of the art generated by these solutions.

The ethics of AI art

The crux of the issue surrounding AI art generators stems from the fact that these tools are trained on existing data and thereby learn based on artists’ work found online. Current backlash indicates that the artists who created the original work did not consent to having their work utilized by these tools. With these tools gaining popularity, more and more debate has arisen regarding the ethics and legality of producing content through their use.

SEE: Check out TechRepublic Premium’s AI ethics policy.

Most AI art generation tools claim to create unique outputs. But still, the knowledge that this output is generated based on existing artwork has caused some to argue that the tools are unethical.

However, human-generated art is often inspired by existing designs, styles and works. Therefore, it is difficult to determine whether these tools are infringing on other artists’ work or whether their adoption of aspects of existing work is no different than a young artist emulating the styles of their favorite classical painters.

Businesses need pricing clarity as generative AI services hit the market

There needs to be clarity around how exactly generative artificial intelligence (AI) will be charged, as market players rush to push out their offerings and businesses look to avoid bill shock.

Transparency around the usage and commercial model is something organizations are asking for, so they can avoid escalating hidden costs, said Tim Dillon, founder and director of Tech Research Asia. There is general concern they will experience bill shock from a consumption-based model, similar to how some had to deal with this challenge during the early days of cloud, he noted.

It is something vendors such as Salesforce will have to figure out as they ramp up their generative AI service offerings, said Dillon in an interview with ZDNET, on the sidelines of Dreamforce 2023 held in San Francisco this week.

The adoption of these tools can grow organically within an organization and, hence, can lead to a lack of control and awareness of their consumption. There also often are no policies guiding the use of generative AI, he said, adding that research suggests 40% of organizations in Asia-Pacific Japan have informal policies around such tools, while 60% have formal policies in place.

Concerns around bill shock are compounded by the softening economy, with companies in the region facing potential budget cuts, he noted. And if prices are tagged in US dollars, consuming generative AI can be an expensive proposition for businesses in some Asia-Pacific markets.

Acknowledging that concerns about bill shock were valid, Gavin Barfield, Salesforce's Asean vice president and CTO of solutions, said pricing models still are being defined as generative AI services gradually are rolled out.

"We're in the early stages, so all companies are wrangling with these issues," Barfield told ZDNET. He noted that the same issues had surfaced when cloud services were first launched.

"As the market and product mature, these things will get ironed out," he said, adding that market players will need to find ways to price generative AI services. Salesforce itself is looking at a variety of pricing models but for now has opted for a credits-based system for a couple of services, according to Barfield. How much credits are consumed depends on how the AI model is called to run the query.

In July, Salesforce announced that Sales GPT, which is shipped with Sales Cloud Einstein, is available at $50 per user per month and includes a limited number of Einstein GPT credits. Service GPT, shipped with Service Cloud Einstein, also is priced at $50 per user per month and includes a limited number of Einstein GPT credits.

Customers of either generative AI service can purchase Enterprise Expansion packs for more credits when their usage grows.

Because generative AI services are based on a usage model, it is critical that companies can monitor their consumption, said Jan Morgenthal, chief digital officer of Singapore telco, M1.

Speaking to ZDNET at the conference, he noted the need to be able to measure and forecast how much these tools are used within his organization. M1 currently uses several AI tools from various vendors, including Salesforce, and is also testing generative AI services.

Having a dollar value, for instance, will enable him to manage the number of queries that should be made with these tools.

Morgenthal noted that, depending on the complexity of a particular use case and the AI model needed to automate or generate a response, it may not make sense in terms of the ROI (return on investment) to power the query with generative AI.

This is an issue that companies will need to be cautious about or costs can escalate. The automation gained from the generative AI then may not be worth the cost of its delivery, he said.

It also means organizations have to map out the processes, including data availability, needed to run a query and achieve the desired outcome, so they can measure the cost of applying generative AI to the use case.
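The cost-versus-value check described here is simple arithmetic, sketched below in Python. Every number is invented for illustration: Salesforce has not published a per-credit price or per-query credit counts in this article, so `price_per_credit`, the `UseCase` fields, and both sample use cases are hypothetical placeholders that an organization would replace with its own measurements.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    queries_per_month: int
    credits_per_query: float  # hypothetical: depends on the model called
    value_per_query: float    # estimated dollars saved per automated query


def monthly_cost(use_case, price_per_credit):
    """Dollars spent per month on credits for this use case."""
    return use_case.queries_per_month * use_case.credits_per_query * price_per_credit


def roi(use_case, price_per_credit):
    """(value - cost) / cost; negative means the automation loses money."""
    cost = monthly_cost(use_case, price_per_credit)
    value = use_case.queries_per_month * use_case.value_per_query
    return (value - cost) / cost


price_per_credit = 0.01  # hypothetical dollar price of one credit

use_cases = [
    UseCase("email reply drafts", 10_000, credits_per_query=2, value_per_query=0.05),
    UseCase("complex case summaries", 2_000, credits_per_query=30, value_per_query=0.20),
]

for uc in use_cases:
    verdict = "worth automating" if roi(uc, price_per_credit) > 0 else "not worth the cost"
    print(f"{uc.name}: cost ${monthly_cost(uc, price_per_credit):.2f}/month, {verdict}")
```

With these made-up inputs the cheap, high-volume use case pays for itself while the expensive one does not, which is the situation Morgenthal warns about: the answer depends on how many credits each query consumes.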

Letting customers create their own prompts

Salesforce this week previewed new generative AI offerings that its executives said would enable enterprise customers to more easily customize these tools to support their operations.

Among them is the Einstein Copilot, touted as a conversational AI assistant that can be integrated with any Salesforce application, enabling users to ask questions in natural language.

Responses are generated based on proprietary company data powered by Salesforce Data Cloud, previously called Genie. The data engine pulls together any datasets, including customer data, telemetry data, and Slack conversations, to create a unified view of the customer.

Data Cloud currently processes 30 trillion transactions per month and connects 100 billion records daily, according to Salesforce. The data engine is now natively integrated with Einstein 1 Platform, enabling businesses to apply AI, automation, and analytics to every customer experience.

It enables Einstein Copilot to provide options for additional actions beyond the user's query, such as a recommended action plan after a sales call.

Organizations that want to build generative AI applications with their own customized prompts, skills, and AI models also can do so via the Einstein Copilot Studio. It encompasses the Prompt Builder, which lets users create and test generative AI prompts that are aligned with their corporate brand and communication style. And they can do so without any technical expertise, enabling marketing executives to ask Prompt Builder to generate a tailored message based on a customer's purchase or order history.

Einstein Copilot Studio also includes Skills Builder, which allows companies to create custom AI-driven actions to run specific tasks. For example, it can create a "competitor analysis" skill that analyzes current market data and sales figures, and sends API calls to extract data from external sources.

In addition, a Model Builder component caters to organizations that want to use their own AI models. They can choose one of Salesforce's proprietary LLMs (large language models) or integrate their preferred predictive and generative partner AI models. They can train these on data in Data Cloud without moving or copying data.

This means Einstein Copilot can provide more accurate insights and content that are tailored to the company's employee or customer dynamics.

Model Builder eventually will support external LLMs that include Amazon Bedrock, Google Cloud's Vertex AI, Anthropic, and Cohere. For now, it only supports OpenAI.

Einstein Copilot currently is in pilot, while Copilot Studio will enter pilot later this fall. Einstein Trust Layer enhancements will be generally available in the vendor's Einstein platform from October 2023. No pricing details are available for the new offerings.

Data Cloud currently is included for Enterprise Edition or above customers at no cost. It encompasses capabilities that allow organizations to unify 10,000 customer profiles and includes two Tableau Creator licenses.

In addition, a new Einstein Trust Layer will underpin all Einstein products, providing a secure AI architecture that Salesforce said will ensure its customers' generative AI responses are powered by quality data that are checked against potential bias and security and privacy standards.

Integrated with Salesforce Data Cloud, the Trust Layer mitigates such risks by checking data against toxicity and brand risks, masking personally identifiable information, and not retaining customer data.

"The reality is every company will undergo an AI transformation to increase productivity, drive efficiency, and deliver incredible customer and employee experiences," said Marc Benioff, Chair and CEO, Salesforce. "With Einstein Copilot and Data Cloud we're making it easy to create powerful AI assistants and infuse trusted AI into the flow of work across every job, business, and industry. In this new world, everyone can now be an Einstein."

Based in Singapore, Eileen Yu reported for ZDNET from Dreamforce 2023 in San Francisco, USA, on the invitation of Salesforce.com.

OneDrive users will soon be able to access their files offline

Microsoft is planning a major improvement for OneDrive that will allow you to work fully with all your files offline. In an update to its Microsoft 365 roadmap, the company revealed a OneDrive option known as Offline mode.

Slated to be available for preview this November and then start rolling out for general release in December, Offline mode will let you access your files via your browser when offline and then sync any changes when you're back online.

"This feature will allow you to launch OneDrive in your browser and view, sort, rename, move, copy, and delete files even without internet access," Microsoft said in its roadmap. "Additionally, for locally stored OneDrive files (those that are marked as 'always available offline') you will be able to open and work on these in your browser even if you are offline. All of the changes you make offline will be automatically synced back when internet connection is restored."

Currently, OneDrive's offline capabilities are limited. Without internet access, you can't sign into your OneDrive online storage or work with your files via the browser. You can retrieve your OneDrive files in Windows but only if you've opted to always keep them on the device. Letting you more easily use and manage your files offline will help anyone who needs to work in a remote location that lacks an internet connection.

OneDrive is one of those tools that does its job in the background and so doesn't always get a lot of love and attention. Even Microsoft hasn't launched any major enhancements for the platform, opting instead to dole out minor updates from time to time. But now it looks as if OneDrive will be the focus of some much-needed improvements.

Last week, Microsoft teased an online event scheduled for October 3 at 1pm ET. Dubbed "Microsoft OneDrive: The Future of File Management is Here," the event will spotlight new features for the file storage service and provide a "sneak peek" into how AI will fuel new search, sharing, and queries for your files.

Amid all the buzz around artificial intelligence, Microsoft has been enhancing many of its core products with AI smarts. Now it will be OneDrive's turn to feed at the AI trough.

Following the event, Microsoft will allow for questions in a live chat-based Q&A. Interested participants can add the OneDrive event to their calendar and follow a link to join the event via Microsoft Teams when it kicks off.

Now, Build Software Engineering Teams Using AI within Minutes 

Yes, it’s possible. With the rise of AI agents that use LLMs to autonomously run tasks, the next step of evolution involves the integration of multiple agents that work together to accomplish tasks. With MetaGPT already serving the same purpose, it looks like more such agents are coming to the forefront, the most recent being ChatDev, a virtual chat-powered company that aids software development. The question is, what does this agent bring to the table that is unique?

Communicative Agents

A team of 12 researchers from Dalian University of Technology, Beijing University and Brown University have built ChatDev, a multi-agent team that helps build software within minutes. ChatDev follows a structured approach similar to the waterfall model, a linear, sequential approach to software development.

It breaks down the development process into four clear phases: design, coding, testing, and documentation. Each phase involves a team of agents, including programmers, code reviewers, and test engineers, promoting teamwork and ensuring a smooth workflow.

Representation of ChatDev Functioning. Source: ChatDev

On receiving a task such as creating ‘a gomoku game’, as explained in the paper, the ChatDev agents engage in communication and mutual verification through collaborative chatting. This process enables them to automatically craft comprehensive software solutions that encompass source code, environment dependencies, and user manuals.

A chat chain serves as a mediator, dividing each stage into smaller, individual tasks. Within each task, agents both suggest and confirm solutions through context-aware communication, ultimately leading to the effective completion of specific subtasks.
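The chat chain idea can be sketched in a few lines of Python. This is only an illustration of the structure described above, not ChatDev's actual code: the class names, the role names and the stub agents (which simply wrap their input instead of calling an LLM) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# An "agent" is any function mapping a prompt to a reply. Real agents would
# call an LLM; the stubs below are hypothetical and keep the sketch runnable.
Agent = Callable[[str], str]


@dataclass
class Subtask:
    name: str
    instructor: Agent  # suggests what should be done
    assistant: Agent   # produces the solution that completes the subtask


@dataclass
class Phase:
    name: str
    subtasks: List[Subtask] = field(default_factory=list)


def run_chat_chain(task: str, phases: List[Phase]) -> List[Tuple[str, str, str]]:
    """Run every subtask in order, passing each result forward as context."""
    context = task
    log = []
    for phase in phases:
        for sub in phase.subtasks:
            instruction = sub.instructor(context)
            solution = sub.assistant(instruction)
            log.append((phase.name, sub.name, solution))
            context = solution  # the next subtask builds on this output
    return log


def make_stub(role: str) -> Agent:
    """Stand-in for an LLM-backed agent: it just tags the prompt with its role."""
    return lambda prompt: f"{role}({prompt})"


# Four waterfall phases, each with one illustrative subtask.
phases = [
    Phase("design", [Subtask("plan features", make_stub("Designer"), make_stub("Programmer"))]),
    Phase("coding", [Subtask("write code", make_stub("Designer"), make_stub("Programmer"))]),
    Phase("testing", [Subtask("review code", make_stub("Reviewer"), make_stub("Programmer"))]),
    Phase("documentation", [Subtask("write manual", make_stub("Designer"), make_stub("Programmer"))]),
]

log = run_chat_chain("a gomoku game", phases)
for phase_name, subtask_name, _ in log:
    print(phase_name, "->", subtask_name)
```

Because every subtask receives the previous subtask's output as context, the phases stay sequential, which mirrors the waterfall structure.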

What about MetaGPT?

When MetaGPT was introduced, the multi-agent framework was trending on GitHub with 20,000 stars. Similar to ChatDev, MetaGPT connects different AI agents that have been assigned various roles such as product managers, architects, project managers, and engineers, to function together. Though similar in implementing multiple agents, the purpose and approach taken by both are different.

Development vs Solution-based

ChatDev, a chat-powered company, is specifically focused on software development, whereas MetaGPT is designed to enhance the capabilities of existing multi-agent systems, specifically addressing their limitations in solving complex tasks.

MetaGPT achieves this by encoding Standardised Operating Procedures (SOPs) into prompts to improve structured coordination among agents. It also mandates modular outputs, empowering agents with domain expertise to validate results and reduce errors. Instead of relying solely on the language model’s inherent knowledge, specific guidelines and procedures are provided to guide the agents in their interactions. ChatDev, on the other hand, follows the waterfall method, a project management and development methodology that divides the work into multiple stages such as designing and coding, and is tailored to software development. It uses a chat chain to facilitate communication and task breakdown.

Large Language Model

ChatDev was tested with the gpt-3.5-turbo-16k version of ChatGPT. MetaGPT, on the other hand, employs GPT-4 32k, and is said to have surpassed GPT-4 in pass rates on MBPP and HumanEval. ChatDev has not been compared with other LLMs.

Costing

The ChatDev paper highlights the framework's efficacy in software generation. It claims that the entire software development process took under seven minutes at a cost of less than $1. A project using the MetaGPT framework takes 516 seconds on average and costs $1.12, with a maximum cost of $1.35.

Minimising Hallucinations

Creating software systems directly with LLMs can also produce code-related hallucinations. These issues might manifest as incomplete implementations, absent dependencies, and undetected bugs. Such hallucinations can arise due to task vagueness and a lack of cross-checking in the decision-making process. However, this is largely addressed in ChatDev by introducing thought instruction mechanisms into each autonomous chat process during the code completion, reviewing and testing stages. By performing a ‘role flip’, an instructor injects specific thoughts for code modifications into instructions.

The MetaGPT framework incorporates efficient human workflows as a meta programming approach into LLM-based multi-agent collaboration, and looks to address hallucinations through it. However, the paper gives no further details on how it will achieve this.

With ChatDev, the need for multiple teams and people to accomplish various tasks can be eliminated. Building an entire piece of software within minutes is no easy feat, and ChatDev accomplishes it while saving time, cost and resources. If put to use, ChatDev-type models could revolutionise the software development workflow.

The post Now, Build Software Engineering Teams Using AI within Minutes appeared first on Analytics India Magazine.

KDnuggets News, September 13: Getting Started with SQL in 5 Steps • Introduction to Databases in Data Science

GenAI Adoption, By the Numbers

September 13, 2023 by Alex Woodie

As the freight train that is generative AI continues barreling down the track to an uncertain destination, we thought it would be good to take some time to stop and ponder where we’re currently at in terms of GenAI adoption.

It’s been quite a ride since the launch of ChatGPT in late November 2022 ignited the GenAI revolution. While we have been tracking the development of generative AI technology for more than five years here at Datanami, there’s no denying the huge impact that OpenAI’s debut of ChatGPT is having on the world.

To gain a better understanding of GenAI’s impact on business and society, we aggregated recent research on GenAI adoption, and present the findings here.

50% Productivity Increase

Adopting GenAI brings the potential for significant efficiency gains in the workplace, on the order of 50%, according to Teradata Chief Product Officer Hillary Ashton.

“We know as an industry that productivity is significantly improved, somewhere over 40%, when you start bringing these capabilities into the market,” the CPO tells Datanami, citing reports. “The low end is 40%, the higher end is 60% improvement in productivity through GenAI capabilities in terms of getting work done faster.”

GenAI will make somebody rich — but probably not you (pathdoc/Shutterstock)

$4.4 Trillion Economic Gain

GenAI’s impact on the world’s economy could be between $2.6 trillion and $4.4 trillion annually, according to a June McKinsey report. The impact would be twice that if McKinsey factored in embedded GenAI, it said.

Much of the benefit would come right where you expect it: automating away the work done by human workers. “Current generative AI and other technologies have the potential to automate work activities that absorb 60% to 70% of employees’ time,” the company says.

More Than 40% Experimenting

According to Deloitte’s third-quarter CFO Signals report, 42% of companies are currently experimenting with GenAI, with 15% actively incorporating it into their business strategy. The survey found that 24% are reading and talking about it, while 17% of survey respondents say it’s too soon to make a decision on its use in their companies.

The potential impact to risk and internal controls was the top concern of GenAI adoption, according to Deloitte’s report, with 57% citing that concern, followed by 52% citing data infrastructure and technology needs and 51% citing investment needs. Governance, ethical, and legal impact fell further down the list.

80% Have Been Exposed To It

A McKinsey study from April found that 79% of respondents say they’ve had at least some exposure to GenAI, “either for work or outside of work.” Twenty-two percent say they are regularly using it in their own work, McKinsey says.

The early GenAI projects seem to be going well, according to the McKinsey report, which found that 40% of the companies who have already adopted GenAI say they plan to increase their overall AI investments as a result of their GenAI projects.

Yes, we know GenZ has the GenAI riz. Please get off the lawn anyway (Wpadington/Shutterstock)

70% of GenZ Using It

A Salesforce survey found that 70% of the Gen Z populace, or people born between about 1996 and 2012, are using GenAI. “Gen Z is paving the path for generative AI,” the company said in its Generative AI Snapshot Research report released September 7.

Nearly two-thirds of GenAI users, or 65%, are Millennials or Gen Z, the report found. Among the people of the world who don’t use GenAI, 68% are Gen X or Baby Boomers, the report found. The report found that 48% of Gen Z believe they are on their way to mastering GenAI technology.

Nearly 100% Adoption by Devs

A new Sonatype survey of 800 developers released today found that 97% of DevOps and SecOps personnel are using GenAI today. Not all software engineers are using it for development, however. Sonatype’s survey found that 45% of SecOps engineering leads have already adopted GenAI, compared to 31% for DevOps.

The survey found that 74% report feeling pressure to use GenAI despite identified security risks. “Adoption has been widespread across the board, and the software development cycle is no exception,” said Brian Fox, Sonatype’s co-founder and CTO, in a press release. “While productivity dividends are clear, our data also exposes a concerning, hand-in-hand reality: the security threats posed by this still-nascent technology.”

75% of Companies to Ban It

Despite the documented productivity gains, many companies around the world are planning to ban its use. According to a new report by BlackBerry, which surveyed 2,000 IT decision-makers across North America, Europe, and Asia, 75% of organizations are considering or implementing a ban on ChatGPT and other GenAI apps in the workplace.

Security and privacy are the biggest threats of ChatGPT and GenAI, cited by 67% of organizations that are implementing a ban or considering one. Fifty-seven percent of the survey-takers cited risk to corporate reputation.

But it’s not all doom and gloom, as the BlackBerry survey shows that there are clear benefits to the tech. Possible advantages of GenAI and ChatGPT, according to the survey, include increasing efficiency (cited by 55%), innovation (cited by 52%), and enhancing creativity (cited by 51%).

Biggest Impact in 3 to 5 Years

A new survey by KPMG US found that nearly two-thirds (65%) of 225 U.S. executives surveyed earlier this year believe GenAI will have a high or extremely high impact on their organization in the next three to five years.

However, 60% of the executives say they are still a year or two away from implementing their first GenAI solution, the company says, adding that companies are planning on taking about six to 12 months for researching GenAI, evaluating internal capabilities, and investing in GenAI tools.

About the author: Alex Woodie

Alex Woodie has written about IT as a technology journalist for more than a decade. He brings extensive experience from the IBM midrange marketplace, including topics such as servers, ERP applications, programming, databases, security, high availability, storage, business intelligence, cloud, and mobile enablement. He resides in the San Diego area.

Personalized news app Artifact becomes a discovery engine for the web with new Links feature

Sarah Perez (@sarahintampa)

Artifact, the personalized news aggregator built by Instagram’s co-founders, is launching a new feature today that takes the app in a different direction beyond tracking, summarizing, and commenting on the news. Now, users will be able to share any link from the web in order to view a personalized feed of links based on their interests — something that puts the app in more direct competition with social apps for sharing text or links, like X or Threads.

The feature, simply called Links, is meant to showcase what’s possible with Artifact’s AI technology, the company explains.

To get started, you only have to share a URL, which is then presented in a visual feed within a new Links tab in the app. Here, a familiarly named “For You” feed will show you other links based on your interests.

In addition to sharing a link, Artifact’s users are also able to optionally write a caption or add photos to share their thoughts or “hot takes,” or curate the most interesting photos as part of their post. This has the effect of turning users into creators of a sort who could build a following on Artifact. The company says the app will begin showing posted links to other relevant people to help the creator build an audience.

In addition, the new Links feature will offer creative tools that help users to quickly upload, crop and reorder their images. And, to help with caption writing, creators can turn to Artifact’s AI, which was originally used to summarize long news stories.

Creators can also share quotes when reading articles on Artifact by highlighting text and then choosing “Share to Links” from the new pop-up menu.

The company says early adopters have shared new products, videos, headline recaps, app reviews, recipes, architecture slideshows, and much more via Links. This makes the app feel like a competitor to not only social apps like X, but also Flipboard, whose social magazines are curated by its users, or even a link pinboarding site like Pinterest.

Though Artifact had been inching into social networking territory with recent launches that added commenting on articles and tools for writers to claim their profiles and track their readership, the new Links feature goes a step further.

Now, users can search for and follow anyone who’s created a social profile on the app. Here, new posts from people you follow will appear under a new “Following” section. These profiles have been redesigned to feature recent links posted, as well.

With Links, using Artifact becomes less of a passive news-reading experience and more of one where its users can actively participate in news discovery. But that could also mean some users may try to feed the app with content from smaller sites, blogs, and other less legitimate news sources where information isn’t as thoroughly fact-checked as the mainstream publications Artifact first supported.

The company says it will address this potential problem with a two-layered approach. First, it will leverage AI that utilizes a number of third-party moderation services to look for bad content, as defined in its community guidelines. The second layer is a manual review that is both crowdsourced and also actively monitored by the team, we’re told.

“The same AI that powers article discovery can be used for anything on the web — though it’s less tractable to crawl it all, and likely better to let users choose the best stuff,” explains Artifact co-founder Kevin Systrom, who built the app alongside his Instagram co-founder Mike Krieger. “Artifact is about feeding your curiosity, and that goes beyond the best publishers in the world. Sometimes the best content is on a small blog that deserves to be discovered on a platform like this,” he adds.

The Links feature is available on iOS and Android, with posting coming very soon to Android.

AnomalyGPT: Detecting Industrial Anomalies using LVLMs

Recently, Large Vision Language Models (LVLMs) such as LLaVA and MiniGPT-4 have demonstrated the ability to understand images and achieve high accuracy and efficiency in several visual tasks. While LVLMs excel at recognizing common objects due to their extensive training datasets, they lack specific domain knowledge and have a limited understanding of localized details within images. This limits their effectiveness in Industrial Anomaly Detection (IAD) tasks. On the other hand, existing IAD frameworks can only identify sources of anomalies and require manual threshold settings to distinguish between normal and anomalous samples, thereby restricting their practical implementation.

The primary purpose of an IAD framework is to detect and localize anomalies in industrial scenarios and product images. However, due to the unpredictability and rarity of real-world image samples, models are typically trained only on normal data. They differentiate anomalous samples from normal ones based on deviations from the typical samples. Currently, IAD frameworks and models primarily provide anomaly scores for test samples. Moreover, distinguishing between normal and anomalous instances for each class of items requires the manual specification of thresholds, rendering them unsuitable for real-world applications.

To explore the use and implementation of Large Vision Language Models in addressing the challenges posed by IAD frameworks, AnomalyGPT, a novel IAD approach based on LVLM, was introduced. AnomalyGPT can detect and localize anomalies without the need for manual threshold settings. Furthermore, AnomalyGPT can also offer pertinent information about the image to engage interactively with users, allowing them to ask follow-up questions based on the anomaly or their specific needs.

Industrial Anomaly Detection and Large Vision Language Models

Existing IAD frameworks fall into two categories.

  1. Reconstruction-based IAD.
  2. Feature Embedding-based IAD.

In a Reconstruction-based IAD framework, the primary aim is to reconstruct anomalous samples as their normal counterparts and detect anomalies by calculating the reconstruction error. SCADN, RIAD, AnoDDPM, and InTra use different reconstruction frameworks, ranging from Generative Adversarial Networks (GANs) and autoencoders to diffusion models and transformers.
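A minimal sketch of reconstruction-error scoring may help here. It assumes the reconstruction has already been produced by some model (a GAN, autoencoder or diffusion model); the hard-coded `reconstruction` below is a hypothetical stand-in for that output, and taking the maximum pixel error as the image-level score is one common choice rather than what any specific framework above does.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def error_map(image: Image, reconstruction: Image) -> Image:
    """Per-pixel squared error between an image and its reconstruction."""
    return [
        [(p - r) ** 2 for p, r in zip(img_row, rec_row)]
        for img_row, rec_row in zip(image, reconstruction)
    ]


def anomaly_score(image: Image, reconstruction: Image) -> float:
    """Image-level score: the largest per-pixel reconstruction error."""
    return max(max(row) for row in error_map(image, reconstruction))


# Hypothetical output of a model that has only ever seen normal samples:
# it reconstructs every input as the defect-free object.
reconstruction = [[0.5, 0.5, 0.5],
                  [0.5, 0.5, 0.5]]

normal_image = [[0.5, 0.5, 0.5],
                [0.5, 0.5, 0.5]]
defect_image = [[0.5, 0.5, 0.5],
                [0.5, 1.0, 0.5]]  # one bright defect pixel

print(anomaly_score(normal_image, reconstruction))  # 0.0
print(anomaly_score(defect_image, reconstruction))  # 0.25
```

The error map doubles as a localization map: the defect pixel is the only one with a non-zero error.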

On the other hand, a Feature Embedding-based IAD framework focuses on modeling the feature embeddings of normal data. Methods like PatchSVDD try to find a hypersphere that tightly encapsulates normal samples, whereas frameworks like PyramidFlow and Cfl project normal samples onto a Gaussian distribution using normalizing flows. CFA and PatchCore build a memory bank of normal samples from patch embeddings, and use the distance between the test sample embedding and the normal embeddings to detect anomalies.
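The memory-bank idea can be illustrated with a nearest-neighbour lookup. This is a simplified sketch, not the PatchCore or CFA implementation: the two-dimensional embeddings are made up, and scoring an image by its most unusual patch is an assumption made for the example.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_distance(embedding: Vector, memory_bank: List[Vector]) -> float:
    """Euclidean distance from one patch embedding to its closest stored normal embedding."""
    return min(math.dist(embedding, stored) for stored in memory_bank)


def image_score(patch_embeddings: List[Vector], memory_bank: List[Vector]) -> float:
    """Score an image by its most unusual patch (assumed aggregation rule)."""
    return max(nearest_distance(p, memory_bank) for p in patch_embeddings)


# Toy memory bank built from patches of normal images (made-up 2-D embeddings).
memory_bank = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

normal_patches = [(0.0, 0.0), (1.0, 0.0)]
odd_patches = [(0.0, 0.0), (4.0, 4.0)]  # second patch is far from anything stored

print(image_score(normal_patches, memory_bank))
print(image_score(odd_patches, memory_bank))
```

A patch that matches something in the bank scores zero, while the outlying patch drives the image score up. Turning that score into a normal/anomalous decision still needs a threshold, which is the manual step discussed earlier.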

Both these methods follow the “one class one model” learning paradigm, which requires a large number of normal samples to learn the distribution of each object class. This requirement makes them impractical for novel object categories and limits their application in dynamic product environments. On the other hand, the AnomalyGPT framework uses an in-context learning paradigm for object categories, enabling inference with only a handful of normal samples.

Moving ahead, we have Large Vision Language Models or LVLMs. LLMs or Large Language Models have enjoyed tremendous success in the NLP industry, and they are now being explored for their applications in visual tasks. The BLIP-2 framework leverages Q-former to input visual features from Vision Transformer into the Flan-T5 model. Furthermore, the MiniGPT-4 framework connects the image component of the BLIP-2 framework and the Vicuna model with a linear layer, and performs a two-stage finetuning process using image-text data. These approaches indicate that LLM frameworks might have some applications for visual tasks. However, these models have been trained on general data, and they lack the required domain-specific expertise for widespread applications.

How Does AnomalyGPT Work?

AnomalyGPT at its core is a novel conversational IAD large vision language model designed primarily for detecting industrial anomalies and pinpointing their exact location using images. The AnomalyGPT framework uses an LLM and a pre-trained image encoder to align images with their corresponding textual descriptions using simulated anomaly data. The model introduces a decoder module and a prompt learner module to enhance the performance of the IAD system and achieve pixel-level localization output.

Model Architecture

The above image depicts the architecture of AnomalyGPT. The model first passes the query image to the frozen image encoder. The model then extracts patch-level features from the intermediate layers, and feeds these features to an image decoder to compute their similarity with abnormal and normal texts to obtain the results for localization. The prompt learner then converts them into prompt embeddings that are suitable to be used as inputs into the LLM alongside the user text inputs. The LLM then leverages the prompt embeddings, image inputs, and user-provided textual inputs to detect anomalies, pinpoint their location, and create end responses for the user.

Decoder

To achieve pixel-level anomaly localization, the AnomalyGPT model deploys a lightweight, feature-matching-based image decoder that supports both few-shot and unsupervised IAD frameworks. The design of the decoder used in AnomalyGPT is inspired by the WinCLIP, PatchCore, and APRIL-GAN frameworks. The model partitions the image encoder into 4 stages, and extracts the intermediate patch-level features from every stage.

However, these intermediate features have not been through the final image-text alignment, which is why they cannot be compared directly with text features. To tackle this issue, the AnomalyGPT model introduces additional layers to project intermediate features, and align them with text features that represent normal and abnormal semantics.
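The comparison between patch features and the normal and abnormal text features can be sketched as follows. This is an illustrative simplification, not AnomalyGPT's decoder: it assumes the patch features have already been projected into the text feature space, uses made-up two-dimensional vectors, and the softmax with a `temperature` of 0.1 is an assumption made for the example.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def abnormal_probability(patch: Vector, normal_text: Vector,
                         abnormal_text: Vector, temperature: float = 0.1) -> float:
    """Softmax over the two similarities; returns the weight on 'abnormal'."""
    s_normal = cosine(patch, normal_text) / temperature
    s_abnormal = cosine(patch, abnormal_text) / temperature
    m = max(s_normal, s_abnormal)  # subtract the max for numerical stability
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


def localization_map(patch_grid: List[List[Vector]], normal_text: Vector,
                     abnormal_text: Vector) -> List[List[float]]:
    """One abnormal-probability value per patch, laid out like the patch grid."""
    return [[abnormal_probability(p, normal_text, abnormal_text) for p in row]
            for row in patch_grid]


# Made-up text features for "normal" and "abnormal" semantics.
normal_text = (1.0, 0.0)
abnormal_text = (0.0, 1.0)

# A 2x2 grid of (already projected) patch features.
patch_grid = [[(1.0, 0.0), (0.9, 0.1)],
              [(0.1, 0.9), (1.0, 1.0)]]

for row in localization_map(patch_grid, normal_text, abnormal_text):
    print([round(v, 3) for v in row])
```

Patches that point towards the abnormal text feature get values close to 1, and a patch equally similar to both gets 0.5. In the real model, such a map would be upsampled to image resolution to give pixel-level localization.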

Prompt Learner

The AnomalyGPT framework introduces a prompt learner that transforms the localization result into prompt embeddings to leverage fine-grained semantics from images, and also maintains the semantic consistency between the decoder and LLM outputs. Furthermore, the model incorporates learnable prompt embeddings, unrelated to decoder outputs, into the prompt learner to provide additional information for the IAD task. Finally, the model feeds the embeddings and original image information to the LLM.

The prompt learner consists of learnable base prompt embeddings and a convolutional neural network. The network converts the localization result into prompt embeddings, forming a set of prompt embeddings that are then combined with the image embeddings and fed into the LLM.

Anomaly Simulation

The AnomalyGPT model adopts the NSA method to simulate anomalous data. NSA builds on the Cut-paste technique and uses the Poisson image editing method to alleviate the discontinuity introduced by pasting image segments. Cut-paste is a commonly used technique in IAD frameworks to generate simulated anomaly images.

The Cut-paste method involves cropping a block region from an image randomly, and pasting it into a random location in another image, thus creating a portion of simulated anomaly. These simulated anomaly samples can enhance the performance of IAD models, but there is a drawback, as they can often produce noticeable discontinuities. The Poisson editing method aims to seamlessly clone an object from one image to another by solving the Poisson partial differential equations.

The above image illustrates the comparison between Poisson and Cut-paste image editing. As it can be seen, there are visible discontinuities in the cut-paste method, while the results from Poisson editing seem more natural.
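The plain Cut-paste step is simple enough to sketch directly. The function below is a hypothetical illustration on nested lists rather than real image arrays, and it deliberately implements only the naive paste, so the hard seams it produces are exactly the discontinuities that Poisson editing is meant to smooth out. Poisson blending itself is not implemented here.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def cut_paste(source: Image, target: Image, patch_h: int, patch_w: int,
              rng: random.Random) -> Tuple[Image, Tuple[int, int]]:
    """Copy a random patch from `source` into a random spot in a copy of `target`.

    Returns the simulated anomaly image and the (top, left) paste position.
    """
    src_top = rng.randrange(len(source) - patch_h + 1)
    src_left = rng.randrange(len(source[0]) - patch_w + 1)
    dst_top = rng.randrange(len(target) - patch_h + 1)
    dst_left = rng.randrange(len(target[0]) - patch_w + 1)

    result = [row[:] for row in target]  # leave the original target untouched
    for dy in range(patch_h):
        for dx in range(patch_w):
            result[dst_top + dy][dst_left + dx] = source[src_top + dy][src_left + dx]
    return result, (dst_top, dst_left)


source = [[9] * 4 for _ in range(4)]  # donor image
target = [[0] * 4 for _ in range(4)]  # normal image to corrupt

simulated, (top, left) = cut_paste(source, target, 2, 2, random.Random(0))
for row in simulated:
    print(row)
```

The paste position also gives a ground-truth mask for free, which is what makes simulated anomalies useful for training a localization decoder.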

Question and Answer Content

To conduct prompt tuning on the Large Vision Language Model, the AnomalyGPT model generates a corresponding textual query on the basis of the anomaly image. Each query consists of two major components. The first part of the query consists of a description of the input image that provides information about the objects present in the image along with their expected attributes. The second part of the query asks whether there is an anomaly in the image.

The LVLM first answers the question of whether there is an anomaly in the image. If the model detects anomalies, it continues to specify the location and the number of the anomalous areas. The model divides the image into a 3×3 grid of distinct regions to allow the LVLM to verbally indicate the position of the anomalies as shown in the figure below.

The description gives the LVLM foundational knowledge about the input image, which helps it better comprehend the image’s components.
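The 3×3 grid can be turned into words with a small lookup. The sketch below assumes a binary anomaly mask as input, and the region names are illustrative choices made for this example, not necessarily the exact wording used by AnomalyGPT.

```python
from typing import List

# Illustrative names for the nine grid cells (an assumption for this sketch).
REGION_NAMES = (
    ("top left", "top", "top right"),
    ("left", "center", "right"),
    ("bottom left", "bottom", "bottom right"),
)


def anomalous_regions(mask: List[List[int]]) -> List[str]:
    """Names of the 3x3 grid cells that contain at least one anomalous pixel."""
    height, width = len(mask), len(mask[0])
    found = set()
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                # Integer arithmetic maps each pixel to a cell index in 0..2.
                found.add((y * 3 // height, x * 3 // width))
    return [REGION_NAMES[r][c] for r, c in sorted(found)]


# 6x6 mask with one anomalous pixel near the top-right corner and one in the middle.
mask = [[0] * 6 for _ in range(6)]
mask[0][5] = 1
mask[3][3] = 1

print(anomalous_regions(mask))  # ['top right', 'center']
```

An empty list corresponds to the "no anomaly" answer, so the same helper covers both parts of the response.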

Datasets and Evaluation Metrics

The model conducts its experiments primarily on the VisA and MVTec-AD datasets. The MVTec-AD dataset consists of 3629 images for training and 1725 images for testing, split across 15 different categories, and is one of the most popular datasets for IAD frameworks. The training images feature normal images only, whereas the testing images feature both normal and anomalous images. On the other hand, the VisA dataset consists of 9621 normal images and nearly 1200 anomalous images, split across 12 different categories.

Moving along, just like existing IAD frameworks, the AnomalyGPT model employs the AUC, or Area Under the Receiver Operating Characteristic curve, as its evaluation metric, with pixel-level and image-level AUC used to assess anomaly localization and anomaly detection performance respectively. However, the model also utilizes image-level accuracy to evaluate the performance of its proposed approach, because AnomalyGPT can determine the presence of anomalies without the need to set thresholds manually.
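The difference between the two metrics is easy to see in code. The sketch below computes AUC from raw anomaly scores using its rank interpretation (no threshold needed) and accuracy from hard yes/no answers; the scores, labels and answers are made-up toy values.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random anomalous sample outscores a random normal one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of hard yes/no answers that match the ground truth."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


labels = [0, 0, 1, 1]              # 1 = anomalous, 0 = normal
scores = [0.1, 0.4, 0.35, 0.8]     # anomaly scores from a score-based method
answers = [0, 1, 1, 1]             # direct yes/no answers from a conversational model

print(auroc(scores, labels))       # 0.75
print(accuracy(answers, labels))   # 0.75
```

AUC only measures how well the scores rank anomalous samples above normal ones, so a score-based method still needs a threshold before it can give a verdict. Accuracy applies directly to a model that answers yes or no.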

Results

Quantitative Results

Few-Shot Industrial Anomaly Detection

The AnomalyGPT model compares its results with prior few-shot IAD frameworks including PaDiM, SPADE, WinCLIP, and PatchCore as the baselines.

The above figure compares the results of the AnomalyGPT model with those of few-shot IAD frameworks. Across both datasets, AnomalyGPT outperforms previous approaches in terms of image-level AUC, and also returns good accuracy.

Unsupervised Industrial Anomaly Detection

In an unsupervised training setting with a large number of normal samples, AnomalyGPT trains a single model on samples obtained from all classes within a dataset. The developers of AnomalyGPT chose the UniAD framework as the baseline for comparison because it is trained under the same setup. Furthermore, the model is also compared against the JNLD and PaDiM frameworks using the same unified setting.

The above figure compares the performance of AnomalyGPT with that of other frameworks.

Qualitative Results

The above image illustrates the performance of the AnomalyGPT model in the unsupervised anomaly detection setting, whereas the figure below demonstrates the performance of the model in 1-shot in-context learning.

The AnomalyGPT model is capable of indicating the presence of anomalies, marking their location, and providing pixel-level localization results. In the 1-shot in-context learning setting, the localization performance of the model is slightly lower than in the unsupervised setting because of the absence of training.

Conclusion

AnomalyGPT is a novel conversational IAD-vision language model designed to leverage the powerful capabilities of large vision language models. It can not only identify anomalies in an image but also pinpoint their exact locations. Additionally, AnomalyGPT facilitates multi-turn dialogues focused on anomaly detection and showcases outstanding performance in few-shot in-context learning. AnomalyGPT delves into the potential applications of LVLMs in anomaly detection, introducing new ideas and possibilities for the IAD industry.