Hugging Face Takes on ChatGPT With HuggingChat 

Hugging Face, the go-to platform for open-source AI, has just released an open-source alternative to the internet’s favourite chatbot, ChatGPT. Named HuggingChat, the release offers a range of functionalities and integrations catering to developers and users alike.

HuggingChat is available for testing through a sleek web interface, so anyone can try it firsthand. It can also be integrated into existing apps and services via Hugging Face’s API, opening up customisation possibilities across domains. From writing complex code to composing emails and even crafting awe-inspiring rap lyrics, HuggingChat is quite versatile.
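
As a rough illustration of what such an integration can look like, here is a minimal sketch of calling a Hub-hosted text-generation model through Hugging Face’s hosted Inference API. The model ID and token below are placeholders, and HuggingChat’s own backend may be wired differently:

```python
import requests

# Placeholder model ID and token, for illustration only; any hosted
# text-generation model on the Hub is called the same way.
API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-large"
headers = {"Authorization": "Bearer hf_your_token_here"}

payload = {"inputs": "Write a two-line rap about open-source chatbots."}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())  # e.g. [{"generated_text": "..."}]
```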

The model behind the chatbot was developed by Open Assistant, a passion project organised by LAION, the German nonprofit recognised for creating the dataset used to train Stable Diffusion, a text-to-image AI model. Open Assistant’s efforts empower users to personalise and extend Hugging Face’s product to their needs, while keeping the model efficient enough to run on common hardware.

However, challenges do exist. HuggingChat, like its counterparts, is not immune to setbacks: depending on the questions it is asked, it may veer off course, a fact that Hugging Face acknowledges. Even so, as per the company, the chatbot shows that it is now possible to build an open-source alternative to ChatGPT.

For now, it runs on Open Assistant’s latest LLaMA-based model, but the long-term plan is to expose all good-quality chat models from the Hub. Because Meta’s LLaMA is bound by a restrictive licence, LLaMA-based models cannot be distributed directly. Instead, Open Assistant provides XOR weights for the OA models.
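
The XOR approach is worth unpacking: only the bitwise difference between the base LLaMA weights and the fine-tuned weights is published, which is unusable on its own, while anyone who already holds the original weights can recover the tuned model exactly. A minimal sketch of the idea (illustrative only, not Open Assistant’s actual tooling):

```python
import numpy as np

def xor_bytes(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """XOR two arrays viewed as raw bytes."""
    return np.bitwise_xor(a.view(np.uint8), b.view(np.uint8))

# Stand-ins for the real checkpoints, which are multi-gigabyte tensors.
base = np.random.rand(4, 4).astype(np.float16)    # original LLaMA weights
tuned = np.random.rand(4, 4).astype(np.float16)   # fine-tuned OA weights

delta = xor_bytes(base, tuned)                    # safe to distribute publicly

# Anyone who already has the base weights can recover the tuned model:
recovered = xor_bytes(base, delta).view(np.float16)
assert np.array_equal(recovered, tuned)
```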

Until a couple of weeks ago, large language models, Meta’s LLaMA included, sat in a legal grey area because many were being trained on ChatGPT output. Databricks figured out a way around this with Dolly 2.0. What differentiates Dolly 2.0 from other ‘open source’ models is that it is available for commercial purposes, with no need to pay for API access or share data with third parties.

But Meta’s licence has not stopped the developer community from taking full advantage of the technology. Developers have optimised the model to operate on even the most basic of devices, introduced additional functionality, and employed LLaMA in entirely new use cases.

Read more: 7 Ways Developers are Harnessing Meta’s LLaMA

OpenAI Spoils Users With More Data Control for ChatGPT 

Amid rising concerns around data misuse, OpenAI today announced a new feature that lets users turn off chat history in ChatGPT. The company also announced its plans to launch ChatGPT Business in the coming months.

OpenAI, in its blog post, said that conversations that are initiated when chat history is disabled won’t be used to train and improve their models, and will not appear in the history sidebar.

Further, it said that when chat history is disabled, OpenAI will retain new conversations for 30 days, review them only when needed to monitor for abuse, and then delete them permanently.

The controls, which are rolling out to all users starting today, can be found in ChatGPT’s settings and can be changed at any time. “We hope this provides an easier way to manage your data than our existing opt-out process,” the company added.

This update comes against the backdrop of recent incidents of employees leaking sensitive information to ChatGPT. For instance, a Samsung employee leaked critical information while using ChatGPT to correct errors in source code, prompting the company to caution its employees against using the chatbot.

Read: ChatGPT Has Its Eyes on Your Data

Besides Samsung, many organisations in the last few weeks have instructed employees not to use ChatGPT, citing risks, from intentional leaks to cyber breaches, that could stem from the use of such tools.

OpenAI also said it is working on a new ChatGPT Business subscription for professionals who need more control over their data, as well as enterprises looking to manage their end users. “ChatGPT Business will follow our API’s data usage policies,” said the company, meaning end users’ data will not be used to train its models by default.

Lastly, OpenAI has released a new export option in settings that lets users export their ChatGPT data and see what information ChatGPT stores. “You’ll receive a file with your conversations and all other relevant data in email.”

Beginning of the End of OpenAI

Google has been pushing hard to get ahead of the Microsoft-OpenAI partnership in AI by bringing DeepMind and Google Brain together as a single entity. Meanwhile, OpenAI, already in the lead, has taken a different approach to protecting its products and staying on top of the game: trademarking ‘GPT’.

Yes, OpenAI has filed for a trademark on ‘GPT’ with the United States Patent and Trademark Office (USPTO). The application was made in December 2022, but OpenAI recently petitioned the USPTO to hasten the process because of the many GPT-named apps springing up. The application is still pending, however, and might take another four to five months to be approved, as Jefferson Scher, a partner in the intellectual property group at Carr & Ferrell, told TechCrunch. To bridge the gap in the meantime, the company has released brand guidelines on its website to ensure that products built on its AI or GPT models do not pass themselves off as OpenAI’s own.

OpenAI’s concerns are understandable. Ever since the company made its ChatGPT API publicly available, most of the products built and launched on it have carried a GPT suffix. OpenAI would clearly not want its brand to be diluted.

Guidelines Claiming Ownership

OpenAI believes that it owns the ‘GPT’ technology. It might appear that way, since the company is the most notable provider of this LLM and generative AI technology and, some might say, probably the first to bring it to the public. Yet trademarking it might be a long stretch.

That is probably why the company has filed for a trademark, not a copyright. It simply does not want anyone else pretending to be OpenAI while releasing the technology; people may still use it, to some extent, just not under OpenAI’s name. That is also probably why OpenAI has released brand guidelines for appropriately attributing GPT-based technology to OpenAI.

The guidelines highlight the correct wording to use when building a product powered by OpenAI’s technology. For example, ‘MeowlyticsGPT’ should be renamed ‘Meowlytics powered by GPT-4’. Plugin names should specifically state that the product is a plugin for ChatGPT, rather than referencing OpenAI or merely citing compatibility with ChatGPT.

Moreover, apart from registering the trademark in the United States, OpenAI’s China-based subsidiary has also tried to trademark ‘GPT-4’ in that country. China has banned the technology for its citizens and is developing its own chatbots, so there is little reason to expect it to accept OpenAI’s technology even in the future. More likely, OpenAI simply did not want China-based AI companies using ‘GPT’ in their products either.

There’s a Catch

But, as many Reddit and Hacker News users have pointed out, there is a catch. OpenAI did not introduce the underlying technique to the world, and it is arguably too late to fence off the term even if it wanted to: the company coined ‘GPT’ in 2018 with GPT-1, but the words ‘generative pre-trained’ date back even earlier, in some research papers by Google.

Maybe OpenAI was not anticipating ChatGPT’s success back then. For now, the most plausible explanation for the trademark application is simply that the company does not want anyone cloning it. Or maybe not; perhaps the Sam Altman-led company has bigger plans. It had already acquired AI.com and redirected it to ChatGPT, a pretty strong statement.

Well, now that the AI arms race is in full glory, there might be something Google can do to catch up. Until now, Google has made strides by improving its technology, but it might have another trick up its sleeve. Since ‘GPT’ is more than a product name (it is the name of a technology), if the USPTO accepts or even seriously considers OpenAI’s application, it will move into an ‘opposition period’, where competitors and businesses such as Google or Meta can raise their concerns about the ‘GPT’ trademark. Google might have a chance there.

OpenAI may be getting a bit too possessive about its products. GPT stands for ‘generative pre-trained transformer’, and interestingly, the Transformer was introduced by Google in 2017 as a neural network architecture, one for which the company has also filed a patent.

Still, Google hasn’t enforced the patent, understanding that it would not really make a difference: the patent did not cover the part OpenAI used. That is a classic problem with patents; there is always a way to circumvent them by tweaking the technology. Ultimately, GPT is a technology, not a product. Google could not lock it up with a patent, nor can OpenAI. They can only try to put a trademark on it, not that doing so makes complete sense.

GPT is a decoder-only architecture and does not use an encoder, so Google’s patent on attention-based Transformers cannot simply be slapped on OpenAI. Moreover, Google has released many open-source repositories built on Transformers, and plenty of Google products in turn leverage technology from Microsoft and OpenAI patents. So no, Google would not make this move unless it wanted a legal battle with the current GPT leader. But then, if Google continues to fall behind in this AI race, who knows what guns it might pull to hold on.
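
For readers unfamiliar with the distinction, ‘decoder-only’ comes down to a causal attention mask: each token may attend only to tokens before it, so no separate encoder pass over the full input is needed. A minimal NumPy sketch of that mask:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular mask: position i may only attend to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_attention(scores: np.ndarray) -> np.ndarray:
    """Softmax over attention scores with future positions blocked."""
    mask = causal_mask(scores.shape[-1])
    masked = np.where(mask, scores, -np.inf)  # a token cannot see the future
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.random.rand(4, 4)        # toy attention logits for 4 tokens
print(masked_attention(scores))      # upper triangle is exactly zero
```

An encoder, as in Google’s original Transformer, attends in both directions; dropping it is precisely the tweak that takes GPT outside the patent’s scope.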

Dearly Loved, Dearly Hated

OpenAI has been criticised a number of times for not making its research publicly available. Researchers have also called for a pause on training models beyond GPT-4, a clear indication that OpenAI is either moving too fast or making strides the competition is afraid of.

When OpenAI released its APIs, allowed plugins and so on, people started using them, and eventually misusing them to some extent, creating ChatGPT clones and even adopting similar names. This clearly is not sitting well with the company. Fair enough. But cracking down might draw backlash from the many people who have built products on OpenAI’s APIs. Maybe the company just wants to cut that down now.

OpenAI just released new branding guidelines for GPT-based apps, which will kill 99% of the AI hustler apps. https://t.co/iOhRWJgEId pic.twitter.com/lwh0Vx4OaV

— Max Woolf (@minimaxir) April 24, 2023

A lot of product names carry the term ‘GPT’. If OpenAI’s trademark application is decided in its favour, all of these applications would have to change their names, ultimately looking less appealing to customers. It is hard to say whether OpenAI would actually want fewer people to know that its technology powers almost every product these days; the move could also lead to a drop in the number of people using its APIs.

Many call OpenAI’s move a bid to control AI: “What if the first person who coined the term AI put a trademark on it?” It does not seem plausible, though OpenAI might pull it off; as Scher pointed out, a lot of trademark outcomes simply depend on the fame of the company. “Just because IBM is called International Business Machines does not mean that no other company can use any of those terms in their business.”

MetaGPT — Realising the GPT-4 Dream

During the launch of GPT-4, OpenAI’s researchers showed that the LLM could create a website from scratch using just a sketch on paper as a reference. But even as users dream of building websites this way, OpenAI has still not released this capability of its multimodal LLM.

However, Pico Apps’ MetaGPT seems to have taken steps to realise this dream, albeit from a different angle. This GPT-4-powered application can create websites, apps, and more based only on natural language prompts. The service has been used to create dashboards, code-based visualisations, and even a marriage proposal!

What is MetaGPT?

Simply put, MetaGPT is a web application that allows users to build other web applications. The service first asks users what they want to create, and takes the prompt as a basic idea of what the website can be. MetaGPT then asks for a few additional details, such as the required inputs from the user.

The part that makes MetaGPT stand out from other no-code website-building platforms is its integration with ChatGPT. After taking the initial prompt and inputs, users can choose to integrate ChatGPT’s functionality into their application: prompts encased within curly brackets are passed on to ChatGPT, which then generates the required text. All of this happens without any code or API calls, relying on natural language prompts to serve the user’s purpose.
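
The curly-bracket mechanism is straightforward to picture. Here is a hedged sketch of the idea (not Pico Apps’ actual implementation), using the 2023-era OpenAI Python SDK:

```python
import re
import openai  # assumes the 2023-era openai SDK (v0.x) with OPENAI_API_KEY set

def fill_template(template: str) -> str:
    """Replace every {prompt} slot with a chat-model completion."""
    def run_prompt(match: re.Match) -> str:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": match.group(1)}],
        )
        return response["choices"][0]["message"]["content"]
    return re.sub(r"\{([^{}]+)\}", run_prompt, template)

# A hypothetical page template in the style the article describes:
page = fill_template(
    "<h1>Daily Op-Ed</h1><p>{Write a 200-word op-ed on open-source AI}</p>"
)
print(page)
```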

The service also allows users to iterate on their prompts, showing a visual representation of what the website looks like while GPT-4 codes it. The user can then iterate on the chatbot’s output, with the site recommending multiple rounds of iteration to reach a good application. These iterations can range from UI/UX changes to bug fixes to complete redesigns of the site.

We tried building a basic website that generates an op-ed given a topic and the desired word length. We prompted the application with the simple sentence “an application that can write an op-ed”. After this prompt, the web app clarified a few additional details, such as what the user input should be and what syntax should be used to pass on the work to ChatGPT.

This advancement in web applications builds upon the promise offered by GPT-4, which OpenAI is still in the process of deploying safely. However, it seems that the AI world is hungry for innovation, and it isn’t waiting for OpenAI to fulfil its dreams.

Taking over OpenAI

Shortly after the launch of GPT-4, OpenAI released ChatGPT plugins. In a move many called the ‘App Store moment’ for LLMs, the company not only released 12 plugins that extend the chatbot’s functionality, but also published a standard allowing developers to create more.

However, the expectations for this feature have slowly eroded, as plugins continue to be available for a small percentage of ChatGPT’s users. What’s more, the feature is only available to ChatGPT Plus users, with others needing to join a waitlist for access.

The developer community has found novel ways to deploy the GPT-4 API, picking up OpenAI’s slack. One only needs to look at the success of AutoGPT, an open-source project that lets GPT-4 function autonomously. Other similar projects include BabyAGI, a GPT-API-powered task management system, and AgentGPT, a platform for creating autonomous AI agents that automate repetitive tasks.

These open-source projects have captured lightning in a bottle, igniting the imaginations of many who wish to apply GPT-4 to new use cases. The hype OpenAI created around the launch of GPT-4 has not died away; it has shifted towards these community-driven projects, as seen in the runaway success of AutoGPT, MetaGPT, BabyAGI and others.

As OpenAI continues to delay GPT-4 features like multimodality and ChatGPT plugins, the community is working hard to deploy this powerful LLM in increasingly innovative ways. While some projects, like Forefront.ai or AnonChatGPT, are just wrappers around OpenAI’s APIs with added functionality, others, like MemeCam or Bing Chat, use the GPT-4 API to facilitate new use cases altogether. OpenAI now needs to move faster, or risk its dream being stolen by others on the bleeding edge.

Council Post: Exploring the Pros and Cons of Generative AI in Speech, Video, 3D and Beyond

It is safe to say that generative AI is the new Pandora’s box: once opened, there is no closing it. Its use is creeping into every occupation, from text to speech to video to code. We have moved on from asking whether it will replace jobs to asking how to use it skilfully and to our advantage.

When does the relationship between humans and machines change so much that we can no longer regard one as superior to the other in terms of creativity? That is the revolutionary question the concept of generative artificial intelligence (GAI) raises. The development of generative AI is primarily driven by three factors: better models, better and more data, and increased processing power.

Machine learning models have become more capable in recent years. Thanks to deep learning, computers can now pick out intricate patterns in data that were previously challenging for them to find, and this has had a significant impact on generative AI.

Our previous articles focused on the pros and cons of generative AI for text, code and images. This article dwells on the other domains, stated below.

1- Speech

Although fascinating applications of generative AI have surfaced recently, primarily in text-to-image creation using well-known models like Stable Diffusion and DALL-E, the technology’s commercial potential has largely gone untapped. And while both image and video have a place in business, speech is emerging as a strength.

Pros:

Generative AI models can produce more natural and realistic speech than traditional text-to-speech systems, improving the quality of automated voice assistants, audiobooks, and other applications that rely on synthesised speech. They can also create speech for people who have difficulty communicating verbally, such as those with speech disorders or hearing impairments, improving accessibility and making it easier for them to communicate with others. And because they can generate speech quickly and efficiently, they suit applications such as automated customer service, where speed and efficiency are important.

Cons:

According to Mehrabian’s rule, human communication may be divided into three components: words, tone of voice, and facial expression. Machine comprehension is text-based, and only recent advances in natural language processing (NLP) have made it possible to train AI models on sentiment, emotion, timbre, and other significant but not necessarily spoken components of language. And while the analysis and AI synthesis processes can take time, real-time speech-to-speech communication is often where it counts: voice conversion must occur, and translate correctly, the instant a person speaks. To realise its full potential, speech-to-speech technology must also accommodate a wide range of accents, languages, and dialects and be accessible to everyone. Because emerging technology solutions are not universally applicable, users will need to support this AI infrastructure with thousands of different architectures for a particular solution, and must plan for consistent model testing.

2- Video

Generative video models are machine learning algorithms that create fresh video data based on patterns and relationships discovered in training datasets. By learning the fundamental structure of the training video, these models can synthesise video data that closely resembles the original. They come in numerous forms, including GANs, VAEs, CGANs, and others, each adopting a different training strategy based on its particular architecture.

Pros:

Efficiency: Generative video models can be trained on enormous databases of videos and images to create new videos quickly, effectively and in real time, enabling the fast and inexpensive production of significant amounts of new video content.

Customization: Generative video models can create video content that is tailored to a number of requirements, including style, genre, and tone, with the appropriate modifications. This makes it possible to create video material more freely and adaptably.

Diversity: Generative video models can create a wide variety of video content, from films made out of text descriptions to creative scenes and characters, opening new avenues for the creation and distribution of video content.

Cons:

Generative AI can produce unexpected results that are not in line with the desired outcome, and this lack of control can be frustrating and time-consuming to manage. Because such models can only generate content based on the data they were trained on, they may produce repetitive content that lacks diversity and feels very mainstream to users. They can also perpetuate biases present in the training data, resulting in biased video content. And in the age of deepfakes, they can create videos depicting people or events that are not real, raising ethical concerns about the authenticity of video content.

3- 3D

According to recent data, the global market for generative design technology is anticipated to grow at a compound annual growth rate of 17.4% to reach $46.1 billion by 2025. Similarly, the global market for creative AI is anticipated to expand at 29.5% annually, reaching $3.3 billion by 2025.

Pros:

By automating numerous steps in the 3D modelling process, generative AI enables designers to produce more intricate and detailed models in less time, giving users more realistic models and more immersive experiences. It can assist designers in exploring fresh design ideas and developing modifications of current models, resulting in more imaginative and cutting-edge designs. And by automating several processes involved in 3D modelling, generative AI can lower the cost of creating high-quality 3D models.

Cons:

The high computational resource requirements of generative AI approaches make them unfit for various applications. Models can occasionally create unexpected or challenging results, giving designers little control over the output and forcing them to manually alter or refine it. Even though generative AI models are often claimed to be accurate, this is not always the case, especially when working with large or highly detailed models. And some designers may find the approach challenging to embrace, because using generative AI in 3D modelling requires some level of competence in both domains.

Generative AI is booming, and we should not be shocked. Many technologists view AI as the next frontier, so it is important to follow its development. The potential applications of AI are limitless, and in the years to come we might witness the emergence of brand-new industries.

This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives in the Data Science and Analytics industry. To check if you are eligible for a membership, please fill out the form here.

NVIDIA Open-Sources Guardrails for Hallucinating AI Chatbots

The great powers of generative AI carry great risks along with them. AI chatbots hallucinate, often veer off topic, and tend to trawl through user data. The danger is that companies rushing to integrate these tools into their systems can potentially overlook these massive risks. NVIDIA may have a solution.

Yesterday, the Jensen Huang-led company released a new open-source framework called NeMo Guardrails to help resolve this problem. The guardrails ensure that organisations building and deploying LLMs for a range of different functions can keep their applications on track.

The project offers three types of guardrails: topical controls, which prevent applications from responding to sensitive questions; safety controls, which ensure responses draw on accurate information from credible sources; and security controls, which restrict applications from connecting to vulnerable external third-party services.
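
To make a topical rail concrete, here is a hedged sketch in Python, modelled on the examples in the NeMo Guardrails README at release; the exact API surface (RailsConfig.from_content, LLMRails.generate) and the model configuration may differ across versions:

```python
from nemoguardrails import LLMRails, RailsConfig

# Colang snippet defining a topical rail: recognise questions about
# politics and deflect them instead of letting the LLM answer freely.
colang_content = """
define user ask politics
  "what do you think about the government?"
  "which party should I vote for?"

define bot deflect politics
  "I'm a demo assistant, so I'd rather not discuss politics."

define flow politics
  user ask politics
  bot deflect politics
"""

yaml_content = """
models:
  - type: main
    engine: openai
    model: text-davinci-003
"""

config = RailsConfig.from_content(colang_content=colang_content,
                                  yaml_content=yaml_content)
rails = LLMRails(config)

reply = rails.generate(messages=[
    {"role": "user", "content": "Which party should I vote for?"}
])
print(reply["content"])  # the rail fires and the bot deflects
```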

Jonathan Cohen, VP of applied research at the company, explained how the guardrails came to be. “While we have been working on the Guardrails system for years, a year ago we found this system would work well with OpenAI’s GPT models,” he stated. The blog post on the guardrails said they work on top of all major LLMs, such as GPT-3 and Google’s T5, and even AI image generation models like Stable Diffusion 1.5 and Imagen.

Being open-source, NeMo Guardrails can work with all the tools used by enterprise application developers. For instance, it can run on top of LangChain, the open-source toolkit developers have been using to build third-party LLM applications.

Harrison Chase, the creator of the LangChain toolkit stated, “Users can easily add NeMo Guardrails to LangChain workflows to quickly put safe boundaries around their AI-powered apps.”

Interestingly, the guardrails use an LLM to check the LLM itself, much like the SelfCheckGPT technique. Cohen admitted that while running the guardrails is “relatively inexpensive” in terms of compute, there is still room to optimise the controls.

The guardrails are built on Colang, a modelling language that provides a readable and extensible interface for users to control the behaviour of their AI bots.

NVIDIA has incorporated the guardrails into the NVIDIA NeMo framework, which is already open-sourced on GitHub. In addition, NeMo Guardrails will be included in the AI Foundations service, which offers several pre-trained models and frameworks for companies to rent.

Enterprises Die for Domain Expertise Over New Technologies

Foundational models such as BERT and the GPT-n series have changed the world overnight, at least in perception. Not quite so for enterprises, for whom finding the right technology partner is more important than the technology itself. Today, customers no longer consider just one application on a single technology; they evaluate multiple technologies and integrate them to facilitate expansion.

There may be multiple APIs currently available for integrating applications, but the use case you are trying to solve depends on the domain expertise of the provider. With so many API integrations on the market, conducting a proof of concept (POC) and testing the technologies is crucial to ensure they are effective for decision-making and beneficial for the business.

Domain expertise is important to build a complete ecosystem that can scale. This can help businesses leverage relevant knowledge and datasets to develop custom solutions. This is why enterprises look for enablers that can bring in the domain expertise for particular use cases.

Data challenges everywhere

ESRI, a geographic information system (GIS) software company, for instance, has data from all over India on, say, what is being cultivated where. Deepak Kolekar, AVP and head of IT at Godrej Agrovet, explains that commodities such as maize or soybean extraction make up 60% of the company’s raw material cost. If Godrej Agrovet can leverage ESRI’s data to source material from specific locations and bring down costs, it will ultimately benefit consumers and gain a competitive advantage. This is just one example of the multiple possibilities that can be explored.

Godrej’s data journey began almost 13 to 14 years ago and, Kolekar maintains, now is the time to leverage the data accumulated over the years. The company is exploring AI/ML technologies for exception reporting rather than daily routine use. It has also developed planning solutions, which it seeks to enhance with image-based processing for its oil palm plantation business. Moreover, it has run a pilot project to predict which customers are likely to default on payments, which can aid better decision-making.

The pilot was primarily launched in the B2B segment of their business, where they were able to achieve up to a 70% prediction rate and provide timely alerts to both the finance and operations teams. In order to validate the accuracy of the model, they instructed the finance teams to simulate the predictions using traditional methods in Excel. This exercise gave them confidence that such technologies are certainly capable of providing valuable assistance.

On the other hand, for a company like boAt, the journey with data is still in its nascent stages. “We have limited historical data, which means we are not yet able to utilise predictive or prescriptive insights. Our primary use case would have been demand forecasting, but given that 80-85% of our sales come from marketplaces like Amazon and Flipkart which are extremely fluctuating, demand sensing is not effective in that context,” explained Shashwat Singh, CIO at boAt.

At present, the company is concentrating on big data challenges such as identifying customer needs to aid in the New Product Introduction (NPI) process. This task involves analysing a significant amount of text and does not necessitate an extensive data history.

Role of tech enablers

One of the challenges companies encounter today is how to utilise data effectively for their business needs. According to a global survey conducted by Oracle and Seth Stephens-Davidowitz, 91% of respondents in India reported a ten-fold increase in the number of decisions they make every day over the past three years. As individuals attempt to navigate this increased decision-making, 90% reported being inundated with more data from more sources than ever before.

“Some interesting findings we came across was that respondents who wanted technological assistance also said that the technology should know its workflow and what it is trying to accomplish,” Joey Fitts, vice president, Analytics Product Strategy, Oracle told ET.

The current major players in the ERP solutions market are SAP, Microsoft Dynamics, and Oracle Fusion. On boAt’s decision to choose SAP, Singh cited three reasons. First, the feature-functionality gap between SAP and the other competitors they evaluated was the smallest. Second, SAP’s strong partner ecosystem was a key driver. Third, SAP’s RISE programme was attractive because it enables customers to focus on their core business while outsourcing the overheads of managing infrastructure and uptime.

Amazon Web Services’ New-found Love for Open Source

Unbeknownst to many, the world’s leading enterprise cloud service providers, Google Cloud, Microsoft Azure, and Amazon Web Services, all use a host of open-source software in their tech stacks. From Kubernetes to Linux to PostgreSQL, open-source software is ever-present in cloud services, allowing CSPs to keep the wheels running at no additional cost.

In return for the costs they save by using open-source software, many of these companies contribute back. Google has been vocal about its love of open-source software and is one of the most active contributors to open source. Now, it seems Amazon is catching up in giving back to the community, finally righting the scales on open-source responsibility.

Shift from customer-focused service

Amazon is well known for its ‘customer obsession’ strategy and has often made it the first priority when providing services. However, public discourse against its free-riding on open-source software has grown in the past few years, with open-source advocates vocal about their disdain for AWS’ ‘strip-mining’ of open-source technology.

Strip-mining denotes a strategy of ‘intercept and monetise’, where Amazon takes an open-source project, ‘steals’ the code and creates a proprietary paid service based on it. A prime example is the Elasticsearch saga, which began in 2015 when AWS took Elasticsearch, the open-source search engine created by a company named Elastic, and built a paid service on top of it.

After a protracted legal tussle, Elastic changed its licensing from the permissive Apache License 2.0 to its own Elastic License. Even this did not stop Amazon: it simply forked the repository, combined it with the Kibana visualisation tool, and released the result as an open-source project called OpenSearch.

This is just one of the many cases where Amazon has taken open-source software and repurposed it for its own use. However, it seems that this strategy is now changing, as AWS has climbed to the top 5 in open-source contributions over the last year.

While Google and Microsoft dominate the leaderboard in first and second place respectively, the third and fourth positions are held by Red Hat and Intel, both open-source giants in their own right. Given Amazon’s historically sluggish attitude towards supporting open source, it seems it shouldn’t be in the top five, yet it is.

AWS open-source contributors’ growth. Source: OSCI

The data provided by the Open Source Contributors Index makes clear when AWS pivoted to its open-source contribution strategy. Up to 2019, its number of open-source contributors consistently stayed below 250, but it spiked to 852 in March 2021. Today, AWS has 14.9% more contributors than it did in 2021 (around 2,700) and is the only company among the leaders to have seen positive contributor growth over this period.

Top 4 open source contributors’ decline. Source: OSCI

Microsoft, Google, Red Hat and Intel have all cut down the number of their open-source contributors over the past few years. So the question remains: what drove Amazon to contribute so heavily to open source?

Pivot to open-source-as-a-service

While open-source projects offer ease of access and no licensing costs, keeping them running in an enterprise tech stack often consumes significant resources, because there is no ever-present support to take care of the obscure bugs and issues that might arise. Some companies, like Red Hat, saw this as a market in need of a business model, and began offering open-source-as-a-service.

In this business model, the company takes over the support and documentation responsibilities while offering open-source software. AWS now seems to have seen the benefits of this model too, if not for profitability then for customer focus. Vertically integrating open-source software into its tech stack offers multiple advantages, chief among them ease of use.

By contributing to and maintaining open source repositories, Amazon does not need to rely on community contributors to fix bugs, nor do they need to package the software into a new web service. Matt Asay, vice president of developer relations at MongoDB, who has had prior experience working at AWS, stated, “The company has always been great at running open source projects as services for its customers. As I found while working there, most customers just want something that works. But getting it to “just work” in the way customers want requires that AWS get its hands dirty in the development of the project.”

Indeed, many of the company’s new offerings leverage the power of open source in a sustainable manner. One only needs to look at the new Amazon Bedrock service, an effort to bring foundational models into the hands of AWS customers. Some of the models offered on Bedrock are open source, such as Stability AI’s StableLM and Stable Diffusion.

Whether it is for the sake of customer satisfaction or removing technical debt, Amazon is now catching up to its competitors in terms of giving back to the community. Moreover, this new strategy allows for the creation of a new business model for AWS, moving past wrapping open-source projects in proprietary skins.

AI Price Decline: How to Capitalize, Challenges & Key Considerations

AI has been gathering the attention of organizations globally due to its ability to automate repetitive tasks and enhance decision-making capabilities. Earlier, AI was available only to big corporations and universities for conducting academic research or building high-cost proprietary tools. But in recent years, companies have been experiencing a significant AI price decline.

AI price decline refers to a reduction in the cost of hardware, software, and services related to AI. The primary driver of this decline is the decreasing cost of computational resources. For instance, in the 1950s, computational power cost around $200,000 a month; that cost has dropped significantly in recent years thanks to modern advances like cloud computing.

Hence, business leaders can effectively capitalize on declining AI costs to build valuable products. However, the AI domain presents some major challenges which the business leaders should carefully consider before investing in AI. Let’s explore this idea in detail below.

Major Challenges Faced While Investing In AI

Business leaders mainly face two major challenges while executing their AI initiatives: getting their hands on relevant datasets, and keeping AI’s computational expenses within budget. Let’s look at them one by one.

1. Data Quality

AI needs high-quality data. Lots of it. But it is not easy to collect high-value data since more than 80% of the data in enterprises is unstructured.

The primary step in the AI life cycle is to identify and collect raw data sources, transform them into the required high-quality format, execute analytics, and build robust models.

Hence, for business leaders, it is necessary to have a comprehensive data strategy that can leverage this data to integrate AI into their business. If relevant data is not available, then investing in an AI venture is not a good idea.

2. Computationally Expensive

The computational capacity required to run AI can be an entry barrier for small organizations. AI needs significant compute depending on the complexity of the models, which leads to high costs. For instance, it reportedly costs OpenAI about $3 million a month to run ChatGPT.

Hence, to fulfill these computational needs, specialized and expensive hardware such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) is required to optimize AI operations.

On the software front, researchers are working on reducing the AI model size and memory footprint, which will significantly decrease the training time and eventually save computational costs.
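
To see why shrinking model size and memory footprint matters, a quick back-of-the-envelope calculation helps; the 7-billion-parameter figure below is just an illustrative assumption:

```python
# Approximate weight-storage cost of a hypothetical 7B-parameter model at
# different numeric precisions; lower precision directly cuts memory and,
# with it, the hardware needed to serve the model.
PARAMS = 7e9  # assumed model size, for illustration only

for name, bytes_per_param in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{name:>8}: {PARAMS * bytes_per_param / 1e9:6.1f} GB of weights")
```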

Capitalizing on AI Price Decline

In recent years, the AI domain has progressed immensely across all dimensions: software, hardware, research, and investment. As a result, business leaders have overcome or minimized many AI-related challenges.

Accelerated Development of AI Applications

Today, most AI tools offer free variants. Their paid subscription models are also reasonable. Businesses and individuals are using these applications to increase efficiency, improve decision-making, automate repetitive tasks, and enhance customer experience.

For instance, generative AI tools like Bard, ChatGPT, or GPT-4 can assist users in generating new ideas and writing various types of content, such as product summaries, marketing copy, and blog posts. Over 300 applications have been built on top of the GPT-3 API.

There are various examples in other domains as well. Transfer learning techniques, for example, are being used in medical image classification to improve application accuracy, while Salesforce Einstein is a generative AI CRM (customer relationship management) tool that can analyze data, predict customer behavior, and deliver personalized experiences.
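
The transfer-learning pattern mentioned here is easy to sketch. Below is a minimal, hedged example in PyTorch; the two-class medical imaging head is hypothetical, and real medical applications need far more care around data and validation:

```python
import torch.nn as nn
from torchvision import models

# Start from a backbone pre-trained on ImageNet instead of training from scratch.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor...
for param in model.parameters():
    param.requires_grad = False

# ...and attach a fresh classification head for the new (hypothetical) task,
# e.g. "normal" vs "abnormal" scans. Only this layer is trained.
model.fc = nn.Linear(model.fc.in_features, 2)
```

Because only the small head is trained, the approach needs far less labeled data and compute, which is exactly why it suits domains like medical imaging where data is scarce.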

Greater Investment in AI

The decline in AI prices has led to mass technology adoption, making AI a lucrative investment opportunity. In 2022, the AI market was valued at $387.5 billion; it is expected to reach a whopping $1,395 billion by 2029, growing at a CAGR of 20.1%.

AI products are being used to make new advancements in major industries, like healthcare, education, finance, etc. All the big tech giants and startups are investing heavily in AI research and development.

Key Considerations For Business Leaders Before Capitalizing on AI Price Decline

Understand Business Goals and Evaluate How AI Fits In

Before capitalizing on the AI price decline, it is essential to identify your business strategy and goals. Unrealistic expectations are one of the leading causes of AI project failure; reports suggest that 87% of AI initiatives never make it to production. Hence, assessing your data strategy, and how AI can be integrated into the business to enhance overall efficiency, is an important step before investing in AI.

Build a High-Quality AI Team & Equip Them With the Right Tools

Before investing in AI, it is vital to identify the hardware and software resources your AI team requires. Equip the team with the right datasets, which they can leverage to build better products, and provide the necessary training to ensure the success of your AI initiatives. Research suggests that a lack of AI expertise among employees and the non-availability of high-quality data are both major reasons AI ventures fail.

Estimate AI Cost & Return On Investment (ROI)

Many AI projects fail because they are unable to deliver the promised outcome or returns. In 2012, IBM’s AI software Watson for Oncology received funding worth $62 million. It was designed to diagnose and suggest treatments for cancer patients based on the patient’s personal data, medical history, and medical literature.

The project was criticized for its accuracy and reliability, and the software was costly to set up in hospitals. Ultimately, IBM discontinued sales of Watson for Oncology in 2021. Hence, it is essential to evaluate the cost of acquiring or building AI technologies before investing in them.

Evaluate AI Regulations

Business leaders must ensure that their AI initiatives comply with relevant regulations. AI regulation has recently become the focus of global watchdogs, with rules aiming to address concerns around data bias, explainability, data privacy, and security.

For instance, GDPR (General Data Protection Regulation) is one such EU regulation that came into effect in 2018. It regulates organizational policies on personal data collection, its processing, and usage in AI systems.

Moreover, in November 2021, all 193 member countries in UNESCO agreed on adopting common values and principles of AI ethics to ensure risk-free AI development.

The Right Time To Invest In AI Is NOW!

Global tech giants are investing heavily in AI, which tells us that AI has a bright future. At the start of 2023, for instance, Microsoft invested $10 billion in AI, while Google put $400 million into its AI ventures.

For businesses to stay competitive, it is important to capitalize on AI’s declining prices. At the same time, it is important for them to address and overcome the challenges that AI presents to build robust systems.

How AI is Creating Explosive Demand for Training Data

Artificial Intelligence (AI) has rapidly evolved in recent years, leading to groundbreaking innovations and transforming various industries. One crucial factor driving this progress is the availability and quality of training data. As AI models continue to grow in size and complexity, the demand for training data is skyrocketing.

The Growing Importance of Training Data

At the heart of AI lies machine learning, where models learn to recognize patterns and make predictions based on the data they are fed. In order to improve their accuracy, these models require large amounts of high-quality training data. The more data that AI models have at their disposal, the better they can perform in various tasks, from language translation to image recognition.
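
As a toy demonstration of this relationship, the sketch below trains the same classifier on increasingly large slices of a synthetic dataset; the data and exact numbers are illustrative only, but the trend of accuracy improving with more data is the point:

```python
# More training data -> better generalisation, on a synthetic task.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in [50, 500, 5000, 15000]:
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(f"{n:>6} training samples -> test accuracy "
          f"{model.score(X_test, y_test):.3f}")
```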

As AI models continue to grow in size, the demand for training data has increased exponentially. This growth has led to a surge in interest in data collection, annotation, and management. Companies that can provide AI developers with access to vast, high-quality datasets will play a vital role in shaping the future of AI.

The State of AI Models Today

One notable example of this trend is GPT-3, the state-of-the-art model released in 2020. According to ARK Invest’s “Big Ideas 2023” report, the cost to train GPT-3 was a staggering $4.6 million. GPT-3 consists of 175 billion parameters, which are essentially the weights and biases adjusted during the learning process to minimize error. The more parameters a model has, the more complex it is and the better it can potentially perform. However, with increased complexity comes a higher demand for quality training data.

GPT-3’s performance, and now GPT-4’s, has been impressive, demonstrating a remarkable ability to generate human-like text and solve a wide range of natural language processing tasks. This success has further fueled the development of even larger and more sophisticated AI models, which in turn will require even larger datasets for training.

The Future of AI and the Need for Training Data

Looking ahead, ARK Invest predicts that by 2030, it will be possible to train an AI model with 57 times more parameters and 720 times more tokens than GPT-3 at a much lower cost. The report estimates that the cost of training such an AI model would drop from $17 billion today to just $600,000 by 2030.

For perspective, the current size of Wikipedia’s content is approximately 4.2 billion words, or roughly 5.6 billion tokens. The report suggests that by 2030, training a model on an astounding 162 trillion words (or 216 trillion tokens) should be achievable. This increase in AI model size and complexity will undoubtedly lead to an even greater demand for high-quality training data.
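
The report’s headline figures are easy to sanity-check against each other. A quick back-of-the-envelope pass, assuming the widely reported figure of roughly 300 billion training tokens for GPT-3:

```python
gpt3_params = 175e9
gpt3_tokens = 300e9              # assumed: GPT-3's widely reported token count

print(57 * gpt3_params)          # ~9.98e12: roughly 10 trillion parameters
print(720 * gpt3_tokens)         # 2.16e14: the 216 trillion tokens cited above

# The words-to-tokens ratio is consistent across both estimates (~1.33):
print(5.6e9 / 4.2e9, 216e12 / 162e12)
```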

In a world where compute costs are decreasing, data will become the primary constraint for AI development. The need for diverse, accurate, and vast datasets will continue to grow as AI models become more sophisticated. Companies and organizations that can supply and manage these massive datasets will be at the forefront of AI advancements.

The Role of Data in AI Advancements

To ensure the continued growth of AI, it is essential to invest in the collection and curation of high-quality training data. This includes:

  1. Diversifying data sources: Collecting data from various sources helps to ensure that AI models are trained on a diverse and representative sample, reducing biases and improving their overall performance.
  2. Ensuring data quality: The quality of training data is crucial for the accuracy and effectiveness of AI models. Data cleansing, annotation, and validation should be prioritized to ensure the highest quality datasets. Additionally, techniques like active learning and transfer learning can help maximize the value of available training data.
  3. Expanding data partnerships: Collaborating with other companies, research institutions, and governments can help to pool resources and share valuable data, further enhancing AI model training. Public and private sector partnerships can play a key role in driving AI advancements by fostering data sharing and cooperation.
  4. Addressing data privacy concerns: As the demand for training data grows, it’s essential to address privacy concerns and ensure that data collection and processing follow ethical guidelines and comply with data protection regulations. Implementing techniques like differential privacy can help protect individual privacy while still providing useful data for AI training; a minimal sketch of this idea follows the list.
  5. Encouraging open data initiatives: Open data initiatives, where organizations share datasets for public use, can help democratize access to training data and spur innovation across the AI ecosystem. Governments, academic institutions, and private companies can all contribute to the growth of AI by promoting the use of open data.
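
Below is the minimal sketch of differential privacy promised in point 4, using the classic Laplace mechanism; it is illustrative only, and production systems must also track a privacy budget across many queries:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(values, epsilon: float) -> float:
    """Differentially private count via the Laplace mechanism.

    A counting query changes by at most 1 when one person's record is
    added or removed, so its sensitivity is 1 and the noise scale is
    sensitivity / epsilon.
    """
    return len(values) + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 29, 41, 52, 38, 45]          # toy "training data" records
print(dp_count(ages, epsilon=0.5))       # noisy, but close to the true 6
```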

Real-World Implications of the Growing Demand for Training Data

The explosive demand for training data has far-reaching implications for various industries and sectors. Here are some examples of how this demand could reshape the AI landscape:

  1. AI-driven data marketplace: As data becomes an increasingly valuable resource, a thriving marketplace for AI training data is likely to emerge. Companies that can curate, annotate, and manage high-quality datasets will be in high demand, creating new business opportunities and fostering competition in the data market.
  2. Growth of data annotation services: The increasing need for annotated data will drive the growth of data annotation services, with companies specializing in tasks like image labeling, text annotation, and audio transcription. These services will play a crucial role in ensuring that AI models have access to accurate and well-structured training data.
  3. Increased investment in data infrastructure: As the demand for training data grows, so too will the need for robust data infrastructure. Investments in data storage, processing, and management technologies will be essential to support the vast amounts of data required by next-generation AI models.
  4. New job opportunities: The demand for training data will create new job opportunities in data collection, annotation, and management. Data science and AI-related skills will be increasingly valuable in the job market, with data engineers, annotators, and AI trainers playing a critical role in the development of advanced AI systems.

As AI continues to evolve and expand its capabilities, the demand for quality training data will grow exponentially. The findings from ARK Invest’s report highlight the importance of investing in data infrastructure to ensure that future AI models can reach their full potential. By focusing on diversifying data sources, ensuring data quality, and expanding data partnerships, we can pave the way for the next generation of AI advancements and unlock new possibilities across various industries. The future of AI will be shaped not only by the algorithms and models we create but also by the data that fuels them.