Another player heats up generative AI race as China introduces interim laws


JD.com is heating up the artificial intelligence (AI) race in China with the release of its large language model, even as regulators roll out interim rules to manage generative AI services.

China's second-largest online shopping platform, JD.com, said its ChatRhino (or Yanxi in Chinese) has been customized to support several verticals, such as logistics, retail, healthcare, and finance. The large language model comprises 70% general data and 30% "native intelligent" supply chain data, said the e-commerce player, which has a logistics arm as well as a healthcare business unit.

Also: Is Temu legit? What to know about this shopping app before you place your order

ChatRhino boasts a base of 100 billion parameters, up from the 10 billion parameters of its previous model, Vega, released early last year. Vega had led the General Language Understanding Evaluation (GLUE) leaderboard, outpacing models from Microsoft and Facebook, said JD.com in a statement Thursday.

GLUE measures and ranks natural language processing models based on nine tasks, spanning a range of "dataset sizes, text genres, and degrees of difficulty." OpenAI's GPT-4 reportedly has more than 1 trillion parameters spread across eight models, though the company has not confirmed this. Its predecessor GPT-3 has 175 billion parameters, while GPT-2 runs on 1.5 billion.

ChatRhino offers more than 100 training and inference optimization tools that JD.com says support domain-specific application development, letting clients more quickly build their own specialized models. The vendor claims a generative AI model for the healthcare industry, for instance, can be built in "minutes" with two algorithm engineers, compared to the traditional method that typically requires a week and at least 10 scientists.

JD Health's own large language model, Jingyi Qianxun, is built on ChatRhino and has been trained on medical scenarios to automatically deploy services, including telemedicine, according to JD.com.

Also: Google tests its new AI medical chatbot at Mayo Clinic

E-commerce merchants also can tap ChatRhino to create a range of visuals, such as marketing posters and product images, from a single product image. The AI model can cut the production cost of each visual asset by 90%, reducing the time needed to complete the task from a week to half a day, JD.com said.

Interim laws to guide generative AI rollouts

The launch of ChatRhino comes the same week China introduced interim regulations to manage generative AI services in the country.

Kicking in from Aug. 15, the new rules are necessary to ensure the healthy development of the technology and to safeguard both national security and public interests, the Chinese government said.

Also: Generative AI is coming for your job. Here are 4 reasons to be excited

In a joint statement issued by various agencies, including the Cyberspace Administration of China (CAC) and the Ministry of Science and Technology, the government noted that while generative AI had created new economic and social development opportunities, it also brought along challenges, such as fake news and risks to data privacy and safety.

The interim legislation outlines various measures that aim to facilitate the sound development of the technology, while protecting national and public interests and legal rights of citizens and businesses, the statement noted.

Generative AI developers, for instance, will have to ensure their pre-training and model optimization processes are carried out in compliance with the law. These include using data from legitimate sources that adhere to intellectual property rights. Should personal data be used, the individual's consent must be obtained, or its use must otherwise comply with existing regulations.

Also: Leadership alert: The dust will never settle and generative AI can help

Measures also have to be taken to improve the quality of training data, including its accuracy, objectivity, and diversity.

Under the interim laws, generative AI service providers assume legal responsibility for the information generated and its security. They will need to sign service-level agreements with users of their service, thereby clarifying each party's rights and obligations.

When illegal content is uncovered, the service provider must take various measures, such as preventing its transmission and rectifying its use in model training. The relevant authority must also be notified.

In addition, service providers would have to take necessary measures should a user engage in illegal activities using the generative AI service. These include restricting functions, suspending or terminating the service, maintaining relevant records, and reporting to the relevant authority.

Service providers that breach the new laws will face penalties laid out in China's existing legislation, including the Network Security Law, Data Security Law, and Personal Information Protection Law. In instances where there are no applicable provisions, a warning will be issued alongside an order to make corrections within a specified period. Failure to comply with such orders may result in a suspension of services.

Also: 6 harmful ways ChatGPT can be used by bad actors, according to a new study

China in April released a draft preview of the legislation, saying the development of generative AI technologies such as ChatGPT could lead to abuse if left unregulated. Separate legislation that came into effect in January laid out ground rules to prevent "deep synthesis" technology, including deepfakes and virtual reality, from being abused. Anyone using these services must label the images accordingly and refrain from tapping the technology for activities that breach local regulations.

In May, the Chinese government unveiled plans to build AI industrial hubs and tech platforms across the country to support research and development work. To date, development plans have been launched for 18 national AI pilot areas and 32 innovation platforms, including in Beijing and Tianjin.

Apart from JD.com, local players such as Tencent and Alibaba also have announced efforts to offer or integrate generative AI into their products. Alibaba Cloud in April unveiled its large language AI platform, called Tongyi Qianwen, which is currently available to customers in China for beta testing and as an API to developers. The Chinese cloud vendor also introduced a partnership program to drive the development of AI applications for verticals, including finance and petrochemicals.


AI Risks & Extinction: The Precarious Future of Humanity Amidst an AI Revolution


In an era marked by technological advancements, Artificial Intelligence (AI) has been a transformative force. From revolutionizing industries to enhancing everyday life, AI has shown remarkable potential. However, experts are raising alarm bells about the technology's inherent risks.

The AI risk statement, a collective warning from industry leaders like Elon Musk, Steve Wozniak, Stuart Russell, and many more, sheds light on several concerning aspects. For instance, the weaponization of AI, the proliferation of AI-generated misinformation, the concentration of advanced AI capabilities in the hands of few, and the looming threat of enfeeblement are some serious AI risks that humanity cannot ignore.

Let’s discuss these AI risks in detail.

The Weaponization of AI: Threat to Humanity’s Survival

Technology is a crucial part of modern warfare, and AI systems can be weaponized with alarming ease, posing a serious danger to humanity. For instance:

1. Drug-Discovery Tools Turned Chemical Weapons

AI-driven drug discovery facilitates the development of new treatments and therapies. But the ease with which AI algorithms can be repurposed magnifies the risk of catastrophe.

For example, a drug-developing AI system suggested 40,000 potentially lethal chemical compounds in less than six hours, some of which resemble VX, one of the strongest nerve agents ever created. This unnerving possibility unveils a dangerous intersection of cutting-edge science and malicious intent.

2. Fully Autonomous Weapons

The development of fully autonomous weapons fueled by AI presents a menacing prospect. These weapons, capable of independently selecting and engaging targets, raise severe ethical and humanitarian concerns.

The lack of human control and oversight heightens the risks of unintended casualties, escalation of conflicts, and the erosion of accountability. International efforts to regulate and prohibit such weapons are crucial to prevent AI’s potentially devastating consequences.

Misinformation Tsunami: Undermining Societal Stability


The proliferation of AI-generated misinformation has become a ticking time bomb, threatening the fabric of our society. This phenomenon poses a significant challenge to public discourse, trust, and the very foundations of our democratic systems.

1. Fake Information/News

AI systems can produce convincing and tailored falsehoods at an unprecedented scale. Deepfakes, AI-generated fake videos, have emerged as a prominent example, capable of spreading misinformation, defaming individuals, and inciting unrest.

To address this growing threat, a comprehensive approach is required, including developing sophisticated detection tools, increased media literacy, and responsible AI usage guidelines.

2. Collective Decision-Making Under Siege

By infiltrating public discourse, AI-generated falsehoods sway public opinion, manipulate election outcomes, and hinder informed decision-making.

According to Eric Schmidt, former CEO of Google and co-founder of Schmidt Futures, “one of the largest short-term hazards of AI is the misinformation surrounding the 2024 election.”

The erosion of trust in traditional information sources further exacerbates this problem as the line between truth and misinformation becomes increasingly blurred. To combat this threat, fostering critical thinking skills and media literacy is paramount.

The Concentration of AI Power: A Dangerous Imbalance

As AI technologies advance rapidly, addressing the concentration of power becomes paramount in ensuring equitable and responsible deployment.

1. Fewer Hands, Greater Control: The Perils of Concentrated AI Power

Traditionally, big tech companies have held the reins of AI development and deployment, wielding significant influence over the direction and impact of these technologies.

However, the landscape is shifting, with smaller AI labs and startups gaining prominence and securing funding. Hence, exploring this evolving landscape and understanding the benefits of the diverse distribution of AI power is crucial.

2. Regimes' Authoritarian Ambitions: Pervasive Surveillance & Censorship

Authoritarian regimes have been leveraging AI for pervasive surveillance through techniques like facial recognition, enabling mass monitoring and tracking of individuals.

Additionally, AI has been employed for censorship purposes, with politicized monitoring and content filtering to control and restrict the flow of information and suppress dissenting voices.

From Wall-E to Enfeeblement: Humanity's Reliance on AI


The concept of enfeeblement, reminiscent of the film “Wall-E,” highlights the potential dangers of excessive human dependence on AI. As AI technologies integrate into our daily lives, humans risk becoming overly reliant on these systems for essential tasks and decision-making. Exploring the implications of this growing dependence is essential to navigating a future where humans and AI coexist.

The Dystopian Future of Human Dependence

Imagine a future where AI becomes so deeply ingrained in our lives that humans rely on it for their most basic needs. This dystopian scenario raises concerns about the erosion of human self-sufficiency, loss of critical skills, and the potential disruption to societal structures. Hence, governments need to provide a framework to harness the benefits of AI while preserving human independence and resilience.

Charting a Path Forward: Mitigating the Threats

In this rapidly advancing digital age, establishing regulatory frameworks for AI development and deployment is paramount.

1. Safeguarding Humanity by Regulating AI

Balancing the drive for innovation with safety is crucial to ensure responsible development and use of AI technologies. Governments need to develop regulatory rules and put them into effect to address the possible AI risks and their societal effects.

2. Ethical Considerations & Responsible AI Development

The rise of AI brings forth profound ethical implications that demand responsible AI practices.

  • Transparency, fairness, and accountability must be core principles guiding AI development and deployment.
  • AI systems should be designed to align with human values and rights, promoting inclusivity and avoiding bias and discrimination.
  • Ethical considerations should be an integral part of the AI development life cycle.

3. Empowering the Public with Education as Defense

AI literacy among individuals is crucial to foster a society that can navigate the complexities of AI technologies. Educating the public about the responsible use of AI enables individuals to make informed decisions and participate in shaping AI's development and deployment.

4. Collaborative Solutions by Uniting Experts and Stakeholders

Addressing the challenges posed by AI requires collaboration among AI experts, policymakers, and industry leaders. By uniting their expertise and perspectives, interdisciplinary research and cooperation can drive the development of effective solutions.

For more information regarding AI news and interviews visit unite.ai.

FTC investigates ChatGPT-maker OpenAI for possible harm to users. Here’s what you need to know


OpenAI's ChatGPT has become incredibly popular since its launch, making it the fastest-growing application of all time. With this many users, questions have arisen about how user data is being collected, utilized, and protected — and now the Federal Trade Commission (FTC) wants answers.

In a 20-page FTC document obtained by the Washington Post, the government agency asks OpenAI to present documentation on nearly all aspects of its large language model, with a special focus on OpenAI's user data handling and ChatGPT's output of false statements.

Also: Train AI models with your own data to mitigate risks

According to the document, the FTC is investigating whether OpenAI is engaging in "unfair or deceptive privacy or data security practices" or if it has engaged in "deceptive practices" that could risk harm to consumers.

The FTC is seeking detailed explanations on how OpenAI obtains its data, how that data is used to train the model, and the procedures in place to assess risk and safety.

In addition, the FTC is asking OpenAI to disclose what steps it has taken to mitigate the risk of its LLM generating "false, misleading or disparaging" statements about real individuals.

These investigations follow several incidents with ChatGPT that have caused concerns and even provoked lawsuits against OpenAI.

For example, on March 20 there was a data breach that exposed ChatGPT users' conversations and subscribers' payment information. The breach highlighted the potential risks of using AI tools and even led Italy to ban ChatGPT altogether, although the ban has since been lifted.

Also: Most workers want to use generative AI to advance their careers but don't know how

Other instances include two class action lawsuits filed against OpenAI. One of the lawsuits claims that the AI company has been using "stolen data" from customers to train and develop its products.

The other lawsuit relates to ChatGPT's hallucinations and its ability to produce false statements about people. In OpenAI's first defamation case, a Georgia radio host sued OpenAI after finding that ChatGPT allegedly was spreading false information about him, accusing him of embezzling money.


Workers that made ChatGPT less harmful ask lawmakers to stem alleged exploitation by Big Tech


Kenyan workers who helped remove harmful content on ChatGPT, OpenAI's chatbot that generates content based on user prompts, have filed a petition before the country's lawmakers calling on them to launch investigations into big tech companies outsourcing content moderation and AI work in Kenya.

The petitioners want investigations into the "nature of work, the conditions of work, and the operations" of the big tech companies that outsource services in Kenya through firms like Sama, which is at the heart of several lawsuits alleging exploitation, union-busting, and illegal mass layoffs of content moderators.

The petition follows a Time report that detailed the pitiable remuneration of the Sama workers that made ChatGPT less toxic, and the nature of their job, which required reading and labelling graphic text, including describing scenes of murder, bestiality, and rape. The report stated that in late 2021 Sama was contracted by OpenAI to “label textual descriptions of sexual abuse, hate speech, and violence” as part of the work to build a tool (that was built into ChatGPT) to detect toxic content.

The workers say they were exploited, and not offered psychosocial support, yet they were exposed to harmful content that left them with “severe mental illness.” The workers want the lawmakers to “regulate the outsourcing of harmful and dangerous technology” and to protect the workers that do it.


They are also calling on them to enact legislation regulating the “outsourcing of harmful and dangerous technology work and protecting workers who are engaged through such engagements.”

Sama says it counts 25% of Fortune 50 companies, including Google and Microsoft, as its clients. The San Francisco-based company’s main business is in computer vision data annotation, curation, and validation. It employs over 3,000 people across its hubs including the one in Kenya. Earlier this year Sama dropped content moderation services to concentrate on computer vision data annotation, laying off 260 workers.

OpenAI’s response to the alleged exploitation acknowledged that the work was challenging, adding that it had established and shared ethical and wellness standards (without giving further details on the exact measures) with its data annotators for the work to be delivered “humanely and willingly”.

It noted that to build safe and beneficial artificial general intelligence, human data annotation is one of the many streams of its work to collect human feedback and guide the models toward safer behavior in the real world.

“We recognize this is challenging work for our researchers and annotation workers in Kenya and around the world—their efforts to ensure the safety of AI systems has been immensely valuable,” said OpenAI’s spokesperson.

Sama told TechCrunch it was open to working with the Kenyan government “to ensure that baseline protections are in place at all companies.” It said that it welcomes third party audits of its working conditions, adding that employees have multiple channels to raise concerns, and that it has “performed multiple external and internal evaluations and audits to ensure we are paying fair wages and providing a working environment that is dignified.”


DSC Weekly 11 July 2023

Announcements

  • With the hybrid workplace here to stay, employers are facing unprecedented challenges to recruiting and retaining workers in these uncertain times. Join the Employee Experience in the Hybrid Workplace summit to hear from leading experts on the newest solutions for enhancing the workplace learning process, the leading tools and technology to get a pulse on what employees really need, and the talent technology companies can use to ensure they recruit the right people for the right positions.
  • Data management and analytics have never been more critical to defining long-term success. The Optimal Data Analytics summit will explore how AI and ML are shaping the future of data analytics, and discover strategies to implement deep learning, neural networks, RPA, NLP and more. Join us to learn how to unleash the power of augmented analytics to optimize core business processes, source new revenue streams, improve customer satisfaction and drive long-term success.

Top Stories

  • AIOps above the radar – Using AI to monitor your AI infrastructure
    July 10, 2023
    by Kirk Borne
    When an enterprise project is low-profile (“below the radar”), then it is not likely to be the target of bad actors. Similarly, if some part of that project’s infrastructure fails or falters, then the consequences of the problem and/or the urgency of providing a solution are usually manageable. But when a high-profile (“above the radar”)… Read […]
  • Sentience: AI has demystified human consciousness, intelligence
    July 10, 2023
    by David Stephen
    There is a recent article, Unraveling the Mystery of Human Consciousness, where it was stated that, “Consciousness makes us capable of experiencing the scent of a rose, the touch of a breeze, the taste of food, the sound of music, and the sight of a sunrise. We also have a unique ability to be aware… Read […]
  • The Golden Rule and the AI Utility Function – Part II
    July 9, 2023
    by Bill Schmarzo
    In part 1 of the series on integrating the Golden Rule into the AI Utility Function, we reviewed the Golden Rule and brainstormed the key Golden Rule principles that one would want to encode into the AI Utility Function. We then reviewed a simple process that any Citizen of Data Science could leverage to ensure… Read […]

In-Depth

  • Security data lakes and the future of organizational security
    July 10, 2023
    by Erin Hamm
    Evolving technological advancements have created a far more data-centric world. This has dramatically changed the enterprise landscape, while also creating more data silos. The explosion of cybersecurity tools and mounds of data in modern enterprises have made it difficult to combine data to create a unified view. This has resulted in siloed data that’s also… Read […]
  • Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers
    July 10, 2023
    by Alexander Demchuk
    Amazon Kendra and LLamaIndex can help with knowledge integration but fall short in connecting diverse knowledge sources, to enable efficient intelligent search. In this article, we compare the existing solutions and explain how to overcome their limitations using a Google Drive crawler. Companies often face difficulties in consolidating their knowledge base when their data is… Read […]
  • What’s missing from ChatGPT and other LLMs?
    July 10, 2023
    by Alan Morrison
    Recent developments in artificial intelligence remind me of the automotive industry in the late 19th and early 20th century. In that case, it took the industry several decades to commit to internal combustion engines. And while that picture was still unclear, there were over 250 different car manufacturers, some of whom were producing steam-powered cars.… Read […]
  • Data integration in IoT environments: Enhancing connectivity and insights
    July 5, 2023
    by Ovais Naseem
    In the dynamic world of the Internet of Things (IoT), data integration plays a crucial role in harnessing the full potential of connected devices. By seamlessly combining data from diverse sources, data integration enables organizations to unlock valuable insights, optimize operations, and make informed decisions. This blog will explore the significance of data integration in… Read […]
  • Ushering in the 5th epoch of distributed computing with accelerated AI technologies
    July 5, 2023
    by RobFarber
    This is the first in a series of articles based on interviews with Intel technology leaders about AI/HPC acceleration.
  • The hour is later than you think for AI impacting our jobs
    July 5, 2023
    by ajitjaokar
    In the movie The Lord of the Rings, the wizard Saruman says that “the hour is later than you think.” I was reminded of this phrase when I read a report from McKinsey, The economic potential of generative AI. There are some key findings on the future of AI which show you how fast… Read […]
  • Role of AI in Web3: Ensuring seamless content moderation for dating websites
    July 5, 2023
    by Roger Brown
    As Web3 evolves and transforms into a more decentralized and user-centric ecosystem, the role of artificial intelligence or AI cannot be understated. By leveraging its capabilities, AI is contributing to various aspects of the Web3 landscape, such as managing data, executing contracts, generating insights, securing identities, curating content, governing organizations, and enhancing user experiences. An… Read […]

How Do Companies Use Artificial Intelligence?


By now, AI-based tools have fundamentally changed the way companies operate across all industries. Businesses use AI to streamline operations, make informed decisions, and enhance customer experiences.

Companies utilize AI in a multitude of ways, such as automating repetitive tasks, predicting customer behavior, and optimizing supply chain management. Today, we will dive deeper into how the use of AI in various businesses can benefit their productivity and KPIs. Here we go!

Unleashing the power of AI in farming

The use of AI in the farming business is taking the agricultural industry to the next level. According to a report by the World Economic Forum, AI could increase global food production by 70% by 2050. With the world’s population projected to reach 9.7 billion by 2050, AI can play a crucial role in meeting the growing demand for food.

AI-powered technologies such as precision agriculture and autonomous farming systems have already shown promising results. For example, farmers can use AI algorithms and satellite imagery to optimize irrigation, fertilizer usage, and pest control, resulting in higher crop yields and reduced environmental impact.

In addition, AI can help address labor shortages in the agricultural sector. With the aging population of farmers and the difficulty in attracting younger generations to the field, autonomous robots and drones equipped with AI capabilities can perform tasks such as planting, harvesting, and monitoring crops, increasing efficiency and productivity.

Furthermore, AI can enable predictive analytics in farming, allowing farmers to make data-driven decisions based on real-time weather data, soil conditions, and crop health indicators. This proactive approach can help prevent diseases, optimize resource allocation, and maximize profits.

How AI helps to enhance logistics and retailing

AI can greatly enhance logistics and retailing by improving efficiency, accuracy, and customer experience. With AI-powered algorithms and predictive analytics, companies can optimize supply chain management, reducing costs and minimizing delays.

AI can also enable intelligent inventory management, helping retailers accurately forecast demand and prevent stockouts or overstock situations. According to recent research, AI-powered inventory management systems can help retailers reduce out-of-stock situations by up to 80% and overstock situations by up to 65%.

Additionally, AI-enhanced chatbots and various virtual assistants can provide personalized customer support and recommendations, enhancing the shopping experience. Salesforce found that 64% of consumers prefer interacting with chatbots and virtual assistants for customer support, highlighting the growing popularity and effectiveness of AI-powered solutions in enhancing the shopping experience.

Furthermore, AI can analyze customer data and behavior to provide targeted marketing campaigns and promotions, increasing customer engagement and loyalty, and delivering a seamless and more satisfying shopping experience for consumers.

AI is advancing the medicine & healthcare sector

This is not only about improving diagnostic accuracy, but also about enhancing patient monitoring, drug discovery, surgical outcomes, and mental health support, all made possible by sophisticated AI tools.

A study published in Nature Medicine found that an AI system outperformed human doctors in detecting breast cancer from mammogram images, reducing false negatives by 9.4% and false positives by 5.7%.

AI also has significant potential in predicting disease progression. For example, by analyzing electronic health records and genetic data, AI algorithms accurately predicted the onset of Alzheimer’s disease up to six years in advance, allowing for early intervention and personalized treatment plans.

AI is also transforming drug discovery and development. AI algorithms can rapidly analyze vast amounts of data to identify potential drug candidates for various diseases, significantly speeding up the drug discovery process.

Furthermore, AI is playing an important role in improving mental health services: AI-based chatbots have already proved effective in reducing symptoms of depression and anxiety among users, providing accessible and personalized mental health support.


The automotive sector is being transformed by AI

AI tools are boosting innovation in the automotive sector by improving safety through advanced driver assistance systems (ADAS). These AI-powered algorithms can analyze real-time data from sensors, cameras, and radars to detect potential hazards, alert drivers, and even autonomously control the vehicle to prevent accidents.

AI is also enhancing the driving experience by enabling features such as voice recognition, natural language processing, and gesture control. This allows drivers to interact with their vehicles more intuitively, reducing distractions and improving overall convenience.

Self-driving cars also utilize AI algorithms to perceive the environment, make decisions, and navigate complex traffic scenarios. Companies like Tesla, Waymo, and Uber are investing heavily in AI technology to develop fully autonomous vehicles that can operate safely and efficiently on public roads.

AI algorithms are also being used to optimize fuel efficiency, reduce emissions, and predict when a vehicle component is likely to fail. Intelligent algorithms can analyze driving patterns, road conditions, and other factors to optimize engine performance and reduce fuel consumption, contributing to a more sustainable transportation system.

Choosing the right AI provider

AI offers impressive benefits for businesses across industries, but only when implemented and customized properly. Choosing the right AI provider for your company is crucial for successful AI implementation and business growth. Here are 8 key points to consider when making this decision:

1. Define your specific needs

Begin by identifying the specific AI solutions you require. Determine the problems you want to solve or the areas where you need assistance.

2. Evaluate expertise and experience

Look for an AI provider with a strong track record in your industry. Assess their experience, expertise, and success stories related to your specific needs.

3. Consider scalability and flexibility

Ensure that the AI provider can scale their solutions to meet your company’s future requirements. Verify if they offer flexible options that can adapt as your business evolves.

4. Assess data security and privacy measures

AI involves handling sensitive data, so prioritize providers with robust security protocols and a clear commitment to data privacy compliance.

5. Evaluate integration capabilities

Determine if the AI provider’s solutions can seamlessly integrate with your existing systems and technologies. Compatibility is essential for a smooth implementation process.

6. Request product demonstrations and trials

Ask for product demonstrations or trials to assess the provider’s offerings firsthand. This allows you to evaluate usability, functionality, and how well it aligns with your goals.

7. Check customer support and maintenance

Inquire about the level of customer support and maintenance services offered by the provider. Reliable support ensures timely assistance and minimizes disruptions.

8. Consider cost-effectiveness

Finally, compare pricing models and value for money among potential AI providers. Evaluate the total cost of ownership, including implementation, maintenance, and potential future upgrades.

By carefully considering these factors, you can choose an AI provider for your business that aligns with your needs, goals, and long-term vision, setting the stage for successful integration and growth.

Wrapping up

The significance of AI and AI-driven solutions in the current data-driven business landscape cannot be overstated. In this article, we've touched upon only a few industries, but the use of AI in business spans virtually every sector. Through AI-powered tools and services, businesses can provide round-the-clock customer support, improving response times and overall satisfaction.

AI also plays a crucial role in data analysis, enabling companies to extract valuable insights and make data-driven decisions. We recommend businesses harness the power of AI right now, and select the fitting AI provider that best aligns with their business goals.

Meta claims its new art-generating model is best-in-class


Over the past two years, AI-powered image generators have become commodified, more or less, thanks to the widespread availability of — and decreasing technical barriers around — the tech. They’ve been deployed by practically every major tech player, including Google and Microsoft, as well as countless startups angling to nab a slice of the increasingly lucrative generative AI pie.

That isn’t to suggest they’re consistent yet, performance-wise — far from it. While the quality of image generators has improved, it’s been incremental, sometimes agonizing progress.

But Meta claims to have had a breakthrough.

Today, Meta announced CM3leon (“chameleon” in clumsy leetspeak), an AI model that the company claims achieves state-of-the-art performance for text-to-image generation. CM3leon is also distinguished by being one of the first image generators capable of generating captions for images, laying the groundwork for more capable image-understanding models going forward, Meta says.

“With CM3leon’s capabilities, image generation tools can produce more coherent imagery that better follows the input prompts,” Meta wrote in a blog post shared with TechCrunch earlier this week. “We believe CM3leon’s strong performance across a variety of tasks is a step toward higher-fidelity image generation and understanding.”

Most modern image generators, including OpenAI’s DALL-E 2, Google’s Imagen and Stable Diffusion, rely on a process called diffusion to create art. In diffusion, a model learns how to gradually subtract noise from a starting image made entirely of noise — moving it closer step by step to the target prompt.

The results are impressive. But diffusion is computationally intensive, making it expensive to operate and slow enough that most real-time applications are impractical.
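
To make the mechanics concrete, here is a toy Python sketch of iterative denoising. It is purely illustrative and not any production diffusion model: a real model learns to predict the noise from training data, whereas this toy substitutes an "oracle" that computes the noise from a known target, so only the step-by-step denoising loop is shown.

import numpy as np

rng = np.random.default_rng(0)
target = np.array([0.2, 0.8, 0.5, 0.9])   # stand-in for an image's pixel values
x = rng.normal(size=target.shape)         # step 0: pure noise

for step in range(50):
    predicted_noise = x - target          # oracle in place of a learned noise predictor
    x = x - 0.1 * predicted_noise         # subtract a little noise at each step

print(np.round(x, 3))                     # ends up close to the target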

CM3leon is a transformer model, by contrast, leveraging a mechanism called “attention” to weigh the relevance of input data such as text or images. Attention and the other architectural quirks of transformers can boost model training speed and make models more easily parallelizable. Larger and larger transformers can be trained with significant but not unattainable increases in compute, in other words.
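
For readers unfamiliar with the mechanism, here is a minimal Python sketch of standard scaled dot-product attention, the generic building block referred to above, not Meta's exact CM3leon architecture:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Each query scores every key; the normalized scores weight the values.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 tokens, 8-dimensional representations
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)   # (4, 8): one weighted summary per token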

And CM3leon is even more efficient than most transformers, Meta claims, requiring five times less compute and a smaller training data set than previous transformer-based methods.

Interestingly, OpenAI explored transformers as a means of image generation several years ago with a model called Image GPT. But it ultimately abandoned the idea in favor of diffusion — and might soon move on to “consistency.”

To train CM3leon, Meta used a data set of millions of licensed images from Shutterstock. The most capable of several versions of CM3leon that Meta built has 7 billion parameters, over twice as many as DALL-E 2. (Parameters are the parts of the model learned from training data and essentially define the skill of the model on a problem, like generating text — or, in this case, images.)

One key to CM3leon’s stronger performance is a technique called supervised fine-tuning, or SFT for short. SFT has been used to train text-generating models like OpenAI’s ChatGPT to great effect, but Meta theorized that it could be useful when applied to the image domain, as well. Indeed, instruction tuning improved CM3leon’s performance not only on image generation but on image caption writing, enabling it to answer questions about images and edit images by following text instructions (e.g. “change the color of the sky to bright blue”).
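
To illustrate what supervised fine-tuning means mechanically, here is a hypothetical toy in PyTorch, not Meta's actual training code: a tiny character-level model is trained on instruction/response pairs, with the loss masked so that only response tokens contribute, which is the core idea behind instruction tuning.

import torch
import torch.nn as nn

# Toy instruction/response pairs; "\x00" separates prompt from response.
pairs = [("change sky to blue", "edited image: sky=blue"),
         ("describe the image", "a cactus in a desert")]

vocab = sorted({ch for p, r in pairs for ch in p + r} | {"\x00"})
stoi = {ch: i for i, ch in enumerate(vocab)}

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.head(h)

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss(reduction="none")

for step in range(100):
    for prompt, response in pairs:
        text = prompt + "\x00" + response
        ids = torch.tensor([[stoi[c] for c in text]])
        logits = model(ids[:, :-1])   # predict each next token
        targets = ids[:, 1:]
        # The SFT trick: compute loss only where the target token belongs
        # to the response, so the model learns to answer rather than to
        # reproduce prompts.
        mask = torch.zeros_like(targets, dtype=torch.float)
        mask[:, len(prompt):] = 1.0
        loss = (loss_fn(logits.transpose(1, 2), targets) * mask).sum() / mask.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()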

Most image generators struggle with "complex" objects and text prompts that include too many constraints. But CM3leon doesn't — or at least, not as often. In a few cherrypicked examples, Meta had CM3leon generate images using prompts like "A small cactus wearing a straw hat and neon sunglasses in the Sahara desert," "A close-up photo of a human hand, hand model," "A raccoon main character in an Anime preparing for an epic battle with a samurai sword" and "A stop sign in a Fantasy style with the text '1991.'"

For the sake of comparison, I ran the same prompts through DALL-E 2. Some of the results were close. But the CM3leon images were generally closer to the prompt and more detailed to my eyes, with the signage being the most obvious example. (Until recently, diffusion models handled both text and human anatomy relatively poorly.)

Meta's image generator.

The DALL-E 2 results.

CM3leon can also understand instructions to edit existing images. For example, given the prompt “Generate high quality image of ‘a room that has a sink and a mirror in it’ with bottle at location (199, 130),” the model can generate something visually coherent and, as Meta puts it, “contextually appropriate” — room, sink, mirror, bottle and all. DALL-E 2 utterly fails to pick up on the nuances of prompts like these, at times completely omitting the objects specified in the prompt.

And, of course, unlike DALL-E 2, CM3leon can follow a range of prompts to generate short or long captions and answer questions about a particular image. In these areas, the model performed better than even specialized image captioning models (e.g. Flamingo, OpenFlamingo) despite seeing less text in its training data, Meta claims.

But what about bias? Generative AI models like DALL-E 2 have been found to reinforce societal biases, after all, generating images of positions of authority — like “CEO” or “director” — that depict mostly white men. Meta leaves that question unaddressed, saying only that CM3leon “can reflect any biases present in the training data.”

“As the AI industry continues to evolve, generative models like CM3leon are becoming increasingly sophisticated,” the company writes. “While the industry is still in its early stages of understanding and addressing these challenges, we believe that transparency will be key to accelerating progress.”

Meta didn’t say whether — or when — it plans to release CM3leon. Given the controversies swirling around open source art generators, I wouldn’t hold my breath.

Docker Tutorial for Data Scientists


Python and the suite of Python data analysis and machine learning libraries like pandas and scikit-learn help you develop data science applications with ease. However, dependency management in Python is a challenge. When working on a data science project, you'll spend substantial time installing the various libraries and keeping track of the library versions you're using, among other chores.

What if other developers want to run your code and contribute to the project? Well, other developers who want to replicate your data science application should first set up the project environment on their machine before they can run the code. Even small differences, such as differing library versions, can introduce breaking changes to the code. Docker to the rescue. Docker simplifies the development process and facilitates seamless collaboration.

This guide will introduce you to the basics of Docker and teach you how to containerize data science applications with Docker.

What Is Docker?

Docker is a containerization tool that lets you build and share applications as portable artifacts called images.

Aside from source code, your application will have a set of dependencies, required configuration, system tools, and more. For example, in a data science project, you’ll install all the required libraries in your development environment (preferably inside a virtual environment). You’ll also ensure that you’re using an updated version of Python that the libraries support.

However, you may still run into problems when trying to run your application on another machine. These problems often arise from mismatched configurations and library versions between the two development environments.

With Docker, you can package your application along with its dependencies and configuration. This lets you define an isolated, reproducible, and consistent environment for your applications across a range of host machines.

Docker Basics: Images, Containers, and Registries

Let’s go over a few concepts/terminologies:

Docker Image

A Docker image is the portable artifact of your application.

Docker Container

When you run an image, you’re essentially getting the application running inside the container environment. So a running instance of an image is a container.

Docker Registry

A Docker registry is a system for storing and distributing Docker images. After containerizing an application into a Docker image, you can make it available to the developer community by pushing it to an image registry. DockerHub is the largest public registry, and all images are pulled from DockerHub by default.

How Does Docker Simplify Development?

Because containers provide an isolated environment for your applications, other developers now only need to have Docker set up on their machine. They can pull the Docker image and start containers using a single command, without having to worry about complex installations on the remote machine.

When developing an application, it is also common to build and test multiple versions of the same app. If you use Docker, you can have multiple versions of the same app running inside different containers—without any conflicts—in the same environment.
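
If you script Docker from Python, this is easy to see with the Docker SDK for Python (pip install docker). A hedged sketch, assuming a local Docker daemon is running and images tagged ml-app:v1 and ml-app:v2 already exist:

import docker

client = docker.from_env()
for tag in ("ml-app:v1", "ml-app:v2"):
    # Both versions run side by side in isolated containers.
    container = client.containers.run(tag, detach=True,
                                      name=tag.replace(":", "-"))
    print(container.name, container.status)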

In addition to simplifying development, Docker also simplifies deployment and helps the development and operations teams collaborate effectively. On the server side, the operations team doesn't have to spend time resolving complex version and dependency conflicts; they only need a Docker runtime set up.

Essential Docker Commands

Let's quickly go over some basic Docker commands most of which we’ll use in this tutorial. For a more detailed overview read: 12 Docker Commands Every Data Scientist Should Know.

Command                     Function
docker ps                   Lists all running containers
docker pull image-name      Pulls image-name from DockerHub by default
docker images               Lists all the available images
docker run image-name       Starts a container from an image
docker start container-id   Restarts a stopped container
docker stop container-id    Stops a running container
docker build path           Builds an image at the path using instructions in the Dockerfile

Note: Prefix the commands with sudo if you haven’t added your user to the docker group.
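
The same operations are also available programmatically through the Docker SDK for Python. A rough sketch of the equivalents, assuming a local Docker daemon is running:

import docker

client = docker.from_env()                      # connect to the local daemon

print(client.containers.list())                 # docker ps
client.images.pull("python", tag="3.9-slim")    # docker pull python:3.9-slim
print(client.images.list())                     # docker images

container = client.containers.run("python:3.9-slim",
                                  "python -c 'print(42)'",
                                  detach=True)  # docker run
container.stop()                                # docker stop container-id

# docker build (requires a Dockerfile, which we create below)
image, logs = client.images.build(path=".", tag="ml-app")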

How to Containerize a Data Science App Using Docker

We’ve learned the basics of Docker, and it’s time to apply what we’ve learned. In this section, we’ll containerize a simple data science application using Docker.

House Price Prediction Model

Let’s take the following linear regression model, which predicts the target value, the median house price, from the input features. The model is built using the California housing dataset:

# house_price_prediction.py
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the California Housing dataset
data = fetch_california_housing(as_frame=True)
X = data.data
y = data.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared Score: {r2:.2f}")

We know that scikit-learn is a required dependency. If you go through the code, you’ll see we set as_frame to True when loading the dataset, so we also need pandas. The requirements.txt file looks like this:

pandas==2.0
scikit-learn==1.2.2


Create the Dockerfile

So far, we have the source code file house_price_prediction.py and the requirements.txt file. We should now define how to build an image from our application; the Dockerfile is where this definition lives.

So what is a Dockerfile? It is a text document that contains step-by-step instructions to build the Docker image.


Here’s the Dockerfile for our example:

# Use the official Python image as the base image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements.txt file to the container
COPY requirements.txt .

# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the script file to the container
COPY house_price_prediction.py .

# Set the command to run your Python script
CMD ["python", "house_price_prediction.py"]

Let’s break down the contents of the Dockerfile:

  • All Dockerfiles start with a FROM instruction specifying the base image. The base image is the image your own image builds on. Here we use an available image for Python 3.9. The FROM instruction tells Docker to build the current image from the specified base image.
  • The WORKDIR instruction sets the working directory for all the following instructions (/app in this example).
  • We then copy the requirements.txt file to the container’s file system.
  • The RUN instruction executes the specified command—in a shell—inside the container. Here we install all the required dependencies using pip.
  • We then copy the source code file—the Python script house_price_prediction.py—to the container’s file system.
  • Finally, the CMD instruction specifies the command to execute when the container starts. Here we need to run the house_price_prediction.py script. A Dockerfile should contain only one CMD instruction.

Build the Image

Now that we’ve defined the Dockerfile, we can build the Docker image by running docker build:

docker build -t ml-app .

The option -t allows us to specify a name and tag for the image in the name:tag format. The default tag is latest.

The build process takes a couple of minutes:

Sending build context to Docker daemon  4.608kB
Step 1/6 : FROM python:3.9-slim
3.9-slim: Pulling from library/python
5b5fe70539cd: Pull complete
f4b0e4004dc0: Pull complete
ec1650096fae: Pull complete
2ee3c5a347ae: Pull complete
d854e82593a7: Pull complete
Digest: sha256:0074c6241f2ff175532c72fb0fb37264e8a1ac68f9790f9ee6da7e9fdfb67a0e
Status: Downloaded newer image for python:3.9-slim
 ---> 326a3a036ed2
Step 2/6 : WORKDIR /app
...
...
...
Step 6/6 : CMD ["python", "house_price_prediction.py"]
 ---> Running in 7fcef6a2ab2c
Removing intermediate container 7fcef6a2ab2c
 ---> 2607aa43c61a
Successfully built 2607aa43c61a
Successfully tagged ml-app:latest

After the Docker image has been built, run the docker images command. You should see the ml-app image listed, too.

docker images

You can run the Docker image ml-app using the docker run command:

docker run ml-app


Congratulations! You’ve just dockerized your first data science application. By creating a DockerHub account, you can push the image to it (or to a private repository within the organization).
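
If you prefer to script the push, here is a hedged sketch using the Docker SDK for Python; "your-dockerhub-username" and the password are placeholders for your own credentials:

import docker

client = docker.from_env()
client.login(username="your-dockerhub-username",
             password="your-password")                      # docker login

image = client.images.get("ml-app:latest")
image.tag("your-dockerhub-username/ml-app", tag="latest")   # docker tag
for line in client.images.push("your-dockerhub-username/ml-app",
                               tag="latest", stream=True, decode=True):
    print(line)                                             # push progress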

Conclusion

Hope you found this introductory Docker tutorial helpful. You can find the code used in this tutorial in this GitHub repository. As a next step, set up Docker on your machine and try this example. Or dockerize an application of your choice.

The easiest way to install Docker on your machine is using Docker Desktop: you get both the Docker CLI client as well as a GUI to manage your containers easily. So set up Docker and get coding right away!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more.


Internet has Become An AI Dumping Ground, No Solution in Sight

When Amazon’s Kindle Unlimited young adult romance bestseller list was filled with dozens of nonsensical AI-generated books last month, the Jeff Bezos-run tech giant figured out a way to monetise it.

After realising the potential of generative AI models like GPT, people have gone a step further and started filling websites with AI-generated junk to attract paying advertisers, according to a report from the media research organisation NewsGuard. The companies behind the models generating this content have been vocal about the measures they are taking to deal with the issue, but no concrete plan has yet been executed.

According to the report, more than 140 major brands are currently paying for advertisements that end up on unreliable AI-written sites, likely without their knowledge. The report further clarifies that the websites in question are presented in a way that a reader could assume they are produced by human writers, because the sites have the generic layout and content typical of news websites. Furthermore, these websites do not clearly disclose that their content is AI-produced.

Hence, it is high time authorities step in and take charge of monitoring not just false content, but also non-human-generated content.

Google Search in the picture

According to a recent report by NewsGuard, a staggering 90% of advertisements from well-known brands appearing on AI-generated news websites were pushed by Google, despite the company’s own policies prohibiting the placement of ads on pages containing “spammy automatically generated content”. This trend not only poses the threat of a spammy internet dominated by AI-generated material but also calls into question the massive amount of money spent on advertising.

Earlier this year, Google issued a statement asserting its commitment to safeguarding search results against spam, emphasising that employing AI-generated content to manipulate search rankings violates Alphabet’s spam policies.

The Sundar Pichai-led firm announced significant steps at the latest Google I/O conference to identify and contextualise AI content available on its Search. While measures like watermarking and embedding metadata aim to ensure transparency and enable users to differentiate between AI-generated and authentic images, they apply only to images, as there is no obvious way to watermark AI-generated text.

Mass Produced

The rise of false information has been a major cause of concern, but now the monetisation of the activity has clearly skyrocketed. A few months ago, several media houses fell prey to a hoax image of an explosion near the Pentagon, causing collateral damage to the US stock market.

Since generative AI models gained popularity on the internet, many instances of false information have surfaced — Former US President Donald Trump apparently being arrested, or Tesla CEO Elon Musk holding hands with GM CEO Mary Barra. Also, who can forget Pope Francis wearing a stylish white puffer jacket walking around with coffee in one hand? These events highlight how difficult it is going to be to separate AI-generated content from facts.


Incoming model collapse

Unlike Google, NewsGuard has figured out a clever way to identify unreliable AI-written content on the internet. Since many of these sites lack human intervention, they often contain error messages commonly seen in AI-generated content. For instance, CountyLocalNews.com displayed messages like “Sorry, I cannot fulfil this prompt as it goes against ethical and moral principles… As an AI language model, it is my responsibility to provide factual and trustworthy information.” NewsGuard’s AI scans for these messages, and then a human analyst reviews them.
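
As a simplified, hypothetical illustration of that approach (NewsGuard’s actual pipeline is more sophisticated and pairs automated scanning with human review), a first-pass scan for telltale boilerplate might look like this in Python:

# Phrases that AI models commonly emit when refusing or hedging.
TELLTALE_PHRASES = [
    "as an ai language model",
    "i cannot fulfil this prompt",
    "i cannot fulfill this prompt",
    "goes against ethical and moral principles",
]

def looks_ai_generated(page_text: str) -> bool:
    text = page_text.lower()
    return any(phrase in text for phrase in TELLTALE_PHRASES)

article = ("Sorry, I cannot fulfil this prompt as it goes against "
           "ethical and moral principles...")
print(looks_ai_generated(article))   # True -> flag for human review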

The increasing volume of spammy AI-generated content on the internet can become a problem for the AI companies behind these models. The reason: the foundational large language models of chatbots like ChatGPT and Bing train on publicly available data. As these data sets are increasingly filled with AI-produced content, researchers are raising concerns that the language models will become less useful, a phenomenon known as “model collapse”.

Ilia Shumailov, a research fellow at Oxford University’s Applied and Theoretical Machine Learning Group who co-wrote The Curse of Recursion: Training on Generated Data Makes Models Forget — a paper on this phenomenon, believes the collapse is ‘inevitable’ and might not be such a bad thing after all. “Maybe we’ll get rid of captchas, and it will become normal to be a computer on the internet,” he told the Wall Street Journal, referring to the picture-puzzles that websites impose to distinguish computers from humans.


Unlock DataOps Success with DataOps.live – Featured in Gartner Market Guide!

Sponsored Post


We have fantastic news to share with you! DataOps.live has been featured in the highly esteemed Gartner Market Guide for DataOps Tools, released on December 5, 2022. This recognition signifies a major milestone in the DataOps market and solidifies DataOps.live as an essential player in this evolving landscape.

Gartner Market Guides are invaluable resources for staying ahead of emerging market trends. With over 100 Market Guide research notes, Gartner provides strategic leaders like you with a comprehensive view of various markets, both mature and smaller, in an easily digestible format.

If you are utilizing Snowflake to build your data infrastructure, applications, data products, and analytic frameworks, it is essential to familiarize yourself with DataOps. Understanding the methodology, requirements, and best practices of DataOps is crucial, because it brings the same transformative effects to data products and pipelines as DevOps did to software applications.

To learn more about the DataOps market, download your free copy of the Gartner Market Guide for DataOps Tools.

Download Gartner Market Guide
