
Will the power of data in the AI era leave startups at a disadvantage?

By Alex Wilhelm

If you read any news about business, technology or startups today, you’re almost certain to find at least one mention of AI. And with good reason: Tech is on the hunt for its next growth vector.

Over the years, we’ve seen lots of interesting technologies strive for that mantle. From blockchain-based technology, and AR & VR for both consumer and enterprise applications, to creator-focused platforms — the list is long indeed.


Most of those technologies, however, lost much of their luster when it became clear that it would take much longer than many expected for them to reach mass adoption. In some cases, the technology was not ready for everyday use, or it wasn’t as applicable for corporate or consumer usage as everyone thought. In many cases, they were simply too unwieldy to implement.

AI is the latest in that long line of hopefuls. Indeed, it has pretty much earned its place: Large language models are incredibly interesting and can serve a host of new and existing applications. Invariably, that has spurred public-market investors to expect tech companies to unlock new opportunities for growth from AI. Tech CEOs feel the same way, as do venture capitalists.

The industry is suffused with incredible optimism around the use of new AI technologies. Money is flowing into companies of all sizes and shapes that want to build AI models, help customers train and use those models, protect data from (or conserve information inside) LLMs, or apply the technology directly for various use cases.

It’s still unclear how all these new AI-related features and tools will be monetized, but everyone generally seems to agree that this New Thing really does have legs and it’s reasonable to be optimistic about AI’s impact on our lives.

I’m here for it. But I am also worried about who is going to make all the money.

It’s a rich company’s world

Rewinding the clock to July, Reuters noted that of the $173.9 billion that PitchBook counted in the first half of 2023, venture capitalists “poured more than $40 billion into AI startups.” That’s almost a quarter of all the money invested in that time — a simply immense portion at a time when VC activity is declining around the world.

Meta Introduces FACET To Evaluate Computer Vision Models

Meta on Thursday announced that DINOv2, its computer vision model trained through self-supervised learning to produce universal features, is now available under the Apache 2.0 license.

The company further added that it is also releasing a collection of DINOv2-based dense prediction models for semantic image segmentation and monocular depth estimation, giving developers and researchers even greater flexibility to explore its capabilities on downstream tasks.

Alongside DINOv2’s announcement, Meta introduced FACET (FAirness in Computer Vision EvaluaTion), a new comprehensive benchmark for evaluating the fairness of computer vision models across classification, detection, instance segmentation, and visual grounding tasks.

Meta said this move is in response to the challenging nature of benchmarking fairness in computer vision, which has often been hampered by potential mislabeling and demographic biases.

According to Meta’s blog post, the FACET dataset comprises 32,000 images containing 50,000 people, labeled by expert human annotators for demographic attributes. FACET also contains person, hair, and clothing labels for 69,000 masks from SA-1B.

Meta evaluated DINOv2 using FACET, which revealed nuances in its performance, particularly in gender-biased classes.

Meta said it hopes that FACET can become a standard fairness evaluation benchmark for computer vision models and help researchers evaluate fairness and robustness across a more inclusive set of demographic attributes. For the same purpose, Meta released the FACET dataset and a dataset explorer.

To make FACET effective, Meta said it hired expert reviewers to manually annotate person-related demographic attributes like perceived gender presentation and perceived age group as well as correlating visual features like perceived skin tone, hair type, and accessories.

Additionally, the dataset includes labels for person-related classes like “basketball player” and “doctor,” as well as attributes related to clothing and accessories.
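To make the idea of a fairness evaluation concrete, here is a minimal sketch of one common approach: compare a model's recall for a class such as “doctor” across demographic groups and report the gap. This is an illustration only, not necessarily the metric Meta reports. The record layout, the group names, and the functions `recall_by_group` and `recall_gap` are assumptions, and the data is made up.

```python
from collections import defaultdict

def recall_by_group(records, target_class):
    """Return recall for target_class, split by demographic group.

    Each record is a (true_label, predicted_label, group) tuple.
    This layout is hypothetical, not FACET's actual schema.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for true_label, predicted_label, group in records:
        if true_label != target_class:
            continue  # recall only looks at true members of the class
        totals[group] += 1
        if predicted_label == target_class:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}

def recall_gap(records, target_class):
    """Difference between the best and worst per-group recall."""
    recalls = recall_by_group(records, target_class)
    return max(recalls.values()) - min(recalls.values())

# Made-up toy predictions, for illustration only.
records = [
    ("doctor", "doctor", "group_a"),
    ("doctor", "doctor", "group_a"),
    ("doctor", "other", "group_a"),
    ("doctor", "doctor", "group_a"),
    ("doctor", "doctor", "group_b"),
    ("doctor", "other", "group_b"),
    ("doctor", "other", "group_b"),
    ("doctor", "doctor", "group_b"),
    ("basketball player", "basketball player", "group_a"),
]

print(recall_by_group(records, "doctor"))
print(recall_gap(records, "doctor"))
```

A gap near zero suggests the model finds the class about equally well in each group; a large gap flags a class worth inspecting more closely.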

The post Meta Introduces FACET To Evaluate Computer Vision Models appeared first on Analytics India Magazine.

OdinSchool Review: Making India Job-ready with Industry-aligned Bootcamp

Amidst the ordinary, rise the extraordinary. Naga Lakshmi Pothuguntla took a career gap of 11 years and now thrives as an associate manager at PepsiCo. Dr. Drakshayani Desai, a former professor, explored data science and is pursuing a career as a software engineer at Prolifics. Visweswari Pitchika took a break when she became a mother; nurturing her child was fulfilling, and she is now also a Senior Analyst at Capgemini. From Karavaka, a coastal village in Andhra Pradesh, Durga Prasanna Devi is the first from her family to work at a corporate firm.

All of their success stories led us to OdinSchool, an online upskilling platform that helps young professionals and graduates launch, advance, and change their careers with well-structured, industry-focused bootcamps and live sessions. “The entire program, from curriculum design to pedagogy and outcomes, are aligned to addressing the needs of the current industry,” said Vijay Pasupulati, founder and CEO of OdinSchool, who added that their bootcamps are reasonably priced keeping in mind that upskilling young professionals is the need of the hour.

In line with this, OdinSchool has launched several new initiatives. Vijay Pasupulati said that early on, since launch in 2021, they realised that graduates and working professionals looking for online education don’t need just the content, but also require guidance and lots of hands-on practice. “Furthermore, through our dedicated placement team, we ensure that our learners transition into industry assets by connecting them with the most relevant job opportunities,” he said.

Currently, some of their students are working in companies like Tech Mahindra, Swiggy, Flipkart, CtrlS, and Capgemini, among other industry leaders. The company employs more than 100 trainers and speakers who are professors and experts in the field, with considerable experience working in companies and startups. It has partnered with more than 500 companies, including TCS, Genpact, Meesho, Expedia Group, AB InBev, Google and others, that are actively hiring graduates from OdinSchool.

Besides its upskilling bootcamps, OdinSchool also specialises in running induction and employee upskilling programmes for companies.

Career Guidance for Life

Despite India producing the largest number of engineers annually, there persists a significant scarcity of skilled professionals in the domestic job market. The India Skills Report 2022 by Wheebox reveals that only 50% of the country’s youth are employable. This stark contrast implies that nearly one out of every two Indian graduates lacks the skills needed for employability.

That explains why having the right guidance and mentor makes all the difference in one’s career growth. OdinSchool director Shruti Jayakumar believes that students are much more engaged in a live online class because they can speak to the teacher in real time. She said that learners are, more often than not, making a big career switch, so they can be expected to be a little anxious and in need of guidance. “A 360° upskilling is not just knowing technical knowledge, but learning how to learn, analyse their strengths and weaknesses, presentation and communication skills, which is what we offer in our courses,” she added.

This shift underscores the importance of continuous reskilling and upskilling for young professionals aiming to stay in sync with industry demands. This is especially true, she said, for women who have taken long career breaks to look after their families and for professionals who were let go in the recent mass layoffs. Even freshers struggling to land their first job can take up these upskilling courses and realise their dreams of working for leading enterprises.

In a market saturated with online courses, there’s something for everyone. While recorded classes with certifications are popular, courses provided by OdinSchool, Scaler and others take you through the whole process of learning and practical experience until you land a job. OdinSchool decided to keep its prices affordable, making the courses accessible to everyone.

“It has made it easier for freshers, women, and people who were laid off to upskill without having to pay an arm and a leg for it,” said Jayakumar.

What about Generative AI Courses?

With the onslaught of generative AI, the skill gap has become a nagging problem globally, but particularly for the Indian workforce. As per the latest report, 54% of workers believe generative AI will advance their careers, while 62% said they did not possess the skills to effectively and safely use the technology.

When a majority of companies are looking for talent and diverse skills, and are ready to pay hefty packages for it, upskilling and reskilling are the right way ahead. Andrew Ng’s DeepLearning.AI is doing a pretty good job of offering both paid and free courses to developers. Besides, there are tons of free courses available online. So, how is OdinSchool competing with them?

The company has added generative AI topics to its data science bootcamps, where students learn how to use open source generative AI models and tools. Srinivas Vedantam, Head of Product at OdinSchool, elaborates: “Generative AI based application development is covered by industry experts as part of OdinTalks familiarise students with what’s happening beyond the courses. They are also required to build their capstone projects on training and fine-tuning LLMs for specific problems using custom datasets.”

This is one of the perks of online learning: companies like OdinSchool constantly tweak the curriculum to align with evolving market trends, which is extremely beneficial to students looking to upskill.

The post OdinSchool Review: Making India Job-ready with Industry-aligned Bootcamp appeared first on Analytics India Magazine.

Who Will Make Money from the Generative AI Gold Rush?

The gold rush in Generative AI is well and truly underway. Generative AI (GenAI) is now creating content — words, images, videos, and audio — that is often indistinguishable from that produced by humans. Writing, visual design, coding, marketing, game production, music composition, and product design are just a few of the areas of human creativity that are being rapidly impacted by GenAI. As creative services are integrated into products like Microsoft Office 365, Slack, Discord, Salesforce Cloud, and Gmail, GenAI will increase the productivity of billions of people before we know it. We will all soon use GenAI to create our first drafts of anything and everything.

So who will make money from GenAI? I asked OpenAI’s Dall-E-2 text-to-image service that question, and it produced the image below. Not bad.

Dall-E-2 prompt “Who will make money from Generative AI?”

In 2018, I wrote a popular blog post on Who is going to make money in AI. Here’s my follow-up post on the billions being invested in GenAI across thousands of new use cases. In essence, there are five ‘layers’ of potential value capture in this gold rush:

1. Infrastructure – the companies offering chips and cloud infrastructure that will run the massive underlying GenAI computer models.

2. Foundational Models – the companies building the huge text, image, audio, and other models that generate creative output.

3. Applications — the large and small firms that are building apps that will be used by consumers, businesses, and governments for creative tasks.

4. Industry and organizations — that, as part of their creative activities, will extract value from GenAI applications, tools, and platforms.

5. Countries — that will create, export, and deploy GenAI technologies both within and across national borders.

In each of these layers who will be the winners?

1. GenAI Infrastructure

BigTech companies already dominate in GenAI infrastructure with their cloud services and hardware chips.

Examples of BigTech and chip companies that will provide the GenAI infrastructure

Microsoft and Google are well-positioned in the US cloud market, while Baidu and Alibaba are well-positioned in China. Their massive supercomputer cloud infrastructure is engineered to run GenAI’s complex, expensive, large text, visual, and audio Foundational Models. There are already many developers using their cloud AI API services and tools to build apps, and this trend is expected to accelerate as entrepreneurs rush to address virtually limitless GenAI use cases. Amazon has been quiet on Foundational Models, so a big question is how will they respond.

GenAI uses massive amounts of computational power to generate creative outputs. Sam Altman, CEO of OpenAI, said:

“we will have to monetize it [ChatGPT and Dall-E-2] somehow at some point; the compute costs are eye-watering.”

Rumour has it that training OpenAI’s GPT-3 cost US$12 million in energy bills alone. It is no surprise that OpenAI took a further $10 billion investment from Microsoft in early 2023, much of which will be in the form of access credits to Microsoft Azure’s supercomputing infrastructure.

The chip makers are salivating over the need for supercomputer power. With a market cap of over half a trillion dollars, NVIDIA’s (NASDAQ: NVDA) stock price has risen from $60 in 2018 to $240 in early 2023. BigTech is also investing in its own AI-optimized chips. The recent US export ban on advanced AI chips to China will accelerate Chinese State aid and domestic investment in their semiconductor industry (as well as raise geopolitical tensions). Given the amount of investment required, the winners in this space will be those who are, or are backed by, big players.

2. Foundational Models

BigTech’s size and scope give them a competitive edge when it comes to developing GenAI Foundational Models. These models are trained on vast amounts of data, utilizing BigTech’s vast computational resources. For example, OpenAI’s GPT-3 text model, known as a Large Language Model (LLM), was trained on about 45 terabytes of textual data representing half a trillion words that were “hoovered up” from much of the English-speaking internet. Similarly, OpenAI’s Dall-E-2 text-to-image based model was trained on 650 million image-caption pairs.

BigTech does not want to lose its leadership in cloud services by failing to capture the enormous revenue streams generated by the billions of end users of these Foundational Models in the future. Microsoft has partnered with OpenAI, and Google recently launched its Bard language chatbot which complements its Imagen model for creating photorealistic images from input text.

Chinese BigTech is also not standing still. Alibaba is testing an in-house chat service. Baidu already provides ERNIE-ViLG, a text-to-image parameter model, and is currently testing a new chatbot service. BigTech’s size gives it several advantages that startups will find difficult to replicate.

Examples of Foundational Model providers of text, image, video and audio, as well as tools and services

BigTech has the advantage of scale to address issues of truth, bias, and toxicity in Foundational Models

BigTech may be the only players capable of dealing with GenAI’s darker side. Although GenAI is still in its infancy, problems with Foundational Models are becoming apparent. The issues range from truth (GenAI producing content that is simply wrong) to bias (prejudice against specific groups) and toxicity (e.g. racist, misogynistic, or hate speech). In early 2023, a massive $100 billion was knocked off Alphabet’s market cap as the financial markets took fright at the erroneous and offensive answers Google’s Bard chatbot service gave. Microsoft’s limited-release Bing chatbot also displayed troubling (and even racist) responses when users jailbroke its safeguards, though its share price did not fall as precipitously. There is also a new type of cyberattack known as prompt injection, which can circumvent guardrails by injecting malicious instructions.

The challenge for those developing these Foundational Models will be ensuring that their output is both responsible and accurate. Foundational Models cannot simply regurgitate biased and toxic content that has been scraped from the far reaches of the internet. These models are also hallucinatory. This means they confidently deliver well-constructed and eloquent answers to questions that may be factually incorrect. As Noam Shazeer, co-founder of Character.AI, stated in the New York Times:

“…these systems are not designed for truth. They are designed for plausible conversation.”

Or, put another way, they are confident bullshit artists.

BigTech cannot afford the reputational, financial, and strategic risks that Model failures could bring. They are building supervisory oversight systems that include guardrails and model tuning. To build trust with users and meet likely regulatory requirements, BigTech will need to engineer solutions for model transparency, explainability, and citation of sources. Reinforcement learning from human feedback (RLHF) will require a veritable army of people to review and rate model answers to questions. These are not simple problems to solve at scale. Once again, BigTech is well positioned due to its access to capital, engineering talent, datasets, and the scale of the human feedback loops that come with having billions of users.

BigTech Models are not well suited for every situation

Despite their size and scale, BigTech will not be able to control the entire Foundational Model gold rush. Their models are broadly horizontal and well suited to answering, if not correctly, any conceivable consumer question. They are not, however, always as well suited to the needs of the enterprise with vertical tasks. Why? BigTech’s horizontal models (1) do not always perform well on specialist tasks, (2) frequently do not protect enterprise proprietary data, (3) are not trained on non-English languages, (4) lack transparency and explainability, (5) are not as well suited for use on edge devices and on-premise, (6) can be expensive to run in their cloud, and (7) create company dependence on BigTech.

A few, extremely well-funded startups are offering alternatives to BigTech Foundational Models

BigTech Foundational Models are not for everyone. This leaves room for a few extremely well-funded startups that have raised hundreds of millions of dollars, if not billions.

Anthropic, founded in 2021, is focused on more reliable, explainable and steerable LLMs, and has raised over $1 billion, with the most recent investment of $300 million coming from Google.
AI21labs has raised $119 million for its Jurassic-1 text model. With over 178 billion parameters, Jurassic-1 is similar in size to GPT-3.
Cohere has raised $165 million for LLMs and natural language processing (NLP) as a service.
BLOOM is a private–public research LLM project supported by private sector Hugging Face and European research institutes to create an open source LLM with 176 billion parameters. It has been trained on 46 human languages, including twenty African languages that are underrepresented in most LLMs.
UK based Stability AI recently raised a whopping $100 million for a valuation north of $1 billion for its open-sourced image generation service, Stable Diffusion.

BigTech is aware of their model limitations, particularly Microsoft, which recently announced that enterprises will be able to “fine-tune” their models without fear of proprietary data being shared in order to build a better model for all.

However, these steps will not satisfy everyone. Aleph Alpha, a German startup that has raised $31 million, is addressing enterprise concerns about BigTech Foundational Models with its own “European” centric models. But it is unclear whether they will be able to compete at scale.

BigTech will win the race for horizontal Foundational Models, leaving room for a few highly capitalized startup alternatives. Perhaps open-source models like BLOOM and Stable Diffusion will get scale or at least find a niche existence. As is customary, there will be tools and service providers who profit from making it easier to work with these Foundational Models. But overall:

BigTech’s market dominance will be amplified by their ability to effectively give away their Foundational Models for free because they will make the majority of their money from their underlying cloud services.

3. Generative AI Applications

While BigTech will win the picks and shovels of the GenAI gold rush, the application layer is much more of a level playing field. Existing enterprise software companies, “full stack” startups, and thousands of startups enabled by these Foundational Models will offer new GenAI applications.

Traditional enterprise software companies, such as Salesforce and Microsoft, will organically or through acquisition bring GenAI capabilities to their billions of users. Microsoft is also integrating its GenAI chatbot service into its Bing search application, directly challenging Google’s search hegemony.

A small number of well-funded startups will offer specialized “full stack” applications. In domains with specialized data, sequences, and computational requirements, these companies will develop their own underlying Foundational Models. For example, such startups could revolutionize drug discovery and materials science by building their own GenAI models and applications. Investors will be drawn to these startups as they could offer substantial financial rewards as well as strong competitive defensibility.

Adept AI, for example, has raised $65M to develop the next generation of robotic process automation (RPA) with natural language interfaces based on LLMs. In stealth mode, Inflection.ai is doing something similar. Character.AI, a chatbot that adopts the voice and knowledge of characters, raised $200M to $250M at a circa $1 billion valuation for a full-stack implementation of specialized LLMs to support live-agent enterprise applications.

The adoption of GenAI will be extremely fast. If a first draft of, say, an AI generated marketing pitch isn’t perfect, then it is simple to edit. ChatGPT was the fastest growing consumer app in history, with over 100 million monthly active users in just over two months after launch. This means that the battle for the nearly infinite number of GenAI creative applications will be fierce and fast.

Examples of primarily startups that provide apps to address major GenAI use cases

There will be a “Copilot” GenAI app for every imaginable use case

Putting GenAI to use will see consumers, businesses, and organizations around the world use applications enabled by startups built on top of these Foundational Models. Many GenAI startups will use the “Copilot for X” business model to assist users with “creative” tasks like writing or coding, as well as repetitive tasks like data entry or form filling. Here are a few of the startups competing to make money in various vertical use cases.

General text writing startups are assisting users in real-time with day-to-day writing tasks such as email composition, document creation, and text form completion. AI21labs’s Wordtune will “rewrite your text as if it were a professional copywriter.” The king of writing assistants is Grammarly, which has banked over $400 million. The list of writing startups is long and includes Lex, HyperWrite, Compose AI, and Rytr.
Sales and marketing startups include the mammoth Jasper.ai, which has raised $145M. Anyword has raised over $45 million to provide “high-converting textual content for sales.” Persado has raised over $66 million for language generation and “outperforms your best copy 96% of the time.” Startups are increasingly specializing in specific tasks such as writing product marketing descriptions.
Image generation startups are being powered by OpenAI’s DALL-E-2, Stability AI’s Stable Diffusion, and Midjourney’s text-to-image Foundational Models. Startups include Artbreeder, which helps users create collages.
Consumer facial and avatar startups include Lightricks’s Facetune app, which assists in creating the “perfect” Instagram image. Lightricks has raised $350 million. Individual “magic avatars” can be created by users of the very popular Lensa AI app. Reface, which lets users swap their faces into different settings, has raised $5.5 million.
Product design startups include Botika, which is “reinventing fashion shoots” with hyper-realistic images of models dressed in high-quality clothing in various settings. Maket assists in “generating architectural plans from text prompts in minutes, not months.” Tailorbird expedites the creation of floor plans for homeowners looking to renovate. Swapp has raised $7 million to help automate construction documents for projects. TestFit has raised $22 million to aid in real-estate design.
Video-focused startups offer video ideation, generation, editing, and workforce collaboration tools. Runway is the most well-funded with nearly $100 million in the bank. Magnifi has raised over $60 million for video editing, while InVideo has raised over $53 million. Several startups, including Hour One, which has raised $26 million, provide text-to-video services. Synthesia, based in London, has raised over $67 million for its avatar video creation platform. Overall, NFX is tracking 54 companies that have raised a total of $0.5 billion for generative video startups.
Audio GenAI startups include music creation companies Soundraw, Boomy and Aiva. Splash has raised $23 million and allows users to create original music and sing lyrics to any melody. DupDub has raised over $250 million for voice-over services and claims a million users. Descript has raised over $100 million and provides voice cloning for audio transcription, podcasting, screen recording, audio, and video editing. Deepgram’s speech-to-text service competes with BigTech and OpenAI’s Whisper and has received over $87 million in funding.
Games generation startups hope to save production studios hundreds of millions of dollars in production costs. Masterpiece Studio has raised $6 million to create 2D to 3D models. Replica has raised $5 million to focus on AI voice actors for games, films and the metaverse. Latitude/AI Dungeon is a game studio that has raised $4 million for text based game generation. VoiceMod has raised over $7 million to provide real-time voice changing in games like Fortnite and apps like Skype. Ponzu is a startup for creating 3D surface textures, and Charisma AI is a startup for creating non-player character (NPC) virtual characters. Inworld has raised $70 million for its AI developer platform for the “creation of immersive realities, virtual characters, and metaverse spaces”. Overall, A16Z currently tracks more than 50 startups in the games industry.
Chatbot and conversational AI startups include vertical health symptom checkers ada, which has raised $190 million, and UK-based Healthily, which has raised about $70 million. Given that AI could save call centre businesses $80B annually, startups are raising massive sums. Cresta AI has raised more than $150 million, and London based PolyAI has raised $68 million for its “superhuman voice assistants.”
Coding co-pilot startups are following the lead of Microsoft’s GitHub Copilot, which claims that up to 40% of code can be generated automatically. Warp, a company that converts natural language into computer commands, has raised $70 million. Tabnine has raised $30 million.
Knowledge management, summarization, and enterprise search startups include Primer AI, which has raised $168 million, and Otter, which has raised $63 million. Sana Labs, a Stockholm based startup, has raised $54.6 million to facilitate the discovery, sharing, and repurposing of information within organizations.

So which startups will win?

There is no shortage of capital flowing into GenAI application startups. Full stack startups will raise large sums of money in vertical domains such as drug discovery, where they will create highly specialized models and applications. In the broader B2B space, the race will be horizontal and vertical, with copilot business models at the centre. On the one hand horizontal startups will provide services across industries, such as Jasper’s sales and marketing assistant. On the other hand, startups are increasingly vertically focused by industry, function, and task.

Winners will achieve scale and defensibility by implementing the following:

Strong ROI – for their use case, as well as a short time to proof of value.
Proprietary and customized Foundational Models – “fine tuned” for specific audiences using localized, specialized, and proprietary company data.
Workflows – proving usability and deep integration into customer processes, making it difficult to remove once installed.
Feedback loops – from reinforcement learning from human feedback (RLHF), for example, to improve model alignment with user intent.
Flywheel dynamics – the more RLHF and other feedback, the better the model performance through “fine tuning”, the greater the usage, and thus the more momentum grows.
Scale and speed of investment – with lower profit margins, as much of the IP belongs in the Foundational Models, the game is all about scale. Those who can quickly build their brand and attract a high number of users and customers to get the flywheel spinning will thrive as category leaders.

In the B2C GenAI consumer space, horizontal players with speed and massive consumer acquisition budgets are likely to win their race.

AutogenAI, based in the UK, is an example of a B2B startup company that is well positioned to win its category of bid management copilot. They’ve spent the last two years developing an app that helps businesses save time and money while also improving the quality of bids, tenders, and proposals. They have “fine-tuned” the OpenAI LLM using examples of company website content, winning and losing sales bids, marketing copy and annual reports. They also provide a human-machine supervisory user interface to assist in reviewing the source and accuracy of generated content and facts. This also provides a critical human reinforcement learning loop with increased usage. Customers are increasingly using their application as a next generation knowledge management and search tool, making it stickier.

A few GenAI startups will be acquired and become features in larger enterprise and consumer applications. For example, large social media companies with millions of users will acquire the latest face and avatar creation startups. Incumbent graphic design software companies will acquire the most promising image and video editing startups. Microsoft, for example, is now offering GenAI “Microsoft Dynamics 365 Copilot” natively as part of its CRM and ERP applications.

In short, a few lucky and brave startups will hit pay dirt if they can quickly build scale and a flywheel for their copilot use cases. Similarly, a few full stack startups will prosper in specialized use cases like drug discovery. Due to their large fundraising rounds, uniform markets, and quick adoption of innovation by people, businesses, and governments, US startups will dominate. But the majority of startups will go home empty-handed, having contributed to the profits of the providers of the picks and shovels of this gold rush, predominantly American BigTech.

This is the first in a series of posts about who will make money from Generative AI. In subsequent posts I will discuss which organizations will benefit the most from GenAI, as well as which countries and citizens will benefit the most from this technology.

I welcome your feedback.

Simon Greenman is a pioneer in artificial intelligence and technology innovation. As co-founder of MapQuest, he helped launch one of the first internet and AI brands. Currently a Partner at Best Practice AI advising on AI strategy, technology, and governance, he recently served on the World Economic Forum's Global AI Council, contributing to their Board and C-Suite AI toolkits. Simon has spent over a decade as Chief Digital Officer leading digital transformations of directory companies and was CEO of HomeAdvisor Europe that offers leading marketplaces for tradespersons. He worked with prominent companies like Bowers & Wilkins, AOL, and Accenture. He is active in the UK start-up ecosystem and holds an MBA from Harvard Business School as well as a BA in Computing and Artificial Intelligence from the University of Sussex. He is a Fellow of the Royal Geographic Society.

Original. Reposted with permission.

How the LDMs in knowledge graphs can complement LLMs

Large language models (LLMs) fit parameters (features in data topography) to a particular dataset, such as text scraped off the web and conformed to a training set.

Logical data models (LDMs), by contrast, model what becomes shared within entire systems. They bring together the data in a system with the help of various kinds of logic. As such, they are a primary means of managing the data in a system, the connecting glue that brings the system together and makes the data foundation reusable as a whole.

Knowledge graphs provide an any-to-any means of connecting and articulating the meaning of data in a system. At the heart of each knowledge graph is an extensible graph data model called an ontology. The ontology contextualizes the disparate instance data brought into the graph. Ideally, it makes all data in the graph findable, accessible, interoperable, and reusable (FAIR).

Once connected and contextualized, the data in the system becomes a holistic search, data management and decision making resource, rather than just a collection of stranded datasets.

Ontologies in graphs can be used directly: Conceptual, logical and physical are all created at the same time. And given the right design, they can be reliably generated.

In that sense, a knowledge graph is the data fabric. It is the data mesh. It is the key means businesses can use to transform information systems, by harnessing the knowledge power of ontologies within the context of extensible, articulated, graph-connected data.

Data modeling and management at scale versus data sprawl

Businesses have a choice. They can transform their systems using a knowledge graph approach, a feasible means of systems transformation at scale by modeling the essence of what they do and how so that it’s available to both humans and machines. Or they can continue to add to the data sprawl they already have.

Unnecessary complexity gets in the way of transformation, so it’s important to consider how to reduce complexity rather than add to it. Dave McComb, author of Software Wasteland, said in a Business Rules Community interview: “Starting with a single, simple core data model is the best prescription to reducing overall complexity. Most application code — and therefore most complexity — comes from the complexity of the schema the code is written to.”

How’s the model useful? Later in this interview, McComb says, “I often ask professionals to imagine what a data lake would be like if they could build transactional applications directly on, and write to, the data lake. This is essentially the goal of a data-centric architecture: to be able to read and write to a single, simple, shared data model, with all the requisite data integrity and security required in enterprise applications.”

How shared data models are evolving in the public sector

The Dataverse Project began in 2006 with a common repository framework that allowed researchers to “share, find, cite, and preserve research data. The Dataverse is for all, including individual researchers who need to make their datasets accessible to others.”

The Project’s mandate is broader today: To provide a global, open source data management and sharing solution for researchers. Researchers can set up their own Dataverses and federate their data with others. Ideally, they will all be able to search across Dataverses. The Project has over 100 installations worldwide.

It’s interesting to see how the Project’s methods are evolving along with its mandates. Slava Tykhonov, a researcher at Data Archiving and Networking Services (DANS-KNAW) in the Netherlands, presented in August 2023 on his efforts in collaboration with Jim Myers, who’s a senior developer and architect for the Global Dataverse Community Consortium.

DANS’ Dataverse installation has what it calls Data Stations, which are operational and planned data services based on Dataverse technology. The data stations Tykhonov mentioned include the DataverseNL, Data Vault, Social Sciences, Humanities, Archaeology, and Generic. DANS is using international projects to extend the technology, make it more robust and interoperable.

Tykhonov noted that earlier attempts at interoperability involved mapping across conventional metadata schemas, which worked to some extent. But now the goal is to add Linked Data/semantic web technology and global knowledge graphs, with Dataverse instances conceived as nodes in an interlinked data network.

Controlled vocabularies and the interconnecting logic of knowledge graphs

More specifically, Tykhonov and Myers are working on providing external controlled vocabulary (CV) support to Dataverses. Researchers in individual domains have been modelling their own domains, and each CV can be considered a subgraph designed to be connectable to other subgraphs, whether instance or logical model data (descriptions, declarations and rules).

The external controlled vocabulary support Tykhonov refers to is key to adhering to findable, accessible, interoperable and reusable (FAIR) data principles for Dataverse data management purposes. The CV support was first developed and later written up as part of SSHOC, a project funded by the EU’s Horizon 2020 programme. This diagram provides an overview of the SKOSMOS plug-in CV support architecture:

Myers and Tykhonov, “A Plug-in Approach to Controlled Vocabulary Support in Dataverse,” Dataverse Project Community (Zenodo) site, August 2023.

So how does SKOSMOS help with interoperability? The logic of scientific research domains is captured in these various interrelated controlled vocabularies. In other words, scientists have described how domains and the terms that describe them work together. And Tykhonov and Myers are linking those CVs to the previous metadata schemas to scale the interconnection and discovery process with description logic.
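To make the idea of linking metadata terms to a controlled vocabulary more concrete, here is a minimal sketch in Python. It is not the SKOSMOS plug-in or any Dataverse API: the `Concept` class, the `example.org` URIs and the three-term vocabulary are all hypothetical, loosely modelled on the preferred label, alternative label and broader-concept ideas found in SKOS-style vocabularies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Concept:
    """One term in a hypothetical controlled vocabulary."""
    uri: str
    pref_label: str
    alt_labels: Tuple[str, ...] = ()
    broader: Optional[str] = None  # URI of the parent concept, if any


# Hypothetical vocabulary; the URIs and labels are illustrative only.
VOCAB = {
    c.uri: c
    for c in [
        Concept("https://example.org/cv/science", "Science"),
        Concept("https://example.org/cv/archaeology", "Archaeology",
                ("archeology",), "https://example.org/cv/science"),
        Concept("https://example.org/cv/excavation", "Excavation",
                ("dig", "field excavation"),
                "https://example.org/cv/archaeology"),
    ]
}


def resolve(keyword: str) -> Optional[Concept]:
    """Map a free-text keyword to a concept via its preferred or alternative labels."""
    needle = keyword.strip().lower()
    for concept in VOCAB.values():
        labels = (concept.pref_label,) + concept.alt_labels
        if needle in (label.lower() for label in labels):
            return concept
    return None


def broader_chain(concept: Concept) -> list:
    """Walk up the 'broader' links and return the labels of all ancestors."""
    chain = []
    current = concept
    while current.broader is not None:
        current = VOCAB[current.broader]
        chain.append(current.pref_label)
    return chain


match = resolve("Dig")
ancestors = broader_chain(match)
```

Once a keyword resolves to a shared URI rather than a local string, two repositories that used different spellings can be searched together, and the broader-concept links give a simple path for widening a query.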

Controlled vocabularies are just one variety of logical modeling that can be used to federate the likes of graph-enabled repositories. They are on the humble side of strong semantics and interoperability, but are certainly a necessary and viable starting point. The diagram below puts controlled vocabularies, thesauri, etc. within the context of other complementary logical semantic modeling methods that can help with interoperability.

Building context and reasoning capability into semantic metadata

Using CV metadata as a “supervisor” within an LLM context

As I understand it, Tykhonov has been working with Meta’s Llama (an LLM) so that, with the help of SPARQL (a standard knowledge graph query language) queries, it can accurately retrieve answers from the Dataverse network to questions that aren’t covered by the Llama dataset.

This is similar to what Denny Vrandečić of the Wikimedia Foundation spoke about at the Knowledge Graph Conference earlier this year–i.e., augmenting an LLM’s capabilities with a knowledge graph’s capabilities, as well as the capabilities of other tooling. LLMs, Tykhonov points out, can be prompted to answer in JSON-LD. Knowledge graphs can ingest JSON-LD directly.
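As a rough illustration of why JSON-LD output is convenient for a graph, the sketch below turns a small JSON-LD document into subject-predicate-object triples using only the Python standard library. The document, its `example.org` identifiers and the `to_triples` helper are assumptions made for illustration. It only handles a flat context that maps terms to IRIs and nodes that carry an `@id`; a production system would use a full JSON-LD processor instead.

```python
import json

# Hypothetical LLM answer, formatted as JSON-LD. All identifiers are made up.
llm_answer = """
{
  "@context": {
    "name": "http://schema.org/name",
    "creator": "http://schema.org/creator"
  },
  "@id": "https://example.org/dataset/42",
  "name": "Survey of Dutch archaeological sites",
  "creator": {
    "@id": "https://example.org/person/7",
    "name": "A. Researcher"
  }
}
"""


def to_triples(node, context, triples=None):
    """Flatten a JSON-LD node into (subject, predicate, object) triples.

    Simplified: assumes every node has an "@id" and that the context is a
    plain term-to-IRI mapping. Blank nodes and typed literals are not handled.
    """
    if triples is None:
        triples = []
    subject = node["@id"]
    for key, value in node.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        predicate = context.get(key, key)
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict) and "@id" in item:
                # Link to the nested node, then describe the nested node itself.
                triples.append((subject, predicate, item["@id"]))
                to_triples(item, context, triples)
            else:
                triples.append((subject, predicate, item))
    return triples


doc = json.loads(llm_answer)
triples = to_triples(doc, doc["@context"])
for triple in triples:
    print(triple)
```

Because each statement carries full identifiers, the resulting triples can be merged with existing graph data or checked against it before the answer is trusted.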

The evident promise is that a multi-capability feedback loop could cure some of the major ills that LLMs have as a standalone resource. Vrandečić also mentioned that LLMs are far too expensive for questions that could be easily asked via knowledge graphs.

Cost factors are first among the many issues that we’ll have to address by stepping back and evaluating how whole systems should evolve in light of LLM advances.

References

Tykhonov, Vyacheslav. (2023, August 22). Knowledge Graphs and Semantic Search in Dataverse. Zenodo.

James D. Myers, & Vyacheslav Tykhonov. (2023). A Plug-in Approach to Controlled Vocabulary Support in Dataverse.

Meta releases a data set to probe computer vision models for biases

Kyle Wiggers

Continuing on its open source tear, Meta today released a new AI benchmark, FACET, designed to evaluate the “fairness” of AI models that classify and detect things in photos and videos, including people.

Made up of 32,000 images containing 50,000 people labeled by human annotators, FACET — a tortured acronym for “FAirness in Computer Vision EvaluaTion” — accounts for classes related to occupations and activities like “basketball player,” “disc jockey” and “doctor” in addition to demographic and physical attributes, allowing for what Meta describes as “deep” evaluations of biases against those classes.

“By releasing FACET, our goal is to enable researchers and practitioners to perform similar benchmarking to better understand the disparities present in their own models and monitor the impact of mitigations put in place to address fairness concerns,” Meta wrote in a blog post shared with TechCrunch. “We encourage researchers to use FACET to benchmark fairness across other vision and multimodal tasks.”

Certainly, benchmarks to probe for biases in computer vision algorithms aren’t new. Meta itself released one several years ago to surface age, gender and skin tone discrimination in both computer vision and audio machine learning models. And a number of studies have been conducted on computer vision models to determine whether they’re biased against certain demographic groups. (Spoiler alert: they usually are.)

Then, there’s the fact that Meta doesn’t have the best track record when it comes to responsible AI.

Late last year, Meta was forced to pull an AI demo after it wrote racist and inaccurate scientific literature. Reports have characterized the company’s AI ethics team as largely toothless and the anti-AI-bias tools it’s released as “completely insufficient.” Meanwhile, academics have accused Meta of exacerbating socioeconomic inequalities in its ad-serving algorithms and of showing a bias against Black users in its automated moderation systems.

But Meta claims FACET is more thorough than any of the computer vision bias benchmarks that came before it — able to answer questions like “Are models better at classifying people as skateboarders when their perceived gender presentation has more stereotypically male attributes?” and “Are any biases magnified when the person has coily hair compared to straight hair?”
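One simple way to put a number on a question like the skateboarder one is to compare how often a model correctly recognizes the class within each group. The Python sketch below does that with per-group recall. It is an assumption-laden illustration, not Meta's evaluation code: the group names, labels and records are invented, and the FACET white paper should be consulted for the metrics Meta actually reports.

```python
from collections import defaultdict


def recall_by_group(records, target_class):
    """Per-group recall for one class.

    Each record is (group, true_label, predicted_label). Only records whose
    true label is the target class count toward recall.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, true_label, predicted_label in records:
        if true_label != target_class:
            continue
        totals[group] += 1
        if predicted_label == target_class:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Invented toy data: (perceived presentation group, true class, model prediction).
records = [
    ("more_masculine", "skateboarder", "skateboarder"),
    ("more_masculine", "skateboarder", "skateboarder"),
    ("more_masculine", "skateboarder", "skateboarder"),
    ("more_masculine", "skateboarder", "person"),
    ("more_feminine", "skateboarder", "skateboarder"),
    ("more_feminine", "skateboarder", "person"),
    ("more_feminine", "skateboarder", "person"),
    ("more_feminine", "skateboarder", "skateboarder"),
    ("more_feminine", "doctor", "doctor"),  # ignored: not the target class
]

recalls = recall_by_group(records, "skateboarder")
gap = recalls["more_masculine"] - recalls["more_feminine"]
print(recalls, gap)
```

A gap near zero would suggest the model recognizes the class about equally well for both groups in this sample; a large gap is the kind of disparity a benchmark like FACET is meant to surface.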

To create FACET, Meta had the aforementioned annotators label each of the 32,000 images for demographic attributes (e.g. the pictured person’s perceived gender presentation and age group), additional physical attributes (e.g. skin tone, lighting, tattoos, headwear and eyewear, hairstyle and facial hair, etc.) and classes. They combined these labels with other labels for people, hair and clothing taken from Segment Anything 1 Billion, a Meta-designed data set for training computer vision models to “segment,” or isolate, objects and animals from images.

The images from FACET were sourced from Segment Anything 1 Billion, Meta tells me, which in turn were purchased from a “photo provider.” But it’s unclear whether the people pictured in them were made aware that the pictures would be used for this purpose. And — at least in the blog post — it’s not clear how Meta recruited the annotator teams, and what wages they were paid.

Historically and even today, many of the annotators employed to label data sets for AI training and benchmarking come from developing countries and have incomes far below the U.S.’ minimum wage. Just this week, the Washington Post reported that Scale AI, one of the largest and best-funded annotation firms, has paid workers at extremely low rates, routinely delayed or withheld payments and provided few channels for workers to seek recourse.

In a white paper describing how FACET came together, Meta says that the annotators were “trained experts” sourced from “several geographic regions” including North America (United States), Latin America (Colombia), Middle East (Egypt), Africa (Kenya), Southeast Asia (Philippines) and East Asia (Taiwan). Meta used a “proprietary annotation platform” from a third-party vendor, it says, and annotators were compensated “with an hour wage set per country.”

Setting aside FACET’s potentially problematic origins, Meta says that the benchmark can be used to probe classification, detection, “instance segmentation” and “visual grounding” models across different demographic attributes.

As a test case, Meta applied FACET to its own DINOv2 computer vision algorithm, which as of this week is available for commercial use. FACET uncovered several biases in DINOv2, Meta says, including a bias against people with certain gender presentations and a likelihood to stereotypically identify pictures of women as “nurses.”

“The preparation of DINOv2’s pre-training dataset may have inadvertently replicated the biases of the reference datasets selected for curation,” Meta wrote in the blog post. “We plan to address these potential shortcomings in future work and believe that image-based curation could also help avoid the perpetuation of potential biases arising from the use of search engines or text supervision.”

No benchmark is perfect. And Meta, to its credit, acknowledges that FACET might not sufficiently capture real-world concepts and demographic groups. It also notes that many depictions of professions in the data set might’ve changed since FACET was created. For example, most doctors and nurses in FACET, photographed during the COVID-19 pandemic, are wearing more personal protective equipment than they would’ve before the health crisis.

“At this time we do not plan to have updates for this data set,” Meta writes in the whitepaper. “We will allow users to flag any images that may be objectionable content, and remove objectionable content if found.”

In addition to the data set itself, Meta has made available a web-based data set explorer tool. To use it and the data set, developers must agree not to train computer vision models on FACET — only evaluate, test and benchmark them.

OpenAI’s ChatGPT Enterprise Focuses on Security, Scalability, and Customization

OpenAI's ChatGPT has been making waves in the business world, and the recent launch of ChatGPT Enterprise stands as a testament to its rising prominence. Boasting enhanced features like enterprise-grade security, unlimited GPT-4 access, longer context windows, and a slew of customization options, ChatGPT Enterprise promises to be an all-in-one AI assistant for modern businesses. With its expansive capabilities, it aims to redefine the role of AI in corporate settings by assisting in diverse operational tasks.

Within just nine months of its original launch, ChatGPT has been adopted by 80% of Fortune 500 companies, signifying its rapidly growing influence. Corporate heavyweights such as Block, Canva, Carlyle, Estée Lauder, PwC, and Zapier have been testing ChatGPT Enterprise for a range of functions. From crafting clearer communications to accelerating coding tasks and aiding creative work, the AI's applicability seems nearly limitless.

“With the integration of ChatGPT Enterprise, we're aimed at achieving a new level of employee empowerment,” said Sebastian Siemiatkowski, CEO at Klarna.

Security and Privacy: A Prime Focus

One of the cornerstone features of ChatGPT Enterprise is its commitment to data security and privacy. The platform is SOC 2 compliant and ensures that all conversations are encrypted both in transit and at rest. The newly added admin console allows seamless team management while offering domain verification, SSO, and usage insights—critical components for large-scale enterprise deployment.

Performance and Scalability

The Enterprise edition of ChatGPT brings a host of performance improvements, including up to twice the speed of the regular version and a 32k token context window for processing inputs that are four times longer. One of the standout features is its advanced data analysis capabilities, formerly known as Code Interpreter, which caters to a variety of professionals—from financial analysts to data scientists.

“ChatGPT Enterprise has cut down research time by an average of an hour per day, increasing productivity for people on our team,” noted Jorge Zuniga, Head of Data Systems and Integrations at Asana.

Customization and Collaboration

ChatGPT Enterprise goes a step further by offering shared chat templates for companies to build common workflows. Organizations looking to mold the AI tool according to their unique needs can also leverage free API credits included in the pricing.

Future Roadmap

OpenAI has revealed plans to enhance ChatGPT Enterprise even further, with impending features like secure customization through integration with existing company data, availability for smaller teams, and specialized tools for roles like data analysts and marketers.

ChatGPT Enterprise comes as a comprehensive solution aimed at increasing productivity while maintaining high security and privacy standards. Its wide range of features and customization options make it a versatile tool for various business applications.

“It’s become a true enabler of productivity, with the dependable security and data privacy controls we need,” said Danny Wu, Head of AI Products at Canva.

With its continued innovation and user-focused approach, ChatGPT Enterprise appears well-positioned to transform the AI application landscape in the corporate world.

Middle East’s AI Ambitions Face Hurdles as US Chip Restrictions Extend

Nvidia in a regulatory filing recently revealed that the U.S. has expanded restrictions on the exports of its chips beyond China, to other regions including some countries in the Middle East. AMD is also said to have received a similar notification.

This comes on the heels of reports that have revealed the procurement of a minimum of 3,000 H100s by Saudi Arabia. These chips, specifically designed to aid the development of generative AI models, were obtained through a public research institution, King Abdullah University of Science and Technology.

Similarly, the UAE has gained access to a substantial number of Nvidia chips and has already developed Falcon—its own open-source large language model. This achievement occurred at the state-owned Technology Innovation Institute in Masdar City, Abu Dhabi.

Engineers, researchers, and American chip maker Cerebras also recently came together to create Jais, an Arabic language model for generative AI applications. The new model contains 13 billion parameters and was developed from a vast dataset that blends Arabic and English, including code.

Saudi Arabia and the UAE have sought thousands of Nvidia chips. The Gulf nations have publicly declared their aim to become AI leaders as they pursue ambitious strategies to energise their economies. The restrictions appear intended to curb these ambitions, which seem to have triggered concerns about potential misuse by autocratic leaders in these wealthy oil states.

Previously, restrictions were imposed on the supply of Nvidia’s H100. However, the company found a way around them by introducing the A800.

The post Middle East’s AI Ambitions Face Hurdles as US Chip Restrictions Extend appeared first on Analytics India Magazine.

4 data compliance standards to know for 2023

Data is crucial in most industries today. As the amount of business information grows, so do the standards for protecting people’s personal information. With cyberattacks growing more advanced, security compliance frameworks and cybersecurity have become essential to ensuring data is collected, organized, stored, and managed in a safe way. This article will start by explaining what information security compliance means. Then, it will cover different data compliance standards and discuss some of the challenges.

Key Takeaways:

Data compliance is a formal regulatory structure that governments have in place to ensure that data and digital assets are collected, organized, stored, and managed in a safe way.
Not adhering to data compliance standards and regulations can lead to various penalties such as hefty fines, legal recourse, and reputational damage.
Four of the most common data regulations governing specific industries are HIPAA, PCI DSS, GDPR, and CCPA.


What is data compliance?

Data compliance standards contain rules, recommendations, and optimal procedures that organizations must follow while managing data. Following these standards is pivotal to safeguarding data privacy, mitigating information breaches, and upholding the trust between organizations and their clients. Organizations often follow many standards based on their industry, geographic location, and the types of data they handle. Achieving and maintaining compliance can be complex and involve implementing various technical, organizational, and procedural measures.

What is SOC 2 compliance?

SOC 2, created by the American Institute of Certified Public Accountants (AICPA), is a security standard for service organizations to follow in handling customer data. It’s guided by five Trust Services Criteria: security, availability, processing integrity, confidentiality, and privacy. Each organization shapes its SOC 2 report to fit its needs, designing controls that match the criteria that apply to it. These reports give customers, partners, and stakeholders information about how data is managed. They help everyone understand how the organization operates in terms of information security best practices.

Why is data compliance important?

Not following data protection security frameworks can result in hefty fines and legal troubles. Organizations must follow the relevant standards to avoid these problems and have a good reputation to sign customers and break into new markets. Good data compliance keeps sensitive information safe from unauthorized access or changes. This stops information breaches, which can be very expensive. Following information protection practices shows respect for the consumers’ privacy and builds trust.

When organizations focus on information security compliance, they show they care about security, and about their clients. This can make them stand out and attract customers who request this level of data security compliance. Strict data compliance regulations help organizations manage information better, find issues early, and lower the risks of breaches.

Data compliance vs. data security

Data security is how organizations protect their digital information from cyberattacks from outsiders. Examples of tools for this include antivirus software, firewalls, multi-factor authentication (MFA), and monitoring network security.

Security and compliance are related, but they have differences.

Organizations choose their security tools, while regulations from legislators and agencies shape compliance standards. Information security is part of data compliance, but having a security system doesn’t guarantee data compliance. Organizations can even create stricter security than required by data compliance to fully cover their needs for data security.

Data compliance is a bigger concept than information security. While security focuses on hacking, compliance focuses on your information security practices being in line with relevant security standards or regulations.

Four key data compliance standards

Data compliance standards are external regulations from regulating bodies for safeguarding data. Various data types need different levels of protection under diverse regulations. Now, let’s explore four of the main information compliance standards.

Health Insurance Portability and Accountability Act

Health Insurance Portability and Accountability Act (HIPAA), a significant U.S. legislation, serves as a protective shield for sensitive health information of patients and members of health plans. Its primary goal is to ensure that an individual’s protected health information (PHI) remains confidential and cannot be disclosed without their knowledge or explicit consent.

HIPAA compliance rules cover healthcare information for patients dealing with hospitals, insurers, or anyone involved in healthcare. This information must stay protected in databases, devices, servers, and during transmission. Modern compliant technology ensures secure usage, reducing reliance on older methods like encrypted email.

HIPAA compliance places specific obligations on healthcare organizations beyond mere protection. Alongside safeguarding patient privacy, HIPAA mandates measures to counter healthcare fraud. This is crucial for maintaining the integrity of healthcare systems. Moreover, the law emphasizes the need for timely and transparent communication with patients in case of security breaches involving their information.

By enforcing these requirements, HIPAA sets a standard reinforcing patient trust in the healthcare industry. It promotes responsible information management and aligns with the evolving landscape of digital health information.

Payment card industry data security standard

Payment Card Industry Data Security Standard (PCI DSS) compliance is about securing credit card payments. It ensures that your payment details are kept safe when you buy something. Unlike HIPAA, which focuses on healthcare data, PCI DSS protects payment information when you pay with a credit card. This includes card numbers, names, addresses, and more. While HIPAA is overseen by governments, PCI is created by big credit card companies like Visa, Mastercard, and American Express. They enforce it by punishing stores or payment processors that don’t follow the regulations. Penalties can be fines for each violation or even stopping a store’s ability to accept credit cards.

The PCI Security Standards Council (SSC) offers comprehensive standards and supporting resources to strengthen payment card information security. These contain specification frameworks, tools, measurements, and support materials that aid organizations in maintaining the security of cardholder information while transmitting the information. Central to the council’s efforts, the PCI DSS forms the foundation. It outlines the essential structure for creating comprehensive systems and procedures for safeguarding payment card data. This encompasses measures for prevention, detection, and response to security incidents.

General data protection regulation

General Data Protection Regulation (GDPR) is renowned as one of the world’s most stringent information privacy regulations. It has authority over the entire European Union and several other participating nations. GDPR mandates strict controls for user data by businesses. This encompasses reporting on proper information usage (restricted to genuine business needs), ensuring accessible avenues for users to access and delete their information, and obtaining documented consent whenever information is requested from users. This requirement obliges businesses to provide detailed justifications to customers for information collection, whether for analytics, recurring payments, email marketing, or other scenarios.

Organizations must implement “appropriate technical and organizational measures” to protect personal information. This entails employing suitable safeguards to prevent unauthorized access or breaches. GDPR places limitations on how information can be processed. It sets strict rules governing individual consent for information processing activities. Consent must be obtained in a way that makes clear to individuals how their data will be used.

Under specific conditions, organizations must appoint a “Data Protection Officer (DPO)” who oversees data protection activities and ensures compliance. GDPR emphasizes individual rights to information privacy. This includes the right for individuals to access their own data held by organizations and the ability to request the deletion of their information when appropriate. These measures contribute to the comprehensive framework of GDPR compliance, aiming to uphold individual privacy and information security.

California Consumer Privacy Act

California Consumer Privacy Act (CCPA) draws several parallels with GDPR. Targeting California residents and entities operating within the state, CCPA defines personal information as commonplace details like names, addresses, phone numbers, email addresses, and similar information. CCPA mandates that businesses establish mechanisms for consumers to opt out, enabling them to halt engagement or stop the sale of their information to third parties.

A notable difference between GDPR and CCPA (as well as many U.S. regulations) lies in their approach. GDPR operates on an “opt-in” basis, necessitating companies to secure consent before any data-related activities. CCPA follows an “opt-out” framework, allowing businesses to use consumer information until instructed otherwise.
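The practical difference between the two approaches comes down to what happens when a user has not made a choice yet. The Python sketch below encodes only that default, as the paragraph above describes it. The `Regime` enum and `may_use_data` function are hypothetical names, and real consent handling under either law involves many more conditions than this simplified rule.

```python
from enum import Enum
from typing import Optional


class Regime(Enum):
    OPT_IN = "opt-in"    # GDPR-style: consent is needed first
    OPT_OUT = "opt-out"  # CCPA-style: allowed until the user objects


def may_use_data(regime: Regime, user_choice: Optional[bool]) -> bool:
    """Decide whether data may be used under a simplified consent model.

    user_choice is True if the user agreed, False if the user refused or
    opted out, and None if no choice has been recorded.
    """
    if user_choice is not None:
        return user_choice  # an explicit choice always wins
    # No recorded choice: the regime's default applies.
    return regime is Regime.OPT_OUT


print(may_use_data(Regime.OPT_IN, None))
print(may_use_data(Regime.OPT_OUT, None))
```

An explicit choice overrides the default in both models; only the silent case differs.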

The CCPA gives California consumers the ability to:

Be informed about the information organizations possess about them
Request the removal of their personal information
Decline the accumulation and use of their personal information

The CCPA further ensures that businesses cannot discriminate against consumers who exercise their CCPA rights.

How to ensure data compliance within your organization

Here are five effective practices you can implement now to ensure data compliance:

Know the legal landscape

To follow the law, you first need to know it. Stay informed on laws and security standards that apply to your industry, including all places you work or have customers. Stay focused on the rules that matter to your organization. Understand what you must do to meet them and stay informed and prepared.

Train your team

You can’t assume your staff knows all about data regulations without providing them with proper training. Train your team and refresh their knowledge of the legal and regulatory information compliance rules. Focus your training on data breaches and cybersecurity risks as well as how to prevent and report on these matters. Hackers often get access to information by tricking people. So, make sure your team knows how to protect against that.

Keep track of your data practices

How well an organization follows the regulations is shown by its evidence of following them. Make sure you can prove how you kept your information safe. Keep track of all your data practices. When new rules arise, update your policies and update your team again. Suitable proof and preparation keep your compliance strong.

Check vendors who can reach your data

When collaborating with vendors who access your company’s data, you must ensure they follow the same laws and rules as your organization. Assess the information security and protective measures of each vendor, especially with data access privileges.

Use new technologies

Modern technology is crucial in setting up information security compliance for your whole organization. It lets you check your data, find risks, and ensure you follow the rules. Using technology is a great way to identify gaps in your security and compliance that may have gone unnoticed.

Challenges of data compliance

Businesses navigate a complex network of regulations for data storage and processing. The consequences of failing to follow data requirements are steep, encompassing penalties like large fines based on global revenue, and, in severe cases, legal consequences such as imprisonment under federal regulations like HIPAA.

While businesses are motivated to adhere to information security regulations, the journey can be complex. A key hurdle lies in pinpointing which data is subject to compliance rules. Businesses must understand what information they collect, where it is stored, who is authorized to access it, and how long it is retained.
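
One way to make those four questions concrete is a simple data inventory. The following Python sketch is illustrative only: the DataAsset fields, the overdue_assets helper, and the sample entries are all hypothetical and would need to be adapted to an organization's real systems and retention policies.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class DataAsset:
    """Hypothetical inventory entry for one category of stored data."""
    name: str                    # what information is collected
    storage_location: str        # where it is stored
    authorized_roles: List[str]  # who is authorized to access it
    retention_days: int          # how long it may be retained
    collected_on: date

    def is_past_retention(self, today: date) -> bool:
        """True once the retention period has fully elapsed."""
        return today > self.collected_on + timedelta(days=self.retention_days)


def overdue_assets(inventory: List[DataAsset], today: date) -> List[str]:
    """Names of assets that have been held longer than their retention period."""
    return [asset.name for asset in inventory if asset.is_past_retention(today)]


# Made-up sample entries, for illustration only.
inventory = [
    DataAsset("newsletter_signups", "crm-db", ["marketing"], 365, date(2022, 1, 1)),
    DataAsset("billing_records", "finance-db", ["finance"], 2555, date(2022, 1, 1)),
]

print(overdue_assets(inventory, date(2023, 6, 1)))  # ['newsletter_signups']
```

Even a small inventory like this gives auditors one place to check location, access, and retention for each category of data.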

Beyond the technical complexities, businesses confront a shifting data compliance landscape. In 2023, California strengthened its data privacy regulations with the new updates to the California Privacy Rights Act (CPRA). Furthermore, states like Colorado, Connecticut, and Utah introduced new privacy laws slated for enforcement this year.

Conclusion

In today’s data-driven world, adhering to data compliance standards is essential. These guidelines provide a crucial framework for businesses to safeguard sensitive information and maintain customer trust. From healthcare to finance, diverse industries are bound by regulations that demand responsible information handling.

Organizations can navigate the complex landscape of data compliance by understanding the requirements, providing staff training, and using modern automation technology. Ensuring information security, protecting privacy, and upholding ethical practices are not only legal obligations but also vital for preserving reputation and customer loyalty.

As regulations evolve, staying informed about changes and adapting practices is critical. The interconnected nature of our digital ecosystem calls for constant vigilance, as inadvertent breaches can occur. Commitment to data compliance is an investment in both long-term success and the well-being of individuals whose information we handle. By embracing these standards, businesses can thrive in a data-driven era while respecting individual rights and contributing to a more secure and trustworthy digital environment.

ChatGPT Isn’t Made For European Union

Data privacy watchdogs from Europe to the United States have been trying to drag OpenAI, the company behind ChatGPT, to court since the chatbot was released in November 2022.

Yesterday, the company’s lawyers hit back, asking a San Francisco federal court to dismiss most of the lawsuits against the company, which allege that AI outputs infringe on copyright. OpenAI is not only struggling to live the American Dream but also seems to be having a hard time wooing the European Union.

While OpenAI was busy making its case before the US court, the company was hit with a complaint questioning its ability to comply with EU law. The 17-page document, filed with the Polish DPA, was put together by Lukasz Olejnik, a security and privacy researcher, together with Warsaw-based law firm GP Partners.

Earlier this year, Italy’s privacy watchdog, the Garante, ordered OpenAI to stop processing data locally, directing it to tackle a preliminary set of issues regarding lawful basis, information disclosure, user controls and child safety. ChatGPT resumed its service in Italy fairly quickly after the company tweaked its presentation.

While the Italian data protection authority’s investigation continues, a slew of other complaints under the General Data Protection Regulation (GDPR) have been lodged against ChatGPT. Notably, the GDPR is widely regarded as the world’s strictest data protection regime, and it has been copied widely around the world.

In April, the bloc’s data protection authorities joined forces to coordinate their approach to regulating the fast-developing technology. The European Data Protection Board, the umbrella organisation for data protection authorities, announced plans to set up an EU-wide task force to coordinate investigations and enforcement.

Then, in May, Altman described the EU’s planned AI legislation, one of the first attempts anywhere to legislate on AI, as “over-regulating”. But he backtracked after widespread coverage of his comments. “We are excited to continue to operate here and of course have no plans to leave,” he later tweeted.

Rise of regional GPT

While OpenAI boss Sam Altman has no plans to leave Europe, local governments and technologists are not keen to remain dependent on Silicon Valley’s novel AI technology.

Fed up with regulating American tech goliaths from 8,000 kilometres away, the European AI circle is rooting for local startups to build technology of their own similar to OpenAI’s. The most frequently cited names in the AI community include the French startup Mistral, which has managed to raise $100 million without releasing any products. Another well-known startup is Aleph Alpha, founded by Jonas Andrulis, a former member of the AI team at Apple. Notably, the startup already sells generative AI as a service to 10,000+ paying customers across companies and governments.

Many Europeans remain adamant that they need a contender to counteract America’s dominance, and not simply for economic reasons. The local AI community argues that European companies are likely to be more sensitive to data privacy and discrimination than their competitors in the US. For instance, Aleph Alpha is making sure European languages are not excluded from AI developments.

Sceptics in the industry question whether the startup has the potential to compete in the same league as giants like Google and OpenAI. But many are hoping that Aleph Alpha can give Silicon Valley tough competition in what some believe will be an era-defining technology.

While OpenAI is asking US courts to dismiss the claims against it, the company remains exposed to regulatory risk in this area across the EU. The Altman-run company could face outreach from DPAs acting on complaints from European individuals. Meanwhile, confirmed violations of the GDPR can result in penalties as high as 4% of global annual turnover.

The tussle over data privacy and security between the bloc and US tech companies has been a long time coming. The DPAs’ corrective orders may force companies to rework how their technologies function if they wish to continue operating inside the bloc. While pressure is mounting on regulators as well as OpenAI, it remains to be seen what compliance conclusions will emerge once the regulators’ assessments have been completed, in the EU as well as globally.

The post ChatGPT Isn’t Made For European Union appeared first on Analytics India Magazine.