
India has moved closer to creating a formal royalty regime for artificial intelligence (AI) developers, with a government committee recommending a blanket licensing system for training models on copyrighted work.
The Department for Promotion of Industry and Internal Trade (DPIIT) released a working paper to that effect on December 8. The proposal marks a significant policy move on generative AI and copyright protection.
The eight-member committee, chaired by Himani Pande, an additional secretary at DPIIT, was tasked with assessing the adequacy of India’s copyright law in addressing AI-driven use of creative works.
The committee sides firmly with creator remuneration and rejects the push by tech firms for unrestricted text and data mining (TDM).
Safeguards Against Exploitation
The paper noted that India’s creative economy, spanning films, arts, music, digital content and informal folk traditions, contributes significantly to GDP and livelihoods and therefore requires strong safeguards against unremunerated AI exploitation of human-created works.
India is among the world’s fastest-growing AI markets and a major consumer base for generative AI systems.
The paper noted that the tech sector, led by industry body Nasscom, supported an opt-out rights framework. The Business Software Alliance, whose members include Google, Microsoft, Amazon Web Services, IBM, Salesforce and OpenAI, also argued for explicit TDM exceptions with opt-out provisions.
The proposal directly challenges the data practices of companies such as Google and OpenAI, which rely heavily on large-scale scraping of online material to train their models.
Google and OpenAI did not reply to emails sent by AIM seeking comment.
Content industry bodies such as broadcasters, music labels and creator organisations opposed the opt-out approach, arguing that it disproportionately benefits large platforms.
In a LinkedIn post, Kriti Sharma, director (regulatory, legal and compliance, India and Southeast Asia) at Dun & Bradstreet, said the proposal reflects an attempt to reconcile competing realities. “It feels like India is trying to reconcile two truths. AI needs data to grow. Creators need protection to survive,” Sharma said. She added that the paper approaches the question “thoughtfully, boldly, and practically.”
Rakesh Umarani, partner at Vidyam Legal, said the hybrid model offers a balanced pathway. “A clear licensing and royalty framework could prevent future disputes and give creators real confidence while allowing AI development to continue responsibly,” Umarani said.
After examining global developments and contrasting stakeholder submissions, the committee rejected a blanket TDM exception, widely supported by technology firms, stating that such an exception “would undermine copyright” and “leave human creators powerless to seek compensation.”
Hybrid Licensing System Proposed
The committee recommended a hybrid system allowing AI developers to train on “all lawfully accessed copyrighted works” as a matter of right, but with mandatory royalty sharing through a government-designated non-profit collective formed by rights holders.
Under this framework, creators would not have the option to withhold their works from AI training but would receive statutory remuneration.
Royalties would be administered through a single-window mechanism, the Copyright Royalties Collective for AI Training (CRCAT), with a government-appointed committee determining rate structures. Even non-members would be eligible for payments upon registering their works.
The proposal aims to reduce transaction costs, provide legal certainty, widen access to copyrighted datasets, mitigate AI bias and hallucinations, and create a level playing field between large platforms and startups.
Burden of Proof
Under the proposed framework, if a copyright owner alleges that their work was used to train an AI system without payment of royalties, the law would presume the claim is valid unless the developer can prove otherwise.
Even though the blanket licence system gives developers little reason to hide training data, disputes may arise, for example, when a developer claims that only proprietary or separately licensed data was used.
In such cases, the burden of proof shifts to the AI developer, who must demonstrate that no third-party copyrighted material was used. Until they do, the presumption favours the copyright holder, the paper stated.
The working paper has now entered a 30-day public consultation phase.
The post India Moves to Regulate AI Training by Google, OpenAI appeared first on Analytics India Magazine.