AI’s Copyright Reckoning: Can India Balance Creators And Code?

With India just a month away from announcing its first homegrown large language model (LLM), as confirmed recently by IndiaAI Mission CEO Abhishek Singh in an exclusive interview with AIM, there’s an urgent need for a legal framework that balances the interests of AI developers and content owners.

Global disputes over AI using copyrighted content for training are intensifying. Publishers Hachette and Cengage have sought to intervene in a class action against Google, alleging mass copying of books and textbooks to train AI systems. In December 2025, journalist John Carreyrou sued major AI firms for unauthorised use of copyrighted books. Earlier the same year, Disney and Universal sued the AI image generator Midjourney over the use of their intellectual property.

Copyright concerns extend beyond publishers and authors. Last month, US recording artist Jerry Anders filed a lawsuit against Stability AI and AudioSparx, alleging that his music was used to train the Stable Audio AI model without permission, despite his requests to opt out. He is seeking statutory damages and an injunction to prevent further use. Meanwhile, Oscar-winning actor Matthew McConaughey has trademarked his image and voice to protect them from unauthorised AI use, the first such move by an actor.

India is grappling with similar disputes. In 2024, news agency ANI sued OpenAI, alleging ChatGPT illegally scraped both freely available and paywalled copyrighted content from its website. Last year, the Federation of Indian Publishers filed a lawsuit against OpenAI, claiming its copyrighted books were used for training and summarisation, mirroring ANI’s concerns.

Indian actors Salman Khan, Abhishek Bachchan, Aishwarya Rai, and Nagarjuna have all filed cases to protect their personality rights in response to AI companies using their characters and voices to develop AI models.

These cases mark a pivotal moment in efforts by artists, publishers, and content creators to prevent AI firms from using their work without consent or compensation.

Training LLMs and Copyright

LLMs are AI systems designed to understand and generate human-like text. They are trained on vast datasets, including books, websites, articles, and other written material. This breadth of data allows models to learn grammar, facts, context, and some reasoning.

During training, text is tokenised and converted into numerical representations. The model learns by predicting the next token in billions of examples. Essentially, LLMs learn patterns, not documents.
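The training step described above can be illustrated with a toy sketch. This is not how any production LLM is built: real systems use subword tokenisers and neural networks, whereas this example assumes whitespace tokenisation and a simple count-based bigram model. The corpus and all function names (`tokenise`, `build_vocab`, `next_token_pairs`, `train_bigram`, `predict_next`) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def tokenise(text):
    """Split text into lowercase word tokens (an assumption; real LLMs use subword tokenisers)."""
    return text.lower().split()


def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def next_token_pairs(ids):
    """Pair every token ID with the ID that follows it; these are the training examples."""
    return list(zip(ids[:-1], ids[1:]))


def train_bigram(pairs):
    """Count how often each token ID is followed by each other token ID."""
    counts = defaultdict(Counter)
    for current, following in pairs:
        counts[current][following] += 1
    return counts


def predict_next(counts, token_id):
    """Return the most frequent follower of token_id, or None if it was never followed."""
    followers = counts.get(token_id)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, used only to make the example runnable.
corpus = "the cat sat on the mat and the cat slept"
tokens = tokenise(corpus)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]          # text converted to numerical representations
pairs = next_token_pairs(ids)             # (current token, next token) examples
model = train_bigram(pairs)

id_to_token = {i: t for t, i in vocab.items()}
predicted = id_to_token[predict_next(model, vocab["the"])]
print(predicted)
```

The trained `model` holds only counts of which token tends to follow which; it does not store the corpus as a document. That is a rough analogy for the "patterns, not documents" point, although large neural models can still memorise and reproduce passages from their training data.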

Abhishek Upperwal, CEO of artificial general intelligence-focused Soket.AI, explains how AI datasets are typically built. “We get data from the internet, from wherever it is available. We also extract content from copyright-free books and audio-visual materials.”

This common industry practice puts AI companies on a collision course with creators and publishers intent on protecting their materials. In August 2025, Anthropic reached the first major settlement in an AI copyright dispute, agreeing to pay $1.5 billion to a class of authors who alleged the company had pirated millions of books.

Experts note that AI developers cannot guarantee all content is copyright-free. “AI systems are powerful because they learn from the world, and the world is copyrighted. Developers need governed data supply chains that combine licensed content, structured publisher partnerships, public-domain material, and carefully applied synthetic data,” Sanchit Vir Gogia, founder and CEO of Greyhound Research, tells AIM. He adds that reducing memorisation and output leakage is crucial, as these are the sources of most lawsuits.

Nikhil Pahwa, founder of MediaNama, also highlights how AI training on copyrighted content affects revenue. “Retrieval-augmented generation (RAG) models scrape copyrighted content and reproduce it in other forms, often shrinking the monetisation window that news publishers have. It is theft. Copyright must be respected, or creators’ business models will collapse, leaving AI as the only beneficiary.”

This ongoing conflict underscores the need for AI-specific copyright laws.

India vs the World

India does not yet have a binding AI law, but it has issued guidelines, advisories, and policy frameworks that shape AI governance. Other countries have moved to address AI copyright issues: the EU, UK, Japan, and Singapore have introduced text and data mining exemptions to support AI training. The EU combines risk-based AI rules with copyright opt-outs, Japan permits broad AI training without licences, and China uses layered AI and content controls. South Korea has passed landmark AI laws requiring human oversight for “high-impact” uses, including healthcare, transport, nuclear safety, and financial services such as credit and loan decisions.

The US, despite witnessing the most AI copyright lawsuits, has no AI-specific copyright law and relies on the courts and the fair use doctrine.

In December 2025, DPIIT (Department for Promotion of Industry and Internal Trade) released a working paper proposing a dedicated legal framework for copyrighted content in AI training. The proposal recommends a mandatory blanket licensing system, allowing developers to use any lawfully accessed copyrighted material for training without seeking individual permissions. Creators would receive statutory royalties once the AI system is commercialised. A government-designated non-profit, the Copyright Royalties Collective for AI Training (CRCAT), would handle collection and distribution.

The proposal has drawn mixed reactions. “The framework is extremely balanced for both AI developers and content creators. Revenue sharing is ideal,” IndiaAI Mission CEO Abhishek Singh had earlier told AIM. “My concern is implementation: deciding who gets how much will be challenging. That said, since this is still a proposal, there is room for refinement.”

The IT Ministry also released guidelines requiring AI platforms to label AI-generated content, prevent bias, and ensure accountability. It noted that copyright laws may need to be amended to enable large-scale AI training while protecting copyright holders.

A recent EY report, AI Governance Guidelines: A Bet on Innovation, stresses: “How existing regulations—from consumer protection to penal codes—are interpreted in the context of rapidly evolving AI applications will be the true test. Expert-informed, interdisciplinary evaluation is essential to provide remedies and develop robust Indian AI jurisprudence.”

What is needed is a law that strikes this balance, letting innovation thrive alongside fair practice. “It should cover pre-training, fine-tuning, retrieval systems and continuous learning, leaving no grey areas. Disclosure must be meaningful, focusing on source categories, licensing approaches and safeguards, not long dataset lists. Crucially, training rules must be separated from output disputes,” Gogia says.

As global courts, regulators, and creators grapple with the disruptive implications of generative AI, the battle over training data has become a defining fault line in AI governance.

Pahwa criticises tech giants lobbying for exemptions in copyright laws. “Big Tech AI firms are currently lobbying extensively for a text and data mining exemption. Why should it be an opt-out anyway? The fact is that many of them have trained their models on pirated content off Sci-Hub and Libgen. The DPIIT Committee’s suggestion to introduce a compulsory license with retrospective applicability is a mechanism for legitimising theft,” he says.

He warns of widespread damage to millions of people in music, movies, news, and publishing. “A voluntary contract regime and enforcement of copyright seems to be the only way forward,” he says.

Bragadish Sureshkumar, Chief Technology Officer at Zopper, argues for a statutory approach. “A statutory model would ensure fair compensation, prevent unfair licensing practices, and create a level playing field between global AI giants and India’s vast base of creators. But effective policy requires more than good intentions.”

Copyright is no longer a peripheral issue in AI policy. It sits at the core of how nations balance innovation, economic competitiveness, and the rights of creators in an AI-driven future.

The post AI’s Copyright Reckoning: Can India Balance Creators And Code? appeared first on Analytics India Magazine.
