Trying to Watermark LLMs is Useless

Two years ago, ChatGPT started a revolution. Apart from sparking the generative AI and LLM craze, it also triggered a wave of protests from writers and artists against the use of AI-generated content. Rather than promising an outright ban on text-generating models, big tech and the companies building these models promised to watermark the text their AI generates.

Watermarks aim to improve transparency by labelling AI-generated content, aiding in the detection of malicious uses. However, achieving effective watermarking involves balancing competing parameters like robustness, detection difficulty, and resistance to removal or spoofing.
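To make the mechanism concrete, here is a minimal, hypothetical sketch of one common family of generation-time text watermarks, a ‘green list’ logit bias in the spirit of published academic schemes. It is not SynthID or any vendor’s actual implementation; the toy vocabulary, seed derivation, bias strength, and detection score are assumptions made purely for illustration.

```python
# Toy sketch of a "green list" generation-time watermark (illustrative only).
import hashlib
import math
import random

VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "grass", "."]
GREEN_FRACTION = 0.5   # fraction of the vocabulary favoured at each step
BIAS = 2.0             # logit boost added to green-list tokens

def green_list(prev_token: str) -> set[str]:
    """Pseudo-randomly partition the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(VOCAB) * GREEN_FRACTION)])

def watermarked_sample(logits: dict[str, float], prev_token: str,
                       temperature: float = 1.0) -> str:
    """Nudge green-list logits upward, then sample as usual."""
    greens = green_list(prev_token)
    biased = {t: (l + BIAS if t in greens else l) / temperature
              for t, l in logits.items()}
    z = sum(math.exp(v) for v in biased.values())
    probs = {t: math.exp(v) / z for t, v in biased.items()}
    return random.choices(list(probs), weights=probs.values())[0]

def detect(tokens: list[str]) -> float:
    """Score a token sequence: fraction of tokens that land on the green list."""
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)
```

Detection then amounts to checking whether an unusually high fraction of tokens lands on each step’s green list, which is also why editing or paraphrasing the output quickly erodes the signal.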

Fast-forward two years, and everyone has tried their hand at it. Most recently, Google DeepMind, in collaboration with Hugging Face, open-sourced its research titled ‘Scalable watermarking for identifying large language model outputs’ in a bid to distinguish human-written content from the LLM-generated text flooding the internet.

Launched exactly a year ago, Google’s watermarking tool SynthID is now available for wider access and is designed to have negligible computational impact, making it suitable for both cloud and on-device detection. But what is the point of this tool if OpenAI, Microsoft, and Meta have already tried, and failed, to build a good watermarking or AI detection tool?

On a similar note, researchers from Carnegie Mellon University’s School of Computer Science have analysed the tradeoffs in popular watermarking techniques for text generated by LLMs.

Identifying AI Text: The False Hope

The goal of watermarking is often misunderstood. While it can identify text from a specific, watermarked LLM, it cannot reliably distinguish between AI-generated and human-authored text. The former benefits developers, but the latter—purportedly protecting society from misinformation or misuse—is practically unattainable with the current technology.

It turns out that watermarking is not enough to reduce the spread of AI-driven “misinformation”. There is still work to be done.

Ethan Mollick, professor and co-director of the Generative AI Lab at Wharton, commenting on the launch of SynthID, said, “A new watermarking AI paper is causing a stir, but note that watermarking does not solve for identifying AI content in the real world, because it requires cooperation from the AI company and also breaks when people edit the text.”

According to researchers, there are three core problems with watermarking AI text. First, for the scheme to work, every capable LLM must be watermarked. Open-source models like Llama 3.1 405B have already been downloaded millions of times without a watermark, and that cannot be undone retroactively. Bad actors will always have access to unwatermarked models.

Why LLM watermarking will never work https://t.co/JuHOmfLpPB

— David Gilbertson (@D__Gilbertson) November 14, 2024

Another limitation is that watermarking requires the provider to control how tokens are selected. Features like temperature settings, which adjust the randomness of the generated text, are essential for useful applications yet interfere with watermarking schemes that depend on biasing token selection. Removing such controls could ironically increase harm by weakening the existing mechanisms designed to balance creativity and safety.

Lastly, and most importantly, the scheme leaves no room for open-source models: because watermarking is applied during text generation, anyone running an open model can trivially disable it, as the sketch below shows. Moreover, bad actors prefer open models for privacy, rendering API-based watermarking ineffective.
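To make the open-weights point concrete, consider the toy scheme sketched earlier: the entire watermark is a single bias term inside the sampling loop. Anyone who runs the model locally can simply call an unbiased sampler instead, such as the hypothetical plain_sample below, and the output carries no detectable signal. This continues the same toy assumptions as before, not any real model’s code.

```python
import math
import random

def plain_sample(logits: dict[str, float], temperature: float = 1.0) -> str:
    """The same toy sampler as before, with the single watermarking bias removed."""
    scaled = {t: l / temperature for t, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {t: math.exp(v) / z for t, v in scaled.items()}
    return random.choices(list(probs), weights=probs.values())[0]

# A user with open weights swaps samplers, and the watermark is simply gone.
logits = {"cat": 2.0, "dog": 1.5, "mat": 0.3}
print(plain_sample(logits, temperature=0.7))
```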

Deployed watermarks are also fragile. A recent research paper found that LLM watermarking is susceptible to spoofing and scrubbing attacks: for under $50, attackers could bypass existing schemes with over 80% success, underscoring the need for stronger protections.

Even if powerful future models were API-only, paraphrasing tools derived from open-source models would still strip watermarks. The same can be achieved with other LLMs, or in some cases, even the watermarked model itself.

What Even is Watermarking?

Earlier, AIM reported that AI detection tools are largely unreliable, both missing AI-generated text and flagging human writing as machine-made. The Bhagavad Gita, part of the Mahabharata and believed to have been composed between 400 BCE and 200 CE by the sage Veda Vyasa, has been attributed to AI. There is more: even the Preamble of the Indian Constitution is supposedly AI-generated, according to several inaccurate AI detectors.

Copyright and watermarks are irrelevant for such historic texts. So, when an AI detection tool declares these ancient works AI-generated, it becomes clear that text is not a medium that can be reliably watermarked at all.

Assuming that watermarking miraculously worked, would it solve the problem? No, for two key reasons. First, AI-generated and human text are intertwined, as human writers routinely use LLMs for editing, summarisation, or translation. Second, not all AI-generated text is harmful.

To put it simply in the words of Dominik Lukes, lead business technologist at the AI/ML support competency centre at the University of Oxford, “Even if AI-generated fraudulent text was a bigger problem than human-generated fraudulent text, watermarking would not fix it. Fraudsters would simply use non-watermarked models. Also, outside a school exam, the use of an LLM is no longer a reliable indicator of fraud.”
