As AI-generated text becomes increasingly prevalent, it is crucial to address the potential misuse of large language models (LLMs) for academic cheating, propaganda, spam, and impersonation. In this article, we explore a futuristic proposal for a comparatively robust, multi-layered approach to text watermarking, examining the lifecycle of AI-generated text stage by stage. While this proposal is a work in progress and may have imperfections, it aims to serve as a starting point for further brainstorming and discussion within the AI safety community.
What stages does an AI-generated text go through?
Before diving into the watermarking process, let’s first look at the rough lifecycle of an AI-generated text:
Text generation by the LLM
Copying the text to the clipboard
Pasting the text into a text editor or word processor
Saving the text as a file or document
A Multi-Layered Approach To Watermarking AI-generated Text
Now, let’s walk through the proposed multi-layered watermarking process, one lifecycle stage at a time:
LLM Watermarking:
Researchers have proposed watermarking at the stage of text generation by adding a statistical signal:
a. https://deepmind.google/technologies/synthid/
b. https://arxiv.org/pdf/2304.04736.pdf
c. https://arxiv.org/pdf/2301.10226.pdf
d. https://arxiv.org/pdf/2306.17439.pdf
e. https://arxiv.org/pdf/2306.09194.pdf
f. https://arxiv.org/pdf/2307.15593.pdf
g. Prof. Scott Aaronson has also described, in a video, a technique for watermarking AI-generated text by adding a statistical signal. Most of this work is designed so that the watermark degrades the quality of the generated text as little as possible.
We could use any of the statistical watermarking techniques described in the research listed above, but we take Prof. Aaronson’s method as a starting point for discussion. As described in the video, a secret key (let’s call it the “LLM secret key”) parameterizes a pseudo-random function that secretly favors the selection of certain next tokens over others. A detector needs this “LLM secret key” to check whether the watermark is present.
The “LLM secret key” can be shared with text editors, document editors, and word processors for watermark detection.
If AI safety companies like Anthropic succeed in their “Interpretability Dreams,” watermarks could also be added at a more semantic level by modifying vectors inside the model.
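To make the secret-key idea in this stage more concrete, here is a minimal Python sketch in the spirit of the scheme Prof. Aaronson has described: a keyed pseudo-random function assigns each candidate token a value r in (0, 1), the sampler picks the token that maximizes r^(1/p) (where p is the model's probability for that token), and the detector averages ln(1/(1 - r)) over the text. Everything else here is an assumption made for illustration: the toy vocabulary, the `toy_model` stand-in for a real LLM, the use of HMAC-SHA256 as the pseudo-random function, the context length of 3, and the detection threshold of 1.5.

```python
import hashlib
import hmac
import math
import random

VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat"]  # toy vocabulary (assumption)
CONTEXT = 3  # number of previous tokens fed to the pseudo-random function (assumption)


def prf(key, context, token):
    """Keyed pseudo-random value strictly between 0 and 1 for (context, token)."""
    msg = "\x1f".join(context + (token,)).encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    top52 = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (top52 + 0.5) / 2**52  # never exactly 0.0 or 1.0


def toy_model(rng):
    """Hypothetical stand-in for an LLM: one probability per token in VOCAB."""
    weights = [rng.random() + 0.05 for _ in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]


def generate(key, length, seed=0):
    """Sample tokens so that the keyed values of the chosen tokens skew high."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(length):
        context = tuple(tokens[-CONTEXT:])
        probs = toy_model(rng)
        # Maximizing log(r) / p is the same as maximizing r ** (1 / p),
        # but avoids floating-point underflow for small p.
        best = max(
            range(len(VOCAB)),
            key=lambda i: math.log(prf(key, context, VOCAB[i])) / probs[i],
        )
        tokens.append(VOCAB[best])
    return tokens


def score(key, tokens):
    """Average of ln(1 / (1 - r)); close to 1.0 when no watermark is present."""
    total = 0.0
    for i, token in enumerate(tokens):
        context = tuple(tokens[max(0, i - CONTEXT):i])
        total += math.log(1.0 / (1.0 - prf(key, context, token)))
    return total / len(tokens)


def is_watermarked(key, tokens, threshold=1.5):
    """Threshold is an illustrative assumption, not a calibrated value."""
    return score(key, tokens) > threshold
```

A detector that holds the same key sees a clearly elevated score on `generate(key, 200)`, while a detector with a different key sees a score near 1.0. Note that detection needs only the key and the text, not the model, which is what makes sharing the “LLM secret key” with downstream tools conceivable in the first place.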
Clipboard Watermarking:
The LLM website (e.g., Claude.ai or ChatGPT) will watermark the text using a secret key (the “Clipboard Secret Key”) before copying it to the clipboard.
The “Clipboard Secret Key” will be available to text editors and word processors so that they can detect the watermark when the text is pasted.
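The proposal does not fix a concrete clipboard scheme, so the following is only one possible illustration, not a recommended design. It assumes the website appends a 64-bit HMAC-SHA256 tag of the text, encoded as invisible zero-width characters, before writing to the clipboard; the function names, the tag length, and the choice of zero-width characters are all hypothetical.

```python
import hashlib
import hmac

ZERO = "\u200b"  # zero-width space encodes bit 0 (assumption)
ONE = "\u200c"   # zero-width non-joiner encodes bit 1 (assumption)
TAG_BITS = 64    # truncated tag length (assumption)


def _tag_bits(text, key):
    """First TAG_BITS bits of HMAC-SHA256(key, text) as a string of 0s and 1s."""
    digest = hmac.new(key, text.encode("utf-8"), hashlib.sha256).digest()
    value = int.from_bytes(digest[: TAG_BITS // 8], "big")
    return format(value, f"0{TAG_BITS}b")


def mark_for_clipboard(text, key):
    """Return the text with an invisible keyed tag appended."""
    bits = _tag_bits(text, key)
    return text + "".join(ONE if b == "1" else ZERO for b in bits)


def verify_clipboard(payload, key):
    """Return the visible text if the tag verifies under key, else None."""
    if len(payload) < TAG_BITS:
        return None
    text, suffix = payload[:-TAG_BITS], payload[-TAG_BITS:]
    if any(ch not in (ZERO, ONE) for ch in suffix):
        return None
    bits = "".join("1" if ch == ONE else "0" for ch in suffix)
    if hmac.compare_digest(bits, _tag_bits(text, key)):
        return text
    return None
```

A tag like this is fragile: any edit to the text, or any tool that strips zero-width characters, removes it. That fragility is one reason the proposal layers it on top of the statistical LLM watermark rather than relying on it alone.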
Text Editor/Word Processor/Document Editors:
The editor will have access to both the “LLM secret key” and the “Clipboard Secret Key” for watermark detection.
Upon pasting, it will perform a double verification to detect the presence of both watermarks.
If the text is verified as AI-generated, the text editor will automatically add a citation/reference (e.g., “OpenAI. (2023). ChatGPT [Large language model]. https://chat.openai.com”).
The citation/reference will be enforced: users can edit its style but cannot remove it completely.
Even if the user paraphrases the copied text, the text editor will be aware of the presence of AI-generated content and add it to the document’s metadata.
This process amounts to “pasting with watermark,” akin to “pasting with formatting.”
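The paste-time behavior described in this stage can be sketched as follows. This is a minimal illustration under stated assumptions: the `Document` class, the `paste` function, and the metadata field names are hypothetical, the two detectors are passed in as plain callables, and the citation is added only when both watermarks are found (the proposal does not say what should happen when only one layer is detected).

```python
from dataclasses import dataclass, field

# Example citation string taken from the article text.
CITATION = "OpenAI. (2023). ChatGPT [Large language model]. https://chat.openai.com"


@dataclass
class Document:
    """Hypothetical in-memory model of an editor document."""
    body: str = ""
    citations: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def paste(doc, pasted, has_llm_watermark, has_clipboard_watermark):
    """Paste text, run both detectors, and record the result in the document.

    has_llm_watermark and has_clipboard_watermark are callables that take the
    pasted text and return True or False (assumed interfaces).
    """
    llm = bool(has_llm_watermark(pasted))
    clip = bool(has_clipboard_watermark(pasted))
    doc.body += pasted
    # Keep a per-layer record so later tools can see which checks matched.
    doc.metadata["watermark_layers"] = {"llm": llm, "clipboard": clip}
    verified = llm and clip  # double verification: both layers must match
    if verified:
        doc.metadata["contains_ai_text"] = True
        if CITATION not in doc.citations:
            doc.citations.append(CITATION)
    return verified
```

Because the flag lives in the document's metadata rather than in the pasted text, it survives later paraphrasing of the text inside the editor, which is the behavior the proposal asks for.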
Text File/Text Document:
Text files and documents can store provenance metadata, including information about the watermarks and the keys used to detect them, along the lines of the C2PA specification.
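As a rough illustration of what such metadata might look like, the sketch below builds a small JSON provenance record. It is not the C2PA manifest format; the field names and helper functions are hypothetical. It also makes one assumption of our own: it stores a short fingerprint (a SHA-256 hash prefix) of each key rather than the key itself, so that the record identifies which keys were used without revealing them.

```python
import hashlib
import json


def key_fingerprint(key):
    """Short public identifier for a secret key (hash prefix, not the key)."""
    return hashlib.sha256(key).hexdigest()[:16]


def build_provenance_record(llm_key, clipboard_key, generator):
    """Build an illustrative provenance record for a saved document."""
    return {
        "contains_ai_text": True,
        "generator": generator,
        "watermarks": [
            {"layer": "llm", "key_fingerprint": key_fingerprint(llm_key)},
            {"layer": "clipboard", "key_fingerprint": key_fingerprint(clipboard_key)},
        ],
    }


def to_json(record):
    """Serialize the record, e.g. for a sidecar file or a metadata field."""
    return json.dumps(record, indent=2, sort_keys=True)
```

A plagiarism checker that holds the real keys could recompute the fingerprints to decide which key to try, then run the actual watermark detection on the text.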
Challenges and Considerations:
Establishing a secure supply chain for sharing the LLM and Clipboard secret keys with text editors, word processors, document editors, plagiarism checkers, and other relevant entities.
Developing a cryptographic method for generating a watermark at the clipboard level.
Handling scenarios where text is copied from one text editor to another.
Exploring the possibility of shifting the burden of watermark detection and retention to mandatory browser extensions from Anthropic, OpenAI, or Grammarly.
Addressing the UX concerns that arise when text editors, word processors, and document editors enforce the citation/reference, so that users can edit its style but cannot remove it completely.
Conclusion:
While this proposal is complex and requires collective effort from various entities, including AI companies, developers of text editors, word processors, and document editors, governments, and others, the goal is to remove any single point of failure from the system and make it more robust. Just as cybersecurity attacks persist despite innovations and regulations, we can only strive to make text watermarks progressively more robust. As an AI safety researcher, I acknowledge that this process is filled with technical, UX, privacy, and regulatory challenges. However, through continued research efforts, this work can serve as an example across the AI safety landscape. This proposal is a first attempt at addressing a difficult problem and is likely imperfect in various ways, but it aims to provide a baseline for further brainstorming and discussion within the AI safety community.
Thank you for reading 🤗



