Following a report from The Wall Street Journal, Microsoft-backed OpenAI has admitted to having a tool that can watermark ChatGPT-generated text. According to OpenAI, the system has been ready for over a year; however, the company has been debating whether to release it.
The admission comes in the form of an update to a May blog post from OpenAI titled “Understanding the source of what we see and hear online.” According to the WSJ report, the system works by subtly altering how ChatGPT selects its words, leaving a statistical pattern, or watermark, that a companion anti-cheating tool can detect.
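OpenAI has not disclosed how its watermark works. Purely as an illustration, here is a toy sketch of one published approach to this kind of scheme (green-list logit biasing, as described in academic work on LLM watermarking): the generator nudges each word choice toward a pseudorandom "green" subset of the vocabulary derived from the preceding token, and the detector, which can recompute the same subsets, checks whether green tokens appear far more often than chance. All names, the toy vocabulary, and the bias parameter below are hypothetical.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(100)]  # toy vocabulary
GREEN_FRACTION = 0.5  # half the vocabulary is "green" at each step

def green_list(prev_token: str) -> set:
    # Seed a PRNG with a hash of the previous token so the generator
    # and the detector independently derive the same partition.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))

def generate_token(prev_token: str, rng: random.Random, bias: float = 0.9) -> str:
    # With probability `bias`, sample from the green list; otherwise
    # sample uniformly. A real model would add a bonus to the green
    # tokens' logits instead of sampling this crudely.
    greens = sorted(green_list(prev_token))
    if rng.random() < bias:
        return rng.choice(greens)
    return rng.choice(VOCAB)

def detect(tokens: list) -> float:
    # z-score of the observed green-token count against the null
    # hypothesis that unwatermarked text hits greens half the time.
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:])
               if cur in green_list(prev))
    n = len(tokens) - 1
    expected = GREEN_FRACTION * n
    sd = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / sd

rng = random.Random(0)
text = ["tok0"]
for _ in range(200):
    text.append(generate_token(text[-1], rng))
print(detect(text))  # large positive z-score: likely watermarked
```

This also illustrates why the documents cited by the WSJ hedge with “when enough new text is created”: the z-score grows with the square root of the token count, so short passages give weak statistical evidence.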
Internal documents obtained by WSJ show that the watermark system is 99.9 percent effective “when enough new text is created by ChatGPT.”
The tool would be a godsend for teachers and educators across the country who are struggling with students using ChatGPT to cheat on assignments and essays. Yet despite internal tests showing the tool to be highly effective, OpenAI has been reluctant to release it.
The reasoning behind this, according to a statement to the WSJ from a company spokesperson, is OpenAI’s belief that “the tool could disproportionately affect groups such as non-native English speakers.” Another likely reason, however, is an April 2023 survey in which 30 percent of ChatGPT users said they’d stop using the generative AI if watermarking were deployed.
This internal debate on releasing the tool has raged on for two years at OpenAI, with one source in the company telling WSJ, “It’s just a matter of pressing a button.”