Literature Review of AI Detectors

About eighteen months ago, I started to notice machine-generated text cropping up in student work. As a composition teacher, my immediate reaction was to ban it. Text generators have little role in the composition classroom; at the time, however, composition teachers had few options for accurately identifying machine-generated text. The basic concern was that detectors were inaccurate and prone to false positives. In other words, they might flag human writing as machine generated, especially writing by non-native speakers. My colleagues and I put considerable effort into redesigning courses and disincentivizing students from using AI such as ChatGPT or Bard to complete assignments. I think these changes have improved our pedagogies. Having survived a school year with AI, however, I was curious how things have changed in the world of detecting machine-generated text. As of mid-July 2024, here is what I’ve found.

Neither humans nor AI-detection systems can identify machine-generated text with perfect reliability. However, detectors are reaching a high level of accuracy, and they are performing better than humans. Looking at research abstracts, J. Elliott Casal and Matt Kessler found that reviewers had “an overall positive identification rate of only 38.9%” (1). Oana Ignat and colleagues found that humans could accurately identify only 71.5% of machine-generated hotel reviews (7). Their AI detector, however, correctly identified roughly 81% of machine-generated hotel reviews (8). Writing in 2023, Deborah Weber-Wulff et al. found similar results when testing twelve different AI-detection programs. The two highest performers, Turnitin and Compilatio, approached 80% accuracy (15). Publishing this year, Mike Perkins and colleagues found that Turnitin detected 91% of machine-generated texts (103-104), while human reviewers in the study successfully identified only 54.5% (1). Designing a custom AI detector to find machine-generated app reviews, Seung-Cheol Lee et al. achieved 90% accuracy with their best model (20). For longer texts, the accuracy of both human reviewers and AI detectors increases. Comparing full-length medical articles, Jae Q. J. Liu et al. found that both professors and ZeroGPT correctly identified 96% of machine-generated texts (1). (Note that GPTZero, a different AI detector, performed considerably worse.) However, the professors also misclassified 12% of human-written content as having been rephrased by AI (8).

Notably, Weber-Wulff mentions that AI detectors tend to have few false positives. In other words, if the software is unsure whether a text was written by a human or a machine, it is more likely to classify it as human written (17). Turnitin, in fact, had zero false positives (26). Perkins, too, noted that Turnitin was reluctant to label text as machine generated. While it correctly identified 91% of papers as machine generated, it reported only 54.8% of the content in those papers as machine generated, even though each of those papers was entirely (100%) machine generated (103-104). While this means a certain percentage of machine-generated writing will evade detectors, it should give professors some confidence that something flagged as machine generated is, very likely, machine generated. In another encouraging finding, Liu found that “No human-written articles were misclassified by both AI-content detectors and the professorial reviewers simultaneously” (11).
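
To make that logic concrete, here is a minimal sketch, in Python, of why a conservative detector can be trusted when it does raise a flag. The counts below are hypothetical numbers I invented for illustration; they are not drawn from any of the studies above.

```python
# Toy precision calculation: why a detector with few false positives
# can be trusted when it DOES flag a text, even if it misses some.
# All counts below are hypothetical, invented purely for illustration.

true_positives = 91   # machine-generated texts correctly flagged
false_negatives = 9   # machine-generated texts the detector missed
false_positives = 1   # human-written texts wrongly flagged
true_negatives = 99   # human-written texts correctly passed

# Recall: what share of machine-generated texts get caught?
recall = true_positives / (true_positives + false_negatives)

# Precision: when the detector flags a text, how often is it right?
precision = true_positives / (true_positives + false_positives)

print(f"Recall:    {recall:.1%}")     # 91.0% -- some AI text slips through
print(f"Precision: {precision:.1%}")  # 98.9% -- but a flag is very reliable
```

The trade-off is visible in the numbers: a conservative detector lets some machine-generated text slip through (lower recall) in exchange for flags that are almost always correct (high precision), which matches the pattern Weber-Wulff and Perkins describe.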

There is one caveat, however. AI detectors may flag translated or proofread text as machine generated (Weber-Wulff 26). Once machines are introduced into the composition process, they likely leave artifacts that detectors can notice. Strictly speaking, the detectors would not be wrong: machines were introduced into the composition process. However, most professors would find the use of machines for translation or proofreading acceptable.

The studies I have mentioned to this point were attempting to consistently identify machine-generated content, but a team of researchers led by Mohammad Kutbi took a different approach. Their goal was to establish consistent, human authorship of texts by looking for a “linguistic fingerprint.” In addition to detecting the use of machine writing, this method would also detect contract plagiarism (i.e., someone hiring another person to write an essay for them). This system achieved 98% accuracy (1). While not mentioned in Kutbi’s study, other scholars have found that certain linguistic markers maintain consistency across contexts (Litvinova et al.). For these and other reasons, I believe that linguistic fingerprinting holds the most promise in detecting the use of AI in the composition process.
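
To make the concept concrete, here is a minimal sketch of the general idea, and emphatically not a reconstruction of Kutbi’s actual method: compare the frequencies of function words, a classic stylometric marker, in two texts and score how similar the profiles are. The word list and sample texts are placeholders of my own.

```python
# A toy stylometric comparison: do two texts "look" like the same author?
# This is a simplified illustration of linguistic fingerprinting in
# general, NOT a reconstruction of Kutbi et al.'s system.
import math
from collections import Counter

# Function words are hard to consciously control, which makes their
# frequencies useful style markers that stay fairly stable across topics.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "with", "as", "but", "not", "or"]

def fingerprint(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two fingerprints, from 0 (unrelated) to 1 (identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Placeholder texts; in practice you would compare a student's known
# writing samples against a newly submitted essay.
known_sample = "I would argue that the evidence in the article is not as strong as it first appears to be."
submitted_essay = "The study shows that a method of this kind is effective for the detection of such patterns."

similarity = cosine_similarity(fingerprint(known_sample), fingerprint(submitted_essay))
print(f"Stylistic similarity: {similarity:.2f}")
```

A real system would use far richer features (syntax, punctuation habits, sentence-length distributions) and a trained classifier rather than a single similarity score, but the underlying intuition is the same: a writer’s habits form a measurable profile.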

It’s also worth mentioning that participants in Liu’s study took between four and nine minutes to determine whether or not an article was written by a human (8). Here, AI detectors may actually aid professors by reducing the time such determinations take and increasing professors’ confidence in them.

To briefly summarize:

  • Both humans and AI detectors are prone to error
  • AI detectors are generally better, and in some cases significantly better, than humans at identifying machine-generated text
  • AI detectors are fairly conservative in classifying text as machine generated

Considering these points, I believe that, at the current time, instructors should use AI detectors as a tool to help them determine the authorship of a text. According to Liu and colleagues, Originality.ai is the best overall AI detector and ZeroGPT is the best free AI detector (10). While not as accurate as the preceding tools, Turnitin deserves mention because it did not have any false positives in multiple studies (Liu 6; Weber-Wulff 26). Of course, as with any tool, these detectors need to be used with discretion and with consideration of the bigger context of a work. I plan to write another post considering some common flags of machine-generated text.

Stop Saying “Fast Forward.”

Over the past few years, I’ve noticed people using the phrase “fast forward” to indicate a passage of time. While I acknowledge the dynamic nature of language and try not to get on my high horse about “proper” English, I do find this phrase particularly jarring and troublesome. I’d like to take a moment to explain my concerns and, I hope, encourage you to think twice before using the phrase.


Here’s a pretty typical example from Forbes. “I served as a translator for both language and culture over the years and gained a deep appreciation of the challenges of navigating caregiving, education and culture. Fast forward to graduate school: My interest in supporting child well-being led me to become interested in better understanding policy.”

For starters, there’s an issue of “point of view” (POV). In case you’ve forgotten POV from English class: first person = I; second person = you; third person = he/she/it/they. “Fast forward” shifts the point of view of a story. Most stories are told in the first or third person. So if you say “fast forward,” who is doing the forwarding?

If no subject is identified, “fast forward” operates in the second-person POV with “you” understood, like an imperative. When I say, “Call me later,” I’m really saying, “You call me later.” So in the example above, who’s fast forwarding? You aren’t telling the story. The phrase makes much more sense when the subject of the sentence takes clear ownership: “Let me fast forward to graduate school.” “Can you fast forward to graduate school …” But if I have control of a story, why are you the one fast forwarding?

Then there’s an issue of redundancy. “Fast forward,” used to indicate a passage of time, is often paired with another phrase that also indicates a passage of time. For example, “But relevance wasn’t the point — this was all about toughness. Fast-forward to May 14, when 10 people were gunned down at a Tops supermarket in Buffalo, New York” (MSNBC). Or “…President Donald Trump took a few steps in to North Korea and spoke about his friendship with that country’s leader, Kim Jong Un. Fast-forward almost three years. President Biden is in Seoul, emphasizing his friendship with new South Korean President Yoon Suk Yeol” (NPR).


In both of these cases, “fast forward” is redundant. It is literally a waste of breath. The authors could just write “On May 14” or “Today.” Both choices are shorter and convey the same information. Brevity is a skill. Why use up your readers’ mental bandwidth on something they don’t need? If you can delete something, do it!

That being said, many writers will sprinkle in phrases to help set a theme. If you’re talking about movies, why not use “fast forward” as your time transition? And bits of jargon have been weaseling their way into our everyday language for centuries. In my introduction, I mention horses even though this essay really has nothing to do with horses. Below, I allude to plants. Considering the prevalence of video in today’s world, we can’t exactly prevent a phrase like “fast forward” from taking root. But there is a good reason I would caution against it.

Hollywood has done a great job convincing us that love at first sight is real, “smoking guns” exist, passionate speeches change people’s minds, and there’s always a parking spot directly in front of a courthouse. If you use the veracity of that last example to measure the other three, you can see how absurd some of these propositions really are. We don’t live in movies. We can’t fast forward and rewind at will, and we need to stop thinking that we can. “Sure,” you might say. “That’s a problem for tween influencers who want to star in reality shows. But I can tell the difference between reality and the movies.” Respectfully, I disagree.


Think back, if you will, to the early aughts, when one of the most powerful countries in the world invaded a sovereign nation under false pretenses. At that time, a paranoid Bush administration justified its torture of detainees not through psychology and jurisprudence, but through the Fox television show 24. Slate noted the troubling argument way back in 2008. Politicians, judges, and intelligence operatives were basing their actions on a fictional television show, with real-world ramifications. It’s hard for me to believe that fourteen years later, with the ubiquitous use of smartphones and social media, our psychology has become less entwined with fiction, fantasy, and technology.

I need to reset here, briefly, because I am a fan of fiction. Fiction helps us explore questions and ideas that would not be accessible in a purely fact-driven world. Fiction helps us develop empathy. Understanding fiction helps us understand reality. But fiction is merely an analogy. Fiction and virtual worlds are not the same thing as flesh and blood, and I think it is incumbent on us to keep those lines distinct.

As we spend more time in the virtual world, manipulating images, audio, and video like gods, we need to keep the reality of our existence in mind. We can’t photoshop ourselves to be sexier, edit our conversations to remove faux pas, or fast forward our way through a traffic jam. I think acknowledging that fact, even in a small way, will lead us to accept the world we really live in and do our best to make this world a better place.