Writing to Think

I’m trying to think of a word. Is it existential? Maybe it’s epistemic. I feel like it starts with an “e.” To sort things out, I can ctrl+click on this word in Word and see what synonyms Microsoft suggests. I could go to Google. I could even do it old school and pull out a thesaurus. What I’m trying to illustrate here is that writers have used a variety of tools for nearly two hundred years* to assist them in getting thoughts from their heads onto the page. Is generative AI any different? That’s what I’m going to spend the next six hundred words trying to figure out. 

[Image: DALL-E’s effort to illustrate this blog post.]

The underlying** problem is that we don’t know how AI impacts thought, especially in developing brains. One theory is that AI can get rid of the drudgery of menial tasks. If I don’t have to worry about dangling modifiers, subject/verb agreement, and double negatives, I can spend my time on more meaningful pursuits. AI could accelerate learning by allowing us to delve into deeper questions.

Another possibility is that AI will lead us to a certain, bland “average.” AI has taken the sum of our digital culture and condensed it into the likeliest outputs. For example, when asked, “How do we stop gun violence in the United States?” ChatGPT says, “Stopping gun violence in the United States requires a multifaceted approach that includes implementing comprehensive gun safety laws, addressing root causes like poverty and mental health, improving community support systems, and fostering a culture of responsible gun ownership.” Let’s be honest, I didn’t need to ask ChatGPT to come up with that answer. It’s pretty obvious. But it’s also so non-specific that it’s basically useless. We know these things, and they haven’t worked yet. We need to figure out why, specifically, they haven’t worked and come up with actionable steps in response. And this intellectual dead end takes us to the most dire possible impact AI could have on thought.

If students leapfrog over fundamental lessons, they may fail to develop the critical thinking skills necessary to tackle the world’s biggest problems. Let’s go back to that double negative. When someone’s speaking to us, we can usually tell what they’re trying to say, even if it’s not literally true. I won’t try to confuse you with none of that grammar unclarity. (You know what I mean, right?) But as problems and conversations get more complex, double negatives and dangling modifiers can create serious issues. Worse, if students have relied on AI to solve their problems up to that point, they may not have the skills to solve those more complex problems on their own.

The truth is, at this point, we just don’t know how AI affects developing brains. Some of the emerging research should certainly give us pause. This study looking at human creativity found that AI’s impact is, to steal a favorite LLM word, “multifaceted.” Sorting out exactly which aspects of learning AI accelerates and which it damages will require extensive research, attention, and time. Implementing those findings will require even more research, attention, and time. But I feel like this is a good place to pivot back to something I wrote in my first paragraph: “That’s what I’m going to spend the next six hundred words trying to figure out.”

[Out of curiosity, I copied the preceding paragraphs into ChatGPT and asked it to write a conclusion. Here’s what it came up with.] “In the end, the question of whether generative AI is fundamentally different from the tools writers have always used boils down to its impact on thought and learning. Like a thesaurus or spell-check, it can help us articulate ideas more efficiently, but unlike those tools, it operates on a scale and complexity that could reshape how we think altogether. Whether it accelerates creativity or dulls it into predictable averages, AI challenges us to reevaluate how we approach learning, problem-solving, and even the pursuit of originality. The stakes couldn’t be higher: in an era of unprecedented technological advancement, ensuring that AI enhances rather than erodes critical thinking will determine whether it becomes a powerful ally or a crutch that undermines us in the long run.”

While ChatGPT’s conclusion is true(ish), it’s not where I planned to take this essay because it’s missing a key consideration. Many people think of writing as a way to record and transmit information. That’s true. And AI is actually decent at doing that. But writing is much more than that. Writing is a form of expression. Writing allows us to connect with each other intimately (love notes) or impersonally (jury summons). Writing is performative. (Seriously, Hallmark, who is the “Best Dad Ever”?) But that performance serves a social function. (Aren’t those cards more meaningful with a handwritten note?) Writing signals who’s part of our group and who isn’t. And writing can help us find common ground with people who aren’t part of our group. Most relevant to this essay, writing is a way to sort out our thoughts. That’s what I’m doing here. That’s one of the reasons I started blogging. As various writers, including Joan Didion, have put it, “I write entirely to find out what I’m thinking.” The more that students outsource their writing to a machine, the less time they will spend thinking about their words. We don’t yet know the consequences of this outsourcing, but we do know that writing can address many of the deeply human issues facing us today: a lack of critical thought, empathy, meaning, and human connection. Maybe we should spend some more time grappling with our words before we outsource too much of this process to the machines.

*According to Wikipedia, another writer’s tool, when Peter Mark Roget created Roget’s Thesaurus, he “wished to help ‘those who are painfully groping their way and struggling with the difficulties of composition … this work professes to hold out a helping hand.’”

**Maybe the word I was looking for started with “u.”

Literature Review of AI Detectors

About eighteen months ago, I started to notice machine-generated text cropping up in student work. As a composition teacher, my immediate reaction was to ban it. Text generators have little role in the composition classroom; however, composition teachers had few options for accurately identifying machine-generated text. The basic concern was that detectors were inaccurate and prone to false positives. In other words, they might flag human writing as machine generated, especially writing by non-native speakers. My colleagues and I put considerable effort into redesigning courses and disincentivizing students from using AI such as ChatGPT or Bard to complete assignments. I think these changes have improved our pedagogies. Having survived a school year with AI, however, I was curious how things have changed in the world of detecting machine-generated text. As of mid-July 2024, here is what I’ve found.

Neither humans nor AI-detection systems can identify machine-generated text flawlessly. However, it’s worth noting that detectors are reaching a high level of accuracy, and they are performing better than humans. Looking at research abstracts, J. Elliott Casal and Matt Kessler found that reviewers had “an overall positive identification rate of only 38.9%” (1). Oana Ignat and colleagues found that humans could accurately identify only 71.5% of machine-generated hotel reviews (7). Their AI detector, however, was able to correctly identify roughly 81% of machine-generated hotel reviews (8). Writing in 2023, Deborah Weber-Wulff et al. found similar results when testing twelve different AI-detection programs. The highest performers, Turnitin and Compilatio, approached 80% accuracy (15). Publishing this year, Mike Perkins and colleagues found Turnitin detected 91% of machine-generated texts (103-104), while human reviewers in the study successfully identified only 54.5% (1). Custom-designing an AI detector to find machine-generated app reviews, Seung-Cheol Lee et al. achieved 90% accuracy with their best model (20). For longer texts, the accuracy of both human reviewers and AI detectors increases. Comparing full-length medical articles, Jae Q. J. Liu et al. found that both professors and ZeroGPT correctly identified 96% of machine-generated texts (1). (Note that GPTZero, a different AI detector, performed considerably worse.) However, the professors also misclassified 12% of human-written content as having been rephrased by AI (8).

Notably, Weber-Wulff and colleagues mention that AI detectors tend to have few false positives. In other words, if the software is unsure whether a text was written by a human or a machine, it is more likely to classify it as human written (17). Turnitin, in fact, had zero false positives (26). Perkins, too, noted that Turnitin was reluctant to label text as machine generated. While it did correctly identify 91% of papers as machine generated, it reported only 54.8% of the content in those papers as machine generated, even though every paper was entirely (100%) machine generated (103-104). While this means a certain percentage of machine-generated writing will evade detectors, it should give professors some confidence that something flagged as machine generated is, very likely, machine generated. In another encouraging finding, Liu found that “No human-written articles were misclassified by both AI-content detectors and the professorial reviewers simultaneously” (11).
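To make that intuition concrete, here is a rough, back-of-the-envelope sketch in Python. All three numbers are illustrative assumptions of mine, not findings from the studies above (the 91% echoes Perkins’s Turnitin result, but the false-positive rate and the share of machine-generated submissions are invented for the example).

    # Back-of-the-envelope: how much should a professor trust a flag?
    # All three numbers below are illustrative assumptions, not study results.

    detection_rate = 0.91       # share of machine-generated texts the detector flags
    false_positive_rate = 0.01  # share of human-written texts it wrongly flags
    machine_share = 0.20        # assumed share of submissions that are machine generated

    # Probability that a flagged submission really is machine generated
    # (positive predictive value, via Bayes' rule).
    flagged_and_machine = detection_rate * machine_share
    flagged_and_human = false_positive_rate * (1 - machine_share)
    ppv = flagged_and_machine / (flagged_and_machine + flagged_and_human)

    print(f"P(machine generated | flagged) = {ppv:.1%}")  # ~95.8% with these numbers

The base rate matters: with the same detector but only 5% of submissions machine generated, a flag would be right only about 83% of the time. That is why the zero-false-positive findings for Turnitin are so encouraging.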

There is one caveat, however. AI detectors may flag translated or proofread text as machine generated (Weber-Wulff 26). Once machines are introduced into the composition process, they likely leave artifacts that AI detectors may notice. Strictly speaking, the detectors would not be wrong: machines were introduced into the composition process. However, most professors would find the use of machines for translation or proofreading to be acceptable.

The studies I mention to this point were attempting to consistently identify machine-generated content, but a team of researchers led by Mohammad Kutbi took a different approach. Their goal was to establish consistent, human authorship of texts by looking for a “linguistic fingerprint.” In addition to detecting the use of machine writing, this method would also detect contract plagiarism (i.e., someone hiring another person to write an essay for them). This system achieved 98% accuracy (1). While not mentioned in Kutbi’s study, other scholars have found that certain linguistic markers maintain consistency across contexts (Litvinova et al.). For these and other reasons, I believe that linguistic fingerprinting holds the most promise for detecting the use of AI in the composition process.
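Kutbi’s team doesn’t share their model in a form I can reproduce here, but the core idea behind a linguistic fingerprint is easy to sketch. The toy Python below is my illustration, not their method: it builds a tiny profile from function-word frequencies (one marker stylometry has long relied on) and compares a new submission to a student’s known writing. The file names and the ten-word feature list are made up for the example.

    import math
    import re

    # A few high-frequency function words; real stylometric systems use
    # hundreds of features (word lengths, punctuation, syntax), not this toy set.
    FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "is", "was", "i"]

    def profile(text: str) -> list[float]:
        """Relative frequency of each function word in the text."""
        words = re.findall(r"[a-z']+", text.lower())
        total = len(words) or 1
        return [words.count(w) / total for w in FUNCTION_WORDS]

    def cosine_similarity(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        return dot / norm if norm else 0.0

    # Hypothetical file names: writing the student produced in class vs. a new submission.
    known = profile(open("student_known_writing.txt").read())
    new = profile(open("new_submission.txt").read())
    print(f"Stylistic similarity: {cosine_similarity(known, new):.2f}")

A real system would use far richer features and a validated threshold before a low similarity score meant anything. The point is simply that what gets measured is consistency with a student’s own prior writing, not resemblance to machine output.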

It’s also worth mentioning that participants in Liu’s study took between four and nine minutes to determine whether an article was written by a human (8). In this situation, AI may actually aid professors, reducing the time they need and increasing their confidence in determining whether a text was machine generated.

To briefly summarize:

  • Both humans and AI detectors are prone to error
  • AI detectors are generally better, and in some cases significantly better, than humans at identifying machine-generated text
  • AI detectors are fairly conservative in their classification of text as machine generated

Considering these points, I believe that, at the current time, instructors should use AI detectors as one tool to help them determine the authorship of a text. According to Liu and colleagues, Originality.ai is the best overall AI detector and ZeroGPT is the best free AI detector (10). While not as accurate as the preceding tools, Turnitin deserves mention because it did not have any false positives in multiple studies (Liu 6, Weber-Wulff 26). Of course, as with any tool, these detectors need to be used with discretion and with consideration of the bigger context of a work. I plan to write another post considering some common flags of machine-generated text.