Literature Review of AI Detectors

About eighteen months ago, I started to notice machine-generated text cropping up in student work. As a composition teacher, my immediate reaction was to ban it. Text generators have little role in the composition classroom; at the time, however, composition teachers had few options for accurately identifying machine-generated text. The basic concern was that detectors were inaccurate and prone to false positives. In other words, they might flag human writing as machine generated, especially writing by non-native English speakers. My colleagues and I put considerable effort into redesigning courses and disincentivizing students from using AI such as ChatGPT or Bard to complete assignments. I think these changes have improved our pedagogies. Having survived a school year with AI, however, I was curious how things have changed in the world of detecting machine-generated text. As of mid-July 2024, here is what I’ve found.

Neither humans nor AI-detection systems can identify machine-generated text flawlessly. However, it’s worth noting that detectors are reaching a high level of accuracy, and they are performing better than humans. Looking at research abstracts, J. Elliott Casal and Matt Kessler found that reviewers had “an overall positive identification rate of only 38.9%” (1). Oana Ignat and colleagues found that humans could accurately identify only 71.5% of machine-generated hotel reviews (7). Their AI detector, however, was able to correctly identify roughly 81% of machine-generated hotel reviews (8). Writing in 2023, Deborah Weber-Wulff et al. found similar results when testing twelve different AI-detection programs. The best performers, Turnitin and Compilatio, approached 80% accuracy (15). Publishing this year, Mike Perkins and colleagues found Turnitin detected 91% of machine-generated texts (103-104), while human reviewers in the study successfully identified only 54.5% (1). Custom designing an AI detector to find machine-generated app reviews, Seung-Cheol Lee et al. were able to achieve 90% accuracy with their best model (20). For longer texts, the accuracy of both human reviewers and AI detectors increases. Comparing full-length medical articles, Jae Q. J. Liu et al. found that both professors and ZeroGPT correctly identified 96% of machine-generated texts (1). (Note that GPTZero, a different AI detector, performed considerably worse.) However, the professors also misclassified 12% of human-written content as having been rephrased by AI (8).

Notably, Weber-Wulff and colleagues mention that AI detectors tend to have few false positives. In other words, if the software is unsure whether a text was written by a human or a machine, it is more likely to classify it as human written (17). Turnitin, in fact, had zero false positives (26). Perkins, too, noted that Turnitin was reluctant to label text as machine generated. While it did correctly identify 91% of papers as machine generated, it reported only 54.8% of the content in those papers as machine generated, even though the papers were entirely (100%) machine generated (103-104). While this means a certain percentage of machine-generated writing will evade detectors, it should give professors some confidence that something flagged as machine generated is, very likely, machine generated. In another encouraging finding, Liu found that “No human-written articles were misclassified by both AI-content detectors and the professorial reviewers simultaneously” (11).
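
To see why a low false-positive rate matters so much, it helps to run the numbers. Below is a minimal back-of-the-envelope sketch in Python. Only the 91% detection rate echoes the figures above; the false-positive rate and the share of submissions that are actually machine generated are assumptions I picked for illustration, not findings from any of these studies.

    # Hypothetical numbers: only the 0.91 detection rate comes from the
    # studies above; the other two values are assumptions for illustration.
    detection_rate = 0.91       # share of machine-generated papers flagged
    false_positive_rate = 0.01  # share of human-written papers wrongly flagged
    base_rate = 0.10            # assumed share of submissions that are machine generated

    # Bayes' rule: probability a flagged paper really is machine generated
    flagged_machine = detection_rate * base_rate
    flagged_human = false_positive_rate * (1 - base_rate)
    ppv = flagged_machine / (flagged_machine + flagged_human)

    print(f"Chance a flagged paper is machine generated: {ppv:.0%}")  # ~91%

Under these assumptions, a flag is right about 91% of the time; raise the false-positive rate to just 5%, and that drops to roughly 67%. That is why a conservative detector is worth more to an instructor than a slightly more accurate but trigger-happy one.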

There is one caveat, however. AI detectors may flag translated or proofread text as machine generated (Weber-Wulff 26). Once machines are introduced into the composition process, they likely leave artifacts that detectors can notice. Strictly speaking, the detectors would not be wrong: machines were introduced into the composition process. However, most professors would find the use of machines for translation or proofreading acceptable.

The studies I mention to this point were attempting to consistently identify machine-generated content, but a team of researchers led by Mohammad Kutbi took a different approach. Their goal was to establish consistent human authorship of texts by looking for a “linguistic fingerprint.” In addition to detecting the use of machine writing, this method would also detect contract plagiarism (i.e., someone hiring another person to write an essay for them). This system achieved 98% accuracy (1). While not mentioned in Kutbi’s study, other scholars have found that certain linguistic markers maintain consistency across contexts (Litvinova et al.). For these and other reasons, I believe that linguistic fingerprinting holds the most promise in detecting use of AI in the composition process.
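
To make the fingerprinting idea concrete, here is a toy sketch of the general technique in Python. To be clear, this is my own illustration of stylometric comparison, not Kutbi’s model: it compares a new submission’s function-word frequencies against a sample of the student’s known writing, and the file names are placeholders.

    # Toy stylometric "fingerprint": function-word frequencies, which tend
    # to stay stable for a given writer. Illustrative only; real systems
    # use far richer features and trained classifiers.
    import math
    from collections import Counter

    FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                      "it", "for", "not", "with", "but", "by", "however"]

    def fingerprint(text):
        """Relative frequency of each function word in the text."""
        words = text.lower().split()
        total = max(len(words), 1)
        counts = Counter(words)
        return [counts[w] / total for w in FUNCTION_WORDS]

    def similarity(a, b):
        """Cosine similarity between two fingerprints (1.0 = identical style)."""
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.hypot(*a) * math.hypot(*b)
        return dot / norms if norms else 0.0

    # Placeholder file names for known writing and a new essay.
    baseline = fingerprint(open("known_writing.txt").read())
    submission = fingerprint(open("new_submission.txt").read())

    # A low score means the submission departs from the student's usual
    # style, whether the ghostwriter is a machine or a hired human.
    print(f"Stylistic similarity: {similarity(baseline, submission):.2f}")

Note that this approach flags inconsistency with a known baseline rather than properties of machine text itself, which is why it catches contract plagiarism as well as AI use.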

It’s also worth mentioning that participants in Liu’s study took between four and nine minutes to determine whether an article was written by a human (8). Here, AI may actually aid professors by reducing the time they spend on, and increasing their confidence in, determining whether a text was machine generated.

To briefly summarize:

  • Both humans and AI detectors are prone to error
  • AI detectors are generally better, and in some cases significantly better, than humans at identifying machine-generated text
  • AI detectors are fairly conservative in classifying text as machine generated

Considering these points, I believe that, at the current time, instructors should use AI detectors as a tool to help them determine the authorship of a text. According to Liu and colleagues, Originality.ai is the best overall AI detector and ZeroGPT is the best free AI detector (10). While not as accurate as the preceding tools, Turnitin deserves mention because it did not have any false positives in multiple studies (Liu 6; Weber-Wulff 26). Of course, as with any tool, these detectors need to be used with discretion and with a consideration of the broader context of a work. I plan to write another post considering some common flags of machine-generated text.

See My Vest

I have trouble keeping my pants on.

And I’m not alone.

It’s really a physics issue. Gravity pulls down, but belts pull in. Freakonomics actually did an episode about this a while back. If you, like me, are required to carry around work tools, this can be a serious problem. (Things are about to get a little specific and wonky, but if you want to learn about filmmaking gear, vests, and my own persnicketiness, read on!)

The Things They Carried

This is my regular complement of work tools.

[photo: my regular complement of work tools]

All told, you’re looking at 3 pounds, 5.7 ounces. And that doesn’t include the cell phone, car keys, etc.

The first thing I did was try to minimize weight. If you have to carry wrenches around, here are two big (or small?) recommendations. First is the Neiko Mini Ratchet. It does require you to purchase 3/16″ hex bits, and I’d recommend putting a drop of glue on the end to be sure they don’t pop out.

Then there’s the Lobtex lightweight adjustable wrench. Besides being incredibly light, this wrench opens up to an inch, so you can still use it on cheeseboroughs. Those little changes saved me just over a pound. Look at the difference!

[photo: the old tools next to the new, lighter ones]

You do sacrifice some leverage, but it hasn’t been an issue yet.

Strangely, the lightest 25′ tape measure I’ve found just so happens to be my old Stanley at about 13 ounces. I have yet to find a lighter one, and many of the weights listed on Amazon are wrong.

Okay, great. But there’s still the problem of where to keep everything.

The Kangaroo

Most crew members carry various kinds of pouches, sometimes on a second belt. But that really doesn’t solve the gravity problem since you’re still pulling in against something that is pulling down.


[image: Setwear combo tool pouch]

(https://vipproductionnw.com/product/setwear-combo-tool-pouch/)

Suspenders

On very rare occasions, I’ve seen people wear suspenders on set. Yeah. Very rare…

The Holster

So then there was the walkie holster. A more common one is the “X-Wing Fighter Command” style.

[image: Dirty Rigger LED chest rig]

(https://www.pnta.com/scenic/tools/dirty-rigger-led-chest-rig/)

Which does kind of work if you aren’t worried about carrying easily scratched, expensive television monitors and looking like you need to shoot some womp rats.

Someone introduced me to the cop holster, which is either really cool or looks a bit like a training bra.

[image: HolsterGuy USH-300L holster]

(https://www.holsterguy.com/USH-300L-Bus-Suit-Front%20Views-Med-300%20dpi.html)

But then there was the bigger problem of finding space for all of my tools, which brings me to…

The Vest

I was basically looking for something that had vertical pockets and didn’t look too much like I was goin’ fishin’.

[image: Field & Stream mesh-back fishing vest]

(https://www.dickssportinggoods.com/p/field-stream-mens-mesh-back-fishing-vest-17fnsmfsmshbckfshapv/17fnsmfsmshbckfshapv)

First stop: Carhartt. ($60-$65)

[image: Carhartt Sandstone mock-neck vest]

(https://www.sheplers.com/carhartt-mens-sandstone-mock-neck-vest-/2000212640.html)

Nice, sturdy vest from a trusted brand. And reasonably priced, too. Strangely, it has hand pockets, not vertical pockets, so your tools fall out when you sit down.

Second stop: Duluth. (Clearance $50. Reg. $80)

[image: Duluth Iron Range fire-hose-lined vest]

(https://www.duluthtrading.com/mens-iron-range-fire-hose-lined-vest-14002.html?cgid=mens-outerwear-vests&dwvar_14002_color=COF#start=2&cgid=mens-outerwear-vests)

Here, we have a winner. Vertical pockets and a (somewhat) reasonable price. Thick cotton to resist tears and fire. Great success. Unless you’re working on a stage next to a heating duct… (Also, I notice that it’s on clearance, so I’m not sure what will be available in the future.)

Third stop: Chinese vest. ($30-$40)

[image: the $30 Amazon vest]

(https://www.amazon.com/gp/product/B074Z5RWRD/ref=ppx_yo_dt_b_asin_title_o03__o00_s00?ie=UTF8&psc=1)

Capitalism, even in its communist form, has a way of funneling things down to “good enough.” There are a handful of these vests sold by different companies, but I have the deep suspicion that they’re all made in the same place.

Anywho, if you’re on Amazon and see a vest with vertical pockets that’s only $30, you’re like, well, why not? Sure, it has a sticker on it that reads “Fashion Style,” and I had to order three different sizes because no one can just write a chest size on the product, but it’s only $30. Well, after a week of wearing it, one of the buttons popped right off. Then I noticed that the inside pockets weren’t actually stitched into the vest… So, I’d avoid this one.

Fourth stop: Fjallraven. ($135)

[image: Fjällräven Reporter Lite vest]

(https://varuste.net/en/Fjällräven+Reporter+Lite+Vest?_tu=55763)

This had the lightness of the Chinese vest combined with the durability of the Duluth and, of course, Swedish style. And a price tag to match. It did not, however, require allen wrenches to assemble. Plenty of vertical pockets keep the tools from falling out (with snaps, not velcro). Sadly, it is rather expensive, but hopefully it will last a long time. I’d certainly keep an eye out for sales.

But now, success at last. I don’t have to worry about my pants falling off. It’s very easy to shed all of that weight at the end of the day (just take the vest off), and you can even sit on a toilet without getting tangled up in your walkie-talkie cables. There you have it.

Cold weather: Duluth.
Warm weather: Fjallraven.

And as your reward for dealing with all of that, enjoy this: