{"id":60720,"date":"2024-07-01T07:30:00","date_gmt":"2024-07-01T11:30:00","guid":{"rendered":"http:\/\/pappp.net\/?guid=505804c2cd2201ae0bc07d7422de51b0"},"modified":"2024-07-01T07:30:00","modified_gmt":"2024-07-01T11:30:00","slug":"the-telltale-words-that-could-identify-generative-ai-text","status":"publish","type":"post","link":"https:\/\/pappp.net\/?p=60720","title":{"rendered":"The telltale words that could identify generative AI text"},"content":{"rendered":"<p class=\"syndicated-attribution\">Source: <a href=\"https:\/\/arstechnica.com\/?p=2034045\">Ars Technica<\/a><\/p>\n<div style=\"background-color : #fff7d5;\n\t\t\tborder-width : 1px; padding : 5px; border-style : dashed; border-color : #e7d796;margin-bottom : 1em; color : #9a8c59;\">Article note: They trained it on shitty florid academic writing, so it vomits out shitty florid academic writing. \nThat style is already basically a caricature of itself, propagated by mimicry, so the same \"This is probably horseshit\" indicators that worked for human authors work for LLM spew.<\/div><div>\n<figure>\n  <img src=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2024\/06\/GettyImages-1048873262-800x479.jpg\" alt='If your right hand starts typing \"delve,\" you may, in fact, be an LLM.' referrerpolicy=\"no-referrer\" loading=\"lazy\"\/>\n      <p><a href=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2024\/06\/GettyImages-1048873262.jpg\" rel=\"noopener noreferrer\">Enlarge<\/a> <span>\/<\/span> If your right hand starts typing \"delve,\" you may, in fact, be an LLM. (credit: Getty Images)<\/p>  <\/figure>\n\n\n\n\n\n\n<div><a name=\"page-1\"><\/a><\/div>\n<p>Thus far, even AI companies have had trouble coming up with tools that can reliably detect when a piece of writing was <a href=\"https:\/\/arstechnica.com\/information-technology\/2023\/09\/openai-admits-that-ai-writing-detectors-dont-work\/\" rel=\"noopener noreferrer\">generated using a large language model<\/a>. 
Now, a group of researchers has devised a novel method for estimating LLM usage across a large set of scientific writing by measuring which \"excess words\" started showing up much more frequently during the LLM era (i.e., 2023 and 2024). The results \"suggest that at least 10% of 2024 abstracts were processed with LLMs,\" according to the researchers.<\/p>\n<p>In <a href=\"https:\/\/arxiv.org\/abs\/2406.07016\" rel=\"noopener noreferrer\">a pre-print paper posted earlier this month<\/a>, four researchers from Germany's University of Tübingen and Northwestern University said they were inspired by studies that measured the impact of the COVID-19 pandemic <a href=\"https:\/\/arstechnica.com\/health\/2023\/07\/gop-voters-had-higher-excess-deaths-rates-after-covid-vaccine-rollout\/\" rel=\"noopener noreferrer\">by looking at excess deaths<\/a> compared to the recent past. By taking a similar look at \"excess word usage\" after LLM writing tools <a href=\"https:\/\/arstechnica.com\/information-technology\/2022\/12\/openai-invites-everyone-to-test-new-ai-powered-chatbot-with-amusing-results\/\" rel=\"noopener noreferrer\">became widely available in late 2022<\/a>, the researchers found that \"the appearance of LLMs led to an abrupt increase in the frequency of certain style words\" that was \"unprecedented in both quality and quantity.\"<\/p>\n<h2>Delving in<\/h2>\n<p>To measure these vocabulary changes, the researchers analyzed 14 million paper abstracts published on <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/\" rel=\"noopener noreferrer\">PubMed<\/a> between 2010 and 2024, tracking the relative frequency of each word as it appeared across each year. 
They then compared the expected frequency of those words (based on the pre-2023 trendline) to their actual frequency in abstracts from 2023 and 2024, when LLMs were in widespread use.<\/p><\/div><p><a href=\"https:\/\/arstechnica.com\/?p=2034045#p3\" rel=\"noopener noreferrer\">Read 9 remaining paragraphs<\/a> | <a href=\"https:\/\/arstechnica.com\/?p=2034045&amp;comments=1\" rel=\"noopener noreferrer\">Comments<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Enlarge \/ If your right hand starts typing &#8220;delve,&#8221; you may, in fact, be an LLM. (credit:&#8230;<\/p>\n<p> <a href=\"https:\/\/pappp.net\/?p=60720\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[226],"tags":[],"class_list":["post-60720","post","type-post","status-publish","format-standard","hentry","category-news-2"],"_links":{"self":[{"href":"https:\/\/pappp.net\/index.php?rest_route=\/wp\/v2\/posts\/60720","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pappp.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pappp.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pappp.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pappp.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=60720"}],"version-history":[{"count":0,"href":"https:\/\/pappp.net\/index.php?rest_route=\/wp\/v2\/posts\/60720\/revisions"}],"wp:attachment":[{"href":"https:\/\/pappp.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=60720"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pappp.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=60720"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pappp.net\/index.php?rest_route=%2Fwp%2Fv2%2F
tags&post=60720"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}