On Generative AI, phantom citations, and social calluses

The Generative AI age is going to be exhausting and unpleasant, isn't it?

Mar 20, 2023

I want to tell you about an odd email exchange from last week. It felt like a harbinger.

On Thursday, a student at a European university reached out to me, politely asking for help obtaining a copy of an old article of mine. Normally the way I handle messages like this is to thank them for their interest and attach a pre-publication version of the article. (Anytime someone shows interest in reading some obscure piece I’ve written, I am happy to oblige.)

What made this exchange remarkable was that I had no recollection of the article he mentioned. It was, he said, published in 2010, in the ANNALS of the American Academy of Political and Social Science. The title was “Tech lobbying in the United States: Exploring corporate political activity in the information age.” I don’t believe I’ve ever written an article with that title. I also don’t recall publishing a piece in that journal back in 2010.

Maybe, I thought, he had the details a little wrong. It might be a book chapter for an edited volume, or maybe a book review. I’ve contributed a lot of book chapters to a lot of edited volumes over the years. Not all of them are particularly memorable.

I checked my files and found nothing that fit the description. I checked my Google Scholar profile. Nothing there either. So I typed the following response:

“Thanks for your interest. Can you send me a link to the article you're referencing? I am not finding it anywhere in my records. It might have been published under an alternate name. I'd be happy to send it to you, just need a bit more help tracking it down.”

The student wrote back:

“Thank you for your response. The article should supposedly be here: https://journals.sagepub.com/doi/10.1177/0002716210373643”

That web link takes us to a file-not-found page. It’s a dead end. Link rot is an all-too-common curse. So I replied:

“That link leads me to an error message. What's the full article citation? Title, journal, date, etc?”

He responded with the following details: Karpf, D. (March, 2010). Tech lobbying in the United States: Exploring corporate political activity in the information age. The ANNALS of the American Academy of Political and Social Science, 628(1), 194-211.

Bewildered, I checked volume 628(1) of ANNALS. No such article was published in that journal. Pages 194-211 span three published articles, none of which have any topical overlap with this piece, and none of which cite me.

So I replied, still trying my best to be helpful: “I'm looking at the table of contents of that issue of ANNALS. That article does not seem to exist. Where did you find this reference? Again, I'm happy to help, but the article title isn't ringing a bell...”

At that point the student stopped responding. I had wasted probably half an hour on a complete snipe hunt. What the hell was going on? Where did this phantom citation come from?

Giving it some thought, I have a pretty strong hunch: I think the student used ChatGPT. I think he was trying to use it the right way. And that made trouble for both of us.

(an XKCD classic: “Wikipedian protester”)

Back in the ‘00s — after the initial moral panic subsided — people came to describe Wikipedia as “a great place to start your research, and a terrible place to finish.”

That’s certainly true today. Hell, Google now features Wikipedia entries as the top search result for many queries. Wikipedia is more trustworthy than the open web. It’s more trustworthy than Twitter or Facebook. And it includes citations.

If you want to learn about, say, Moore’s Law, the reasonable first step is to visit Wikipedia for a summary. From there, you can dig into some of the page’s 181 citations (!!!), or pick up one of the books and articles suggested in the “further reading” section. (I personally recommend Cyrus Mody’s The Long Arm of Moore’s Law. It is excellent.)

If you are a student writing a research paper, you will cite the books and articles, not the Wikipedia entry. Wikipedia points to and summarizes the underlying research, but you should then take the time to review it yourself.

Now imagine you are a student conducting academic research today. You have heard constantly about these new Generative AI tools. You know that people are using Generative AI to write their papers for them. You know that’s plagiarism. You don’t want to do that. You aren’t trying to cheat. You’re trying to learn.

So what you might reasonably do is treat Generative AI the same way we treat Wikipedia — good place to start/terrible place to finish. So you ask the AI to create an essay summarizing the history of, for instance, tech lobbying in the United States. That’s an important and interesting topic. There is no Wikipedia entry on it.

That machine-generated essay, just like Wikipedia, includes citations. Brilliant. You can now do the responsible thing, tracking down those citations, reading them, and producing a better essay yourself. This all seems reasonable, responsible, and ethical.

There’s just one problem: Generative AI is a bullshit generator. It has no underlying theory of truth or facts. It does not think, reason, or theorize in the way that humans are used to. Products like ChatGPT are exceptionally overpowered guess-the-next-word engines. So they are remarkably effective at producing text that sounds right, and they effortlessly fabricate along the way.

If you were to brainstorm a list of academic authors who might publish an article on “Tech lobbying in the United States,” I would be on it.

If you were to generate a list of journals that might publish such an article, ANNALS would be included.

It is all quite plausible. Therein lies the problem.

This reminds me of a passage that has always bugged me from Nicholas Negroponte’s 1995 book, Being Digital. Negroponte, ever the digital optimist, describes the wonderful leveling effects of email:

A mild insomniac, I often wake up around 3:00a.m., log in for an hour, and then go back to sleep. At one of these drowsy sessions I received a piece of e-mail from a certain Michael Schrag, who introduced himself very politely as a high school sophomore. He asked if he might be able to visit the Media Lab when he was visiting MIT later in the week. […]
When I finally met Michael, his dad was with him. He explained to me that Michael was meeting all sorts of people on the Net […] What startled Michael’s father was that all sorts of people, Nobel Prize winners and senior executives, seemed to have time for Michael’s questions. The reason is that it is so easy to reply, and (at least for the time being) most people are not drowning in gratuitous e-mail.
Over time, there will be more and more people on the Internet with the time and wisdom for it to become a web of human knowledge and assistance. The 30 million members of the American Association of Retired Persons, for example, constitute a collective experience that is currently untapped. Making just that enormous body of knowledge and wisdom accessible to young minds could close the generation gap with a few keystrokes.

It has always seemed to me that Negroponte’s abundant digital optimism was rooted in a naive misunderstanding of what it means to be living in the early times. There is a habit among tech futurists to imagine that the future of a technology will be like the present, only much larger-scale, and with all the bugs worked out. But instead, it turns out that the technology’s future is barely like the present, because reaching a larger scale creates an entirely separate set of problems.

Young Michael Schrag was able to correspond with the illustrious head of the MIT Media Lab, and with Nobel Laureates and Senior Executives, because they were all members of the same exclusive club. They were early internet adopters — “netizens,” as they called themselves back then. When e-mail was still rare, those who participated online were often delighted to make time for one another.

Negroponte showed a passing recognition of this state of affairs, acknowledging that “(at least for the time being) most people are not drowning in gratuitous e-mail.” But he treated it as a future annoyance rather than a systemic property of the communications environment he was trying to promote.

It is no less easy to reply to an email today than it was in 1995. (In fact, Google’s early forays into suggested replies have made it even easier.) But Nobel Laureates are no longer replying to high school sophomores. Of course they aren’t.

Over the years, we have built up social calluses. It was inevitable that we would do so. We were never headed toward Negroponte’s imagined world, where generational wisdom was just a keystroke away. We were, instead, headed toward a world of elaborate spam filters and pervasive corporate surveillance. Because as more people came online, the business models came into focus. As more people came online, we dealt with the flood of communications by tuning most of it out.

This brings me back to my young interlocutor last week. I don’t fault him for reaching out to me. ChatGPT, or some similar program, fabricated a citation. It sounded real, but the link was broken online. So he reached out and asked for help. It’s all reasonable, well-meaning behavior.

But I’m left with a premonition of the sheer weariness that is to come.

Because as all the largest internet platforms race to integrate these tools into everything, everywhere, all at once, I suspect this interaction is going to become commonplace.

From there we’ll grow a new set of social calluses. I’ll eventually come to view such messages as an unwelcome drain on my precious time, instead of as a warming reminder that someone, somewhere, has found value in a piece I wrote long ago.

It’s just such a bummer. When all is said and done, it seems like the net result of this massive advance in computational power will be to drown us in noise and make everyone a little less kind.

FURTHER READING: John Herrman, “The Nightmare of AI-Powered Gmail Has Arrived.”

Herrman is always worth reading. His column this week takes aim at Google’s announcement that it will be rebuilding all of its main workplace products to take advantage of Generative AI.

Here’s Herrman:

Google’s promotional video is worth watching for a few reasons. In contrast to tools like ChatGPT, which are quite capable but function mostly as general-purpose tech demos and marketing tools, it’s an example of how a major firm thinks software powered by a large language model, or LLM, will change its existing business and products, some of which you might already use — these features are where the new class of AI tools will face a test of their real utility.
[…] for those who might interact with people who have these jobs — that is, those who can expect to be on the receiving end of this plentiful new content — these features read a bit differently. Are you excited for your co-workers to become way more verbose, turning every tapped-out “Sounds good” into a three-paragraph letter? Are you glad that the sort of semi-customized mass emails you’re used to getting from major brands with marketing departments (or from spammers and phishers) are now within reach for every entity with a Google account? Are you looking forward to wondering if that lovely condolence letter from a long-lost friend was entirely generated by software or if he just smashed the “More Heartfelt” button before sending it?

This, again, is why I have such trouble getting excited about Generative AI.

We are on the cusp of a torrential increase in the volume of authoritative, personalized text online. We’re going to be swimming in it. A whole lot of obnoxious office-job tasks are about to become radically easier thanks to ubiquitous AI assistance. And that sounds nice, until you remember that everyone else will have these tools too.

The next few years will be an absolute mess as we work through the chaos. And I’m not convinced that it will actually improve the quality of human thought, or make us more productive in our jobs, or help us solve any of the massive social dilemmas we collectively face.

It’s exasperating. We have built unfathomably large computer networks for some grand purpose, and we are using them for… what, exactly?

(my point here being: Herrman’s column is really good. You ought to read it.)

Discussion about this post

Suw Charman-Anderson

Mar 20, 2023

When it comes to content, whether that's academic papers, reviews, short stories, books or any other form of content, the adjustment from scarcity to abundance has been painful for many, even as it brought benefits too. Traditional gatekeepers have been weakened, frequently replaced at least in part by curators. And in many cases, this has been a good thing. That I can locate and read nearly any academic paper is a boon. That books and stories are so easy to find and read is amazing. Less fun are, for example, all the spam reviews on pretty much every site that hosts them, Amazon being a particularly notable example of a company that hosts spam reviews and doesn't care.

But we are now moving into an era of superabundance and no one is prepared. I've recently been paying a lot of attention to superabundance in the literary industry. For example, the literary magazine Clarkesworld suffered 500 short story submissions from 1-20 February, at which point it had to close submissions completely. It usually receives 10-25 a month. Most of those 500 were LLM-generated spam. And Clarkesworld isn't the only magazine to have been affected.

It's only too easy to imagine a time when Kindle is flooded with LLM-generated novels and novellas, when Amazon and other reviews are predominately ChatGPT created, when magazines and agents are drowned in LLM submissions. I wrote about this here:

https://wordcounting.substack.com/p/can-publishing-survive-the-oncoming

Of course, if we were talking about a superabundance of quality content, then that would be one thing. Discovery would get harder, but we'd still get something useful (ish) at the end of the process. But we're not. We're talking about a superabundance of LLM-generated trash, whether that's citations or papers or opinion pieces or novellas or books or whatever. The infosphere is going to become horribly polluted. It's like we've just crossed an information event horizon beyond which nothing found on the internet is going to be reliable because all that LLM trash is going to pollute the search engines.

I've really been trying to find a light at the end of this tunnel, but every conversation I've had about it just makes me more concerned. Even if OpenAI creates a digital watermark in its content so that it can be detected, there'll be other LLMs that don't, and soon we won't know what's real from what's been made up. When LLMs get good enough to sound just like humans, and they will, how will we tell humans apart from LLMs? And honestly, that's not a rhetorical question.

Gerben Wierda

Mar 20, 2023

Great observation, beautiful example. The current generative AI wave certainly brings back memories of the early days of the internet (1990s). At the time I was one of the lone voices against the naiveté of the tech-optimists (in newspaper opinion pieces and one TV debate). But looking back, while it was easy to spot simplistic nonsense, I missed the darker, more corrosive side of things, such as mass manipulation and the problematic effects of the 'attention economy'.

That we will be seeing a tsunami of 'noise masquerading as signal' seems likely. But I wonder what I am missing now.

Either we will not cope and we will culturally drown in it, or we will cope, but if so: how? Coping may, for instance, mean the internet waning in influence, with only sources that have strict policies of human-curated content remaining: content and sources you have to pay for. The 'you're not paying for the product so you are the product' approach might become less and less workable as a business model, as what you can consume for 'free' (i.e. in exchange for data about you) will be worth almost nothing. Most open comment sections (like this one) might have to shut down, as will other open communities. 'Islands' (smaller, closed groups) may become a dominant pattern (again). If the sea of 'noise masquerading as signal' becomes orders of magnitude larger than the noise we already have, trust will be so rare that trust itself becomes valuable (again).

Food for thought and thank you for putting it so clearly, with such a great example, in front of us.

The Future, Now and Then
