On "Trusting the Data" and the specter of AI-induced decision paralysis
A cautionary tale from the age of Big Data.
AI Snake Oil: What AI Can Do, What It Can’t, and How to Tell the Difference, by Arvind Narayanan and Sayash Kapoor, is one of the best books I’ve read this year. It’s thorough and thoughtful, cutting through the haze of AI futurism to discuss where these technologies originate, what they’re good for, and where the marketing hype and starry-eyed ambitions depart from reality. Narayanan and Kapoor are computer scientists, and they bring a level of technical expertise to this area that most critics (like me) cannot hope to match. And/but, they are the type of computer scientists who realize you also need social scientists and policy thinkers in the room. That’s a refreshing change of pace.
That being said, today’s post isn’t going to be a full book review of AI Snake Oil. (Rob Nelson already has that covered.)
Instead, what I’d like to do here is a bit of thinking-out-loud, exploring a tangent that their book left me thinking about. Specifically, I want to take a shot at articulating a throughline between the current uses of AI and the “big data” enthusiasm of ~10 years ago. The past is, I think, meaningful prologue, and it features a cautionary warning about what AI boosters are hoping to automate.
In the opening pages of the book, Narayanan and Kapoor note that “AI” is an umbrella term, a branding mechanism. We use it to refer to a diverse set of machine learning tools, many of which do fundamentally different things.
This is marvelously useful for marketing and hype-building — DeepMind’s Demis Hassabis just won the Nobel Prize in Chemistry for the contribution his AI research has made to the study of chemical compounds. If one type of AI research is already winning chemistry Nobels, then surely ChatGPT is on the path to solving all of physics, right?
It’s also frustrating and confusing.
Here’s how they put it:
Imagine an alternate universe in which people don’t have words for different forms of transportation — only the collective noun “vehicle.” They use that word to refer to cars, buses, bikes, spacecraft, and all other ways of getting from place A to place B. Conversations in this world are confusing. There are furious debates about whether or not vehicles are environmentally friendly, even though no one realizes that one side of the debate is talking about bikes and the other side is talking about trucks. There is a breakthrough in rocketry, but the media focuses on how vehicles have gotten faster — so people call their car dealer (oops, vehicle dealer) to ask when faster models will be available. Meanwhile, fraudsters have capitalized on the fact that consumers don’t know what to believe when it comes to vehicle technology, so scams are rampant in the vehicle sector.
Now replace the word “vehicle” with “artificial intelligence,” and we have a pretty good description of the world we live in.
They then identify three different types of AI. There’s “predictive AI,” which receives two chapters documenting why it does not and cannot work as promised. Then there are two other types — generative AI and AI for content moderation — that fall outside the point I want to make here.
Here’s the relevant point: Predictive AI is just a rebrand of the Big Data hype bubble from 10-15 years ago. We have learned in great detail about all sorts of problems and limitations — the data is shit, it cannot be made not-shit under prevailing institutional conditions, prediction rates are too low to be reliable without a human-in-the-loop, etc etc. A whole critical literature developed in response to the Big Data marketing machine — Cathy O’Neil’s Weapons of Math Destruction, Safiya Noble’s Algorithms of Oppression, Virginia Eubanks’s Automating Inequality, Frank Pasquale’s The Black Box Society, and many others. The main thing for us to understand is that, when Sam Altman et al. make grand pronouncements about the predictive potency of not-yet-existing AI, they are just slapping an upgraded label on the same old snake oil that we spent years slowly getting wise to.
I contributed ever-so-slightly to that critical literature. My 2016 book, Analytic Activism: Digital Listening and the New Political Strategy, explored how netroots political advocacy organizations used data analytics and experimentation to develop new tactics and strategies. I never used the words “Artificial Intelligence” in the book, because that wasn’t what people were calling it back then. But it was a whole book about the promises and pitfalls of analytics for activism.
The book suffered from awful timing. It was published in December 2016, just in time for Donald Trump to fundamentally reorient how practitioners and public thinkers approached data and politics. I had imagined, when writing the book, that it would be published in the aftermath of a year+ of public discourse about how “big data” was changing electoral campaigns, just in time to tee up questions about how these same tools could be used for politics outside of elections. I think it would’ve been a very timely book for that alternate universe. It’s still a good book, I think, but it doesn’t directly answer the questions people were primed to ask in early 2017.
Ultimately, the message of the book for political leaders was this: Data and testing are useful tools. You should use them. But you have to be intentional about what you optimize for.
Digital listening and data analytics can be quite valuable tools for advocacy and activist groups. They are not perfect, but they are a substantial improvement on the status quo. I would rather have organizations paying attention to their email response rates and action rates, running small experiments to see what works, than see organizations ignore these signals of supporter opinion. As one interviewee put it to me: “If you’re not looking at your data, then you’re not listening to your members. And that probably makes you kind of an asshole.”
But the pressing danger is that, instead of using digital listening to make better decisions, political organizations might use data to avoid making any decisions at all. “We trust the data” turns out to be a huge red flag. What it frequently means is that an organization is operating on autopilot, choosing whatever tactics make the graphs go up-and-to-the-right, instead of crafting tactics that help build real power.
(If I were writing a second edition of the book today, I would gush at length about Dan Davies’s “Accountability Sinks,” and how analytics-without-intentionality-or-leadership can prevent an organization from learning anything much at all.)
So that was my big, somewhat-counterintuitive take, back when the words of the day were “Big Data” and “Analytics” rather than “Artificial Intelligence.” These can be useful tools, so long as you still do the work to develop a clear theory of change and deploy them to create useful feedback loops. But they aren’t magic, and if you substitute data for strategy, you’ll be irrelevant at best and an active menace at worst. I spent much of 2017 saying that to anyone who would listen, and then kind of decided it was time for a totally new research project based on reading way too many old magazines.
Ted Chiang wrote a piece for the New Yorker last month, titled “Why A.I. Isn’t Going to Make Art.” The heart of his argument is that art is, essentially, about making decisions. And when you prompt an A.I. to make those decisions for you, what it produces might be recognizable as text or image or video or sound, but it cannot be art.
I’m going to sidestep commenting on whether art is fundamentally about making decisions. I found his argument compelling, but I don’t really consider myself qualified to opine on the matter. I am a mid-tier social scientist with a blog. I have not spent nearly enough time thinking hard about what art is. That simply isn’t my wheelhouse.
But I do know about strategy. Strategy is about making choices amidst constraints. It is, to quote Marshall Ganz, how we turn “what we have into what we need to get what we want.”
The very first lesson I teach my students every semester is that you cannot know if an action was strategic unless you know its goal. I teach this same lesson when I talk to advocacy leaders. And it always prompts vigorous nods. Absence of clarity about goals is a frustrating part of their daily professional lives.
I could, and perhaps someday will, wax poetic about the reason for this state of affairs. Suffice it to say that the path to meaningful social/political change is never well lit. All the easy victories have already been won. We are, all of us, mostly guessing in the dark, trying to learn from history, adapt to the future, and be present in the moment. The goals remain murky because no one actually knows what will work.
The promise I saw in 2016 was that digital listening could help advocates and activists find their way through the maze. The danger was that they would throw up their hands, cede control to “whatever the data says,” and be left worse off.
And that danger is even greater today. That was the “ah hah” moment for me, reading AI Snake Oil. In the course of rebranding and recycling the old Big Data hype bubble, combining it with the excitement over OpenAI’s chatbots and image generators, we are now being told quite explicitly that the LLMs are (or soon will be) smarter than you. Dario Amodei describes AI agents as “geniuses in a datacenter,” infinitely replicable intelligence that can accomplish astounding things.
That’s what alarms me most about the current AI moment. I’m not worried about Skynet. I’m not worried about deepfakes. I’m not immediately worried about the tidal wave of disinformation (except for its corrosive impact on elite norms and the myth of the attentive public).
The promise from the AI engineers and entrepreneurs is that this will not be a tool for scientists and civil servants and journalists and activists — it will BE the scientist and the civil servant and the journalist. (Henry Farrell and Benjamin Riley both have excellent rejoinders to Amodei’s latest essay, btw.) You can stop being intentional about how you listen to the data, how you use it. You don’t have to make hard strategic choices anymore. Just trust the augments. The AI might be a black box, but it has a genius inside.
That’s lousy advice, if history is any guide. It was self-evidently lousy in 2016. It’s still lousy today, but it is buried in such a haze of techno-optimist marketing copy that we are losing the capacity to tell.
What worries me about the AI bubble in a way that I didn't particularly worry about the big data bubble is that we are hailing these technologies as though they possess agency, acting as though they can make choices for us.
I spent 2017 warning that was a bad habit. Now I want to shout it at the top of my goddamn lungs.
As Dave knows, I invited him to speak to Congressional digital directors based on his Analytic Activism book. It was the first book I read that I felt understood how data could fit into an organization like Congress that wasn't entirely concerned with profit.
I think he is underselling how useful it was after 2016. I found it plenty useful and I know a lot of the digital directors appreciated the talk.
The greater concern was exactly the problem he is describing now: data was both poorly understood and, when it was used, it was used in a way that tried to remove decision makers from hard choices. Also, the lack of goals that congressional offices had for themselves contributed to massive confusion about what analytics digital directors were supposed to be collecting.
Reading it at the time, I was struck by how much time was dedicated to talking about Google's ranking of web pages and the race that ensued to game the PageRank system. This was before my time doing digital work (we were mostly focused on social media platforms and email), and it reminded me how little the dynamics have changed in what is supposed to be a cutting-edge space.
Big companies control information spaces on their own terms. You can try to squeeze out whatever utility they give, either intentionally or accidentally, but the problem isn't fundamentally a tech problem; it's a power problem.
Thinking of predictive AI as "just a rebrand of the Big Data hype bubble from 10-15 years ago" seems right, and the habit of substituting "data for strategy" nicely sums up the real risks of many forms of AI, including the kind that generates cultural artifacts.
They don't use the term in the book, but last year the Snake Oil guys put out a paper with a few others on what they call "predictive optimization," which I think helps frame this danger. Here is a link to a post about it: https://www.aisnakeoil.com/p/ai-cannot-predict-the-future-but?utm_source=publication-search
Deflating AI hype requires pointing out all the ways AI doesn't work, but it also requires pointing out how truly terrible using it can be when it works as designed to make some types of management decisions. Davies is a great example of how to think about this problem in terms of organizations, as is The Ordinal Society. Analytic Activism is now on my list.
Thanks for mentioning my review essay!