What are we going to do about Generative AI content farms?

Jan 31, 2023

I’ve gotten a bad feeling lately. Like my spider-historical-analogue-sense is tingling, y’know?

It’s probably nothing.

But I’ve been thinking about the digital journalism crisis circa 2009-2010. The Internet was awash in low-quality content, produced for pennies by an army of freelancers employed by content farms. The freelancers accepted the jobs because they were the only jobs to be had. The content farms were scooping basically all the money out of the online advertising ecosystem. It looked really bleak. Hopeless, basically.

In 2011, Google stepped in and rendered the whole situation inert. If your business lives by the algorithm, it also dies by the algorithm. Bad news for the content farmers; good news for the rest of us. Relying on platform monopolists to protect the public interest isn’t a great way to run a civilization, but it’s better than nothing.

Looking at some of the emerging use-cases for ChatGPT, it seems likely to me that we’re about to take a trip back to 2009. All the factors that made the content farms seem like a late-capitalism-financial-inevitability are back in play. But this time, the platform monopolists might not perform the deus ex machina role.

…In other words, I think we’re headed for a big mess.

I’ve been fretting about this since the CNet news a couple weeks ago. Futurism.com broke the story that CNet was using ChatGPT to write dozens of SEO-bait articles, giving them the byline “CNet money staff.” This move was inevitable. Someone was going to put ChatGPT in charge of writing the bargain-basement explainer pieces.

And it went terribly. The articles weren’t just poorly written, they were riddled with errors. But it seemed like a portent of things to come. CNet broke the seal. Someone else will surely try it. (Buzzfeed has since announced that it will use ChatGPT to help generate its personalized content and quizzes.) The errors will become less glaring. The articles will never be good, but they’ll eventually reach the threshold of being good enough. And, from that starting point, the trickle will become a flood.

Which brings me to the historical analogue.

Back in 2009, a company called Demand Media looked like a full-blown crisis for the journalism industry. Demand Media, the owner of brands like Cracked.com and eHow.com, was the biggest “content farm” on the web. Daniel Roth wrote about the company for WIRED, describing it as “the fast, disposable, and profitable as hell media model.” New York Times media critic David Carr (R.I.P.) wrote blistering critiques of the company. Jay Rosen also weighed in with this barnburner, where he interviews the “demonic” company’s CEO.

Demand Media’s model was awfully simple. The company tracked the supply and demand of Google search traffic. It identified arbitrage opportunities: search terms associated with high-value ads that were undersaturated with timely, well-tailored results. It then issued writing and video assignments to an army of freelancers who were paid $15-20 per piece. It also paid freelancers for copyediting ($2.50/article), fact-checking ($1/article), quality control ($0.25-0.50/video) and transcription ($1-2/video).
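
To make those economics concrete, here is a back-of-the-envelope sketch in Python of what one text article cost to produce at the rates quoted above. It assumes (my assumption, not something Demand Media published) that each article incurred exactly one writing fee, one copyediting fee, and one fact-checking fee. The function and constant names are made up for illustration.

```python
# Per-article freelance rates quoted in the paragraph above, in US dollars.
# Each entry is a (low, high) range; fixed fees repeat the same value.
WRITING = (15.00, 20.00)
COPYEDITING = (2.50, 2.50)
FACT_CHECKING = (1.00, 1.00)


def article_cost_range(*line_items):
    """Sum the low ends and the high ends of each (low, high) line item."""
    low = sum(item[0] for item in line_items)
    high = sum(item[1] for item in line_items)
    return low, high


def is_worth_assigning(expected_ad_revenue, production_cost):
    """The arbitrage test: assign the piece only if ads should out-earn its cost."""
    return expected_ad_revenue > production_cost


low, high = article_cost_range(WRITING, COPYEDITING, FACT_CHECKING)
print(f"Cost per text article: ${low:.2f} to ${high:.2f}")
```

Under that assumption, a text article costs somewhere between $18.50 and $23.50 to produce, so any search term expected to earn more than that in ads clears the bar.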

2009 was the pinnacle of the internet’s SEO (search engine optimization) era; mobile and social sharing would come later. The 2000s were Google’s decade of singular dominance of the digital advertising ecosystem, whereas the 2010s carved out competing fiefdoms.

Byron Reese, Demand Media’s Chief Innovation Officer, told Daniel Roth that the ideal Demand Media story was "Where can I donate a car in Dallas?" Online car ads provide high margins, Dallas is a huge, sprawling metropolis, and few competitors already provide answers to that question. Demand Media didn’t need to provide good answers to these queries. It just had to provide relevant answers that match Google's search algorithms and generate ad revenue.

Think of this as lateral competition to the journalism industry. Demand Media wasn’t in the news business. The company was in the basic-question-answering business. But the trouble—especially back in 2009-2010, when journalism was in an extended crisis and the country was struggling to recover from the economic crash—was that real news organizations relied on that same pool of online advertising revenues. Content farms like Demand Media threatened to scoop out all the money.

Demand Media went public in 2011 at a valuation of $1.5 billion. This was back when the New York Times was on the brink of bankruptcy. It looked bleak. We were staring at a future where low-quality search results swamped the web because it was the only profitable game in town. (David Carr offered the company’s article on “How to Throw a Super Bowl Party,” as an example: “Buy several six-packs of beer. Keep the beer in a cooler close by so you don’t have to run to the fridge when it’s third and inches. Restock the cooler at halftime.”)

But then Google intervened. Google’s users found content farms annoying. And Google was in the keep-users-happy-with-our-search-results business. Demand Media and its ilk had become bad for Google’s business. So Google pushed an update to its search algorithm, algorithmically penalizing content farms. The arbitrage opportunity vanished. (Tough lesson for companies like Demand Media: You live by the algorithm, you die by the algorithm.)

We shouldn’t exactly get misty-eyed about this era of digital news. It’s not great that the entire journalism industry is downstream of the online ad industry, which has just one or two dominant players and is still barely regulated. But the resolution of the content farming crisis has always stood out to me as a reminder that there are some minimal guardrails provided by the semi-benevolent tech monopolies. Google wants to keep its users happy-enough with their search results. Google pays attention to that sort of thing. If your business degrades search results, and your business gets big enough, Google is going to put a finger on the scale that it built and calibrated.

Right???

I fed David Carr’s old Super Bowl example into ChatGPT. Here’s what it gave me:

[Prompt: Write an article explaining how to throw a Super Bowl party.]
Throwing a Super Bowl party can be a fun and exciting way to celebrate the biggest event in American sports. Here are some tips for hosting a successful Super Bowl party:
Plan your guest list: Decide who you want to invite and how many people you can comfortably accommodate in your space. Make sure to include friends and family who are big football fans.
Decorate your space: Decorate your living room or wherever you'll be hosting the party with football-themed decorations such as balloons, banners, and streamers in the team's colors.
Prepare the food: Make sure to have plenty of snacks and drinks on hand. Consider serving popular Super Bowl foods such as wings, chili, sliders, and dips. Don't forget about drinks - have soda, beer, and other beverages readily available.
Set up a TV area: Make sure you have a large enough TV for everyone to see and have good seating arrangements for all your guests.
Have some games and activities: Plan some games and activities for halftime and commercial breaks. You can have a trivia contest, a pool for picking the winner, or play video games.
Be prepared for last-minute changes: The Super Bowl is a long event, and things can change quickly. Have backup plans in case the power goes out, the TV stops working, or the food runs out.
Clean up: Clean up the area after the party, making sure to dispose of any waste and to put away any decorations.
By following these tips, you can ensure that your Super Bowl party is a memorable event that your guests will enjoy. Just remember to keep things simple and have fun!

This isn’t great advice. It is, I would argue, precisely average advice. ChatGPT, after all, is designed to be a cliche-generator.

But the old Demand Media article circa 2010 was below-average advice. And ChatGPT, at least for now, is free. OpenAI has announced plans for a $42/month “pro” version. I won’t pay that kind of money for deluxe access to text-and-image generation. It’s an absolute steal for a next-generation Demand Media-type company, though.

You can easily imagine how it would work. Monitor all the same advertising marketplaces. Lose the army of freelancers entirely. No one to pay. No messy freelance assignments to schedule. Just train a generative AI to produce basic answers to every search query where the expected ad revenues exceed the compute cost of the AI program. CNet has already taken the first step down that path.
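
The decision rule in that paragraph fits in a few lines. Here is a minimal sketch in Python, assuming you already have an estimate of ad revenue per query and a flat compute cost per generated article. Every name and number below is hypothetical, chosen for illustration and not drawn from any real ad marketplace or pricing sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchQuery:
    text: str
    expected_ad_revenue: float  # hypothetical estimate, in US dollars


def queries_worth_generating(queries, compute_cost_per_article):
    """Keep only the queries whose expected ad revenue beats the compute cost."""
    return [q for q in queries if q.expected_ad_revenue > compute_cost_per_article]


# Illustrative inputs only: the revenue figures and the cost are invented.
candidates = [
    SearchQuery("where can I donate a car in Dallas", 40.00),
    SearchQuery("how to throw a super bowl party", 3.00),
    SearchQuery("what day is it", 0.01),
]

for query in queries_worth_generating(candidates, compute_cost_per_article=0.05):
    print(query.text)
```

The point of the sketch is the threshold. Once the cost per article drops from roughly $20 of freelance labor to a few cents of compute, nearly every query with any ad value at all clears it.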

But that’s not what has me alarmed. What worries me is the reporting that Google has declared a “code red” over ChatGPT, and is rushing the release of its own generative AI products.

What happens if journalism’s next-gen lateral competition is coming from companies that are also Google subsidiaries? Just as Amazon is both in the “everything-store” business and in the cloud-computing business, Google is heading towards being both in the keep-users-happy-with-search-results business and in the whatever-revenue-models-emerge-for-AI business.

Will the guardrails from a decade ago still be operational? Will Google’s enlightened self-interest still lean in a direction that happens to benefit journalism and the public good?

There’s a simple-sounding answer here: stop relying on Google to play the role of regulator-of-last-resort! The platform monopolies have played a quasi-regulatory role for over a decade, filling the vacancy left by our dwindling regulatory state.

(This only applies in the United States. The EU has been much more proactive in crafting regulations. It has created huge headaches for the platforms, which is ultimately probably a good thing. Companies this large shouldn’t enjoy so few frictions.)

But that’s easier said than done. We’re headed in that direction already, but at a necessarily plodding pace. The Biden Administration has taken steps to rebuild administrative capacity. The people Biden has appointed (Lina Khan, Tim Wu, etc.) are basically a dream team. But building state capacity and enforcing new regulatory regimes takes a lot of time. And it is much easier for tech platforms to push an update to their ranking algorithms than it is for a government to develop and enforce an entirely new regulatory regime.

So, there it is. That’s the bad feeling. I think we’re soon going to see the deployment of generative AI for next-gen content farming. I think it’s going to be a mess. And I don’t know if Google executives will still see a clear business interest in intervening the way they did last time.

This time won’t be exactly like last time. News organizations aren’t quite so reliant on Google search and ad revenues as they once were. The country isn’t in nearly the same dire economic straits as it was in 2009. The platforms no longer have a blank reputational check from users, regulators, and legislators. And the potential revenues from generative AI might not be enough to dissuade Google from protecting the search business at all costs.

It might be nothing.

But, seeing the early content farming use-cases emerge, it feels to me like we’re on a poorly-lit path, navigating without a map. I don’t know where it all leads. I think I can make out a few indistinct shapes in the distance, though. And they’re leaving me muttering “hic sunt dracones.”

Jacob Kramer-Duffield

Yeah it's not great.

But/also: even though these aren't the exact areas where Google is being hit with antitrust actions, I think there's a pretty substantial governor on future both-sides-of-the-transaction work (i.e., I'm much more bullish on there being regulatory warning shots, consent decrees, etc. if Google rolls out clearly anti-competitive products).

THAT SAID - Google *itself* is already a problem, with the authority and reliability of its results badly compromised by years of optimization to its own ad sales model. I think 2023 is a real pivot point for Google - it's not going away but there's a pretty substantial chance it's going to be both a) a less profitable company, with one or multiple ad divisions spun off, and b) under more genuine competition for search results - not just via OpenAI or similar, but via DuckDuckGo and potentially even a revival of expert-indexed directories (yes, I think OG Yahoo actually was on to something).

A really interesting question is whether Google can use the opportunity to, for the first time in a long time, actually make its search results *better* - I'm not optimistic but I am curious.

Stuart McDonald

“Demand Media didn’t need to provide good answers to these queries. It just had to provide relevant answers that match Google's search algorithms and generate ad revenue.”

Uh huh, and we all know how great that “content” was.

So now, the stories are what, a buck a pop, and, as with Demand, material lifted without attribution, so what could go wrong?!

The big diff this time around is instead of slapping it down (eventually) Google wants an in.

The Future, Now and Then
