[Dall-E2 prompt: oil painting of a cyberpunk dystopia]
This semester, I’ll again be teaching my “History of the Digital Future” class. It’s kind of a guided tour through the WIRED archive. Working on syllabus modifications has left me thinking about how much has changed in the past year, and the risks we face in the years to come.
In January 2022, we were in year five of the techlash. Tech elites were racing to define the Internet’s “next chapter” (Web3 and Metaverse and Artificial General Intelligence, oh my!), at least in part because they had grown tired of the current one. Silicon Valley elites were some of the richest people on earth, but they weren’t lauded as hero-inventors-of-the-future anymore. That bummed them out.
I wrote my syllabus to capture and contextualize that moment. Bitcoin was around $50,000, and Facebook/Meta was spending $10 billion to build the Metaverse. Big Tech had gotten less popular, but never less profitable. It was an 18-year tech bubble that had never burst. And, even if the bubble couldn’t go on forever (NFTs? Really?), good luck predicting when it would finally burst.
Today, in January 2023, I think we can say that 2022 was the end of the techlash and the beginning of the tech crash. Every major tech company lost between 25% and 75% of its value in 2022. Somewhere around 120,000 tech workers lost their jobs. Brian Merchant compellingly describes it as “The End of the Silicon Valley Myth.”
So I’m chipping away at my syllabus. A bit less metaverse. A bit less Web3. NFTs are just a cautionary tale now. We’ll spend a lot more time on the history of AI and “Big data,” in order to help make sense of the stories that are being told today about the digital future.
But what I’m really trying to make sense of is what comes after the tech crash. And, in particular, what does it mean that major advances in artificial intelligence are going to be deployed in the midst of the first major crash in two decades?
I’ve got a baaaaaaad feeling about how advances in AI and the tech crash are likely to intersect.
First let’s take a step back, though.
There was an excellent article in ProPublica a couple weeks ago, by Craig Silverman and Ruth Talbot, “Porn, Piracy, and Fraud: What Lurks Inside Google’s Black Box Ad Empire.” Silverman and Talbot are first-rate data journalists. They painstakingly reveal the status quo of Google’s ad network. The company made $31 billion from its ad network last year, and some of that came through creating a business model for truly bad actors.
The article is worth reading in full. There was one passage that sent me off on a weird tangent though, and that’s what I want to focus on here.
Silverman and Talbot write:
“In one example, a Bulgarian company helped scores of piracy sites with close to 1 billion monthly visitors earn money from Google ads. Most alarming, Google knew from its own data that these sites were engaging in mass copyright theft, yet it allowed the sites to receive ads and money from major brands such as Nike and HSBC Bank right up until we contacted Google.
As for what else lurks in the black box, only Google knows.”
This strikes me as not precisely right. There is likely no individual, or set of individuals, at Google who possesses this answer and stays silent about it. The data is proprietary (and Google doesn’t follow all the same transparency standards that it has promoted in the rest of the industry), so it’s more like only Google is positioned to know. Only Google has the capacity to know. But given the magnitude of its traffic, the company only notices those things that staff have been assigned to look for. The managerial work of operating a platform at scale is just immense. Google doesn’t observe everything that happens within the black box of its ad network. It only observes what the company decides to monitor, optimize for, and prevent.
It reminds me of one of the anti-Facebook data points that came out early in the Cambridge Analytica/Russian ads imbroglio circa 2017-2018. Facebook, it turned out, had accepted payment for political advertisements in rubles. That sounds pretty damning. If someone is paying for political ads around the U.S. Presidential election in rubles, it seems more likely than not that they’re a foreign actor trying to influence the domestic election, right?
And sure, yeah, not great! But it’s also less damning than it initially appears. Facebook doesn’t have some employee working the cash register, ringing up political ad sales, who failed to say “uhhhh… are these RUBLES you’re paying with? Just a moment, I think I need to call my manager.” What they have is a huge automated system, with employees who monitor key metrics. Currency-used-for-purchase is just a column in a massive spreadsheet. And, prior to 2016, it wasn’t even one of the more interesting columns. Facebook did a better job managing electoral misinformation and foreign interference in 2020 than in 2016 because the company devoted significantly more staff and resources to the problem.
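To make the “column in a spreadsheet” point concrete, here’s a purely hypothetical sketch (the column names, watchlists, and review rules are my own invention, not Facebook’s actual pipeline). An automated review system only raises flags on the fields it has been configured to watch, so a ruble-denominated political ad is effectively invisible until someone decides that currency belongs on the watchlist:

```python
# Hypothetical illustration only: not Facebook's (or Google's) real system.
# The point: an automated pipeline surfaces only the columns it is told to watch.

WATCHED_COLUMNS_2015 = {"spend_usd", "click_through_rate"}
WATCHED_COLUMNS_2017 = WATCHED_COLUMNS_2015 | {"currency", "advertiser_country"}

EXPECTED_CURRENCIES = {"USD", "EUR", "GBP"}


def flag_for_review(ad: dict, watched: set) -> list:
    """Return human-review flags for one ad purchase, checking only watched columns."""
    flags = []
    if "currency" in watched and ad.get("currency") not in EXPECTED_CURRENCIES:
        flags.append(f"unexpected currency: {ad.get('currency')}")
    if "advertiser_country" in watched and ad.get("advertiser_country") != ad.get("target_country"):
        flags.append("advertiser and target country differ")
    return flags


ruble_ad = {
    "spend_usd": 120.0,
    "click_through_rate": 0.04,
    "currency": "RUB",
    "advertiser_country": "RU",
    "target_country": "US",
    "category": "political",
}

print(flag_for_review(ruble_ad, WATCHED_COLUMNS_2015))  # [] because currency isn't monitored yet
print(flag_for_review(ruble_ad, WATCHED_COLUMNS_2017))  # both flags fire once the columns are watched
```

The system isn’t hiding anything; it just wasn’t built to look at that column until someone told it to.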
Right now, Google has the potential capacity to know how its system is used and abused by copyright pirates, porn sites, and all kinds of fraudsters. But Google doesn’t demonetize all the bad behavior. That would require a larger, more complicated system. It would require more staff and resources than Google has been willing to spend on the problem. So the company triages. It takes down the worst stuff that comes to its attention. Sometimes that comes as the result of activist pressure from folks like Nandini Jammi and Claire Atkin. Sometimes it is a response to data journalism like this ProPublica piece. Sometimes it’s driven by independent tech research.
Here’s where we pick the trendlines back up and I get concerned: Increased reliance on AI is poised to make this so much worse.
Part of what makes these advances in generative AI seem so magical is that the models are black boxes that even Google can’t open up. That can work fine if there is a layer of human reviewers empowered to adjudicate and correct errors (Google has been using various types of machine learning for years, after all. This isn’t new.). But it is particularly troublesome if the company ends up deploying these new systems as a cost-cutting measure in the midst of a tech crash.
Generative AI is going to find patterns that no employees are currently tasked with looking for. It will fix some existing problems. But it will also create new problems. And we should worry that Google will cut the staff charged with looking inside the black boxes. Throwing machine learning at complex sociotechnical problems while also cutting staff is in the neighborhood of a worst-case-scenario for AI adoption. We’ll be left with even less ability to know when/if things go wrong.
It is historical happenstance that these breakthroughs in generative AI are happening against the backdrop of the first tech crash in 20 years. But I worry that it will have significant implications for how the technology is deployed and adopted. AI could be the engine driving us towards a future where the main hubs of information and communication become less transparent, less responsive, less manageable, and more socially harmful.
There’s still time to affect how these systems develop. If you are a researcher, consider joining the Coalition for Independent Tech Research. If you are a trust and safety professional, consider joining the Integrity Institute. If you are an AI researcher, try to get involved with the DAIR Institute. We don’t have to leave the future of the Internet up to the VCs and the big tech companies this time. We can shape the future through policy and collective action too.
2022 was the year of the tech crash and the year that generative AI passed some major public-facing benchmarks. The interaction of those two developments will likely define what comes after the tech crash. If we aren’t vigilant, it might be a step in the wrong direction.
Good stuff, as usual.
I do think, though, that too much is expected in this post from 'generative AI'. Generative AI has serious trouble producing *meaningful* results. The results are well-structured and they fit the subject, but they are not trustworthy, nor is there any sign that they will be. Reading OpenAI's paper on GPT-3 few-shot learning (https://arxiv.org/pdf/2005.14165.pdf), for instance, it is clear that, beyond producing language that is well-structured and 'fitting', the results remain very poor when it comes to *trustworthy* content/meaning. Generative AI seems magical not because the systems are intelligent, but because we humans are not all that intelligent ourselves. We are easily fooled/misled.
See https://ea.rna.nl/2022/12/12/cicero-and-chatgpt-signs-of-ai-progress/
As someone who regularly relies on data, this rings so true to me: "...it’s more like only Google is positioned to know. Only Google has the capacity to know. But given the magnitude of its traffic, the company only notices those things that staff have been assigned to look for."
Thank you for making that observation. It's not enough to have the data - someone has to actually be looking at and analyzing it. It always comes back to people's time and what they are focused on.