31 Comments
Joseph Meehan's avatar

As a software developer who generally thinks vibecoding still isn't worth it, the real test will be can it transform *Scientist* code. Scientists need to code a lot, are generally bad at it, but also care a *lot* about precision and accuracy in a way vibecoding works against. A model that overall improves the quality of scientist code can actually do a lot of good.

Geoff Anderson's avatar

I just snorted diet Coke. I studied Physics, and in that time (firmly in the 1980s and 1990s) I wrote code to do things I needed.

It was ugly. It was functional. It had no error checking. I would spend hours to groom the data so that I got a result, and I didn't care that if I had even one flier, it would break hard.

So, I agree. "Scientist" code is a good meter stick.

Ralph Haygood's avatar

I eagerly (?) await the article retractions due to egregious blunders in "vibe-coded" software. (I've made quite a lot of software for scientific purposes.)

Kaleberg's avatar

Having dealt with the dark art of numerical analysis while trying to maximize precision, I have my doubts about how well vibecoding will deal with it. I tend to hire consultants who can tell me when A implies B and when the exact same A doesn't imply the exact same B.
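One small illustration of that kind of trap, as a sketch rather than anything from a real analysis: the same ten values, added two ways, give two different answers. The function name `naive_sum` is made up for this example; `math.fsum` is the standard-library routine.

```python
import math

def naive_sum(values):
    """Add values left to right; rounding error accumulates at each step."""
    total = 0.0
    for v in values:
        total += v
    return total

tenths = [0.1] * 10

naive = naive_sum(tenths)     # plain floating-point accumulation
careful = math.fsum(tenths)   # tracks intermediate partial sums to limit loss

print(naive == 1.0)    # False
print(careful == 1.0)  # True
```

Neither result is a bug in the language; 0.1 has no exact binary representation, so which summation you pick decides whether the comparison holds.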

Tom M's avatar

My experience is pretty optimistic on this front. I’m an economist, and my code is trash, laboriously produced. I write just enough just often enough to be bad at it. After playing with Claude Code the past couple weeks, I find that it produces code that’s less ugly in less time.

Of course, I need to vet the code and the results it produces carefully because Claude will occasionally make weird and dumb mistakes, but I’d have to do that anyway because I also make dumb mistakes. It’s especially helpful for things like data cleaning and graphics production where the correctness of the results is pretty easy to assess.

It’s not a revolutionary advance, but it does make me better and faster at one of the more tedious parts of my job.

Ben P's avatar

This is a great idea! I'm an AI-skeptic statistician, the only language I'm semi-competent in is R, and I like making interactive apps using Shiny. If an LLM can take one of my apps and make it run faster without screwing anything up (or if the screw-ups are easy enough for me to catch and fix), I'd use it. Or, if I can give it code for an app in Shiny and have it recreate the app in JavaScript or some other webapp friendly and more efficient language, that would be cool.

I tried the whole "describe the app you want and have the LLM make it for you" thing a year ago and it was hopeless. I know the new ones are better so maybe I should try again, but I'd be surprised if it worked. Usually my motivation for making an app is that I can't find an already existing one that does what I want, and of course the LLM is biased toward creating slight variants of whatever it's seen the most of in its training.

For real data analysis work, I've occasionally used an LLM if there's a conceptually simple, well-defined task I need accomplished that I don't know how to do off the top of my head and a web search turns up no obvious and easy to implement solutions. But for the most part I would not want an LLM writing my analysis code, because that kind of coding is a fundamental part of thinking about and understanding the problem itself. I can think about a problem abstractly without thinking about code, but once I start working, I'm at a level of granularity where coding decisions *are* data analysis decisions. And I'm not handing those over to a probabilistic next-token generator, even one that writes real pretty.

Evan G's avatar

Agentic coding is best for tasks whose achievement can be verified by eyeballing (otherwise, it requires trust which may or may not be warranted; often it is, it seems to me). I'd highly encourage you to try out Claude Code (or another coding agent of your choice) for developing Shiny apps, which have a verifiable product. Data analysis is an interesting use case, since its output is not a product, but rather conclusions/insight whose desideratum is correctness, which cannot be verified by eyeballing the way a product's usefulness can.

Kaleberg's avatar

I can think of some obvious apps to vibecode. How about one that takes a list of my friends, looks up what they've posted on Facebook for the last 30 days and gives me a reverse chronological list of their posts? How about one that does a Google search and eliminates the advertisements, the web sites that are full of fatuous, probably AI generated, textual padding and shows me the remaining hits? How about one that gives me the list of Netflix offerings in a particular genre showing the teaser panel and the full descriptive text for each? How about one for Amazon that lets me search for vendors but ignore results offering products that I didn't search for and places close textual matches, like those matching the name of a book and its author, at the top?

I'm sure there's a felony contempt of business model here to prevent this kind of app from getting written, but AI might make it easier to disenshittificate the internet.

Philip Koop's avatar

So I write software for a living. My employer is very keen on "AI" and I have over a dozen LLM-models integrated in my development environment; they can be run in both "chat" and "agentic" modes. We get regular training seminars on their use.

Recently, I've seen a sea-change in the seminars. "Vibe-coding" is now discouraged, and we are to aim for "spec-based coding". This involves analyzing a requirement, specifying a solution, and breaking that solution down into tasks to be executed. Then the agents can do the tasks, although we are admonished that we always have to check the work.

The thing is that although I started my career as a systems programmer (working on the operating system of a computer that supports actually useful applications), I have long since become an "application developer". Analyzing a requirement and figuring out how to solve it is maybe 90% of what I do. Once I know what needs to be done, coding it up isn't hard; that's the air that I breathe and the water that I swim in. We are also encouraged to use various special keywords (analogous to HTML tags) to help the models along. You see the point? "Agent-based coding" is becoming more like "coding". It looks more like the latest generation of high-level programming language than like a self-licking ice cream cone.

The promise (or threat) is that AI will soon be able to do that 90% bit of my job. So far that is empirically not true, and I am not clear on the theoretical basis on which these hopes rest. The Princeton AI researcher Tom Griffiths (I recommend his book "The Laws of Thought") notes that the thing about current LLMs is that they need a *lot* of training data, compared to a human. One example is that a human infant can learn a language in about 5 years but an LLM needs the equivalent of 5,000 to 50,000 years of training data to do the same. He is working on an idea: if you think of LLM training as a Bayesian process, you ought to be able to speed it up by picking good initial weights - the equivalent of an "informed prior". This makes sense for human language: maybe you could train LLMs on a couple dozen languages, then pick out some commonality in the trained model weights to start with on the next language. But I am doubtful that you could apply that idea in my field, because there isn't an obvious common element that you could isolate and use to "pre-train" your models.

Ralph Haygood's avatar

"... the demand for skilled coders is so great, and the market for coding jobs is so promising, that we are heading for a future where more and more people make their careers out of learning to code.": Even in 2019, that was a silly notion, reflecting a basic misunderstanding of software development (which I've been doing, off and on, for 45 years in a variety of contexts, both academic and commercial). Because this misunderstanding keeps cropping up like the proverbial bad penny, I hereby announce:

Haygood's Fundamental Theorem of Software Development: The most challenging and time-consuming aspect of software development is design and specification, meaning deciding exactly what the software should do, under every condition that may arise in practice. If done well, design and specification routinely take at least twice as long as coding.

Supposing that developing software for any serious purpose consists mostly of writing code is akin to supposing that composing poetry consists mostly of writing words.

For example, I could paste into this comment the 10,544 characters (plain text) specifying how self-serve password resetting (when a user has forgotten their password) works in a web application I'm currently developing. That process needs to be designed very carefully and specified very precisely, because it's a potential point of attack on the application, and it's something a sizable proportion of the users will need to use at some point.

To be sure, there's a great deal of poorly designed software in the world. (E.g., although most web applications with user accounts have self-serve password-resetting processes, many of those processes are clumsy, confusing, and/or insecure.) Almost every day, I'm forced to (try to) use poorly designed software. I believe this situation is largely due to members of the boss class who don't understand software development (partly because they've never done any themselves) and/or don't care that their software is lousy. Often, lousy software is foisted on us by big companies or public agencies that enjoy at least de facto monopoly positions. "Yeah, our software sucks, but what're you gonna do about it, huh? Take your business somewhere else? [Howls of laughter]"

Because those bosses also hate needing employees, particularly expensive ones like software developers, they're now enthusiastically embracing klarna koding*, and so I expect software to get even lousier.

"She didn’t want to vibecode the Clueless closet-organizer app. She just wishes she could download something like that.": Indeed. Most people don't want to develop software, not even with klarna koding, much like they don't want to run servers or even have to be aware there are servers**. It's comical how oblivious "techie" types often are to these "normie" preferences.

*Like Klarna, "vibe coding" is buy now, pay later, in that if you're using it much, technical debt is probably accumulating in your codebase.

**https://moxie.org/2022/01/07/web3-first-impressions.html

Ralph Haygood's avatar

By the way, I never imagined Linux would take over the world of personal computing, for reasons that were obvious even in the 90s. To begin with, most people don't want a "free tank"; they just want a car that's comfortable and convenient. In mass-market products, ease-of-use almost always trumps versatility. Moreover, there was and still is, I think, a culture gap between most people who get excited about Linux and the "normies" who buy most personal computers. Inevitably, the development of "Linux on the desktop" has mostly been done by and hence has mostly catered to the needs and tastes of the former, not the latter. (I personally am the former; I'm writing this comment in Firefox on a Debian VM under Qubes. However, I understand the latter; I'm a freelance software developer, and they tend to be my clients.)

Rachel Jacobs's avatar

The tank metaphor for Linux feels perfect because I don't really want a tank. They have power, but what on earth do I need with all that power!

And I think you are being far too kind to AI, at least the sort we have seen thus far. Professionally, I'm a writer and editor, and I am anything but worried about AI replacing me. It does a mediocre job at best. So I can believe AI can do casual vibecoding just fine, but I'm guessing they're trying to make a market for that _because_ they can't actually make it in the big leagues yet.

Alex Remington's avatar

I think Clueless has transcended its era in an interesting way - a way that many movies from the '90s did not, including movies I adore just as much, such as Romy and Michele's High School Reunion.

When my friend and I did a LearnedLeague MiniLeague on 1990s Romantic Comedies, he immediately suggested we make an entire day's worth of questions about Clueless, and I immediately agreed. It's just become a timeless, universal classic, like Groundhog Day and When Harry Met Sally.

Hey, the youngs aren't wrong! It's a great movie.

Kaleberg's avatar

Kudos to Jane Austen.

Rob Nelson's avatar

"....didn’t follow the expected path" feels like a line that applies to just about every revolution, technological or otherwise. What's weird is how little that fact registers among those who enjoy prognostication.

And Clueless! My thirteen year old daughter and I are watching it tonight. At her suggestion. Baffling indeed, though I am thrilled to see it again.

NickS (WA)'s avatar

I'm in essentially the same position -- I'm curious about some of the AI tools but haven't started using them yet. One of the pieces which best captured, for me, the process of thinking about it was this by Jasmine Sun (who ended up having a positive vibe coding experience): https://jasmi.news/p/claude-code

---------------------- block quote ----------------

If you tell a friend they can now instantly create any app, they’ll probably say “Cool! Now I need to think of an idea.” Then they will forget about it, and never build a thing. The problem is not that your friend is horribly uncreative. It’s that most people’s problems are not software-shaped, and most won’t notice even when they are.

...

Well, I get it. I am embarrassingly nontechnical and scared of CSS, but spent every day last week talking to Claude Code more than my friends. It is an incredible technology that has made me more AGI-pilled than ever, while also being a net decrease on my work productivity. This is my attempt to reckon with both.

...

Eventually I came up with a first task: I needed to stitch together three PDFs for a grant application. The online discourse made Claude Code sound exceptionally easy—like it requires no technical skill, can one-shot complex apps, and never ships a bug. But for the truly uninitiated, I don’t think this is true.

Here’s what using Claude Code initially felt like: cooking with ingredients from a stranger’s fridge, the blankness of a page before you start writing, solo traveling in a country where you don’t speak the language. It’s hard to know what to build, hard to know how to start, and sometimes stuff doesn’t work the way you expect. As with all these analogues, you will eventually enter a flow state. But it took my fair share of false starts to get there, and I (the human) was very much in the loop.

...

A few days later, a friend sent me a voice memo instead of a text, and a collaborator asked me for feedback on a plan shared via YouTube. Unfortunately I am a psycho who refuses to listen instead of reading. So I had Gemini convert both files to text and sent off my replies.

Oh, I noticed. I do this over and over. Copying and pasting, uploading and downloading, turning audio and video into text for me to read. Maybe *this* problem is software-shaped. ...

Albrecht Zimmermann's avatar

Fascinating example in the context of the linux discussion, since on linux there's already an app for "stitch together three PDFs". Call it on the command line, give it the right parameters and you have a merged .pdf, no coding (let alone Claude Code) required. :)
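The comment doesn't name the tool. One that fits the description is `pdfunite` from poppler (an assumption on my part; on Debian-family systems it ships in the `poppler-utils` package), invoked as `pdfunite a.pdf b.pdf c.pdf merged.pdf`. For anyone who would rather call it from a script, here is a minimal sketch; the helper names `build_merge_command` and `merge_pdfs` are made up for illustration.

```python
import shutil
import subprocess

def build_merge_command(inputs, output):
    """Return the argv list for pdfunite: source files first, destination last."""
    if len(inputs) < 2:
        raise ValueError("need at least two PDFs to merge")
    return ["pdfunite", *map(str, inputs), str(output)]

def merge_pdfs(inputs, output):
    """Run pdfunite if it is installed; raise a clear error otherwise."""
    if shutil.which("pdfunite") is None:
        raise FileNotFoundError("pdfunite not found (assumed tool; see poppler)")
    subprocess.run(build_merge_command(inputs, output), check=True)

# Show the command that would be run, without touching any files.
print(build_merge_command(["a.pdf", "b.pdf", "c.pdf"], "merged.pdf"))
```

The wrapper adds nothing the command line doesn't already do, which is rather the point of the comment above.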

NickS (WA)'s avatar

To be fair to her, "stitch together three PDFs" is given as an example of a poorly chosen vibe-coding project. But, you're right that there is the potential for vibe-coding to push people towards large amounts of duplicated effort.

Albrecht Zimmermann's avatar

I've to admit that it wasn't so much the duplication that was on my mind.

At first, I just found it funny that the chosen example mentioned a linux capability but since you forced me to think a bit more about that :) : contrary to the post, people actually do want linux, even though they don't know it.

Linux has all the applications that Windows has and because one can easily work on the command line, one can do things that I'd struggle to reproduce in Windows. But because of misconceptions about linux (such as the belief that one needs to be able to code), they'd rather vibe-code. And the conspiracy theorist in me thinks that this is a concerted effort because staying in the Windows ecosystem and using Claude Code makes more people money than installing linux.

Kaleberg's avatar

Apple started doing this three or four years ago for voice messages on the iPhone. They don't always provide a complete transcription. Now and then they lapse into dot dot dot, but the big words, dates and numbers tend to come through. I saw this prototyped in the mid-1970s and thought it was a great idea. It only took 40 or 50 years, but it's coming.

Cheez Whiz's avatar

If you learned to operate a computer using a Graphical User Interface you were deliberately shielded from learning how a computer works. What you learned was how to manipulate the GUI to do things on a computer. For a long, long time any UNIX GUI was crude and complex compared to the more polished Mac and then Windows. And there was the application barrier. UNIX gave you tools, not apps, with none of the compatibility handholding Apple and Microsoft offered by simply being much bigger markets. If you chose to write an application, you had to roll up your sleeves and learn UNIX, a text editor, a language and its libraries, and how to compile it all. Or just buy one for Windows or MacOS.

The promise of vibecoding leans into this model. I assume AI also compiles and dumps an executable on your computer, tablet, or phone. Or do vibecoders do that? I find that really hard to believe. If they view the source Claude generates, how much do they understand? Do they pick the language to write in or does Claude? What do they do when (not if) the app breaks or does something wrong? Who you gonna call? The promise is very seductive, just as the promise that you'll never see a command line or worry about "allocating" memory for your app ever again was. But writing code is a horse of a different color. There are way too many things outside the control of Claude that can break that promise. The never-ending war of obsolescence with hardware and OS means Claude gives you an app compiled to run on that device on that version of OS, and maybe elsewhere. I guess the vision is you save your prompt and ask Claude for a new app when the old one stops working.

Albrecht Zimmermann's avatar

This! Reading the text and the replies made me understand that vibe coding is an offer for the people whose main computing device is a smartphone.

Parrish Ticer's avatar

Thanks for writing such an on-point (at least for me) article. It helped me frame my thinking about the deluge of AI information that is washing over my feeds. I don’t want to code or vibe code either. I feel your pain on being stuck in Apple’s walled garden too. Keep up the good work, it’s appreciated. Cheers!

Geordie Korper's avatar

I remember enjoying “in the beginning there was the command line” back in the day but as a long time NeXT user (somehow I felt OK with typing the book title in all lowercase but am compelled to intercap that brand) I felt it had some flaws. That same year Mac OS got its Unix/Linux command line and 17 years later Windows did too.

The more interesting thing to me is that I almost never truly operate in the command line any more. I run as many or more Unix commands but it is almost all within Visual Studio Code where I am dragging and dropping file references into it. Or in a lot of cases asking Claude to build me a droplet. Take the example someone else brought up of combining 3 PDFs as a good use case for the command line because Linux has the ability to do that (or more accurately such functionality is easily installed). I’d almost certainly ask Claude to code me up an app that uses that backend to combine the PDFs dragged and dropped onto it and open the result in Preview. Although now that I think about it, the Preview app allows drag and drop of pages between PDFs so I would actually just do that. Although turning that on its head, Preview can also convert PDFs to a folder of images and I built a droplet to do that because it is a task I need to do somewhat often and I wanted control over the size versus quality. I guess what makes me different than the average person is I’m constantly looking for a better way to do something and therefore I find opportunities to code up things all the time.

Robert's avatar

Clueless was made into a musical in 2018 or so. My kid's English theatre school in France is doing it this spring so I assume it's popular in the US.

Kevin Munger's avatar

https://journals.sagepub.com/doi/10.1177/2056305119849492

I think you were right the first time!

Andrew's avatar

As a professional engineer, I wanted to try the vibe coding thing a few days ago. I had this idea that the “sales” at my local Big Box Store weren’t really sales and that it’s just a trick when prices don’t actually move.

So I used an agent to search through historical prices and create a report for me. I now have a github repository that runs this weekly. I’ll let you know if it finds anything.

Simple app. Simple idea. Maybe useful. Took me about an hour. Doing it without an agent would have been 3 hours maybe (well, I wouldn’t have done it at all).
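The comment doesn't show the agent's code, so the following is only a guess at the core check such a report might run: compare an advertised "sale" price against the median of the prices seen before it. The function name `is_genuine_sale`, the 5% threshold, and the price data are all hypothetical.

```python
from statistics import median

def is_genuine_sale(history, sale_price, min_discount=0.05):
    """Return True if sale_price undercuts the median of prior prices
    by at least min_discount (a fraction, e.g. 0.05 for 5%)."""
    if not history:
        raise ValueError("need at least one prior price")
    baseline = median(history)
    return sale_price <= baseline * (1 - min_discount)

# Hypothetical weekly prices for one item, followed by an advertised "sale".
weekly_prices = [19.99, 19.99, 19.99, 19.99]

print(is_genuine_sale(weekly_prices, 19.99))  # False: the price never moved
print(is_genuine_sale(weekly_prices, 14.99))  # True: a real drop vs. baseline
```

Using the median rather than the most recent price means a brief markup just before the "sale" would not, on its own, make the sale look real.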

Seth Finkelstein's avatar

Linux is more like those old VW Beetles, which were big for people who wanted to do their own car repairs. They had their niche.