I’m hanging out in Sydney with my esteemed co-author and co-conspirator Gene Kim today; we flew in to deliver Vibe Coding workshops and talks this week for the Commonwealth Bank of Australia, some of their partner companies, and the general engineering public. Very cool of CBA to sponsor this training, and Gene and I are super excited for it.
We noticed that we’ve pushed into new territory since our Vibe Coding book was published. The book is all about how to work with coding agents, and all the advice and techniques in it are still incredibly relevant; I use it all daily. But there’s even more to learn, and we continue to uncover new tips and strategies.
I thought I’d share some of the new themes we’ve noticed, in no particular order, hot off the presses. Let’s see which ones resonate with you.
1. Software is now throwaway — expect < 1 year shelf life
This is probably the most obvious one. Anthropic has already begun embracing this idea internally, which is how I first heard about it, from friends there.
26 years ago Joel Spolsky wrote one of the most useful pieces of software advice anyone has ever given, in Things You Should Never Do, Part 1, where he says, in a nutshell, DON’T REWRITE YOUR SOFTWARE!
In this classic essay, well worth a read, Joel gives powerful examples of companies and projects that decided their code base was too old and crufty, so they chose to rewrite it all from scratch. And the results were, predictably, awful. Joel says:
The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that’s kind of gross if it’s not made out of all new material?
And he was right! Outstanding essay. But unfortunately, not so timeless as we thought. It proved to have a shelf life of only about a quarter century. We are entering a surprising new phase of software development, in which rewriting things is often easier (and smarter) than trying to fix them.
I first noticed this with unit tests. You’ll use agents to make a giant refactoring to your system, and then all the tests will be broken. The agents inevitably struggle to fix them. So one day I said, screw it, delete all the tests and make me new ones. And it got through that exercise SO much faster. The new tests were great, had great coverage, and importantly, the LLM was able to generate them very quickly, compared to trying to reason through the old system behavior vs the new expected behavior. With new tests, it can focus just on the new behavior, which is a much cleaner cognitive problem.
This generalizes beyond tests: generating almost any code is easier (for AIs) than rewriting it. Hence, recreating software stacks from scratch is starting to become the new normal. We’re seeing it more and more, e.g. companies with mainframes who are concluding that a small team of engineers and biz people could recreate the entire experience with the same API, but with modern architecture and maintainable code, in just a few months. And they’re doing it.
The upshot is that for all the code I write, I now expect to throw it away in about a year, to be replaced by something better. Maybe mine, maybe someone else’s. Doesn’t matter. It’s all just stepping-stones to higher velocity.
This spells trouble for third-party SaaS vendors. Companies are also discovering that they can build bespoke business-automation software so easily that they don’t need to re-up their vendor contracts. SaaS vendors are going to have to work harder to provide value that’s too expensive to recreate. It can be done — Graphite is one example; they have years of accumulated learning about the nuances of AI code review. I don’t think you would necessarily want to retrace those years of steps yourself, on your company dime. Sourcegraph is another example; they have a code search engine with 10 years of enterprise bug fixes, and even with modern agents, you almost certainly wouldn’t want to try to clone that yourself.
But many SaaS vendors who’ve found niches building business automation software are going to be in real trouble. Because businesses are automating their own processes now, with vibe coding!
2. Agent UX matters at least as much as Human UX
One of the interesting themes I heard at the AI Engineering Conference in NYC a couple weeks ago was that although many people are building tools for AIs, they are finding it very hard to get the AIs to use those tools.
It’s tricky to get an AI to use a tool it wasn’t trained on. Models have certain ways of thinking and working, and they tend to reach for familiar tools (e.g. grep instead of a fancier search). I’ve talked with many people who wanted to build a tool for their agents to use, and they’d work with the frontier models to design the perfect agent-friendly interface — one the models swore up and down they would use.
And then haha, no, the agents don’t use it. You prompt and prompt, they ignore and ignore. So what do you do? How do you get them to use your tools?
My Beads issue tracker for agents has been an interesting case study here. It’s only maybe 2 months old and it already has 250+ forks and 5000+ stars. It’s a successful project. But I’ve never looked at the code. It’s fully vibe-coded by agents. Despite that, Beads managed to capture lightning in a bottle — it’s a tool that AIs use, and not only that, they like it. Agents use Beads eagerly and enthusiastically with very little prompting. They make smart decisions, such as filing Beads when they are low on context, instead of doing the work directly. Things you would normally have to prompt them to do, they just do!
I’m no magician. I’ve built plenty of tools that the AIs refused to use; I’ll talk about one of them below. And I’ve built plenty of prompts that the AIs choose to ignore or overlook. It’s not like capturing lightning in a bottle is super reproducible at this point. But I can share some of the things I did with Beads that I think helped.
First, I asked Claude to help me design a new lightweight issue tracker backed by git, with a few other constraints, and then Claude came up with about half of the rest of the design: the SQLite database caching layer, the discovered_by graph link that the models feel is very important for gathering context on issues, the hash IDs, deletion tombstoning, etc.
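To give a flavor of the shape we landed on, here’s a rough sketch of those concepts as a data model (illustrative only; this is not Beads’ actual schema, just my shorthand for the ideas above):

```python
from dataclasses import dataclass, field

# Rough sketch of the concepts described above (hash-style IDs, the
# discovered_by link, tombstoned deletion). Illustrative; not Beads' schema.
@dataclass
class Issue:
    id: str                           # short hash-style ID, e.g. "bd-3f9a2c"
    title: str
    description: str = ""
    discovered_by: str = ""           # ID of the issue whose work surfaced this one
    deleted: bool = False             # tombstone instead of a hard delete
    children: list = field(default_factory=list)  # child issue IDs (for epics)
```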
During the Beads design phase, I mostly argued with Claude, telling it I didn’t like certain choices it was making from a Human UX perspective. Eventually we negotiated our way to something we both liked, something that had good agent UX and also good human UX.
For the agent side, once we had the initial structure in place (the issue tracker itself), the primary UX issue became tooling ergonomics. My agents were trying to use Beads, but they kept giving it the wrong arguments. For example, they’d use --body instead of --description when filing an issue, which would fail. Why? Because they were trained on GH Issues, and GHI’s CLI tool uses --body for filing issues. Reaching for the familiar again!
So in that particular case, I told it to add --body as an alias for --description, which it did, and that bit of Agent UX friction went away forever. I’ve done this many, many times in Beads. As the agent works, I watch how it’s using the tool, and whenever it encounters an error, I ask it: how did you want it to work there? How can we change it to make the behavior more easily guessable?
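To make that concrete, here’s a minimal sketch of the flag-aliasing idea in Python’s argparse (illustrative only; Beads isn’t built this way, but the pattern is the same in any CLI framework):

```python
import argparse

# Minimal sketch of "alias the flag the agent reaches for" (illustrative;
# not the actual Beads CLI). argparse treats every extra option string as
# an alias, so --body and --description land in the same field.
parser = argparse.ArgumentParser(prog="issue-tracker")
create = parser.add_subparsers(dest="command").add_parser("create")
create.add_argument("--description", "--body", dest="description",
                    help="issue body text (--body accepted for agents trained on gh)")

args = parser.parse_args(["create", "--body", "Fix the flaky login test"])
print(args.description)  # -> "Fix the flaky login test"
```

The design choice is the same in any language: accept the spelling the agent already guesses, and route it to the canonical field.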
Over the past few months we’ve made dozens of tweaks, adding flags and commands, and the agents now rarely have trouble using Beads fluently.
I can’t claim to have cracked the agent-UX problem, not by a long shot. The role of “Agent UX Designer” feels ready to emerge as a first-class career for humans. As just one example, I’m working on my third agent orchestrator this year. And even though the architecture is sound, I haven’t found the magic UX formula yet, the point where any agent automatically figures out what to do and does the right thing most of the time. I’ll get there! In fact, as soon as I solve this problem with my orchestrator, I’m launching it. I’m aiming for Christmas Day. We’ll see.
Once you do find that secret incantation that makes your tool truly agent-friendly, you should get it out there as fast as you can, because it will grow like crazy.
And if you try to launch a tool that agents don’t choose to use of their own volition, with minimal prompting, then you need to go back to the drawing board and fix the agent UX.
The best way to do this is to leverage the Optionality from FAAFO, from our Vibe Coding book. Generate a whole bunch of interfaces, and then experiment with each one, to see which one the agents like best. It’s very much a trial-and-error search problem at this point, until either the agents get better at using new tools, or we get better at learning what they like.
3. Spend 40% of your time on code health, or else you’ll wind up spending >60%.
Gene was curious how I could be so confident in Beads if I’ve never looked at the code. My answer to him was one of the easiest I’ve ever given. If you are vibe coding, i.e., having the AI write all your code for you, then you need to spend at least 30–40% of your time, queries, and money on code health. That’s how you make sure your code is OK. You have the AI conduct regular code inspections. Tons of them.
It’s pretty easy in principle: Every now and then, you pause your regular work, and tell your agents: go find code smells of all shapes and sizes. Have them file Beads for anything that needs followup. Tell the agent to look for large files that need refactoring, areas with low test coverage, duplicated/redundant systems, legacy code, dead code, poorly-documented code, etc. etc. etc. I don’t have a good prompt for this step yet; would appreciate it if anyone has crafted one. But you can also just ask your agent to help craft it.
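As one cheap supplement to the agent’s own review (not a replacement for it), a tiny script can flag the most mechanical smells for the agent to triage. A minimal sketch, with arbitrary thresholds, extensions, and markers that you’d want to tune for your own repo:

```python
import os

# Crude code-smell sweep to seed an agent's review pass. Thresholds,
# extensions, and markers are arbitrary; tune them for your repo.
MAX_LINES = 800
MARKERS = ("TODO", "FIXME", "HACK", "XXX")

for root, _, files in os.walk("."):
    if ".git" in root:
        continue
    for name in files:
        if not name.endswith((".py", ".go", ".ts", ".el")):
            continue
        path = os.path.join(root, name)
        with open(path, errors="ignore") as f:
            lines = f.readlines()
        hits = sum(line.count(m) for line in lines for m in MARKERS)
        if len(lines) > MAX_LINES or hits:
            print(f"{path}: {len(lines)} lines, {hits} stale markers")
```

Feed the output to the agent and have it file a Bead for anything worth fixing.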
You’ll also want to ask your agent to do cleanups during the code-health passes. Have it look for files that are in the wrong place, or have misleading names, or need better homes. Have it clean up debug cruft, ancient plans, build artifacts, old docs, anything you don’t need. This is all part of the regular hygiene and maintenance of a vibe-coded code base.
It helps to be creative, and also to ask the agent to be creative, thinking outside the box. After the first round or two of regular code reviews, start having it look for over-engineered subsystems (YAGNI), opportunities where your code could have used a third-party library, and other broad, system-level concerns.
Basically the agent will always find problems, often shocking ones, e.g. where you discover you have two or even three completely redundant systems (databases, logging, telemetry, whatever) that need consolidating. And since agents tend to accrete code without automatic refactoring, your vibe-coded source files will tend to grow to thousands of lines, which makes them harder for agents (and humans) to reason about. So you should tell it regularly to break things up, and then run dedicated sessions to implement the refactoring!
During each code review, have your agent file Beads for everything it discovers. Then have it review the epics and issues (up to 5 times; see below) to ensure the implementation will go smoothly.
Then swarm to fix it all! Do all this at least weekly. For me, I’d estimate I spend about 25–30% of my time and money on code health, and I don’t think it’s enough. As long as I continue to find serious problems with reviews, I need to do more reviews. My current guidance is that you should expect nearly half of your work to be code-health related.
What happens if you don’t follow this rule? You gradually (but rapidly) accumulate invisible technical debt that weighs down your agents in various ways — too much code, conflicting code, obsolete docs, etc. Your agents will begin to work more slowly and you’ll see more bugs in their outputs.
Stay on top of code health, and you’ll keep your vibe-coded code base sprightly.
4. You might be too early: Some projects are ahead of their time.
AI cognition takes a hit every time it crosses a boundary in the code. Every RPC, IPC, FFI call, database call, client/server call, every eval, every single time the AI has to reason cognitively across a boundary or threshold… it gets a little dumber.
I noticed this when working on Efrit, my native-elisp coding agent, which lives inside Emacs. Over the summer I was trying to get Claude and other models to build it for me, and they struggled. Hard. Efrit lives in Emacs, which is a separate process from your coding agent, so already there’s one boundary.
For that particular IPC boundary, there are multiple channels for the agent to talk to Efrit, all of them quite unsatisfying. There’s emacs --batch, which has limitations, and the emacs-server client/server mode, which is also limited for the kind of heavy reflective introspection the agent needs to do for this kind of code base.
So what did I do? I spent a week working with Claude to build a better agent-Emacs bridge. Claude built me the “Agent-Efrit bridge”, a simple and elegant system which uses a polling file channel as a message queue to and from Efrit. It’s beautiful. A tool made for agents, by agents! When it does work, it’s amazing.
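The underlying pattern is easy to sketch. Here’s an illustration of a polling file channel in general (not the actual bridge code; the file names and message shape are made up):

```python
import json, os, time

# Illustration of a polling file channel between two processes (say, a
# coding agent and an editor). File names and message shape are made up;
# this shows the pattern, not the actual Agent-Efrit bridge.
REQUEST_FILE = "/tmp/bridge-request.json"
RESPONSE_FILE = "/tmp/bridge-response.json"

def send_request(payload: dict) -> dict:
    """Write a request, then poll until the other side writes a response."""
    with open(REQUEST_FILE, "w") as f:
        json.dump(payload, f)
    while not os.path.exists(RESPONSE_FILE):
        time.sleep(0.1)                  # poll until the other side replies
    with open(RESPONSE_FILE) as f:
        response = json.load(f)
    os.remove(RESPONSE_FILE)             # consume the message
    return response

# e.g. send_request({"op": "eval", "form": "(buffer-name)"})
```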
Naturally, Claude never uses our fuckin’ bridge we built together. I’ve given up even asking. This is an example of a tool I tried to build, but the AI just refuses to use it.
With Efrit, after that initial bridge there are still other RPCs — the API call to the frontier model, the parsing of its response, and the eval of the elisp code to execute the response. All of these were piling up to make the models dumber. And ultimately, the August 2025 crop of frontier models couldn’t solve this problem. Or at any rate, the returns diminished so sharply that I gave up.
So I paused the project! There was plenty of other work to do. A few months went by, a few model releases happened (notably Sonnet 4 and Sonnet 4.5). Efrit sat idle. And then about 2 weeks ago, someone asked to be an Efrit maintainer, since people wanted to use it. But wait, Efrit was still crap! So I thought, what the heck, let’s have Claude 4.5 peek at it.
Claude 4.5 took one look and said, “great idea, awful execution, but we can modernize this.” It produced an incredibly detailed plan to take Efrit to the next level, and I’ve spent the past 2 weeks letting it grind through this plan (serially, no swarming, since swarming on elisp sounds like a bad idea today.) And now Efrit is getting to be approximately on par with modern coding agents.
All I had to do, in order to crack this nut, was wait 3 months (i.e., 2 model releases). Claude is finding Efrit quite easy now, compared to this summer. I cite this as one of many examples of how the models and tools are indeed getting exponentially better. I have a set of projects they can’t do today. Efrit is (well, was) one of them. If you keep a menagerie of “too hard for AI” projects, then you will be able to watch and measure their cognitive progress increasing month by month.
I often bake this philosophy into my project planning. I will deliberately build something that’s just slightly too hard for the agents, knowing that in the next model release, they’re almost certainly going to find it straightforward. I plan for the models to get smarter, by building tools that don’t work that well with today’s models. This is how you get that little bit of extra shelf life out of your software — plan for it to be useful when smarter agents arrive.
If you read this section and concluded, “well, obviously AI isn’t ready to handle my project work; I tried it, it was confused, so I’m just going to wait for smarter models,” then I wouldn’t blame you. But be careful! You might not need to wait as long as you think. If you’re just using this as an excuse to procrastinate until the models are smarter, then you’re missing out on honing a massive set of skills you need in order to work with models effectively — even as they do get smarter.
In the next section, we’ll talk about a way you can get even more cognition out of today’s models, without needing to wait. You’ll have them solve even harder problems than you thought they were capable of, all because you didn’t give them enough of a chance before. Let’s take a look!
5. The Rule of Five: When in doubt, have the agent review its own work 5 times.
Jeffrey Emanuel discovered this powerful and unintuitive rule. He found that he gets the best designs, the best plans, and the best implementations, all by forcing agents to review their proposals (and then their work) 4–5 times, at which point it “converges”. It typically takes 4 to 5 iterations before the agent declares that it’s as good as it can get.
Jeffrey described a long, complex series of prompts for this process; I’m sure we’d all appreciate it if he published them. But the way he described it to me, you first make the agent do a task, then you do a series of focused reviews. Each review should be slightly broader and more outlandish than the previous one, or you can go in the opposite order. But you need a mixture of in-the-small and in-the-large reviews. You’re having it look for bad code (or designs), but also bad architecture.
To be slightly more concrete, Jeffrey first asks it to do a couple of regular code reviews, which find all the usual stuff. And you’ll notice right away that even on the second review it will often find things it missed in the first review. But I think most of us stop there, if we even ask at all. It definitely feels weird to ask for the 3rd code review, which is the agent’s 4th pass over the code, counting the generation step. But the 3rd review, especially during the Design phase, is where you start asking it existential questions about whether you’re doing the Right Thing throughout the project.
I tried it, and sure enough, it does take 4–5 iterations, just as Jeffrey described, before the agent will say something like, “I think this is about as good as we can make it.” At that point it has converged. And that, folks, is the first point at which you can begin to moderately trust the output the agent has produced. If you always take the first thing it generates, with no review at all, you’re bound to be disappointed.
I asked Claude what it thought of this Rule of Five, and Claude was enthusiastically supportive. Claude claims that this process matches their own cognition model, which is breadth-first: they solve each problem first in very broad strokes. And then they almost always need more passes for proofreading, refining, and polishing — much like humans do.
At first you’re going to want to do this purely with prompting. Maybe Jeffrey Emanuel will share some of his fancy review prompts. But over time, you’re going to want to automate it, since you’re applying the Rule of Five at every single step in the process, which at a bare minimum, for any nontrivial hunk of work, would be:
- 5 passes over the design
- 5 passes over the Beads implementation plan (this results in far better issues and dependencies, and better execution)
- 5 passes over the implementation (code + 4 reviews)
- 5 passes over the tests
- 5 passes for code health (might as well build it into your dev middle loop)
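Once you outgrow manual prompting, the automation can be as simple as a loop. A minimal sketch, assuming a hypothetical run_agent() helper that sends one prompt to your coding agent and returns its reply; the prompts below are placeholders, not Jeffrey’s:

```python
# Sketch of automating the Rule of Five. run_agent is a hypothetical
# helper that sends one prompt to your coding agent and returns its reply;
# the review prompts are placeholders you'd replace with your own.
REVIEW_PROMPTS = [
    "Review this work for bugs, missing edge cases, and bad naming.",
    "Review it again; look for anything the previous pass missed.",
    "Step back: is the architecture right? Any redundant subsystems?",
    "Existential pass: is this the Right Thing for the project at all?",
    "Final polish pass: anything left worth improving, or has it converged?",
]

def rule_of_five(artifact: str, run_agent) -> str:
    for i, prompt in enumerate(REVIEW_PROMPTS, start=1):
        reply = run_agent(f"{prompt}\n\n---\n{artifact}")
        artifact = reply  # assume the agent returns the revised artifact
        if "as good as we can make it" in reply.lower():
            print(f"Converged after pass {i}")
            break
    return artifact
```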
Yes, this is slower. Yes, this is more expensive (though probably less so than all the rework you’ll be stuck with if you skip these steps). Yes, it’s awkward to tell an AI to keep reviewing work it just reviewed.
But you should make sure you do it. Rule of thumb: demand at least 2–3 passes on small tasks, and 4–5 passes on big tasks. If you’re not super familiar with the language, the stack, or the domain, then you should err on the side of more reviews.
Do this, and it’ll feel like you’re using a model from the future. They will do far better work than they’ve been doing for you. Try it!
6. Swarm where you can, but beware the Merge Wall
I’ve been focused on agent swarming the past few weeks, after several months chasing quality and reliability without much success. I’ve got a new (third!) orchestrator in the works, and wow. Swarming. Next year is going to be extraordinary.
I’ll share a quick example of how powerful swarming can be when it’s working right. I had a disaster the other day where 30 Beads issues went missing. It was three or four major epics, each with a bunch of child issues. I had put a ton of work into their design, following the Rule of Five, and they were all ready to implement.
But I couldn’t find them.
I wasn’t panicked, since it’s hard to truly lose issues in Beads (we do have some bugs here and there but they are getting closed fast). Beads is all backed by Git, so it’s almost always possible (for the AI) to reconstruct what really happened from the git history, and fix it.
But I was concerned, because, where the hell did my 30 issues go? They weren’t deleted. After a couple minutes of increasingly alarmed searching, I finally figured out where they all went: My swarm had implemented them all! WTF?
There was a minor miscommunication, I guess; I asked my orchestrator to start working on the bug backlog, and it assigned all 30 issues to the eight workers I had already spun up. Some of these were quite complex issues. But while I was busy with other stuff, and not watching, the worker agents implemented and closed all 30 issues.
I was equal parts delighted and flabbergasted when I realized what had happened. I went and checked, and sure enough, they’d done all the work. It was pretty decent work and needed very little touchup — likely because I had used the Rule of Five throughout, and the Beads were in very good shape when it came time to implement.
After my 30 issues were magically implemented, I was sold. I would never not swarm again!
And then, of course, I was utterly unable to reproduce that perfect swarm. Subsequent attempts all ran into merge issues and required a ton of hand-holding and infrastructure tweaks. It will be a couple more weeks before I can swarm reliably. But still, I am completely sold.
I’ll know that my swarm orchestrator is ready to launch when I can swarm the web UI, building it from scratch. My system doesn’t have a non-CLI UI yet; well actually it does, in Emacs, but I doubt you want that one, however cool it might be. (It has Efrit inside it, so it’s pretty damn cool.) But I’m going to build a UI with the swarm, and that’s when I’ll know it’s ready for prime time.
The thing you have to be prepared for when swarming is the Merge Queue problem. It’s like smacking into a wall. To illustrate, let’s say you have a simple swarm of 3 workers. One worker is redoing the logging system, another is changing the database API, and another is changing the client-server protocol. It’s likely that all three of these subsystems have some overlap, and changing one requires changing another. And their work will collide when they try to merge it all back together.
When you swarm a task, a key problem is that the workers all start from the same baseline (e.g. the same starting git commit), and they all do their work off that baseline. But each worker has the ability to change the baseline dramatically. Let’s say workers A, B, and C all complete and merge in their work. The system may now be completely different from the original baseline. When the fourth agent D finishes its work, a rebase may no longer be feasible. The system may have changed so much that D’s work needs to be completely redesigned and reimplemented on the new system baseline, which includes A, B, and C’s changes.
This is why you need the Merge Queue. You need to serialize the rebases, and give each worker enough context, and context-window space, to fully merge their work into the new baseline.
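Here’s what “serialize the rebases” could look like in its most stripped-down form. A sketch only: ask_agent_to_redo() is a placeholder for handing a worker the new baseline plus enough context to redo its work, and only the git commands are real:

```python
import subprocess

# Sketch of a serial merge queue. Each worker branch is rebased onto the
# moving baseline one at a time; ask_agent_to_redo() is a placeholder.
def run(cmd):
    return subprocess.run(cmd).returncode == 0

def merge_queue(worker_branches, ask_agent_to_redo):
    for branch in worker_branches:              # strictly one at a time
        run(["git", "checkout", branch])
        if not run(["git", "rebase", "main"]):  # baseline has moved too far
            run(["git", "rebase", "--abort"])
            ask_agent_to_redo(branch)           # redo the work on the new baseline
            continue                            # re-queue it for a later pass
        run(["git", "checkout", "main"])
        run(["git", "merge", "--ff-only", branch])
```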
Some work is inherently parallel, and some work is inherently serial — the latter because of irreducible complexity and task overlap. If you think you’re going to be stuck with an awful merge, then you should probably defer some tasks until the earlier ones complete. But it’s not always possible to tell in advance, so sometimes you’ll have tough merges.
I’ve noticed that projects tend to go through a cycle where they are swarmable for a while, but then you’ll suddenly need to pause and serialize all work for a time. This can happen, for instance, if you’re changing the directory layout of your project — e.g., to make it more accessible to AIs who are trying to guess their way around. You might need to experiment with a bunch of different layouts. But each new project source layout changes all your package imports, scripts and other inter-module references, which would totally break any existing workers. So you have to pause all other work while you do the big package restructuring.
You can think of swarming as a MapReduce-type operation. In the mapper phase, you can spin up virtually unlimited workers. But in the reducer phase you need to merge their work all back together. Unfortunately, as Gene observed, this isn’t really a MR because most MRs have a very simple reduce phase — the workstreams have a monoidal shape, and you can merge their work by doing things like summing counts or whatever.
But with agent swarming, the reduce phase is a nightmare; it’s the exact opposite, in fact: it can be arbitrarily complicated to merge the work of two agents. In the limit, what should we do if Worker A deleted an entire subsystem, and Worker B comes along with a bunch of changes to that (now-deleted) subsystem?
So the swarm merge step is often messy and not entirely automatable. Some cases require either human judgment, or else really good context for AIs to make the call.
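A toy contrast makes Gene’s point concrete (illustrative only): a classic reduce merges cleanly because the operation is associative, while two agents’ results may have no sensible combination at all without more context.

```python
from functools import reduce

# MapReduce-style reduce: word counts merge cleanly because addition is
# associative, so any merge order gives the same answer.
shards = [{"a": 2}, {"a": 1, "b": 3}]
merge = lambda x, y: {k: x.get(k, 0) + y.get(k, 0) for k in {**x, **y}}
print(reduce(merge, shards))  # {'a': 3, 'b': 3}

# Agent-swarm "reduce": there is no operator that combines these two
# results. A human, or an agent with enough context, has to decide what
# the merged system should even look like.
worker_a = {"op": "delete subsystem", "target": "logging/"}
worker_b = {"op": "patch file", "target": "logging/handler.py"}
```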
I don’t know if we’re going to get a tool that hides the mess. I’ve been talking to investors, many of whom are keenly interested in the next generation of developer tools, and there is a prevailing belief that all we need are proper guardrails, and then these kinds of agentic coding and swarming tools will be accessible to “average” developers, which they certainly are NOT today.
And why is that? Well, as Joel Spolsky observed in Things You Should Never Do Part 1, reading code is by far the hardest part of coding. This is a well-known finding in the Dev Productivity world; they’ve done study after study. And with vibe coding, reading code is… pretty much all you do all day. It’s hard for most developers. The average dev probably thinks 5 paragraphs is an essay. Coding agents make you read enormous waterfalls of both text and code. This is absolutely draining and beyond the capabilities of most devs today.
However, I don’t see eye-to-eye with the investors on this one. I personally do NOT think we will get useful guardrails. If you try to build something with heavy guardrails, you’re going to wind up with Bolt or Lovable, and nobody will use it. Sorry! That’s just not the right model. Instead, I think we’re going to get orchestration tools that are every bit as powerful, messy, quirky, and frustrating as Claude Code and the current batch of terminal-based coding agents.
And the people who figure out how to use these tools, despite the lack of guardrails, will become super-engineers. I’ve been kicking around the idea of a new blog post, the Rise of the Superengineer. Dunno if it’s worth a whole post, but what’s going to happen in 2026 is that a new class of 100x (or maybe 1000x) engineer will emerge — people who have figured out how to wield coding agent orchestrators effectively, deal with the merge problem, planning, swarming, code health, etc. — all the stuff I’ve talked about here, and more. And they will be able to run 100 coding agents at once, and get meaningful work done with them.
This will make them as productive as a team of 50+ regular engineers.
I think my own orchestrator will usefully peak at around 50–80 agents. Maybe I can get it up to 100. It’s not aimed at massive swarms; it’s aimed at leveling you up from manually managing a dozen ad-hoc agents in ad-hoc repo clones all around your filesystem, to managing swarms of well-behaved agents working 5–10 at a time on focused tasks. It will still require your full attention, your full engineering background, and every bit of design taste you can muster, to use these tools. In some ways it’s even harder and more dangerous than using a single coding agent, even with tooling support.
But some people are doing it already! By hand, to be sure, or by building their own homegrown orchestrators. Mark my words, though: next year, you’re going to have engineers who can build (and likely maintain) an entire company’s software on their own. You’ll have solo unicorns, sure, but also a marketplace of solo uber-contractors who can build things for companies that they would otherwise have had to pay someone like Accenture tens of millions of dollars for.
There will also be small teams of people who figure out how to maximize their velocity when multiple humans work with agent teams. And these small teams are going to change the world. Gene and I are actively wondering whether company size is going to decrease on average, because you will be able to get so much more done with so many fewer people.
But no matter what, the tools are going to be messy from now on. Working with AIs is a little messy and nondeterministic. And I think that’s here to stay.
Wrap-Up
Gene and I went through at least a baker’s dozen ideas this morning, and I’ve chosen the half that seemed the most baked. A few others are becoming clearer, but are still so vague that we don’t really have the right vocabulary to talk about them yet.
Change is coming. Agents are way more powerful than they were 3 months ago. I’ve talked with plenty of (good) engineers lately who still believe that agents have plateaued. Ignoring the 30 years of evidence showing that AI is following Moore’s Law, they feel it’s just going to stop getting better today, out of nowhere. And in their opinion, agents are not good enough yet.
But if you’ve been following and using agents since they landed in February, you’ll know just how much more powerful and capable they have become, even since summertime. It’s not plateauing; heck, it’s not even slowing down. And you can prove it using your backlog of projects that are too hard for AI. Every few months, another one will fall, until there are no more left.
If you’re one of the many engineers who still hasn’t made the switch to AI-first coding, now is a good time to try it again. If you haven’t used an agent in a few months, you’re going to be shocked at how smart and capable they have become. They are full concierges now, able to help you with any computing-related problem. People tell me they even use Beads for their personal TODO lists!
My orchestrator is right around the corner. I’m excited for it. It’s going to make a splash. Hopefully this Christmas!
But you’ll only be able to use it if you already use coding agents for literally everything. If you want to be a 100x super-engineer next year, you need to start learning vibe coding basics today, and make it work for you. Keep in mind all the advice I’ve given here, and read our Vibe Coding book, which just came out on Oct 21st. It’s fresh and relevant, and will help you get into the right mindset with the right techniques and practices.
More to come, soon.
This stuff is so fun!