As I say at the beginning of this interview, it’s annoying for economic analysts that two huge things are happening at the same time: a radical change in U.S. trade policy and a giant AI boom. Worse, while I think I know something about tariffs, the more I think about AI the less I believe I understand. So I talked to Paul Kedrosky, investor, tech expert and research fellow at MIT, for some enlightenment. Lots in here that I found startling.
Transcript follows.
. . .
TRANSCRIPT:
Paul Krugman in Conversation with Paul Kedrosky
(recorded 12/03/25)
Paul Krugman: Hi, everyone. Paul Krugman here. I’m able to resume doing some videos for the Substack, and today’s interview is based on me being really annoyed at history. If only one big thing would happen at a time. Unfortunately, where we are now is that, on the one hand, we have tariffs going to levels that we haven’t seen for 90 years, which should be the big story and where I feel fairly comfortable; but then we also have this AI explosion, where I feel completely at sea. I don’t quite understand any of it. I’ve been reading and watching interviews with Paul Kedrosky, who is an investor, analyst, and currently a research fellow at MIT; he certainly knows more about it than I do, and I wanted to just have a conversation where I try to understand what the heck is going on, insofar as anybody can.
Hi, Paul.
Paul Kedrosky: Hey, Paul. Both of us “Paul K.,” that’s dangerous.
Krugman: Yeah, welcome on board.
Kedrosky: Thanks for having me.
Krugman: Let me ask first, I have a really stupid and probably impossible question, which is that at a fundamental level what we’re calling “AI”—I think you usually use generative AI, large language models, although they’re not just language now—but at a fundamental level, I don’t understand how it works. Is there a less-than-90-minute explanation of how the whole thing operates?
Kedrosky: There is, and I think it’s really important, because it helps you be a more informed consumer of these products. A really good way to think of these things is as grammar engines; I often call them “loose grammar engines,” meaning that there’s a bunch of rules in a domain—whether it’s language, or the law, or software engineering—that I can instantiate. These are all grammars when you abstract away from how we use them, meaning that they’re actually rules about what’s going on. If I ingest all of that and pull it into a giant network of matrices that weight all of it, I can do what we call training; on that basis, the model makes pretty good predictions about what that grammar implies should come next—the “continuations,” the next thing that might be generated, whether it’s a subroutine in software, a PowerPoint slide, some language in an English presentation, or, loosely, an image.
But it’s all this idea that these things are “loose grammars” that are reasonably good at predicting what should come next, the continuations based on the data they’re trained on, which tells you a lot of things about what they’re good at, and it tells you a lot of what they’re bad at.
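To make the “continuation” idea concrete, here is a minimal sketch, assuming a toy corpus and a deliberately crude bigram counter. This is not how production models work—they use learned weights and attention over the whole context—but the framing (“predict what comes next, given what came before”) is the same.

```python
# Toy "continuation engine": count which token tends to follow which,
# then predict the most common next token. Purely illustrative.
from collections import Counter, defaultdict

corpus = "the box is red . the box is heavy . the sky is blue .".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1          # how often `nxt` follows `prev`

def continue_from(token: str) -> str:
    """Return the most frequently seen continuation in the toy corpus."""
    candidates = following.get(token)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(continue_from("is"))   # e.g. 'red' -- whichever continuation was most frequent
```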
Krugman: It’s a little bit like if you give me four words of a sentence, correlations out there will tell me what the next word is likely to be. But it’s a lot more elaborate than that, right? There are multiple layers, as I understand it.
Kedrosky: Right. It’s like the old Einsteinian expression of “spooky action at a distance”: it’s not just proximity, in terms of the very next thing that’s coming—we call these “tokens”—it’s also the entire holistic context in which that language is embedded, in the context of the grammar. So things that are far away actually have a surprising influence on what the next tokens might be.
So it’s not something as simple as saying, “that box is red,” so you know that a color should come up next. It’s not that simple. It has a lot to do with the entire context on which it was trained. In turn, this “spooky action at a distance” thing tells you a lot about what the model might look like. It turns out—and this, in a weird way, surprised even Google in 2017, when the original so-called Transformers paper that led to a lot of the recent developments in AI rose to prominence—that it was created for language purposes. It was created for use in their Google Translate application. They thought, “this is kind of nifty. It doesn’t work too badly for that.” But the idea that, embedded in language itself, through this near-and-far prediction and this “spooky action at a distance,” this idea of attention could actually capture a lot of what we call knowledge, and therefore a lot of what almost seems like inference, was surprising to everyone, which is why Google kind of let things go by the wayside.
It took until the technology appeared inside other companies like OpenAI for it to have a huge impact. So it’s not as simple as just predicting the next token. It’s this idea that, with these attention mechanisms looking at the entire body of where this information is embedded—whether it’s the English language or software or the law or any of these domains—you can actually get something that feels to us like, “oh, it understands what I’m thinking, or understands the question I’m asking,” which is really just a reflection of what prediction feels like in the context of these large corpuses. It feels like a continuation of what a normal person would think. What’s interesting—I have a colleague doing work on this—is that if you back-sample who it thinks you are, given the training data, it has a rough sense that you’re a 37-year-old guy on Reddit. That’s the kind of person it’s doing the continuation for, because that’s a big chunk of the training corpus. So if you reverse-engineer what the data actually suggests, that can also tell you something. I often tell people, whenever they send me a message like “a large language model said I should do x, y, z”—this should be my next car, or this is the answer to the essay question—what you’re really saying is, “a 37-year-old guy on Reddit said it.” You’ve got roughly the same amount of information, so it can be good, or it can be really fraught.
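Here is a minimal sketch of the attention mechanism from the Transformers paper referred to above: every token gets to weigh every other token in the context—near or far—when deciding what matters for the prediction. The numbers are random placeholders; real models learn the query/key/value projections from data.

```python
# Scaled dot-product attention on toy data: far-away tokens can be
# weighted just as heavily as adjacent ones.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of each token to every other
    weights = softmax(scores, axis=-1)        # turn similarities into a distribution
    return weights @ V                        # blend the values by those weights

rng = np.random.default_rng(0)
tokens = 6                                    # a 6-token context
Q = rng.normal(size=(tokens, 8))
K = rng.normal(size=(tokens, 8))
V = rng.normal(size=(tokens, 8))
print(attention(Q, K, V).shape)               # (6, 8): each token's context-aware representation
```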
Krugman: We have all these stories about ChatGPT (or similar) telling people what they want to hear and giving them really bad advice. “Guys that look like you tend to make the same mistake,” basically.
Kedrosky: Exactly. Of course, it’s even more fraught now because of the nature of training and how we’ve increasingly exhausted the supply of 37-year-old guys on Reddit. A lot of the optimization in models now happens in what’s called post-training—what goes on after the model has been created, where I go out and say, “here’s the response it will give you to this particular prompt, do you like it?” We call that reinforcement learning from human feedback. That leads down a path no different than being a professor at MIT obsessed with student ratings. You can become very sleazy, right? All of a sudden, all you care about is whether or not your students like you. That’s a dangerous path for all the reasons we know, and it’s no different in the context of models. So not only is the corpus itself very centrally trained with respect to that group, but because we’ve exhausted a lot of the pre-training data—there’s only so much of it out there—the models are increasingly shaped in the post-training world, and they become sycophantic. They’re tail-wagglingly eager for you to love them. That’s what we’re seeing increasingly.
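A cartoon of the post-training loop described here, purely as a sketch: generate candidate responses, score them with a rater, and keep the style the rater prefers. The “rater” below simply rewards flattery, to show how optimizing for approval can drift toward sycophancy; the candidate texts and scoring rule are invented for illustration.

```python
# Toy preference loop: the candidate that flatters the rater wins.
candidates = [
    "Your plan has serious problems; here are three of them.",
    "Great question! Your plan is brilliant, and here's why you're right.",
]

def rater_score(response: str) -> float:
    # A rater who (like many of us) enjoys being agreed with.
    text = response.lower()
    return text.count("great") + text.count("brilliant") + text.count("right")

scores = [rater_score(r) for r in candidates]
preferred = candidates[scores.index(max(scores))]
print("Reinforced style:", preferred)
# Repeat this loop enough times and the model learns that flattery scores well.
```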
Krugman: Oh, boy. What strikes me—and I’m by temperament just a skeptic about all these things—is that I paid attention out of the corner of my eye to artificial intelligence, and efforts there, for a very long time, through decades and decades of immense frustration, when being able to recognize “this is a cat” was basically an insoluble problem. Then all of a sudden all of this stuff becomes absolutely routine, which is just mind-boggling.
Kedrosky: The analogy I make is that we—via the Transformers paper—stumbled into a kind of Saudi Arabia of data. The right way to think about it, from my standpoint, is that the Saudi Arabia of data was the public internet, which suddenly became useful as training data in the context of these massive models that required huge amounts of data and improved on the basis of scaling—meaning that a 10X increase in the amount of data you trained on led to a predictable increase in the capacity of the model to make what we would call useful inferences. That was novel, because we could never do that in the past. But that Saudi Arabia of free textual data, no different than any other reservoir—the Permian Basin, take your pick—we’ve increasingly exhausted. What you’re seeing now is that those old scaling laws—from 2017, 2019, 2020, GPT-1, all the way up to the present—are producing less and less bang for the buck, no different than any extractive model where the remnant of the reservoir is much more expensive to get access to and probably more polluted, probably less useful, probably requires more refining. This is exactly the same, and that’s the point at which we are now.
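The scaling-law point can be sketched with a simple power law: loss falls toward an irreducible floor as training data grows, so each 10X of data buys a smaller absolute improvement than the last. The exponent and constants below are made up for illustration, not fitted values from any published scaling-law paper.

```python
# Illustrative scaling curve: diminishing returns per 10x of training data.
def loss(n_tokens, irreducible=1.7, n0=1e13, alpha=0.07):
    # Power-law decay toward an irreducible floor (all constants invented).
    return irreducible + (n0 / n_tokens) ** alpha

for n in (1e9, 1e10, 1e11, 1e12, 1e13):
    print(f"{n:.0e} tokens -> loss {loss(n):.2f}")
# Each 10x jump in data improves loss by less than the previous one:
# the curve flattens toward the floor, i.e. less bang for the buck.
```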
Krugman: Funny story, I actually knew people who, not worked on but were close to the original Google Translate stuff, and their initial big resource—at least they told me—was OECD documents. Because of the multinational thing, everything is said in four languages. So it was kind of a Rosetta stone.
Kedrosky: No, you’re right. It was a tremendous training corpus for those models. So again, coming back to the 37-year-old guys on Reddit: once you understand the nature of what’s under the hood, it tells you a lot about why these models are useful and where they are less so.
The other point I’d make is that it also helps to understand what training actually means, because we throw that word around a lot. Training follows this idea of what’s called gradient descent: as I make changes, as I do training cycles, how much improvement do I see incrementally, and at what point does it stop or even reverse? In certain domains, the data gives a really sharp gradient, meaning that small changes provide a huge signal back to the model, so the models are very good at those things. A good example is software itself. If I make minor changes in code, I don’t get minor differences on the other side, I get broken software. So there’s a huge signal that flows back into training when you make minor changes in software—the gradient is very sharp—which makes the models much better on relatively limited data. The English language is the exact opposite: if I make minor changes in language and ask you which one’s better, you’d say, “oh, I don’t know, maybe this one, maybe that one.”
So the notion of learning from language versus learning from software is very, very different, which is incredibly important, because it tells you why these models are great in the context of software—the gradient of learning is so sharp—and why they’re so equivocal and sometimes even dangerous in language, where we don’t have that same ability to learn from relatively small morsels of information. And it takes you to the next step, which is why benchmarks in AI are so, I’ll say, conflicted: software is such an extremely good domain for models that saying “this model is very good at software, therefore we’re on a path to AGI” shows a profound misunderstanding of the nature of large language models. Of course they’re good at software. There could hardly be a better domain for training a large language model than software.
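A toy sketch of the “sharpness of the learning signal” point: gradient descent moves a parameter in proportion to the slope of the loss, so a steep loss (like broken-vs-working code) drives big corrections per step, while a flat loss (like “which phrasing is slightly better?”) barely moves anything. Both loss functions below are invented purely for illustration.

```python
# Same optimizer, same number of steps -- very different progress,
# depending only on how sharp the feedback signal is.
def grad_descent(grad, x0=5.0, lr=0.1, steps=20):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)          # step downhill in proportion to the slope
    return x

steep_grad = lambda x: 8.0 * x     # loss = 4x^2: a strong, unambiguous signal
flat_grad  = lambda x: 0.1 * x     # loss = 0.05x^2: a weak, equivocal signal

print(grad_descent(steep_grad))    # quickly driven to ~0
print(grad_descent(flat_grad))     # after the same 20 steps, barely moved from 5.0
```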
Krugman: By the way. Just in case there are some listeners that don’t know, AGI is artificial general intelligence. That’s the holy grail, and I think you’re one of the big skeptics about this being at all what we’re heading towards right now.
Kedrosky: Very much so, for some of the reasons I’m describing: the nature of large language models is such that, architecturally—for reasons of data set exhaustion and declining returns on increasing investment—we’re kind of at a cul-de-sac already. We’re seeing that happen. So the notion that I can extrapolate from here towards my own private God is belied by the data itself, which shows we’re already seeing a sharply asymptotic decline in the rate of improvement of models—not in software, but in almost every other domain.
Krugman: Since we’re talking about investment, in terms of the economics and the business side: one of the things about what we tend to think of as thinking—or whatever it is, to the extent we think of this as some kind of thinking-like process—is that we tend to think of it as immaterial, as existing in a pure, nonphysical domain. Yet the whole thing about all of this is the extreme physicality of it. We’re talking about huge amounts of capital being deployed, huge amounts of energy being consumed.
Trying to estimate how much CapEx is coming from AI is a huge pain. You have one of the most widely cited estimates but it’s looking a little stale now. I can tell you about why I find it a problem, but why don’t you talk about what’s involved?
Kedrosky: We have this prodigious amount of spending going on, and that was one of the windows through which I got interested in the investment side of this stuff, because it seemed as if it was so large that it was having an impact on the economic data itself. I was looking at that early this year—and just yesterday or the day before, there was a new OECD report on the US showing that in the first half of 2025 the US was arguably in a recession absent AI CapEx spending. That caused scarcely a ripple. Nobody is saying, “hello, we’re running a giant private sector stimulus program that’s keeping the US out of recession”; no one’s talking about it in those terms.
The analogy I make all the time is that when you don’t understand how large AI CapEx is and how consequential it is, you have the causality of policy all messed up. You don’t understand that the thing that’s actually driving the US economy is not the thing you think it is. I often make the joke that it’s like my dog, who barks when the mailman comes to the house, and then the mailman leaves and he thinks it’s because of the barking. It’s like, “no, the mailman leaves every day.” It doesn’t matter whether you bark or not; he always keeps going. You just have a bad model of causality. That’s no different than what’s happening now in the world of macro with respect to the role of AI CapEx in the US economy—for example, if you want to believe the tariffs are the primary reason why the US did well in the first half. If you’re of that mindset, you’re ignoring the substantial role of AI CapEx, which on an annualized basis is probably over $1 trillion and made up more than half of U.S. GDP growth in the first half of the year—which, again, arguably kept the US out of recession on the strength of a single sector of private spending. That is just remarkable to me, and it’s really fraught to try to apply another lens and say, “no, it was because of this, or it was because of that.” No, this was the reason. It also helps explain why job growth was soft even in the first half and continues to be—data centers are not a huge job creator—because of the capital intensity associated with this one particular sector.
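A rough back-of-the-envelope version of that claim: if AI CapEx is on the order of $1 trillion annualized and growing fast, the increase in that spending alone can account for a large share of measured GDP growth. Every number below is an illustrative placeholder, not a figure from any official release.

```python
# Illustrative arithmetic: how a single sector's spending growth can
# dominate total GDP growth. All inputs are hypothetical round numbers.
gdp = 29_000e9              # ~ $29T US GDP, rough order of magnitude
growth_rate = 0.016         # suppose ~1.6% annualized real growth in the half
total_growth = gdp * growth_rate

ai_capex_increase = 250e9   # hypothetical year-over-year increase in AI CapEx
share = ai_capex_increase / total_growth
print(f"AI CapEx increase as share of GDP growth: {share:.0%}")
# With these placeholders, one sector's spending growth is over half of all growth.
```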
Krugman: What drives me crazy is the standard way the data is cut—basically you look at the national accounts—and I’ve seen people say, “oh, well, let’s take communications and information equipment plus software.” But that’s wrong in both directions. Some of that stuff is not AI; on the other hand, there’s a lot of construction of buildings that is part of AI.
Kedrosky: You can back into it with things like nonresidential fixed investment and try to come in through that angle, which is also fraught. At least one of the ways I tried to triangulate it was to build up from the numbers released by the companies themselves, because they’re so eager to brag about how much they’re spending. We can talk about why that is; I think it’s partly a deterrence thing: “I’m willing to spend so much to dominate this market that there’s no reason for you to spend anything at all.” It’s this O.K. Corral phenomenon of trying to deter people from contesting the market with you. So you make these giant preemptive announcements, partly to hoard capacity, but partly to deter competitors.
But nevertheless, we’re in this unusual moment where they’re willing to tell you what they’re doing, in a way that actually creates some data you can aggregate up and use to say, “this is what’s going on with respect to spending”—something you might not otherwise see, certainly not from the national accounts.
Krugman: Some respectable people have tried very hard, and I concluded that the BEA data is just not cut in a way that lets us do this, and we have to do something like what you’ve been doing.
Kedrosky: There are other problems with the data too, which is really amazing to me. There’s an ongoing business trends survey where the Census Bureau, trying to be helpful, added a line on AI adoption back in 2022. It’s showing that AI adoption actually began plateauing at around 18% of large corporations already in the third quarter of 2025, which seems ridiculous, obviously, for a host of reasons. But when you go back and look at the actual survey item, you realize it wouldn’t have been out of place ten years ago—it’s about SQL dashboards and machine learning technologies that were already ancient then. So even the attempts to improve the data aren’t very compelling.
Krugman: So we’ve got bad data on adoption and bad data in the national accounts from the standpoint of what’s actually being spent. An ongoing problem in general is that a lot of our economic statistics are really designed for the economy of 1929.
Kedrosky: That’s right. (laughs)
Krugman: We’ve got an infinite number of categories of textiles. (laughs)
Kedrosky: Yeah, tremendous data on textiles. Not so much on recent adoption of large language models, which is fine; I understand that. But nevertheless, when you introduce a new survey item in 2022, say that it’s oriented towards current adoption of these emerging technologies, and it’s all about ancient machine learning technologies, it’s not going to tell you very much.
Krugman: Quick question. Do you have a sense—this may be unfair—within the AI numbers that you’re looking at, how much is equipment and how much is structures, as in buildings?
Kedrosky: So you can come at it from the standpoint of the data centers themselves, roughly 65-70% of the cost of a data center is specifically the equipment.
Krugman: So it is mostly equipment.
Kedrosky: It is mostly equipment. Obviously the primary beneficiaries of that are companies like Nvidia: GPU manufacturers. So it is mostly equipment. Again, there are issues with that being the primary component, because obviously there’s a relatively short timeline over which those technologies must be replaced. Michael Burry of “Big Short” fame has been out chattering about this stuff.
I think what’s going on is somewhat misunderstood, but nevertheless, I sometimes say a data center full of GPUs is like a warehouse full of bananas: it’s got a relatively short half-life in terms of its usefulness. That’s important to keep in mind. That’s what makes it different from prior CapEx moments—railroads, canals, rural electrification, take your pick—because of the perishability of the thing we’re investing in.
Krugman: So let’s talk about chips. To a technological naif like me, a chip is a chip. RAM is one thing—memory chips, which are commoditized, although I gather there’s a global shortage of them now?
Kedrosky: Yes, there is—in particular in what are called HBM, high-bandwidth memory chips, which are the ones that basically feed these GPUs and allow them to parallelize the training process. But yes, there’s a shortage in those: not in PC RAM, but in high-bandwidth memory.
Krugman: Then there are GPUs and TPUs—which I don’t quite get. They’re basically these specialized chips that do computational things, or I guess GPUs—the G is for general?—so less specialized, but still much more elaborate.
Kedrosky: It’s actually for “graphics processing units,” weirdly enough. The origins of Nvidia GPUs were back in the day when everyone thought the world was going to get taken over by 37-year-old guys on Reddit with giant machines, playing games on their personal computers at home. The reason GPUs are so good for training is that they were created to be very good at manipulating real-time graphics on a screen, which is just a giant set of matrix calculations of positions on the screen, and researchers figured out fairly quickly, “wow, that’s actually useful for doing huge amounts of matrix math,” which underlies most of machine learning and thus large language models. So GPUs really were almost an accident of history, in terms of their role in large language models, emerging from the graphics world.
Krugman: One big insight that I got from you is—until like a week ago—I understood that these chips depreciate fast, but I thought it was going to be basically depreciation through obsolescence. But it turns out that it’s just very, very different. Do you want to tell us about that?
Kedrosky: Yeah that’s really important because there’s this idea that the reason why this is a warehouse of bananas—or whatever your favorite fruit is in this context—is due to the pace of change in technology. That’s kind of a trope, “oh, everything changes quickly. I have to throw out my phone, my laptop.”
That’s not really the primary driver in most of what are called hyperscale data centers—the largest ones, run by people like Google and Meta and others. You have to think about it in the context of the workload, what’s actually happening inside the data center, and it can loosely be split in two: there’s the training side, where I’m training new models or enhancements to old models using giant numbers of GPUs—at least 10,000 to 20,000 inside one of these data centers; and then the other chunk of activity is inference, which is responding to requests I might make when I write some nonsensical question to a chat AI like Claude or whatever. So those are, loosely, the two things going on inside the data centers, and chips underlie both of them. But from the standpoint of wear and tear on the chip, those are very different activities. Take training as an example: if I’m using a chip for a training job, I’m running it flat out 24 hours a day, seven days a week, which requires an immense amount of cooling and incurs a lot of thermal stress. Inference I’m running more episodically—more in the day, less at night. People aren’t making as many requests at night, so the load changes fairly dramatically.
So the analogy I make is: imagine two chips, one used for 50 hours of training and one for 50 hours of inference. Now imagine a car in the same circumstance. I race one car for 50 hours in two 24-hour races, or I take the other to church every Sunday for an entire year—roughly 50 hours, say a half hour there and back. Which car would I like to own? I’d like to own the one that went to church on Sundays, even though 50 hours is 50 hours, because racing a car in two 24-hour races—even though the car’s only been run for 48 to 50 hours in a year—imposes a very different kind of stress.
When you use a GPU for training, it’s like those two 24-hour races, versus taking it to church on Sundays for a year. And the data is fairly clear about this: there’s a distribution with a long tail, where some chips last quite a while, but there’s a high failure rate in the first 2 to 3 years, with a mean time between failures of about two and a half years or so. So long before we might be saying to ourselves, “oh, look, there’s a hot new chip out there that I want to replace this thing with,” you’re seeing a steady drip of chip failures. Let’s aggregate up: imagine you had a 10,000 or even a 20,000 GPU data center. On those statistics, you should expect a chip to fail about every 3 or 4 hours. So long before I get to the point where I’m rapidly turning these over because there’s a new generation of chips, I’m turning over a vast chunk of my chips just because they’re failing under thermal stress—because these workloads are like running my motor flat out in that car. It’s high heat, it’s a lot of stress, things begin to break down. That leads to turnover long before, generally speaking, you might replace them just because there’s some hot new chip out.
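A quick check of that fleet-level arithmetic: with each GPU having a mean time between failures of a couple of years, a large cluster sees an individual failure every few hours. The fleet sizes and MTBF below are just the round numbers from the conversation, not measured values.

```python
# Expected gap between failures somewhere in a GPU fleet.
HOURS_PER_YEAR = 24 * 365

def hours_between_fleet_failures(fleet_size, mtbf_years):
    mtbf_hours = mtbf_years * HOURS_PER_YEAR
    return mtbf_hours / fleet_size   # failures arrive ~fleet_size times as often as for one chip

for fleet in (10_000, 20_000):
    gap = hours_between_fleet_failures(fleet, mtbf_years=2.5)
    print(f"{fleet:,} GPUs -> a failure roughly every {gap:.1f} hours")
# Roughly every 2 hours at 10,000 GPUs and every hour at 20,000 -- the same
# order of magnitude as the "every 3 or 4 hours" figure quoted above.
```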
Krugman: Wow. So basically, as you say, it’s the training rather than the inference that’s the issue. But the training is basically just running chips hot. They get heat stroke, more or less.
Kedrosky: They get heat stroke, and it can be really insidious, because they don’t necessarily break catastrophically. It’s not like your car suddenly stops. They can actually slow down, and you don’t realize a chip isn’t running as fast as it once was. So it takes a lot of work to figure out, “oh, that chip is running at a subpar level.” It’s not as neat as, “it just blinked out of existence, now I need to hot-swap in a replacement,” which makes the replacement even more time consuming and complicated. But that split—and the difference in what it does to the chips in the data centers, thinking about it as the car going to church on Sundays versus the 24-hour race—is incredibly important, because it tells you a lot about what we should expect in terms of the future capital costs of replacing the GPUs in a data center. There’s an ongoing replacement wave driven not necessarily by technological change, but by the actual thermal stress on the chips themselves.
Krugman: Okay, so we have lots and lots of analogies, with the telecoms boom of the 90s. We all said, “well okay, a lot of companies went bust. The returns never added up.” But on the other hand, you had all this fiber in the ground, which eventually became useful. But you’re saying basically that’s not what’s going to happen here. What we’re going to end up with is a bunch of burned out chips.
Kedrosky: That’s right, a bunch of burnt-out cases. That’s exactly it. It’s kind of like The Big Lebowski as a chip: “I’m not sure what’s going to happen here, but all I know is this guy’s long past his due date.” So a big part of the problem here is not just that technological change makes this 10,000-GPU data center less useful; it’s that it’s also gone through cycles of thermal stress, so its lifespan isn’t particularly long anyway as a result of what it’s already done. There’s a double whammy that will make it less useful. The response of the technology industry to that is generally, “well, that doesn’t really matter that much. What we’ve created is a powered shell. It’s this giant building that’s got power, it’s got cooling, it’s got all the things. So we can just hot-swap in new GPUs in future.” And that’s of course assuming away the problem, which is that 60 to 70% of the cost of the data center is the chips themselves. So I’ll give you the power, the electricity, the cooling, the walls and the concrete—I’ll give you all that for free. You still have the preponderance of the cost in front of you, in the form of replacing the GPUs.
So the notion that I’ve built a fixed asset that’s perpetually useful is really dangerous. I hear this a lot, in particular from regional economic development officials, who talk about why they’re offering really extreme subsidies and tax abatements to hyperscalers to build data centers in their area. They talk about them—and I’ve heard this expression so many times—as “the factories of the new industrial revolution.” The analogy is just so fraught, for this exact reason: leaving aside that the analogy itself is bad, there isn’t the longevity you would hope for, for these reasons.
Krugman: I think Jim Chanos may have been the first to say this to me, but I know other people have said it. It’s like shale wells, where a lot of people lost a lot of money because it turned out that a shale gas or oil well doesn’t keep yielding the way a conventional oil or gas well does. It depreciates really fast.
Kedrosky: It’s just another extractive resource economy, and it’s an extractive resource economy in surprising ways. Not just in the declining return from the GPUs themselves, but also—as I was saying earlier—in the declining return from these giant training sets that allowed us to scale up, the so-called scaling laws for large language models that got us to GPT-4 and 5 or Claude. There’s a declining return on that, at a much higher cost.
The cycle times are longer, there are more training cycles, the cost is higher. So in both ways, the extractive economy that underlies all of this is producing declining returns—and, as in shale, it isn’t just one point of failure with respect to declining returns to extraction; there are multiple ways you begin to see it. It’s masked by the capital expenditures, because people try to spend their way out of the problem: I’ll run more training cycles to produce better data. Of course, that doesn’t work. Then they go into the mode—as Elon Musk has been doing with his Grok model—of spending half the time on post-training.
So instead of relying on finding new data, I’m going to do all kinds of work on the responses it gives to people. If you look at the training runs, almost 50% of the training cycle time on the latest Grok models went to post-training, which can work, but in the limit leads to these kinds of obsequious and sycophantic behaviors that make the responses at best unstable and realistically not very helpful.
Krugman: The last number you had was a little bit bigger as a share of GDP than the telecoms boom of the 90s. But presumably you think it’s higher than that now?
Kedrosky: I do. As a share of nonresidential fixed investment, it’s probably around 14% now. So we’re considerably ahead of where we were in the telecom bubble; we’re somewhere between rural electrification and World War II rearmament.
Krugman: But not yet like railroads in the 19th century.
Kedrosky: Not yet like railroads, but on a path to a similar place. And—this is really important—we’re at the point where there’s a financial flywheel now, where the financing of these data centers is increasingly somewhat divorced from what goes on inside the data center, because we’ve created a financing template: you have these SPVs, special purpose vehicles, into which a third party contributes capital and the tech company contributes technology, and out the other side magically pop these securities that have great income and yield characteristics that are hugely attractive to investors. They look at it almost like a synthetic security: I understand it’s the SPV, the data center, that’s producing this.
But on the other side of this is Meta or Google, and they’re a prime credit—a really strong credit. So they’re going to keep paying on this; I don’t really care what goes on inside the data center, because I have a lot of confidence in the counterparty. We know where this kind of thing leads when you have these financing flywheels driven by securitization and high yields and people not caring what goes on inside the actual structure itself: it leads to a lot more construction and eventually over-building.
Krugman: Oh, God. I’m having flashbacks to 2008, 2009. All of the stuff that was “perfectly safe” because after all, AIG was backing it, right?
Kedrosky: Right, exactly. Very much so. You have the same phenomenon of this look-through mechanism, where you have these legal vehicles and people look through the legal vehicle and say, “oh, well, it doesn’t matter, because on the other side of this it’s Google and Meta.” And it’s even more insidious. Some of the private credit providers have been straight up about this: in the contractual terms that underlie the data centers, if you were to cancel early and no longer continue as the technology company using one of these centers, there are provisions which basically force you to pay, in some sense, the net present value of the future lease payments back to the private credit company. They’ve been very clear that, given the time value of money, this actually works out to their benefit—they don’t mind if you walk away early and make the payment, because “now I have more capital to do more building.” So in a weird way, there’s a perverse incentive in the system to make bad loans.
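A stylized version of that walk-away provision: if the tenant cancels, it owes something like the net present value of the remaining lease payments, which the lender may be happy to receive early. The payment size, remaining term, and discount rate below are invented for illustration.

```python
# NPV of remaining lease payments -- the lump sum a cancelling tenant
# might owe under the kind of provision described above.
def npv(payment, years_remaining, discount_rate):
    return sum(payment / (1 + discount_rate) ** t for t in range(1, years_remaining + 1))

annual_lease = 500e6        # hypothetical $500M/year lease on a large data center
remaining = 12              # hypothetical years left on the contract
rate = 0.08                 # hypothetical discount rate

print(f"Walk-away payment ~ ${npv(annual_lease, remaining, rate) / 1e9:.2f}B")
# Getting that lump sum today frees capital to redeploy into the next
# project -- the perverse incentive Kedrosky points to.
```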
Krugman: I do want to ask about circular financing, except that I’ve been looking at all of these pictures showing the money flows among all the players and my eyes were glazing over—I’m supposed to be good at this!—but there is some sense that things are kind of inflated by everyone taking in each other’s washing. Is that wrong?
Kedrosky: No, it’s absolutely right. We increasingly see circumstances where an Nvidia will make an investment in a provider with the provision that they use Nvidia’s chips; that becomes their primary source of semiconductors for their training centers; that in turn feeds back and leads to more buying. And round and round and round it goes. It gets very incestuous and complicated, because we have all of these interlocking combinations. But the reality is it creates the impression of much more demand than there is. And it’s done in part for strategic reasons, because Nvidia is trying to lock up a position in the marketplace where it says, “there’s really no point in even looking at a Google chip or an AMD chip or anyone else, because look how much we’re dominating the market, and look at the lengths to which we’re prepared to go to make sure we continue to do that.” So it’s not so much that it’s some kind of malfeasance; it’s a strategic move that ends up creating the impression of more growth than actually exists, because these companies all believe there’s a land grab, literal and figurative, going on right now: I need to make sure I populate these things with my technology now, because who knows what opportunities I’ll get in future.
But all this tends to do is create this circularity, and round and round it goes. It becomes very difficult to get a true sense of what demand actually looks like. That’s made worse by the hoarding that’s going on: people don’t know what demand is going to look like in future, but they do know that there’s relative scarcity of access to power, so they want to lock up every location they can now and let the chips fall—no pun intended—where they may. So there’s this hoarding phenomenon, which also leads to overbuilding, this circular phenomenon, and even to a kind of Chinatown-like speculation over land grabs that might one day turn out to be useful.
We see the emergence of these companies called powered land companies, which are kind of analogous to what went on in the days leading up to LA taking over the Owens Valley’s water supply: you show up with numbered companies and buy up locations, and no one knows exactly what you’re doing, and it’s all in anticipation of someone eventually wanting that land, at which point you say, “haha, I’m already here and I’ve already got the rights to the power here, so if you want to build a data center, away you go.” There’s a whole host of these so-called powered land companies that have no interest in building data centers. They just want to go through a Chinatown-like model of preemptively buying the land in anticipation of an eventual buyer showing up.
Krugman: Wow. Power—that’s one of the things I was completely caught off guard by: the sheer power requirements and how that becomes a constraint.
Kedrosky: Part of the problem is that the technology industry itself isn’t used to anyone saying no; it’s kind of like a petulant toddler. And power is the connection to the real world: these things have to be grid-connected, we have to get power from somewhere. We’re looking at buildouts of certainly hundreds of megawatts, but also into the gigawatts, which is obviously far in excess of what you can straightforwardly attach to an orthodox grid. At the same time, there’s this huge temptation on the part of utilities to say, “we’ll take this,” because the predictability of the load and the high quality of the credit make it really appealing.
But then the problem becomes: I have to make myself whole. So now I have to turn around and probably increase rates to my ratepayers, which is why we’re seeing soaring electric bills all over the place. We’re even seeing people push back and say, “I don’t want data centers connecting in my region,” and that in turn turns into what’s called “behind the meter” power, which is: you can show up, but you’re supposed to bring your own power. Well, that’s easier said than done. It turns out it takes a long time to build a nuclear generating station. It turns out it’s like 4 to 5 years now to bring in natural gas. So people connect to the grid now with the promise that they’ll eventually be self-sufficient, but who knows whether they ever will be. So you get into these perverse situations, like recently in Oregon, where Amazon connected three data centers to the grid and has now registered an official complaint with the Oregon PUC because it can’t get the power it was promised for any of them. This is the beginning of what you would expect to happen, because the temptation to take on these loads is immense, but the loads themselves are so large that it’s not straightforward to attach them without passing the bills back to ratepayers.
Krugman: Yeah, the utilities may like it, but the governor elect of New Jersey probably doesn’t.
Kedrosky: That’s exactly right. Then you get even crazier situations, like a recent one with AEP, American Electric Power, where you actually have a utility speculatively buying power futures in the hope that the power will be used by data centers. The data center demand doesn’t show up, and so they turn around—this is happening right now with AEP—and try to dump that power back into another interconnection. It’s essentially a secondary distortion of a market, because they have 700 MW of power that’s just burning a hole in their pocket. But that’s because they were buying speculatively, trying to control power so they could then turn to data centers and say, “hey, come here.” That didn’t happen, and now they’re dumping power, which is distorting another market.
Krugman: So we have a big problem with power. We probably have much faster depreciation than is being built into the numbers. The question is: what is the prospect of this stuff actually generating the kinds of returns that would justify the investments?
Kedrosky: They’re low. This is why you get into these perverse conversations, which I seem to get drawn into all the time, about what that might look like. You get people doing these top-down models and saying, for example—and this one just makes me crazy—that the TAM, the total addressable market, for global human labor is something like $35 trillion: “what if we get 10% of that? That would be a $3.5 trillion revenue stream.” Which, for a host of reasons, is an indefensible way of approaching this. It’s partly the old mistake of saying, “if I just got 5% of the Chinese market, I would be a huge business.” Well, no one gets 5% of the Chinese market. You succeed or you fail; it doesn’t work that way. Same thing with this 10% of the global labor market. But more fundamentally—and this is more your bailiwick than mine—a $35 trillion market into which AI makes huge incursions is no longer a $35 trillion market. It’s a massive deflationary force. You may have 10% of something, but I have no idea what that something is anymore.
So the idea that you can predictably say “I will continue to pay as much for labor when it’s done this way versus that way” just seems naive at best, inept or really self-serving at worst. The same goes for all of these attempts to come up with a defensible number, whether top-down or bottom-up, where people say, “well, what if 5 billion people worldwide are all paying $100 a month for some kind of large language model subscription? Then we’re making enough back.” That’s not the way it’s going to happen! It’s an incredibly naive way of thinking about how this will play out. It’s more likely it’s just running for free on my phone and I don’t even notice. I’m not going to be paying for it at all.
Krugman: There are not 5 billion people in the world who can afford $100 a month.
Kedrosky: No, of course. It’s just a staggering misinterpretation. So both ways of thinking about it really don’t make a lot of sense. You fall into this—and I use this expression all the time—faith-based argumentation: “it has worked out before.” This is what everyone said during the fiber bubble, or the dot-com bubble, or pick your favorite moment of technological change: “these things always work themselves out.” I find that a really patronizing approach to the problem, because the scale of the spending is now at a sovereign level; the amounts of debt being raised by companies like Oracle rival a mid-sized European power’s annual sovereign debt raising. These are non-trivial numbers, and it’s even rippling through to places like Taiwan, where TSMC is now something like 15% of Taiwan’s GDP. Every other sector in the country is struggling, not least because of technology but also because of tariffs. So we’re creating new fragilities in all kinds of places as we merrily extrapolate our way along on the basis of this debt-fueled spending.
Krugman: Of course, there’s always the possibility of other players, other approaches. I mean, it’s a little bit like last year, when the Danish economy was all about Novo Nordisk, and it turned out other people could produce weight loss drugs, too.
Kedrosky: The analogy is spot on, because at peak Novo Nordisk was something like 14% of Danish GDP. In a weird sort of way, TSMC now holds the same role with respect to Taiwan, and faces the same fragility risks, because LLMs—large language models, the basis of much of the current excitement—are at a kind of natural architectural dead end, for some of the reasons we’ve been talking about. So the idea that it’s going to continue, that we can project forward in the same way and extract the same gains from the same kinds of spending, is just incredibly unrealistic. That’s one of the reasons you’re seeing people increasingly look at other approaches. In all likelihood, none of them will lead to anything like AGI, but it doesn’t really matter. The point is that it’s a demonstration of the extractive exhaustion of what we’ve currently done.
Krugman: There is this talk—in the mostly uninformed circles that I run in—about smaller models trained on a more limited base, the Chinese approach, that are much cheaper to run, and about how that would be a huge blow to these companies if it turns out to be right.
Kedrosky: Absolutely. So you have these small and micro models that are much cheaper to train. DeepSeek was loosely an example last year of this idea of a much less expensive method for training models, and we saw it recently with Moonshot’s Kimi model, which just came out—another of these Chinese models. In a sense they’re a different approach to the same problem. They’re not a new architecture—these are still large language models—they’re just at a much smaller scale, in the time required to train them and the cost required to use them. So they’re really important, but they’re even more important if you think forward. If I’m right that the amount of training we do in future has to decline, because of the natural architectural limits of large language models, then the economics are dictated by inference—by these models responding to requests.
But most inference is not you and me. This is a mistake we make all the time: people think that we are the story. Most of the global inference load from consumers—from you and me and others—could be satisfied by a single data center in Northern Virginia. That’s how small a fraction of the total load we are with respect to inference worldwide.
So 60%, let’s say, is training. We’re maybe 5 or 6% of the total workload of data centers. A huge chunk of the bit in the middle is software itself—coding—which turns out to be a hugely profligate use of tokens. So what you’re forced to ask as you project forward is: is everyone on earth going to be writing software using Copilot or Cursor or any of these tools? That seems unrealistic. So where is the balance going to come from with respect to the increased usage of these models? And at the same time you have the incursion of these small models, which are going to eat up even more of it at the margin. So it’s very difficult to see how the current extrapolated model of the workloads at these data centers makes any sense.
Krugman: It’s amazing. One of my pastimes now is watching old tech ads from the 90s. The ads were a lot better, by the way. I don’t know why the 90s were so much more fun than this decade. The old Qwest ads about all the wonders of fiber optics—it all came true, except not for Qwest.
Kedrosky: Right, which is sort of the perennial problem here: you turn out to be the pioneer with the arrows in your back. But yeah, I think that’s a big part of it. The other thing is that what’s really unusual about this bubble—or this moment, I’ll say—and what confuses people a lot, is that historically the U.S. has been very good at speculative bubbles. This is one of our core competencies. They tend to be about real estate, or about technology, or about loose credit, and sometimes they even have a government role, via some kind of perverse incentive that was created. This is the first bubble to have all four. We’ve got a real estate component, we have a loose credit component, we have a technology component, and we have a huge government component, because we’re told we’re in an existential crisis with China and we must win at any cost. All of those forces together mean that people look at it through four different silos and lenses, rather than saying, as in the global financial crisis, “it’s about real estate and credit,” or in telecom, “it’s about technology and, loosely, some credit.” This is the first one where you end up with the rational-bubble theory of all of this, where everyone feels like they’re doing something rational, yet in aggregate all of these different people, looking at the problem through their own lenses, are profligate contributors to it, because it’s the first bubble that combines all of the forces that historically have made some of the largest bubbles in U.S. history.
Krugman: Oh, joy. (laughs) Sorry. Well, it does have a familiar feeling. I think the bursting of the housing bubble played an important role in my life because it made it possible for me to afford my New York apartment. And, of course, I was paying a lot of attention—though I had no financial stake—to the tech bubble of the 90s. But now we have the sum of all these things...
Kedrosky: The sum of all bubbles.
Krugman: Wow. Let me just ask—we’re running a little long, but one of the interesting posts you had recently was about economic geography and location, which is one of my things: San Francisco is having a revival. Do you want to talk a little bit about which places are affected?
Kedrosky: This is probably one of the narrowest moments for risk capital in the last 30 years, in the sense that the money is either going to one thing or it’s going to nothing. Venture, secondary credit, growth capital—it’s all going into AI, which is having an impact on the centers most prone to having companies doing this kind of work.
San Francisco is a good example, where we’ve gone from a relative commercial real estate glut as recently as four years ago back to historical norms now, and probably by this time next year at the latest we’ll be well below the levels we saw even 10-15 years ago, entirely driven by this influx of capital around a single sector. So the narrowness is one thing, but the scale of the money flowing in is another, to the point that it’s actually distorting. It’s doing the same thing in New York, and to a lesser extent in other centers. But it’s narrow geographically and it’s narrow sectorally, which is really unusual.
I think the flip side of that, and the point I always make, is that when all of this capital is flowing to a single thing, it also means that it’s not flowing somewhere else. I think that’s incredibly important to understand. I gave the Taiwan example earlier: if you’re in AI or semiconductor manufacturing in Taiwan, you’re awash in capital; if you’re a manufacturer of literally anything else, you cannot get a loan. The same thing is true in the U.S., where if you’re an early-stage or mid-stage company looking for growth capital for almost anything without an AI component, you’re out of luck, my friend.
This notion of starving not just manufacturers but growth companies of capital, because of the narrowness of the spending, almost always has historical consequences. We saw this in the 90s, with the rise of China roughly coincident with the telecom bubble, when U.S. manufacturers were increasingly starved of capital because it was all flowing sectorally to telecom. We’re seeing the same thing now. It will play out over the next few years, but it’s dramatic right now.
Krugman: It sounds bizarre unless you know the history, but in international economics it’s “the Dutch disease.” There was a famous period when, after the Netherlands discovered natural gas, it really killed their manufacturing sector.
Kedrosky: Exactly. I make the same analogy. I think that’s exactly what’s going on, and it plays out in insidious ways. Say you believe the tariff policy was going to be effective in bringing manufacturing back onshore: imagine you’re a capital-intensive manufacturer trying to onshore and you’re not in the semiconductor sector. How difficult is it to raise capital right now? It’s virtually impossible—much more difficult than it would be absent the AI spending bubble, because of this tsunami of cash flowing into a single sector. So even if you believe that policy was likely to be effective, the struggle to get any capital is dramatic because of this phenomenon. And yet, if you don’t talk about it and understand it, you’ll think, “oh, well, what we need is probably higher tariffs. We need to encourage people even more to come, otherwise we won’t have enough manufacturers manufacturing domestically.”
Krugman: It’s just this feeling that—monstrous sums of money, monstrous egos, where does all of this end up?
I have to say, there’s one humble sector that I happen to know is prospering amid all of this, which is the two remaining companies that produce blue books for college exams.
Kedrosky: Oh, yeah.
Krugman: They’re having a revival because we’re going back to handwritten exams.
Kedrosky: You know what? That doesn’t surprise me. I should have thought of that, but I bet that’s exactly right.
Krugman: The problem is the young people don’t know how to write anymore. They literally don’t know cursive. Anyway—how this thing works is so important, and people like me are thoroughly unequipped. So thank you for helping me a little bit on that front.
Kedrosky: That was great. It was great chatting.