Do human differences persist and scale when decisions are delegated to AI agents? We study an experimental marketplace in which individuals author instructions for buyer- and seller-side agents that negotiate on their behalf. We compare these AI agentic interactions to standard human-to-human negotiations in the same setting. First, contrary to predictions of more homogeneous outcomes, agentic interactions lead to, if anything, greater dispersion in outcomes compared to human-mediated interactions. Second, crossing agents across counterparties reveals systematic dispersion in outcomes that tracks the identity and characteristics of the human creators; who designs the agent matters as much as, and often more than, shared information or code. Canonical behavioral frictions reappear in agentic form: personality traits shape agent behavior, and selection on principal characteristics yields sorting. Despite AI agents not having access to the human principal’s characteristics, demographics such as gender and personality variables have substantial explanatory power for outcomes, in ways that are sometimes reversed from human-to-human interactions. Moreover, we uncover significant variation in “machine fluency” (the ability to instruct an AI agent to effectively align with one’s objective function) that is predicted by principals’ individual types, suggesting a new source of heterogeneity and inequality in economic outcomes. These results indicate that the agentic economy inherits, transforms, and may even amplify human heterogeneity. Finally, we highlight a new type of information asymmetry in principal-agent relationships and the potential for specification hazard, and discuss broader implications for welfare, inequality, and market power in economies increasingly transacted through machines shaped by human intent.
Here is the full paper by Alex Imas, Kevin Lee, and Sanjog Misra. Here is a thread on the paper.
The post Agentic interactions appeared first on Marginal REVOLUTION.
Note: Mortgage rates are from MortgageNewsDaily.com and are for top tier scenarios.
“You really are catching me at the end of things” — Nina Klymowska
I imagine that if you like this newsletter, you also love poking around other people’s studios: https://www.youtube.com/@joshuacharow/videos
Dithering and chunky pixels are nostalgia for old digital screens.
Halftone patterns and dots are nostalgia for old print.
Feels like a fairly uncontroversial statement.
And I’m all about the print, so halftones for me, but, ugh, I backed myself into a corner.
I have a box full of POSCA acrylic paint pens. I bought them ages ago thinking they’d be great for drawing on black paper with the pen plotter, I was so wrong. Let’s take a closer look at the pens…
…yeah, you need to shake them before use, fine. Then “pump” them several times to get the ink to flow, also fine; but you have to keep doing that repeatedly throughout the plot or else they dry up. Which is tricky when they’re strapped into the machine.
However, I’ve been getting the ArtFrame to use rubber stamps, and ‘cause it uses GCODE, you can use that GCODE to SLAM whatever it’s holding down as fast as possible.
Which got me thinking: why not write the code so the machine draws some lines, and then after a certain distance moves the pen onto some sacrificial paper in a corner somewhere, z-indexes the fuck out of it, repeatedly SLAMMING it up and down several times, before going back to drawing.
And as we saw last newsletter with the dots, I ended up deciding the whole thing should just be dots/pumping the pen, so it’s always primed with paint/ink.
That’s one half of the halftone equation done.

Basically you can create the appearance of shade by varying the size of dots on a grid, shown above on a hexgrid. Below, mixing CMYK dots at various sizes to make different colours.
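For reference, the usual way to size halftone dots (not something we can do here, given the fixed 2mm nibs) is to make the dot’s area, not its radius, track ink coverage. A minimal sketch; the function name and the square-cell assumption are mine, not from the newsletter:

```python
import math

def halftone_radius(gray, cell=2.0):
    """Dot radius for classic halftoning on a square cell of side `cell`.
    gray runs 0.0 (black) to 1.0 (white). Ink coverage should scale
    linearly with darkness, and coverage is dot AREA, so the radius
    grows with the square root of (1 - gray)."""
    coverage = 1.0 - gray  # fraction of the cell to fill with ink
    return math.sqrt(coverage * cell * cell / math.pi)
```

Pure white gives no dot at all, and mid-grey gives a radius of roughly 0.8 on a 2-unit cell, so the dot sizes fall off gently rather than linearly.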
That’s fine, but I have pens, those pens have nibs, and those nibs make 2mm dots.
THUD, THUD, THUD, DOT, DOT, DOT.
2mm, no more, no less. I either have a dot or no dot.
So I can’t halftone with different sized dots, but I can dither. But I don’t want a chunky pixel square-grid dither, ‘cause that’s for screens and retro games. I want print, so I want to use a hexgrid. Now to escape the corner I’ve painted myself into.
Here’s an excellent visual primer on dithering: https://visualrambling.space/dithering-part-1/ (thanks )
The only technical article you need on dithering: https://surma.dev/things/ditherpunk
Maths and numbers: https://tannerhelland.com/2012/12/28/dithering-eleven-algorithms-source-code.html
How Return of the Obra Dinn did it: https://forums.tigsource.com/index.php?topic=40832.msg1363742#msg1363742
And lastly, dithering in colour: https://obrhubr.org/dithering-in-colour
That covers the pixels; now, dots.
The trick to dithering is knowing which pixels or dots to turn on and off. The links above show there’s several ways of doing this, but I like to keep things simple, so here’s a grid of 12 dots…
We’ll get onto why they’re numbered like that in just a moment.
First thing to note though is we can repeat this group of 12 dots over and over to fill in the whole page; I’ve coloured them to make it easier to see.
So what happens if we turn off dot number 11? We get this…
We’ll make it easier to see the whole pattern by turning all the dots to black, and switching half of them off.
The trick to shading is pretty simple. We take an image, turn it greyscale, and then at any particular point work out what the grey value is on a scale of 0-12.
12 is light (pure white), and 0 is dark (pure black).
Then we turn off any dots under that value. So if something is pure white (12), we turn off all the dots that are less than 12, i.e. all of them. If it’s black we turn off all the dots under 0; none of them.
If it’s light grey, say 9, we turn off all the dots under 9, leaving just the 9, 10 & 11 dots.
If it’s dark grey, 3, we turn off all the dots under 3, keeping all the dots numbered 3 to 11.
The dots are numbered in such a way that the pattern remains “interesting” as you remove dots. There’s better ways (the 10th and 11th dots are too closely aligned for my liking), but this is quick & dirty and good enough, shown below is a gradient of all 13 “shades”. Note: the “grid” is rotated about 40° counter-clockwise here.
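In code, this kind of ordered dithering is just a threshold-map lookup. A minimal sketch, using a made-up 3x4 rectangular stand-in for the 12-dot group; the numbering below is illustrative, not the newsletter’s actual hex-grid ordering:

```python
# A 12-entry threshold map (each number 0-11 appears once).
# This layout is invented for the example, not the grid from the post.
THRESHOLDS = [
    [0, 8, 2, 10],
    [4, 11, 6, 9],
    [1, 5, 3, 7],
]

def dither(gray):
    """gray: 2D grid of values 0-12 (0 = pure black, 12 = pure white).
    Returns a same-sized grid of booleans: True = draw a dot.
    A dot is 'turned off' when its threshold number is under the local
    grey value, exactly as described above."""
    out = []
    for y, row in enumerate(gray):
        out.append([THRESHOLDS[y % 3][x % 4] >= g for x, g in enumerate(row)])
    return out
```

Pure white (12) switches every dot off, pure black (0) keeps all twelve, and a grey of 9 leaves only the dots numbered 9, 10 and 11, matching the worked examples above.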
So now instead of getting a halftone effect by changing the size of the dot, we can use hex-dithering instead. If we take an original image and split it into separate CMYK channels, convert each channel to greyscale, then plot using a different rotation angle for each layer, we get something like this (I threw the black channel away)…
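The channel split can be sketched in pure Python using the standard naive RGB-to-CMYK formula (real separation pipelines do fancier black generation). The screen angles listed are the conventional print ones, not something the newsletter specifies:

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0-255) to CMYK (0.0-1.0) conversion with simple
    grey-component replacement: pull the shared grey out into K."""
    if (r, g, b) == (0, 0, 0):
        return 0.0, 0.0, 0.0, 1.0
    c, m, y = 1 - r / 255, 1 - g / 255, 1 - b / 255
    k = min(c, m, y)
    return (c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k

# Conventional halftone screen angles (degrees), chosen to keep the
# four layers from moiré-ing into each other.
SCREEN_ANGLES = {"C": 15, "M": 75, "Y": 0, "K": 45}
```

Each channel then becomes its own greyscale layer, dithered and plotted at its own rotation angle.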
While writing this newsletter I found this set of posts about applying error diffusion to a hex grid: https://loomsci.wordpress.com/tag/error-diffusion/ - which is probably the closest to this, as they’re using hexagon “pixels” of a set size, so they can only vary colour (and in some experiments shade).
This paper also uses error diffusion on a hexagonal grid.
I can see this was posted back in 2019, and it at least has the tag ‘penplot’ and mentions pens and dots, but I don’t see any actual pen plots done with it.
So I’m going to go out on a limb here and suggest this is the world’s first hexagonal order dithering with a threshold map of 12 levels, multi-coloured pen plot. It doesn’t take much to be the world’s first at something, you just need to narrow the criteria down a lot 😁
📬 These postcards will be going out to Patreon members at the postcard level and above next Wednesday, just saying.
I’ve run out of bytes, time to go!
Next newsletter will be Thursday the 25th of December, WTF?!
Love you all
Dan
🧡
NASA has lost contact with one of its three spacecraft orbiting Mars, the agency announced Tuesday. Meanwhile, a second Mars orbiter is perilously close to running out of fuel, and the third mission is running well past its warranty.
Ground teams last heard from the Mars Atmosphere and Volatile Evolution, or MAVEN, spacecraft on Saturday, December 6. “Telemetry from MAVEN showed all subsystems working normally before it orbited behind the red planet,” NASA said in a short statement. “After the spacecraft emerged from behind Mars, NASA’s Deep Space Network did not observe a signal.”
NASA said mission controllers are “investigating the anomaly to address the situation. More information will be shared once it becomes available.”
Up, it being a great frost upon the snow, and we sat all the morning upon Mr. Creed’s accounts, wherein I did him some service and some disservice. At noon he dined with me, and we sat all the afternoon together, discoursing of ways to get money, which I am now giving myself wholly up to, and in the evening he went away and I to my office, concluding all matters concerning our great letter so long in doing to my Lord Treasurer, till almost one in the morning, and then home with my mind much eased, and so to bed.
Links for you. Science:
An Aeromonas variant that produces aerolysin promotes susceptibility to ulcerative colitis
Denmark close to wiping out leading cancer-causing HPV strains after vaccine roll-out
Headless bodies hint at why Europe’s first farmers vanished
All About the Tylenol-Autism Brouhaha
In Yellowstone, Migratory Bison Reawaken a Landscape
US State Restrictions and Excess COVID-19 Pandemic Deaths
Other:
The Sweet Embellishments of the Glucose Goddess. Does Jessie Inchauspé medicalize normalcy?
If The Atlantic Wishes to Honestly Understand the Origins of MAHA, They Need to Investigate The Atlantic.
What the Trump Administration, RFK Jr., and the MAHA Report Got Wrong About Improving Children’s Health
A Terrible and Avoidable Tragedy in D.C.
The lost congressman: What happened to Jeremiah Haralson?
Automated Traffic Enforcement Is More Popular Than You Think
How the Richest People in America Avoid Paying Taxes
Trump says he wants to ‘permanently pause’ migration to the US from poorer countries (the entire statement is just Nazi crap)
The Fear Taking Hold Among Indiana Republicans: “I’d rather my house not get firebombed.”
D.C. Shooting Suspect ‘Could Not Tolerate’ the Violence of His C.I.A.-Backed Unit in Afghanistan, a Childhood Friend Said
Kennedy Katch and Kill Kontinued
A trillion dollars is a terrible thing to waste. The machine learning community is finally waking up to the madness, but the detour of the last few years has been costly.
Why voters may not buy Trump’s messaging on food prices
Trump-Backed Crypto Company Promotes ‘Shit Piss Skin Can’ Coin
Americans are buckling under medical bills. It could get worse.
The Same Stream Twice
Border Patrol chief Gregory Bovino’s ‘word means nothing,’ protester says after assault case is dismissed
Getting Ready to Party Like It’s 2008
Trump’s Homelessness Crackdown Has Been Tried Before. It Didn’t Work.
Trump’s DC Occupation Costs 4 Times More Than It Would Take to House City’s Entire Homeless Population
A Proposed 5-Step YIMBY Playbook to Fix New York’s Housing Crisis
Maga’s boss class think they are immune to American carnage
Don’t Believe What AI Told You I Said
Why Republicans Are Terrified of Nonexistent Crime
A Real Place Deserves Real Rights
How the New Atheists Joined the MAHA War on Science
Nobody Cares That White Supremacists Are Calling the Shots Now
Andrea Long Chu on Thomas Chatterton Williams’s New Book
We once loved pigeons. We might not remember that, but they do
The American Car Industry Can’t Go On Like This. Ford is taking drastic steps to compete with China’s cheap EVs. Even that might not be enough.
Calls for a Kennedy impeachment: it’s not just for bloggers anymore! Democratic Rep. Haley Stevens* (MI) has filed articles of impeachment against HHS Secretary and Plaguelord Kennedy. Is she doing this to help her Senate bid (she’s running for Senate against two other Democrats)? Probably! But who cares?
This should be the mainstream position of the Democratic Party; it’s not really a reach for them, as every single senator, even that fucker Fetterman, voted against his confirmation. And Democrats should get Republicans on the record as to whether Republicans support harming children out of ignorance and arrogance (aka ‘MAHA’). At least make someone pay a small political price for endangering America’s children.
Of course, the NYT, because they assigned this to a political reporter, played inside baseball with this, rather than assessing the validity of the argument for impeaching Kennedy. But babysteps for everyone involved I guess…
*Stevens’ statement/argument isn’t especially clear (this really doesn’t seem to be in her wheelhouse, but then again, most congresscritters know so little about biology, I’m not sure they really understand where babies come from). She really needs to lay it on thick about the childhood vaccines.
Cotality ... today released the Homeowner Equity Report (HER) for the third quarter of 2025. The report reveals a mixed picture of homeowner equity gains across the United States.
Borrower equity decreased year over year, declining by $373.8 billion or 2.1%. That decline translates to overall net equity of $17.1 trillion for homes with a mortgage. Homeowner equity peaked at close to $17.7 trillion in the second quarter of 2024 and has since oscillated between $17 trillion and $17.6 trillion.
"As the pace of home price growth slows and markets recalibrate from pandemic peaks, we’re seeing a clear shift in equity trends,” said Cotality Chief Economist Dr. Selma Hepp. “Negative equity is on the rise, driven in part by affordability challenges that have led many first-time and lower-income buyers to over-leverage through piggyback loans or minimal down payments. While overall home equity remains elevated, recent purchasers with smaller down payments may now face negative equity.”
...
While the share of homeowners in negative equity fell in the second quarter of this year, it ticked up again in the third quarter. In the current quarter, 2.2% of mortgaged homeowners, or 1.2 million properties, have negative equity. Another way to think about it: there has been a 21% year-over-year rise in the number of homeowners in negative equity, with 216,000 more homes falling into the category in the third quarter, a trend that has been gaining steam and signals possible market difficulties ahead.
Compared to the second quarter, there has been a 6.7% increase in the number of mortgaged residential properties sitting in negative equity. This slide in equity tracks with market cycles as the spring homebuying season faded into the slower fall market, during which period there’s a more consistent weakness in home price gains across markets.
This graph compares the distribution of equity (and negative equity) in Q3 vs. Q2.

I have long maintained that smart contracts are a dumb idea: that a human process is actually a security feature.
Here’s some interesting research on training AIs to automatically exploit smart contracts:
AI models are increasingly good at cyber tasks, as we’ve written about before. But what is the economic impact of these capabilities? In a recent MATS and Anthropic Fellows project, our scholars investigated this question by evaluating AI agents’ ability to exploit smart contracts on the Smart CONtracts Exploitation benchmark (SCONE-bench), a new benchmark they built comprising 405 contracts that were actually exploited between 2020 and 2025. On contracts exploited after the latest knowledge cutoffs (June 2025 for Opus 4.5 and March 2025 for other models), Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 developed exploits collectively worth $4.6 million, establishing a concrete lower bound for the economic harm these capabilities could enable. Going beyond retrospective analysis, we evaluated both Sonnet 4.5 and GPT-5 in simulation against 2,849 recently deployed contracts without any known vulnerabilities. Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694, with GPT-5 doing so at an API cost of $3,476. This demonstrates as a proof of concept that profitable, real-world autonomous exploitation is technically feasible, a finding that underscores the need for proactive adoption of AI for defense.
In June 2023, I wrote: Could 6% to 7% 30-Year Mortgage Rates be the "New Normal"? There is much more in the article.
At that time, the Fed Funds rate was set at 5 to 5-1/4 percent and the Ten Year Treasury was yielding 3-3/4%. I noted in 2023: “the 10-year yield would likely increase even as the Fed lowers the Fed Funds rate.”
And that is what happened. The 10-year is yielding 4-1/4% this morning. This is a key point. Just because the FOMC is cutting rates, doesn’t necessarily mean long rates will follow.
Note: For a discussion of the R* and the neutral rate, see housing economist Tom Lawler's post on Tuesday: “[I]f, as expected, the FOMC decides to cut its federal funds rate target by 25 bp tomorrow, then the resulting level of the federal funds rate will be very close to the neutral nominal policy rate.”

The following graph is from Mortgage News Daily and shows the 30-year mortgage rate since 2000. Rates were in the 5.5% to 6.5% range prior to the housing bust and financial crisis. Then rates were in the 3.5% to 5% range for over a decade prior to the pandemic. Currently rates are at 6.30% for 30-year mortgage rates.
We’ve spent a lot of time examining the problem of construction productivity in the US — the fact that, across a variety of different metrics, construction never seems to get any more efficient (in terms of how much output you get for a given amount of input), or any cheaper. For instance, a paper by Goolsbee and Syverson that I wrote about, titled “The Strange and Awful Path of Productivity in the US Construction Sector,” looked at a variety of different productivity metrics and found that they all show either flat or declining productivity. By contrast, other sectors (such as manufacturing), as well as the economy overall, tend to show increasing productivity.
Much of our investigation has been focused specifically on the issues of construction productivity in the US. But it’s also worth looking at construction productivity trends in other countries — if other countries are showing steadily improving construction productivity, that may give us ideas for ways to improve US construction productivity. If they’re not improving, by contrast, that suggests that US-specific things (such as various regulations) aren’t what’s holding American construction productivity back.
To look at international construction productivity, we can use KLEMS databases, which aggregate productivity statistics for different industries in countries around the world. (KLEMS stands for capital (K), labor (L), energy (E), materials (M), and services (S).) These KLEMS datasets are a bit scattered and not amazingly well-maintained (I had to retrieve a lot of the data from archive.org), but by pulling them together we can assemble construction productivity datasets for dozens of different countries going back quite far:
The EU KLEMS dataset has productivity data for European nations, as well as a smattering of other countries. The current EU KLEMS release goes from 1995 to 2021, and in addition to European countries also includes the US, the UK, and Japan. Older EU KLEMS releases (I used the 2008 release) go all the way back to 1970, and in addition to the US, UK, and Japan, also include Korea, Canada, and Australia.
Asia KLEMS has productivity data for Korea, Japan, Taiwan, and India, going from 1980 to 2012.
LA KLEMS has productivity data for several Latin American countries, going from 1990 to around 2019.
World KLEMS, in addition to links to the above datasets, also has links to Canada, Russia, and China KLEMS data.
To calculate productivity using this data — specifically, labor productivity, or the amount of output we get for a specific amount of labor — we can use the “chain linked gross value add” measure, VA_Q or VA_QI in the database. Gross value-add is the value of the outputs (in this case, the buildings and infrastructure produced) minus the value of “intermediate inputs” — materials, services, energy, and other things purchased from outside the sector in question. In other words, it’s the total value that the industry itself contributes. “Chain linked” is a way of adjusting for inflation, by calculating the growth rate for one year using the previous year’s prices, then “chaining” those growth rates together. To get sector productivity, we just divide chain linked gross value-add by a measure of total labor effort in that sector. For that labor effort variable, we’ll use H_EMP, which is the total number of hours worked by “engaged persons” — employees, business owners, and people who are self-employed.
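To make the “chain linked” idea concrete, here is a toy sketch (made-up quantities and prices, not KLEMS data): each year’s growth is valued at the previous year’s prices, and the growth factors are multiplied together, so pure price inflation drops out.

```python
def chain_linked_volume_index(quantities, prices):
    """quantities[t] and prices[t] are per-good lists for year t.
    Year-t growth is computed at year t-1 prices, then the growth
    factors are chained from a base of 100."""
    index = [100.0]
    for t in range(1, len(quantities)):
        prev_prices = prices[t - 1]
        cur = sum(q * p for q, p in zip(quantities[t], prev_prices))
        prev = sum(q * p for q, p in zip(quantities[t - 1], prev_prices))
        index.append(index[-1] * cur / prev)
    return index
```

Doubling a good’s price between two years does not move the index at all, since both years are valued at the same (previous year’s) prices; only the quantity change registers.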
For a few countries, we’ll need to calculate labor productivity slightly differently. India’s KLEMS data doesn’t include H_EMP, so we’ll use the number of employees instead. China’s KLEMS data doesn’t include VA_Q, but it does include the growth rate of labor productivity by industry, which provides the same information.
Putting all this together, the below chart shows construction labor productivity by country, for 45 different countries. Countries are color-coded by region, and labor productivity has been normalized to equal 100 in the first year for which there’s data.
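The per-country calculation just described can be sketched on a tiny made-up KLEMS-shaped table (the column names VA_Q and H_EMP follow the post; the numbers are invented, not real KLEMS values):

```python
rows = [
    # (country, year, VA_Q, H_EMP): chain-linked gross value added and
    # total hours worked by engaged persons. Invented numbers.
    ("US", 1970, 100.0, 10.0),
    ("US", 1971, 101.0, 10.4),
    ("BE", 1970, 50.0, 4.0),
    ("BE", 1971, 53.0, 4.1),
]

def productivity_index(rows):
    """Labor productivity = VA_Q / H_EMP, normalized to 100 in each
    country's first year of data, as in the chart described above."""
    out, base = {}, {}
    for country, year, va, hours in sorted(rows, key=lambda r: (r[0], r[1])):
        p = va / hours
        base.setdefault(country, p)  # first year becomes the base
        out[(country, year)] = 100 * p / base[country]
    return out
```

In this toy data, US output grows more slowly than hours worked, so its index falls below 100, while the Belgian index rises above it.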
Since this can be a little hard to parse, here’s a smaller graph showing productivity trends in 16 major countries. This figure also shows labor productivity in manufacturing (solid gray line), and across all industries (dotted gray line).
Let’s start by looking at the US. Per KLEMS data, US construction labor productivity declined from 1970 onward, with the decline flattening out around the mid-1990s. This exact pattern is a bit different from what other measures of US construction labor productivity show — Goolsbee and Syverson show productivity declining up through 2020, whereas Teicholz shows labor productivity that’s closer to flat since the 1960s — but it’s broadly consistent with them.
If we look at other countries, we can see that for the period from 1970 through the early 1990s, the US is an outlier in having declining construction productivity. From 1970 through 1995, US construction productivity declined by about 1.9% per year on average. For other countries where data goes back to the 70s (which is most of Western Europe, Japan, Korea, Australia, and Canada), only one other country shows an average decline over that same period (Greece), and its decline is much smaller. Most other countries show improving construction productivity of around 1-2% per year.
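An “average annual decline” like that 1.9% figure is a compound rate. A quick sketch (the 61.9 endpoint below is illustrative, implied by the rate, not a quoted KLEMS value):

```python
def avg_annual_growth(start, end, years):
    """Compound average growth rate between two index levels."""
    return (end / start) ** (1 / years) - 1

# An index falling from 100 to about 61.9 over the 25 years from 1970
# to 1995 corresponds to roughly a -1.9% average annual change.
rate = avg_annual_growth(100.0, 61.9, 25)
```

Compounding is why a seemingly small annual decline erodes nearly 40% of the index level over a quarter century.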
A 1-2% improvement in construction productivity isn’t amazing — as we can see in the figure above, it’s often much less than the rate of improvement in manufacturing, or in the economy overall — but it’s nevertheless a positive trend.
Since roughly the mid-1990s, however, the trends look somewhat different. The US arrested its construction productivity declines, and the industry’s productivity levels have stayed more or less flat. And many countries that previously showed positive construction labor productivity growth — Germany, France, Italy, Spain, Austria, Japan, the UK — have since had flat or negative productivity growth. Other countries — the Netherlands, Belgium, Denmark — have maintained positive productivity growth, but at lower rates than previously.
Starting in the 1990s (and earlier for some countries), we also have productivity data on a whole new swath of Asian, Eastern European, and South American countries. Many (but by no means all) of these countries — such as Latvia, Lithuania, Costa Rica, Peru — have shown steadily improving construction productivity over time. Typically it’s smaller and/or less wealthy countries that have shown these improvements.
Some of these countries (such as Taiwan) have shown a similar pattern as Western European countries, with construction labor productivity that improves for a while but then flattens out. The data for Taiwan and Korea (which both show this pattern) stops in 2012; I’d be interested to see what the trends since then have been.
Interestingly, despite the enormous amount of effort it put into infrastructure and building construction, and despite how repetitive much of it is, China does not appear to have particularly impressive growth in construction productivity. Productivity has risen, to be sure, but only at a rate of around 1.9% since 1987, and only around 1.4% since 1995, similar to the rates of growth seen in Western European countries in the 70s and 80s. Between 2007 and 2017, productivity growth is close to flat.1
Japan, the US’s former Asian rival, is similarly unimpressive, with nearly flat construction productivity since the 1970s.
Per this KLEMS data, the most impressive country in terms of productivity growth isn’t China, Japan, or Korea but Belgium, of all places. In the 1970s and 80s, Belgium had a construction productivity growth rate of greater than 3% annually, higher than anywhere else. And while most wealthy countries have had flat or marginal construction productivity improvements since the 1990s, Belgium has maintained a growth rate of around 1% per year. Belgium is also one of the only countries where the rate of productivity growth in construction is consistently similar to overall productivity growth.
It’s not clear to me if this is some sort of statistical or accounting artifact, or if Belgium has figured something out about construction that no one else seems to know. However, I’m somewhat inclined to think it’s the former. If you look at Belgium construction cost indexes, they seem to rise at roughly the same rate that US construction cost indexes do. And if you look at the actual cost builders are charging for new homes in Belgium, they appear to be slightly higher than US costs per unit area.
I’ve noted previously that in general I prefer cost as a measure of construction process improvement rather than productivity, and that I’m somewhat suspicious of these sorts of abstract measures of construction productivity. I’ve moderated this view somewhat, but it’s still true that care is required when trying to measure construction productivity, as it’s very easy for distortions to creep in. When I wrote about Goolsbee and Syverson’s paper on construction productivity, I noted that a very large fraction of the supposed decline in construction productivity is due to an unusually large deflator (used to adjust for inflation) used by the Bureau of Economic Analysis. Additionally, the United States’ Bureau of Labor Statistics sector-specific measures of construction productivity have shown implausibly large swings from year to year. The accounting required for sector-wide productivity estimates is just hard to do reliably.
An illustration of this is that a 2017 McKinsey report on construction productivity, which uses the exact same KLEMS database that we used above, comes up with substantially different estimates of construction productivity growth.
Per this graph, circa 2015 the UK had around 0.5% construction productivity growth since 1995, while per our calculations it would be slightly negative. Sweden and Germany similarly have positive construction productivity growth under this analysis, while per our calculations they were slightly negative. Both calculations show Belgium having positive productivity growth, but McKinsey has a growth rate that’s about twice as high as ours. And where we have a construction productivity growth rate for China of around 1.4-1.9% per year, McKinsey has it as closer to 7%.
What’s the source of this discrepancy? The chief culprit appears to be revisions in the KLEMS data over time. The McKinsey report used KLEMS data circa 2017, while our post-1995 analysis above uses KLEMS data from 2025. This matters because KLEMS updates don’t merely include data for more recent years; they also revise previous data. Here, for instance, is UK construction productivity across several iterations of KLEMS data:
And here are some revisions to the Swedish data over time:
Between 2019 and 2024, Swedish construction productivity was revised upward substantially — changing it from a steady decline to merely mostly flat — whereas UK construction productivity was revised downward substantially.
My broader point is that it’s tricky to use these high-level, sector-wide productivity estimates: slightly different analysis choices (which deflator, measure of labor input, or dataset you use) can result in pretty different conclusions, so they should be used and interpreted with care. In particular, it’s somewhat suspicious that there’s a big change in productivity trends right around the point where we switch from an older (EU KLEMS 2008) to a newer (EU KLEMS 2025) dataset (though for many countries there’s good overlap between the two datasets).
Looking at this international productivity data, here are my main takeaways:
Per KLEMS data, many countries have historically shown construction labor productivity increases. From the 1970s through the 1990s, this was the norm for most Western European countries, as well as Korea and Taiwan.
The US is unusual for having declining construction labor productivity during this period.
Since roughly the 1990s, some (but not all) Eastern European countries, some (but not all) Latin American countries, and China have seen substantial construction labor productivity increases. A few Western European countries — Ireland, Denmark, the Netherlands — have seen more modest increases since the mid-1990s. But most Western European countries, along with Japan and Korea, have had flat or declining labor productivity growth since the 1990s.
The recent US trend in labor productivity — it staying roughly flat — is fairly consistent with trends in other large, wealthy countries. This includes countries whose construction practices the US is often encouraged to emulate. Sweden, for instance, is often praised for its high adoption of prefabricated construction, but since the 1990s Sweden has had roughly flat construction labor productivity, even as it has made more extensive use of prefabrication. Japan similarly makes wider use of prefabrication than the US does, and Japan has proved willing to experiment with technologies like automated skyscraper construction, but has seen virtually no construction productivity improvement since the 1970s. China is held up as an example of an infrastructure-focused nation that the US should emulate, but its construction productivity gains, while positive, don’t appear to be particularly impressive.
The most impressive country for construction productivity is Belgium — it stands alone as a rich country that has had sustained, high levels of construction productivity growth. However, it’s not clear if these are real improvements or some sort of statistical or accounting artifact. Cost data does not appear to indicate that Belgium is building things massively more cheaply than elsewhere. (If you know anything about construction productivity in Belgium, email me.)
All this must be taken with a large grain of salt, because this sort of productivity accounting is hard to do accurately.
Overall, my take when looking at this data is that stagnant construction productivity is an extremely widespread problem. The trends we see in the US — flat or declining construction productivity — are also what we see in other large, wealthy countries over the past 30 years. Most countries that achieved construction productivity growth at one time haven’t maintained it. The countries that currently show improving productivity tend to be small (Ireland, Denmark, Estonia), poor (Colombia, Peru), or both. And construction productivity improvements in these countries tend to severely lag improvements in manufacturing, or what’s observed in the economy overall. Even sustained, large-scale building programs (such as China’s) or widely adopted factory-based construction (such as Sweden’s) don’t appear to have changed this.
This isn’t to say that there’s nothing the US has to learn from other countries. For one, it’s worth understanding what specifically was going on towards the end of the 20th century when so many countries had rising construction productivity, often at or above the levels of productivity improvements in their overall economies. It’s also worth investigating what, specifically, is going on in places like Belgium and Estonia; I’m not amazingly confident there will be lessons that are broadly applicable, but I’m not so pessimistic that I don’t think it’s worth probing further.
The above data also only considers changes in construction productivity, and doesn’t have anything to say about absolute levels, and it seems obvious to me that, in many cases, US construction practices lag behind European ones for things like transit construction. The US has a lot to learn from other countries for how to catch up to the efficient frontier, especially for certain types of construction (ie: transit) and in certain places in the country (ie: expensive coastal metros). But this data suggests that driving that frontier forward is a much thornier problem.
More charitably, there’s a productivity decline starting around 2007, likely as a result of the global financial crisis’ effect on construction, followed by an increase of around 2.2% per year.
1. Wage effects of El Salvadoran gangs.
2. Did industrial policy drive East Asian growth?
3. U.S. mass killings drop to a twenty-year low.
4. Vero and Jack Salmon with debt worries.
6. Mercatus Center fellowships now open for next year.
The post Thursday assorted links appeared first on Marginal REVOLUTION.
National vacancy rates in the third quarter 2025 were 7.1 percent for rental housing and 1.2 percent for homeowner housing. The rental vacancy rate was not statistically different from the rate in the third quarter 2024 (6.9 percent) and not statistically different from the rate in the second quarter 2025 (7.0 percent).
The homeowner vacancy rate of 1.2 percent was higher than the rate in the third quarter 2024 (1.0 percent) and higher than the rate in the second quarter 2025 (1.1 percent).
The homeownership rate of 65.3 percent was not statistically different from the rate in the third quarter 2024 (65.6 percent) and not statistically different than the rate in the second quarter 2025 (65.0 percent).
emphasis added
Click on graph for larger image.
The HVS homeowner vacancy increased to 1.2% in Q3 from 1.1% in Q2.
For the last few years, I’ve worried about Silicon Valley taking over Hollywood. But things reached a tipping point back in October.
Apple had just swept the Emmy awards—walking away with 22 tiny statues.
David Ellison (son of database billionaire Larry Ellison) was poised to take control of Paramount.
Disney share price was down 50% from its high, and increasingly looked like an acquisition target for some bored web tycoon.
Warner Bros. was already on the selling block, and would inevitably get taken over by the expanding technocracy.
I was alarmed back then. But that was October. In December, things look even worse.
A few weeks ago, Disney announced another miserable quarter—with profits from its entertainment business dropping 35%. Its margins are ugly, and there’s no clear plan for a turnaround in sight.
The company is so creatively drained that CEO Bob Iger actually wants users to generate their own Disney content. What’s next? Does he want us to build our own theme parks? Should I start my own troupe of Mouseketeers in the basement?
The company is looking for a new CEO—and the sooner, the better, if you ask me. But none of the likely candidates inspire much trust. So the company’s Matterhorn-sloped downward slide is likely to continue with accelerating speed.
I’m convinced that the House of Mouse will soon get swallowed up by a tech titan. I see Apple as a likely buyer, but Disney might also get acquired by Google, and bundled with its YouTube business.
That’s the hot strategy nowadays.
You buy a Hollywood movie studio, and turn it into a content farm for tiny screens. And then, in phase two, the people making the content get replaced by AI.
Let me remind you that Google’s market cap is now 20 times the value of Disney. They could buy out Disney with the spare change lost in the CEO’s couch cushions.
And Google already has a huge investment in AI it needs to justify. A captive movie studio would be just the ticket.
The AI situation has also gotten worse in recent weeks. The hottest new concept in film is the AI-generated movie star. We learned in November that forty (or more) of these digital cyborgs are already in development. The future plans for these constructs are still top secret. But demand is off the charts.
That threat is still in the future. So right now I’m fretting more about the fate of Warner Bros. Netflix has emerged as the likely acquirer—although Paramount still might take the prize.
Many people see this as a harmless deal. After all, Netflix is already in the movie business. So what harm can they do at Warner Bros?
“Digital apps are now bigger than Hollywood in every way—except the size of the screen.”
They can do plenty of harm—Netflix may be acquiring a Hollywood studio, but it isn’t really interested in supporting the existing movie ecosystem. What it really wants is:
Intellectual property
Brand franchises
Streaming content
That’s all folks, as they say at Warner Bros. Everything else is just excess baggage—jettisoned at the first available opportunity.
You will see the damage in your home town—where movie theaters are already struggling for survival. We’ve lost 5,700 movie screens since 2020, and ticket sales are still declining.
You can’t blame COVID anymore. But you can blame Netflix.
Back in April, Netflix’s CEO announced that movie theaters are “an outdated concept.” He said he wasn’t bothered by the disappearance of so many cinemas. He pointed out that most people can’t walk to a movie theater—but they can watch at home.
That’s his dream. He wants to replace moviegoers with couch potatoes. And the easiest way to do this is preventing theaters from accessing hit movies.
Not long ago that was called restraint of trade. But now it’s business as usual.
Netflix is still forced to show its movies in theaters for a few days—in order to qualify for Oscar consideration. But that rule will obviously change as streamers gain influence. Meanwhile Netflix makes sure that few people see these films on a big screen in a communal setting.

The company’s treatment of Guillermo del Toro’s new film Frankenstein is a case in point. The filmmaker believes that the best audience experience is in a movie theater, but Netflix only released his film for a few days in cinemas—without any marketing or promotion.
So Frankenstein only earned $129,000 at the box office in the US. That’s not per theater—that’s in total at every theater. The film cost $120 million to make. Under the old rules, studios would spend heavily on marketing to draw people to movie theaters in order to recoup their investment. But not anymore.
Netflix has a different plan. They just want you to pay your subscription bill each month. If you get too excited about the big screen movie experience, you might stop paying for Netflix—and that threatens their couch potato vision of the future.
But what about us? Do we lose that larger-than-life communal experience at the cinema? Do we all become stay-at-homes staring at tiny screens?
That’s what is at stake right now in Hollywood. The studios fumbled, and gave tech platforms an entry point into the film business. And now the digital apps are bigger than Hollywood in every way—except the size of the screen.
Our only hope is the rise of new indie operators. We need creative people who make great films outside the control of tech billionaires and AI slop factories. But is that even possible anymore?
Keep posted—because 2026 may be the most turbulent year in Hollywood since the rise of talking films a century ago. But it also might be the moment when a serious resistance movement strikes back.
For my part, I’m not willing to give up on big screens and independent filmmaking without a fight.
SpaceX is planning to raise tens of billions of dollars through an initial public offering next year, multiple outlets have reported, and Ars can confirm. This represents a major change in thinking from the world’s leading space company and its founder, Elon Musk.
The Wall Street Journal and The Information first reported about a possible IPO last Friday, and Bloomberg followed that up on Tuesday evening with a report suggesting the company would target a $1.5 trillion valuation. This would allow SpaceX to raise in excess of $30 billion.
This is an enormous amount of funding. The largest IPO in history occurred in 2019, when the state-owned Saudi Arabian oil company began public trading as Aramco and raised $29 billion. In terms of revenue, Aramco is a top-five company in the world.
With a key Russian launch pad out of service, NASA is accelerating the launch of two Cargo Dragon spaceships in order to ensure that astronauts on board the International Space Station have all the supplies they need next year.
According to the space agency’s internal schedule, the next Dragon supply mission, CRS-34, is moving forward one month from June 2026 to May. And the next Dragon supply mission after this, CRS-35, has been advanced three months from November to August.
A source indicated that the changing schedules are a “direct result” of a launch pad incident on Thanksgiving Day at the Russian spaceport in Baikonur, Kazakhstan.
Here are three kidney papers and proposals that I've noted recently, which will have implications for the growing interest in international kidney exchange on a global scale:
Klaassen MF., de Klerk M, Dor FJ.M.F., Heidt S, van de Laar SC., Minnee RC., van de Wetering J, Pengel LH.M. and de Weerd AE. (2025) Navigating a Quandary in Kidney Exchange Programs: A Review of Donor Travel versus Organ Shipment. Transpl. Int. 38:14804. doi: 10.3389/ti.2025.14804
Abstract: In multicenter kidney exchange programs (KEPs), either the explanted kidney must be shipped, or the donor must travel to the transplanting center. This review describes the available data on these two approaches and formulates recommendations for practice. We searched for studies addressing organ shipment or donor travel in KEPs. Data were categorized into four domains: cold ischemia time (CIT), logistics, donor/recipient perspectives and professional perspectives. From 547 articles screened, 105 were included. Kidneys are shipped in most countries. Prolonged CIT due to shipment may increase the risk of delayed graft function, but does not seem to impact graft survival. Planning the shipment requires a robust logistical framework with guaranteed operating room availability. Donor travel is reported to be both emotionally and financially distressing for donors and exposes them to inconsistencies in donor evaluation and counseling across centers. Reduced willingness to participate in KEP when travelling was reported by 36%–51% of donors. Professionals generally support offering organ shipment to donors not willing to travel. In conclusion, the decision between donor travel or organ shipment should be tailored to local circumstances. Healthcare professionals should prioritize minimizing barriers to KEP participation, either by facilitating organ shipment or reducing the burden of donor travel.
######
Neetika Garg, Joe Habbouche, Elisa J. Gordon, AnnMarie Liapakis, Michelle T. Jesse, Krista L. Lentine,
Practical and ethical considerations in kidney paired donation and emerging liver paired exchange,
American Journal of Transplantation,
Volume 25, Issue 11, 2025, Pages 2292-2302,
ISSN 1600-6135, https://doi.org/10.1016/j.ajt.2025.07.2459.
(https://www.sciencedirect.com/science/article/pii/S1600613525028382)
Abstract: Since the first kidney paired donation (KPD) transplant in the United States in 1999, the volume and scope of KPD has expanded substantially, accounting for nearly 20% of living donor kidney transplants in 2021-2022. This review article discusses the practical and ethical issues specific to paired donor exchange that patients, transplant centers, and exchange programs commonly encounter. Access to paired donor exchange and education of candidates regarding the potential benefits, risks, and logistics of KPD are important considerations. Transplant centers and patients must consider practical issues including wait times, allocation and matching strategies, assessment of organ quality, complex donors, cold ischemia time, and risks of broken chains. Protections available to donors from current KPD programs, the potential psychosocial effects, and the ethical concerns related to variable access and the proprietary nature of private exchange programs are also discussed. More detailed, timely data collection at a national level, and ability to merge national data with individual donor exchange registries will enable the analysis of the impact and outcomes of future trends in paired donation. KPD experience and key concepts may inform liver paired exchange, which has been used internationally to expand living donor liver transplantation and is emerging in the United States.
######
Alliance for Paired Kidney Donation (APKD) Launches Wish Upon a Donor: A Hope-Focused Advocacy Program Helping Kids Who Need Kidneys Find Living Donors
"TOLEDO, OHIO / ACCESS Newswire / December 9, 2025 / The Alliance for Paired Kidney Donation (APKD) is proud to announce Wish Upon a Donor, a groundbreaking program that amplifies the voices of families fighting for a better and brighter future for their child. While pediatric kidney patients cannot advocate for themselves, their parents can - and too often, they face this battle alone. Wish Upon a Donor helps families share their child's story, shining a light on their hopes, dreams, and urgent need for a living kidney donor.
...
"The onboarding process is fast and simple, taking just 10-15 minutes to complete, and finalized videos are sent to patients in just one to three days. Participation is free, and patients retain full control over how and where their stories are shared.
Wish Upon a Donor offers a range of support for families as they seek living donors, including:
Production of a personalized, high-quality video designed to reflect the patient's wishes, personality, and future - not just their disease
Dedicated campaign webpage to make it easy to convert interest into action
QR-coded postcards and magnets for sharing in local communities
Social media guidance to help families and supporters spread the word
Spanish- and English-language outreach materials for broader access
A living donor mentor to answer any non-medical questions about the process
"Wish buddy" volunteers to assist with video narration and/or sharing patient videos with a broader audience
When interest is generated through the Wish Upon a Donor campaign, APKD ensures both patients and transplant centers are effectively supported with guidance grounded in real-life experience from a dedicated living donor mentor. The organization manages all incoming donor inquiries, educates potential donors on the process, protections, and realities of living donations, and then refers qualified donors to an appropriate transplant center partner. APKD maintains communication and support throughout the evaluation and donation process. This approach empowers potential donors with education while easing the burden on transplant centers."
The U.S. Census Bureau and the U.S. Bureau of Economic Analysis announced today that the goods and services deficit was $52.8 billion in September, down $6.4 billion from $59.3 billion in August, revised.
September exports were $289.3 billion, $8.4 billion more than August exports. September imports were $342.1 billion, $1.9 billion more than August imports.
emphasis added
Click on graph for larger image.
The blue line is the total deficit, the black line is the petroleum deficit, and the red line is the trade deficit ex-petroleum products.

In the week ending December 6, the advance figure for seasonally adjusted initial claims was 236,000, an increase of 44,000 from the previous week's revised level. The previous week's level was revised up by 1,000 from 191,000 to 192,000. The 4-week moving average was 216,750, an increase of 2,000 from the previous week's unrevised average of 214,750.

The following graph shows the 4-week moving average of weekly claims since 1971.
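The 4-week moving average quoted above is simple arithmetic over the most recent four weekly figures. Here is a minimal Python sketch; only the 236,000 advance figure comes from the release, and the three earlier weekly values are invented for illustration, chosen so the average lands on the reported 216,750.

```python
def four_week_average(weekly_claims):
    """Return the mean of the most recent four weekly claims values."""
    assert len(weekly_claims) >= 4, "need at least four weeks of data"
    return sum(weekly_claims[-4:]) / 4

# Hypothetical earlier weeks, ending in the reported 236,000 advance figure.
weeks = [209_000, 216_000, 206_000, 236_000]
avg = four_week_average(weeks)  # 216,750.0, matching the reported average
```

Because the advance figure jumped to 236,000 while the three trailing weeks stayed lower, the smoothed average moves up only modestly, which is the point of the 4-week averaging.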
emphasis added
Click on graph for larger image.Substack hosts a semi-regular series of live debate events, where I had the pleasure of participating in a live debate with fellow substacker on the topic of “Should robots take our jobs?”. I officially won the debate, but in the end, I think Brian and I agreed on more than we disagreed on.
We agreed that it’s unlikely that robots actually do take all our jobs; in other words, we’re arguing about a fairly sci-fi future instead of a likely scenario. And our basic stance is that society needs to develop institutions to make sure that the wealth from automation is widely distributed throughout society. In the Industrial Age, those institutions were things like welfare states, taxes, unions, labor regulations, minimum wage laws, and so on. It’s not clear what new redistributive institutions would be necessary in an age of pervasive automation — Sovereign wealth funds? UBI? Resources set aside for human consumption? — but it seems likely that we would need some new ones.
Where Brian and I disagree is on the likelihood of these institutions being developed, and on the pain and suffering that will be required in order to build them. Brian thinks that as long as rich guys like Sam Altman are in charge of the development of AI, it will be hard to change society to “redistribute the robots”. I’m far more optimistic; I don’t think the Sam Altmans of the world will ultimately have that much power over our institutions.
It’s also a question of whether you take the long view or the short view in terms of how this all shakes out. In the case of the Industrial Revolution, it took centuries of social and political struggle to wrestle the new productive system into something egalitarian, and along the way there were some very horrific failures such as communism. Whole lifetimes and whole generations were swallowed up by the struggle to tame industrial technology.
But in the end, we succeeded. Our societies are immeasurably better than if we had simply shied away from inventing power looms, or machine tools, or harvesters, or any of the other labor-saving automation that took most of our ancestors’ jobs. Stopping a new technological revolution in its tracks, or significantly slowing it down, is very hard. In general, the only way out is through — the best world comes not from resisting new technology, but from accelerating the development of sociopolitical institutions that make sure new technology’s benefits are widely shared.
I think the audience at the debate agreed with my more optimistic, long-term perspective.
Anyway, big thanks to everyone who produced the event, and especially to the organizer for getting the event together in the first place. It was very fun, and I’m looking forward to the next one!

China continued a surge in launch activity with a pair of missions Tuesday, adding to an opaque satellite series and launching new remote sensing satellites.
The post China launches new TJS satellite, commercial Kinetica-1 lofts 9 spacecraft appeared first on SpaceNews.

GOLDEN, CO — A university team has found that small orbital debris could emit radio bursts as they collide or approach each other in space. The signal can be detected with large radio dishes on Earth, as well as satellites in orbit. This new intelligence agency-funded research is focused on gauging the interaction of orbital debris […]
The post Colliding space debris produces radio bursts, raising prospect of ‘debris weather’ alerts appeared first on SpaceNews.

SAN FRANCISCO – British startup Odin Space raised $3 million in a seed round to begin commercializing tiny sensors to map and analyze sub-centimeter orbital debris. With its first sensor launched in 2023 on D-Orbit’s ION orbital transfer vehicle, Odin demonstrated its ability to detect debris that’s generally too small to track but still capable […]
The post Odin Space raises $3 million in seed funding appeared first on SpaceNews.

Space missions are entering a new era defined by complexity: more sensors, more software-driven behavior, more tightly coupled subsystems and more interactions between spacecraft and orbital infrastructure. As these systems evolve, the number of potential failure modes grows — ranging from thermal drift and aging hardware to configuration errors, environmental disturbances, and unfamiliar system behavior. […]
The post How multi-agent AI can strengthen space missions against the unknown appeared first on SpaceNews.

SAN FRANCISCO – Benchmark Space Systems’ ASCENT-fueled Macaw thruster performed a 10-minute continuous burn, clearing the way for an on-orbit application of the propulsion technology, the company announced Dec. 10. “Because ASCENT has 50% greater impulse density than other monopropellants, mission planners and spacecraft designers can get the similar delta-v [change in velocity] with less […]
The post Benchmark demonstrates high-throughput ASCENT thruster in hotfire testing at Edwards Air Force Base appeared first on SpaceNews.

Lt. Gen. Phil Garrant, who leads the Space Systems Command, said Blue Origin selected the four-flight benchmark and the government agreed
The post Blue Origin targets four-flight campaign for New Glenn’s path to Space Force certification appeared first on SpaceNews.

The contract will focus on signals intelligence for U.S. Air Force platforms.
The post Voyager wins $21 million Air Force contract for AI-driven signals processing appeared first on SpaceNews.

NATO has picked 150 companies from 24 of its member countries to join its Defence Innovation Accelerator for the North Atlantic next year, including more than two dozen with ties to the space sector.
The post Multiple space companies join NATO’s DIANA defense accelerator appeared first on SpaceNews.

As on-orbit capabilities grow more advanced, ground systems are undergoing a transformation of their own. Ground network specialist ST Engineering iDirect, with headquarters in Herndon, Virginia, is investing in new […]
The post Delivering the next generation of cloud-native, multi-orbit ground systems appeared first on SpaceNews.

Two defense technology companies from Norway and Germany have joined forces to bolster Europe’s sovereign intelligence and communications capabilities, with plans to start deploying small satellites in about three years.
The post Helsing and Kongsberg plot multi-mission European defense space network appeared first on SpaceNews.

The search for past or present life should be the top science objective of future human missions to Mars, a new National Academies report concludes.
The post Report identifies science objectives of human Mars exploration appeared first on SpaceNews.

The findings come from a report by the Consortium for Space Mobility and ISAM Capabilities, or COSMIC
The post GEO satellite refueling a priority for national security, commercial markets, new analysis finds appeared first on SpaceNews.

Lt. Gen. Phil Garrant, head of the Space Systems Command, said Five Eyes partners would likely have access to this technology
The post L3Harris satellite-jamming system approved for export to close U.S. allies appeared first on SpaceNews.

I’ve noted many times the central role of Supreme Court reform to any civic democratic future. If you’re a regular reader, you know my arguments. So I won’t recapitulate them here. I’ve also noted how very few Democratic officials seem at all ready for this and a huge amount of work is required to get them here. Luckily there’s time: The first chance to do anything like this is 2029. But there’s another, even more critical, underlying need. A lot of the Democratic public still sees the idea as disconcerting or extreme. And we shouldn’t run away from this perception. Because it is extreme. It is a remedy only justified and really necessitated by a basically unprecedented development in American history which is robbing the public of its right to self-government. (The question of whether there is any precedent is complicated. There are arguably two similar instances in American history. But we can return to that later.) The point is that there is a lot of work to do. Inherently resistant Democratic politicians certainly aren’t going to be brought along if a substantial number of their own voters, perhaps a majority of them, are spooked by the idea.
So this requires a substantial campaign of public education — activist/political groups dedicated specifically and intently to the issue, ones that are political-activist in nature, ones that draw from the elite legal world. An entire language of explanation is required.
Obviously people have all sorts of ideas and it’s the ultimate hand wave to say … well, now all we need is a public education campaign to convince the public what I want is a good idea. That’s all! Simple. Just this one neat trick. But while this captures some of the challenge, there are concrete and non-aspirational reasons to believe this is different. I don’t think Democrats need any convincing that something has gone catastrophically wrong in the country. I think that perception actually goes way beyond partisan Democrats. Many people seem to now get — with the totality of the last decade and 2025 — that only extreme actions have any hope of putting the genie back in the bottle or even getting the genie bottle adjacent. The example of the Biden presidency actually adds to the weight. People get that just electing a Democratic president in itself isn’t enough. I think there are various aspects of the Biden presidency that will look better with the passage of time. But from the vantage point of 2025, it certainly has the look of a course of antibiotics you started and then stopped after just a day or two, which led to the infection coming back in a more virulent form. (I actually read about some recent studies that throw into question that old “finish the course” mantra. But again, let’s not get distracted!)
The point here is that I think the public realizes something has gone terribly wrong and that very strong medicine will be required to fix it. But unless you’re paying very close attention to the details, it’s just not clear that this is one of two or three sine qua non things that are required. The predicates are already there, so this isn’t like saying, Well we just need a public education campaign for my new idea of citizenship and unlimited bacon treats for dogs. As I’ve explained before, even I find the idea of SCOTUS reform extreme and disconcerting. But I spend my days watching the direction of politics and political power, looking at the building blocks and details very closely. When you do that, you pretty quickly realize that you can have democratic self-government (and the possibility of Democrats ever actually being in power), or you can have this Supreme Court. But not both. I genuinely believe that when the matter is examined closely, the case is that tight.
I’ve made my point. This requires an all-out effort not only to bring along Democratic lawmakers but the Democratic voters they ultimately respond to — political activists, legal elites, inside players, outside players — to be ready to act when the moment comes. There’s time. But not much.
Before anyone starts patting the Trump administration on its back for one good typographic decision, take a gander at the hard-to-believe-this-is-real new signage at (and alas, on) the White House. This is the sort of signage that typically spells “Business Center” across from the check-in desk at a Courtyard Marriott. The Biden State Department replacing Times New Roman with Calibri was a typographic misdemeanor. Festooning the White House with signage set in gold-plated Shelley Script ought to land Trump in the Hague.
(The idea that the Oval Office ought to be explicitly labeled “The Oval Office” — whatever the typeface or signage style — brings to mind this classic Far Side cartoon, which I think aptly illustrates the president’s mental faculties.)
The fifth of five rules in Matthew Butterick’s “Typography in Ten Minutes”:
And finally, font choice. The fastest, easiest, and most visible improvement you can make to your typography is to ignore the fonts already loaded on your computer (known as system fonts) and the free fonts that inundate the internet. Instead, buy a professional font (like those found in font recommendations). A professional font gives you the benefit of a professional designer’s skills without having to hire one.
If that’s impossible, you can still make good typography with system fonts. But choose wisely. And never choose Times New Roman or Arial, as those fonts are favored only by the apathetic and sloppy. Not by typographers. Not by you.
I’m a big believer in reading original source material. For example, when Apple provided me, alongside only a handful of other outlets, with a statement regarding their decision to delay the “more personalized Siri” back in March, I ran the full statement, verbatim. I added my own commentary, but I wanted to let Apple’s own statement speak for itself first. It drives me nuts when news sites in possession of a statement or original document do not make the full original text available, even if only in a link at the bottom, and choose only to quote short excerpts.
With regard to today’s news regarding Marco Rubio’s directive re-establishing Times New Roman as the default font for U.S. State Department documents (rescinding the Biden administration’s 2023 change to Calibri), I very much wanted to read the original. The New York Times broke the news, stated that they had obtained the memo, and quoted phrases and words from it, but they did not provide a copy of the original.
The State Department has not made this document publicly available, and to my knowledge, no one else has published it. I have obtained a copy from a source, and have made it available here in plain text format. The only change I’ve made is to replace non-breaking spaces (U+00A0) with regular spaces.1
Please do read it yourself, and do so with an open mind.
It seems clear to me that The New York Times did Rubio dirty in their characterization of the directive. The Times story, credited to reporters Michael Crowley and Hamed Aleaziz, ran under the headline “At State Dept., a Typeface Falls Victim in the War Against Woke”, and opens thus:
Secretary of State Marco Rubio waded into the surprisingly fraught politics of typefaces on Tuesday with an order halting the State Department’s official use of Calibri, reversing a 2023 Biden-era directive that Mr. Rubio called a “wasteful” sop to diversity.
While mostly framed as a matter of clarity and formality in presentation, Mr. Rubio’s directive to all diplomatic posts around the world blamed “radical” diversity, equity, inclusion and accessibility programs for what he said was a misguided and ineffective switch from the serif typeface Times New Roman to sans serif Calibri in official department paperwork.
Rubio’s memo ran about 950 words. Here are the full quotes the Times pulled from it, consisting of just 56 words, aside from the memo’s subject line (“Return to Tradition: Times New Roman 14-Point Font Required for All Department Paper”):
“wasteful”
“radical”
“restore decorum and professionalism to the department’s written work.”
“informal”
“clashes”
“was not among the department’s most illegal, immoral, radical or wasteful instances of D.E.I.A.”
“accessibility-based document remediation cases”
“Switching to Calibri achieved nothing except the degradation of the department’s official correspondence.”
“generally perceived to connote tradition, formality and ceremony”
Rubio’s memo wasn’t merely “mostly framed as a matter of clarity and formality in presentation”. That’s entirely what the memo is about. Serif typefaces like Times New Roman are more formal. It was the Biden administration and then-Secretary of State Antony Blinken who categorized the 2023 change to Calibri as driven by accessibility. I do not have access to Blinken’s memo making that change (under the cringe-inducing subject line “The Times (New Roman) are a-Changin”), but it was first reported by John Hudson and Annabelle Timsit at The Washington Post, where they wrote:
The secretary’s decision was motivated by accessibility issues and not aesthetics, said a senior State Department official familiar with the change.
Rubio’s memo makes the argument — correctly — that aesthetics matter, and that the argument that Calibri was in any way more accessible than Times New Roman was bogus. Rubio’s memo does not lash out against accessibility as a concern or goal. He simply makes the argument that Blinken’s order mandating Calibri in the name of accessibility was an empty gesture. Purely performative, at the cost of aesthetics. Going back to that 2023 story at the Post, they quote from Blinken’s memo thus:
In its cable, the State Department said it was choosing to shift to 14-point Calibri font because serif fonts like Times New Roman “can introduce accessibility issues for individuals with disabilities who use Optical Character Recognition technology or screen readers. It can also cause visual recognition issues for individuals with learning disabilities,” it said.
The bit here about OCR is utter nonsense, a voodoo belief. No OCR or screen-reader software in use today has any problem whatsoever with Times New Roman. That’s just made-up nonsense, and I’d like to see sources for the claim about “visual recognition issues for individuals with learning disabilities”. I don’t think it’s true, and citing it alongside a provably wrong claim about OCR software makes me even more skeptical.
Rubio brings actual numbers to make his case, which is more than can be said for anyone I’ve found arguing that Calibri is somehow more accessible than Times New Roman. Rubio’s argument is alluded to in the Times’s article thus:
But Mr. Rubio called it a failure by its own standards, saying that “accessibility-based document remediation cases” at the department had not declined.
Here’s the full passage from Rubio’s memo:
And although switching to Calibri was not among the Department’s most illegal, immoral, radical, or wasteful instances of DEIA (see, e.g., Executive Orders 14151, 14173, 14281, and Memorandum on Removing Discrimination and Discriminatory Equity Ideology From the Foreign Service (DCPD202500375)) it was nonetheless cosmetic: the switch was promised to mitigate “accessibility issues for individuals with disabilities,” and employees were promised, “Your adoption supports the Department’s commitment to create a more accessible workplace,” but these promises were false. In fact, the number of accessibility-based document remediation cases at the Department of State was the same in the year after adopting Calibri as in the year before (1,192 cases in FY2024 versus 1,193 cases in FY2022). And the costs of remediation actually increased by $145,000 in that period — nearly a 20% jump. Switching to Calibri achieved nothing except the degradation of the Department’s official correspondence.
2024 was a Biden year, not a Trump year, so there’s no reason to think the remediation numbers were counted differently. The change to Calibri was the worst kind of accessibility effort: one that was founded on nothing more than feel-good performance. It was a change everyone could see and notice, but one that had no practical benefit whatsoever. Good on Rubio for rescinding a bad decision, and even better for doing so with a fair and informative explanation.2 (His memo even explains, “Fonts are specific variations of a typeface.... Through common use, the word font has come to mean both typeface and font.”)
The memo, per State Department standards perhaps, uses two spaces after sentences and colons. In the original copy I received, those double-spaces were sometimes in the sequence NON-BREAK-SPACE + SPACE, and other times the other way around: SPACE + NON-BREAK-SPACE. There were also a handful of seemingly random non-breaking space characters between words, mid-sentence. All of them, I suspect, just invisible-to-the-eye detritus from Microsoft Word. I replaced all of them with regular spaces, preserving, in plain text, two spaces wherever two spaces were intended. ↩︎
Do I think it was “fair and informative” to describe all of the Biden State Department’s DEIA initiatives as “illegal, immoral, radical, or wasteful”? No. Did I bother reading any of the documents Rubio referenced as proving such? No. Do I think this particular memorandum, specific to changing State’s font back to Times New Roman, would have been stronger without that line, leaving his defenestration of the Calibri font change to speak for itself? Yes. But that line was just one aside in an otherwise focused, sober, and, yes, fair and informative memo. ↩︎︎
Michael Crowley and Hamed Aleaziz, reporting for The New York Times:
While mostly framed as a matter of clarity and formality in presentation, Mr. Rubio’s directive to all diplomatic posts around the world blamed “radical” diversity, equity, inclusion and accessibility programs for what he said was a misguided and ineffective switch from the serif typeface Times New Roman to sans serif Calibri in official department paperwork.
In an “Action Request” memo obtained by The New York Times, Mr. Rubio said that switching back to the use of Times New Roman would “restore decorum and professionalism to the department’s written work.” Calibri is “informal” when compared to serif typefaces like Times New Roman, the order said, and “clashes” with the department’s official letterhead. [...]
Then-Secretary of State Antony J. Blinken ordered the 2023 typeface shift on the recommendation of the State Department’s office of diversity and inclusion, which Mr. Rubio has since abolished. The change was meant to improve accessibility for readers with disabilities, such as low vision and dyslexia, and people who use assistive technologies, such as screen readers. [...]
But Mr. Rubio’s order rejected the grounds for the switch. The change, he allowed, “was not among the department’s most illegal, immoral, radical or wasteful instances of D.E.I.A.,” the acronym for diversity, equity, inclusion and accessibility. But Mr. Rubio called it a failure by its own standards, saying that “accessibility-based document remediation cases” at the department had not declined.
“Switching to Calibri achieved nothing except the degradation of the department’s official correspondence,” Mr. Rubio said. He noted that Times New Roman had been the department’s official typeface for nearly 20 years until the 2023 change. (Before 2004, the State Department used Courier New.)
When Blinken ordered the change to Calibri in 2023, I wrote:
It is correct for the State Department to have a house style for documents. I’m not sure what font they should use, but it wasn’t Times, and it shouldn’t be Calibri. Off the top of my head, I’d suggest Caslon — a sturdy, serious typeface that looked good 250 years ago, looks good now, and should look good 250 years from now.
While neither is a good choice, between the two, Times New Roman is clearly better. Unstated in my post from 2023 is acknowledgement that the choice might be limited to the default fonts in Microsoft Office. Limited to those fonts, Times New Roman might be the best choice. I just think it’s stupid for an institution with the resources of the U.S. State Department to shrug its shoulders at the notion that they should license and install whatever fonts they want on all of their computers. Anyone making excuses that they “can’t” do that should be fired. It’s the job of IT to serve the needs of the organization, not the organization’s job to limit itself to what makes IT easiest.
Calibri does convey a sense of casualness — and more so, modernity — that is not appropriate for the U.S. State Department. And I do not buy the argument that Calibri is somehow more accessible for those with low vision or reading disabilities. People with actual accessibility needs should be catered to, but they need more than a sans serif typeface, and their needs should not primarily motivate the choice for the default typeface. Dyslexics need typefaces like OpenDyslexic; people with low vision need font sizes much larger than 14-point. Those would make for terrible defaults for everyone.

In case you missed it, we kicked off Golden Duke 2025 voting last week and the competition is heating up. Some of you have reached out with your complaints about the exclusion of ne’er-do-wells such as President Trump, Defense Secretary Pete Hegseth and Rep. Cory Mills (R-FL) from this year’s awards. We hear you! They suck! They can re-earn their places in the Duke competition when they start sucking in a less demonic and more lighthearted way ❤️
If you haven’t had a chance to participate in selecting 2025’s most admirable vermin please follow this link and vote before time runs out! As you’ll see, we’ve added some new categories this year to meet our uniquely maniacal moment 🙃 Early voting shows that Trump’s $300 Million White House Ballroom might easily take the cake for Best Scandal but the competition to win a Meritorious Achievement in Grifting or Best Supporting Hatchet Man remains tight.
P.S. If you submitted the nomination we ended up publishing for Lindsey Halligan, Tom “Cashbag” Homan or Signalgate, please reach out! We don’t have your email address and want to send your complimentary TPM merch.
In our Marginal Revolution Podcast on Crime in the 1970s, I pointed out that blacks were often strongly in favor of tough-on-crime laws:
Tabarrok: [P]eople think that mass incarceration is a peculiarly American phenomena, or that it came out of nowhere, or was due solely to racism. Michelle Alexander’s, The New Jim Crow, takes this view.
…[But] back then, the criminal justice system was also called racist, but the racism that people were pointing to was that black criminals were let back on the streets to terrorize black victims, and that black criminals were given sentences which were too light. That was the criticism back then. It was black and white victims together who drove the punishment of criminals. I think this actually tells you about two falsehoods. First, the primary driver of mass imprisonment was not racism. It was violent crime.
Second, this also puts the lie, sometimes you hear from conservatives, to this idea that black leaders don’t care about black-on-black crime. That’s a lie. Many Black leaders have been, and were, and are tough on crime. Now, it’s true, as crime began to fall in the 1990s, many blacks and whites began to have misgivings about mass incarceration. Crime was a huge problem in the 1970s and 1980s, and it hit the United States like a brick. It seemed to come out of nowhere. You can’t blame people for seeking solutions, even if the solutions come with their own problems.
A new paper, The Racial Politics of Mass Incarceration by Clegg and Usmani, offers more evidence challenging the now-conventional Michelle Alexander view:
Public opinion data show that not just the white but also the black public became more punitive after the 1960s. Voting data from the House show that most black politicians voted punitively at the height of concern about crime. In addition, an analysis of federally mandated redistricting suggests that in the early 1990s, black political representation had a punitive impact at the state level. Together, our evidence suggests that crime had a profound effect on black politics. It also casts some doubt on the conventional view of the origins of mass incarceration.
As the authors note, the fact that blacks supported tough-on-crime laws doesn’t mean racism was absent. Racial overtones surely influenced the specific ways fear of crime was translated into policy. But the primary driver of mass incarceration wasn’t racism—it was mass crime.
The post Mass Incarceration and Mass Crime appeared first on Marginal REVOLUTION.
I've never been particularly invested in dark vs. light mode, but I get enough people complaining that this site is "blinding" that I decided to see if Claude Code for web could produce a useful dark mode from my existing CSS. It did a decent job, using CSS custom properties, @media (prefers-color-scheme: dark) and a data-theme="dark" attribute, based on this prompt:
Add a dark theme which is triggered by user media preferences but can also be switched on using localStorage - then put a little icon in the footer for toggling it between default auto, forced regular and forced dark mode
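The pattern it produced can be sketched like this - the selectors and colors here are illustrative, not the site's actual CSS:

```css
/* Both palettes live in custom properties on :root. */
:root {
  --bg: #fff;
  --fg: #111;
}

/* Auto mode: follow the OS preference, but only when the user
   hasn't forced a theme via the toggle. */
@media (prefers-color-scheme: dark) {
  :root:not([data-theme]) {
    --bg: #111;
    --fg: #eee;
  }
}

/* Forced dark mode: the toggle sets data-theme="dark" on <html>
   (persisted in localStorage) and this wins regardless of the OS. */
:root[data-theme="dark"] {
  --bg: #111;
  --fg: #eee;
}

/* Forced light mode just sets data-theme="light", which falls
   through to the default :root values above. */

body {
  background: var(--bg);
  color: var(--fg);
}
```

Note that the dark palette appears twice - once under the media query and once for the forced state - which is exactly the kind of duplication a code generator can keep in sync.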
The site defaults to picking up the user's preferences, but there's also a toggle in the footer which switches between auto, forced-light and forced-dark. Here's an animated demo:

I had Claude Code make me that GIF from two static screenshots - it used this ImageMagick recipe:
magick -delay 300 -loop 0 one.png two.png \
-colors 128 -layers Optimize dark-mode.gif
The CSS ended up with some duplication due to the need to handle both the media preference and the explicit user selection. We fixed that with Cog.
Tags: css, coding-agents, ai-assisted-programming, claude, claude-code, design, llms, ai, generative-ai
Last night Donald Trump gave an important speech on the economy in Pennsylvania — supposedly in a working-class area, although the actual venue was a luxury casino resort. The event was initially touted as the start of an “affordability tour,” the first of a series of speeches intended to reverse Trump’s cratering approval on his handling of inflation and the economy. A number of news analyses suggested that he would use the occasion to blame Democrats for the economy’s troubles.
That was never going to happen. Trump did, of course, take many swipes at Joe Biden, as well as attacking immigrants, women and windmills. But to blame Democrats for the economy’s problems he would have to admit that the Trump economy has problems. And the speech was important because it revealed that he won’t make any such admission, and will continue to gaslight the public.
On Monday Politico interviewed Trump, asking him, among other things, what grade he would give the current economy. His answer: “A-plus-plus-plus-plus-plus.”
In fact, until very recently Trump wouldn't even accept the reality that ordinary Americans don't share his triumphalism. When Fox News's Laura Ingraham asked him a month ago why people are anxious about the economy, Trump replied:
I don’t know they are saying that. The polls are fake. We have the greatest economy we’ve ever had.
Since then Trump and his minions seem to have come around to admitting that Americans are, in fact, unhappy with the state of the economy. But if the economy is A+++++, why don’t people see it? The problem can’t possibly lie with him — so it must lie with you. “The American people don’t know how good they have it.”
I put that line in quotes because it isn’t a caricature or a paraphrase. It is, in fact, literally what Scott Bessent, the Treasury secretary, said the other day:
We’ve made a lot of gains, but remember, we’ve got this embedded inflation from the Biden years, where mainstream media, whether it’s Greg Ip at the Wall Street Journal, toxic Paul Krugman at New York Times or former Vice Chair, Alan Blinder, all said it was a vibecession. The American people don’t know how good they have it.
Incidentally, I appreciate the personal plug. Trump has already called me a “deranged bum.” Now Bessent says I’m “toxic.” Give me a fake peace prize, and I’ll have all the honors anyone could ask for.
Anyway, I may not be a political strategist, but I don’t think “You’re all a bunch of ingrates” is a winning message. It was, however, really the only message Trump could deliver, given his utter lack of empathy or humility.
At this point I could bombard you with a lot of data showing that the economy is not, in fact, A+++++. But it isn’t a disaster area, at least not yet. So why are Americans feeling so down? The main culprit is Trump himself.
First, during the 2024 campaign Trump repeatedly promised to bring consumer prices way down beginning on “day one.” We’re now 11 months in, prices are still rising, and voters who believed him feel, with reason, that they were lied to. Last night Trump insisted that prices are, in fact, coming way down. Again, “Who you gonna believe, me or your lying eyes?” is a self-destructive political strategy.
Second, Trump would be in much better political shape right now if he had basically continued Biden’s policies, with only a few cosmetic changes. When he took office inflation was on a declining trajectory. Consumer sentiment was relatively favorable at the start of 2025. Americans were still angry about high prices, but the inflation surge of 2021-3 had happened on Biden’s watch and was receding into the past. My guess is that many voters would have accepted Trump’s claims that high prices were Democrats’ fault and given him the benefit of the doubt about the economy’s future if he had simply done nothing drastic and left policies mostly as they were.
Instead, he brought chaos: Massive and massively unpopular tariffs, DOGE disruptions, masked ICE agents grabbing people off the street, saber-rattling and war crimes in the Caribbean. Many swing voters, I believe, supported Trump out of nostalgia for the relative calm that prevailed before Covid struck. They didn’t think they were voting for nonstop political PTSD.
And there’s more to come. Health insurance costs are about to spike, because Republicans refuse to extend Biden-era subsidies. Inflation may pick up in the next few months as retailers, who have so far absorbed much of the cost of Trump’s tariffs, begin passing them on to consumers.
So the “affordability tour” is off to a disastrous start. And it won’t get better, because while Trump insists that the problem is you, it’s actually him. And he isn’t going to change.
I've started using the term HTML tools to refer to HTML applications that I've been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I've discovered along the way.
First, some examples to show the kind of thing I'm talking about:
These are some of my recent favorites. I have dozens more like this that I use on a regular basis.
You can explore my collection on tools.simonwillison.net - the by month view is useful for browsing the entire collection.
If you want to see the code and prompts, almost all of the examples in this post include a link in their footer to "view source" on GitHub. The GitHub commits usually contain either the prompt itself or a link to the transcript used to create the tool.
These are the characteristics I have found to be most productive in building tools of this nature:
The end result is a few hundred lines of code that can be cleanly copied and pasted into a GitHub repository.
The easiest way to build one of these tools is to start in ChatGPT or Claude or Gemini. All three have features where they can write a simple HTML+JavaScript application and show it to you directly.
Claude calls this "Artifacts", ChatGPT and Gemini both call it "Canvas". Claude has the feature enabled by default, ChatGPT and Gemini may require you to toggle it on in their "tools" menus.
Try this prompt in Gemini or ChatGPT:
Build a canvas that lets me paste in JSON and converts it to YAML. No React.
Or this prompt in Claude:
Build an artifact that lets me paste in JSON and converts it to YAML. No React.
I always add "No React" to these prompts, because otherwise they tend to build with React, resulting in a file that is harder to copy and paste out of the LLM and use elsewhere. I find that attempts which use React take longer to display (since they need to run a build step) and are more likely to contain crashing bugs for some reason, especially in ChatGPT.
All three tools have "share" links that provide a URL to the finished application. Examples:
Coding agents such as Claude Code and Codex CLI have the advantage that they can test the code themselves while they work on it using tools like Playwright. I often upgrade to one of those when I'm working on something more complicated, like my Bluesky thread viewer tool shown above.
I also frequently use asynchronous coding agents like Claude Code for web to make changes to existing tools. I shared a video about that in Building a tool to copy-paste share terminal sessions using Claude Code for web.
Claude Code for web and Codex Cloud run directly against my simonw/tools repo, which means they can publish or upgrade tools via Pull Requests (here are dozens of examples) without me needing to copy and paste anything myself.
Any time I use an additional JavaScript library as part of my tool I like to load it from a CDN.
The three major LLM platforms support specific CDNs as part of their Artifacts or Canvas features, so often if you tell them "Use PDF.js" or similar they'll be able to compose a URL to a CDN that's on their allow-list.
Sometimes you'll need to go and look up the URL on cdnjs or jsDelivr and paste it into the chat.
CDNs like these have been around for long enough that I've grown to trust them, especially for URLs that include the package version.
The alternative to CDNs is to use npm and have a build step for your projects. I find this reduces my productivity at hacking on individual tools and makes it harder to self-host them.
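As a concrete sketch of the CDN pattern, a JSON-to-YAML converter like the one prompted above might come out as a single file along these lines - the cdnjs URL is illustrative of the versioned-URL pattern, so check cdnjs for the current js-yaml release:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>JSON to YAML</title>
<!-- Pinning the version in the CDN URL keeps the tool stable over time. -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.1.0/js-yaml.min.js"></script>
</head>
<body>
<textarea id="input" rows="10" cols="60">{"name": "example", "tags": ["a", "b"]}</textarea>
<pre id="output"></pre>
<script>
const input = document.getElementById('input');
const output = document.getElementById('output');
function convert() {
  try {
    // js-yaml's browser build exposes a global "jsyaml" object.
    output.textContent = jsyaml.dump(JSON.parse(input.value));
  } catch (err) {
    output.textContent = 'Error: ' + err.message;
  }
}
input.addEventListener('input', convert);
convert();
</script>
</body>
</html>
```

No build step means this exact file can be pasted straight into a repository and served from GitHub Pages.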
I don't like leaving my HTML tools hosted by the LLM platforms themselves for a couple of reasons. First, LLM platforms tend to run the tools inside a tight sandbox with a lot of restrictions. They're often unable to load data or images from external URLs, and sometimes even features like linking out to other sites are disabled.
The end-user experience often isn't great either. They show warning messages to new users, often take additional time to load and delight in showing promotions for the platform that was used to create the tool.
They're also not as reliable as other forms of static hosting. If ChatGPT or Claude are having an outage I'd like to still be able to access the tools I've created in the past.
Being able to easily self-host is the main reason I like insisting on "no React" and using CDNs for dependencies - the absence of a build step makes hosting tools elsewhere a simple case of copying and pasting them out to some other provider.
My preferred provider here is GitHub Pages because I can paste a block of HTML into a file on github.com and have it hosted on a permanent URL a few seconds later. Most of my tools end up in my simonw/tools repository which is configured to serve static files at tools.simonwillison.net.
One of the most useful input/output mechanisms for HTML tools comes in the form of copy and paste.
I frequently build tools that accept pasted content, transform it in some way and let the user copy it back to their clipboard to paste somewhere else.
Copy and paste on mobile phones is fiddly, so I frequently include "Copy to clipboard" buttons that populate the clipboard with a single touch.
Most operating system clipboards can carry multiple formats of the same copied data. That's why you can paste content from a word processor in a way that preserves formatting, but if you paste the same thing into a text editor you'll get the content with formatting stripped.
These rich copy operations are available in JavaScript paste events as well, which opens up all sorts of opportunities for HTML tools.
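A paste handler typically inspects which formats are available and picks the richest one it can handle. A minimal sketch - the preference order and function names here are illustrative, not from a specific tool:

```javascript
// Given the MIME types available on a paste event, pick the richest
// one we know how to handle. In the browser this would be wired up as:
//   element.addEventListener('paste', (e) => {
//     const type = pickPasteFormat(Array.from(e.clipboardData.types));
//     if (type) handle(type, e.clipboardData.getData(type));
//   });
const FORMAT_PREFERENCE = ['text/html', 'text/uri-list', 'text/plain'];

function pickPasteFormat(availableTypes) {
  for (const wanted of FORMAT_PREFERENCE) {
    if (availableTypes.includes(wanted)) return wanted;
  }
  return null; // nothing we can handle
}
```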
The key to building interesting HTML tools is understanding what's possible. Building custom debugging tools is a great way to explore these options.
clipboard-viewer is one of my most useful. You can paste anything into it (text, rich text, images, files) and it will loop through and show you every type of paste data that's available on the clipboard.

This was key to building many of my other tools, because it showed me the invisible data that I could use to bootstrap other interesting pieces of functionality.
More debugging examples:
One shows the KeyCode values currently being held down.

HTML tools may not have access to server-side databases for storage, but it turns out you can store a lot of state directly in the URL.
I like this for tools I may want to bookmark or share with other people.
The localStorage browser API lets HTML tools store data persistently on the user's device, without exposing that data to the server.
I use this for larger pieces of state that don't fit comfortably in a URL, or for secrets like API keys which I really don't want anywhere near my server - even static hosts might have server logs that are outside of my influence.
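A sketch of the split between the two mechanisms - the function names are mine, not from any particular tool: shareable state goes in the URL fragment, secrets stay in localStorage:

```javascript
// Serialize tool state into a URL-safe string so it can live in
// location.hash, making the current state bookmarkable and shareable.
// In the browser this pairs with:
//   location.hash = encodeState(state);
//   const state = decodeState(location.hash.slice(1));
function encodeState(state) {
  return encodeURIComponent(JSON.stringify(state));
}

function decodeState(encoded) {
  try {
    return JSON.parse(decodeURIComponent(encoded));
  } catch (e) {
    return null; // malformed or empty hash
  }
}

// Secrets like API keys go in localStorage instead, e.g.:
//   localStorage.setItem('api-key', key);
// so they never appear in a shareable URL or a server log.
```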
One asks for an API key (via the JavaScript prompt() function) and then stores that in localStorage. Another uses Claude Haiku to write haikus about what it can see through the user's webcam.

CORS stands for Cross-Origin Resource Sharing. It's a relatively low-level detail which controls whether JavaScript running on one site is able to fetch data from APIs hosted on other domains.
APIs that provide open CORS headers are a goldmine for HTML tools. It's worth building a collection of these over time.
Here are some I like:
GitHub Gists are a personal favorite here, because they let you build apps that can persist state to a permanent Gist through making a cross-origin API call.
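A sketch of what that Gist call can look like - the description and filename are placeholders, and the endpoint and headers follow GitHub's documented gists API:

```javascript
// Build the fetch() arguments for saving state to a new GitHub Gist.
// The token would come from the localStorage-secrets pattern above.
function buildGistRequest(token, filename, content) {
  return {
    url: 'https://api.github.com/gists',
    options: {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer ' + token,
        'Accept': 'application/vnd.github+json',
      },
      body: JSON.stringify({
        description: 'State saved from an HTML tool',
        public: false,
        files: { [filename]: { content } },
      }),
    },
  };
}

// Usage in the browser:
//   const {url, options} = buildGistRequest(token, 'state.json', text);
//   const resp = await fetch(url, options);
//   const gist = await resp.json();  // gist.id identifies the saved state
```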
The zip-wheel-explorer tool fetches the .whl file for a Python package from PyPI, unzips it (in browser memory) and lets you navigate the files.

All three of OpenAI, Anthropic and Gemini offer JSON APIs that can be accessed via CORS directly from HTML tools.
Unfortunately you still need an API key, and if you bake that key into your visible HTML anyone can steal it and use it to rack up charges on your account.
I use the localStorage secrets pattern to store API keys for these services. This sucks from a user experience perspective - telling users to go and create an API key and paste it into a tool is a lot of friction - but it does work.
Some examples:
You don't need to upload a file to a server in order to make use of the <input type="file"> element. JavaScript can access the content of that file directly, which opens up a wealth of opportunities for useful functionality.
Some examples:
One combines PDF.js and Tesseract.js to allow users to open a PDF in their browser, which it then converts to an image-per-page and runs through OCR. Another constructs the ffmpeg command needed to produce a cropped copy on your own machine.

An HTML tool can generate a file for download without needing help from a server.
The JavaScript library ecosystem has a huge range of packages for generating files in all kinds of useful formats.
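For simple formats you don't even need a library. A minimal sketch - the helper here is illustrative, not one of my published tools - of generating a CSV entirely client-side and handing it to the browser as a download:

```javascript
// Build a CSV string in memory; no server involved.
function toCsv(rows) {
  return rows
    .map(row => row.map(cell => {
      const s = String(cell);
      // Quote cells containing commas, quotes or newlines.
      return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
    }).join(','))
    .join('\n');
}

// Browser-only part (sketch): offer the string as a file download
// via a Blob and a temporary object URL.
//   const blob = new Blob([toCsv(rows)], {type: 'text/csv'});
//   const a = document.createElement('a');
//   a.href = URL.createObjectURL(blob);
//   a.download = 'export.csv';
//   a.click();
//   URL.revokeObjectURL(a.href);
```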
Pyodide is a distribution of Python that's compiled to WebAssembly and designed to run directly in browsers. It's an engineering marvel and one of the most underrated corners of the Python world.
It also cleanly loads from a CDN, which means there's no reason not to use it in HTML tools!
Even better, the Pyodide project includes micropip - a mechanism that can load extra pure-Python packages from PyPI via CORS.
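A minimal sketch of the Pyodide pattern - the CDN version path here is illustrative (check pyodide.org for the current release), and humanize stands in as an example pure-Python package:

```html
<script src="https://cdn.jsdelivr.net/pyodide/v0.26.2/full/pyodide.js"></script>
<script type="module">
  // loadPyodide() is the global provided by the CDN script above.
  const pyodide = await loadPyodide();
  // Run Python directly in the page:
  console.log(pyodide.runPython(`sum(i * i for i in range(10))`));
  // micropip can pull extra pure-Python packages from PyPI over CORS:
  await pyodide.loadPackage("micropip");
  const micropip = pyodide.pyimport("micropip");
  await micropip.install("humanize");
  console.log(pyodide.runPython(`import humanize; humanize.intword(123456789)`));
</script>
```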
Pyodide is possible thanks to WebAssembly. WebAssembly means that a vast collection of software originally written in other languages can now be loaded in HTML tools as well.
Squoosh.app was the first example I saw that convinced me of the power of this pattern - it makes several best-in-class image compression libraries available directly in the browser.
I've used WebAssembly for a few of my own tools:
The biggest advantage of having a single public collection of 100+ tools is that it's easy for my LLM assistants to recombine them in interesting ways.
Sometimes I'll copy and paste a previous tool into the context, but when I'm working with a coding agent I can reference them by name - or tell the agent to search for relevant examples before it starts work.
The source code of any working tool doubles as clear documentation of how something can be done, including patterns for using editing libraries. An LLM with one or two existing tools in their context is much more likely to produce working code.
I built pypi-changelog by telling Claude Code:
Look at the pypi package explorer tool
And then, after it had found and read the source code for zip-wheel-explorer:
Build a new tool pypi-changelog.html which uses the PyPI API to get the wheel URLs of all available versions of a package, then it displays them in a list where each pair has a "Show changes" clickable in between them - clicking on that fetches the full contents of the wheels and displays a nicely rendered diff representing the difference between the two, as close to a standard diff format as you can get with JS libraries from CDNs, and when that is displayed there is a "Copy" button which copies that diff to the clipboard
Here's the full transcript.
See Running OCR against PDFs and images directly in your browser for another detailed example of remixing tools to create something new.
I like keeping (and publishing) records of everything I do with LLMs, to help me grow my skills at using them over time.
For HTML tools I built by chatting with an LLM platform directly I use the "share" feature for those platforms.
For Claude Code or Codex CLI or other coding agents I copy and paste the full transcript from the terminal into my terminal-to-html tool and share that using a Gist.
In either case I include links to those transcripts in the commit message when I save the finished tool to my repository. You can see those in my tools.simonwillison.net colophon.
I've had so much fun exploring the capabilities of LLMs in this way over the past year and a half, and building tools in this way has been invaluable in helping me understand both the potential for building tools with HTML and the capabilities of the LLMs that I'm building them with.
If you're interested in starting your own collection I highly recommend it! All you need to get started is a free GitHub repository with GitHub Pages enabled (Settings -> Pages -> Source -> Deploy from a branch -> main) and you can start copying in .html pages generated in whatever manner you like.
Bonus transcript: Here's how I used Claude Code and shot-scraper to add the screenshots to this post.
Tags: definitions, github, html, javascript, projects, tools, ai, webassembly, generative-ai, llms, ai-assisted-programming, vibe-coding, coding-agents, claude-code
The Normalization of Deviance in AI
This thought-provoking essay from Johann Rehberger directly addresses something that I’ve been worrying about for quite a while: in the absence of any headline-grabbing examples of prompt injection vulnerabilities causing real economic harm, is anyone going to care?

Johann describes the concept of the “Normalization of Deviance” as directly applying to this question.
Coined by Diane Vaughan, the key idea here is that organizations that get away with “deviance” - ignoring safety protocols or otherwise relaxing their standards - will start baking that unsafe attitude into their culture. This can work fine… until it doesn’t. The Space Shuttle Challenger disaster has been partially blamed on this class of organizational failure.
As Johann puts it:
In the world of AI, we observe companies treating probabilistic, non-deterministic, and sometimes adversarial model outputs as if they were reliable, predictable, and safe.
Vendors are normalizing trusting LLM output, but current understanding violates the assumption of reliability.
The model will not consistently follow instructions, stay aligned, or maintain context integrity. This is especially true if there is an attacker in the loop (e.g. indirect prompt injection).
However, we see more and more systems allowing untrusted output to take consequential actions. Most of the time it goes well, and over time vendors and organizations lower their guard or skip human oversight entirely, because “it worked last time.”
This dangerous bias is the fuel for normalization: organizations confuse the absence of a successful attack with the presence of robust security.
Tags: security, ai, prompt-injection, generative-ai, llms, johann-rehberger, ai-ethics
You should be able to provide an LLM as a job reference, just like you would a coworker, manager, or professor. It can form an opinion and represent you without revealing any private data.
Here is more from John Carmack.
The post Sentences to ponder appeared first on Marginal REVOLUTION.
Jake Becraft was working on mRNA way before it was cool.
In fact, Becraft’s advisors at MIT told him trying to develop therapies with mRNA would be a colossal waste of time. But, here we are in 2025, and Becraft has pushed the mRNA technology that gained so much attention during the pandemic in rather incredible new directions.
Becraft joins the podcast this week to talk about his company Strand Therapeutics and its programmable mRNA technology. Strand has developed a way to send therapies into the body and have them aim right for diseased cells. Its first clinical trial has focused on melanoma where Strand has been able to treat patients who were deemed incurable with any other medicines.
Jake and I met up at Strand’s headquarters in Boston with a double-helix hanging over our heads. We covered Strand’s work, Jake’s background and the future of synthetic biology.
We’ll have a video episode coming on Strand and its lab and technology soon on our YouTube channel, which you should be subscribing to because it’s awesome.
Our show is sponsored by Brex, the intelligent finance platform. Like thousands of ambitious, innovative companies, we run on Brex so we can spend smarter and move faster. And you can too. Learn more at www.brex.com/corememory
The podcast is also made possible by E1 Ventures, which backs the most ambitious founders and start-ups.
In late 2023, Matt Osman went for a preventative MRI.
His life was good. His AI company, Treat, was humming, and he was looking to start a family. He was young, active, and had no medical issues. He had no reason to expect anything was wrong. Still, as a precaution, he decided to get some tests done.
Not long after the MRI, Osman was walking down Orchard Street in the Lower East Side of New York City when his phone buzzed. An e-mail from the screening clinic arrived first and then, ten seconds later, a call. They never call with good news. The radiologist saw a complex mass on his pancreas, the size of a golf ball. The pancreas isn’t a good place to find anything, and the clinic recommended he see an oncologist immediately.
What followed was a year of unbearable uncertainty. Biopsies were inconclusive, and two surgical oncology teams couldn’t agree on what to do next. One surgeon wanted to resect immediately; another warned of dangerous complications that could damage enough pancreatic tissue that he’d almost certainly become diabetic. With the experts split, Osman opted to monitor, watching for changes in shape or size. If it stayed put, the likelihood of it being benign would ratchet up over time.
It did stay put. Osman is now in the clear. But the experience rewired something inside of him. “It was the thing that started me on the journey of realizing that human tissue is one of the most valuable substances on earth,” he told me, “and that we now have a plausible path to making it on demand.”
That path is Polyphron, the company Osman co-founded with Fabio Boniolo, a Harvard-trained computational biologist. They’re building a platform to grow functional human tissue—not by painstakingly reverse-engineering each tissue type, but by training artificial intelligence models on developmental biology and letting them learn what embryos already know how to do. Success here could mean another viable path to human life extension.
Ultimately, the bulk of human morbidity and mortality isn’t whole limb loss or sepsis. What makes us sick is scarred hearts after cardiac events, damaged lungs from smoking, or sclerotic kidneys that can’t filter like they used to. Polyphron is simply asking whether it’s possible to get to those damaged organs earlier, with smaller interventions, matched to the patient, without waiting for catastrophe.
Here is the audio, video, and transcript. Here is the episode summary:
Gaurav Kapadia has deliberately avoided publicity throughout his career in investing, which makes this conversation a rare window into how he thinks. He now runs XN, a firm built around concentrated bets on a small number of companies with long holding periods. However, his education in judgment began much earlier, in a two-family house in Flushing that his parents converted into a four-family house. It was there where a young Gaurav served as de facto landlord, collecting rent and negotiating late payments at age 10. That grounding now expresses itself across an unusual range of domains: Tyler invited him on the show not just as an investor, but as someone with a rare ability to judge quality in cities, talent, art, and more with equal fluency.
Tyler and Gaurav discuss how Queens has thrived without new infrastructure, what he’d change as “dictator” of Flushing, whether Robert Moses should rise or fall in status, who’s the most underrated NYC mayor, what’s needed to attract better mayoral candidates, the weirdest place in NYC, why he initially turned down opportunities in investment banking for consulting, bonding with Rishi Sunak over railroads, XN’s investment philosophy, maintaining founder energy in investment firms and how he hires to prevent complacency, AI’s impact on investing, the differences between New York and London finance, the most common fundraising mistake art museums make, why he collects only American artists within 20 years of his own age, what makes Kara Walker and Rashid Johnson and Salman Toor special, whether buying art makes you a better investor, his new magazine Totei celebrating craft and craftsmanship, and much more.
Excerpt:
COWEN: Now, I don’t intend this as commentary on any particular individual, but what is it that could be done to attract a higher quality of candidate for being mayor of New York? It’s a super important job. It’s one of the world’s greatest cities, arguably the greatest. Why isn’t there more talent running after it?
KAPADIA: It is something that I’ve thought about a great deal. I think there’s a bunch of little things that accumulate, but the main thing that happens in New York City is, people automatically assume they can’t win because it’s such a big and great city. Actually, the last few presidential elections and also the current mayoral election have taught people that anyone could win. I think that, in and of itself, is going to draw more candidates as we go forward.
What happened as an example, this time, people just assumed that one candidate had the race locked up, so a lot of good candidates, even that I know, decided not even to run. It turns out that that ended up not being the case at all. Now that people put that into their mental model, the new Bayesian analysis of that would be, “Oh, more people should run.”
The second thing: New York has a bunch of very peculiar dynamics. It’s an off-year election, and the primaries are at very awkward times. I believe there’s a history of why the primary shifted to basically the third week of June, in which there’s a very low turnout. The third week of June in New York City, when the private schools are out and an off-year election. You’re able to win the Democratic nomination and therefore the mayoral election with tens of thousands of votes in a city this big. That is absolutely insane.
A couple of things that I would probably do would be to make the primary more normal, change the election timing to make it on-cycle, even number of years. You’d have to figure out how to do that. Potentially have an open primary as well.
COWEN: If we apply the Gaurav Kapadia judgment algorithm to mayoral candidates, what’s the non-obvious quality you’re looking for?
KAPADIA: Optimism.
COWEN: Optimism.
KAPADIA: Optimism.
COWEN: Is it scarce?
KAPADIA: Extraordinarily scarce. I think there’s much more doomerism everywhere than optimism. At the end of the day, people are attracted to optimism. If you think about the machinery of the city and the state, having a clear plan — of course, you need all the basics. You need to be able to govern. It’s a very complicated city. There’re many constituents.
But I think beyond that, you have to have the ability to inspire. For some reason, almost all of the candidates, over the last couple of cycles, have really not had that — with the exception of probably one — the ability to inspire. I think that is the most underrated quality that one will need.
COWEN: I have my own answer to this question, but I’m curious to see what you say. What is, for you, the weirdest part of New York City that you know of that doesn’t really feel like it belongs to New York City at all?
Definitely recommended.
The post My Conversation with the excellent Gaurav Kapadia appeared first on Marginal REVOLUTION.

SpaceX launched another batch of 29 Starlink V2 Mini satellites to low Earth orbit on its Falcon 9 rocket Thursday afternoon, breaking its pad turnaround record by nearly five hours.
Liftoff from pad 40 at Cape Canaveral Space Force Station occurred shortly before sunset at 5:01 p.m. EST (2201 UTC). The liftoff broke the pad turnaround record for SpaceX, following close on the heels of the NROL-77 mission, two days, two hours, 44 minutes and 55 seconds earlier. The previous record of two days, seven hours, 29 minutes and 10 seconds was set back in October.
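The "nearly five hours" figure checks out against the two records quoted above; a quick sanity check with Python's `datetime.timedelta`:

```python
# Verify the pad turnaround improvement: new record of 2 days, 2 hours,
# 44 minutes, 55 seconds versus the previous record of 2 days, 7 hours,
# 29 minutes, 10 seconds.
from datetime import timedelta

previous_record = timedelta(days=2, hours=7, minutes=29, seconds=10)
new_record = timedelta(days=2, hours=2, minutes=44, seconds=55)

improvement = previous_record - new_record
print(improvement)  # 4:44:15 -- just under five hours
```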
The mission, Starlink 6-90, was the company’s 161st orbital launch of the year and its 118th flight in 2025 carrying Starlink satellites. It was also the 170th orbital flight for the company in the last 365 days.
SpaceX used the Falcon 9 first stage B1083 for the mission. It was the booster’s 16th flight, following launches of missions like IM-2, Polaris Dawn and Crew-8.
About 8.5 minutes after liftoff, B1083 landed on the drone ship ‘Just Read the Instructions,’ positioned in the Atlantic Ocean. It was the 137th landing on this vessel and the 549th booster landing for SpaceX to date.
Note: Mortgage rates are from MortgageNewsDaily.com and are for top tier scenarios.
This morning rose, receiving a messenger from Sir G. Carteret and a letter from Mr. Coventry, one contrary to another, about our letter to my Lord Treasurer, at which I am troubled, but I went to Sir George, and being desirous to please both, I think I have found out a way to do it. So back to the office with Sir J. Minnes, in his coach, but so great a snow that we could hardly pass the streets. So we and Sir W. Batten to the office, and there did discourse of Mr. Creed’s accounts, and I fear it will be a good while before we shall go through them, and many things we meet with, all of difficulty. Then to the Dolphin, where Sir J. Minnes, Sir W. Batten, and I, did treat the Auditors of the Exchequer, Auditors Wood and Beale, and hither come Sir G. Carteret to us. We had a good dinner, cost us 5l. and 6s., whereof my share 26s., and after dinner did discourse of our salarys and other matters, which I think now they will allow.
Thence home, and there I found our new cook-mayde Susan come, who is recommended to us by my wife’s brother, for which I like her never the better, but being a good well-looked lass, I am willing to try, and Jane begins to take upon her as a chamber-mayde. So to the office, where late putting papers and my books and businesses in order, it being very cold, and so home to supper.
Links for you. Science:
Top Ivermectin Prescriber Now in CDC’s Second Highest Position
Louisiana health official who halted state vaccine campaign tapped as CDC’s No. 2
Old vaccine technology wasn’t safer—it was smallpox scabs
Why Won’t Nora The Leopard Seal Abandon Her Dead Pups?
While no one was watching: Tenuous status of CDC prion unit, risk of CWD to people worry scientists
Yes, COVID Vaccines Do Slow Transmission
Other:
The moment it all clicked for the Replacements — before it fell apart
To Deny The Role of Social Media In Propagating Misinformation is a Form of Germ Theory Denial, Not For Pathogens, But For Ideas
Shorter Days, Signs of Fatigue: Trump Faces Realities of Aging in Office
We Can’t Diet and Exercise Our Way Out of the Next Pandemic
Are Zohran Mamdani and Katie Wilson Democratic Socialists or FDR Democrats? They Are Both
D.C. police got me buzzed to help stop drunk drivers
Sean Duffy Serves Neoliberalism for Thanksgiving
‘I’m losing everything:’ Babson College freshman speaks out about her deportation to Honduras ahead of Thanksgiving
Oaths Of Office, And How Everyone Not Moving To Impeach Trump Is Violating Their Own
Trump Administration Says It Will Not Commemorate World AIDS Day
Indigenous actor Elaine Miles says ICE called her tribal ID ‘fake’
Trump’s EPA moves to abandon tough standards for deadly soot pollution
Trump says Haiti no longer meets requirements for TPS. Haitians have to leave
A notorious proto-MAGA crank may have one more comeback left in him
AAP: ‘Stop wasting government resources to amplify false claims’ about vaccines, autism
My Weekend With the Anti-Vaxxers
Green Card Interviews End in Handcuffs for Spouses of U.S. Citizens
MAGA influencer Benny Johnson’s job is to have no principles. Business is booming.
AI data center ‘frenzy’ is pushing up your electric bill — here’s why
Trump gets it wrong claiming no murders in DC for the last six months
Controversial Louisiana surgeon general tapped for CDC leadership role
AIA members believe James McCrery, Trump’s White House ballroom architect, may have violated the AIA’s Code of Ethics & Professional Conduct (actual letter here)
Tears flowed in S.F. courtroom as immigration judge was fired mid-hearing
X’s new location labels unmask users. Insiders say the idea was rejected for years.
Journalists Have Been Turning Into AI Slop For Years
The Replacements’ ‘Let It Be’ Reissue Catches the Minneapolis Indie Legends’ Courage at its Peak
Why MAGA Influencers Are Masquerading As Americans on X
The Last Americans Really Paying Taxes
AI isn’t replacing radiologists
Amid Rise of RFK Jr., Officials Waver on Drinking Water Fluoridation — Even in the State Where It Started
A trained observer of psychology, Alina had made careful note of the strategies that seemed to work for these gazelle-like young women. For example: “A man values a woman a lot more if she is constantly dragging presents out of him,” she said, “and he values her a lot more than the woman who says, ‘No, no, no, I don’t need anything.’ ” Alina cradled her teacup, half awestruck. “They get everything this way,” she said. “I think that these things should be explained to girls in childhood. It’s very important. And it doesn’t matter if the girl is smart or not, because you can have a girl who goes to university and gets a Ph.D. and is tremendously accomplished but then loses to these pretty young things who will take away her husband before she can count to three.”
Here is more from Julia Ioffe in The New Yorker, interesting throughout.
The post Claims about Russian women (and men) appeared first on Marginal REVOLUTION.
Available indicators suggest that economic activity has been expanding at a moderate pace. Job gains have slowed this year, and the unemployment rate has edged up through September. More recent indicators are consistent with these developments. Inflation has moved up since earlier in the year and remains somewhat elevated.
The Committee seeks to achieve maximum employment and inflation at the rate of 2 percent over the longer run. Uncertainty about the economic outlook remains elevated. The Committee is attentive to the risks to both sides of its dual mandate and judges that downside risks to employment rose in recent months.
In support of its goals and in light of the shift in the balance of risks, the Committee decided to lower the target range for the federal funds rate by 1/4 percentage point to 3-1/2 to 3‑3/4 percent. In considering the extent and timing of additional adjustments to the target range for the federal funds rate, the Committee will carefully assess incoming data, the evolving outlook, and the balance of risks. The Committee is strongly committed to supporting maximum employment and returning inflation to its 2 percent objective.
In assessing the appropriate stance of monetary policy, the Committee will continue to monitor the implications of incoming information for the economic outlook. The Committee would be prepared to adjust the stance of monetary policy as appropriate if risks emerge that could impede the attainment of the Committee's goals. The Committee's assessments will take into account a wide range of information, including readings on labor market conditions, inflation pressures and inflation expectations, and financial and international developments.
The Committee judges that reserve balances have declined to ample levels and will initiate purchases of shorter-term Treasury securities as needed to maintain an ample supply of reserves on an ongoing basis.
Voting for the monetary policy action were Jerome H. Powell, Chair; John C. Williams, Vice Chair; Michael S. Barr; Michelle W. Bowman; Susan M. Collins; Lisa D. Cook; Philip N. Jefferson; Alberto G. Musalem; and Christopher J. Waller. Voting against this action were Stephen I. Miran, who preferred to lower the target range for the federal funds rate by 1/2 percentage point at this meeting; and Austan D. Goolsbee and Jeffrey R. Schmid, who preferred no change to the target range for the federal funds rate at this meeting.
emphasis added
The FBI is warning of AI-assisted fake kidnapping scams:
Criminal actors typically will contact their victims through text message claiming they have kidnapped their loved one and demand a ransom be paid for their release. Oftentimes, the criminal actor will express significant claims of violence towards the loved one if the ransom is not paid immediately. The criminal actor will then send what appears to be a genuine photo or video of the victim’s loved one, which upon close inspection often reveals inaccuracies when compared to confirmed photos of the loved one. Examples of these inaccuracies include missing tattoos or scars and inaccurate body proportions. Criminal actors will sometimes purposefully send these photos using timed message features to limit the amount of time victims have to analyze the images.
Images, videos, audio: It can all be faked with AI. My guess is that this scam has a low probability of success, so criminals will be figuring out how to automate it.
GDP projections of Federal Reserve Governors and Reserve Bank presidents, Change in Real GDP¹

| Projection Date | 2025 | 2026 | 2027 | 2028 |
|---|---|---|---|---|
| Dec 2025 | 1.6 to 1.8 | 2.1 to 2.5 | 1.9 to 2.3 | 1.8 to 2.1 |
| Sept 2025 | 1.4 to 1.7 | 1.7 to 2.1 | 1.8 to 2.0 | 1.7 to 2.0 |

Unemployment projections of Federal Reserve Governors and Reserve Bank presidents, Unemployment Rate²

| Projection Date | 2025 | 2026 | 2027 | 2028 |
|---|---|---|---|---|
| Dec 2025 | 4.5 to 4.6 | 4.3 to 4.4 | 4.2 to 4.3 | 4.0 to 4.3 |
| Sept 2025 | 4.4 to 4.5 | 4.4 to 4.5 | 4.2 to 4.4 | 4.0 to 4.3 |

Inflation projections of Federal Reserve Governors and Reserve Bank presidents, PCE Inflation¹

| Projection Date | 2025 | 2026 | 2027 | 2028 |
|---|---|---|---|---|
| Dec 2025 | 2.8 to 2.9 | 2.3 to 2.5 | 2.0 to 2.2 | 2.0 |
| Sept 2025 | 2.9 to 3.0 | 2.4 to 2.7 | 2.0 to 2.2 | 2.0 |

Core inflation projections of Federal Reserve Governors and Reserve Bank presidents, Core PCE Inflation¹

| Projection Date | 2025 | 2026 | 2027 | 2028 |
|---|---|---|---|---|
| Dec 2025 | 2.9 to 3.0 | 2.4 to 2.6 | 2.0 to 2.2 | 2.0 |
| Sept 2025 | 3.0 to 3.2 | 2.5 to 2.7 | 2.0 to 2.2 | 2.0 |
2. Historian sentences to ponder.
3. New paper on why female labor force participation is sometimes low.
4. On my chat with Dan Wang. And an Indian converses with Tyler Cowen at the Bangalore EV meet up.
5. The labor market is not yet very much rewarding claimed skills in AI?
6. Capping Swiss population at ten million polls reasonably well there (Bloomberg).
7. The best philosophy lectures on YouTube?
8. A Javier-Milei-Institut has been founded in Germany.
The post Wednesday assorted links appeared first on Marginal REVOLUTION.
It's hard to effectively ban something that is legal in neighboring jurisdictions.
The NYT has the story:
"Abortion is legal in Illinois, but the state is surrounded by others that have largely banned the procedure in the three years since the Supreme Court overturned Roe v. Wade. As a result, Illinois now leads the nation in out-of-state abortion patients. Carbondale, a college town in Illinois’s southern tip within driving distance of 10 states with abortion bans, has become a major abortion hub.
"Last year three clinics in this city of 21,000 provided close to 11,000 abortions, almost all for women from other states. The numbers, provided by the clinics, account for nearly a third of all out-of-state abortions in Illinois.
...
"The clinics have already drawn protests as well as intervention efforts from Coalition Life, a St. Louis-based anti-abortion group that stations “sidewalk counselors” outside Carbondale’s clinics.
...
"In states without total bans, there were 1,038,100 clinician-provided abortions in 2024, according to the Guttmacher Institute, a research organization that supports abortion rights. The number includes 155,000 abortions for patients who had crossed state lines. Overall, the number of abortions in the country has slightly increased since the Dobbs decision, largely because of medication abortions."

NASA has lost contact with a Mars orbiter that has circled the planet for more than a decade, collecting science data and serving as a key communications relay.
The post NASA loses contact with MAVEN Mars orbiter appeared first on SpaceNews.

A space solar power startup has emerged from stealth after demonstrating a key technology for its plans to transmit power from space to the Earth.
The post Overview Energy demonstrates technologies for space solar power appeared first on SpaceNews.
Mortgage applications increased 4.8 percent from one week earlier, according to data from the Mortgage Bankers Association’s (MBA) Weekly Mortgage Applications Survey for the week ending December 5, 2025. Last week’s results included an adjustment for the Thanksgiving holiday.
The Market Composite Index, a measure of mortgage loan application volume, increased 4.8 percent on a seasonally adjusted basis from one week earlier. On an unadjusted basis, the Index increased 49 percent compared with the previous week. The Refinance Index increased 14 percent from the previous week and was 88 percent higher than the same week one year ago. The seasonally adjusted Purchase Index decreased 2 percent from one week earlier. The unadjusted Purchase Index increased 32 percent compared with the previous week and was 19 percent higher than the same week one year ago.
“Compared to the prior week’s data, which included an adjustment for the Thanksgiving holiday, mortgage application activity increased last week, driven by an uptick in refinance applications,” said Joel Kan, MBA’s Vice President and Deputy Chief Economist. “Conventional refinance applications were up almost 8 percent and government refinances were up 24 percent as the FHA rate dipped to its lowest level since September 2024. Conventional purchase applications were down for the week, but there was a 5 percent increase in FHA purchase applications as prospective homebuyers continue to seek lower downpayment loans. Overall purchase applications continued to run ahead of 2024’s pace as broader housing inventory and affordability conditions improve gradually.”
...
The average contract interest rate for 30-year fixed-rate mortgages with conforming loan balances ($806,500 or less) increased to 6.33 percent from 6.32 percent, with points increasing to 0.60 from 0.58 (including the origination fee) for 80 percent loan-to-value ratio (LTV) loans.
emphasis added
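For readers unfamiliar with mortgage "points": each point is an upfront fee equal to 1% of the loan amount. A back-of-envelope sketch of what the figures quoted above imply, using the conforming loan limit as an assumed loan size:

```python
# What 0.60 points costs upfront on a loan at the conforming limit.
# One point = 1% of the loan amount; the loan size here is the
# $806,500 conforming limit quoted in the survey, used as an example.
loan_amount = 806_500
points = 0.60

upfront_cost = loan_amount * points / 100
print(f"${upfront_cost:,.2f}")  # $4,839.00

# An 80% loan-to-value ratio at that loan size implies a purchase price of:
implied_price = loan_amount / 0.80
print(f"${implied_price:,.2f}")  # $1,008,125.00
```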
On September 14, 2015, our first publicly-trusted certificate went live. [...] Today, Let’s Encrypt is the largest certificate authority in the world in terms of certificates issued, the ACME protocol we helped create and standardize is integrated throughout the server ecosystem, and we’ve become a household name among system administrators. We’re closing in on protecting one billion web sites.
Their growth rate and numbers are wild:
In March 2016, we issued our one millionth certificate. Just two years later, in September 2018, we were issuing a million certificates every day. In 2020 we reached a billion total certificates issued and as of late 2025 we’re frequently issuing ten million certificates per day.
According to their stats, the share of Firefox traffic protected by HTTPS doubled from 39% at the start of 2016 to ~80% today. I think it's difficult to overestimate the impact Let's Encrypt has had on the security of the web.
Via Hacker News
- Devstral 2: SOTA open model for code agents with a fraction of the parameters of its competitors and achieving 72.2% on SWE-bench Verified.
- Up to 7x more cost-efficient than Claude Sonnet at real-world tasks.
Devstral 2 is a 123B model released under a janky license - it's "modified MIT" where the modification is:
You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company (or that of your employer) exceeds $20 million (or its equivalent in another currency) for the preceding month. This restriction in (b) applies to the Model and any derivatives, modifications, or combined works based on it, whether provided by Mistral AI or by a third party. [...]
Devstral Small 2 is under a proper Apache 2 license with no weird strings attached. It's a 24B model which is 51.6GB on Hugging Face and should quantize to significantly less.
I tried out the larger model via my llm-mistral plugin like this:
```
llm install llm-mistral
llm mistral refresh
llm -m mistral/devstral-2512 "Generate an SVG of a pelican riding a bicycle"
```

For a ~120B model that one is pretty good!
Here's the same prompt with -m mistral/labs-devstral-small-2512 for the API hosted version of Devstral Small 2:

Again, a decent result given the small parameter size. For comparison, here's what I got for the 24B Mistral Small 3.2 earlier this year.
Tags: ai, generative-ai, llms, llm, mistral, pelican-riding-a-bicycle, llm-release, janky-licenses
I talked to Brendan Samek about Canada Spends, a project from Build Canada that makes Canadian government financial data accessible and explorable using a combination of Datasette, a neat custom frontend, Ruby ingestion scripts, sqlite-utils and pieces of LLM-powered PDF extraction.
Here's the video on YouTube.
Sections within that video:
Build Canada is a volunteer-driven non-profit that launched in February 2025 - here's some background information on the organization, which has a strong pro-entrepreneurship and pro-technology angle.
Canada Spends is their project to make Canadian government financial data more accessible and explorable. It includes a tax sources and sinks visualizer and a searchable database of government contracts, plus a collection of tools covering financial data from different levels of government.
The project maintains a Datasette instance at api.canadasbilding.com containing the data they have gathered and processed from multiple data sources - currently more than 2 million rows plus a combined search index across a denormalized copy of that data.
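Datasette exposes every table as a JSON API at `/{database}/{table}.json`, with `?_search=` running queries against the full-text index. A minimal sketch of querying such an instance; the database and table names below are hypothetical, not Canada Spends' actual schema:

```python
# Sketch: building and (optionally) fetching a Datasette JSON API query.
# _search= runs full-text search, _size= limits rows, and _shape=array
# returns a plain JSON array of row objects.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_search_url(base_url, database, table, query, size=5):
    params = urlencode({"_search": query, "_size": size, "_shape": "array"})
    return f"{base_url}/{database}/{table}.json?{params}"

url = build_search_url("https://api.canadasbilding.com", "spends", "contracts", "consulting")
print(url)

# To actually fetch rows (a live network call, so commented out here):
# with urlopen(url) as response:
#     rows = json.load(response)
```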

The highest quality government financial data comes from the audited financial statements that every Canadian government department is required to publish. As is so often the case with government data, these are usually published as PDFs.
Brendan has been using Gemini to help extract data from those PDFs. Since this is accounting data the numbers can be summed and cross-checked to help validate the LLM didn't make any obvious mistakes.
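That cross-checking idea can be sketched simply: assuming extracted line items arrive as amounts alongside a reported total (the field names here are my own, not the project's schema), validation reduces to a sum comparison:

```python
# Sketch: validate LLM-extracted accounting data by checking that the
# extracted line items sum to the document's reported total, within a
# small tolerance for rounding. Mismatches get flagged for human review.
def validate_extraction(line_items, reported_total, tolerance=0.01):
    """Return True if the extracted line items sum to the reported total."""
    extracted_sum = sum(item["amount"] for item in line_items)
    return abs(extracted_sum - reported_total) <= tolerance

# Illustrative example, not real Canadian government figures:
items = [
    {"name": "Salaries", "amount": 1_200_000.00},
    {"name": "Transfers", "amount": 850_000.00},
    {"name": "Capital", "amount": 300_000.00},
]
print(validate_extraction(items, 2_350_000.00))  # True: sums match
print(validate_extraction(items, 2_400_000.00))  # False: flag for review
```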
Tags: data-journalism, politics, sqlite, youtube, datasette, sqlite-utils
It seems not:
De Loecker et al. (2020) (DEU) estimate that markups increased significantly in the United States from 1955 to 2016. We find this result is sensitive to unreported sample restrictions that drop 27% of the available observations. Applying the methodology as described in the article to the full sample, markup increases are more muted until late in the sample period, and are almost entirely driven by Finance and Insurance firms. If these firms are removed, markup increases are modest. We conclude that the DEU methodology and data, as they are described in the article, do not support the conclusion that broad-based increases in market power have occurred in recent decades.
That is from a recent NBER working paper by Benkard, Miller, and Yurukoglu.
The post Did market power go up so much? appeared first on Marginal REVOLUTION.
YouTube in particular, and sometimes X, are among the very best ways to learn about the world. To the extent that the law is effectively enforced, targeting YouTube will have a terrible effect on youth science, and the ability of young scientists and founders to get their projects off the ground will take a huge and possibly fatal hit. If you are only allowed to learn from the internet at age 16, you are probably not ready for marvelous achievements at age 18 or perhaps not even at 20. The country may become more mediocre.
The more serious concern is that this represents a major expansion of government control over tech services and also speech. Over time the government has to decide which are the approved tech companies and services and which are not. That becomes a politicized decision, as any chosen lines will be arbitrary, especially as online services evolve in their functionality. For instance, if excess video usage is what is problematic, it is possible for videos to be embedded more seamlessly into some future version of WhatsApp, an exempt service. Or Australian youth, even under the new law, will be able to access video on a laptop, simply by viewing it and not signing into their accounts…
I predict that either this law stops being effectively enforced, or the controls on companies and users have to become much, much tighter and more oppressive. In a large poll of Australian 9 to 16-year-olds, only 6 percent of them thought the new ban was going to work.
That is true for yet another reason. With gaming and messaging exempt from the ban, we can expect old-style “social media” to move into those areas. It already was the case that Fortnite and other gaming services served as social media networks, and that trend will be accelerated. Discord, for instance, is exempt from the ban, a glaring hole, and in a fast-changing market there probably will be some significant loopholes most of the time. For the ban to continue to work, it will have to spread. It is hard to think of an area of internet services that could not, in principle, serve social media–like functions, or produce the harms being attributed to online life. Regulation of artificial intelligence services is perhaps the next logical albeit misguided move here.
Who is in charge of the family anyway? If I have decided that my 15-year-old should be free to follow Magnus Carlsen on X and YouTube, should we have the boot of the state tell me this is forbidden? This is a big move in the direction of what Socrates advocated in The Republic, namely that the state takes priority over the family in deciding which stories can be told to the youth.
Over time, I expect this ban, again assuming it is kept and enforced, to become one of the biggest free speech restrictions on the internet. Government agencies have every incentive to boost their budgets, spread their mandates, and enforce their dictates. What starts with a nation’s youth rarely ends there.
You might think that Australia’s regulatory guardians can be trusted to uphold free speech ideals, but has that been the case to date? Under Australian law, it is permissible to restrict free speech for reasons of public order, national security, and protection from harm. That includes limits on “hate speech,” prompting Elon Musk to exaggerate and call the country fascist. Nonetheless, the country does not have anything comparable to America’s First Amendment free speech protections.
So why should we empower Australian regulators and restrict free speech further?
It is very defensible to worry that your kid is on his or her phone too much. Furthermore, school bans or limits on smartphone usage are likely to bring some measurable but small gains.
But if you think a massive expansion of state authority over online content is the answer, you ought to know that the associated gains from that decision will at best be modest. You will not be saving civilization or our youth; rather you will be joining the ever-growing parade against free speech.
Recommended, and in this recent piece Ben Yeoh surveys the research-based literature on social media and teen harm.
The post Australia should not ban under-16s from internet sites appeared first on Marginal REVOLUTION.
Update Dec. 10, 9 a.m. EST (1400 UTC): SpaceX confirms deployment of the 27 Starlink satellites.
SpaceX executed a pre-dawn launch from Vandenberg Space Force Base on Wednesday morning. The flight was the 160th of a Falcon 9 rocket so far in 2025.
The Starlink 15-11 mission added another 27 broadband internet satellites to its growing megaconstellation in low Earth orbit. Liftoff from Space Launch Complex 4 East happened at 3:40 a.m. PST (6:40 a.m. EST / 1140 UTC).
This was the third-fastest turnaround of SLC-4E to date.
SpaceX launched the mission using the Falcon 9 booster B1082. This was its 18th flight, following missions like USSF-62, NROL-145, and OneWeb Launch 20.
About 8.5 minutes after liftoff, B1082 landed on the drone ship ‘Of Course I Still Love You,’ marking the 169th landing on that vessel and the 548th booster landing to date.
Deployment of 27 @Starlink satellites confirmed pic.twitter.com/3ZNICuEjIO
— SpaceX (@SpaceX) December 10, 2025