Today I Learned

July 2025

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf

This caused a huge splash, so you may have already read it. I think it's generally full of excellent insights. There have been a lot of these types of papers experimenting on LLMs to really dissect them and try to understand scientifically how they work.

I am by no means going to claim to understand LLMs' emergent behaviour. However, one thing I'm starting to consider is that this may just be what a kind of lossy text compression looks like. I'll probably end up posting the keynote at some point, but I remember John Carmack once saying something to the effect of, "Procedural generation is just glorified compression." When you look at procedural generation from the perspective of a decompressor, the two look very similar.

Think of it this way: a compressor is a program that takes some input and creates a different program which, when run through an interpreter (the decompressor), yields a result with desired properties mimicking the original data. That last bit may sound weird, but only if you focus on lossless compression. Lossy compression is where it really starts getting interesting. Lossless has a fairly high lower bound on how small the output can get. Lossy, now that's an area where we get subjective and weird.

For example, keep dialing down the quality on the compression. When you do, you start to get less faithful artifacts: things that are derived from the original, that in some way bear a statistical relation to the original, but whose form after decompression starts to take on a character of its own. Now imagine that an artist is working on the decompressor so that these artifacts are actually somewhat pleasing and desirable instead of the more typical ugly compression artifacts.

With that in mind, doesn't procedural generation sound a lot like substituting a human for the compressor? They take a bunch of input data and craft a program which, when executed, yields a result with desired properties mimicking the original data. Take a level generator: the human is removing all the redundant level geometry and instead creating a statistical model that encodes all that redundant information. Now, instead of decompressing to get all possible geometries at once, you sample, and decompress only part of the space.

Seen that way, LLMs start to sound an awful lot like a compression algorithm. You generate a large model whose data consists of all this statistical similarity to the original, and in the output you sample the space, starting at a point (the prompt) and expanding to generate a position in the space of all compressed inputs.
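
To make the analogy concrete, here's a toy sketch. It's mine, not the paper's, and a drastic simplification of how any real LLM works: build a character-bigram table from a tiny corpus, then "decompress" by sampling forward from a prompt. The output bears a statistical relation to the corpus without reproducing it, which is exactly the lossy, artifact-laden behaviour described above.

    /* Toy "lossy decompressor": a character-bigram model.
     * Purely an illustration of the analogy, nothing more. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    static unsigned counts[256][256]; /* counts[a][b]: times b followed a */

    static void train(const char *corpus)
    {
        for (size_t i = 0; corpus[i + 1] != '\0'; i++)
            counts[(unsigned char)corpus[i]][(unsigned char)corpus[i + 1]]++;
    }

    static char sample_next(unsigned char prev)
    {
        unsigned total = 0;
        for (int c = 0; c < 256; c++)
            total += counts[prev][c];
        if (total == 0)
            return ' ';                   /* nothing learned after this character */
        unsigned r = (unsigned)rand() % total;
        for (int c = 0; c < 256; c++) {
            if (r < counts[prev][c])
                return (char)c;
            r -= counts[prev][c];
        }
        return ' ';
    }

    int main(void)
    {
        srand((unsigned)time(NULL));
        train("the cat sat on the mat and the cat ate the rat ");

        const char *prompt = "the c";     /* the starting point */
        printf("%s", prompt);
        unsigned char prev = (unsigned char)prompt[strlen(prompt) - 1];
        for (int i = 0; i < 40; i++) {    /* expand outward from the prompt */
            char next = sample_next(prev);
            putchar(next);
            prev = (unsigned char)next;
        }
        putchar('\n');
        return 0;
    }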

That would explain why they have problems counting the number of "R"s in "strawberry" or solving the Tower of Hanoi: nobody's written the solutions into the mountains of input data. So why can they now count the letters? The companies behind these models are doing all sorts of synthetic training data generation: farms of people creating questions for answers, Jeopardy style, and programs that procedurally generate more training data to bulk out the relatively limited human-generated work and add examples of things humans haven't bothered to publish.

This doesn't fully account for everything these models seem to be able to do, but the research on them also still seems somewhat limited. Science is a very slow process of constantly trying to falsify claims. It's trivial to declare these models a god or sleight of hand; it takes time to exhaustively search for ways a claim could fail, and then to conclude that you've searched long and hard enough that it probably holds. Even then, you keep an eye out for anyone who can show that it doesn't, just in case.

Regardless of what my understanding seems to suggest, my thanks to the researchers here for spending the time to better understand what's going on and give us some great insights into this emergent system.

Lecture Friday: Burning Down the House: Carbon Politics, American Power, and the Almighty Dollar

More context for the seemingly sudden shifts and what they could mean. Mark Blyth has had a few works that seem to be slightly ahead of the mainstream, so I tend to pay close attention to what he's thinking about and why he thinks it matters. Anyone telling you exactly what's in store is very likely to get the specifics wrong, but general trajectories are often easier to get right and prepare for, even if they turn out not to matter.

Also, this is probably one of the first question-and-answer periods for a talk that had any value whatsoever. People in the audience were genuinely trying to better understand these ideas and play with them together through dialogue, not just grandstanding or asking about irrelevant things they happen to know about.

I love coupling this with the video friendlyjordies did about how Australia, having re-elected its Labor government, is now perfectly set up to do what America couldn't. Australia has Future Made in Australia, which has a lot of the same public-private investment partnerships the now-abandoned IRA had. With the end of the latter, billions, if not trillions, in private investment are potentially up for grabs. If not Canada, at least our sister state has a shot at the prize.

You Can Choose Tools That Make You Happy

https://borretti.me/article/you-can-choose-tools-that-make-you-happy

I maintain a website that I've built entirely by hand, read by basically nobody, for no other reason than the joy it brings me and maybe the vain hope that I can inspire someone to see what I see, or at the very least, consider reconsidering.

That said, yeah, get weird. Never fool yourself, you're the easiest to fool, but also never give in and never give up. As Bernard Shaw wrote, "The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man."

Nobody would have created many of the incredibly cool things we have if they'd stuck to what was easy and what already worked. They thought they saw what could work better and went for it. Many failed, some were right, a few even paid a high price for their dream. Most of the things around you are merely an accident of history and incredibly hard work, not some divine truth. Reinvent things now and again for no other reason than you can. Do unconventional things for unconventional outcomes, to see what's possible, and to see if you really can make things better. Build your dreams, and maybe change the world. Probably not, but the avalanche starts with a single snowflake.

Wikipedia: Norman Borlaug

https://en.wikipedia.org/wiki/Norman_Borlaug

Another one of those people I brought up in a conversation that I think more people should know about. There may not be anyone less well known who's had as big an impact as Norman Borlaug. If you're alive today, that's partly the fault of Norman Borlaug. Truly one of the greats.

A Non-anthropomorphized View of LLMs

https://addxorrol.blogspot.com/2025/07/a-non-anthropomorphized-view-of-llms.html

Less a learning and more a plea. It's so cringe listening to people talk about statistics like they're omnipotent, going to wipe out all life on earth, or going to obsolete the terrain in favour of the map. Conversations become much more productive when sophists stop trying to bear witness to the next coming in the output of gradient descent. It's just a tool. It's not magic. Yes, this has significantly surpassed the state-of-the-art NLP models of just a few years ago. No, you shouldn't be using it in lieu of actually learning how statistics work and then picking the right model for the job.

As someone who's been using computational statistics and machine learning for about a decade, I had, years ago, written off large parts of the literature as mostly non-replicable p-hacking by people running the software equivalent of the egg-drop experiment. So many "advances" boil down to moving the goalposts by making up a benchmark you're conveniently the best at, overfitting a model and declaring it state-of-the-art, Frankensteining an ensemble for a one percent improvement at ten times the cost, or one of a few other shenanigans researchers have been routinely pulling since AlexNet really shook things up.

As it turns out, I was right that "Attention Is All You Need" was, in fact, game-changing. The game I wasn't really paying attention to was BERT, a model so bloated and of such limited utility that I thought it was mostly a joke at the time. Turns out, if you just keep going with the joke, well past reason, until your model uses so much energy that it competes with the fossil fuel industry to see who can cook the planet faster, there are some interesting and useful properties to the joke.

Just please stop telling me it's thinking, or alive, or conscious, or any other biological adjective. We still have neither a scientific nor a philosophical understanding of what thinking even is. It's just statistically likely unstructured data. Is it useful? Yeah, in some applications. Is it reliable? Reliable enough for some applications. Is it efficient? Mostly no, but some applications don't yet seem to have efficient alternatives. Using it for the ones that do seems pretty foolish given the situation.

Just try and stop it with the pareidolia.

Handles Are The Better Pointers

https://floooh.github.io/2018/06/17/handles-vs-pointers.html

Great breakdown of how to handle memory without uncontrolled malloc()/new everywhere. Instead, create a manager for a subsystem that has its own memory pool; it can then issue and accept array indices (either into the pool or into a lookup structure) through its API.

I'd heard about the technique elsewhere before, but this post breaks it down and gives some great examples. You've already been doing this with files, windows, and other resources the operating system provides you, so you may even understand the idea intuitively. Either way, a great piece on the technique.
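
A minimal sketch of the pattern as I read it (the names thing_handle, thing_slot, and the exact layout are mine; pairing the index with a generation counter is one common way such handles detect stale references):

    /* Sketch of index-based handles: a subsystem owns a fixed pool and
     * hands out {index, generation} pairs instead of raw pointers.
     * Names and layout are illustrative, not the article's code. */
    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define POOL_SIZE 64

    typedef struct { uint32_t index; uint32_t generation; } thing_handle;

    typedef struct {
        float    value;       /* whatever the subsystem actually stores */
        uint32_t generation;  /* bumped on free so stale handles fail lookup */
        bool     in_use;
    } thing_slot;

    static thing_slot pool[POOL_SIZE];

    thing_handle thing_alloc(void)
    {
        for (uint32_t i = 0; i < POOL_SIZE; i++) {
            if (!pool[i].in_use) {
                pool[i].in_use = true;
                return (thing_handle){ i, pool[i].generation };
            }
        }
        return (thing_handle){ UINT32_MAX, 0 };   /* pool exhausted */
    }

    /* Resolve a handle to a pointer only briefly, inside the subsystem. */
    thing_slot *thing_lookup(thing_handle h)
    {
        if (h.index >= POOL_SIZE) return NULL;
        thing_slot *s = &pool[h.index];
        if (!s->in_use || s->generation != h.generation) return NULL;
        return s;
    }

    void thing_free(thing_handle h)
    {
        thing_slot *s = thing_lookup(h);
        if (s) {
            s->in_use = false;
            s->generation++;              /* invalidate outstanding handles */
        }
    }

    int main(void)
    {
        thing_handle h = thing_alloc();
        thing_lookup(h)->value = 42.0f;
        thing_free(h);
        printf("stale lookup: %p\n", (void *)thing_lookup(h));  /* NULL */
        return 0;
    }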

Linux Is Dead, Long-Live Docker Monoculture

https://antranigv.am/posts/2021/08/2021-08-13-13-37/

I'm the type of person to go listen to a live symphony orchestra once every few years. I went to a performance of a famous classic symphony that was preceded by the premiere of a brand new symphony. A brand new symphony, delightful, and all I could think while listening was: oh, it's just a movie score.

When the only work for composers is scoring movies, new symphonies are going to sound a lot like movie scores.

When the only work for developers is SaaS, software's going to all start to look like web shit.

So now the Windows Start menu is written in React, otherwise promising image viewers require setting up a database and Docker, and developers expect you to install their applications by intentionally opening yourself up to remote code execution.

Unless a market and/or business model is soon found to bring about a renaissance of desktop application development, I'm finding I have to agree with Casey Muratori that gaming really will become the Irish Monasteries of software development.

You MUST listen to RFC 2119

https://ericwbailey.website/published/you-must-listen-to-rfc-2119/

This is pretty funny. I love when someone pays an artist just to bring something fun into the world. I do it now and again but we can always use more of that.

Art and fun aside, if you don't know RFC 2119, now's your chance to learn one of the most influential RFCs ever written.

Already know that one by heart? Do you know about the related RFC 6919?

If you already know both of those, well then perhaps I could interest you in RFC 3339, which is what people usually mean when they say ISO 8601.
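
If you want to see what that buys you, a valid RFC 3339 timestamp looks like 2025-07-04T12:30:00Z. Here's a quick sketch (my example, not the post's) of printing one in C:

    /* Illustration: print an RFC 3339 timestamp, the profile of
     * ISO 8601 that most people actually mean. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        char buf[32];
        time_t now = time(NULL);
        struct tm *utc = gmtime(&now);    /* UTC, so the trailing "Z" is correct */
        strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", utc);
        puts(buf);                        /* e.g. 2025-07-04T12:30:00Z */
        return 0;
    }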

I could go on, but I won't.