The day I started believing

I’ve used AI coding assistants since the first day they were genuinely worth using, and not as a curiosity or for the demo. I’ve used them daily, on real work, on systems other people depend on.

For a long time I was fairly sure I knew exactly what they were. They were a very good autocomplete, a tireless junior who never quite reads the style guide, a way to skip the boring parts of the job. Useful, occasionally astonishing, but bounded. I had a tidy mental model of where the ceiling sat, and I was comfortable working inside it.

That model is gone now, and what follows is the story of the day it finally broke.

The boilerplate years

Late last summer, around August and September, I fell in love with how much sheer tedium these tools could erase. Claude Code had gone generally available in May, OpenAI’s Codex had landed around the same time, Claude Opus 4.1 arrived in August, and then Sonnet 4.5 in September, billed as “the best coding model in the world.” Suddenly the scaffolding of a project, the wiring, the glue, the fortieth CRUD endpoint that looks exactly like the other thirty-nine, all of it could be generated faster than I could type the imports.

It was wonderful, and it saved me real hours. But the hours it saved were hours on work I already knew how to do, and that distinction turns out to matter more than it sounds. Boilerplate is, almost by definition, the stuff that’s already a solved problem in your head, so watching a machine produce it quickly is a relief rather than a revelation. I was delighted and unmoved at the same time.

What I actually believed

Underneath the delight I was holding on to a thesis, and it was a skeptical one. I believed these models were fragile in two specific ways.

The first was context. They forgot things: they lost the thread on anything large, drifted halfway through a refactor, and contradicted decisions they’d made twenty minutes earlier. Simon Willison was already busy writing about the problem and its remedies: context rot, context engineering, the whole emerging discipline of keeping the window honest. To me all of that read as a hard ceiling on anything that didn’t fit inside one tidy head.

The second was conventions. If you didn’t invest serious time up front spelling out how you build things, the model would invent its own conventions, every single time, fresh ones in every session, cheerfully inconsistent with the last. Without a heavy scaffolding of instructions you ended up with a codebase that argued with itself.

So I drew what felt like the obvious conclusion, which was that we were plateauing. Plenty of people were saying the same thing. The tools would stay genuinely useful for small, well-modularized, well-behaved projects, and they’d flounder on anything big, gnarly, or load-bearing. The interesting work, the architecture and the judgment and the hard-won scars, would stay ours.

And here’s the slightly smug part I’m not proud of: I quietly decided this was job security. I figured those of us with decades of systems behind us were safe precisely because of all this. The future I pictured was a long career spent cleaning up after the machines, retrofitting taste into convention-less code and bolting real architecture onto things that had been generated without any. Twenty-five years of hard-won battles, monetized forever as cleanup duty. It was a comfortable theory, and it was also completely wrong.

November

The thesis broke in November, with Claude Opus 4.5, released on the 24th. Anthropic’s framing was the usual superlative, “the best model in the world for coding, agents, and computer use,” and I’d long since learned to discount that kind of language. What I couldn’t discount was the behavior. The change wasn’t really a benchmark; it was that the model could now hold a long, messy, multi-hour problem in its head and stay coherent the whole way through. It excelled, in their words, at “long-horizon, autonomous tasks,” and for once the marketing matched what I was seeing.

Both of the ceilings I’d been so sure about had quietly risen. Context hadn’t disappeared as a problem, but it had become something you manage rather than something you fear. Conventions had stopped being a fight: as I got better at writing my instructions and the models got better at honoring them, the thing started building the way I’d build, instead of improvising a fresh style every time we sat down.

Willison, as it happens, had the better frame for what was going on, and I’d been standing on the wrong side of it. He called it vibe engineering: “the more skills and experience you have as a software engineer the faster and better the results you can get from working with LLMs and coding agents.” The leverage wasn’t flowing to the people who didn’t need to understand the code; it was flowing to the people who understood it best. By the end of the year he’d even put a number on the trajectory I’d dismissed as a plateau: the length of task these systems could take on was doubling roughly every seven months. I hadn’t been looking at a ceiling at all. I’d been looking up a slope, from somewhere near the bottom.
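
To make that slope concrete for myself, I sketched the arithmetic. This is a back-of-the-envelope illustration only: it assumes a perfectly steady exponential with a seven-month doubling period, which is my simplification, and the function name is invented for the example.

```python
def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Multiplier on manageable task length after `months`,
    assuming (hypothetically) a steady doubling every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


# How much longer a task could be after one, two, and four doubling periods.
for months in (7, 14, 28):
    print(f"after {months} months: {growth_factor(months):.0f}x")
```

Under that assumption, a little over two years is four doublings, which is a sixteenfold change. That is not what a plateau looks like.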

The lengths they went to

Let me tell you what finally convinced me, because it wasn’t a benchmark and it wasn’t a toy side project.

I wanted my home network off my ISP’s router and onto hardware I actually control, talking to the fibre directly. Anyone who’s tried this knows the catch: the identity the network checks for, the credentials that authorize your line, are deliberately locked away inside the ISP’s box. They aren’t printed on any screen, and the vendor’s entire design intent is that you never reach them.

This is embedded reverse-engineering, exactly the kind of work I spent years doing by hand. It meant decrypting a proprietary config format, digging a hidden administrator account out of the decoded blob, and climbing a stack of firewall and privilege gates into a deliberately crippled shell. Then it meant finding the one undocumented escalation that drops the cage, and transplanting a device’s burnt-in identity onto a different one, byte by byte, until the network couldn’t tell anything had changed.

I read the writeups from the handful of people who’d pulled this off before me, and on current hardware the documented routes came down to two. You could physically open the case and solder a serial console onto the board, crank the boot log up to its most verbose level, and fish the password out of the device’s startup chatter as it powered on. Or you could lean on an exploit that only works against firmware so old it has all but vanished from the boxes ISPs actually hand out today. The pieces of a software-only path against current firmware did exist, but they were scattered in fragments across forum threads and bug trackers in three languages, and none of them added up to a recipe for the box sitting in my living room.

Claude and Codex assembled that recipe and then ran it, and I don’t mean they helped me brainstorm. They did the work. They read the firmware, cross-referenced those scattered forum posts across three languages, and wrote the Python tooling themselves. They decrypted the config format and pulled the hidden admin account out of the decoded data. They flipped the right internal flags and added the single firewall rule the box was missing, so its own crippled shell would finally answer. From there they coaxed the device into a factory “manufacture mode” that quietly switches off its own command cage, and once inside, cloning the identity onto the replacement unit came down to a short, deliberate sequence of writes. At every step they reasoned through why a given lock existed and which one to attack next, and they walked the whole chain to the end. When my ISP later moved my line onto entirely different hardware and broke all of it, we rebuilt the fix from scratch in about an hour.

Most of this was done with Opus 4.5, a model that is already, by the absurd clock we’re now running on, a few generations old.

I’ve managed infrastructure serving hundreds of millions of people, so I know what this class of work actually costs. I would have bled over it for months, in stolen evenings, fighting hardware that fights back, and that’s the generous comparison. The honest one is that I never would have started, because I didn’t have the months to spend. What changed isn’t that the machine did it faster than I could; it’s that the machine did something I’d quietly filed away under never.

And the whole industry was arriving at the same realization in parallel. The cloud vendors were already shipping agents that resolve incidents and autonomously mitigate production problems. The thing I’d boxed into “writes your boilerplate” was quietly walking into the room where I keep my hardest work.

What I believe now

Since November I keep catching myself doing something I never used to do, which is starting things. Projects I’d carried around for years as someday-maybe, the ones gated not on skill but on time, are actually getting built. Not perfectly, and not without me in the loop, steering and correcting and bringing the taste and the scars. The senior judgment didn’t become worthless, it became the multiplier, and that part of my old theory survived intact. What died was the assumption that the judgment would be spent on cleanup, because it turns out this isn’t cleanup at all. It’s reach.

I used to think these tools guaranteed my job by guaranteeing a mess that only I could fix, and I don’t think that anymore. I think they change what a person like me is actually for. In my last post I wrote that AI doesn’t just accelerate workflows, it changes the primitives those workflows rely on, and I believe that more now than when I wrote it.

There’s a second half to all of this. If one experienced person’s reach can expand this much, then the way teams are built around that person has to change too: how we divide the work, how we review it, what seniority even buys. I have some thoughts on that, but they aren’t ready yet.

So for now I’ll just say the thing I spent the better part of a year refusing to say. I started believing.

Part two, on how teams have to change, is coming.