I was changing teams, lost access to the Anthropic models I’d been using, and needed to debug a really awkward Java 11 issue in an old codebase. JaCoCo was misbehaving in a way that only showed up under a very specific combination of test runners and build flags. Classic “this will ruin your Friday evening” bug.
Anthropic’s models had taken a swing at it before and missed.
Out of necessity, I pointed GPT-5 at the problem instead.
It didn’t just guess – it walked the classloaders, build config, and JaCoCo setup, explained exactly why coverage was broken, and proposed a small, precise fix that actually worked. No yak-shaving, no vague “try cleaning your build” advice. Just: here’s the edge case, here’s why it fails, here’s the patch.
That was the moment I realised: okay, this isn’t just another autocomplete toy. This is a different tier of “pair programmer”.
Using different models for different jobs
Since then I’ve been leaning hard into the newer GPT coding models, and I’m genuinely impressed by how effective they are when you use them deliberately, instead of treating them as one big magic box.
Roughly, my workflow now looks like:
Thinking models for planning

I use the “thinking” models as a kind of systems architect + product partner:

- clarify requirements
- map the system
- design interfaces
- plan test strategies
- generate task lists and prompts
Coding models for execution

Once the plan is solid, I hand off to a coding-specialist model to:

- scaffold services, tests, and pipelines
- refactor existing modules
- implement specific stories or bug fixes
- optimise hot paths
The key is that I don’t ask the coding model to invent the plan and write the code and judge its own work. I let “thinking” models do the heavy lifting on context and intent, then feed that into the coding model as a very explicit brief.
Human-in-the-loop is non-negotiable: I still own the architecture, the trade-offs, and the final review. But the amount of cognitive load that gets lifted is enormous.
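The shape of that loop can be sketched in a few lines. Everything here is a stand-in: `plan_with_thinking_model` and `implement_with_coding_model` are hypothetical names, not a real API — the point is the structure, not the calls: planning produces an explicit brief, execution works only from that brief, and a human gate sits at the end.

```python
# A minimal sketch of the two-stage handoff, under the assumption that
# each model call is wrapped behind a stub function. The function names
# are hypothetical, not any vendor's real API.
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Explicit handoff artifact produced by the planning stage."""
    requirements: list = field(default_factory=list)
    interfaces: list = field(default_factory=list)
    test_strategy: str = ""


def plan_with_thinking_model(problem: str) -> Brief:
    # Stand-in for the "thinking" model: clarify, map, design, plan.
    return Brief(
        requirements=[f"clarified: {problem}"],
        interfaces=["ServiceAPI"],
        test_strategy="unit + integration",
    )


def implement_with_coding_model(brief: Brief) -> str:
    # Stand-in for the coding model: execute against the brief only,
    # never invent its own plan.
    return f"patch implementing {brief.interfaces[0]} ({brief.test_strategy})"


def human_review(patch: str) -> bool:
    # The non-negotiable human-in-the-loop gate: architecture,
    # trade-offs, and final sign-off stay with the developer.
    return patch.startswith("patch")


brief = plan_with_thinking_model("JaCoCo coverage broken under Java 11")
patch = implement_with_coding_model(brief)
assert human_review(patch)
print(patch)
```

The design choice worth stressing is the `Brief` in the middle: because the handoff is a concrete artifact rather than a shared chat history, you can inspect, edit, or reject the plan before any code gets written.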
ClubHub as a sandbox
I’ve been using my ClubHub side project as a testbed for this style of work.
ClubHub (a platform for small clubs and memberships) has been a nice microcosm of real-world mess:
- backend APIs, payments, permissions
- frontend UX experiments
- CI/CD pipelines and quality gates
- integration tests and data migrations
It’s perfect for exploring how far I can push “thinking model → prompt → coding model → CI/CD” as a loop, without sacrificing quality or security.
And honestly? It’s already changed how I think about designing systems. I spend more time shaping intent and constraints, and less time fighting scaffolding.
The future of IDEs (if we’re brave enough)
Right now, my loop looks something like:
web chat (planning) → Cursor IDE (editing) → command line / Codex (automation)
It works… but it’s clunky. It feels like 1996 again: hacking Java in vi, compiling on the command line, and wiring everything together with shell scripts.
I don’t think that’s where we should stop.
I think the next-generation IDE should look more like a rich visual design + intent environment, backed by thinking models that:
- interpret domain models and UX flows
- generate and refine backlogs from high-level designs
- reason about architecture, quality, and security constraints
- break work down into well-formed tasks
…and then hand those tasks off to workbots (coding models, test agents, pipeline agents) that:
- implement changes in small, reviewable increments
- auto-wire tests, observability, and security checks
- open merge requests with clear diffs and rationale
- learn from your project’s actual style and constraints over time
All of this with the human firmly in the loop:
- you sketch, shape, and approve
- the system explains its reasoning
- you can inspect, override, or roll back at any point
That’s the opposite of “AI replaces developers”. It’s “AI makes the right sort of developer work cheaper, faster, and more humane.”
A call to action for IDE builders
To anyone working on IDEs, editors, and dev tools:
Please step up to this.
The models are already good enough to:
- find and fix gnarly bugs in legacy code (my JaCoCo moment)
- reason across config, build, infra, and app code
- act as genuine collaborators, not just autocomplete engines
What we’re missing is the experience layer:
- richer visual design and intent capture
- first-class support for “thinking → coding → CI/CD” workflows
- workbot orchestration with clear, opinionated human-in-the-loop patterns
Right now, a lot of us are duct-taping web chat, Cursor, and command-line scripts together to simulate what the IDE could (and should) be doing for us.
It’s powerful, but it’s clumsy.
It feels exactly like the early days of Java before modern IDEs caught up — when the language was clearly the future, but the tooling hadn’t been invented yet.
We don’t need another slightly-smarter autocomplete bar.
We need IDEs that treat intent, flow, and collaboration with models as first-class citizens.
If you’re building in this space, I’d love to hear what you’re trying. And if you’re a dev experimenting with thinking-model + coding-model workflows, share your loops — I suspect we’re all building the same future from slightly different angles.
