I was changing teams, lost access to the Anthropic models I’d been using, and needed to debug a really awkward Java 11 issue in an old codebase. JaCoCo was misbehaving in a way that only showed up under a very specific combination of test runners and build flags. Classic “this will ruin your Friday evening” bug.
Anthropic’s models had taken a swing at it before and missed.
Out of necessity, I pointed GPT-5 at the problem instead.
It didn’t just guess – it walked the classloaders, build config, and JaCoCo setup, explained exactly why coverage was broken, and proposed a small, precise fix that actually worked. No yak-shaving, no vague “try cleaning your build” advice. Just: here’s the edge case, here’s why it fails, here’s the patch.
That was the moment I realised: okay, this isn’t just another autocomplete toy. This is a different tier of “pair programmer”.
Using different models for different jobs
Since then I’ve been leaning hard into the newer GPT coding models, and I’m genuinely impressed by how effective they are when you use them deliberately, instead of treating them as one big magic box.
Roughly, my workflow now looks like:
Thinking models for planning

I use the “thinking” models as a kind of systems architect + product partner:

- clarify requirements
- map the system
- design interfaces
- plan test strategies
- generate task lists and prompts
Coding models for execution

Once the plan is solid, I hand off to a coding-specialist model to:

- scaffold services, tests, and pipelines
- refactor existing modules
- implement specific stories or bug fixes
- optimise hot paths
The key is that I don’t ask the coding model to invent the plan and write the code and judge its own work. I let “thinking” models do the heavy lifting on context and intent, then feed that into the coding model as a very explicit brief.
Human-in-the-loop is non-negotiable: I still own the architecture, the trade-offs, and the final review. But the amount of cognitive load that gets lifted is enormous.
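The shape of that loop can be sketched in a few lines. Everything here is a stand-in: `plan_with_thinking_model` and `implement_with_coding_model` are hypothetical names, not a real API — the point is the structure, not the calls: planning produces an explicit brief, execution works only from that brief, and a human gate sits at the end.

```python
# A minimal sketch of the two-stage handoff, under the assumption that
# each model call is wrapped behind a stub function. The function names
# are hypothetical, not any vendor's real API.
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Explicit handoff artifact produced by the planning stage."""
    requirements: list = field(default_factory=list)
    interfaces: list = field(default_factory=list)
    test_strategy: str = ""


def plan_with_thinking_model(problem: str) -> Brief:
    # Stand-in for the "thinking" model: clarify, map, design, plan.
    return Brief(
        requirements=[f"clarified: {problem}"],
        interfaces=["ServiceAPI"],
        test_strategy="unit + integration",
    )


def implement_with_coding_model(brief: Brief) -> str:
    # Stand-in for the coding model: execute against the brief only,
    # never invent its own plan.
    return f"patch implementing {brief.interfaces[0]} ({brief.test_strategy})"


def human_review(patch: str) -> bool:
    # The non-negotiable human-in-the-loop gate: architecture,
    # trade-offs, and final sign-off stay with the developer.
    return patch.startswith("patch")


brief = plan_with_thinking_model("JaCoCo coverage broken under Java 11")
patch = implement_with_coding_model(brief)
assert human_review(patch)
print(patch)
```

The design choice worth stressing is the `Brief` in the middle: because the handoff is a concrete artifact rather than a shared chat history, you can inspect, edit, or reject the plan before any code gets written.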
ClubHub as a sandbox
I’ve been using my ClubHub side project as a testbed for this style of work.
ClubHub (a platform for small clubs and memberships) has been a nice microcosm of real-world mess:
- backend APIs, payments, permissions
- frontend UX experiments
- CI/CD pipelines and quality gates
- integration tests and data migrations
It’s perfect for exploring how far I can push “thinking model → prompt → coding model → CI/CD” as a loop, without sacrificing quality or security.
And honestly? It’s already changed how I think about designing systems. I spend more time shaping intent and constraints, and less time fighting scaffolding.
The future of IDEs (if we’re brave enough)
Right now, my loop looks something like:
web chat (planning) → Cursor IDE (editing) → command line / Codex (automation)
It works… but it’s clunky. It feels like 1996 again: hacking Java in vi, compiling on the command line, and wiring everything together with shell scripts.
I don’t think that’s where we should stop.
I think the next-generation IDE should look more like a rich visual design + intent environment, backed by thinking models that:
- interpret domain models and UX flows
- generate and refine backlogs from high-level designs
- reason about architecture, quality, and security constraints
- break work down into well-formed tasks
…and then hand those tasks off to workbots (coding models, test agents, pipeline agents) that:
- implement changes in small, reviewable increments
- auto-wire tests, observability, and security checks
- open merge requests with clear diffs and rationale
- learn from your project’s actual style and constraints over time
All of this with the human firmly in the loop:
- you sketch, shape, and approve
- the system explains its reasoning
- you can inspect, override, or roll back at any point
That’s the opposite of “AI replaces developers”. It’s “AI makes the right sort of developer work cheaper, faster, and more humane.”
A call to action for IDE builders
To anyone working on IDEs, editors, and dev tools:
Please step up to this.
The models are already good enough to:
- find and fix gnarly bugs in legacy code (my JaCoCo moment)
- reason across config, build, infra, and app code
- act as genuine collaborators, not just autocomplete engines
What we’re missing is the experience layer:
- richer visual design and intent capture
- first-class support for “thinking → coding → CI/CD” workflows
- workbot orchestration with clear, opinionated human-in-the-loop patterns
Right now, a lot of us are duct-taping web chat, Cursor, and command-line scripts together to simulate what the IDE could (and should) be doing for us.
It’s powerful, but it’s clumsy.
It feels exactly like the early days of Java before modern IDEs caught up — when the language was clearly the future, but the tooling hadn’t been invented yet.
We don’t need another slightly-smarter autocomplete bar.
We need IDEs that treat intent, flow, and collaboration with models as first-class citizens.
If you’re building in this space, I’d love to hear what you’re trying. And if you’re a dev experimenting with thinking-model + coding-model workflows, share your loops — I suspect we’re all building the same future from slightly different angles.
