Saturday, November 29, 2025

I stumbled into GPT-5 for coding a few months ago, almost by accident...


I was changing teams, lost access to the Anthropic models I’d been using, and needed to debug a really awkward Java 11 issue in an old codebase. JaCoCo was misbehaving in a way that only showed up under a very specific combination of test runners and build flags. Classic “this will ruin your Friday evening” bug.

Anthropic's models had taken a swing at it before and missed.

Out of necessity, I pointed GPT-5 at the problem instead.

It didn’t just guess – it walked the classloaders, build config, and JaCoCo setup, explained exactly why coverage was broken, and proposed a small, precise fix that actually worked. No yak-shaving, no vague “try cleaning your build” advice. Just: here’s the edge case, here’s why it fails, here’s the patch.

That was the moment I realised: okay, this isn’t just another autocomplete toy. This is a different tier of “pair programmer”.


Using different models for different jobs

Since then I’ve been leaning hard into the newer GPT coding models, and I’m genuinely impressed by how effective they are when you use them deliberately, instead of treating them as one big magic box.

Roughly, my workflow now looks like:

  • Thinking models for planning
    I use the “thinking” models as a kind of systems architect + product partner:

    • clarify requirements

    • map the system

    • design interfaces

    • plan test strategies

    • generate task lists and prompts

  • Coding models for execution
    Once the plan is solid, I hand off to a coding-specialist model to:

    • scaffold services, tests, and pipelines

    • refactor existing modules

    • implement specific stories or bug fixes

    • optimise hot paths

The key is that I don’t ask the coding model to invent the plan and write the code and judge its own work. I let “thinking” models do the heavy lifting on context and intent, then feed that into the coding model as a very explicit brief.

Human-in-the-loop is non-negotiable: I still own the architecture, the trade-offs, and the final review. But the amount of cognitive load that gets lifted is enormous.


ClubHub as a sandbox

I’ve been using my ClubHub side project as a testbed for this style of work.

ClubHub (a platform for small clubs and memberships) has been a nice microcosm of real-world mess:

  • backend APIs, payments, permissions

  • frontend UX experiments

  • CI/CD pipelines and quality gates

  • integration tests and data migrations

It’s perfect for exploring how far I can push “thinking model → prompt → coding model → CI/CD” as a loop, without sacrificing quality or security.

And honestly? It’s already changed how I think about designing systems. I spend more time shaping intent and constraints, and less time fighting scaffolding.


The future of IDEs (if we’re brave enough)

Right now, my loop looks something like:

web chat (planning) → Cursor IDE (editing) → command line / Codex (automation)

It works… but it’s clunky. It feels like 1996 again: hacking Java in vi, compiling on the command line, and wiring everything together with shell scripts.

I don’t think that’s where we should stop.

I think the next-generation IDE should look more like a rich visual design + intent environment, backed by thinking models that:

  • interpret domain models and UX flows

  • generate and refine backlogs from high-level designs

  • reason about architecture, quality, and security constraints

  • break work down into well-formed tasks

…and then hand those tasks off to workbots (coding models, test agents, pipeline agents) that:

  • implement changes in small, reviewable increments

  • auto-wire tests, observability, and security checks

  • open merge requests with clear diffs and rationale

  • learn from your project’s actual style and constraints over time

All of this with the human firmly in the loop:

  • you sketch, shape, and approve

  • the system explains its reasoning

  • you can inspect, override, or roll back at any point

That’s the opposite of “AI replaces developers”. It’s “AI makes the right sort of developer work cheaper, faster, and more humane.”


A call to action for IDE builders

To anyone working on IDEs, editors, and dev tools:

Please step up to this.

The models are already good enough to:

  • find and fix gnarly bugs in legacy code (my JaCoCo moment)

  • reason across config, build, infra, and app code

  • act as genuine collaborators, not just autocomplete engines

What we’re missing is the experience layer:

  • richer visual design and intent capture

  • first-class support for “thinking → coding → CI/CD” workflows

  • workbot orchestration with clear, opinionated human-in-the-loop patterns

Right now, a lot of us are duct-taping web chat, Cursor, and command-line scripts together to simulate what the IDE could (and should) be doing for us.

It’s powerful, but it’s clumsy.

It feels exactly like the early days of Java before modern IDEs caught up — when the language was clearly the future, but the tooling hadn’t been invented yet.

We don’t need another slightly-smarter autocomplete bar.

We need IDEs that treat intent, flow, and collaboration with models as first-class citizens.

If you’re building in this space, I’d love to hear what you’re trying. And if you’re a dev experimenting with thinking-model + coding-model workflows, share your loops — I suspect we’re all building the same future from slightly different angles.

Turning a One-Off Prompt into a Repeatable Codex Workflow (using Cursor + GPT-5.1 / GPT-5.1-codex)



I went looking for “just a prompt” and accidentally built a tiny AI coworker.

In Cursor, using GPT-5.1 / GPT-5.1-codex, I asked for a stand-alone prompt to help a Codex-style dev improve test coverage for my ClubHub project.

Instead of spitting out a wall of prose, it designed a repeatable workflow: a small Bash launcher that boots Codex into “ClubHub test coverage mode”, wired to project context and a focused coverage brief.

This post walks through that pattern and how you can steal it for your own repo.


The pattern: “Start Codex in project mode for a specific task”

Here’s the core idea the model produced:

  1. Keep your project context in one file
    prompts/system-project-context.md – tech stack, conventions, non-negotiables.

  2. Keep your task brief in another
    e.g. prompts/improve-test-coverage.md – current coverage, targets, files to touch.

  3. Use a tiny script to stitch them together into a single prompt and launch Codex.

The script it generated, codex-test-coverage.sh, does exactly that:

#!/usr/bin/env bash
set -euo pipefail

# Resolve repo root (works from subdirs too)
REPO_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"

CONTEXT_FILE="$REPO_ROOT/prompts/system-project-context.md"
COVERAGE_PROMPT="$REPO_ROOT/prompts/improve-test-coverage.md"

It:

  • Finds the repo root (so you can run it from any subdirectory)

  • Points at:

    • a system context file for the project

    • a coverage-focused task file

Then it validates that both exist:

if [ ! -f "$CONTEXT_FILE" ]; then echo "❌ Context file not found: $CONTEXT_FILE" >&2 echo "Create prompts/system-project-context.md first." >&2 exit 1 fi if [ ! -f "$COVERAGE_PROMPT" ]; then echo "❌ Test coverage prompt not found: $COVERAGE_PROMPT" >&2 exit 1 fi

And finally, it builds the combined prompt and calls codex with it:

PROMPT="$(cat "$CONTEXT_FILE")" PROMPT="$PROMPT --- You are now in the ClubHub project session. General rules: - Be brief and opinionated. Prefer 1–2 strong options over long lists. - Assume: Go (latest stable), PostgreSQL, React + TypeScript + Vite, Tailwind CSS, GitLab CI. - Respect the multi-tenant architecture: all queries must filter by club_id. - Keep changes PR-sized and coherent; update or suggest tests when behaviour changes. - Never hard-code secrets or URLs; use config/env vars. - For CI, keep .gitlab-ci.yml valid and non-blocking for AI analysis. - Mobile-first: all UI should work well on phones. --- $(cat "$COVERAGE_PROMPT") " # Launch Codex with the combined prompt codex "$PROMPT"

That’s it. A one-liner CLI:

./codex-test-coverage.sh

…now starts an opinionated, project-aware AI dev session focused purely on test coverage for ClubHub.


The coverage brief: giving Codex something real to chew on

The second piece is the coverage prompt itself: prompts/improve-test-coverage.md.

Instead of “hey, write more tests”, it gives Codex a concrete target:

  • Current status (overall coverage + by package)

  • Goal (95%+ overall, with specific per-package minimums)

  • Priority areas and edge cases

  • Context (tooling, libraries, styles)

  • Success criteria

For example, the top of the file:

Overall Coverage: 90.2% of statements

  • internal/config: 100.0%

  • internal/db: 100.0%

  • internal/http/router: 73.3% (needs improvement)

  • internal/domain/payment: 0.0% (no tests)

Then it sets a clear goal:

Improve test coverage to 95%+ overall, focusing on:

  1. Router package (currently 73.3%)

  2. Handlers package (currently 90.9%)

  3. Payment domain (currently 0.0%)

  4. Middleware (currently 93.9%)

And it gets very specific about what to write:

  • Router:

    • SPA fallback behaviour (serving index.html for client routes)

    • Static file edge cases & error handling

    • Protection of /api routes in the static handler

  • Handlers:

    • Error paths in CreateMember

    • Soft-delete scenarios in UpdateMember

    • NULL handling for list endpoints

  • Payment:

    • Validation rules and request validation tests

  • Middleware:

    • Extra auth failure paths

    • Logger edge cases

Finally, it shows Codex where to look:

  • internal/http/router/router.go

  • internal/http/handlers/store.go

  • internal/domain/payment/payment.go

  • internal/http/middleware/auth.go

…and how to measure success:

  • Overall coverage ≥ 95%

  • Router ≥ 85%

  • All tests pass and follow project conventions

This turns Codex from “smart autocomplete” into something much closer to a junior engineer working from a ticket.


Why this is better than pasting a giant prompt into the editor

A few things clicked for me as soon as I saw this pattern:

  1. Reproducibility

    Anyone on the team can run the script and get the same project-aware Codex session.
    No more copy-pasting fragile prompts from Notion or Slack.

  2. Single source of truth for project rules

    The “ClubHub mode” rules live in system-project-context.md (not shown here, but referenced by the script).
    When the stack or conventions change, you update one file and all your Codex workflows inherit it.

  3. Focus per script

    codex-test-coverage.sh is just one entry point.
    You can imagine others:

    • codex-api-design.sh

    • codex-ci-hardening.sh

    • codex-frontend-accessibility.sh

    Each one pulls in the same base context, but uses a different task file.

  4. PR-sized output by design

    The script bakes in constraints like:

    • “Keep changes PR-sized and coherent”

    • “Update or suggest tests whenever behaviour changes”

    That language nudges Codex away from huge, repo-wide refactors and towards reviewable chunks.


How to adapt this for your own project

If you want to copy this pattern, the steps are small:

  1. Create a project context file

    prompts/system-project-context.md with things like:

    • Tech stack (language, frameworks, CI tool)

    • Architectural rules (e.g. multi-tenant filters, layering)

    • Security constraints (no secrets in code, how config works)

    • Testing style (frameworks, mocking approach, naming conventions)

  2. Create a focused task brief

    For example: prompts/improve-test-coverage.md, following this structure:

    • Current metrics (coverage, failing areas)

    • Specific targets & packages

    • Concrete behaviours and edge cases to test

    • Pointers to existing tests as patterns

    • Clear success criteria

  3. Drop in a launcher script

    Adapt codex-test-coverage.sh (a generalised sketch follows at the end of this list):

    • Point CONTEXT_FILE at your project context

    • Point COVERAGE_PROMPT at your task file

    • Tweak the “General rules” block for your preferences

    • Replace codex with whatever CLI your AI workflow uses

  4. Check it into the repo

    Treat these like dev tools, not personal notes.
    That way, the whole team can benefit, and updates are code-reviewed.


Takeaway

I went asking for a prompt and got handed the skeleton of a Codex operating system for my repo:

  • A project brain (system-project-context.md)

  • A task brief that looks like a real engineering ticket (improve-test-coverage.md)

  • A one-command launcher that wires them into a focused AI dev session (codex-test-coverage.sh)

It’s a small pattern, but it shifts AI from “clever autocomplete inside the editor” to “scriptable coworker that can be put into different modes for different jobs”.

Next up for me: cloning this pattern for CI hardening, performance profiling, and API design reviews — each with their own prompt file and tiny launcher script.

If you’re already using Cursor and Codex (or similar tools), try doing the same:
instead of asking for better prompts, ask the model to design you a repeatable workflow.

Friday, November 28, 2025

Can AI Replace the Product Manager? How I Co-Designed ClubHub with a Machine



Most of the “AI will replace X” hot takes are pretty tiring.

So I decided to try something more practical (and a bit tongue-in-cheek): what happens if I treat AI as my Product Manager?

Over a couple of sessions, I used an AI model as a structured thinking partner to design ClubHub — a platform for community clubs to manage members, events and payments without being milked by high-fee apps.

This post is a write-up of that process: what worked, what didn’t, and where the human still absolutely matters.


The experiment: AI as PM

The rules of the game were simple:

  • I bring intent and constraints: values, context, and the kind of company I want this to appeal to.
  • The AI behaves like a very fast, very patient PM partner: propose options, ask clarifying questions, help structure decisions.

I framed ClubHub as both:

  1. A real product I’d like to exist (for sports clubs, music groups, youth clubs, etc.), and
  2. A “lure” project: a miniature case study that shows late-stage startups how I think about product, architecture and DevEx.

The initial question was deliberately provocative:

“Could AI replace the product manager on this project?”

Spoiler: no. But it made a fantastic collaborator.


Step 1 – Getting clear on values before features

Instead of starting with features, I asked the AI to help me nail down values and constraints.

We used a simple A/B/C format for each big decision. For example:

Q: When you have to choose, what comes first — affordability or UX polish?
A) rock-bottom cost, B) premium UX, C) a balance

I picked C: it has to be cheap and usable by non-technical volunteers on a phone.

From there, we walked through a series of questions:

  • Data ownership and lock-in
    → We landed on: “Easy to join, easy to leave. Clubs fully own their data; exports and portability are a feature, not a threat.”
  • Advertising stance
    → Clear “no ads, ever.” No tracking, no sponsored clubs, no ad-tech hanging off kids’ activities.
  • Raffles / lotteries ethics
    → Fundraising and fun only, not gambling. No casino loops, loot boxes or “infinite spin to win”.
  • Geography and regulation
    → Start with Ireland/EU and treat GDPR + child safety as design constraints, not compliance afterthoughts.

The AI was extremely helpful here: it kept presenting clean option sets and summarising what we’d chosen so far. That made it much easier to see contradictions and refine my thinking.

But the actual choices — especially the ethical ones — were human.


Step 2 – Defining who ClubHub is for

Next up: who this thing should serve.

We knew it was “clubs”, but that’s a big space. Together, we tested different slices:

  • Community music / arts groups
  • Amateur sports clubs
  • School / parent / youth organisations

The answer was: yes to all of them.

Rather than over-optimise for a single niche, we decided ClubHub’s language should be inclusive and generic:

  • “club”, not “team”
  • “event”, not “fixture”
  • “member”, not “player” or “pupil”

The AI helped by sanity-checking wording and pointing out where terminology might exclude certain club types. That’s a small detail that matters a lot when your users are volunteers across wildly different contexts.


Step 3 – Mission and business model, with ethics baked in

Once the values and audience were clear, the AI helped me condense the mission into something I could actually put on a landing page:

Mission:
Help community clubs manage members, payments and events at radically low cost, so they can thrive without being exploited by high-fee platforms.

From that, we derived some concrete product and business principles:

  • Core admin (members, events, “who has paid”) should be very low cost or free for small clubs.
  • Monetisation only where clubs are already making money (merch sales, raffles, fundraisers), plus optional premium features (analytics, integrations, white-labelling).
  • Absolutely no lock-in: data export and “easy to leave” are features.
  • No ads, ever.

The AI kept proposing standard SaaS pricing patterns (“per member per month”, “take a % of all payments”), and I kept saying no until it aligned with the mission.

This is a good example of the dynamic:

AI is brilliant at recalling patterns that work for many businesses.
Humans still need to decide what kind of business they want to be.


Step 4 – Shaping the MVP and UX

With values and business model in place, we moved to the question: What should v1 actually do?

We narrowed the MVP scope to:

  1. Clubs and members – basic profiles, roles, child flag.
  2. Membership plans and subscriptions – who’s on which plan, and their status.
  3. Events and registrations – simple events, capacity and sign-ups.
  4. Payment recording – “this person paid for this membership/event”, with method and reference.
  5. Admin UI (mobile-first) – a treasurer or coach should be able to do 90% of tasks on their phone.

The AI kept me honest by pushing for coherence:

  • Don’t attach raffles or merch before the basics work.
  • Don’t go “full marketplace” if the core problem is “we don’t know who’s paid”.
  • Keep the UI deliberately simple and calm.

We also chose SumUp as the initial payment provider to design around — because they treat small organisations reasonably well — and decided that v1 would just record payments, not own the entire payment flow.


Step 5 – Technical shape (without drowning in detail)

Finally, we sketched the technical architecture, still using AI as a co-pilot:

  • Go backend – modular monolith, multi-tenant by club_id.
  • AWS App Runner – one container per environment, low-ops, easy scaling.
  • RDS Postgres – shared database.
  • S3 – logos and documents.
  • React SPA – mobile-first, built with Vite and Tailwind and served from the Go API.
  • GitLab CI – a paved-road pipeline with stages for linting, frontend build, security checks, image build, and deploy to App Runner.

The AI was very good at:

  • Suggesting plausible stacks for my constraints (cost, EU data, ease of ops).
  • Structuring CI stages and naming them coherently.
  • Keeping everything consistent with the mission (multi-tenant for low per-club cost, etc.).

The choices still reflected my own experience and goals, especially around designing for thousands of clubs and using ClubHub as a mini case study for late-stage startups who need paved-road CI/CD.


So… can AI replace the Product Manager?

After this experiment, my answer is:

No, but it can be an excellent Product co-pilot.

Where AI shines:

  • Generating clear options quickly (A/B/C trade-offs).
  • Summarising decisions so you can spot contradictions.
  • Keeping the conversation moving instead of getting stuck on the blank page.
  • Remembering all the constraints you’ve already set.

Where humans still matter:

  • Setting the mission and ethics: no ads, no exploitation, child safety, fundraising-not-gambling.
  • Choosing which users to serve and how to talk to them.
  • Balancing business reality with values (what you’re willing to charge for).
  • Connecting the product shape to your broader story — in my case, a “lure” for humane, late-stage startups who want to move from chaos to paved roads.

In other words: AI can help you think, structure, and explore. But it can’t be the conscience, the strategist, or the one who chooses what kind of impact you want your product to have.

For ClubHub, that combination worked beautifully: AI did a lot of the grinding work; I supplied direction, values, and judgement.

And I’m much more interested in Product Managers who know how to use AI like this than in replacing them.


In a follow-up post, I’ll go deeper into ClubHub’s mission and how it’s designed to keep club costs low while still being sustainable. After that: a hardcore tech deep dive on the Go/App Runner/GitLab CI side for the platform nerds.

Thursday, November 27, 2025

From Plan to Backlog: Letting Codex Drive the GitLab Issues Loop


I’ve been building PipelineSage, a Spring Boot + Gradle service that analyses CI pipeline failures and returns a structured summary, category, and suggested fix. It’s also my sandbox for “golden-path” GitLab CI/CD for AI-enabled apps – including a dedicated analyze_failure stage that calls back into the service with JSON logs.

Along the way, I ended up with a surprisingly useful pattern:

Let Codex read the plan, create the GitLab issues, and update the docs – using a GitLab API token it never actually sees.

Here’s how that loop works.


1. Plans live in docs, not in my head

The project roadmap sits in docs/project-notes.md as phases, checklists, and “definition of done” for each slice.

For example, Phase 2 (“Fake AI & failure flow”) spells out:

  • implement rule-based analysis at POST /api/analysis,

  • wire a GitLab analyze_failure job that calls it on test failure,

  • document the pattern as a “golden path” CI template.

Each item is now linked to a real GitLab issue ([#10], [#11], etc.), but those links didn’t exist when I wrote the plan – Codex added them.


2. Secrets live in the shell, not in the prompt

To keep things safe, I created a small env loader script so my shell knows how to talk to GitLab, but Codex never sees the raw token:

  • GITLAB_API_TOKEN – a personal access token for the GitLab project

  • PIPELINESAGE_GITLAB_URL – the project URL

These are loaded from a local .env file via a helper script in dev-env, and exported into my shell as environment variables.

Codex only ever works with $GITLAB_API_TOKEN and $PIPELINESAGE_GITLAB_URL as names – not as literal values.
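The loader itself can stay tiny. Here's a minimal sketch of what such a dev-env helper might look like (the file name and layout are illustrative; source it rather than executing it, so the variables land in your interactive shell):

#!/usr/bin/env bash
# dev-env/load-env.sh (sketch) – usage: source dev-env/load-env.sh
set -a                                          # auto-export everything sourced next
source "$(git rev-parse --show-toplevel)/.env"  # KEY=VALUE pairs, git-ignored
set +a

# Fail fast if the session is missing what the GitLab calls rely on
: "${GITLAB_API_TOKEN:?set GITLAB_API_TOKEN in .env}"
: "${PIPELINESAGE_GITLAB_URL:?set PIPELINESAGE_GITLAB_URL in .env}"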


3. Codex does the boring bits end-to-end

The workflow now looks like this:

  1. I start a Codex session “in project mode” using the shared system context doc, which describes the API (LogAnalysisRequest / LogAnalysisResult), the FailureCategory enum, and the CI shape.

  2. I ask Codex to read docs/project-notes.md and identify the tasks for, say, Phase 2.

  3. I then ask:
    “Using the env vars already available in this shell, generate and run the curl calls needed to create GitLab issues for each task, with sensible titles and descriptions.”

Because the token and project URL are already in the environment, Codex doesn’t need to see them; it just uses $GITLAB_API_TOKEN and $PIPELINESAGE_GITLAB_URL in the commands.
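To make that concrete: the calls Codex generates look roughly like the hypothetical one below. The title, description, and label are illustrative, and it assumes $PIPELINESAGE_GITLAB_URL resolves to the project's API base (GitLab's REST API creates issues via POST /projects/:id/issues):

# Hypothetical sketch – one issue per task from the plan
curl --silent --fail --request POST \
  --header "PRIVATE-TOKEN: $GITLAB_API_TOKEN" \
  --data-urlencode "title=Phase 2: wire analyze_failure job into CI" \
  --data-urlencode "description=From docs/project-notes.md, Phase 2." \
  --data-urlencode "labels=phase-2" \
  "$PIPELINESAGE_GITLAB_URL/issues"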

The neat part: Codex actually executes the calls, so by the time it’s finished, GitLab has one issue per task, correctly labelled and grouped by phase.


4. Codex then updates the docs to point back at GitLab

Once the issues exist, I ask Codex to:

“Update docs/project-notes.md so each checklist item links to its new GitLab issue.”

Codex re-reads the doc, inserts markdown links like:

- [ ] Implement rule-based analysis in `/api/analysis` (no real LLM yet). ([#9](https://gitlab.com/…/issues/9))

…for every item across phases, and writes the updated file back into the repo.

I didn’t cut and paste a single issue URL by hand.

The result is:

  • docs that know about the backlog, and

  • a backlog that clearly points back to the design and CI docs (architecture, paved road, system context).


Why this feels like “real” DevEx

What I like about this pattern is that it stays true to my original principles for PipelineSage:

  • Golden paths over heroics – GitLab issues and pipelines follow a clear, documented pattern instead of ad-hoc commands.

  • Quality via automation – even the planning artefacts (docs → issues) are automated.

  • Security by design – API tokens never get pasted into prompts; they live in env and are described in the system context doc, not hard-coded.

Codex isn’t “being clever” here. It’s just doing the plumbing work that I’d otherwise do by hand:

  • reading the plan,

  • generating issue creation calls,

  • executing them,

  • and wiring the links back into the docs.

That’s exactly the kind of work we used to script ourselves for CI/CD.

The difference now is that I can describe the outcome in words, let Codex discover the right GitLab API calls, and keep my attention on the design of the system – not on copying issue URLs around.

For a DevEx / platform engineer, that feels like the right division of labour between humans, tools, and the growing number of AI “team members” on the bench.

Wednesday, November 26, 2025

Treating AI as a Team Member: Bootstrapping Codex for CI/CD-Aware Development



Over the last few weeks I’ve been playing with something new:
not a new framework, not a new Agile process – but a new team member.

A very fast, very literal, slightly overconfident team member called Codex.

I’m using it on a small side project called PipelineSage – a Spring Boot and Gradle service that analyses CI pipeline failures and returns:

  • a short human-readable summary,

  • a failure category, and

  • some suggested next steps for the engineer.

Underneath that, the real experiment is this:

How do we build golden paths for AI-assisted development that feel as reliable as good CI/CD, rather than like a random chatbot bolted onto the side?

It turns out the answer looks a lot like Agile transformation used to:
make the rules explicit, make them visible, and put them in the same place as the work.


The problem: clever models, forgetful tools

I’ve been jumping between Cursor, IntelliJ AI and various GPT models.

They’re all impressive. But I kept running into the same issues:

  • Different tools “remember” different versions of my domain model.

  • One happily invents new enum values.

  • Another rewrites the CI pipeline and breaks the failure analysis stage.

  • Free tiers silently throttle me halfway through a refactor.

It reminded me of badly aligned Agile teams:
everyone using the same words, but with completely different mental models.

At some point I realised the real problem:

All of the important context lived in chat history, not in the codebase.

You wouldn’t describe a critical API only in a meeting and never write it down.
But that’s effectively what I was doing with the AI.

So I stopped.


Step 1 – Put the AI’s “mental model” in the repo

The first change was simple: create a single project context file and treat it like any other artefact.

I added:

prompts/system-project-context.md

# PipelineSage – System Project Context

Project:
- Spring Boot 3, Java 21, Gradle (Groovy), GitLab CI
- POST /api/analysis:
  - LogAnalysisRequest: logs, stage, jobName, commitId
  - LogAnalysisResult: summary, category, suggestedFix, confidence?, links?

FailureCategory enum (canonical):
- TEST_FAILURE
- BUILD_CONFIGURATION
- INFRASTRUCTURE
- DEPENDENCY
- SECRET_OR_AUTH
- OTHER

CI shape:
- Stages: build → test → package → analyze_failure
- analyze_failure:
  - when: on_failure, allow_failure: true
  - send JSON LogAnalysisRequest to /api/analysis
  - never block the main pipeline

This file is now the source of truth for:

  • The main endpoint and its contract.

  • The official list of failure categories.

  • The shape and rules of the CI pipeline.

If I change the design, this file changes too – just like any other spec.
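For the CI rules in particular, they translate into a job along these lines. This is a sketch, not the real pipeline: the stage name, when, and allow_failure settings come straight from the context file, while $PIPELINESAGE_URL and analysis-request.json are placeholder names of mine:

analyze_failure:
  stage: analyze_failure
  when: on_failure            # only runs if an earlier job failed
  allow_failure: true         # AI analysis must never block the pipeline
  script:
    - |
      curl --silent --request POST "$PIPELINESAGE_URL/api/analysis" \
           --header "Content-Type: application/json" \
           --data @analysis-request.json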


Step 2 – Teach Codex to join the team

Codex in the shell is fast, clean and brutally honest.
It also starts every session with no context at all.

So I wrapped it in a tiny script that “onboards” it whenever I start:

scripts/codex.sh:

#!/usr/bin/env bash
set -euo pipefail

REPO_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"
CONTEXT_FILE="$REPO_ROOT/prompts/system-project-context.md"

if [ ! -f "$CONTEXT_FILE" ]; then
  echo "❌ Context file not found: $CONTEXT_FILE" >&2
  exit 1
fi

echo "🚀 Starting Codex in PipelineSage mode"
echo "   Repo:    $REPO_ROOT"
echo "   Context: $CONTEXT_FILE"
echo

PROMPT="$(cat "$CONTEXT_FILE")"

PROMPT="$PROMPT

---

You are now in the PipelineSage project session.

General rules:
- Be brief and opinionated.
- Assume: Java 21, Spring Boot 3.x, Gradle, GitLab CI.
- Respect the FailureCategory enum and JSON contracts above.
- Keep changes PR-sized and coherent; update/suggest tests when behaviour changes.
- Never hard-code secrets or URLs; use env/config.
- For CI, keep .gitlab-ci.yml valid and make AI analysis non-blocking.
"

codex "$PROMPT"

Now my flow looks like this:

./scripts/codex.sh                        # start Codex “in project”
cat .gitlab-ci.yml | codex edit           # adjust analyze_failure job
cat LlmAnalysisService.java | codex edit  # add retries + fallback

Because the context lives in the repo, Codex:

  • Stops inventing new failure categories.

  • Respects the POST /api/analysis contract.

  • Understands that the analyze_failure stage is allowed to fail, but must never block CI.

It behaves much more like a new team member who has read the docs, rather than a very clever stranger.


Step 3 – Make the pattern reusable

The nice side-effect of this approach is that it doesn’t just help Codex.

Any tool can plug into the same model:

  • Cursor can be pointed at prompts/system-project-context.md and the docs.

  • IntelliJ AI can use the same file as its “project context”.

  • ChatGPT (the one I’m using to write this) reads from the same contracts.

When the design evolves, I don’t need to “retrain” each tool by hand.
I update the docs and the prompt file, and they all follow.

In Agile terms, I’ve gone from:

“We all kind of remember the process”

to:

“We have a lightweight working agreement written down, and we actually use it.”


Why this matters for real teams

PipelineSage is a small project, but the pattern feels very familiar from Agile coaching days:

  • Start by making expectations explicit.

  • Put them somewhere visible and versioned.

  • Use them to align behaviour across the whole system.

For AI-assisted development, that translates into:

  • Context as code.

  • Prompts as artefacts, not disposable chat.

  • AI tools as replaceable clients of a shared model.

If you’re experimenting with AI in your engineering teams, you don’t have to solve everything at once. A few simple steps go a long way:

  1. Write down the key contracts and constraints in your repo.

  2. Create a small project-context file that any assistant can read.

  3. Add tiny bootstrap scripts (like codex.sh) so new tools start in the right mode.

Just like Agile, it’s a series of small steps that add up.

In my case, the biggest win was psychological: Codex stopped feeling like a random magic trick and started to feel like what it really is:

A very fast junior team member, working inside a system of clear agreements.

And that’s a team I know how to work with.

From Codex CLI to OpenAI API: Building a Smarter AI Worker in 24 Hours How throttling led to a complete rewrite, cost optimization, and a mo...