How a Codex CLI throttle became the catalyst for building something better
The Throttle That Launched a Thousand Lines of Bash
It started, as many good engineering stories do, with frustration.
I was happily using OpenAI's Codex CLI to power my AI development workflow. Workers would pick up tasks from a backlog, implement features, run tests, and merge to develop. It was beautiful—until it wasn't.
Rate limited. Throttled. Queued.
The irony wasn't lost on me: I was being told to slow down by a tool designed to speed me up.
So I did what any reasonable engineer would do. I stared at the ceiling for five minutes, muttered something unprintable, and then asked: "What if I just talk to the API directly?"
The Pivot: From CLI to Raw API
The OpenAI Responses API is surprisingly approachable. It's essentially a conversation loop with function calling (sketched in bash right after this list):
- Send a prompt
- Model responds (maybe with tool calls)
- Execute the tools, send results back
- Repeat until done
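The loop is small enough to sketch with nothing but curl and python3. This is a minimal sketch, not the actual worker: the field names follow the public Responses API, the tool schemas and tool execution are elided, and there's no error handling.

# Minimal sketch of the conversation loop (tool execution elided).
# Assumes OPENAI_API_KEY is set and TOOLS holds the tool schemas.
TOOLS='[]'
INPUT='[{"role":"user","content":"Implement task A37"}]'

for i in $(seq 1 100); do   # max-iterations guardrail
  RESPONSE=$(curl -s https://api.openai.com/v1/responses \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"gpt-5.1-codex\",\"input\":$INPUT,\"tools\":$TOOLS}")

  # Collect any function calls the model wants to make
  CALLS=$(echo "$RESPONSE" | python3 -c '
import json, sys
out = json.load(sys.stdin).get("output", [])
print(json.dumps([o for o in out if o.get("type") == "function_call"]))')

  [ "$CALLS" = "[]" ] && break   # no tool calls left: the model is done
  # ...execute each call, append function_call_output items to INPUT...
done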
The challenge wasn't the API—it was everything around it:
- Task selection: How does a worker know what to work on?
- Isolation: How do multiple workers avoid stepping on each other?
- Observability: What's happening inside that loop?
- Cost tracking: Am I accidentally burning through my API budget?
- Integration: How do changes get merged back?
Enter: openai-worker.sh
800+ lines of bash that probably shouldn't work as well as it does.
The architecture is beautifully stupid:
┌─────────────────────────────────────────────────────────┐
│ AI Worker Runner                                        │
│ - Loads project manifest                                │
│ - Selects/locks task from backlog                       │
│ - Creates git worktree for isolation                    │
│ - Builds role-specific prompt                           │
│ - Launches OpenAI worker                                │
│ - Merges changes back to develop                        │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│ OpenAI Worker                                           │
│ - Conversation loop with Responses API                  │
│ - Function calling: read/write files, git, shell        │
│ - Token tracking with live cost estimation              │
│ - Autonomous task completion                            │
└─────────────────────────────────────────────────────────┘
The Fun Parts
Number Emoji Iterations: Because [Iteration 42] is boring, but 4️⃣2️⃣ sparks joy.
number_to_icon() {
  local num="$1"
  local result=""
  local digits=$(echo "$num" | sed 's/./& /g')   # space-separate the digits
  for digit in $digits; do
    case "$digit" in
      0) result="${result}0️⃣" ;;
      1) result="${result}1️⃣" ;;
      # ... you get the idea
    esac
  done
  echo "$result"
}
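In use:

$ number_to_icon 42
4️⃣2️⃣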
Live Cost Tracking: Every iteration shows cumulative spend in USD and EUR.
5️⃣7️⃣ Making API call...
💰 Tokens: in=145780 out=636 | Total: in=4579501 out=23199 | Cost: $5.9443 USD / €5.4688 EUR
When you're burning through 4.5 million tokens in a session, you want to see that counter tick up.
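The arithmetic behind that line is deliberately dumb: running token totals times per-million prices, converted with a hard-coded exchange rate (0.92 here, which matches the session above). A sketch, not the exact script; the real accounting differs slightly:

# Cumulative cost from running token totals (prices per 1M tokens)
INPUT_PRICE=1.25      # $ per 1M input tokens (gpt-5.1-codex)
OUTPUT_PRICE=10.00    # $ per 1M output tokens
USD_TO_EUR=0.92       # hard-coded snapshot, not a live rate
TOTAL_IN=4579501      # accumulated across iterations
TOTAL_OUT=23199

COST_USD=$(python3 -c "print(f'{($TOTAL_IN*$INPUT_PRICE + $TOTAL_OUT*$OUTPUT_PRICE)/1_000_000:.4f}')")
COST_EUR=$(python3 -c "print(f'{$COST_USD*$USD_TO_EUR:.4f}')")
echo "💰 Total: in=$TOTAL_IN out=$TOTAL_OUT | Cost: \$$COST_USD USD / €$COST_EUR EUR"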
Auto-Select Mode: When the worker can't lock the requested task (someone else got there first), it doesn't sulk—it finds another high-priority task and gets to work. The prompt literally tells it:
"You are expected to be autonomous and eager to work. Do NOT wait for user input."
I'm not saying I'm training my AI workers to have a Protestant work ethic, but I'm not not saying that.
The Tools: Teaching AI to Touch Files
The Responses API's function calling is the secret sauce. We define a schema, and the model tells us what it wants to do:
{
  "name": "write_file",
  "arguments": {
    "path": "frontend/src/pages/Members.tsx",
    "content": "... 12,250 bytes of React ..."
  }
}
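The other half of the contract is the schema we register up front. Here's a sketch of what that looks like for write_file, using the Responses API's function-tool format (the description text is illustrative):

# Tool schemas sent with every request (write_file only, as a sketch)
TOOLS=$(cat <<'EOF'
[
  {
    "type": "function",
    "name": "write_file",
    "description": "Write content to a file in the worktree",
    "parameters": {
      "type": "object",
      "properties": {
        "path":    { "type": "string" },
        "content": { "type": "string" }
      },
      "required": ["path", "content"]
    }
  }
]
EOF
)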
Our worker supports:
- read_file / write_file - The basics
- run_command - With timeout, for when npm test decides to contemplate infinity
- read_directory - Because the model needs to explore
- git_status / git_commit - Version control awareness
- task_complete - The satisfying finish line
Each tool execution gets logged with emoji flair:
📖 Reading: frontend/src/pages/Members.tsx
✅ Read 12250 bytes
✏️ Writing: frontend/src/pages/Members.tsx
✅ Wrote 12238 bytes
🔧 Command: npm test
✅ Exit code: 0
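On our side, dispatch is a case statement over the function name. A simplified, hypothetical sketch (the argument field names and the 300-second timeout are illustrative):

# Simplified dispatcher: turn a function_call into a shell action
execute_tool() {
  local name="$1" args="$2"   # args is the raw JSON arguments blob
  case "$name" in
    read_file)
      local path=$(echo "$args" | python3 -c 'import json,sys; print(json.load(sys.stdin)["path"])')
      echo "📖 Reading: $path"
      cat "$path"
      ;;
    run_command)
      local cmd=$(echo "$args" | python3 -c 'import json,sys; print(json.load(sys.stdin)["command"])')
      echo "🔧 Command: $cmd"
      timeout 300 bash -c "$cmd"
      echo "✅ Exit code: $?"
      ;;
    task_complete)
      echo "🏁 Task complete"
      return 10   # sentinel: tells the loop to stop
      ;;
  esac
}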
The Workflow: From Backlog to Merge
Here's what happens when you run ./scripts/clubhub-ai-worker.sh developer A37:
- Load manifest - Find project config, locate backlog
- Check stale locks - Clean up any abandoned tasks
- Lock task A37 - Mark it as in-progress with a 4-hour expiry (sketched after this list)
- Create worktree - Fresh git worktree branched from develop
- Build prompt - Include project context, coding standards, task details
- Launch worker - Start the API conversation loop
- Worker does work - Read files, write code, run tests, commit
- Auto-merge - Fast-forward merge to develop
- Cleanup - Remove worktree and branch
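That lock step deserves a sketch of its own. This is a hypothetical version of the idea (the real logic lives in task-lock-helpers.sh): mkdir is atomic, and the directory's mtime doubles as the expiry clock.

# Hypothetical lock sketch: mkdir is atomic; mtime gives us expiry
LOCK_DIR="locks/A37.lock"
LOCK_TTL=$((4 * 3600))   # 4-hour expiry
mkdir -p locks

# Reap a stale lock left behind by a dead worker (GNU stat shown)
if [ -d "$LOCK_DIR" ]; then
  lock_age=$(( $(date +%s) - $(stat -c %Y "$LOCK_DIR") ))
  [ "$lock_age" -gt "$LOCK_TTL" ] && rm -rf "$LOCK_DIR"
fi

# Take the lock, or fall back to auto-select mode
if ! mkdir "$LOCK_DIR" 2>/dev/null; then
  echo "Task A37 already locked, auto-selecting another task"
fi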
The whole thing is designed for parallelism. Run 5 workers, they each get their own worktree, their own branch, their own task. No conflicts until merge time.
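The git side of that isolation is plain worktree plumbing. A sketch of the lifecycle (branch and path names are illustrative):

# One isolated workspace per task, branched from develop
git worktree add -b ai/A37 ../worktrees/A37 develop

# ...the worker reads, writes, tests, and commits in ../worktrees/A37...

# Fast-forward merge back, then clean up
git checkout develop
git merge --ff-only ai/A37
git worktree remove ../worktrees/A37
git branch -d ai/A37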
The Model Choice: gpt-5.1-codex or Bust
We're strict about models. Only two are allowed:
- gpt-5.1-codex (default): The workhorse. $1.25/1M input, $10.00/1M output.
- gpt-5.2 (max): For when you need the big brain. $1.75/1M input, $14.00/1M output.
Anything else gets rejected:
if [[ "$MODEL" != "gpt-5.1-codex" ]] && [[ "$MODEL" != "gpt-5.2" ]]; then
  echo "⚠️ Warning: Model $MODEL not allowed. Using default: gpt-5.1-codex"
  MODEL="gpt-5.1-codex"
fi
No surprises. No accidental GPT-4o bills. No tears.
Lessons Learned
1. Bash Can Do Surprisingly Much
JSON parsing? Python one-liners.
HTTP requests? curl.
State management? Files and environment variables.
Is it elegant? Debatable.
Does it work? Surprisingly well.
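"JSON parsing? Python one-liners" concretely means things like this (field names per the Responses API's usage block):

# Pull token counts out of an API response without jq
in_tokens=$(echo "$RESPONSE" | python3 -c \
  'import json, sys; print(json.load(sys.stdin)["usage"]["input_tokens"])')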
2. Observability Is Everything
Early versions were opaque. The worker would churn for 20 minutes and I'd have no idea if it was making progress or chasing its tail.
Now every iteration shows:
- What tool is being called
- What file is being touched
- How many tokens have been consumed
- Running cost in real currency
The difference between "what is happening?" and "I see exactly what's happening" is about 50 lines of echo statements.
3. Autonomy Requires Guardrails
Telling an AI "work until done" is dangerous without a few guardrails (see the sketch after this list):
- Max iterations (100 by default)
- Task completion markers (explicit "I'm done" signal)
- Cost visibility (so you see that $15 bill coming)
- Worktree isolation (so mistakes are contained)
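Wired together, those guardrails are a loop condition plus a sentinel. A sketch, with run_one_iteration and the completion marker as illustrative stand-ins:

# Guardrails around the autonomous loop (helper names illustrative)
MAX_ITERATIONS=100
iteration=0
task_done=false

while [ "$iteration" -lt "$MAX_ITERATIONS" ] && [ "$task_done" = "false" ]; do
  iteration=$((iteration + 1))
  run_one_iteration || break   # a failed iteration is terminal (for now)
  if [ -f "$WORKTREE/.task_complete" ]; then
    task_done=true             # the explicit "I'm done" signal
  fi
done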
4. Rate Limits Are a Feature, Not a Bug
Getting throttled on Codex CLI forced me to understand what I actually needed. The result is a system that:
- Has no external dependencies beyond curl and Python
- Gives me complete control over the conversation
- Costs roughly the same (maybe less, without CLI overhead)
- Scales to as many workers as my API limits allow
The Output: Today's Session
In one afternoon, we went from "rate limited on Codex CLI" to:
- ✅ Full OpenAI Responses API integration
- ✅ Autonomous task selection with backlog integration
- ✅ Git worktree isolation per worker
- ✅ Live token and cost tracking (USD/EUR)
- ✅ Emoji iteration counters (because why not)
- ✅ Auto-merge to develop on completion
- ✅ Renamed "manual mode" to "auto-select mode" (words matter)
And then we let the worker loose on a real task: A37 - Member archive/reactivate.
60 iterations. 4.8 million tokens. Backend and frontend implementation. Tests updated. Merged to develop.
All while I wrote this blog post.
The Code
It's all open in ai-project-hub:
ai-project-hub/
├── tools/
│   ├── openai-worker.sh        # The API conversation loop
│   ├── ai-worker-runner.sh     # Task selection, worktree, merge
│   ├── git-branch-helpers.sh   # Worktree management
│   └── task-lock-helpers.sh    # Backlog integration
└── projects/
    └── clubhub/
        └── manifest.yaml       # Project config
Is it production-ready? For my production, yes.
Would I recommend it? If you're comfortable reading bash, absolutely.
What's Next
- Parallel worker orchestration - Launch N workers, distribute tasks
- Smarter cost budgets - "Stop if you hit $20"
- Cached context - Reduce token usage with conversation summaries
- Better error recovery - Right now a failed iteration is terminal
But those are problems for another throttled afternoon.
Written by a human, with an AI worker implementing features in a separate terminal.
Total cost of that parallel session: approximately €5.47
Total value of not being rate limited: priceless