How a Codex CLI throttle became the catalyst for building something better
The Throttle That Launched a Thousand Lines of Bash
It started, as many good engineering stories do, with frustration.
I was happily using OpenAI's Codex CLI to power my AI development workflow. Workers would pick up tasks from a backlog, implement features, run tests, and merge to develop. It was beautiful—until it wasn't.
Rate limited. Throttled. Queued.
The irony wasn't lost on me: I was being told to slow down by a tool designed to speed me up.
So I did what any reasonable engineer would do. I stared at the ceiling for five minutes, muttered something unprintable, and then asked: "What if I just talk to the API directly?"
The Pivot: From CLI to Raw API
The OpenAI Responses API is surprisingly approachable. It's essentially a conversation loop with function calling (sketched in bash right after this list):
- Send a prompt
- Model responds (maybe with tool calls)
- Execute the tools, send results back
- Repeat until done
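The loop is small enough to sketch with nothing but curl and python3. This is a minimal sketch, not the actual worker: the field names follow the public Responses API, the tool schemas and tool execution are elided, and there's no error handling.

# Minimal sketch of the conversation loop (tool execution elided).
# Assumes OPENAI_API_KEY is set and TOOLS holds the tool schemas.
TOOLS='[]'
INPUT='[{"role":"user","content":"Implement task A37"}]'

for i in $(seq 1 100); do   # max-iterations guardrail
  RESPONSE=$(curl -s https://api.openai.com/v1/responses \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"gpt-5.1-codex\",\"input\":$INPUT,\"tools\":$TOOLS}")

  # Collect any function calls the model wants to make
  CALLS=$(echo "$RESPONSE" | python3 -c '
import json, sys
out = json.load(sys.stdin).get("output", [])
print(json.dumps([o for o in out if o.get("type") == "function_call"]))')

  [ "$CALLS" = "[]" ] && break   # no tool calls left: the model is done
  # ...execute each call, append function_call_output items to INPUT...
done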
The challenge wasn't the API—it was everything around it:
- Task selection: How does a worker know what to work on?
- Isolation: How do multiple workers avoid stepping on each other?
- Observability: What's happening inside that loop?
- Cost tracking: Am I accidentally burning through my API budget?
- Integration: How do changes get merged back?
Enter: openai-worker.sh
800+ lines of bash that probably shouldn't work as well as it does.
The architecture is beautifully stupid:
┌─────────────────────────────────────────────────────────┐
│ AI Worker Runner                                        │
│ - Loads project manifest                                │
│ - Selects/locks task from backlog                       │
│ - Creates git worktree for isolation                    │
│ - Builds role-specific prompt                           │
│ - Launches OpenAI worker                                │
│ - Merges changes back to develop                        │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│ OpenAI Worker                                           │
│ - Conversation loop with Responses API                  │
│ - Function calling: read/write files, git, shell        │
│ - Token tracking with live cost estimation              │
│ - Autonomous task completion                            │
└─────────────────────────────────────────────────────────┘
The Fun Parts
Number Emoji Iterations: Because [Iteration 42] is boring, but 4️⃣2️⃣ sparks joy.
number_to_icon() {
  local num="$1"
  local result=""
  local digits=$(echo "$num" | sed 's/./& /g')   # space-separate the digits
  for digit in $digits; do
    case "$digit" in
      0) result="${result}0️⃣" ;;
      1) result="${result}1️⃣" ;;
      # ... you get the idea
    esac
  done
  echo "$result"
}
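In use:

$ number_to_icon 42
4️⃣2️⃣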
Live Cost Tracking: Every iteration shows cumulative spend in USD and EUR.
5️⃣7️⃣ Making API call...
💰 Tokens: in=145780 out=636 | Total: in=4579501 out=23199 | Cost: $5.9443 USD / €5.4688 EUR
When you're burning through 4.5 million tokens in a session, you want to see that counter tick up.
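The arithmetic behind that line is deliberately dumb: running token totals times per-million prices, converted with a hard-coded exchange rate (0.92 here, which matches the session above). A sketch, not the exact script; the real accounting differs slightly:

# Cumulative cost from running token totals (prices per 1M tokens)
INPUT_PRICE=1.25      # $ per 1M input tokens (gpt-5.1-codex)
OUTPUT_PRICE=10.00    # $ per 1M output tokens
USD_TO_EUR=0.92       # hard-coded snapshot, not a live rate
TOTAL_IN=4579501      # accumulated across iterations
TOTAL_OUT=23199

COST_USD=$(python3 -c "print(f'{($TOTAL_IN*$INPUT_PRICE + $TOTAL_OUT*$OUTPUT_PRICE)/1_000_000:.4f}')")
COST_EUR=$(python3 -c "print(f'{$COST_USD*$USD_TO_EUR:.4f}')")
echo "💰 Total: in=$TOTAL_IN out=$TOTAL_OUT | Cost: \$$COST_USD USD / €$COST_EUR EUR"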
Auto-Select Mode: When the worker can't lock the requested task (someone else got there first), it doesn't sulk—it finds another high-priority task and gets to work. The prompt literally tells it:
"You are expected to be autonomous and eager to work. Do NOT wait for user input."
I'm not saying I'm training my AI workers to have a Protestant work ethic, but I'm not not saying that.
The Tools: Teaching AI to Touch Files
The Responses API's function calling is the secret sauce. We define a schema, and the model tells us what it wants to do:
{
  "name": "write_file",
  "arguments": {
    "path": "frontend/src/pages/Members.tsx",
    "content": "... 12,250 bytes of React ..."
  }
}
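The other half of the contract is the schema we register up front. Here's a sketch of what that looks like for write_file, using the Responses API's function-tool format (the description text is illustrative):

# Tool schemas sent with every request (write_file only, as a sketch)
TOOLS=$(cat <<'EOF'
[
  {
    "type": "function",
    "name": "write_file",
    "description": "Write content to a file in the worktree",
    "parameters": {
      "type": "object",
      "properties": {
        "path":    { "type": "string" },
        "content": { "type": "string" }
      },
      "required": ["path", "content"]
    }
  }
]
EOF
)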
Our worker supports:
- read_file / write_file - The basics
- run_command - With timeout, for when npm test decides to contemplate infinity
- read_directory - Because the model needs to explore
- git_status / git_commit - Version control awareness
- task_complete - The satisfying finish line
Each tool execution gets logged with emoji flair:
📖 Reading: frontend/src/pages/Members.tsx
✅ Read 12250 bytes
✏️ Writing: frontend/src/pages/Members.tsx
✅ Wrote 12238 bytes
🔧 Command: npm test
✅ Exit code: 0
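On our side, dispatch is a case statement over the function name. A simplified, hypothetical sketch (the argument field names and the 300-second timeout are illustrative):

# Simplified dispatcher: turn a function_call into a shell action
execute_tool() {
  local name="$1" args="$2"   # args is the raw JSON arguments blob
  case "$name" in
    read_file)
      local path=$(echo "$args" | python3 -c 'import json,sys; print(json.load(sys.stdin)["path"])')
      echo "📖 Reading: $path"
      cat "$path"
      ;;
    run_command)
      local cmd=$(echo "$args" | python3 -c 'import json,sys; print(json.load(sys.stdin)["command"])')
      echo "🔧 Command: $cmd"
      timeout 300 bash -c "$cmd"
      echo "✅ Exit code: $?"
      ;;
    task_complete)
      echo "🏁 Task complete"
      return 10   # sentinel: tells the loop to stop
      ;;
  esac
}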
The Workflow: From Backlog to Merge
Here's what happens when you run ./scripts/clubhub-ai-worker.sh developer A37:
- Load manifest - Find project config, locate backlog
- Check stale locks - Clean up any abandoned tasks
- Lock task A37 - Mark it as in-progress with a 4-hour expiry (sketched after this list)
- Create worktree - Fresh git worktree branched from develop
- Build prompt - Include project context, coding standards, task details
- Launch worker - Start the API conversation loop
- Worker does work - Read files, write code, run tests, commit
- Auto-merge - Fast-forward merge to develop
- Cleanup - Remove worktree and branch
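That lock step deserves a sketch of its own. This is a hypothetical version of the idea (the real logic lives in task-lock-helpers.sh): mkdir is atomic, and the directory's mtime doubles as the expiry clock.

# Hypothetical lock sketch: mkdir is atomic; mtime gives us expiry
LOCK_DIR="locks/A37.lock"
LOCK_TTL=$((4 * 3600))   # 4-hour expiry
mkdir -p locks

# Reap a stale lock left behind by a dead worker (GNU stat shown)
if [ -d "$LOCK_DIR" ]; then
  lock_age=$(( $(date +%s) - $(stat -c %Y "$LOCK_DIR") ))
  [ "$lock_age" -gt "$LOCK_TTL" ] && rm -rf "$LOCK_DIR"
fi

# Take the lock, or fall back to auto-select mode
if ! mkdir "$LOCK_DIR" 2>/dev/null; then
  echo "Task A37 already locked, auto-selecting another task"
fi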
The whole thing is designed for parallelism. Run 5 workers, they each get their own worktree, their own branch, their own task. No conflicts until merge time.
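The git side of that isolation is plain worktree plumbing. A sketch of the lifecycle (branch and path names are illustrative):

# One isolated workspace per task, branched from develop
git worktree add -b ai/A37 ../worktrees/A37 develop

# ...the worker reads, writes, tests, and commits in ../worktrees/A37...

# Fast-forward merge back, then clean up
git checkout develop
git merge --ff-only ai/A37
git worktree remove ../worktrees/A37
git branch -d ai/A37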
The Model Choice: gpt-5.1-codex or Bust
We're strict about models. Only two are allowed:
- gpt-5.1-codex (default): The workhorse. $1.25/1M input, $10.00/1M output.
- gpt-5.2 (max): For when you need the big brain. $1.75/1M input, $14.00/1M output.
Anything else gets rejected:
if [[ "$MODEL" != "gpt-5.1-codex" ]] && [[ "$MODEL" != "gpt-5.2" ]]; then
  echo "⚠️ Warning: Model $MODEL not allowed. Using default: gpt-5.1-codex"
  MODEL="gpt-5.1-codex"
fi
No surprises. No accidental GPT-4o bills. No tears.
Lessons Learned
1. Bash Can Do Surprisingly Much
JSON parsing? Python one-liners.
HTTP requests? curl.
State management? Files and environment variables.
Is it elegant? Debatable.
Does it work? Surprisingly well.
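"JSON parsing? Python one-liners" concretely means things like this (field names per the Responses API's usage block):

# Pull token counts out of an API response without jq
in_tokens=$(echo "$RESPONSE" | python3 -c \
  'import json, sys; print(json.load(sys.stdin)["usage"]["input_tokens"])')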
2. Observability Is Everything
Early versions were opaque. The worker would churn for 20 minutes and I'd have no idea if it was making progress or chasing its tail.
Now every iteration shows:
- What tool is being called
- What file is being touched
- How many tokens have been consumed
- Running cost in real currency
The difference between "what is happening?" and "I see exactly what's happening" is about 50 lines of echo statements.
3. Autonomy Requires Guardrails
Telling an AI "work until done" is dangerous without a few guardrails (see the sketch after this list):
- Max iterations (100 by default)
- Task completion markers (explicit "I'm done" signal)
- Cost visibility (so you see that $15 bill coming)
- Worktree isolation (so mistakes are contained)
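Wired together, those guardrails are a loop condition plus a sentinel. A sketch, with run_one_iteration and the completion marker as illustrative stand-ins:

# Guardrails around the autonomous loop (helper names illustrative)
MAX_ITERATIONS=100
iteration=0
task_done=false

while [ "$iteration" -lt "$MAX_ITERATIONS" ] && [ "$task_done" = "false" ]; do
  iteration=$((iteration + 1))
  run_one_iteration || break   # a failed iteration is terminal (for now)
  if [ -f "$WORKTREE/.task_complete" ]; then
    task_done=true             # the explicit "I'm done" signal
  fi
done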
4. Rate Limits Are a Feature, Not a Bug
Getting throttled on Codex CLI forced me to understand what I actually needed. The result is a system that:
- Has no external dependencies beyond curl and Python
- Gives me complete control over the conversation
- Costs roughly the same (maybe less, without CLI overhead)
- Scales to as many workers as my API limits allow
The Output: Today's Session
In one afternoon, we went from "rate limited on Codex CLI" to:
- ✅ Full OpenAI Responses API integration
- ✅ Autonomous task selection with backlog integration
- ✅ Git worktree isolation per worker
- ✅ Live token and cost tracking (USD/EUR)
- ✅ Emoji iteration counters (because why not)
- ✅ Auto-merge to develop on completion
- ✅ Renamed "manual mode" to "auto-select mode" (words matter)
And then we let the worker loose on a real task: A37 - Member archive/reactivate.
60 iterations. 4.8 million tokens. Backend and frontend implementation. Tests updated. Merged to develop.
All while I wrote this blog post.
The Code
It's all open in ai-project-hub:
ai-project-hub/
├── tools/
│   ├── openai-worker.sh        # The API conversation loop
│   ├── ai-worker-runner.sh     # Task selection, worktree, merge
│   ├── git-branch-helpers.sh   # Worktree management
│   └── task-lock-helpers.sh    # Backlog integration
└── projects/
    └── clubhub/
        └── manifest.yaml       # Project config
Is it production-ready? For my production, yes.
Would I recommend it? If you're comfortable reading bash, absolutely.
What's Next
- Parallel worker orchestration - Launch N workers, distribute tasks
- Smarter cost budgets - "Stop if you hit $20"
- Cached context - Reduce token usage with conversation summaries
- Better error recovery - Right now a failed iteration is terminal
But those are problems for another throttled afternoon.
Written by a human, with an AI worker implementing features in a separate terminal.
Total cost of that parallel session: approximately €5.47
Total value of not being rate limited: priceless