
When to Ralph, When to Code


This is Part 6 of a series on the Ralph Loop. Today we synthesize lessons learned and provide guidelines for when autonomous AI development makes sense.


What We Learned


After we ran Ralph on dozens of feature phases, patterns emerged:


What Ralph Handles Well


Repetitive implementation - Creating 30 similar API endpoints, adding tests for existing functions, migrating patterns across files. The tasks Claude finds “boring” but executes reliably.


Well-defined systems - When you have clear types, existing patterns, and good test coverage, Claude excels. The constraints guide it.


Isolated changes - Tasks that don’t require understanding complex interdependencies. Add this field, implement this formula, create this component.


Backend work - Services, database operations, business logic. Claude’s bread and butter.


What Ralph Struggles With


Emergent architecture - When the right approach only becomes clear during implementation, Claude makes poor decisions early and compounds them.


Frontend polish - Animations, responsive layouts, visual refinement. These require human judgment about what “looks right.”


Complex state - Multi-step async flows, race conditions, state machines with many transitions. Claude loses track of edge cases.


Integration work - Connecting to external APIs, debugging environment issues, handling deployment. Too many unknowns.


The Decision Framework


Before starting Ralph, ask:


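| Question | If Yes | If No |
|---|---|---|
| Are requirements well-defined? | Ralph | Interactive |
| Do patterns exist to follow? | Ralph | Interactive |
| Is each task completable in isolation? | Ralph | Interactive |
| Would you trust a junior dev unsupervised? | Ralph | Interactive |
| Is failure cheap to fix? | Ralph | Interactive |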

If you answered “no” to any of these, interactive Claude Code sessions are probably better.
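As a rough illustration, the checklist above can be encoded as a small pre-flight check. This is only a sketch: the `TaskAssessment` class, its field names, and `recommend_mode` are made up for this example and are not part of Ralph itself.

```python
from dataclasses import dataclass, fields


@dataclass
class TaskAssessment:
    """Answers to the five questions above (field names are illustrative)."""
    requirements_well_defined: bool
    patterns_exist: bool
    tasks_isolated: bool
    junior_dev_trustworthy: bool
    failure_cheap: bool


def recommend_mode(assessment: TaskAssessment) -> tuple[str, list[str]]:
    """Return the suggested mode plus the questions that were answered "no"."""
    blockers = [f.name for f in fields(assessment) if not getattr(assessment, f.name)]
    # A single "no" is enough to prefer an interactive session.
    mode = "interactive" if blockers else "ralph"
    return mode, blockers


# Example: everything checks out except that failures would be expensive.
mode, blockers = recommend_mode(TaskAssessment(True, True, True, True, False))
print(mode, blockers)
```

Returning the list of blockers, rather than just a yes/no, makes it obvious which part of the work to handle interactively before handing the rest to Ralph.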


The Hybrid Approach


Our actual workflow combines both modes:


Phase 1: Interactive Discovery

  • Explore the problem space with Claude
  • Make architectural decisions together
  • Write the first implementation of tricky parts

Phase 2: Ralph for Volume

  • Break remaining work into atomic tasks
  • Run Ralph for the mechanical implementation
  • Review results periodically

Phase 3: Interactive Polish

  • Fix stuck points manually
  • Refine edge cases
  • Add finishing touches

This captures the best of both: human judgment for decisions, AI for execution.


Prompt Evolution


Your Ralph prompts will improve over time. Track what causes stuck states:


| Stuck Pattern | Prompt Fix |
|---|---|
| Tests not run | Add explicit “run tests before marking complete” |
| Types missing | Add “ensure all new types are exported” |
| Wrong directory | Add specific file paths to each task |
| Incomplete tasks | Add definition of done checklist |
| Context loss | Add more background in prompt header |

Each failure teaches you what Claude needs to succeed autonomously.
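One lightweight way to track this is to label each stuck run as you review it and tally the labels. The sketch below assumes you keep such a list by hand. The short keys (`tests_not_run` and so on) and the `suggest_fixes` helper are hypothetical names for this example; the fixes themselves come from the table above.

```python
from collections import Counter

# Stuck pattern -> prompt fix, mirroring the table above.
# The keys are hypothetical labels assigned while reviewing a stuck run.
PROMPT_FIXES = {
    "tests_not_run": 'Add explicit "run tests before marking complete"',
    "types_missing": 'Add "ensure all new types are exported"',
    "wrong_directory": "Add specific file paths to each task",
    "incomplete_tasks": "Add definition of done checklist",
    "context_loss": "Add more background in prompt header",
}


def suggest_fixes(stuck_log: list[str], min_count: int = 2) -> list[tuple[str, int, str]]:
    """Return (pattern, occurrences, fix) for recurring patterns, most frequent first."""
    counts = Counter(stuck_log)
    return [
        (pattern, n, PROMPT_FIXES.get(pattern, "No known fix yet; add a row to the table"))
        for pattern, n in counts.most_common()
        if n >= min_count
    ]


# Example log from a handful of reviewed runs.
log = ["tests_not_run", "context_loss", "tests_not_run",
       "wrong_directory", "context_loss", "tests_not_run"]
for pattern, n, fix in suggest_fixes(log):
    print(f"{pattern} ({n}x): {fix}")
```

The `min_count` threshold is a judgment call: it keeps a one-off failure from bloating the prompt, while anything that recurs earns a permanent line.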


Cost Considerations


Ralph uses more API calls than interactive development. For our 114-task phase:


  • 68 iterations, each using ~3,000 input tokens and ~2,000 output tokens
  • Roughly 200K input + 140K output tokens
  • At current Claude pricing: ~$3-5 per phase

Compare to the developer time saved: 6 hours autonomous vs. 40+ hours manual. The economics strongly favor Ralph for suitable tasks.
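To rerun this arithmetic for your own phase, a few lines suffice. The per-million-token prices below are placeholder assumptions, not quoted rates, so substitute the current published pricing for whichever model you run. `estimate_cost` is a hypothetical helper name.

```python
def estimate_cost(iterations: int,
                  input_tokens_per_iter: int,
                  output_tokens_per_iter: int,
                  usd_per_m_input: float,
                  usd_per_m_output: float) -> tuple[int, int, float]:
    """Return (total input tokens, total output tokens, estimated cost in USD)."""
    total_in = iterations * input_tokens_per_iter
    total_out = iterations * output_tokens_per_iter
    cost = total_in / 1e6 * usd_per_m_input + total_out / 1e6 * usd_per_m_output
    return total_in, total_out, cost


# Iteration and token counts from the phase above.
# ASSUMPTION: the two prices are placeholders; check your provider's rates.
total_in, total_out, cost = estimate_cost(
    iterations=68,
    input_tokens_per_iter=3_000,
    output_tokens_per_iter=2_000,
    usd_per_m_input=3.0,
    usd_per_m_output=15.0,
)
print(f"{total_in:,} input + {total_out:,} output tokens, about ${cost:.2f}")
```

The token totals come to 204,000 input and 136,000 output, which round to the figures above. The dollar amount depends entirely on the rates you plug in, and real iterations vary in size, so treat the result as an order-of-magnitude check.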


The Mental Model Shift


Traditional development: You write code, Claude assists.


Ralph development: Claude writes code, you supervise.


This is a different skill set:

  • Writing clear specifications instead of code
  • Reviewing AI output instead of creating from scratch
  • Debugging prompts instead of debugging implementations
  • Managing autonomous processes instead of working hands-on-keyboard

Some developers find this disorienting. Others find it liberating. Know which camp you’re in.


What’s Next


Ralph is a starting point, not an endpoint. The pattern scales:


Parallel Ralph - Run multiple Ralph instances on independent feature branches, merge the results.


Hierarchical Ralph - A “supervisor” Claude instance that monitors multiple worker Ralphs and handles coordination.


Self-improving Ralph - Let Claude refine its own prompts based on what causes stuck states.


We’re experimenting with all of these. The fundamental insight—externalize state, loop until done, detect stuck—applies at every scale.
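For readers who want to see those three ideas in one place, here is a minimal sketch. It is not the ralph.sh script. It assumes a JSON state file with a `tasks` list and a hypothetical `run_agent` callable that works on one task and updates that file when it finishes.

```python
import json
from pathlib import Path
from typing import Callable


def ralph_loop(state_path: Path,
               run_agent: Callable[[dict], None],
               max_iterations: int = 100,
               stuck_after: int = 3) -> str:
    """Loop until all tasks are done, progress stalls, or the budget runs out.

    Assumed state file shape: {"tasks": [{"id": ..., "done": bool}, ...]}.
    State is re-read from disk every iteration (externalized state).
    """
    no_progress = 0
    for _ in range(max_iterations):
        state = json.loads(state_path.read_text())
        pending = [t for t in state["tasks"] if not t["done"]]
        if not pending:
            return "done"

        # Hypothetical hook: hand the next task to the agent, which is
        # expected to mark it done in the state file if it succeeds.
        run_agent(pending[0])

        after = json.loads(state_path.read_text())
        remaining = sum(1 for t in after["tasks"] if not t["done"])
        if remaining == 0:
            return "done"
        if remaining < len(pending):
            no_progress = 0  # progress was made, reset the stuck counter
        else:
            no_progress += 1
            if no_progress >= stuck_after:
                return "stuck"  # detect stuck: several iterations with no change
    return "max_iterations"
```

Because the only shared memory is the file on disk, the same loop works whether `run_agent` wraps one worker or sits underneath a supervisor. That is the sense in which the pattern carries over to the parallel and hierarchical variants.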


Getting Started


If you want to try Ralph:


  1. Start small - A 10-task feature, not 100
  2. Stay present - Watch the first run, don’t go AFK
  3. Iterate the prompt - Refine based on failure patterns
  4. Keep tasks atomic - Smaller is better until you calibrate
  5. Trust but verify - Review the code Claude produces

The ralph.sh script from our Aqua-tics project is available as a starting point. Adapt it to your workflow.


The Bottom Line


Ralph doesn’t replace developer skill—it redirects it. Instead of typing code, you’re specifying tasks, writing prompts, and reviewing output. The leverage is enormous: one developer supervising Ralph produces the output of a small team.


The question isn’t whether AI will change how we build software. It’s whether you’ll be the one writing the prompts or competing with those who do.




Thanks for reading the Ralph Loop series. Questions or experiences to share? Find me on Twitter.