This is Part 6 of a series on the Ralph Loop. Today we synthesize lessons learned and provide guidelines for when autonomous AI development makes sense.
What We Learned
After running Ralph on dozens of feature phases, patterns emerged:
What Ralph Handles Well
Repetitive implementation - Creating 30 similar API endpoints, adding tests for existing functions, migrating patterns across files. The tasks Claude finds “boring” but executes reliably.
Well-defined systems - When you have clear types, existing patterns, and good test coverage, Claude excels. The constraints guide it.
Isolated changes - Tasks that don’t require understanding complex interdependencies. Add this field, implement this formula, create this component.
Backend work - Services, database operations, business logic. Claude’s bread and butter.
What Ralph Struggles With
Emergent architecture - When the right approach only becomes clear during implementation, Claude makes poor decisions early and compounds them.
Frontend polish - Animations, responsive layouts, visual refinement. These require human judgment about what “looks right.”
Complex state - Multi-step async flows, race conditions, state machines with many transitions. Claude loses track of edge cases.
Integration work - Connecting to external APIs, debugging environment issues, handling deployment. Too many unknowns.
The Decision Framework
Before starting Ralph, ask:
| Question | If Yes → Ralph | If No → Interactive |
|---|---|---|
| Are requirements well-defined? | ✓ | ✗ |
| Do patterns exist to follow? | ✓ | ✗ |
| Is each task completable in isolation? | ✓ | ✗ |
| Would you trust a junior dev unsupervised? | ✓ | ✗ |
| Is failure cheap to fix? | ✓ | ✗ |
If you answered “no” to any of these, interactive Claude Code sessions are probably better.
The Hybrid Approach
Our actual workflow combines both modes:
Phase 1: Interactive Discovery
- Explore the problem space with Claude
- Make architectural decisions together
- Write the first implementation of tricky parts
Phase 2: Ralph for Volume
- Break remaining work into atomic tasks
- Run Ralph for the mechanical implementation
- Review results periodically
Phase 3: Interactive Polish
- Fix stuck points manually
- Refine edge cases
- Add finishing touches
This captures the best of both: human judgment for decisions, AI for execution.
Prompt Evolution
Your Ralph prompts will improve over time. Track what causes stuck states:
| Stuck Pattern | Prompt Fix |
|---|---|
| Tests not run | Add explicit “run tests before marking complete” |
| Types missing | Add “ensure all new types are exported” |
| Wrong directory | Add specific file paths to each task |
| Incomplete tasks | Add definition of done checklist |
| Context loss | Add more background in prompt header |
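In practice these fixes accumulate into the prompt itself. A hypothetical prompt header after a few rounds of this kind of iteration might look like (file names and repo details are illustrative):

```
## Context
You are working through tasks.md in this repo. Each task lists the exact
file paths to touch; do not modify files outside those paths.

## Definition of done (every task)
- All new types are exported
- The full test suite runs and passes before the task is marked complete
- The task's checkbox in tasks.md is checked off
```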
Each failure teaches you what Claude needs to succeed autonomously.
Cost Considerations
Ralph uses more API calls than interactive development. For our 114-task phase:
- 68 iterations, each averaging ~3,000 input and ~2,000 output tokens
- Roughly 200K input + 140K output tokens total
- At current Claude pricing: ~$3-5 per phase
Compare to the developer time saved: 6 hours autonomous vs. 40+ hours manual. The economics strongly favor Ralph for suitable tasks.
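The arithmetic is easy to redo for your own phases. A back-of-envelope sketch, where the per-iteration token counts come from the numbers above but the per-million-token prices are placeholders you should replace with current Claude pricing:

```shell
#!/usr/bin/env sh
# Back-of-envelope Ralph cost estimate. Token counts per iteration match
# the figures in the article; IN_PRICE/OUT_PRICE are assumed placeholder
# values -- check current pricing before relying on the result.
ITERS=68
IN_TOKENS=3000      # avg input tokens per iteration
OUT_TOKENS=2000     # avg output tokens per iteration
IN_PRICE=3.00       # assumed $ per 1M input tokens
OUT_PRICE=15.00     # assumed $ per 1M output tokens

awk -v n="$ITERS" -v it="$IN_TOKENS" -v ot="$OUT_TOKENS" \
    -v ip="$IN_PRICE" -v op="$OUT_PRICE" 'BEGIN {
  printf "input: %dK tokens, output: %dK tokens, est. cost: $%.2f\n",
         n * it / 1000, n * ot / 1000, (n * it * ip + n * ot * op) / 1e6
}'
# prints: input: 204K tokens, output: 136K tokens, est. cost: $2.65
```

Even doubling the assumed prices keeps a phase in single-digit dollars, which is where the "economics strongly favor Ralph" claim comes from.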
The Mental Model Shift
Traditional development: You write code, Claude assists.
Ralph development: Claude writes code, you supervise.
This is a different skill set:
- Writing clear specifications instead of code
- Reviewing AI output instead of creating from scratch
- Debugging prompts instead of debugging implementations
- Managing autonomous processes instead of hands-on-keyboard
Some developers find this disorienting. Others find it liberating. Know which camp you’re in.
What’s Next
Ralph is a starting point, not an endpoint. The pattern scales:
Parallel Ralph - Run multiple Ralph instances on independent feature branches, merge the results.
Hierarchical Ralph - A “supervisor” Claude instance that monitors multiple worker Ralphs and handles coordination.
Self-improving Ralph - Let Claude refine its own prompts based on what causes stuck states.
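The parallel variant falls out of git worktrees almost directly. A hypothetical sketch, where the throwaway repo and stub ralph.sh exist only so the example runs end to end (in practice you would use your real repository and loop script):

```shell
#!/usr/bin/env bash
# Hypothetical "Parallel Ralph" sketch: one git worktree and branch per
# independent feature, each running its own loop concurrently. The demo
# repo and stub ralph.sh below are scaffolding so this runs standalone.
set -u

demo=$(mktemp -d)
cd "$demo" && git init -q repo && cd repo
git -c user.email=ralph@example.com -c user.name=ralph \
    commit -q --allow-empty -m "init"
printf '#!/bin/sh\necho "ralph loop for $1"\n' > ralph.sh
chmod +x ralph.sh

for feature in auth billing reports; do
  # Each worker gets an isolated checkout on its own branch.
  git worktree add -b "ralph/$feature" "../wt-$feature" >/dev/null
  ( cd "../wt-$feature" && "$demo/repo/ralph.sh" "tasks/$feature.md" ) &
done
wait
git branch --list 'ralph/*'   # branches to review and merge back
```

The key constraint is the same one as for single Ralph tasks: the features must be completable in isolation, or the merge step becomes the bottleneck.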
We’re experimenting with all of these. The fundamental insight—externalize state, loop until done, detect stuck—applies at every scale.
Getting Started
If you want to try Ralph:
- Start small - A 10-task feature, not 100
- Stay present - Watch the first run, don’t go AFK
- Iterate the prompt - Refine based on failure patterns
- Keep tasks atomic - Smaller is better until you calibrate
- Trust but verify - Review the code Claude produces
The ralph.sh script from our Aqua-tics project is available as a starting point. Adapt it to your workflow.
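The core of the loop fits on one page. A minimal sketch of the pattern (externalized task state, loop until done, stuck detection), where `claude -p`, the file names, and the thresholds are assumptions to adapt to your own setup:

```shell
#!/usr/bin/env bash
# Minimal Ralph-style loop sketch: task state lives in a markdown checklist,
# the loop runs until every box is checked, and an unchanged checklist
# across consecutive iterations counts as stuck. `claude -p` and the file
# names are assumptions; swap in whatever your actual ralph.sh uses.
set -u

PROMPT_FILE=${PROMPT_FILE:-prompt.md}
TASKS_FILE=${TASKS_FILE:-tasks.md}
AGENT=${AGENT:-claude -p}        # command that performs one iteration
MAX_ITERS=${MAX_ITERS:-100}
STUCK_LIMIT=${STUCK_LIMIT:-3}

status=done last_hash="" stuck=0
for i in $(seq 1 "$MAX_ITERS"); do
  # Done when no unchecked "- [ ]" items remain in the task list.
  grep -q '^- \[ \]' "$TASKS_FILE" 2>/dev/null || break

  $AGENT "$(cat "$PROMPT_FILE")" || true   # one autonomous iteration

  # Stuck detection: the checklist did not change this iteration.
  hash=$(md5sum "$TASKS_FILE" | awk '{print $1}')
  if [ "$hash" = "$last_hash" ]; then
    stuck=$((stuck + 1))
    if [ "$stuck" -ge "$STUCK_LIMIT" ]; then status=stuck; break; fi
  else
    stuck=0
  fi
  last_hash="$hash"
done
echo "ralph finished: $status"
```

Hashing the task file is a crude but cheap stuck signal; if an iteration checks off nothing, it counts toward the stuck limit, which is exactly the "detect stuck" half of the pattern.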
The Bottom Line
Ralph doesn’t replace developer skill—it redirects it. Instead of typing code, you’re specifying tasks, writing prompts, and reviewing output. The leverage is enormous: one developer supervising Ralph produces the output of a small team.
The question isn’t whether AI will change how we build software. It’s whether you’ll be the one writing the prompts or competing with those who do.
Thanks for reading the Ralph Loop series. Questions or experiences to share? Find me on Twitter.