Thoughts on Being Productive with AI Coding Agents
I recently shared how I went from a blank Xcode project to a live iOS app in 17 days. I used AI coding assistants for most of the code, and my total time in front of the keyboard was under 24 hours.
I’ve been wanting to follow up with thoughts on where the coding tools struggled, where they shined, and which approaches made them actually useful. My iOS project was a solid proving ground, but I’ll also draw on other places I’ve used these tools to give a fuller picture of where I’ve found productivity.
Projects and tools
I’ve experimented with agentic coding tools on a variety of projects: the Swift/iOS app I mentioned above, my day-to-day work at Benchling, personal backend/infra projects, and this website too.
I’ve spent the most time experimenting with Cursor, Claude Code, and Codex. I also tinkered with Devin AI and Gemini CLI, but didn’t have much success with them.
Key takeaways
Here are five things I came away with after using these tools.
1. Each tool has its own strengths and weaknesses
Not all tools are good at everything. After months of use, here’s how I think about them:
- Cursor is best for large projects with history, where small changes need lots of context.
- Claude Code shines on smaller, self-contained projects with clear prompts.
- Codex works best as an async partner for bigger changes, where I can throw out a broad idea and see what it does.
2. They’re not magic, and they can’t make your bad ideas good
If you start with the wrong feature or a weak idea, the agent won’t fix it. These tools highlight the importance of the “what” in software. You still need to decide what’s worth building before you turn them loose.
3. Prompts & small features are critical
Clear, focused prompts matter more than anything. I draft mine in Notion to avoid trying to do too much at once. Over-scoped prompts overwhelm the tools, and it’s often tricky to salvage just part of a big change.
Version control is just as important. My best workflow has been mapping out a feature, breaking it into bite-size prompts, and keeping everything on a branch. It’s easy to burn through features quickly, and branching keeps rewinds simple.
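The branch-per-feature, commit-per-prompt workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `PromptStep`, `plan_git_commands`, `rewind_command`, the branch name, and the example prompts are all hypothetical, and the sketch only builds the git commands as strings rather than running them.

```python
from dataclasses import dataclass


@dataclass
class PromptStep:
    """One bite-size prompt and the commit message that records its result (hypothetical type)."""
    prompt: str
    commit_message: str


def plan_git_commands(branch: str, steps: list[PromptStep]) -> list[str]:
    """Return git commands for a feature branch with one commit per prompt."""
    commands = [f"git switch -c {branch}"]
    for step in steps:
        # In practice: hand step.prompt to the agent, review the diff, then commit.
        commands.append("git add -A")
        commands.append(f'git commit -m "{step.commit_message}"')
    return commands


def rewind_command(steps_to_undo: int) -> str:
    """Return the command that discards the last N prompt commits on the branch."""
    if steps_to_undo < 1:
        raise ValueError("steps_to_undo must be at least 1")
    return f"git reset --hard HEAD~{steps_to_undo}"


# Hypothetical feature broken into two small prompts.
steps = [
    PromptStep("Add a Settings screen with an empty list", "Add empty Settings screen"),
    PromptStep("Add a dark mode toggle to Settings", "Add dark mode toggle"),
]
for command in plan_git_commands("feature/settings", steps):
    print(command)
```

Because each prompt lands in its own commit, throwing away a change that went sideways is a single `git reset --hard HEAD~1` on the branch, and the main branch is never touched.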
4. End-to-end features & APIs require hands-on work
I ran into trouble with what I’ll call “end-to-end” features. Push notifications in iOS, CloudKit sync, and Plaid’s API all required changes across multiple layers. The tools could generate pieces, but not the whole flow. I had to get my hands dirty here.
5. There are things they can’t do at all
Plenty of work lives outside the repo. Xcode projects are a good example: push notifications require toggles and setup across Apple’s ecosystem. You can’t just prompt your way into this stuff working.
I bring a specific perspective to this moment in time. I wrote my first code in the nineteen-hundreds, by hand, in Microsoft Notepad. That feels like cave painting compared to modern IDEs like VS Code.
AI coding assistants are the most meaningful new tool I’ve seen since then. They save me hours I used to spend on Stack Overflow, hunting down the right person, or digging through docs. But they’re still just tools: they let me focus on scoping and prioritization, yet they can’t tell me what to build or why.
I feel like I’ve barely scratched the surface here. I’m excited to dig further into what makes these tools work and how they’re built, and also to think more about what happens as productivity plateaus with these tools.