AI-Powered Prototyping: How We Are Using GenAI Tools to Build Apps and Architectures Faster

From the Code & Cognition Podcast by Olio Apps At Olio Apps we have been steadily exploring the field of Gen AI development and building software with the assistance of Gen AI tools. From building MVPs to benchmarking LLMs head-to-head with internal developer tools, we’ve been hands-on with testing the bleeding edge of generative AI. In this post, we wanted to share our firsthand experience using Gen AI to speed up prototyping, write production-quality code, and even generate infrastructure diagrams for cloud-native systems. Whether you're scaling a dev team or experimenting solo, we wanted to show how using Gen AI in software development can looks like in practice.

Prototyping Software with GenAI: What Actually Works (and What Doesn’t)

We started with a challenge: Could we build usable software prototypes for apps, games, and UI using only natural language prompts and GenAI tools? We tested tools like Bolt.new, v0.dev, and Claude Sonnet by asking them to create:

Classic games like Snake and Centipede
A bizarre Pac-Man/Snake hybrid with real-time ghost movement
A Reddit-style message board with image uploads, nested comments, and voting

Some prompts worked out of the gate. Others fell apart - usually when backend integration introduced complexity. But what we learned was this: prompting GenAI is like managing a junior engineer. You don’t drop a wall of specs. You guide it step-by-step, providing context and corrections as you go. It’s fast. It’s messy. But it’s getting better every week.

How We Prompt, Collaborate, and Ship with GenAI

We've tried out "vibe coding" for ourselves and after giving it a shot, we have determined some rules of thumb that help make it a usable methodology. It’s less about writing perfect specs, and more about co-creating with AI in a fluid, conversational way. Some principles that emerged from our sessions were:

Treat the AI like a team member: ask clarifying questions, give feedback, iterate
Don’t front-load everything: break tasks into smaller units, step by step.
Include background and context: we got better results by naming the tech stack, conventions, and expected file structure
Version control everything: make frequent commits. One wrong suggestion can undo hours of work.

Following these principles helped elevate our experiments beyond mere vibes into successful Proofs of Concepts and MVPs.

We Tried Benchmarking Claude 3.5 Sonnet vs. Gemini 1.5: Here’s What Happened

We were curious which LLM is actually better at coding, so we used [WebDevArena] (https://web.lmarena.ai/), a head-to-head testing ground, to compare Claude 3.5 Sonnet, Gemini 1.5 Pro, and DeepSeek on real coding tasks. We ran each model through the same prompts - mostly game development and UI logic.

Claude 3.5 Sonnet came out ahead, producing clean, readable code with far fewer hallucinations.
Gemini 1.5 struggled with complex logic in games like Centipede and Snake.
DeepSeek (open-source) surprised us. It wasn’t perfect, but showed promise as a lightweight assistant.

These kinds of direct, practical benchmarks gave us confidence when evaluating which models to use in different parts of our stack, whether as coding copilots or agents for internal developer tooling.

AI Generated Architecture: How We Design Cloud Systems in Minutes

We’re heavy users of Amazon Web Services and one of our pain points has always been visualizing and communicating infrastructure, especially when iterating quickly. That’s where DiagramGPT and Eraser.io came in.

API Gateway triggers a Lambda → writes to DynamoDB → sends a message to an SQS queue → triggers a second Lambda that sends email via SES

And within seconds, we had a polished AWS diagram, complete with icons and connections. Compared to Miro or Lucidchart, this process saved considerable time and made it easier to collaborate with clients and team members during architecture reviews. We’re now integrating it into our solution design workflows for both consulting and internal projects.

Final LLM Dev Tip: Don't Break the Illusion

System prompts matter If your app’s identity relies on LLM behavior, run forensic checks and ensure your system prompt is structured, clear, and hard to break. You may find some LLMs have an easier time holding on to your system prompt so it is good to test and compare.

Additional Insights: Database Modeling and Cursor System Prompts

Two other tools we explored that deserve a shoutout are:

database.build Generated SQL schemas and ER diagrams from casual prompts like “a message board with users, posts, comments, votes, and favorites.” Surprisingly accurate.
Cursor We dug into its **system prompt** (yes, you can view it) to understand how it balances user intent with safe code editing. The result? Our ability to customize our dev environment so we can prompt edits with confidence.

For full-stack prototyping, the combo of schema generation + AI-assisted editing is starting to feel like the future of scaffolding software.

Final Take: Gen AI Software Development Is About Velocity and Direction

Here’s the real shift from our perspective: Gen AI doesn’t just make coding faster, it changes how we think about building. We’re no longer just writing code line-by-line. We’re directing the AI, setting the tone, choosing the structure, guiding the outcome and filling in gaps as necessary. Developers are becoming more like systems architects, especially in the prototyping phase.

Building prototypes in hours instead of days
Testing new architectures without spinning up infrastructure
Onboarding junior devs faster by pairing them with GenAI copilots

If you're in software consulting, running a SaaS, or managing engineering teams—this is the shift to start practicing. Listen to the full conversation on the Code & Cognition Podcast, and follow along as we continue building the future of AI-powered software.

Resources Mentioned

Scott Becker

CEO

A Tampa, Florida native, Scott relocated to the Pacific Northwest in 2008 and founded Olio Apps in 2012. When not hanging out with his family, biking, or playing in his punk rock band, Scott helps define the scope of customer engagements and oversees the business administration of Olio Apps. Scott’s areas of specialty are in DevOps, product design, technical design, and full stack engineering in React, Golang, and Ruby.

Share This Post