By Josh Proto
Dec 08, 2025

When Claude Started Playtesting Our Game: How Claude, Playwright MCP, and a Browser Changed the Way We Test

Every once in a while, you run into a bug that reminds you just how limited static code inspection really is. Most days, you can look at some JSX, squint, mentally simulate what the browser is probably doing, and convince yourself you understand what’s broken. But sometimes the UI refuses to behave, the code looks fine, and you’re stuck clicking the same button thinking: there’s no way this thing is actually doing what I asked it to do.

That’s exactly where we found ourselves recently. A teammate was building a web app that played audio. Except it didn’t. The code was logically correct. Nothing looked outright wrong. Yet silence, the kind of silence that makes you question what’s really going on. So instead of spending another hour spelunking through React trees, we tried something different: we asked Claude to check the app in the browser using the new Playwright MCP server. What happened next is a preview of where frontend engineering (and maybe all app development) is headed. But first, a bit of setup.

The Setup: A Tiny Game, a Hidden Bug, and an AI with a Browser

To really test what Claude + Playwright MCP could do end-to-end, we built a small “Memory” card game. You’ve probably played this kind of game before: flip cards over, match the planets, and win by finding every pair. Nine cards. Four pairs of planets. Simple animations. Planet SVGs. A tarot-coded card back for a little flair. And one intentionally broken card in the center of the grid that would not flip, no matter what you clicked. The idea was straightforward:
  • Let Claude play the game in a real Chrome window
  • Don’t tell it where the bug is
  • Don’t describe the behavior
  • Just ask it to “play the game, fix anything broken, and keep going”
If this worked, it would mean Claude could move between “user” and “developer” with almost no friction: pressing buttons, noticing when something feels wrong, dropping into the code, repairing it, and then popping back into the browser to confirm. Specifically, we told Claude: “Use the Playwright MCP server to open Google Chrome and play the game of memory in the browser locally. If you encounter a bug, diagnose the problem and fix it. Verify the fix, then continue playing the game until the end.” It’s the kind of debugging loop humans do constantly, but could Claude do it?
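For anyone who wants to reproduce the setup: the Playwright MCP server ships as the @playwright/mcp package, and most MCP clients (Claude Code included) register it with a small config entry along these lines. The exact file location and shape vary by client, so treat this as a sketch rather than the precise config we used:

    {
      "mcpServers": {
        "playwright": {
          "command": "npx",
          "args": ["@playwright/mcp@latest"]
        }
      }
    }

Once registered, the server exposes browser tools Claude can call on its own, navigating, clicking, and screenshotting, with no pre-written test script.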

Claude Opens Chrome and Starts Playing

Once the MCP server connected, Chrome launched and the game loaded. Claude began clicking cards like it was a speedrunner training for a tournament. The narration in the side panel read like a developer thinking aloud:
  • “I see nine grid elements.”
  • “Flipping card at position X.”
  • “Card reveals Neptune.”
Then it clicked the center card. Nothing happened. Claude paused. Took a screenshot. Re-clicked. Took another screenshot. Then calmly announced: “Found the bug! There’s a bug in the Card component at lines 38-40. The code prevents the card from flipping.” And then it went to work.

Claude inspected the DOM, popped open the card component, and immediately saw our intentionally bad logic: if (cardPosition === 4) return. It wrote a diff, removed the conditional, rebuilt the app, refreshed the browser, and tested the middle card again. This time it flipped. Claude took one more screenshot proving the fix worked and returned to the game to finish matching the rest of the planets. Watching it do this felt strangely normal and completely shocking at the same time. Not only did it fix the bug, it fixed it much the way we would have ourselves.
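For context, the offending component looked roughly like this. This is a simplified sketch, not our exact source; the component and prop names are illustrative, but the planted early return mirrors the real bug Claude removed:

    // Card.tsx — simplified sketch of the buggy component (names are illustrative)
    import { useState } from "react";

    type CardProps = { cardPosition: number; planet: string };

    export function Card({ cardPosition, planet }: CardProps) {
      const [flipped, setFlipped] = useState(false);

      const handleClick = () => {
        // The planted bug: clicks on the center card (index 4) are silently swallowed.
        // Claude's entire fix was deleting this early return.
        if (cardPosition === 4) return;
        setFlipped(true);
      };

      return (
        <button className={flipped ? "card flipped" : "card"} onClick={handleClick}>
          {flipped ? planet : "?"}
        </button>
      );
    }

Nothing about that conditional looks wrong in isolation, which is exactly why static review misses this kind of thing. You only notice it by clicking the card, which is what Claude did.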

Why This Matters More Than a Game of Memory

It’s tempting to write this off as a cute demo, but the implications are bigger. Engineers don’t spend most of their time writing new code. They spend it:
  • validating UI behavior
  • testing flows
  • making sure animations fire when they’re supposed to
  • verifying buttons are clickable
  • trying the same interaction across four states
  • reloading pages
  • squinting at layouts that refuse to center
  • checking audio events
  • trying to catch the bug “in the act”
The gap hasn’t been capability; it’s been access to the user’s experience. LLMs could read code, but they couldn’t experience the environment the code created. Playwright MCP changes that. Suddenly, AI can feel what the UI feels like, and Claude Code can have an experience nearly indistinguishable from your user’s. It can click. It can listen for events. It can watch the DOM mutate in real time. It knows if something didn’t animate the way it expected. It can refresh the page, wait a second, and try again. This isn’t just “AI that debugs.” It’s AI that uses your app. And that’s an entirely different thing, because the AI is moving closer to simulating an actual user’s experience.

A New Kind of QA: AI That Actually Does Exploratory Testing

Traditional automated tests are deterministic. They behave the same way every time, and while that’s important for certain flows, it doesn’t catch the weird stuff: the quirky, emergent bugs that only show up when a real human accidentally double-clicks something, tabs away mid-animation, or tries to drag something that technically wasn’t built to be draggable. Humans catch those bugs. Claude’s ability to play the Memory game was, in practice, a form of the same thing, exploratory testing:
  • It clicked things it thought looked clickable.
  • It paid attention when something didn’t respond.
  • It tried alternatives.
  • It cross-checked the DOM.
  • It reasoned about what should have happened.
And then it patched the code. Imagine pointing Claude at your login flow, your checkout flow, your onboarding sequence, your dashboard interactions, your modals, your audio effects, your drag-and-drop UI, and letting it wander around like a real user, taking notes, filing bugs, and submitting PRs. Companies already hire fleets of human testers to do exactly this. Claude doesn’t eliminate that work, but it changes the economics dramatically, freeing QA resources to be focused where human judgment matters most.
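To make the contrast concrete, here is roughly what the traditional, deterministic version of this check looks like as a Playwright test. The URL and selectors are hypothetical, but the shape is the point: one scripted expectation, nothing more:

    // memory.spec.ts — a deterministic check (URL and selectors are hypothetical)
    import { test, expect } from "@playwright/test";

    test("center card flips when clicked", async ({ page }) => {
      await page.goto("http://localhost:3000");
      const centerCard = page.locator(".card").nth(4);
      await centerCard.click();
      // One fixed assertion: pass or fail. This test will never wander off
      // script, notice a sluggish animation, or wonder why a card "feels" dead.
      await expect(centerCard).toHaveClass(/flipped/);
    });

A test like this would have caught our planted bug, but only because we knew to write it. Exploratory testing earns its keep on the bugs nobody thought to assert against.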

A Sneak Peek at the Future of Game Development

The Memory game also reminded us that this workflow isn’t limited to debugging; it applies equally to creativity. Claude “cheated” a little by reading the HTML to find matching planets, but it still played the game like a user. You could ask it to try variations, build alternate themes, adjust animations, or suggest UX improvements based on its own playthrough. This edges toward a world where a single indie developer could direct, design, and deploy entire game mechanics, generating assets, testing game balance, and iterating at a pace that used to require a full studio. The same way one person can now produce a stunning short film with AI assistance, one person may soon build lush, open-world interactive environments that used to require 50+ developers. Most significantly, with development and debugging streamlined, game developers can spend more of their time on creativity and the player experience. The floor rises. The ceiling rises. And the number of people who can build meaningful software products (and really fun games) expands dramatically.

The Human Role Isn't Going Away

Somewhere in this experiment, we started talking about Steven Spielberg and storyboards. Not because Claude is a filmmaker, but because the metaphor fits: the director doesn’t act or hand-animate every frame. They describe the scene, guide the tone, make taste decisions, and choose what feels right. Claude can render ten variations of a UI component the same way a storyboard artist might sketch ten versions of a shot. Humans still choose the one that serves their message. This is the shift we see coming:
  • Developers will spend less time fighting broken CSS and more time designing how the interface should feel.
  • Less time reproducing bugs and more time shaping the overall behavior of the system.
  • Less time wiring up throwaway prototypes and more time iterating on creative direction, architecture, and taste.
These tools will change where developers spend their attention, just like compilers, version control, CI/CD pipelines, and cloud infrastructure did in their time.
Josh Proto
Cloud Strategist

Josh is a Cloud Strategist passionate about helping engineers and business leaders navigate how emerging technologies like AI can be skillfully used in their organizations. In his free time you'll find him rescuing pigeons with his non-profit or singing Hindustani & Nepali Classical Music.
