AI can do that too? If you have a web app it can use playwright to test function...

Wowfunhappy · 2025-12-16T23:35:03 1765928103

Yeah, but it doesn't work nearly as well. The AI frequently misinterprets what it sees. And it isn't as good at actually using the website (or app, or piece of hardware, etc) as a human would.

UncleEntity · 2025-12-17T01:40:17 1765935617

I've been using Claude to implement an ISO specification and I have to keep telling it we're not interested if the repl is correct but that the test suite is ensuring the implementation is correctly following the spec. But when we're tracking down why a test is failing then it'll go to town using the repl to narrow down out what code path is causing the issue. The only reason there's even is a repl at this point is so it can do its 'spray and pray' debugging outside the code and Claude constantly tried to use it to debug issues so I gave in and had it write a pretty basic one.

Horses for courses, I suppose. Back in the day, when I wanted to play with some C(++) library, I'd quite often write a Python C-API extension so I could do the same thing using Python's repl.

Capricorn2481 · 2025-12-17T01:53:48 1765936428

But then the AI would theoretically have to write the playwright code. How does it verify it's getting the right page to begin with?

simonw · 2025-12-17T02:14:43 1765937683

The recent models are pretty great at this. They read the source code for e.g. a Python web application and use that to derive what the URLs should be. Then they fire up a localhost development server and write Playwright scripts to interact with those pages at the predicted URLs.

The vision models (Claude Opus 4.5, Gemini 3 Pro, GPT-5.2) can even take screenshots via Playwright and then "look at them" with their vision capabilities.

It's a lot of fun to watch. You can tell them to run Playwright not in headless mode at which point a Chrome window will pop up on your computer and you can see them interact with the site via it.