John Saigle

The only good bug is a dead bug.

Browser Use is Solving the Wrong Problem

The browser has been the everything app for a long time, and much hype surrounds the idea of automating browser use with LLMs. But in the age of AI agents, the CLI + LLM combo could be the superior choice for getting anything done.


Anthropic announced their new Piloting Claude for Chrome feature today, unleashing Claude into the wild web where it will click, scroll, and type its way through the digital maze we’ve built around our simplest tasks.

Browser automation leaves me cold. Every browser tab feels like a small defeat, a task that I couldn’t solve in a simple, repeatable way. Where possible, I will always reach for the command-line first so that I can make the most out of my programming skills and the speed that comes with adroit keyboard navigation.

That said, most people don’t work this way, so the wider appeal of browser automation is obvious. Unfortunately, there is no good CLI-based solution for most “real life”, non-coding domains: filing taxes, ordering groceries, booking appointments, managing subscriptions.

These tasks are trapped behind interfaces designed for human eyeballs, which makes their execution extremely inefficient.

Consider what we’re actually asking LLMs to do. Navigate to TurboTax, wait for the page to load, dismiss the promotional popup, click through the cookie consent dialog, find the right form field among dozens of others, enter data, wait for validation, handle inevitable JavaScript errors, and repeat this dance hundreds of times.

Then, of course, there are the massive security implications of giving AI agents access to our browsers.

All this to say: we’re solving the wrong problem with the wrong tool.

Instead of teaching AIs to navigate human interfaces, what if we built machine-readable interfaces for the tasks that matter?

Text-based interfaces play to LLMs’ strengths rather than their weaknesses. Instead of parsing visual layouts and guessing click targets, a model could process structured data and generate precise commands. An AI that struggles to find the “Submit” button on a cluttered webpage could effortlessly compose complex command pipelines.
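To make that concrete, here is a minimal sketch of the "structured data in, precise command out" step. Everything in it is an assumption for illustration: the book-appointment tool, its flags, and the JSON shape are hypothetical and do not correspond to any real program. The only real pieces are Python's standard json and shlex modules.

```python
import json
import shlex


def build_command(request_json: str) -> list[str]:
    """Turn a structured request into an argv list for a hypothetical CLI tool."""
    request = json.loads(request_json)
    argv = [request["tool"]]
    # Each option becomes one --flag=value argument, in the order given.
    for flag, value in request["options"].items():
        argv.append(f"--{flag}={value}")
    return argv


# A request of the kind a model might emit (hypothetical tool and flags).
request_json = json.dumps({
    "tool": "book-appointment",
    "options": {"provider": "dentist", "date": "2025-09-02", "note": "routine cleaning"},
})

argv = build_command(request_json)
# shlex.join quotes any argument that needs it, so the result is safe to paste into a shell.
print(shlex.join(argv))
```

Keeping the command as a list of arguments rather than one big string means nothing has to guess where one value ends and the next begins, which is exactly the kind of ambiguity a cluttered webpage is full of.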

Consider the UNIX philosophy applied to daily life. Ordering groceries becomes a single command: grocery-order --store=whole-foods --list=weekly.txt --delivery="Tuesday 6pm". Recurring deliveries could be scheduled with a cron job. Workflows could be shared with family and friends: along with recommending a recipe, you could share the script that places a delivery order for all of the ingredients. No stock photos of produce, no upselling widgets, no loading spinners. The same could be done for filing taxes, booking appointments, and so on.
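No such tool exists as far as I know, but the front end of one is easy to imagine. The sketch below uses Python's standard argparse module to accept the three flags from the command above. The function names and the summary format are my own assumptions, and the part that would actually talk to a store is deliberately left out, since no store offers such an interface.

```python
import argparse


def parse_order(argv: list[str]) -> argparse.Namespace:
    """Parse the flags of the hypothetical grocery-order command."""
    parser = argparse.ArgumentParser(prog="grocery-order")
    parser.add_argument("--store", required=True, help="store to order from")
    parser.add_argument("--list", dest="list_path", required=True,
                        help="path to a text file with one item per line")
    parser.add_argument("--delivery", required=True, help="requested delivery slot")
    return parser.parse_args(argv)


def summarize(order: argparse.Namespace) -> str:
    """Describe the order in one line; a real tool would submit it instead."""
    return f"store={order.store} list={order.list_path} delivery={order.delivery}"


order = parse_order(["--store=whole-foods", "--list=weekly.txt", "--delivery=Tuesday 6pm"])
print(summarize(order))
```

With something like this installed, the recurring delivery is one crontab line: an entry beginning with 0 9 * * 1 followed by the command would place the order every Monday at 09:00.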

This could do a lot to educate people about programming and help them feel more empowered too. Instead of losing all of one’s know-how each time a GUI undergoes a rework, scripting knowledge could accumulate and compound. The whole consumer world could be converted into script kiddies, and some of them would graduate into full-fledged hackers.

Given that innumerable person-hours and lines of code have been devoted to making the web functional for commerce, I don’t know if any of this will come to pass. It would mean a huge shift in how people think about using their computers. Companies might also find it harder to deploy the dark arts of advertising and surveillance outside of the browser context, and this alone might ensure that the CLI-based future never arrives.

But as for sheer efficiency, composability, and ease-of-use, I think there’s an excellent argument to be made for shifting the paradigm.