I left Claude alone with my Mac. Here’s what it actually did.
Not a demo. It works.
Picture this: you text a task to your computer from your phone, walk out to grab coffee, and come back to find the work already done. No script. No Zapier flow. No Python bot you spent two weekends building. Just Claude, quietly working your Mac while you were out.
That’s not a hypothetical. A Redditor named u/Popular-Help5516 made it happen this week. They got access to Claude’s new Computer Use feature — a research preview for Pro and Max plan users on Mac — and then did something most people skip: they tested it on actual work instead of just recording a demo.
What they found is worth paying attention to.
What Computer Use actually is
Most AI tools help you think. This one helps you do.
Claude takes screenshots of your screen, figures out what’s on it, then physically moves your mouse and types on your keyboard. Like a person sitting at your desk. One who doesn’t zone out or check their phone every 10 minutes.
The author put it well: this is the shift from “thing that talks” to “thing that does.” If you’ve been watching AI agents get hyped for two years and wondering when something real would actually ship — this is the first version of that thing. Early, clunky, not fully reliable. But real.
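The screenshot, decide, act cycle described above can be sketched in a few lines. This is a conceptual illustration only, not Anthropic’s implementation: `Action`, `run_loop`, and the three callables `observe`, `decide`, and `act` are hypothetical names standing in for taking a screenshot, choosing the next step, and driving the mouse or keyboard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """One step the agent wants to take (hypothetical shape)."""
    kind: str          # e.g. "click", "type", or "done"
    detail: str = ""   # e.g. a button label or the text to type


def run_loop(
    observe: Callable[[], str],
    decide: Callable[[str], Action],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Repeat observe -> decide -> act until the task is done.

    Returns the number of actions performed. Raises RuntimeError if the
    step budget runs out, so a stalled task fails loudly.
    """
    for step in range(max_steps):
        screen = observe()          # stand-in for a screenshot
        action = decide(screen)     # stand-in for the model's choice
        if action.kind == "done":
            return step
        act(action)                 # stand-in for mouse/keyboard input
    raise RuntimeError(f"task not finished after {max_steps} steps")
```

The step budget is a design choice for the sketch: a loop that stops and complains is easier to trust than one that keeps clicking.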
Why this matters more than the usual AI news
Computer Use isn’t a chatbot feature. It’s not a plugin or an API call. It’s Claude operating your actual machine, navigating real software, clicking real buttons.
That’s a different category of capability. And the early test results give a surprisingly honest picture of where it actually stands right now — not the polished demo version, but the “threw it at my real work” version.
The honest answer: it works. Not perfectly. But well enough to already be useful for the right tasks. For people who handle repetitive computer work every day, that’s not a small thing.
Here’s what held up under real conditions:
File management — one instruction to rename and sort 40-plus files in a Downloads folder. About five minutes. Every single file handled correctly. The kind of job most people either avoid or do in small painful batches.
Spreadsheet data entry — Claude pulled numbers from a PDF and entered them row by row into a Numbers spreadsheet. Slow, but accurate. No cleanup needed after. The author noted it was careful enough to double-check its own work on a few entries.
Browser form filling — the same web form, filled out 8 times with different data. One date format mistake, fixed with a single follow-up message.
Research compilation — 5 browser tabs, key info pulled from each, compiled into a text doc. No hand-holding required.
Things that needed more babysitting: workflows that jumped between multiple apps (Claude sometimes lost track of which window it was in), and longer sequences of 20-plus steps, where it once failed silently at step 15.
Things that don’t work yet: anything needing speed, captchas, 2FA, login screens, and complex drag-and-drop interactions.
Tips and tricks from day-one testing
A few non-obvious things the original poster figured out:
The best workflow is “start it and leave.” Claude takes over your whole machine while it works — you can’t use your Mac in parallel. So the optimal setup is: hand it a clear task, go do something else, come back to finished work. The author combined it with a phone remote app and was texting tasks from the other room while Claude worked the desktop. That’s the workflow to aim for.
Keep tasks short and specific. In the author’s testing, reliability sat around 80% on simple tasks (think: single app, fewer than 10 steps) and dropped to roughly 50% on complex ones. Longer workflows can fail silently, which is the dangerous kind of failure. Break bigger jobs into smaller, checkable pieces.
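One way to make a piece “checkable” is to decide up front what done looks like and verify it when you come back. Here is a minimal sketch, assuming a file-renaming task like the Downloads cleanup; the function name `check_renamed` and the idea of checking names against a pattern are hypothetical choices for illustration, not something from the original post.

```python
import re
from pathlib import Path


def check_renamed(folder: Path, pattern: str, expected_count: int) -> list[str]:
    """Return a list of problems found; an empty list means the check passed.

    `pattern` is a regular expression that every file name must fully match.
    Hidden files (names starting with a dot, such as .DS_Store) are ignored.
    """
    names = sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )
    problems = [
        f"unexpected name: {name}"
        for name in names
        if not re.fullmatch(pattern, name)
    ]
    if len(names) != expected_count:
        problems.append(f"expected {expected_count} files, found {len(names)}")
    return problems
```

You would call it with the folder Claude worked on, the naming scheme you asked for, and the number of files you started with. A silent failure halfway through then shows up as a non-empty list instead of going unnoticed.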
Some things stay off-limits for now. No captchas. No login screens. And the author is clear: don’t let it anywhere near anything with real consequences. Sending emails, making purchases, anything where a mis-click has a cost. That’s still yours to handle.
Speed-sensitive tasks are a bad fit. Two to five seconds per click sounds fine until you’re watching it fill out a 30-field form. Think of this as the careful, thorough worker — not the fast one.
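The arithmetic is worth doing before you hand over a long form. Here is a rough sketch, assuming exactly one action per field (real forms may need more, for example to open a dropdown); the function name is made up for illustration.

```python
def estimated_seconds(
    actions: int, low: float = 2.0, high: float = 5.0
) -> tuple[float, float]:
    """Return (best case, worst case) run time in seconds for a number of actions."""
    return actions * low, actions * high


best, worst = estimated_seconds(30)
print(f"30 fields: {best:.0f} to {worst:.0f} seconds")
# prints "30 fields: 60 to 150 seconds"
```

So a single 30-field form lands somewhere between one and two and a half minutes under these assumptions. That is fine if you have walked away, and painful if you are watching.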
The original poster published a longer breakdown of everything they tested, and the comment thread in r/PromptEngineering has other early testers sharing what they’ve tried.
If you’re on a Pro or Max plan with a Mac, this is worth experimenting with right now. The direction is clear. Getting hands-on experience before everyone else is the move.