Don't Give It Keys

The demo looked alive.

It found the lead, drafted the email, pulled the file, updated the note, and asked for the next tool like it had been working here for years.

Then it asked for the keys.

Gmail. Stripe. The repo. The CRM.

Your hand moved toward Allow because the screen made it feel harmless. One clean button. One little permission prompt. One bright promise that the work would finally move without you.

That is the moment to get suspicious. Not because agents are useless. Useful things are exactly the ones that can hurt you when they are let into the wrong room.

The keyring is the real product.

The False Diagnosis Is Trust

The obvious diagnosis is that you need to trust the system more. You are moving too slowly. You are being old-fashioned. You are stuck in manual review while everyone else is building little digital employees that never sleep, never complain, and never ask for equity.

Sometimes your caution is just fear with a security badge. I will grant that. Some people can turn any risk into a reason to do nothing. They build a fortress around a task that should have been delegated last Tuesday.

But there is a different mistake wearing the costume of progress: giving the system access before it has earned a boundary. You are not trusting it. You are avoiding the boring work of deciding what it is allowed to touch.

Trust is not a mood. Trust is an operating design. It has rooms, locks, logs, limits, and a clean way to stop the machine when the work leaves the safe part of the map.

If those things do not exist, the agent is not trusted. It is loose.

Agency Has Weight

In its 2026 AI agent trends report, Google describes the new agent promise plainly: agents can understand a goal, build a multi-step plan, and take actions on your behalf, under expert guidance and oversight. That last part is not decorative. Guidance and oversight are the price of giving software hands.

A chatbot that writes a bad paragraph creates cleanup. An agent that can send, delete, charge, invite, merge, publish, or update records creates a different class of problem. Output becomes operation. A wrong answer can become a wrong action.

This is why OWASP names excessive agency as a major LLM application risk. Its writeup identifies three common root causes underneath the danger: excessive functionality, excessive permissions, and excessive autonomy. Notice what is missing from that list. It does not say the root cause is that the demo lacked sparkle.

The danger is usually not one dramatic villain moment. It is a dull permission mistake that looked efficient at the time. The agent did not need to delete files, but the connector allowed it. It did not need to write to the live database, but the token could. It did not need to send external messages, but the workflow made that one approval click away.

The agent did not break into the room. You left the key under the mat and called it acceleration.

Speed is not the same as permission.

The Demo Brain Lies

Demos are built to make action feel smooth. That is their job. They show the agent reading the correct file, making the correct inference, and taking the correct next step while a calm cursor glides across the screen like a tiny executive assistant with perfect lighting.

Production is less elegant. Production has stale records, missing context, weird permissions, duplicate names, anxious customers, half-finished migrations, shared inboxes, and a person who once solved a problem in a Slack thread nobody can find.

The demo makes you imagine the agent using access correctly. Production tests whether the access should have existed at all.

That is the split smart builders keep missing. They evaluate the answer and forget to evaluate the permission. They ask, "Can it do the task?" before asking, "What damage can it do while trying?"

You would not give a stranger the warehouse alarm code because he carried one box correctly. You would not hand a valet the title to the car because he parked it without scratching the paint. Yet people connect an unproven agent to live systems because the first draft looked good.

That is not trust. That is intoxication with a login screen.

Least Privilege Is Mercy

Security people have had the answer for a long time. NIST defines least privilege as giving a user or process only the access needed to perform authorized tasks and no more. That principle sounds cold until you understand what it protects.

Least privilege protects the customer from your optimism. It protects the company from a workflow that got one edge case wrong. It protects the agent from being asked to carry consequences it cannot understand. It protects you from discovering, too late, that the tool you were testing had the power of a senior operator and the judgment of a very confident intern.

Read-only is not cowardice. Draft-only is not weakness. Approval before external action is not a failure of faith. These are how a serious system earns the right to move from toy to tool.

The stuck optimizer hates this because it feels inefficient. They want the full loop now. Read the inbox, decide the reply, send the message, update the CRM, schedule the follow-up, and make the graph go up while they make coffee.

Fine. Want that. Build toward that. But do not confuse the destination with the first permission prompt.

A sharp system climbs the keyring one earned permission at a time. First it observes. Then it drafts. Then it proposes. Then it acts in a small room where the wrong move is annoying instead of catastrophic. Only after the logs prove stability does it get a larger key.
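The climb can be sketched as a small state machine. This is a minimal sketch, not a standard: the tier names, the `AgentRecord` class, and the threshold of 20 consecutive clean runs are all hypothetical choices made for illustration. What it shows is the shape of the rule: promotion moves one step at a time, only on logged proof, and an incident resets the streak.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical trust tiers, ordered from least to most authority."""
    OBSERVE = 0    # read-only access
    DRAFT = 1      # may create drafts that nobody else sees yet
    PROPOSE = 2    # may queue actions that wait for human approval
    ACT_SMALL = 3  # may act inside one small, reversible scope


@dataclass
class AgentRecord:
    tier: Tier = Tier.OBSERVE
    clean_runs: int = 0  # consecutive logged runs with no incident

    def log_run(self, incident: bool) -> None:
        # Any incident resets the streak, so the proof has to be re-earned.
        self.clean_runs = 0 if incident else self.clean_runs + 1

    def try_promote(self, required_clean_runs: int = 20) -> bool:
        """Move up exactly one tier, and only after enough clean runs.

        The default of 20 is an arbitrary placeholder, not a recommendation.
        """
        if self.tier == max(Tier) or self.clean_runs < required_clean_runs:
            return False
        self.tier = Tier(self.tier + 1)
        self.clean_runs = 0  # proof earned at the old tier does not carry over
        return True

    def demote(self) -> None:
        """Take a key back: drop one tier, never below OBSERVE."""
        self.tier = Tier(max(self.tier - 1, Tier.OBSERVE))
        self.clean_runs = 0
```

Twenty `log_run(incident=False)` calls followed by `try_promote()` would move a fresh agent from `OBSERVE` to `DRAFT`, and the count starts over at the new tier. The reset is deliberate: proof earned while drafting says little about behavior while acting.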

Let the machine earn the next key.

The Keyring Test

Before you connect a tool, slow down for the part that feels too obvious to write down. The keyring test is not about whether the agent seems smart. It is about whether the permission matches the proof.

What does it need to see? Read access is not the same as write access, and both are different from send, delete, charge, merge, invite, or publish.
What is the smallest object it can touch? One folder, one label, one branch, one draft queue, one customer segment, one test account. Small rooms make failure legible.
What action must always pause? If the work leaves the company, changes money, exposes private data, or alters a live account, the system should have to raise its hand.
Where does the evidence go? A useful agent leaves a trail. What it saw, what it chose, what it changed, and where it stopped should not vanish inside a cheerful completion message.
Who can take the key back? Revocation is part of design. If the answer is "we will figure that out if something goes wrong," the system is not ready for the key.
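The five questions can also live in code, so a request for access has to answer them before anyone clicks Allow. This is a minimal sketch that assumes a hypothetical `PermissionRequest` record and a hand-picked `HIGH_IMPACT` set built from the actions listed above. None of these names come from a real connector or library.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of actions that change the world outside the agent.
HIGH_IMPACT = {"send", "delete", "charge", "merge", "invite", "publish"}


@dataclass
class PermissionRequest:
    """Hypothetical description of one key an agent is asking for."""
    actions: set                # e.g. {"read", "draft"}
    scope: str                  # e.g. "label:support"; "*" means everything
    pauses_for_approval: set    # actions that must wait for a human
    audit_log: Optional[str]    # where the trail is written, if anywhere
    revoked_by: Optional[str]   # who can take the key back, if anyone


def keyring_test(req: PermissionRequest) -> list:
    """Return the questions this request fails to answer; empty means it passes."""
    failures = []
    # Force the request to name its actions instead of asking for "everything".
    if not req.actions:
        failures.append("What does it need to see? No actions declared.")
    if req.scope in ("", "*"):
        failures.append("What is the smallest object it can touch? Scope is unbounded.")
    # Any high-impact action that is not gated by approval is a failure.
    unpaused = (req.actions & HIGH_IMPACT) - req.pauses_for_approval
    if unpaused:
        failures.append(f"What action must always pause? Unpaused: {sorted(unpaused)}")
    if not req.audit_log:
        failures.append("Where does the evidence go? No audit log.")
    if not req.revoked_by:
        failures.append("Who can take the key back? No owner for revocation.")
    return failures
```

A request for `read`, `draft`, and `send` on one label, with no approval step and no revocation owner, would come back with two failures. The function decides nothing by itself. It returns the unanswered questions so a person has to look at them.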

These questions are not fancy. That is why they work. Fancy questions let you admire the architecture. Plain questions force the access to defend itself.

If the permission cannot defend itself, the answer is not better vibes. The answer is a smaller key.

Build the Smaller Room

The smaller room is where useful automation becomes possible. It is not a sandbox in the childish sense. It is a proving ground. A place where the agent can touch real work without being allowed to burn the building down while learning where the doors are.

For an email agent, the smaller room might be draft creation with no send permission. For a finance agent, it might be invoice matching with no payment authority. For a coding agent, it might be a branch and a pull request, not direct write access to production. For a sales agent, it might be CRM notes and suggested next steps, not unsupervised outreach to every lead you have ever collected.
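Each of those rooms can be written down as an explicit allowlist that denies everything else. The sketch below assumes hypothetical resource paths such as `mailbox/support/` and made-up action names like `create_draft`. A real connector will have its own scope model. It also records denied attempts, because what the agent could not touch is part of the evidence.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """One hypothetical key: an action allowed on resources under a prefix."""
    action: str   # e.g. "create_draft"
    prefix: str   # e.g. "mailbox/support/"; anything outside it is denied


@dataclass
class SmallRoom:
    grants: frozenset
    log: list = field(default_factory=list)  # every decision, allowed or not

    def check(self, action: str, resource: str) -> bool:
        """Deny by default: allow only if some grant covers action and resource."""
        allowed = any(
            g.action == action and resource.startswith(g.prefix)
            for g in self.grants
        )
        self.log.append({"action": action, "resource": resource, "allowed": allowed})
        return allowed


# Illustrative rooms only; paths and action names are assumptions.
EMAIL_AGENT = SmallRoom(frozenset({
    Grant("read", "mailbox/support/"),
    Grant("create_draft", "mailbox/support/"),
}))  # no "send" grant exists, so sending is denied

CODING_AGENT = SmallRoom(frozenset({
    Grant("push", "repo/branches/agent/"),
    Grant("open_pull_request", "repo/"),
}))  # nothing grants a push to the main branch
```

With these grants, `EMAIL_AGENT.check("create_draft", "mailbox/support/thread-42")` returns `True` and `EMAIL_AGENT.check("send", "mailbox/support/thread-42")` returns `False`, and both decisions land in `EMAIL_AGENT.log`. Deny by default means a forgotten grant fails closed instead of open.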

This is not bureaucracy. It is design. CISA's Secure by Design work argues that security should be treated as a core requirement from the beginning, not an afterthought bolted on once the product is already in customers' hands and the mess is expensive. Agent workflows deserve the same standard.

The point is not to make the agent timid. The point is to make the consequences legible. If the system fails, you should know where it failed, what it touched, what it could not touch, and what lesson now moves into the workflow.

Without that smaller room, every mistake becomes a mystery tour through whatever access you handed out in a generous mood.

The Stop Rule Matters More Than the Prompt

Prompts get the attention because prompts are visible. They feel like the craft. Better instruction, better tone, better examples, better role, better output. All useful. None of it replaces a stop rule.

A stop rule tells the agent when the work must return to a human. Not when the agent feels nervous. It does not feel nervous. That is part of the problem. The stop rule is external, explicit, and dull enough to work on a bad day.

Stop when money moves. Stop when private data leaves. Stop when the next action changes a customer's account. Stop when the source is missing. Stop when confidence is low. Stop when the request crosses legal, medical, financial, hiring, firing, security, or reputation risk. Stop when the action cannot be reversed cheaply.

That list is not a cage. It is a steering wheel.
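Because the stop rule is external, it can be plain code that runs outside the model, after the agent proposes an action and before anything executes. This is a minimal sketch that assumes a hypothetical `ProposedAction` record filled in by whatever planner you use. The 0.8 confidence threshold is an arbitrary placeholder, not a recommendation.

```python
from dataclasses import dataclass
from typing import Optional

# Risk domains from the list above that must return to a human.
SENSITIVE_DOMAINS = {"legal", "medical", "financial", "hiring", "firing",
                     "security", "reputation"}


@dataclass
class ProposedAction:
    """What the agent wants to do next, as reported by a hypothetical planner."""
    moves_money: bool = False
    exports_private_data: bool = False
    changes_customer_account: bool = False
    source: Optional[str] = None   # citation for the claim it acts on
    confidence: float = 1.0        # planner's own estimate, 0.0 to 1.0
    domains: frozenset = frozenset()
    reversible_cheaply: bool = True


def stop_reasons(a: ProposedAction, min_confidence: float = 0.8) -> list:
    """Return every stop rule that fires; any result means hand it to a human."""
    reasons = []
    if a.moves_money:
        reasons.append("money moves")
    if a.exports_private_data:
        reasons.append("private data leaves")
    if a.changes_customer_account:
        reasons.append("customer account changes")
    if not a.source:
        reasons.append("source is missing")
    if a.confidence < min_confidence:
        reasons.append("confidence is low")
    risky = a.domains & SENSITIVE_DOMAINS
    if risky:
        reasons.append(f"crosses {', '.join(sorted(risky))} risk")
    if not a.reversible_cheaply:
        reasons.append("cannot be reversed cheaply")
    return reasons
```

One caution on the design: `confidence` is the planner's own estimate, so it is the weakest check here. The other rules look at what the action does, not at how the agent rates itself, which is why they still hold on a bad day.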

The weak system tries to make the agent brave. The strong system makes it precise. Precision is what lets you remove yourself without pretending the world has become safe.

If the older problem is that your agents have no manager, I have written about that separately. This problem is narrower and sharper: the manager cannot just define the job. The manager has to hold the keys.

Fast Without a Fence Is Just Fast

The market is going to keep rewarding agent demos that look frictionless. That is fine. Frictionless sells. But frictionless is not the same as safe, and safe is not the opposite of fast.

The fastest serious operators I know do not move fast because nothing can go wrong. They move fast because the wrong thing has somewhere small to land. The branch can be reverted. The draft can be killed. The invoice can be reviewed. The customer message waits for approval. The tool logs what happened. The system narrows the mess before it multiplies.

That is the kind of speed worth wanting. Not a loaded cart at the top of the stairs. A track. A brake. A signal. A rule for who gets the next key and why.

The insecure builder hears this as slowness because they want the agent to rescue them from management. The serious builder hears it as leverage because they know the oldest rule of power: the thing that can act on your behalf must not inherit your whole kingdom on day one.

Give it a room. Give it a job. Give it a log. Give it a stop rule.

Then watch what it does.

If it handles the room, widen the room. If it earns the key, hand over the next one. If it fails, be grateful the lock held.

That is how the tool becomes trust. Not because you believed harder, but because you made the permission smaller than the mistake.

Don't give it keys.

Make it earn them.

