
tl;dr
Are you a builder? Probably, if you’re here. Either way, we’re all getting carried along by several recent trends. As we race to automate everything with shiny new AI tools, we’re also stacking up fragile workflows and hidden dependencies, setting ourselves up for a quiet kind of chaos I’m calling Agent Rot. It’s like Link Rot, but for smart systems. The short answer is to just pay attention to this. For more, here’s why it’s happening, what could go wrong, and what we can do to keep our bots from forgetting what they were built to do.
Agent Rot will put Link Rot to shame. It’s not quite here yet, but I’m giving it a name. Because for now our burgeoning bright, shiny, new AI/GPT automation tools still have that “Fresh New Workflow Scent!” But what challenges will we face when they start to get creaky?
Remember Link Rot? When hyperlinks stop working over time because the linked content has been moved, deleted, or the domain is no longer active? It’s sad because when you click something, you have expectations about a payoff. You got interested, and burned a whole thousandth of a calorie offering up a click. For nothing. How rude. How have you overcome this? Probably just moved on, maybe did a new search.
Now consider when agentic workflows start breaking. A click is intentional. Volitional. You made it in real time. If it fails, you move on. But workflows? Most run on automated triggers. What’s going to happen (or not happen) when the brittle pipes unifying systems burst or get shut off at the spigot? I’m not saying that “API Apocalypse” is coming to a theater near you tomorrow. Just that there’s likely increasing risk here. These aren’t just ordinary apps with a few dependencies. They are often built from widely disparate pieces.
Why Is This Becoming a More Serious Risk?
Because we’re playing Dependency Dominoes and don’t know it. Automation fragility has always been a thing. Automations often rely on multiple APIs, third parties, and dynamic data, all of which can change, throttle, break, or be deprecated silently. And silent failure modes are dangerous. A busted workflow might just quietly stop working. Or maybe worse, keep running and doing the wrong thing.
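One cheap guard against the "quietly stopped working" case is a staleness check: record when each workflow last succeeded and complain when that timestamp gets too old. The snippet below is only a minimal sketch of that idea. The function name `is_stale`, the `grace` multiplier, and the hourly schedule in the usage lines are all assumptions made up for illustration, not part of any particular automation tool.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(
    last_success: Optional[datetime],
    expected_every: timedelta,
    now: Optional[datetime] = None,
    grace: float = 2.0,
) -> bool:
    """Return True when a workflow has been quiet for longer than it should be.

    last_success:   when the workflow last finished successfully (None = never).
    expected_every: how often it is supposed to run.
    grace:          hypothetical slack factor; 2.0 tolerates one missed run.
    """
    if last_success is None:
        return True  # never succeeded, so treat it as broken
    now = now or datetime.now(timezone.utc)
    return now - last_success > expected_every * grace


# Usage: an hourly workflow that last succeeded three hours ago.
check_time = datetime(2025, 7, 1, 12, 0, tzinfo=timezone.utc)
three_hours_ago = check_time - timedelta(hours=3)
if is_stale(three_hours_ago, timedelta(hours=1), now=check_time):
    print("Hourly workflow has gone quiet. Go look at it.")
```

This only catches the workflow that stops. The one that keeps running and does the wrong thing needs its output checked too, which is a separate problem.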
Is Agent Rot a serious risk or am I just being paranoid? I can’t know for sure. As I write this in mid-2025, I think it’s fair to predict that within the next year or three we’ll see a variety of things breaking to some degree. Sometimes it may be funny. Other times annoying. Ideally not tragic. I just get this sense of risk as I’ve wired up my own workflows using two, three, or more dependencies. And I see the zeal with which many of my fellow monkeys are playing with their newly discovered fire and proudly sending their vibey flows into the world.

Another reason is that we have everyone and their extended family, with minimal experience, using power tools without having read the manual or gotten any real training. We’re wiring up workflows to automate anything we can get our filthy little mice on. Now think about the following… Think about your own experience with medical scheduling and billing. Think about the incorrect shipments you’ve gotten at your door. Think about how many times you’ve talked with customer service from AnyLargeCo and the agent can’t figure out your issue, OR does understand, yet has no power to press any buttons to solve it. Think about how as a businessperson you’ve read that chatbots save companies money, but as a consumer have had maddening experiences with them.
These are often larger companies, supposedly with competent business, marketing, product and software teams. So why do we get such letters or robocalls that have things so wrong? Sometimes we can understand. Even if you’ve never heard the term “edge case,” we all understand that there’s sometimes a completely unanticipated thing going on. But even though most things generally work OK, our collective experience seems to remain: A whole lot of business process remains all but completely FUBAR. There might be additional irony in that there’s a cottage industry of startups that try to solve some of these single issues (medical billing comes to mind). But they struggle to get traction, often fighting and failing in the face of systemic dysfunction. Fine. We all get it and weave our way through as best we can. And sometimes as entrepreneurs some of us will try to take on one of these legit problems.
So What About Now?
It’s no longer just professional teams wiring systems together. It’s anyone scrambling to get their new digital goodies out as quickly as possible. Some of this might end up making things better. Others? Maybe not so much. Ever see the movie Armageddon? At one point, Steve Buscemi’s character, Rockhound, jokes just before his spacecraft launch: “You know, we’re sitting on four million pounds of fuel, one nuclear weapon and a thing that has 270,000 moving parts built by the lowest bidder.” (Here’s the clip.)
That’s happening now. Lots of moving parts being built by people with minimal skill. This is arguably a good thing because enabling technology is great! At the same time, there may be some hidden gotchas coming up for which we should be on the lookout.
Efficient, Yet Brittle
We all kind of knew this implicitly beforehand. But if there’s one thing COVID-19 taught us more explicitly, it’s that even as we’ve become hyper-efficient as a society, we’ve also gotten brittle.
You want your toast more efficiently? Great. Here’s a toaster that texts you when done. But if the notification server goes down or the firmware updates overnight, your sourdough is charcoal, and you’re left explaining to your toddler why breakfast smells like you cooked it the way daddy used to make it by hand. That’s the world we’re building: optimized for speed and connectivity whether it’s really needed or not, fragile under strain, and full of toaster firmware bugs.
I think some of our digital systems are getting to be a bit Rube Goldberg-like, just with better design. They’ll perform dazzling tricks, but if one link slips, there goes your KPI dashboard, your customer onboarding flow, and possibly your job. And we keep wiring them up. Because it’s cool. Because it’s the future. Because someone on LinkedIn got 10,000 likes for saying they automated lunch and 1,000 of us watched the video on how to do it. And 100 of us went and did it too. (That’s kind of power law or Zipf’s law math.)
My Old Lesson On Dangers of Unknown Dependency Breakage
Years ago I led a large effort in pay-per-click advertising, a product that eventually had a nine-figure exit. One day, a week before a design update, I was reviewing a referrer report. Something was strange about the patterns among large referrers. It turned out that, unknown to us, there were agencies managing services for their… (and our) clients who had written code to automate updates into our interface. (We didn’t have APIs yet.) There was a strategic question as to whether or not to allow this, but that was a separate issue. The immediate reality was that if we changed our system on schedule, high-dollar pipelines would break. We were able to work this out, but if we hadn’t noticed the dependency and had switched over? These unknown hidden automations would have broken.
That was simple. What happens if an agentic workflow has a lot of steps building on prior steps and earlier steps are wrong or have other issues? Do we get compounded problems?
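A bit of back-of-the-envelope arithmetic suggests the answer is yes. The sketch below assumes, purely for illustration, that every step in a chain succeeds 95% of the time and that steps fail independently of each other. Real workflows won’t be that tidy, and the 95% figure is made up, but the shape of the curve is the point.

```python
def chain_success_rate(per_step_success: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumes each step has the same success rate and that failures are
    independent, which is a simplification chosen for this illustration.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return per_step_success ** steps


# Hypothetical numbers: each step works 95% of the time.
for steps in (1, 3, 5, 10, 20):
    rate = chain_success_rate(0.95, steps)
    print(f"{steps:>2} steps at 95% each -> {rate:.1%} end to end")
```

Under those assumptions, ten steps that each work 95% of the time deliver a fully correct run only about 60% of the time, and twenty steps get you closer to 36%. Each step can look fine on its own while the chain as a whole is a coin flip.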
What Can We Do As Builders?
First: document. I realize it’s less popular than ever. People are even pushing back at Wikis and fuller user stories. You don’t have to write the World’s Greatest Requirement. Just enough so someone (possibly future you) can understand why the thing you made sends customer emails through your cousin’s Slack bot.
Second: reduce dependency layers. Do you need six third-party tools to send a thank-you note? Pick stable services and know their deprecation policies. If you’re running critical ops on a plugin built by an overseas teenager during summer break, maybe have a backup plan. Beware that if any tool in your agent workflow is compromised, your entire workflow is potentially vulnerable. Zero-click attacks mean you likely won’t even know about a breach unless you’re paying attention to security alerts about your various bits and pieces. Also consider privacy layer needs. Agents sometimes are automatically given access to information that we’d otherwise generally keep very secure. Make sure you’re not allowing an end run around other security measures that can expose personal data you don’t want automatically and likely irretrievably sucked up into others’ databases.
Third: build graceful failure modes. If something breaks, at least let the system say “Oops, I’m broken” instead of silently doing or not doing its thing. Use error checks and alerts. Use version control. Add health checks and fallbacks. Don’t assume your webhook will always work just because it did once during a full moon on a Tuesday.
Fourth: learn the tools. Don’t just get it to barely work. Watch tutorial videos. Read docs. Know at least a little bit about what you’re doing.
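To make the third point a little more concrete, here is one way a "say Oops instead of failing silently" wrapper might look. Treat it as a rough sketch, not a recipe. Everything in it is hypothetical: the `run_step` helper, the `StepFailed` exception, the `alert` hook (which defaults to a log line but could just as well post to chat or page someone), and the pretend customer lookup in the usage lines.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("workflow")


class StepFailed(Exception):
    """Raised when a step fails and no usable fallback exists."""


def run_step(
    name: str,
    action: Callable[[], T],
    validate: Callable[[T], bool],
    fallback: Optional[Callable[[], T]] = None,
    alert: Callable[[str], None] = log.error,
    attempts: int = 2,
) -> T:
    """Run one workflow step, check its output, and fail loudly.

    validate catches the "kept running but did the wrong thing" case;
    alert makes sure a human hears about it; fallback is optional.
    """
    problem = "no attempts were made"
    for attempt in range(1, attempts + 1):
        try:
            result = action()
        except Exception as exc:
            problem = f"attempt {attempt} raised {exc!r}"
            continue
        if validate(result):
            return result
        problem = f"attempt {attempt} returned an unexpected result: {result!r}"

    # Every attempt failed: say so before doing anything else.
    alert(f"[{name}] Oops, I'm broken: {problem}")

    if fallback is not None:
        result = fallback()
        if validate(result):
            return result
    raise StepFailed(f"{name}: {problem}")


# Usage with a pretend upstream call that always times out.
def flaky_lookup() -> dict:
    raise TimeoutError("upstream API timed out")


alerts: list = []
value = run_step(
    "customer-lookup",
    flaky_lookup,
    validate=lambda r: isinstance(r, dict) and "email" in r,
    fallback=lambda: {"email": "unknown", "source": "cache"},
    alert=alerts.append,
)
print(value, alerts)
```

The design choice worth noticing is that the alert fires even when the fallback works. A fallback that quietly papers over a dead dependency is just a slower kind of silent failure.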
What Can We Do As Clients, Users, Consumers?
Ask questions. Like: “Hey, what happens if the AI that routes my info goes down during a power outage?” Ask vendors what observability tools or failover plans they provide.
Consider resisting the urge to automate every single thing. Yes, it’s fun to make your fridge talk to your calendar and tell your smart speaker to order oat milk when it runs low. But sometimes, a sticky note on the fridge works fine and won’t crash if the WiFi burps.
Agent Rot isn’t the end of the world. But it’s going to creep up slowly. All I’m suggesting is to not automate your whole life with duct tape and optimism. The whole situation reminds me of the quote from Hemingway’s The Sun Also Rises:
“How did you go bankrupt?”
“Two ways. Gradually and then suddenly.”
Let’s try to avoid this with our new helpers. “Perfect may be the enemy of the good” as they say, but “Your product sucks, I want my money back” is something you don’t want to hear anyone say. Though that’s still not as bad as, “Mr. Agent Owner? Yeah, your SmartThing hurt some people fairly badly. You’re under arrest for assault because your SmartThing did something really stupid dangerous.”
See Also…
Slack’s 10‑Hour API Outage (Feb. 2025)
Google Cloud’s Major API Management Crash (Jun. 2025)
$500M SAP Automation Meltdown (April 2025)
97% Of n8n Automation Workflow Fails (and how to fix)
Critical flaw in Microsoft Copilot could have allowed zero-click attack (June, 2025)