I renamed a skill last Tuesday. Nothing exploded. No error messages. Everything looked fine.
Three days later, a command that used to pull newsletter data from Kit.com silently returned nothing. An agent that once queried my calendar just… skipped that section. A voice profile reference in my writing assistant pointed to a file that had moved weeks ago.
My Claude Code setup had 119 skills, 47 agents, 71 commands, and 17 MCP servers. They all reference each other — by name, by file path, by MCP tool identifier. And when one thing changes, nothing tells the rest.
## The Silent Drift Problem
If you’ve been building out your Claude Code configuration beyond the basics — adding custom skills, writing agents, creating slash commands — you’ve probably hit a version of this. Something that worked last week doesn’t work this week, and there’s no obvious reason why.
The root cause is that Claude Code’s configuration layer is a web of text files that reference each other by convention, not by contract. There’s no type system. No import resolver. No broken link checker. A skill can reference `/my-other-skill` and Claude Code won’t complain if that skill was archived yesterday. It’ll just quietly fail to find it at runtime.
This is fine when you have five skills and two agents. It becomes a maintenance problem somewhere around thirty components. And it becomes genuinely disorienting past a hundred — because you can no longer hold the dependency graph in your head, and failures are silent.
## What I Found When I Actually Looked
I built a diagnostic skill — `ecosystem-health` — that runs seven checks across everything in `~/.claude/`. The first time I ran it against my own setup, the results were uncomfortable.
17 critical issues. Not warnings. Critical.
Here’s what was hiding:
- Broken vault paths — My newsletter writing agent referenced `05 Library/VOICE.md` for voice matching. The actual file had moved to `05 Library/Voice/VOICE.md` months ago. The agent was generating content without voice guidance and I hadn’t noticed.
- Phantom MCP servers — Seven skills and agents referenced MCP tools that weren’t configured anymore. `mcp__macos-calendar__*`, `mcp__kit__*`, `mcp__pickaxe__*` — all replaced by CLI tools weeks earlier as a cost optimization. The old MCP references were still scattered across the codebase, silently failing.
- Wrong skill names — Two active components referenced `log-to-daily-note`. The actual skill is called `log-to-daily`. A tiny naming difference, complete failure to invoke.
Beyond the critical issues: 55 configuration drift warnings where skills used MCP tools instead of their mandated CLI replacements. Not broken, but burning tokens unnecessarily on every invocation.
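For a rough sense of what that drift looks like on disk, a crude scan like the one below surfaces the obvious cases. The prefixes come from my own MCP-to-CLI migration; treat them, and the directory layout, as assumptions to adapt rather than anything the skill prescribes.

```bash
# Rough drift scan (illustrative, not the skill's implementation): list files that
# still reference MCP tool prefixes my local policy replaced with CLI tools.
# --exclude-dir keeps the ecosystem-health skill's own detection patterns out of the results.
for prefix in mcp__kit__ mcp__macos-calendar__ mcp__pickaxe__; do
  grep -rl --include='*.md' --exclude-dir=ecosystem-health "$prefix" \
    ~/.claude/skills ~/.claude/agents ~/.claude/commands 2>/dev/null |
    sed "s|^|DRIFT ($prefix): |"
done
```

A one-off scan like this is where I started; the rest of this post covers why it stopped being enough.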
## The Seven Checks
The skill runs seven diagnostic passes, each looking for a different category of drift:
1. Vault Path Validation — Scans every skill, agent, and command for hardcoded file paths, then checks if those paths actually exist on disk. Catches moved, renamed, or deleted files that are still referenced (see the sketch after this list).
2. Skill Cross-References — Finds every place a skill is referenced by name (in other skills, agents, commands, hooks) and verifies the referenced skill actually exists. Catches renames and archives that left stale references.
3. MCP Server Health — Extracts all `mcp__SERVERNAME__toolname` patterns from the codebase, then checks whether each server is actually configured. Finds phantom tools — MCP references pointing to servers that were removed or never existed.
4. CLI Tool Availability — Verifies that CLI tools referenced in your configuration are actually installed and responding. Catches tools that were removed, renamed, or never installed.
5. Configuration Drift — Checks for policy violations. In my case, I have a documented policy that certain CLI tools should be used instead of their MCP equivalents. This check finds violations — places where the old MCP pattern persists despite the policy.
6. Staleness Detection — Finds skills and agents that haven’t been modified in 90+ days. Cross-references with usage to distinguish “stable and actively used” from “forgotten and probably outdated.”
7. Orphan Detection — Builds a reference map of all skills, then identifies non-invocable skills that nothing references — dead code that’s a candidate for archiving.
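To make the first check concrete, here’s a minimal sketch of the idea behind vault path validation. It is not the skill’s actual implementation, and the vault root and path pattern are assumptions you’d adjust to your own setup; the pattern below just happens to match paths like `05 Library/VOICE.md` from the findings above.

```bash
# Minimal sketch of check 1 (vault path validation), not the skill's real code.
# Assumes vault references look like "NN Folder/.../Name.md" and live under $VAULT.
VAULT="$HOME/Obsidian/Vault"   # hypothetical vault root; change to yours

grep -rhoE --include='*.md' '[0-9]{2} [A-Za-z][A-Za-z0-9 /_-]*\.md' \
  ~/.claude/skills ~/.claude/agents ~/.claude/commands 2>/dev/null |
  sort -u |
  while IFS= read -r rel; do
    [ -e "$VAULT/$rel" ] || echo "BROKEN PATH: $rel"
  done
```

The real check also records which file and line each stale path came from; the point here is just that “does this path still exist?” is cheap to automate.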
## What Makes This Different From Grepping
You might be thinking: “I could just grep for broken references.” You could. I did, for months. Here’s why a structured approach works better:
Context-aware classification. A reference to `mcp__kit__*` inside the `ecosystem-health` skill itself isn’t a violation — it’s a detection pattern. Inside a morning briefing skill, it’s a policy violation. Grep can’t distinguish these.
Severity ratings. A broken vault path in an active daily-use agent is critical. A stale skill that nothing references is informational. The report prioritizes what actually matters.
Cross-check correlation. The MCP server check needs to read your `.claude.json` configuration, extract server names from multiple nesting levels (global and per-project), and compare against references in your skill files. That’s a multi-step validation that doesn’t reduce to a single grep.
The `.claude.json` problem. In a complex setup, this file can exceed 40,000 tokens. Claude Code’s Read tool truncates it, causing false positives — servers that look missing because their configuration was in the truncated portion. The skill uses `jq` to extract server names regardless of file size. That’s a lesson I learned the hard way after chasing phantom issues that weren’t real.
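Here’s the general shape of that cross-check as a sketch, assuming the usual `~/.claude.json` layout with a top-level `mcpServers` object plus per-project `mcpServers` entries under `projects`; verify those paths against your own file before relying on it.

```bash
# Sketch of the MCP server cross-check (illustrative). jq reads the whole file,
# so the truncation problem described above doesn't apply.

# 1. Servers that are actually configured (global + per-project).
jq -r '
  ([.mcpServers // {} | keys[]] +
   [.projects // {} | to_entries[] | .value.mcpServers // {} | keys[]])
  | unique | .[]
' ~/.claude.json | sort -u > /tmp/configured-servers

# 2. Servers referenced as mcp__SERVER__tool in skills, agents, and commands.
#    --exclude-dir keeps ecosystem-health's own detection patterns out of the count.
grep -rhoE --include='*.md' --exclude-dir=ecosystem-health 'mcp__[A-Za-z0-9-]+__' \
  ~/.claude/skills ~/.claude/agents ~/.claude/commands 2>/dev/null |
  sed -E 's/^mcp__//; s/__$//' | sort -u > /tmp/referenced-servers

# 3. Referenced but not configured = phantom.
comm -23 /tmp/referenced-servers /tmp/configured-servers
```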
## Using It
Installation is straightforward — copy or symlink a single markdown file:
```bash
mkdir -p ~/.claude/skills/ecosystem-health

# Either copy directly:
cp SKILL.md ~/.claude/skills/ecosystem-health/SKILL.md

# Or clone and symlink:
git clone https://github.com/aplaceforallmystuff/claude-ecosystem-health.git
ln -s ~/Dev/claude-ecosystem-health/SKILL.md ~/.claude/skills/ecosystem-health/SKILL.md
```
Then customize the checks for your environment — your vault paths, your CLI tools, your policies. The skill file itself documents what to change.
Run it:
```bash
/ecosystem-health                        # Full sweep
/ecosystem-health --quick                # Checks 1-5 only (for weekly use)
/ecosystem-health --check vault-paths    # Single targeted check
```
The output is a structured markdown report with a summary table, severity-classified findings, affected file paths with line numbers, and remediation pointers. It doesn’t fix anything — it’s read-only by design. You decide what to act on.
## The Broader Pattern
Building this diagnostic taught me something I should have realized earlier: AI tooling has a maintenance burden, and nobody talks about it.
We talk about building agents. We talk about prompt engineering. We talk about MCP servers and tool use and agentic workflows. We don’t talk much about what happens six months later when you’ve accumulated dozens of interconnected components and one rename cascades into silent failures across your entire setup.
This isn’t unique to Claude Code. Any system where configuration lives in text files that reference each other by convention — Obsidian vaults, n8n workflows, Home Assistant automations — develops this kind of drift over time. The difference is that most of those systems have some form of link validation or dependency tracking built in. Claude Code’s skill/agent/command layer currently doesn’t.
Until it does, something like `ecosystem-health` fills the gap. I run `--quick` weekly as part of my review process and do a full sweep monthly. It catches things I’d never notice manually — and it caught 17 things on day one that I’d been living with for weeks.
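If you want to take the human out of the weekly loop, the quick sweep can in principle run headlessly. This assumes your Claude Code version accepts the slash invocation through print mode (`claude -p`) and resolves the skill there, which is worth testing once by hand before scheduling anything.

```bash
# Hypothetical weekly automation. Assumes `claude -p` accepts the /ecosystem-health
# slash invocation headlessly; run it once interactively to confirm before scheduling.
mkdir -p "$HOME/reports"
claude -p "/ecosystem-health --quick" > "$HOME/reports/ecosystem-$(date +%F).md" 2>&1
```

From there it’s one cron or launchd entry away from being automatic.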
The repo is at github.com/aplaceforallmystuff/claude-ecosystem-health. MIT licensed. If you’ve got a Claude Code setup with more than a handful of custom components, it might be worth a run.
Have questions about maintaining complex Claude Code setups? I write about AI tooling, automation, and working with AI systems at Signal Over Noise.