1 — The Problem
2 — Constraints
3 — Architecture
4 — Two Tool Types
5 — MCP Deep Dive
6 — Skills
7 — Production
The Problem
Every Enterprise is a Snowflake
Same agent. Identical code. Deployed to Customer A, Customer B, Customer C.
You ship it. It works beautifully in your test environment. Then every customer connects completely different tools.
Team     Monitoring  Ticketing   Knowledge   Comms     Code/CI
Ops      Datadog     PagerDuty   Confluence  Slack     -
Support  -           Zendesk     Notion      Intercom  -
Eng      Splunk      Jira        Confluence  Slack     GitHub
Sales    -           Salesforce  Notion      Outlook   -
Every row is different. Every column is different.
5–15 SaaS tools per team, wired differently at every company.
Hardcoded agent ✕: API deprecated, auth expired
N tools × M tenants × P auth methods = combinatorial explosion
Every new vendor = months of integration work
How do you build ONE agent that works across ALL tenants?
Discovered, not declared: tools are found at runtime, per tenant
Delegated, not embedded: auth is handled by the platform
Composable, not monolithic: domain logic adapts to the tools
Production Agent for Enterprise Tenants
Skills
Reusable domain logic that orchestrates tools without vendor coupling
Native Tools
Platform capabilities every tenant gets for free
MCP
A protocol that lets the agent discover & invoke any tool at runtime
Three architectural ideas. Let's build it up.
Why is This Actually Hard?
The Four Constraints Nobody Warns You About
[Figure: Demo Day ✓, one agent wired to one API, one DB, one LLM 😊, versus 10,000 tenants, all different 😰]
At a hackathon: one user, one set of tools, one API key in a .env file. That agent does not work for 10,000 paying customers.
CONSTRAINT 1
Tenant Isolation
Same agent code, separated by an isolation wall:
Tenant A: data, credentials, tool connections, user permissions → scoped API responses
Tenant B: different data, different credentials, different permissions → scoped API responses
Customer A's agent must never see Customer B's data. One API response leaked across tenants = game over.
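One way to make the isolation wall structural rather than a convention is to thread an immutable tenant context through every read and to fail closed on any response whose tenant tag does not match. The sketch below assumes, hypothetically, that the gateway tags each response with a `tenant_id` field; `TenantContext`, `TenantScopedStore`, and `check_response_scope` are illustrative names, not a particular framework's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TenantContext:
    """Identity attached to every request; frozen so it cannot be mutated mid-call."""
    tenant_id: str
    user_id: str


class CrossTenantAccessError(Exception):
    """Raised when data tagged for one tenant reaches another tenant's context."""


class TenantScopedStore:
    """Every key is paired with the tenant ID, so a lookup made under one
    tenant's context can never return another tenant's entry."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], object] = {}

    def put(self, ctx: TenantContext, key: str, value: object) -> None:
        self._data[(ctx.tenant_id, key)] = value

    def get(self, ctx: TenantContext, key: str) -> Optional[object]:
        return self._data.get((ctx.tenant_id, key))


def check_response_scope(ctx: TenantContext, response: dict) -> dict:
    """Fail closed: reject any response not tagged with the caller's tenant,
    including responses that carry no tenant tag at all."""
    owner = response.get("tenant_id")
    if owner != ctx.tenant_id:
        raise CrossTenantAccessError(f"response for {owner!r} reached {ctx.tenant_id!r}")
    return response
```

Keying storage by `(tenant_id, key)` means a bug in calling code can at worst miss data, never read a neighbor's.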
CONSTRAINT 2
Tools Unknown Until Runtime
Tenant A (full shelf): Datadog, PagerDuty, Slack, Jira, Confluence
Tenant B (sparse): Splunk, OpsGenie
Tenant C (empty): ∅, nothing connected yet
tools = ["datadog", "pagerduty", "splunk"]
Hardcoded = broken on Day 1: this list is wrong for every tenant above.
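To make "broken on Day 1" concrete, this sketch checks the hardcoded list above against the three example tenants. The `CONNECTED` mapping is a hypothetical stand-in for runtime discovery.

```python
from __future__ import annotations

# Hypothetical connection state, mirroring the three tenants above.
# In a real system this would come from runtime discovery, not a literal.
CONNECTED: dict[str, set[str]] = {
    "tenant-a": {"datadog", "pagerduty", "slack", "jira", "confluence"},
    "tenant-b": {"splunk", "opsgenie"},
    "tenant-c": set(),
}

HARDCODED = ["datadog", "pagerduty", "splunk"]


def broken_calls(tenant_id: str) -> list[str]:
    """Hardcoded tools that this tenant does not actually have connected."""
    connected = CONNECTED.get(tenant_id, set())
    return [tool for tool in HARDCODED if tool not in connected]


def discovered_tools(tenant_id: str) -> list[str]:
    """Runtime discovery: whatever this tenant has, possibly nothing."""
    return sorted(CONNECTED.get(tenant_id, set()))
```

Every tenant gets at least one dead call from the hardcoded list, while discovery returns exactly what each tenant has, including nothing for Tenant C.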
CONSTRAINT 3
Delegated Authentication
Agent (turn 4) → tool_call() → 🔒 401 Unauthorized
GRACEFUL DEGRADATION
✓ Note the gap
✓ Tell the user
✓ Continue with what's available
✕ Don't crash, hallucinate, or pretend
OAuth tokens expire mid-conversation. Some tools need re-consent. The agent doesn't own credentials. The user does.
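The degradation checklist above can be sketched as a wrapper around each tool call. Assumptions: the gateway surfaces a 401 as a dedicated exception (here the hypothetical `AuthExpired`), and a turn keeps evidence, gaps, and user-facing notices in separate collections so the reply can report each one.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


class AuthExpired(Exception):
    """Hypothetical error a gateway could raise when a provider returns 401."""

    def __init__(self, provider: str) -> None:
        super().__init__(f"{provider}: 401 Unauthorized")
        self.provider = provider


@dataclass
class TurnState:
    evidence: dict[str, object] = field(default_factory=dict)
    gaps: list[str] = field(default_factory=list)
    user_notices: list[str] = field(default_factory=list)


def call_with_degradation(state: TurnState, name: str, call: Callable[[], object]) -> None:
    """Run one tool call; on auth expiry, note the gap and tell the user instead of failing."""
    try:
        state.evidence[name] = call()
    except AuthExpired as err:
        state.gaps.append(f"{name}: auth expired for {err.provider}")
        state.user_notices.append(f"Re-authenticate {err.provider} to include {name}.")
        # Nothing is written to state.evidence, so no result can be "pretended".
```

The calls that still work keep running; the expired one becomes a gap plus a notice rather than a crash or an invented answer.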
CONSTRAINT 4
Partial Failure is the Norm
5 Parallel Tool Calls
✓ Alerts fetched 200 OK — 12 results
✓ Logs fetched 200 OK — 48 results
⚠ Partial data 200 OK — empty body
✕ Timeout 30s — no response
✕ 500 Error Internal server error
Best Available Answer
Not "Error: please try again"
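A minimal sketch of the fan-out using Python's `asyncio`: each call gets its own timeout and is converted into a status record, so a timeout or a 500 cannot cancel the calls that succeeded, and an empty 200 is kept as `empty` rather than treated as a failure. The status names are illustrative choices; the 30-second default is taken from the slide.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

ToolCall = Callable[[], Awaitable[list]]


async def run_tool(name: str, call: ToolCall, timeout_s: float) -> dict:
    """Turn one call into a status record instead of letting it raise."""
    try:
        result = await asyncio.wait_for(call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"tool": name, "status": "timeout"}
    except Exception as err:  # e.g. a provider 500 surfaced as an exception
        return {"tool": name, "status": "error", "detail": str(err)}
    if not result:
        # 200 OK with an empty body is data, not an error.
        return {"tool": name, "status": "empty"}
    return {"tool": name, "status": "ok", "count": len(result)}


async def fan_out(calls: dict[str, ToolCall], timeout_s: float = 30.0) -> list[dict]:
    """Run all calls concurrently; results keep the input order."""
    return list(await asyncio.gather(*(run_tool(n, c, timeout_s) for n, c in calls.items())))
```

Because `run_tool` never raises for ordinary failures, `asyncio.gather` does not need `return_exceptions=True`; the caller gets one record per tool, in input order, and can build the best available answer from whatever came back.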
Your agent sits inside four walls: TENANT ISOLATION, HETEROGENEOUS TOOLS, DELEGATED AUTH, PARTIAL FAILURE.
You must design within these walls — not pretend they don't exist.
Now that we know the constraints, let's look at the architecture that satisfies all four.
Three layers. Three ideas. One protocol that ties them together.
Orchestrator → Skills → Tools
The Architecture
AI-Powered Ops Investigation Assistant
INTERACTION SURFACE
Enterprise User → Chat / Product UI
CONVERSATION RUNTIME
Intent + Planning (parse & plan actions) → Agent Orchestrator (coordinate agents) → Response Synthesis (format & deliver)
Capability Registry: tool metadata, skill definitions, schemas, routing, guardrails
SKILL EXECUTION ENGINE
1. Skill Selection (route to best skill) → 2. Execution Loop (execute & iterate) → 3. Evidence Accumulation (gather & synthesize findings)
TOOL GATEWAY
Native Platform Tools (platform-owned, zero-auth): Alerts, Incidents, Service Dependencies, Knowledge Retrieval, Content Reads
Tenant-Connected MCP Apps (3P, runtime discovery): Observability App A, Observability App B, Other Apps
Scoped per tenant + user, discovered at runtime
ENTERPRISE TENANT CONTEXT
Historical Incident + Alert: past patterns & resolutions
Product / Service Data: topology, configs, catalogs
Connected App Data: integrated 3P service data
Feedback: evidence returns to the user
Two Types of Tools, One Skill
The Skill Doesn't Care Where the Data Comes From
"Why is checkout-service throwing 5xx errors?"
SKILL EXECUTION LOOP
1. Get alert context: What fired? When? What service? (Native)
2. Get service dependencies: What's upstream/downstream? (Native)
3. Get metrics: Is latency/error rate spiking? (MCP App)
4. Get logs: What errors are appearing? (MCP App)
5. Get recent changes: Any deploys in the window? (Native)
6. Synthesize: What does the evidence say? (Output)
Native = native tool (always available); MCP App = MCP app (varies per tenant)
The skill treats them identically.
The skill has a plan: an evidence-gathering loop. Some steps hit native tools. Some hit MCP apps. The skill doesn't distinguish.
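The point that "the skill doesn't distinguish" can be sketched by giving native tools and MCP apps one shape, a binding from a capability to a callable, and letting the plan refer only to capabilities. `ToolBinding`, the capability names, and the `source` field are assumptions for illustration; step 6 (synthesis) is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolBinding:
    capability: str                   # what the skill asks for, e.g. "metrics"
    source: str                       # "native" or "mcp": kept for logging, never branched on
    invoke: Callable[[str], object]   # takes a service name, returns an observation


# Steps 1-5 from the loop above, written as capabilities rather than tool names.
INVESTIGATION_PLAN = ["alert-context", "service-dependencies", "metrics", "logs", "recent-changes"]


def run_plan(service: str, bindings: dict[str, ToolBinding]) -> dict[str, Optional[object]]:
    evidence: dict[str, Optional[object]] = {}
    for capability in INVESTIGATION_PLAN:
        binding = bindings.get(capability)
        if binding is None:
            evidence[capability] = None  # gap: nothing connected provides this
            continue
        # One call path for both native tools and MCP apps.
        evidence[capability] = binding.invoke(service)
    return evidence
```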
NATIVE TOOLS — ALWAYS AVAILABLE
Platform-owned. Always authenticated. Your guaranteed baseline.
Project Tracker: get-issue-details, find-linked-issues, get-change-history
Service Management: get-alert-context, find-similar-incidents, get-service-topology
Knowledge Base: search-runbooks, get-resolution-guides, find-past-postmortems
Even with zero 3P tools connected, the agent can still reason.
MCP APPS — VARIES PER TENANT
Third-party. Discovered at runtime. Scoped per tenant.
New Relic: query-nrql, get-golden-signals, get-error-groups
Datadog: search-metrics, get-log-patterns, list-deployment-events
Splunk: run-search-query, get-detector-incidents
Sentry, Dynatrace, ...: varies
The skill says "I need metrics" — the gateway resolves to whatever's connected.
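Here is one hypothetical way a gateway could turn "I need metrics" into a concrete tool: a capability map built from the tool names listed above plus a preference order, with native tools always eligible and third-party tools eligible only if the tenant has connected them. The map, the preference order, and `resolve` are illustrative, not how any specific gateway works.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical capability map built from the tool names on the two slides above.
CAPABILITY_MAP: dict[str, dict[str, str]] = {
    "metrics": {"new-relic": "get-golden-signals", "datadog": "search-metrics"},
    "logs": {"datadog": "get-log-patterns", "splunk": "run-search-query"},
    "recent-changes": {"native": "get-change-history", "datadog": "list-deployment-events"},
}
# Hypothetical preference order; native tools need no connection.
PREFERENCE = ["native", "new-relic", "datadog", "splunk"]


def resolve(capability: str, connected: set[str]) -> Optional[tuple[str, str]]:
    """Return (provider, tool) for the first usable option, or None for a gap."""
    options = CAPABILITY_MAP.get(capability, {})
    for provider in PREFERENCE:
        usable = provider == "native" or provider in connected
        if usable and provider in options:
            return provider, options[provider]
    return None
```

A `None` result is the gap the skill has to handle; for capabilities with a native option, such as recent changes in this map, it never occurs.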
NATIVE EVIDENCE
✓ Alert: checkout-service 5xx spike at 14:32
✓ Topology: depends on payments-db, auth-svc
✓ Change: deploy v2.4.1 at 14:28 by @eng-team
✓ Runbook: "5xx on checkout" → check DB pool
MCP EVIDENCE
✓ Metrics: p99 latency 4200ms (was 180ms)
✓ Logs: "connection pool exhausted" ×847
✓ Traces: 92% of slow spans in payments-db
SYNTHESIZED ANSWER
Root cause: DB connection pool exhaustion
Deploy v2.4.1 introduced a connection leak in the payments-db adapter. p99 latency spiked 23× within 4 minutes of deploy. 847 "pool exhausted" errors. 92% of slow spans in payments-db.
Recommendation: Roll back v2.4.1 or increase pool_max_size
Runbook: "5xx on checkout" → matches historical pattern from Q2
Confidence: 94%
Neither source alone was enough. Together, a complete picture.
Native tools gave context (what changed, what's connected). MCP apps gave signals (metrics, logs, traces). The skill fused them into a grounded diagnosis.
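Two of the numbers in the synthesized answer follow from simple arithmetic on the evidence: 4200 ms / 180 ms ≈ 23.3 (the "23×" spike), and 14:32 minus 14:28 (the "within 4 minutes"). A change-correlation step could compute both as sketched below; the function names and the 30-minute window are assumptions, and the date used in the check is arbitrary.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def deploys_before(alert_at: datetime, deploys: dict[str, datetime],
                   window: timedelta) -> list[tuple[str, timedelta]]:
    """Deploys that landed at most `window` before the alert, nearest first."""
    hits = [(version, alert_at - at) for version, at in deploys.items()
            if timedelta(0) <= alert_at - at <= window]
    return sorted(hits, key=lambda hit: hit[1])


def spike_ratio(current_ms: float, baseline_ms: float) -> float:
    """How many times the baseline the current value is."""
    return current_ms / baseline_ms
```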
MCP Deep Dive
How Does the Agent Talk to Tools It's Never Seen?
WITHOUT MCP
Agent
custom API client
custom auth flow
custom response parsing
custom error handling
Per vendor. Per version. Per tenant.
N integrations = N × complexity
WITH MCP
Agent → Gateway (MCP) → New Relic, Datadog, Splunk, any other provider
Agent: "What tools do you have?" → tools/list
Agent: "Run this with these args" → tools/call
Provider: "Here's the result" → structured response
Two operations. Every provider speaks the same language.
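On the wire, MCP messages are JSON-RPC 2.0, and the two operations above correspond to the `tools/list` and `tools/call` methods; a `tools/call` result carries a list of content blocks and an optional `isError` flag. The helpers below only build and read those messages (transport is out of scope), and the helper names are hypothetical.

```python
from __future__ import annotations

import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # MCP expects request ids not to be reused within a session


def tools_list_request(cursor: Optional[str] = None) -> dict:
    """Discovery: ask a server which tools it exposes (paginated via cursor)."""
    params = {"cursor": cursor} if cursor else {}
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": "tools/list", "params": params}


def tools_call_request(name: str, arguments: dict) -> dict:
    """Invocation: run one tool with JSON arguments matching its inputSchema."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": "tools/call",
            "params": {"name": name, "arguments": arguments}}


def to_wire(message: dict) -> str:
    """Serialize one JSON-RPC message for the transport."""
    return json.dumps(message)


def text_from_call_result(response: dict) -> str:
    """Join the text blocks of a tools/call result; raise if the tool reported an error."""
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return "\n".join(b["text"] for b in result.get("content", []) if b.get("type") == "text")
```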
Three Hops: Agent → Gateway → Provider
AGENT RUNTIME: "I need tools for this tenant"
Carries: tenant ID, user token, provider identifier
HOP 1: Streamable HTTP
INTEGRATION GATEWAY: resolves "Tenant X has New Relic" → routes to New Relic's MCP server
Handles: auth exchange (user's OAuth), connection lifecycle, pooling, caching, expiry
The heavy lifter. One endpoint, multiplexed across all providers.
HOP 2: MCP protocol
PROVIDER MCP SERVERS: New Relic, Datadog, Splunk, Sentry (greyed out)
HOP 3: each server → its provider API. Each implements the MCP spec and returns structured observations.
The agent never talks directly to Splunk or Datadog. It talks to the gateway, which multiplexes across all connected providers.
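Hop 1 can be sketched as the envelope the agent runtime sends to the gateway. The `Accept` value reflects MCP's Streamable HTTP transport, which has clients accept both JSON and server-sent events; the `X-Tenant-Id` and `X-Provider` headers are hypothetical, since how a gateway receives the tenant and provider identifiers is a design choice the slides don't specify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayRequest:
    tenant_id: str
    user_token: str
    provider: str

    def headers(self) -> dict[str, str]:
        return {
            # Streamable HTTP: the client accepts both JSON and SSE responses.
            "Accept": "application/json, text/event-stream",
            "Content-Type": "application/json",
            # The user's own token; the gateway exchanges it for provider credentials.
            "Authorization": f"Bearer {self.user_token}",
            # Hypothetical routing headers; a real gateway might put these in the URL instead.
            "X-Tenant-Id": self.tenant_id,
            "X-Provider": self.provider,
        }
```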
Inside the Gateway: Four Hard Problems Solved
1. Discovery: "What's available for this tenant?"
listTools(tenantId, userId) → ["query-nrql", "get-golden-signals", ...]
Different tenant? Different list. Empty? The skill handles the gap.
2. Routing: "Which server handles this tool?"
tenant X + provider Y → server Z
One endpoint, multiplexed by tenant-scoped resource identifier
3. Auth delegation: "Whose credentials?"
Forwards the user's auth context, not platform credentials
Expired? → structured auth error → "Re-authenticate New Relic"
4. Tool wrapping: "Make MCP tools LLM-invocable"
JSON Schema → LLM function. Read-only → auto-execute.
Write tools → user confirmation. Names normalized for the LLM.
Four problems. Four modules. This is the checklist for building multi-tenant tool access.
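The four modules can be sketched as one small in-memory class. Everything here is hypothetical: the routing table, the catalog, the `(user, provider)` grant set standing in for live OAuth tokens, and the name-normalization rule. In a real deployment the read-only flag might come from MCP tool annotations such as `readOnlyHint`; here it is a plain field.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class McpTool:
    name: str
    description: str
    input_schema: dict
    read_only: bool


class AuthRequired(Exception):
    """Structured auth error the agent can surface as 'Re-authenticate <provider>'."""

    def __init__(self, provider: str) -> None:
        super().__init__(f"Re-authenticate {provider}")
        self.provider = provider


class Gateway:
    def __init__(self, routes: dict[tuple[str, str], str],
                 catalog: dict[str, list[McpTool]],
                 grants: set[tuple[str, str]]) -> None:
        self.routes = routes    # (tenant, provider) -> MCP server URL
        self.catalog = catalog  # server URL -> tools that server exposes
        self.grants = grants    # (user, provider) pairs with a live OAuth grant

    def list_tools(self, tenant_id: str, user_id: str) -> dict[str, McpTool]:
        """1. Discovery: only this tenant's servers, only providers this user has authorized."""
        tools: dict[str, McpTool] = {}
        for (tenant, provider), server in self.routes.items():
            if tenant == tenant_id and (user_id, provider) in self.grants:
                for tool in self.catalog.get(server, []):
                    tools[tool.name] = tool
        return tools

    def route(self, tenant_id: str, provider: str) -> str:
        """2. Routing: tenant X + provider Y -> server Z."""
        return self.routes[(tenant_id, provider)]

    def check_auth(self, user_id: str, provider: str) -> None:
        """3. Auth delegation: rely on the user's grant, never a platform credential."""
        if (user_id, provider) not in self.grants:
            raise AuthRequired(provider)


def to_llm_function(tool: McpTool) -> dict:
    """4. Wrapping: normalized name, JSON Schema as parameters, writes flagged for confirmation."""
    return {
        "name": re.sub(r"[^A-Za-z0-9_]", "_", tool.name),
        "description": tool.description,
        "parameters": tool.input_schema,
        "requires_confirmation": not tool.read_only,
    }
```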
When Things Break: Graceful Degradation in Practice
Gateway routing 4 providers:
✓ New Relic 200 OK
✓ Datadog 200 OK
✕ Splunk Timeout 30s
✕ PagerDuty 401 Auth
Agent Output (with gaps noted):
✓ Metrics from New Relic: p99 latency 4200ms, error rate 23%
✓ Logs from Datadog: "connection pool exhausted" ×847
⚠ Splunk logs unavailable (provider timeout). Analysis based on available sources.
⚠ PagerDuty: Re-authenticate to access on-call data.
Confidence: Medium (2/4 sources), down from High because of missing sources
The system doesn't crash. The output notes the gaps. Confidence adjusts. An answer is still produced.
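The output above can be produced from per-provider statuses with a small rendering step: one gap note per failed source, then a confidence label derived from the fraction of sources that answered. The thresholds and message templates below are hypothetical, chosen so that 2 of 4 sources yields Medium as in the example.

```python
from __future__ import annotations

# Hypothetical gap messages, modeled on the output above.
GAP_NOTES = {
    "timeout": "{p} unavailable (provider timeout). Analysis based on available sources.",
    "auth": "{p}: Re-authenticate to access this data.",
    "error": "{p} returned an error. Analysis based on available sources.",
}


def confidence(ok: int, total: int) -> str:
    """Hypothetical rule: every source answered -> High; at least half -> Medium; else Low."""
    if total == 0 or ok == 0:
        return "Low"
    if ok == total:
        return "High"
    return "Medium" if ok / total >= 0.5 else "Low"


def render(statuses: dict[str, str]) -> list[str]:
    """Gap notes for every non-ok provider, then a confidence line."""
    ok = sum(1 for status in statuses.values() if status == "ok")
    lines = [GAP_NOTES[s].format(p=p) for p, s in statuses.items() if s != "ok"]
    lines.append(f"Confidence: {confidence(ok, len(statuses))} ({ok}/{len(statuses)} sources)")
    return lines
```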
Skills
The Strategy Layer Between Prompts and Tools
Where Skills Fit
PROMPT → too vague
"You are a helpful assistant"
SKILL → just right
"Investigate an incident end-to-end"
TOOL → too atomic
"get-metrics(service, window)"
A skill defines:
What evidence to gather (not which API to call)
Order — sequential phases, or parallel fan-out
Gap handling — what to do when a tool returns nothing
Stop condition — confidence threshold, or all phases complete
Synthesis — combine observations into a grounded answer
SKILL FLOWCHART — no vendor names
alerts → topology → metrics → logs → changes → history → synthesize
The skill is tool-agnostic. It says "I need metrics", not "call Datadog". The gateway resolves the how. The skill owns the what and why.
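A skill's five parts (evidence, order, gap handling, stop condition, synthesis) can be written down as data rather than prose. This is one possible encoding under assumed names: `Phase` and `SkillSpec` are illustrative, the phase needs are capabilities from the flowchart, and grouping metrics with logs (and changes with history) as parallel phases is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Phase:
    needs: tuple[str, ...]                                   # capabilities only, no vendor names
    mode: Literal["sequential", "parallel"] = "sequential"   # order
    on_empty: Literal["record_absence", "skip"] = "record_absence"  # gap handling


@dataclass(frozen=True)
class SkillSpec:
    name: str
    phases: tuple[Phase, ...]
    stop_confidence: float = 0.9      # stop early once confidence reaches this
    synthesis: str = "evidence_only"  # answer only from gathered observations

    def should_stop(self, phases_done: int, confidence: float) -> bool:
        """Stop condition: confidence threshold reached, or all phases complete."""
        return confidence >= self.stop_confidence or phases_done >= len(self.phases)


INVESTIGATE_INCIDENT = SkillSpec(
    name="investigate-incident",
    phases=(
        Phase(needs=("alerts",)),
        Phase(needs=("topology",)),
        Phase(needs=("metrics", "logs"), mode="parallel"),
        Phase(needs=("changes", "history"), mode="parallel"),
    ),
)
```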
Skills Compose: Like Functions in Code
INVESTIGATION SKILL (orchestrator)
Phase 1: alert-enrichment skill (resolve alert context)
Phase 2: observability-analysis skill (gather evidence): metrics (primary), logs (primary), traces (best-effort)
Phase 3: dependency-analysis + change-correlation skill
Phase 4: Synthesize → Hypothesis
Reuse across contexts:
observability-analysis → used in investigations, health checks, proactive scans
dependency-analysis → used in incidents and capacity planning
Small, composable, single-responsibility. Like functions in code.
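Composition can be sketched with plain functions sharing a ledger dict: the investigation skill calls smaller skills in phases, and a health check reuses `observability_analysis` unchanged. The invoker signature and the skill function names are hypothetical; merging dependency analysis and change correlation into one function mirrors Phase 3, and synthesis is omitted.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical invoker: (capability, service) -> observation, or None if nothing is connected.
Invoke = Callable[[str, str], Optional[object]]


def alert_enrichment(invoke: Invoke, service: str, ledger: dict) -> None:
    ledger["alert"] = invoke("alerts", service)


def observability_analysis(invoke: Invoke, service: str, ledger: dict) -> None:
    # metrics and logs are primary; traces are best-effort.
    for signal, primary in (("metrics", True), ("logs", True), ("traces", False)):
        result = invoke(signal, service)
        if result is None:
            note = f"{signal} unavailable" + ("" if primary else " (best-effort)")
            ledger.setdefault("gaps", []).append(note)
        else:
            ledger[signal] = result


def dependency_and_change_analysis(invoke: Invoke, service: str, ledger: dict) -> None:
    ledger["topology"] = invoke("topology", service)
    ledger["changes"] = invoke("changes", service)


def investigation(invoke: Invoke, service: str) -> dict:
    """Orchestrator skill: each phase is a call to a smaller skill (synthesis omitted)."""
    ledger: dict = {}
    alert_enrichment(invoke, service, ledger)
    observability_analysis(invoke, service, ledger)
    dependency_and_change_analysis(invoke, service, ledger)
    return ledger


def health_check(invoke: Invoke, service: str) -> dict:
    """A different context reusing the same observability sub-skill."""
    ledger: dict = {}
    observability_analysis(invoke, service, ledger)
    return ledger
```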
The Execution Loop + Context Ledger
EXECUTION LOOP
OBSERVE (What do I know? What's missing?) → ACT (call a tool or sub-skill) → RECORD (store in ledger + note gaps) → DONE?
no → loop back to OBSERVE; yes → SYNTHESIZE
CONTEXT LEDGER
Persistent state — grows with each iteration
iter 1: alert = checkout-service 5xx @ 14:32 UTC
iter 2: topology = depends on payments-db, auth-svc
iter 3: metrics = p99 4200ms (baseline 180ms)
iter 4: logs = "connection pool exhausted" ×847
iter 5: deploy v2.4.1 by @eng-team @ 14:28
gap: traces unavailable (Splunk timeout)
synthesize reads ONLY from ledger
✓ Carries state across tool calls
✓ Survives sub-skill delegation
✓ Enforces evidence-only reasoning
No hallucination. No invention. Only cited evidence.
The ledger is the difference between "an LLM that sometimes calls tools" and "a disciplined investigator that builds a case."
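A minimal sketch of the observe, act, record, done loop with a ledger, under the assumption that a tool or sub-skill returns `None` when it has nothing to give. Gaps are recorded so they are not retried forever, an empty result such as `[]` is stored as evidence rather than as a gap, and `synthesize` reads only from the ledger, so every output line traces to an iteration or a noted gap. `Ledger`, `execution_loop`, and `synthesize` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Ledger:
    entries: list[tuple[int, str, object]] = field(default_factory=list)  # (iteration, key, value)
    gaps: dict[str, str] = field(default_factory=dict)                    # key -> reason

    def known(self) -> set[str]:
        return {key for _, key, _ in self.entries}


def execution_loop(needed: list[str], act: Callable[[str], Optional[object]],
                   max_iters: int = 10) -> Ledger:
    ledger = Ledger()
    for iteration in range(1, max_iters + 1):
        # OBSERVE: what is neither known nor already recorded as a gap?
        missing = [k for k in needed if k not in ledger.known() and k not in ledger.gaps]
        if not missing:  # DONE?
            break
        key = missing[0]
        result = act(key)  # ACT: call a tool or sub-skill
        # RECORD: store the observation, or note the gap so it is not retried.
        if result is None:
            ledger.gaps[key] = "unavailable"
        else:
            ledger.entries.append((iteration, key, result))
    return ledger


def synthesize(ledger: Ledger) -> list[str]:
    """Reads ONLY from the ledger; every line cites where it came from."""
    lines = [f"[iter {i}] {key} = {value}" for i, key, value in ledger.entries]
    lines += [f"[gap] {key}: {reason}" for key, reason in ledger.gaps.items()]
    return lines
```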
Production Lessons
What Broke, What Scaled, What Surprised Us
WHAT BROKE
⏱
Tool discovery is slow
Listing tools: 200–800ms per tenant. Every turn = sluggish.
Fix: Per-tenant tool cache with signature-based invalidation
{}
MCP server quality is wildly inconsistent
Some return 50KB JSON blobs. No summary. Not LLM-friendly.
Fix: Wrapping layer that normalizes, truncates, extracts signal
🔒
Auth expiry mid-investigation
Tool #5 returns 401. Start over? Resume? Worst UX.
Fix: Context ledger enables resumability after re-auth
∅
"No data" ≠ "error" — LLMs confuse them
Empty results = valid evidence ("nothing anomalous"). LLMs treat them as failures and retry.
Fix: Explicit "absence is information" logic in skill definitions
We shipped to real enterprise tenants. These are the scars.
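The first fix, a per-tenant tool cache with signature-based invalidation, might look like the sketch below. The assumption is that a tenant's connection state (connected providers and their server versions) is cheap to read, so its hash can decide whether a cached tool list is still valid; `connection_signature` and `ToolCache` are hypothetical names. MCP also lets servers announce tool-list changes through a `notifications/tools/list_changed` notification, which could serve as another invalidation trigger.

```python
from __future__ import annotations

import hashlib
import json
from typing import Callable


def connection_signature(connections: dict[str, str]) -> str:
    """Fingerprint of a tenant's connection state (provider -> server version).
    Assumption: this state is cheap to read, unlike a full tool listing."""
    canonical = json.dumps(connections, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToolCache:
    def __init__(self, list_tools: Callable[[str], list[str]]) -> None:
        self._list_tools = list_tools  # the slow call being avoided
        self._entries: dict[str, tuple[str, list[str]]] = {}  # tenant -> (signature, tools)

    def get(self, tenant_id: str, connections: dict[str, str]) -> list[str]:
        signature = connection_signature(connections)
        cached = self._entries.get(tenant_id)
        if cached is not None and cached[0] == signature:
            return cached[1]  # hit: connections unchanged since the last listing
        tools = self._list_tools(tenant_id)  # miss or stale: list again
        self._entries[tenant_id] = (signature, tools)
        return tools
```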
WHAT SCALED
🧩
Skill composition scales linearly
New domain? Write a sub-skill, register it. No monolith coordination.
Teams ship skills independently.
🔌
MCP is genuinely vendor-agnostic
New vendor ships MCP server → our agent supports it.
Zero code change. Zero deployment on our side.
🌱
"Two arms" solves the cold-start problem
Day 1 tenant, zero 3P tools — agent still works via native tools.
As they connect providers, it gets better. Never useless.
🛡
Evidence-only synthesis eliminates hallucination
Synthesize only from context ledger. Not from LLM memory.
If it's not in the ledger, it's not in the output.
Three ideas that make it work.
Skills
Domain expertise as composable,
tool-agnostic workflows.
The what and why.
• Composable sub-skills
• Evidence-only synthesis
• Context ledger
• Gap handling built in
Native Tools
Platform-owned capabilities that
guarantee a baseline.
The always-available foundation.
• Zero-auth, zero-config
• Solves cold-start
• Context + topology + history
• Day-1 useful
MCP
A protocol for dynamic tool
discovery and invocation.
The how — resolved at runtime.
• Three-hop architecture
• Gateway: discovery + routing
• Zero-deploy vendor support
• Per-tenant scoping
None is novel in isolation. The contribution is in how they compose, and in making them work at enterprise scale, with graceful degradation as a first-class requirement.
Thank you.