
Anatomy of an AI Support System

How I took a basic out-of-the-box AI support agent and turned it into a production system at a sports trading startup — across 6 built domains and 2 planned expansions.

Novig · Nov 2024 – Apr 2026 · Intercom Fin AI · Sports Trading

When I joined Novig as the first customer support hire, we had Intercom and a basic Fin AI agent — but it was essentially out of the box. No real guidance, minimal content, no structured escalation logic, no QA layer.

Over a year and a half, I transformed it into a system that could handle the full complexity of a regulated sports trading platform: purchases, prize redemptions, trade disputes, fraud detection, KYC verification, and responsible gameplay — with appropriate guardrails at every layer.

This is a map of what that system looks like.

A customer message travels through multiple systems before a response is generated. Here's the path:

Customer → Classify → Guidance → Content → Escalate? → Response
QA Monitors → Analytics → Improve ↺ feedback loop

Click any domain to see how it works, what decisions I made, and real examples from the build.

01 — Guidance: The brain — behavioral rules that shape every response
02 — Content: The knowledge — multi-layered information architecture
03 — Attributes: The classifier — real-time conversation categorization
04 — Escalation: The safety net — knowing when AI should step aside
05 — QA Monitors: The quality layer — automated conversation scoring
06 — Analytics: The feedback loop — measure, identify, improve

These domains were designed and scoped but not fully implemented before my departure. They represent the natural next layer of the system.

07 — Data Connectors: Real-time external data access via API integrations (Designed · Not Built)
08 — Procedures: Multi-step automated workflows with branching logic (Designed · Not Built)
01 — Guidance

What it does

Guidance is the set of natural-language instructions that shape how the AI agent behaves in every conversation. Think of it as the agent's personality, policy knowledge, and decision-making framework — all written as direct instructions.

How I structured it

I organized guidance into distinct categories, each with a single clear objective. This prevents conflicts and makes the system maintainable as it grows.

  • Communication style — tone, vocabulary, response formatting
  • Context & clarification — when to ask follow-up questions
  • Content sourcing — which articles to prioritize for specific topics
  • Policy enforcement — company rules the AI must always follow

Key design decisions

  • Every piece of guidance addresses one behavior — no mixing tone with escalation with content sourcing
  • Conditional logic (if/when/then) everywhere — the AI needs to know when rules apply, not just what they are
  • Written directly to the AI in second person — "Never tell the customer to reinstall" vs. "The AI should not recommend reinstalling"
  • Critical rules in CAPS — signals to the AI that a rule is non-negotiable
Example: Platform Tone
"Use a confident, knowledgeable tone. Refer to wagers as 'trades' unless the customer uses a different term first. Keep interactions professional and neutral regarding outcomes."
Example: Information Gathering
"If a customer reports a trade was settled incorrectly, ask them to provide: (1) the specific trade or slip ID, (2) the event name, and (3) what they believe the correct outcome should be — before attempting to look up any information."

Constraints

Max 100 guidance pieces, each up to 2,500 characters. Guidance can't route conversations, tag them, or trigger other guidance — those actions require separate workflow automation. Each piece is evaluated independently at every point in the conversation.
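Those limits are easy to drift past as guidance accumulates, so a pre-publish check helps. Here is a minimal sketch assuming the constraints stated above; the `GuidancePiece` shape and category names are illustrative, not an Intercom API object.

```python
# Sketch: validate a guidance library against the stated limits
# (100 pieces max, 2,500 characters each). Purely illustrative.
from dataclasses import dataclass

MAX_PIECES = 100
MAX_CHARS = 2500

@dataclass
class GuidancePiece:
    category: str  # e.g. "communication-style", "policy-enforcement"
    text: str      # the second-person instruction written to the AI

def validate(pieces: list[GuidancePiece]) -> list[str]:
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    if len(pieces) > MAX_PIECES:
        problems.append(f"{len(pieces)} pieces exceeds the {MAX_PIECES}-piece limit")
    for i, p in enumerate(pieces):
        if len(p.text) > MAX_CHARS:
            problems.append(f"piece {i} ({p.category}) is {len(p.text)} chars, limit {MAX_CHARS}")
    return problems
```

A check like this keeps the "one behavior per piece" library honest as it grows toward the cap.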

02 — Content

What it does

Content is the knowledge base the AI draws from when answering questions. The quality of your content directly determines the quality of AI responses — garbage in, garbage out.

The content hierarchy

I built a multi-layered content system where each type serves a different purpose and syncs at different speeds:

  • Public articles — customer-facing Help Center. Syncs immediately. Best for product info, how-tos, FAQs.
  • Internal articles — visible only to agents and AI. For troubleshooting guides, escalation procedures, internal policy.
  • Snippets — short-form, private content. Syncs in ~10 minutes. Perfect for temporary notices, known issues, time-sensitive info that'll be deleted later.
  • Documents — uploaded PDFs/docs for detailed reference material.
  • External URLs — synced weekly from other knowledge bases.

Key design decisions

  • Public articles and snippets are weighted equally by the AI — if info can be public, prefer an article (more scalable)
  • Every article structured with headers that repeat context in the body — the AI may extract a paragraph without its heading
  • "Radio interview" test — no statement should be confusing if quoted out of context
  • Removed all "contact support" language from articles — customers talking to the AI are already in a support conversation
Example: Temporary Snippet
"Delayed prize redemption processing — Feb 12-14, 2026. We are currently experiencing delays due to a payment provider maintenance window. All pending prize redemptions will be processed by end of day Feb 15. No action needed — funds are safe."
03 — Attributes

What it does

Attributes automatically classify every conversation by topic, intent, or sentiment — in real time, as the customer types. This classification powers routing, reporting, and conditional logic throughout the system.

How I designed the taxonomy

For a sports trading platform, I built a conversation topic taxonomy with clearly bounded categories. The critical design principle: if a human agent would struggle to choose between two categories, they need to be consolidated or clarified.

  • Purchases — adding funds, failed purchases, payment methods
  • Prize Redemptions — cashouts, payout status, redemption issues
  • Trade Disputes — incorrect settlements, voided trades, outcome disagreements
  • Trading & Gameplay — general questions about how trading and markets work
  • Responsible Gameplay — self-exclusion, limits, concerns about play behavior
  • Other — always include a fallback to prevent forced misclassification

Key design decisions

  • Detailed descriptions matter more than names — each value includes what it covers, keywords, example questions, and explicit exclusions
  • Every attribute value explains what does NOT belong — this is what prevents overlap
  • The "Other" category is mandatory — without it, the AI forces bad matches
Example: Exclusion Boundaries
Trade Disputes — does NOT apply if the customer is asking general questions about how trading works (that's "Trading & Gameplay") or has a payment issue unrelated to trade outcomes (that's "Purchases" or "Prize Redemptions").
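The taxonomy shape described above (description, keywords, explicit exclusions, mandatory fallback) can be sketched as plain data plus a matcher. The topic names follow the list above, but the keywords are illustrative, and in production the classification is done by the AI's language understanding, not keyword matching.

```python
# Sketch of a bounded topic taxonomy with a mandatory fallback.
# Keywords and exclusion notes are illustrative, not the real config.
TAXONOMY = {
    "Purchases": {
        "keywords": ["deposit", "add funds", "payment method", "card declined"],
        "excludes": "payout or cashout questions (Prize Redemptions)",
    },
    "Prize Redemptions": {
        "keywords": ["cashout", "payout", "redemption", "withdraw"],
        "excludes": "disputes about trade outcomes (Trade Disputes)",
    },
    "Trade Disputes": {
        "keywords": ["settled wrong", "voided", "incorrect outcome"],
        "excludes": "general questions about how trading works (Trading & Gameplay)",
    },
    "Responsible Gameplay": {
        "keywords": ["self-exclusion", "limit", "gambling problem"],
        "excludes": "routine account settings questions",
    },
}

def classify(message: str) -> str:
    """Toy keyword match; 'Other' prevents forced misclassification."""
    text = message.lower()
    for topic, spec in TAXONOMY.items():
        if any(kw in text for kw in spec["keywords"]):
            return topic
    return "Other"
```

The point of the sketch is the structure: every value carries its own exclusions, and an unmatched message always lands in "Other" rather than being forced into a bad category.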
04 — Escalation

What it does

Escalation controls when the AI should stop trying to help and hand the conversation to a human. In regulated industries like sports trading, getting this right is not optional — it's a compliance requirement.

Two complementary systems

  • Escalation Rules — deterministic, data-driven triggers. Fire based on structured conditions like attribute values, customer properties, or conversation data. Example: VIP customer + negative sentiment = immediate escalation to VIP support.
  • Escalation Guidance — scenario-based, natural language. For nuanced situations the AI needs to interpret. Example: customer mentions a regulatory body or legal action = immediate escalation, no resolution attempt.

Key design decisions

  • Rules handle the objective triggers (data-based). Guidance handles the subjective ones (intent-based). They complement each other.
  • Escalation determines WHEN — separate workflow automation determines WHAT happens after (which team, what ticket type, what priority)
  • The AI only offers escalation once per conversation — this prevents loops
  • Responsible gameplay mentions always trigger immediate escalation, no exceptions
Example: Regulatory Escalation
"If a customer mentions a regulatory body, gaming commission, legal action, or lawyer, escalate immediately without attempting to resolve. Let the customer know their inquiry is being forwarded to the appropriate team."
Example: Rule — VIP + Negative Sentiment
Condition: Customer "VIP Tier" = "Platinum" AND Fin Attribute "Sentiment" = "Negative" → Immediate escalation to VIP Support inbox
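The two trigger styles above can be sketched side by side. Field names like `vip_tier` and `sentiment`, and the regulatory term list, are hypothetical; in production the rule fires on structured Intercom data, and the regulatory scenario is interpreted by the AI from guidance, not a keyword list.

```python
# Sketch of the two escalation triggers from the examples above.
REGULATORY_TERMS = ("gaming commission", "regulatory", "legal action", "lawyer")

def should_escalate_to_vip(customer: dict, attributes: dict) -> bool:
    """Deterministic rule: Platinum VIP + negative sentiment."""
    return (
        customer.get("vip_tier") == "Platinum"
        and attributes.get("sentiment") == "Negative"
    )

def mentions_regulatory(message: str) -> bool:
    """Crude stand-in for the scenario-based guidance: in production
    this interpretation is left to the AI, not substring matching."""
    text = message.lower()
    return any(term in text for term in REGULATORY_TERMS)
```

The split mirrors the design: the first function is a rule (objective, data-based), the second approximates guidance (subjective, intent-based).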
05 — QA Monitors

What it does

Monitors are the automated QA system — they select conversations for review based on filters or natural-language flag criteria, then score them against custom scorecards. This is how you know if your AI is actually doing a good job at scale.

Three monitoring patterns

  • Baseline QA — random sampling of conversations to track overall quality trends
  • Risk-based monitoring — targeted flags for high-stakes scenarios: low satisfaction scores, policy breaches, legal/regulatory language, frustrated customers
  • Initiative tracking — monitoring conversations related to a specific launch, pricing change, or product update

How scorecards work

Each scorecard has weighted attributes (answer accuracy, tone, policy adherence, escalation handling). Attributes can be scored by AI, by humans, or both. Critical attributes — like compliance — automatically fail the entire review if they score zero, regardless of other scores.
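The scoring logic reduces to a short function: a weighted average over attribute scores, with any zero-scoring critical attribute failing the whole review. The weights and attribute names here are illustrative, not the scorecards actually configured.

```python
# Sketch of scorecard scoring: weighted attributes plus critical
# auto-fail, as described above. Weights and names are illustrative.
def score_review(scores: dict[str, int], weights: dict[str, float],
                 critical: set[str]) -> float:
    """Scores are 0-100 per attribute. Returns the weighted average,
    or 0.0 if any critical attribute (e.g. compliance) scored zero."""
    if any(scores.get(attr, 0) == 0 for attr in critical):
        return 0.0
    total_weight = sum(weights.values())
    return sum(scores[a] * w for a, w in weights.items()) / total_weight
```

For example, with weights of 0.5 accuracy, 0.3 tone, and 0.2 compliance, a review of 80/100/100 averages to 90; the same review with compliance at zero fails outright.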

Key design decisions

  • Flag criteria describe observable behavior, not inferred intent — "customer used the word 'ridiculous'" vs. "customer seemed frustrated"
  • Always include explicit exclusions in flag criteria to reduce false positives
  • Test monitors against real conversation samples before activating
06 — Analytics

What it does

Analytics closes the loop. Without measurement, every other domain is running blind. This is where you identify what's working, what's broken, and what to fix next.

Core metrics I tracked

  • Automation rate — what share of total volume is the AI handling end-to-end?
  • Resolution rate — when the AI engages, how often does it actually solve the problem?
  • Involvement rate — how much coverage does the AI have across all conversations?
  • CX Score — AI-scored customer satisfaction on every conversation
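The first three metrics are simple ratios over conversation counts, but it is worth being precise about the denominators: automation rate is measured against all conversations, resolution rate only against conversations the AI engaged with. A sketch, assuming plain counts:

```python
# Sketch of the core rate definitions above, over conversation counts.
def support_metrics(total: int, ai_involved: int, ai_resolved: int) -> dict:
    """ai_involved: conversations the AI touched at all;
    ai_resolved: conversations the AI handled end-to-end."""
    return {
        "involvement_rate": ai_involved / total,
        "automation_rate": ai_resolved / total,
        "resolution_rate": ai_resolved / ai_involved if ai_involved else 0.0,
    }
```

So 100 conversations with 80 AI-involved and 60 AI-resolved gives 80% involvement, 60% automation, and 75% resolution: the AI can look strong on resolution while automation still has room to grow.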

The optimization loop

This is where all the domains connect. The analytics surface identifies patterns — a topic with low resolution, a content gap, an escalation that fires too often. That insight feeds back into content updates, guidance refinements, or adjusted escalation rules.

  • Topics Explorer — auto-grouped conversation topics with performance metrics. Shows WHAT is happening.
  • Optimize — AI-generated recommendations for content gaps, improvements, and new automation opportunities. Shows WHAT TO DO.
  • Trends — weekly automated anomaly detection. Shows WHAT CHANGED.
  • Performance funnel — traces every conversation from arrival through resolution or escalation. Shows WHERE it breaks.
Example: Debugging a Bad Answer
1. Spot low CX Score on a conversation
2. Click "Improve Answer" on the AI's response
3. Trace which content source was used
4. Determine: content problem? guidance problem?
5. Fix at the source → improvement propagates to all similar conversations
07 — Data Connectors (Planned)

What it would do

Data connectors let the AI pull real-time information from external systems — so instead of telling a customer "let me check on that," it actually checks, immediately, and responds with their specific data. This was designed but not implemented before my departure.

Connectors I designed

  • Trade Status Lookup — retrieve trade details, selections, odds, settlement status by trade ID
  • Account Balance — available funds, pending purchases/redemptions, bonus balance
  • Prize Redemption Status — payout amount, method, processing stage, estimated completion
  • KYC Verification Status — verification level, pending requirements, document status

Design principles

  • Use the most direct API endpoint possible — if the customer has a Trade ID, don't make them also provide their email
  • Minimize required inputs — auto-detect customer identity from the conversation context when possible
  • The "When to use" description is the most important config field — it tells the AI when to trigger each connector
  • Connectors can chain — customer checks trade status, then asks about a redemption, and the AI uses a second connector seamlessly
08 — Procedures (Planned)

What it would do

Procedures are multi-step automated workflows that combine natural language instructions with deterministic controls. They let the AI handle complex, multi-turn processes end-to-end — not just answer questions, but actually resolve issues. These were scoped and designed but not implemented before my departure.

Anatomy of a procedure

  • Trigger — detailed description of when to start, with example customer phrases and explicit exclusions
  • Instructions — step-by-step in natural language, starting every step with a verb
  • Branching — if/else logic for different scenarios
  • Code blocks — strict calculations or eligibility checks where natural language is too ambiguous
  • Data connectors — inline API calls to fetch or update external data
  • End conditions — resolved, escalated, or redirected to another procedure
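The "code blocks" element above is worth a concrete sketch: a strict eligibility check a procedure could call before branching, precisely because natural language is too ambiguous for it. The thresholds and outcome strings here are illustrative, not Novig policy.

```python
# Sketch: a deterministic eligibility check inside a procedure,
# returning a branch decision. All rules shown are illustrative.
def redemption_eligible(balance: float, amount: float,
                        kyc_verified: bool) -> tuple[bool, str]:
    """Return (eligible, branch) for a prize redemption request."""
    if not kyc_verified:
        return False, "escalate: KYC verification incomplete"
    if amount <= 0:
        return False, "clarify: ask the customer for a valid amount"
    if amount > balance:
        return False, "resolve: explain the available balance"
    return True, "proceed: start the redemption workflow"
```

The natural-language steps of the procedure would then branch on the returned label, keeping the ambiguous parts in guidance and the strict parts in code.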

Why this was the next priority

With the first 6 domains in place, the system could answer questions and route conversations well. Procedures would have been the step that moved the AI from "answering" to "resolving" — handling multi-step processes like prize redemption troubleshooting or trade dispute resolution end-to-end without human involvement.

An AI support agent isn't a single thing you configure and forget. It's an interconnected system of domains that need to be designed together, tested rigorously, and continuously refined based on real performance data.

The system I built at Novig handled the complexity of a regulated sports trading platform — purchases, prize redemptions, trade disputes, fraud detection, KYC, and responsible gameplay — with appropriate guardrails at every layer. And the domains I designed but didn't build point to where the system would go next.

The hardest part isn't any single domain. It's making them all work together coherently, and building the feedback loops that keep the system improving over time.