How I took a basic out-of-the-box AI support agent and turned it into a production system at a sports trading startup — across 6 built domains and 2 planned expansions.
When I joined Novig as the first customer support hire, we had Intercom and a basic Fin AI agent — but it was essentially out of the box. No real guidance, minimal content, no structured escalation logic, no QA layer.
Over a year and a half, I transformed it into a system that could handle the full complexity of a regulated sports trading platform: purchases, prize redemptions, trade disputes, fraud detection, KYC verification, and responsible gameplay — with appropriate guardrails at every layer.
This is a map of what that system looks like. Click any domain to explore how it works.
A customer message travels through multiple systems before a response is generated. Here's the path:
Click any domain to see how it works, what decisions I made, and real examples from the build.
These domains were designed and scoped but not fully implemented before my departure. They represent the natural next layer of the system.
Guidance is the set of natural-language instructions that shape how the AI agent behaves in every conversation. Think of it as the agent's personality, policy knowledge, and decision-making framework — all written as direct instructions.
I organized guidance into distinct categories, each with a single clear objective. This prevents conflicts and makes the system maintainable as it grows.
"Use a confident, knowledgeable tone. Refer to wagers as 'trades' unless the customer uses a different term first. Keep interactions professional and neutral regarding outcomes."
"If a customer reports a trade was settled incorrectly, ask them to provide: (1) the specific trade or slip ID, (2) the event name, and (3) what they believe the correct outcome should be — before attempting to look up any information."
The platform enforces hard limits: a maximum of 100 guidance pieces, each up to 2,500 characters. Guidance can't route conversations, tag them, or trigger other guidance — those actions require separate workflow automation. Each piece is evaluated independently at every point in the conversation.
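Those limits are easy to drift past as guidance grows, so it helps to lint the set before publishing. A minimal sketch, assuming the caps above (the constants and function name are mine, not part of any Intercom API):

```python
MAX_PIECES = 100   # platform cap on total guidance pieces
MAX_CHARS = 2500   # cap on each piece's length

def lint_guidance(pieces):
    """Return a list of limit violations for a set of guidance pieces."""
    problems = []
    if len(pieces) > MAX_PIECES:
        problems.append(f"{len(pieces)} pieces exceeds the {MAX_PIECES}-piece cap")
    for i, text in enumerate(pieces):
        if len(text) > MAX_CHARS:
            problems.append(f"piece {i} is {len(text)} chars (limit {MAX_CHARS})")
    return problems
```

Running this against the full guidance set on every edit keeps the system inside its budget instead of discovering the cap mid-incident.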
Content is the knowledge base the AI draws from when answering questions. The quality of your content directly determines the quality of AI responses — garbage in, garbage out.
I built a multi-layered content system where each type serves a different purpose and syncs at different speeds:
"Delayed prize redemption processing — Feb 12-14, 2026. We are currently experiencing delays due to a payment provider maintenance window. All pending prize redemptions will be processed by end of day Feb 15. No action needed — funds are safe."
Attributes automatically classify every conversation by topic, intent, or sentiment — in real-time, as the customer types. This classification powers routing, reporting, and conditional logic throughout the system.
For a sports trading platform, I built a conversation topic taxonomy with clearly bounded categories. The critical design principle: if a human agent would struggle to choose between two categories, those categories need to be consolidated or their boundaries clarified.
Trade Disputes — does NOT apply if the customer is asking general questions about how trading works (that's "Trading & Gameplay") or has a payment issue unrelated to trade outcomes (that's "Purchases" or "Prize Redemptions").
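One way to keep those boundaries honest is to record each category's exclusions alongside its definition, so borderline cases route mechanically to their correct home. An illustrative sketch (category names come from the taxonomy above; the data structure and function are mine, not how Fin attributes are actually configured):

```python
# Each category records where its common borderline cases actually belong.
TAXONOMY = {
    "Trade Disputes": {
        "applies": "customer reports a specific trade settled incorrectly",
        "excludes": {
            "general question about how trading works": "Trading & Gameplay",
            "payment issue unrelated to trade outcomes": "Purchases",
        },
    },
}

def resolve_category(category, borderline_case):
    """Route a borderline case to the category it actually belongs to."""
    return TAXONOMY[category]["excludes"].get(borderline_case, category)
```

Encoding the exclusions explicitly is what makes the "would a human struggle to choose?" test answerable: every known ambiguity has exactly one documented destination.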
Escalation controls when the AI should stop trying to help and hand the conversation to a human. In regulated industries like sports trading, getting this right is not optional — it's a compliance requirement.
"If a customer mentions a regulatory body, gaming commission, legal action, or lawyer, escalate immediately without attempting to resolve. Let the customer know their inquiry is being forwarded to the appropriate team."
Condition: Customer "VIP Tier" = "Platinum" AND Fin Attribute "Sentiment" = "Negative" → Immediate escalation to VIP Support inbox
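Conditions like this are just boolean tests over conversation attributes, which makes them easy to reason about and verify. A sketch of how the rule above evaluates (illustrative Python, not Intercom's workflow engine; field names mirror the condition):

```python
def route_conversation(vip_tier, sentiment):
    """Apply the escalation rule: Platinum VIPs with negative sentiment
    skip the AI entirely and go straight to the VIP Support inbox."""
    if vip_tier == "Platinum" and sentiment == "Negative":
        return "VIP Support inbox"
    return "AI agent"
```

Because the condition is a pure function of two attributes, every combination can be enumerated and tested — which is exactly the property you want for compliance-critical routing.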
Monitors are the automated QA system — they select conversations for review based on filters or natural-language flag criteria, then score them against custom scorecards. This is how you know if your AI is actually doing a good job at scale.
Each scorecard has weighted attributes (answer accuracy, tone, policy adherence, escalation handling). Attributes can be scored by AI, by humans, or both. Critical attributes — like compliance — automatically fail the entire review if they score zero, regardless of other scores.
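The scoring model is a weighted average with a veto: any critical attribute at zero fails the review outright. A sketch under those rules (attribute names come from the text; the weights are invented for illustration):

```python
def score_review(scores, weights, critical):
    """Weighted-average score in [0, 1]; a zero on any critical
    attribute fails the whole review regardless of the rest."""
    if any(scores[attr] == 0 for attr in critical):
        return 0.0
    total_weight = sum(weights.values())
    return sum(scores[attr] * weights[attr] for attr in weights) / total_weight

review  = {"accuracy": 1.0, "tone": 0.5, "policy": 1.0, "compliance": 1.0}
weights = {"accuracy": 3,   "tone": 1,   "policy": 2,   "compliance": 2}

score_review(review, weights, critical={"compliance"})  # → 0.9375
```

The veto is the important design choice: a perfectly accurate, perfectly toned answer that violates compliance still scores zero, so the aggregate number can never hide a regulatory failure.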
Analytics closes the loop. Without measurement, every other domain is running blind. This is where you identify what's working, what's broken, and what to fix next.
This is where all the domains connect. The analytics surface identifies patterns — a topic with low resolution, a content gap, an escalation that fires too often. That insight feeds back into content updates, guidance refinements, or adjusted escalation rules.
1. Spot a low CX Score on a conversation
2. Click "Improve Answer" on the AI's response
3. Trace which content source was used
4. Determine the root cause: a content problem or a guidance problem?
5. Fix at the source → the improvement propagates to all similar conversations
Data connectors let the AI pull real-time information from external systems — so instead of telling a customer "let me check on that," it actually checks, immediately, and responds with their specific data. This was designed but not implemented before my departure.
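The shape of a connector is simple: the AI requests a named piece of data using identifiers from the conversation, and the connector resolves it against a backend. A sketch with a stubbed backend (the field names, wording, and backend shape are all hypothetical — this was the designed pattern, not shipped code):

```python
# Stand-in for the real payments/orders API a connector would call.
FAKE_BACKEND = {
    "cust_42": {"redemption_status": "pending", "eta": "2 business days"},
}

def redemption_status_connector(customer_id):
    """Resolve a live redemption status so the AI can answer with the
    customer's actual data instead of a generic 'let me check' reply."""
    record = FAKE_BACKEND.get(customer_id)
    if record is None:
        return {"found": False}  # AI falls back to escalation guidance
    return {
        "found": True,
        "answer": f"Your redemption is {record['redemption_status']} "
                  f"and should complete within {record['eta']}.",
    }
```

The `found: False` branch matters as much as the happy path: a connector that can't resolve a record should hand the conversation to escalation logic, never let the AI improvise an answer.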
Procedures are multi-step automated workflows that combine natural language instructions with deterministic controls. They let the AI handle complex, multi-turn processes end-to-end — not just answer questions, but actually resolve issues. These were scoped and designed but not implemented before my departure.
With the first 6 domains in place, the system could answer questions and route conversations well. Procedures would have been the step that moved the AI from "answering" to "resolving" — handling multi-step processes like prize redemption troubleshooting or trade dispute resolution end-to-end without human involvement.
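A procedure pairs deterministic control flow with the AI's conversational layer: the checks run in a fixed order the model cannot skip or reorder, and the model only writes the customer-facing message at each step. A sketch of a prize-redemption troubleshooting procedure under that design (step names, the $100 threshold, and outcomes are invented for illustration):

```python
def verify_identity(ctx):
    if not ctx.get("kyc_verified"):
        return "escalate: KYC incomplete"
    return None  # check passed, continue

def check_redemption(ctx):
    if ctx.get("redemption_status") != "failed":
        return "resolve: no failed redemption on file"
    return None  # confirmed failure, continue to resolution

def resolve_or_escalate(ctx):
    if ctx.get("amount", 0) <= 100:  # illustrative auto-resolution threshold
        return "resolve: auto-reissue redemption"
    return "escalate: high-value manual review"

def run_procedure(ctx):
    """Run the steps in fixed order; the first terminal outcome wins."""
    for step in (verify_identity, check_redemption, resolve_or_escalate):
        outcome = step(ctx)
        if outcome is not None:
            return outcome
    return "escalate: unhandled case"
```

The deterministic skeleton is what makes end-to-end resolution safe in a regulated environment: identity is always verified before any lookup, and high-value cases always reach a human, no matter how the conversation unfolds.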
An AI support agent isn't a single thing you configure and forget. It's an interconnected system of domains that need to be designed together, tested rigorously, and continuously refined based on real performance data.
The system I built at Novig handled the complexity of a regulated sports trading platform — purchases, prize redemptions, trade disputes, fraud detection, KYC, and responsible gameplay — with appropriate guardrails at every layer. And the domains I designed but didn't build point to where the system would go next.
The hardest part isn't any single domain. It's making them all work together coherently, and building the feedback loops that keep the system improving over time.