How an AI Customer Service Agent Learns Your Whole Business in Minutes
InstantAIGuru scans your website and instantly becomes an expert on your products, services, and policies. Here's how it works and how to set it up.
The Guru turns the content you already publish into a structured knowledge base that an AI can search and reason over. The sections below walk through the data flow, a worked example, setup, and what to check when an answer is wrong.
What the system actually does
When you give the Guru your URL, it crawls your site, turns each page into clean text, and indexes that text for retrieval. The crawler runs JavaScript and indexes the content a page renders into view, so dynamically built content is captured. The agent goes live as soon as the crawl completes.
At query time, the Guru uses Retrieval-Augmented Generation (RAG): it retrieves relevant passages from your indexed content and has a language model answer from them. Specifically, it runs Hybrid RAG, which combines dense vector retrieval (semantic match for what the customer means) with sparse keyword retrieval (exact-term match for the precise SKU or policy clause referenced). It re-ranks the top candidates and passes the best passages to the language model with a prompt that constrains it to answer only from the retrieved context. Each answer is grounded in a specific passage from your indexed content and cites its source, so it can be audited.
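To make the "dense plus sparse" step concrete, here is a minimal sketch of one common way to merge two ranked lists, reciprocal rank fusion. This is an illustration only: the article does not say which fusion method the Guru uses, the chunk IDs are hypothetical, and the re-ranking step is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical retriever outputs (assumed, not real Guru data).
dense_hits = ["/insurance#plans", "/hours#weekend", "/about#team"]
sparse_hits = ["/hours#weekend", "/insurance#plans", "/blog#news"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
top_passages = fused[:2]  # the passages that would go on to the model
```

Chunks that rank well in both lists rise to the top, which is the practical point of combining semantic and exact-term retrieval.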
A worked example
Suppose a visitor types "do you take Delta Dental on Saturdays?" into your widget.
- Retrieval pulls two chunks: one from /insurance listing accepted plans including Delta Dental PPO, and one from /hours showing Saturday hours of 9 to 1.
- The model is given both chunks plus the question and instructed to cite the source URLs.
- The reply reads: "Yes, we accept Delta Dental PPO. Our Saturday hours are 9am to 1pm." with links back to both pages.
If the insurance page only lists "Delta Dental" without specifying PPO vs HMO, the Guru will return the more general answer rather than fabricate a detail that wasn't on the page.
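The example above can be sketched as a prompt-assembly step. The `Chunk` type, the `build_grounded_prompt` function, and the exact instruction wording are assumptions made for illustration; the Guru's actual prompt is not published in this article.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str   # page the passage came from
    text: str  # passage text as indexed


def build_grounded_prompt(question, chunks):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(
        f"[{i}] ({c.url}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below. "
        "Cite the source URL for each fact. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# The two chunks from the worked example (text paraphrased).
chunks = [
    Chunk("/insurance", "Accepted plans include Delta Dental PPO."),
    Chunk("/hours", "Saturday hours: 9 to 1."),
]
prompt = build_grounded_prompt("do you take Delta Dental on Saturdays?", chunks)
```

Because the source URL travels with each passage, the model can cite it, and the instruction to admit missing context is what discourages a fabricated PPO or HMO detail.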
Setup
- Paste your URL. The crawler runs and the agent goes live when the crawl completes. A typical small site (under 50 pages) is live in about five minutes. Larger sites take roughly one hour per 1,000 crawled pages, depending on source-content complexity.
- Upload Knowledge Files for content that isn't on your site. Knowledge Files upload is available on all plans, in HTML, PDF, DOC/DOCX, XLS/XLSX, Markdown, CSV, and TXT formats. Re-index after adding or removing files so the next answer reflects the change.
- Fill in the Instructions field. This is a powerful free-text source where you state, in your own words, authoritative facts, answers, policies, and the directions you want the Guru to follow. It is more than a place for manual Q&A: it strongly shapes how the Guru responds. You can also supply the same kind of material in uploaded documents.
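For planning purposes, the crawl-time figures in the first step can be turned into a rough estimate. This is a back-of-the-envelope sketch: the function name is hypothetical, and treating five minutes as a floor is an assumption. Actual times depend on source-content complexity.

```python
def estimate_crawl_minutes(pages: int) -> float:
    """Rough time-to-live estimate from the figures quoted in Setup."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    # Roughly one hour (60 minutes) per 1,000 crawled pages.
    by_rate = pages * 60 / 1000
    # Assumption: small sites still take about five minutes end to end.
    return max(5.0, by_rate)


for pages in (30, 1000, 2500):
    print(pages, "pages ->", estimate_crawl_minutes(pages), "minutes")
```

Under these assumptions a 30-page site comes out at about five minutes and a 2,500-page site at about two and a half hours.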
Advanced crawl controls
In the admin you can also turn on Crawl External Pages, which indexes the external pages your site links out to, not just your own. For finer control, additional options are available on request by contacting support: a hard cap on the number of indexed pages, path exclusions, additional seed domains, and fenced paths. Scheduled re-crawl, to keep the knowledge base fresh automatically, is available via the API.
Why RAG instead of fine-tuning
Fine-tuning bakes information into model weights. It is expensive, slow, opaque (you cannot easily see what the model "knows"), and goes stale the moment your policies change. RAG keeps the source of truth in your content store. Update a page, re-index, and the next answer reflects the change. This is the same architecture used by enterprise search systems and modern documentation assistants because it is auditable and cheap to update.
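The "update a page, re-index" loop can be illustrated with a toy content store. Everything here is hypothetical: the `ContentStore` class, the `/returns` page, and the naive keyword match are stand-ins chosen to show the idea, not the Guru's implementation.

```python
class ContentStore:
    """Toy content store: maps a page URL to its current indexed text."""

    def __init__(self):
        self._index = {}

    def reindex(self, url, text):
        # Re-indexing replaces the old text; no model is retrained.
        self._index[url] = text

    def retrieve(self, query):
        # Naive keyword overlap standing in for real retrieval.
        terms = set(query.lower().split())
        return [
            (url, text)
            for url, text in self._index.items()
            if terms & set(text.lower().split())
        ]


store = ContentStore()
store.reindex("/returns", "Returns accepted within 30 days.")
before = store.retrieve("returns policy")

# The policy changes: update the page and re-index it.
store.reindex("/returns", "Returns accepted within 60 days.")
after = store.retrieve("returns policy")
```

The second retrieval returns the updated passage immediately, whereas a fine-tuned model would keep producing the old policy until it was retrained.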
Where to look when an answer is wrong
Every answer can be traced back to the chunks that produced it. If a reply is incorrect, the first diagnostic is to inspect the retrieved passages. Three common patterns:
- The right page exists but wasn't retrieved. Usually a phrasing mismatch; rewriting the page heading often fixes this.
- The retrieved page is outdated. Update the page and re-index.
- The retrieved page is ambiguous. Tighten the source content.
The knowledge base is only as good as what you publish. The Guru's job is to make that content instantly queryable.