How We Measure What We Claim
Updated 2026-05-02. Maintained by Ron Pinkas, founder. Corrections welcome via LinkedIn or [email protected].
instantAIguru makes specific, falsifiable claims about accuracy and safety. This page documents how each claim was measured, when, by whom, and what it means in practice. If you find anything here imprecise, message us and we'll correct it.
What we measure, and what we don't
We make four primary claims across the site:
- 97%+ answer accuracy on questions answerable from your indexed content
- 85%+ autonomous resolution on phone calls, measured April 2026
- Zero hallucination on agentic actions (payments, account changes, business flows)
- 5-minute deployment with no code
Each is bounded. We do not claim the language model never misunderstands a question. We do not benchmark across questions outside your indexed source content. Out-of-scope questions are handled by deterministic guardrails, not by extrapolation.
97%+ answer accuracy: provenance
This number is not a marketing extrapolation. It is the accuracy threshold our Hybrid RAG retrieval pipeline has held in production reviews at two enterprise customers, validated independently, more than a year apart.
Omie, May 2024
instantAIguru went live at Omie on 2024-05-22. The 97%+ figure was first established in May 2024 from a 30,000-question review by Omie's tier-1 support engineers. Every question was reviewed against Omie's source-of-truth content. Disagreements were classified as either out-of-scope (no answer was retrievable) or genuine accuracy failures. The 97%+ figure is the share of in-scope questions with no genuine accuracy failure; out-of-scope questions are excluded from the calculation.
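The scoping rule in that review can be written down as a small calculation. The following is a minimal sketch, not the audit tooling Omie used: the label names and the `in_scope_accuracy` function are hypothetical, and the counts are illustrative only. It shows that out-of-scope questions leave the denominator entirely, so they neither raise nor lower the figure.

```python
from collections import Counter

# Hypothetical review labels (assumed names, not taken from the audit).
CORRECT = "correct"
OUT_OF_SCOPE = "out_of_scope"          # no answer was retrievable from indexed content
ACCURACY_FAILURE = "accuracy_failure"  # in-scope question, wrong answer


def in_scope_accuracy(labels):
    """Return accuracy over in-scope questions only.

    Out-of-scope questions are dropped from the denominator.
    """
    counts = Counter(labels)
    in_scope = counts[CORRECT] + counts[ACCURACY_FAILURE]
    if in_scope == 0:
        raise ValueError("no in-scope questions to score")
    return counts[CORRECT] / in_scope


# Illustrative counts only; these are NOT figures from any customer audit.
sample = [CORRECT] * 19 + [ACCURACY_FAILURE] * 1 + [OUT_OF_SCOPE] * 5
print(f"{in_scope_accuracy(sample):.1%}")  # prints 95.0%
```

With the same illustrative sample, dividing by all 25 questions would give 76%, which is why the scoping rule has to be stated alongside the number.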
On the morning of go-live, after the system handled its first 205 production questions, Marcelo Lombardo, Founder and CEO of Omie (Brazil's largest ERP/CRM platform for SMBs, serving 190,000+ businesses), sent a WhatsApp message:
"Congrats man. You nail that. 🙏"
Curacao, October 2025
Curacao handles 500 to 1,000 customer phone calls per day, with each call involving multiple AI and JavaScript Flow Engine (JSFE) turns. Across all channels (web, WhatsApp, SMS, phone), this totals 100,000+ interactions per month, of which approximately 30,000 are JSFE-executed business flows. Curacao operates these channels on numbers they own directly through Twilio and Meta, not platform-leased numbers. The strategic case for that model is documented in Bring Your Own Numbers.
In October 2025, SVP Joseph Jiron's Customer Service team reviewed thousands of production questions across in-scope topics. Mr. Jiron signed off on the 97%+ accuracy claim. His audit is documented in our Curacao case study. Curacao has been a customer of Ron Pinkas's previous systems since 1987.
85%+ autonomous resolution: phone calls, April 2026
In April 2026, across the Curacao production deployment, 85%+ of inbound phone calls were resolved autonomously by the Guru without human escalation. "Resolved autonomously" means the customer's request was satisfied within the call without transfer to a human agent. The remaining ~15% included calls intentionally escalated by the Guru's deterministic guardrails (high-risk transactions, account ownership disputes, requests outside indexed content).
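The definition above has two conditions: the request was satisfied, and the call was never transferred to a human. A minimal sketch of that calculation follows. The `CallRecord` fields and the sample records are hypothetical and are not the production call schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical fields, assumed for illustration.
    call_id: str
    request_satisfied: bool
    transferred_to_human: bool
    escalation_reason: Optional[str] = None  # e.g. "high_risk_transaction"


def autonomous_resolution_rate(calls: List[CallRecord]) -> float:
    """Share of calls satisfied within the call with no human transfer."""
    if not calls:
        raise ValueError("no calls to score")
    resolved = sum(
        1 for c in calls if c.request_satisfied and not c.transferred_to_human
    )
    return resolved / len(calls)


# Illustrative records only; not production data.
calls = [
    CallRecord("c1", True, False),
    CallRecord("c2", True, False),
    CallRecord("c3", True, False),
    CallRecord("c4", True, True, "account_ownership_dispute"),
]
print(autonomous_resolution_rate(calls))  # prints 0.75
```

Note that a call escalated on purpose by a guardrail still counts against the rate under this definition, even though the escalation was the intended behavior.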
This metric is operational, not benchmarked. It reflects what production looked like in a single month at one customer; it is not a contractual SLA. We report it because we trust it more than synthetic benchmarks. The business impact of resolving calls inside the call rather than queuing for callbacks is covered in How to STOP Losing Customers to Slow Replies.
Zero hallucination on agentic actions: what we mean
This is the claim that creates the most trust friction, and it deserves precision.
What it means: when the Guru takes a business action against your backend (a payment, an account inquiry, a product inventory check, a dispute ticket creation), that action does not pass through the language model. It runs through the JavaScript Flow Engine (JSFE), a deterministic agentic engine with explicit, code-based workflow definitions. The language model determines which flow to trigger based on customer intent. Once a flow triggers, JSFE executes it deterministically. AI is not used in the execution path, so hallucination is architecturally impossible at the action layer.
What it does not mean: the language model never misunderstands a customer's intent. It can. When that happens, the corresponding business flow simply does not trigger; the conversation continues with a reasonable conversational AI response. The customer is not committed to a wrong action; they get a natural-language reply, possibly a clarifying question, and the right flow can be triggered on the next turn.
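The separation described in the two paragraphs above can be sketched as follows. This is an illustration of the pattern, not JSFE itself: `FLOWS`, `check_inventory`, `classify_intent`, and `handle_turn` are hypothetical names, the keyword matcher merely stands in for the language model, and reading flow inputs from session state rather than from model output is an assumption. The model's output is used only as a lookup key into a fixed registry of coded flows.

```python
from typing import Callable, Dict, Optional


def check_inventory(session: dict) -> str:
    """A deterministic flow: plain code, no model call inside."""
    sku = session["sku"]
    stock = session["inventory"].get(sku, 0)
    return f"{sku}: {stock} in stock"


# Fixed registry of flows. Nothing outside this table can ever execute.
FLOWS: Dict[str, Callable[[dict], str]] = {"check_inventory": check_inventory}

CLARIFY = "Could you tell me a bit more about what you need?"


def classify_intent(utterance: str) -> Optional[str]:
    """Stand-in for the language model: returns a flow name or None."""
    return "check_inventory" if "stock" in utterance.lower() else None


def handle_turn(utterance: str, session: dict, classify=classify_intent) -> str:
    flow_name = classify(utterance)
    flow = FLOWS.get(flow_name) if flow_name is not None else None
    if flow is None:
        # Unrecognized or invented flow name: no action runs, the conversation continues.
        return CLARIFY
    return flow(session)


session = {"sku": "TV-55", "inventory": {"TV-55": 4}}
print(handle_turn("Is the TV in stock", session))  # prints TV-55: 4 in stock
print(handle_turn("hello there", session))         # prints the clarifying question
```

If the classifier returns a name that is not in the registry, the lookup fails and the turn falls back to conversation, so a wrong model output cannot produce an unscripted action.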
Substantiation: across 200,000+ business flows in Curacao production, including 100,000+ phone calls, 100% of business flows (payments, account inquiries, product inventory queries, dispute ticket creation) completed without error, exactly as scripted. This is not a sample; it is the full census. The architectural guarantee is documented in How JSFE Brings Trusted Workflow Control to Agentic AI and Hybrid RAG.
A traditional "agentic AI" system lets the LLM choose what tools to call and what parameters to pass. Hallucination becomes operational risk. Our system does not do this; it cannot, by design. For the longer engineering argument on why tool-using AI assistants are the wrong abstraction for production action-execution, see Do AI Assistants Really Need Tools?
Hybrid RAG: the retrieval guarantee
Hybrid RAG combines multiple retrieval vendors with a custom orchestration layer. Each customer answer is grounded in a specific passage from the customer's indexed content. Answers cite their source; the source is auditable.
Why this matters: the 97%+ accuracy figure depends on the retrieval layer finding the right grounding passage. When no relevant passage exists in the index, the Guru does not extrapolate; it routes to a guarded fallback.
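The "ground or fall back" rule can be sketched in a few lines. This is a toy under stated assumptions, not the Hybrid RAG pipeline: a single word-overlap scorer replaces the multi-vendor retrieval and orchestration layer, and `Passage`, `Answer`, `answer`, the `min_score` threshold, and the fallback wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Passage:
    source: str
    text: str


@dataclass(frozen=True)
class Answer:
    text: str
    source: Optional[str]  # None means the guarded fallback was used


FALLBACK = "That is not covered in my indexed content. Let me route you to someone who can help."


def overlap_score(question: str, passage: Passage) -> float:
    """Toy relevance score: fraction of question words found in the passage."""
    q = set(question.lower().split())
    p = set(passage.text.lower().split())
    return len(q & p) / len(q) if q else 0.0


def answer(question: str, index: List[Passage], min_score: float = 0.5) -> Answer:
    best = max(index, key=lambda p: overlap_score(question, p), default=None)
    if best is None or overlap_score(question, best) < min_score:
        return Answer(FALLBACK, None)      # no grounding passage: do not extrapolate
    return Answer(best.text, best.source)  # grounded answer carries its citation


index = [Passage("/policies/refunds", "refunds are issued within 90 days")]
print(answer("when are refunds issued", index).source)       # prints /policies/refunds
print(answer("what is the weather tomorrow", index).source)  # prints None
```

Every returned answer either cites a source or is the fallback, which is what makes an answer auditable after the fact.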
Technical detail: Hybrid RAG.
Named customers, dates, and permission status
| Customer | Industry | Live since | Public consent | Source |
|---|---|---|---|---|
| Omie | Brazil's largest ERP/CRM for SMBs (190K+ businesses) | 2024-05-22 | Yes, CEO Marcelo Lombardo public endorsement | /about |
| La Curacao | Retail finance, multinational | 1987 (predecessor) / current Guru | Yes, including SVP Joseph Jiron's name | /case-studies/curacao |
| The Master's University | Higher education | 2025 | Yes | /case-studies/the-masters-university |
We do not publish customer quotes without explicit consent.
What we don't measure (and don't claim)
To be specific about boundaries:
- We do not claim 100% accuracy. We claim 97%+ on in-scope questions, against named-customer audits.
- We do not benchmark on synthetic test sets we created. Audits are conducted by customers and our team against real production traffic.
- We do not include out-of-scope questions in the accuracy calculation. Out-of-scope is handled by guardrails, separately.
- We do not claim the language model is hallucination-free. We claim the action layer is.
- We do not claim 5-minute deployment includes content authoring time. It includes platform setup; the customer owns content quality.
90-day money-back guarantee
Unconditional. If at any time within the first 90 days of paid service you request a refund, you receive a full refund of all SaaS payments made to instantAIguru during that period. No questions asked, no proof of dissatisfaction required, no proration.
We are confident the Guru will increase your profits within the first 30 days. The 90-day guarantee is the safety net behind that claim. It covers our SaaS fees only; carrier or platform pass-through costs you incur directly with third parties (Twilio voice and SMS minutes, Meta WhatsApp template message fees if applicable, Stripe processing fees) are paid to those providers and are not refundable through us.
Corrections and contact
If a number on this page disagrees with a number elsewhere on the site, this page is canonical. If you find a number anywhere on the site that this page does not justify, that is a bug. Message Ron Pinkas on LinkedIn or email [email protected] and we'll fix it.