Why Intercom Fin's Resolution Rate Has a Documentation Problem

Intercom Fin retrieves answers from your knowledge base using retrieval-augmented generation. When that knowledge base contains outdated steps, renamed features, or stale screenshots, Fin delivers those inaccuracies confidently. The resolution rate ceiling is a documentation quality problem, not an AI model problem, and the fix is to correct the documentation rather than to change the model.
April 30, 2026
Henrik Roth
TL;DR
  • Intercom Fin uses RAG: it retrieves from your help center articles and synthesizes answers from them. Intercom reports a hallucination rate below 1%, but that figure only covers invented content, not stale content. When an article is outdated, Fin accurately cites the wrong information.
  • Documentation accuracy is the main bottleneck on Fin's resolution rate, not the AI model. Intercom audited and updated 700+ articles before enabling Fin internally — a direct signal of how much content quality matters.
  • Stale articles produce confident wrong answers at scale: renamed buttons, deprecated navigation paths, and removed features appear in Fin's responses until someone manually updates the source article.
  • Intercom's Content Gap Suggestions feature is reactive — it identifies what already failed, not what is about to go stale when the next release ships.
  • The structural fix is GitHub Sync: connecting documentation to the release cycle so UI changes in code automatically flag or update the affected articles. HappyAgent monitors the repository and maps selector changes to guide content without manual intervention.
  • Teams can measure the impact directly: find articles with high Fin involvement and low resolution rates — those are the documentation accuracy failures. Fix them first. Resolution rates improve without any changes to model configuration.

Intercom Fin works well when the documentation underneath it is accurate. When the documentation is stale, Fin works against you. It retrieves confidently, synthesizes fluently, and delivers wrong answers at scale. There is no warning label when it does. Support teams that have run Fin for more than three months without a documentation discipline almost always notice the same pattern: resolution rate plateaus, certain query clusters keep failing, and the failures are weirdly specific. Wrong navigation paths. Deprecated feature names. Steps that used to work but no longer do. The pattern is not a model problem. It is a documentation decay problem.

How Intercom Fin actually works

Intercom Fin uses retrieval-augmented generation (RAG). When a customer sends a message, Fin searches your connected knowledge base for relevant articles, then formulates a response grounded in what it finds. It does not draw on general LLM knowledge or make up information outside your help center content. It retrieves from your articles and synthesizes from that.

This architecture is deliberate and correct for business use. RAG keeps Fin's answers grounded in your product's specific information, makes hallucinations unlikely, and lets you control what Fin knows by controlling the knowledge base. Intercom reports a hallucination rate below 1%, meaning Fin almost never invents information that is not in the source material. That is a genuine achievement and the right starting point for an enterprise AI agent.

The catch is that Fin's accuracy ceiling is set entirely by the accuracy of your articles. Fin cannot tell whether an article is current or eight months out of date. It retrieves the most semantically relevant content and synthesizes from it. If the most relevant article is wrong, Fin delivers that wrong answer with the same confidence as a correct one. This is not a hallucination in the technical sense. Fin accurately represented what the article said. The article was wrong. That distinction matters because the fixes are completely different: you cannot solve a documentation accuracy problem by upgrading the AI model. You solve it by fixing the documentation.
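
To make the point concrete, here is a minimal sketch of similarity-based retrieval. It is not Fin's actual retrieval code, which is not public; the bag-of-words cosine scoring, the `Article` class, and the sample articles are all assumptions chosen for illustration. What it shows is that a relevance score is computed from text alone, so the `last_updated` field plays no part in which article wins.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    body: str
    last_updated: date  # stored, but never consulted by the retriever below


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts (a stand-in for real embeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, articles: list[Article]) -> Article:
    """Return the most similar article. Freshness is not part of the score."""
    q = tokenize(query)
    return max(articles, key=lambda art: cosine(q, tokenize(art.title + " " + art.body)))


# Hypothetical knowledge base: the first article describes a UI that no longer exists.
articles = [
    Article(
        "Update your payment method",
        "Go to Account Settings, then select Billing to update your payment method.",
        date(2025, 6, 1),
    ),
    Article(
        "Invite a teammate",
        "Open the Team page and select Invite to add a teammate.",
        date(2026, 4, 1),
    ),
]

best = retrieve("How do I update my payment method?", articles)
print(best.title, best.last_updated)
```

The outdated article is returned because it is the best textual match. Nothing in the scoring function could have ruled it out.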

What sets Fin's accuracy ceiling

Three variables determine how well Fin performs on any given query. The first is coverage: if there is no article on the topic, Fin cannot answer it. This is a straightforward gap to identify and address. The second is clarity: if the article is poorly structured or ambiguous, Fin may retrieve it but extract the wrong section. The third is accuracy: if the article exists and is well structured but contains outdated steps, Fin delivers those outdated steps.

Most teams invest in coverage and clarity. The accuracy variable is the one that drives the most failures in established deployments, because teams write new articles regularly but update old ones rarely. Intercom's own support team ran a full audit of more than 700 articles before enabling Fin internally. That is not a small number. It is a direct signal of how strongly documentation quality affects AI performance at scale, and of how much decay accumulates in a real knowledge base over time.

The Knowledge-Centered Service methodology from the Service Innovation Library benchmarks knowledge article useful life at roughly six months. For teams shipping weekly, that lifespan is much shorter in practice. A feature rename can invalidate multiple articles in a single sprint without anyone flagging the change to the docs team. The support content quality problem is not a failure of effort. It is a structural gap between how software is built and how documentation is maintained.

How stale documentation produces confident wrong answers

The failure mode is concrete. Your product had a settings panel called "Account Settings." Your engineering team split it into two panels: "Profile" and "Billing." Fourteen help articles reference "Account Settings." Nobody flagged those articles for the docs team when the change shipped. There was no process for it.

A customer asks Fin: "How do I update my payment method?" Fin retrieves the billing article, which begins: "Go to Account Settings, then select Billing." Fin tells the customer exactly that. The customer searches for "Account Settings" in your product. It no longer exists. They fail. They escalate to a human agent, now frustrated because they already tried and wasted time.

Fin did nothing wrong by its own logic. It retrieved an accurate representation of the article's content. The article described a UI that no longer exists. This is not a model failure. It is a documentation quality failure, and it compounds with every release that touches the product without triggering an article update. The knowledge base accuracy problem multiplies with every sprint.
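
The rename scenario above suggests a simple first-pass check: search article text for UI labels that no longer exist. The sketch below assumes you can export article bodies as plain text and that someone maintains a list of removed labels. The slugs, bodies, and the `find_stale` helper are hypothetical.

```python
# Labels that were removed from the product UI (hypothetical list).
removed_labels = {"Account Settings"}

# Hypothetical export of help articles: slug -> plain-text body.
articles = {
    "update-payment-method": "Go to Account Settings, then select Billing.",
    "change-profile-photo": "Open Profile and select Change photo.",
    "download-invoices": "In Account Settings, open Billing and select Invoices.",
}


def find_stale(articles: dict[str, str], removed_labels: set[str]) -> dict[str, list[str]]:
    """Map each article slug to the removed labels it still mentions."""
    hits = {}
    for slug, body in articles.items():
        found = sorted(
            label for label in removed_labels if label.lower() in body.lower()
        )
        if found:
            hits[slug] = found
    return hits


stale = find_stale(articles, removed_labels)
for slug, labels in stale.items():
    print(f"{slug}: still references {', '.join(labels)}")
```

A plain substring match like this only catches exact label text. It misses paraphrases and screenshots, which is part of why the problem needs a structural fix rather than a search script.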

Intercom's own troubleshooting documentation acknowledges a related version of this: when a knowledge base article conflicts with a configured procedure, Fin defaults to the article. Outdated articles do not just produce wrong answers. They actively block the correct procedures from running. A team that set up a refund procedure but left an old refund article in place will see Fin cite the article and bypass the procedure entirely.

How fast does accuracy degrade at shipping speed?

Documentation decay is a function of release cadence multiplied by article coverage. For a team shipping once a week with 200 articles covering UI-dependent workflows, the math works against you. A conservative estimate: each weekly release touches three to five UI elements. Each UI element change potentially invalidates one to three articles. Over a quarter, that is 12 releases touching 36 to 60 elements, producing between 36 and 180 article-level inaccuracies across a knowledge base that may have 200 articles total. Several inaccuracies can land in the same article, so the number of affected articles may be lower than the number of inaccuracies, but even the low end is a meaningful share of the knowledge base.
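
The estimate above can be reproduced in a few lines. The inputs are the figures from the paragraph; the variable names are only for illustration.

```python
releases_per_quarter = 12        # weekly shipping, as in the estimate above
elements_per_release = (3, 5)    # low and high UI elements touched per release
articles_per_element = (1, 3)    # low and high articles invalidated per element
total_articles = 200

# Elements touched over the quarter: (low, high).
elements_touched = tuple(releases_per_quarter * n for n in elements_per_release)

# Article-level inaccuracies: low end uses both low inputs, high end both high inputs.
inaccuracies = (
    elements_touched[0] * articles_per_element[0],
    elements_touched[1] * articles_per_element[1],
)

print(elements_touched)  # (36, 60)
print(inaccuracies)      # (36, 180)
```

These are counts of invalidation events, not distinct articles. One article can be hit by more than one change, so the share of the 200 articles actually affected depends on how the changes overlap.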

Intercom's documentation for Fin optimization recommends reviewing articles unchanged for six or more months on a monthly basis. But the real problem for fast-shipping teams is not the articles that have not been touched in six months. It is the articles that were accurate three weeks ago and are now wrong because a developer renamed a button. Manual review cadences cannot catch that. One documented case in the Intercom community showed a 70% Fin failure rate on password reset queries traced entirely to a single outdated article describing a flow that had been redesigned two sprints earlier.
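
A short sketch shows why an age-based review misses this case. The article records and dates are hypothetical, and the six-month threshold is approximated as 182 days.

```python
from datetime import date, timedelta

today = date(2026, 4, 30)
cutoff = today - timedelta(days=182)  # roughly six months

# Hypothetical articles. "reset-password" was edited three weeks ago,
# then invalidated by a redesign that shipped afterwards.
articles = [
    {"slug": "reset-password", "last_updated": date(2026, 4, 9)},
    {"slug": "export-data", "last_updated": date(2025, 8, 1)},
]

# Age-based review: only articles untouched since the cutoff are flagged.
due_for_review = [a["slug"] for a in articles if a["last_updated"] <= cutoff]
print(due_for_review)  # ['export-data']
```

The filter flags the old article and skips the recently edited one, even though the recently edited one is the article that is now wrong. Article age says when the text last changed, not when the product did.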

The SuperOffice 2023 customer service benchmarks found that customers who fail at self-service are significantly more expensive to serve when they escalate. They arrive already frustrated and with partial context that agents have to untangle. Running Fin on stale documentation is not just inefficient. It actively increases the cost of each failure, because failed AI interactions create more agitated customers than no AI interaction at all.

Why the fix is structural, not a review sprint

The typical response to documentation accuracy problems is a review sprint: pull the top articles by Fin involvement rate, have someone walk through each one against the live product, update what is wrong. This works once. It does not solve the problem over time, because the product keeps shipping and the review cadence never keeps pace with the release cycle.

Intercom's own Content Gap Suggestions feature illustrates the reactive limit clearly. The tool identifies what has already failed: queries Fin could not resolve. Then it suggests content fixes. But it cannot tell you which articles are about to go stale because a pull request merged yesterday. It identifies yesterday's failures, not tomorrow's. For teams shipping weekly, that means the knowledge base is always at least one release behind the optimization signal. The gap between code and documentation never closes. It only gets measured after it causes failures.

The structural fix is to connect documentation to the release cycle at the source. That means recording documentation in a format that knows what it is referencing in the product. Screenshot-based documentation records the visual state of the UI at one moment in time. When the product changes, the screenshot is wrong, but nothing in the documentation system knows it. There is no link between the frozen image and the code element that changed. The only way to find the inaccuracy is a manual review. As covered in why AI chatbots give wrong answers, this is the core architectural problem that breaks AI retrieval at scale: the knowledge base is built on data that has no connection to its own accuracy.

What GitHub Sync does differently

HappyRecorder, HappySupport's Chrome extension, records documentation differently. Instead of capturing screenshots, it captures DOM selectors and CSS metadata at each step: references to the actual code elements the user is interacting with, not a frozen image of how they looked at recording time. When a developer renames a button or reorganizes a navigation panel, the selector reference in the guide points to something that has changed. The system knows this before anyone has to manually find it.

HappyAgent, the GitHub Sync layer, monitors the repository for those selector changes. When a pull request modifies a CSS class or DOM element referenced in a guide, HappyAgent detects the change, maps it to the affected articles, and flags them for review. The connection between the code change and the documentation impact is automated, not manual. An engineer ships a rename. HappyAgent surfaces the three articles that referenced the old label. The support team reviews and confirms. The knowledge base stays current at release cadence, not at review sprint cadence.
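
The mapping step described above can be sketched as a lookup from changed selectors to the guides that reference them. This is not HappyAgent's implementation, which is not described in this article. The `GuideStep` structure, the selector strings, and the `affected_guides` helper are assumptions, and the set of changed selectors is taken as given rather than extracted from a real pull request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideStep:
    guide: str        # slug of the guide this step belongs to
    selector: str     # CSS selector recorded for the step
    instruction: str  # text shown to the reader


# Hypothetical recorded guides.
steps = [
    GuideStep("update-payment-method", "#account-settings-link", "Open Account Settings"),
    GuideStep("update-payment-method", "button.billing-save", "Select Save"),
    GuideStep("download-invoices", "#account-settings-link", "Open Account Settings"),
    GuideStep("invite-teammate", "button.invite", "Select Invite"),
]


def affected_guides(changed_selectors: set[str], steps: list[GuideStep]) -> dict[str, list[str]]:
    """Map each guide to the changed selectors it references."""
    affected: dict[str, list[str]] = {}
    for step in steps:
        if step.selector in changed_selectors:
            affected.setdefault(step.guide, []).append(step.selector)
    return affected


# Assume a merged pull request modified this selector.
changed = {"#account-settings-link"}
flagged = affected_guides(changed, steps)
for guide, selectors in flagged.items():
    print(f"review {guide}: {', '.join(selectors)} changed")
```

The lookup is only possible because each step stores a reference to a code element. A screenshot carries no such key, so there is nothing to join against when the code changes.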

This is what Intercom's Content Gap Suggestions feature cannot do: tell you that a guide is about to break before it breaks. GitHub Sync works on the upstream source of documentation decay rather than the downstream symptoms. The failure never reaches Fin's retrieval layer because it is caught at the point of change.

For teams currently running Fin on a screenshot-based knowledge base, the practical sequence is straightforward. Audit the articles Fin uses most against the live product first. Fix the highest-traffic inaccuracies. Then switch the recording method for new articles to one that captures code metadata rather than pixels. The GitHub Sync documentation guide covers how HappyAgent handles the continuous monitoring piece once the knowledge base is rebuilt with selector-aware content.

Measuring Fin accuracy before and after documentation fixes

Intercom's Fin Optimize feature gives teams the data needed to quantify the documentation problem. Look at three metrics in sequence: Fin's involvement rate (the percentage of conversations where Fin engages), the resolution rate per involved conversation, and the per-article resolution rate for the articles Fin references most.

Articles with high involvement and low resolution rate are the documentation accuracy failures in plain sight. They are the articles Fin retrieves frequently but that do not produce resolved conversations. Most of the time, the reason is inaccurate content: the article describes something that no longer works as described. These are not missing articles. They are wrong articles, and wrong articles are harder to catch than missing ones because they look fine until someone follows the steps.

A practical before-and-after benchmark: identify the ten articles with the highest involvement and lowest resolution rate. Fix each one against the live product. Rerun Fin on the same query set. Intercom's own knowledge management guidance reports that teams can reach an 80% resolution rate through refined knowledge management. The ceiling is not the model. It is the documentation quality.
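
The triage step can be sketched as a sort over per-article statistics. The record format below is hypothetical and is not the shape of any Intercom export. The `min_involved` threshold is also an assumption, added so that rarely used articles do not dominate the list.

```python
# Hypothetical per-article stats: conversations where Fin used the article,
# and how many of those ended resolved.
stats = [
    {"article": "update-payment-method", "involved": 420, "resolved": 105},
    {"article": "reset-password", "involved": 300, "resolved": 90},
    {"article": "invite-teammate", "involved": 250, "resolved": 200},
    {"article": "export-data", "involved": 12, "resolved": 2},
]


def worst_articles(stats: list[dict], min_involved: int = 100, limit: int = 10) -> list[str]:
    """Return high-involvement articles ordered by lowest resolution rate."""
    candidates = [s for s in stats if s["involved"] >= min_involved]
    ranked = sorted(candidates, key=lambda s: s["resolved"] / s["involved"])
    return [s["article"] for s in ranked[:limit]]


worst = worst_articles(stats, min_involved=100, limit=2)
print(worst)  # ['update-payment-method', 'reset-password']
```

Running the same ranking after the fixes, on the same articles, gives the before-and-after comparison.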

The goal is not to tune the AI. It is to eliminate the documentation accuracy gap that the AI is faithfully reproducing. Once that gap closes, Fin's performance improves without any changes to model configuration. The bottleneck was always the knowledge base. Recognizing that is the first step. Building a process that keeps the knowledge base accurate automatically is what makes the improvement permanent.

If you want to audit your full knowledge base before connecting it to any AI system, the help center content audit guide covers exactly what to check and in what order. HappyRecorder creates guides with code-level references so articles know what they are pointing to. HappyAgent watches the codebase and flags changes when they happen. The combination keeps Fin's knowledge base current at shipping speed, without relying on anyone remembering to check after every release.

FAQs

Why does Intercom Fin give wrong answers?
Intercom Fin retrieves answers from your knowledge base. When that knowledge base contains outdated navigation paths, renamed features, or stale screenshots, Fin retrieves and delivers those inaccuracies confidently. The model is not hallucinating; it is accurately citing wrong content. Fix the content, fix the answers.

How does documentation quality affect AI chatbot resolution rates?
AI chatbots built on retrieval-augmented generation can only perform as well as the documents they retrieve from. A knowledge base where 30-40% of articles are inaccurate produces a chatbot that fails on those topics, regardless of model quality. Documentation freshness is the binding constraint on resolution rates.

What is the relationship between Intercom Fin and a knowledge base?
Intercom Fin uses retrieval-augmented generation: it searches your Intercom Articles knowledge base for relevant content, then formulates a response based on what it finds. The quality of its answers depends directly on the accuracy and completeness of those articles. Fin does not generate answers from general knowledge; it grounds responses in your documentation.

How do you improve Intercom Fin's accuracy?
Start with a documentation audit. Identify which articles contain outdated steps, renamed features, or stale screenshots, because these are the direct source of wrong Fin answers. Then establish a process that keeps the knowledge base current with every product release. A GitHub sync that detects when UI elements change and flags affected articles is the most reliable mechanism.

Is a stale knowledge base worse than no knowledge base for an AI chatbot?
In some ways, yes. A chatbot with no knowledge base declines to answer. A chatbot with a stale knowledge base answers confidently and incorrectly. The second outcome creates more frustration because it wastes the user's time and generates an expectation the bot then fails to meet. The severity scales with how wrong the content is and how critical the query is.

"A failed self-service interaction costs 2-4x more in downstream support effort than a direct ticket, because the customer spent time trying and failing, and arrives at the human agent more frustrated."
Gartner Research

    Henrik Roth

    Co-Founder & CMO of HappySupport

    Henrik scaled neuroflash from early PLG experiments to 500k+ monthly visitors and €3.5M ARR, then repositioned the product to become Germany's #1 rated software on OMR Reviews 2024. Before SaaS, he built BeWooden from zero to seven-figure e-commerce revenue. At HappySupport, he and co-founder Niklas Gysinn are solving the problem he saw at every company: documentation that goes stale the moment developers ship new code.
