Building a Contradiction Detection System for RAG: When Your Knowledge Base Disagrees With Itself
The silent confidence problem
Your engineering team updates the API documentation. The rate limit for the /users endpoint was raised from 1,000 to 5,000 requests per second six months ago. The new spec is in Confluence. The old spec is in a PDF that was ingested into your RAG system last year. Both chunks are in the vector store. Both are semantically relevant to the query "What is the rate limit for the /users endpoint?"
The retriever returns both chunks. The LLM reads them, notices they disagree, and does what LLMs do: it picks one. Sometimes it picks the correct one. Sometimes it picks the outdated one. Sometimes it averages them and says "approximately 3,000 requests per second." It almost never says "I found conflicting information and here are both versions with their sources." The result is a system that is confidently wrong on an unpredictable fraction of queries - and neither the user nor your monitoring system knows which answers to trust.
This is not an edge case. Any knowledge base that evolves over time - which is every knowledge base - will accumulate contradictions. Policies get updated. Product specs change. Organizational structures shift. If your RAG pipeline does not explicitly detect and handle these contradictions, it is a matter of when, not if, it will return stale or incorrect information.
Fact representation with subject-predicate-object triples
To detect contradictions, you first need a structured representation of the facts in your knowledge base. Raw text chunks are insufficient because determining whether two passages contradict each other requires understanding what each one claims. Two passages can use completely different words to assert the same fact, or identical words to assert different facts about different entities.
The foundational representation is the subject-predicate-object (SPO) triple. Every factual claim in your knowledge base can be decomposed into triples:
"The /users endpoint has a rate limit of 5,000 req/s" → (/users endpoint, rate_limit, 5000 req/s)
"The contract expires on December 31, 2026" → (Contract #4472, expiration_date, 2026-12-31)
"The VP of Engineering is Maria Santos" → (VP of Engineering, held_by, Maria Santos)
Each triple is linked back to its source chunk with metadata: the document ID, the extraction date, the document's own publication date, and a source authority score. This provenance chain is what enables contradiction detection - you are not comparing raw text, you are comparing structured claims that can be aligned by their subject and predicate.
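In code, a triple with its provenance metadata might look like the sketch below. The field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Triple:
    """A single factual claim, linked back to its source chunk."""
    subject: str        # canonical entity, e.g. "/users endpoint"
    predicate: str      # canonical predicate type, e.g. "rate_limit"
    obj: str            # the asserted value, e.g. "5000 req/s"
    document_id: str    # source document the triple was extracted from
    chunk_id: str       # specific chunk within that document
    extracted_at: date  # when the extraction pipeline produced this triple
    published_at: date  # the source document's own publication date
    authority: float    # source authority score (discussed below)

example = Triple(
    subject="/users endpoint",
    predicate="rate_limit",
    obj="5000 req/s",
    document_id="confluence-api-spec-v2.4",
    chunk_id="chunk-0012",
    extracted_at=date(2026, 4, 1),
    published_at=date(2026, 3, 15),
    authority=1.0,
)
```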
Triple extraction can be done with LLM-based extraction (prompt the model to output structured triples from text), rule-based extraction for well-formatted documents (parsing tables, key-value pairs, and structured sections), or hybrid approaches that use rules for structured content and LLMs for narrative text. The research on LLM-based knowledge graph construction provides extensive benchmarks on extraction quality across different approaches and domains.
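A minimal sketch of the LLM-based route, assuming a generic `llm_complete` callable standing in for whatever model client you use; the prompt wording and JSON contract are illustrative:

```python
import json

EXTRACTION_PROMPT = """Extract every factual claim from the passage below as
subject-predicate-object triples. Respond with a JSON list of objects with
keys "subject", "predicate", "object". Use canonical, lowercase predicates.

Passage:
{passage}
"""

def extract_triples(passage: str, llm_complete) -> list[dict]:
    """Ask the model for structured triples; tolerate malformed output."""
    raw = llm_complete(EXTRACTION_PROMPT.format(passage=passage))
    try:
        triples = json.loads(raw)
    except json.JSONDecodeError:
        return []  # route the chunk to rule-based extraction or review instead
    if not isinstance(triples, list):
        return []
    return [t for t in triples if isinstance(t, dict)
            and {"subject", "predicate", "object"} <= t.keys()]
```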
Detecting contradictions: alignment and comparison
Two triples contradict each other when they share the same subject and predicate but assert different objects. The detection pipeline has two stages:
- Triple alignment. Find pairs of triples that refer to the same fact. This is not as simple as exact-matching the subject and predicate strings - "the /users endpoint" and "/api/v2/users" may refer to the same subject, and "rate limit" and "maximum throughput" may refer to the same predicate. Use entity resolution on subjects (linking variant mentions to canonical entities) and predicate normalization (mapping synonym predicates to canonical predicate types) to align triples across documents.
- Object comparison. For aligned triple pairs, compare their objects. This varies by data type (a combined sketch of alignment and comparison follows this list):
- Numeric values: Direct comparison with a tolerance threshold. "5,000 req/s" vs "5,000 requests per second" are equivalent after normalization. "5,000 req/s" vs "1,000 req/s" is a contradiction. Set the tolerance based on the domain - financial figures may require exact matches while performance metrics may allow 5-10% variance.
- Categorical values: Exact match after normalization. "Maria Santos" vs "Maria Santos-Rodriguez" might be the same person (apply entity resolution) or might not. "Active" vs "Deprecated" is a clear contradiction.
- Temporal values: Date comparison with calendar awareness. "Q1 2026" and "March 2026" are not contradictory (one is more specific). "Q1 2026" and "Q3 2026" are contradictory.
- Free-text values: The hardest case. Use semantic similarity to determine if two text objects assert the same thing. If similarity is high (>0.9), they are likely consistent. If similarity is low (<0.5), they may be contradictory - but they may also just be discussing different aspects. For ambiguous cases, flag for human review rather than auto-classifying.
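Putting the two stages together, here is a minimal sketch. The alias tables, tolerance value, and three-way classification are assumptions for illustration; a production system would back them with a real entity-resolution index, and temporal values would need their own calendar-aware branch, omitted here:

```python
import re

# Assumed lookup tables produced by entity resolution / predicate
# normalization; in practice these come from an alias index, not literals.
SUBJECT_ALIASES = {"/api/v2/users": "/users endpoint"}
PREDICATE_ALIASES = {"maximum throughput": "rate_limit", "rate limit": "rate_limit"}

def canonical(term: str, aliases: dict) -> str:
    return aliases.get(term.strip().lower(), term.strip().lower())

def aligned(a, b) -> bool:
    """Two triples refer to the same fact if subject and predicate align."""
    return (canonical(a.subject, SUBJECT_ALIASES) == canonical(b.subject, SUBJECT_ALIASES)
            and canonical(a.predicate, PREDICATE_ALIASES) == canonical(b.predicate, PREDICATE_ALIASES))

def parse_number(text: str):
    m = re.search(r"[\d][\d,]*\.?\d*", text)
    return float(m.group().replace(",", "")) if m else None

def objects_conflict(a_obj: str, b_obj: str, tolerance: float = 0.05) -> str:
    """Classify an aligned pair as 'consistent', 'contradiction', or 'review'."""
    x, y = parse_number(a_obj), parse_number(b_obj)
    if x is not None and y is not None:  # numeric: compare with tolerance
        if abs(x - y) <= tolerance * max(abs(x), abs(y), 1e-9):
            return "consistent"
        return "contradiction"
    if a_obj.strip().lower() == b_obj.strip().lower():  # categorical match
        return "consistent"
    # Non-numeric mismatches: defer to semantic similarity, an enumerated
    # vocabulary (e.g. status fields), or human review before flagging.
    return "review"
```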
Temporal validity windows: when contradictions are just updates
Not every pair of conflicting facts is a contradiction. Many are temporal updates - the fact was true at time T1 and a different fact became true at time T2. The rate limit really was 1,000 req/s until it was increased to 5,000 req/s. Maria Santos really was VP of Engineering until she was promoted to CTO and someone else took the role.
To distinguish contradictions from updates, every triple needs a temporal validity window: the time period during which the fact is believed to be true. The validity window has a start time (when the fact became true or when the document asserting it was published) and an end time (when a superseding fact was detected, or "present" if no superseding fact exists).
When two triples with the same subject and predicate but different objects are detected, check their validity windows:
- Non-overlapping windows: This is a temporal update, not a contradiction. The older fact is superseded by the newer one. Set the older triple's end time to the newer triple's start time. Both remain in the knowledge base, but queries are answered using the triple whose validity window covers the query's temporal context.
- Overlapping windows: This is a genuine contradiction. Two sources claim different things about the same entity during the same time period. Flag it and apply source authority ranking to determine which is more likely correct. A sketch of the window check follows.
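The check itself is small. A sketch, assuming each triple carries `valid_from` and `valid_to` attributes, with `None` meaning "present":

```python
from datetime import date
from typing import Optional

def windows_overlap(start_a: date, end_a: Optional[date],
                    start_b: date, end_b: Optional[date]) -> bool:
    """Two closed intervals overlap; an open end extends indefinitely."""
    end_a = end_a or date.max
    end_b = end_b or date.max
    return start_a <= end_b and start_b <= end_a

def classify(old, new) -> str:
    """old/new are aligned triples with different objects."""
    if windows_overlap(old.valid_from, old.valid_to, new.valid_from, new.valid_to):
        return "contradiction"  # same period, different claims: rank by authority
    return "temporal_update"    # supersession: close the older triple's window
```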
Implementing temporal validity windows transforms your knowledge base from a flat collection of facts into a temporal knowledge graph that understands how facts evolve. When a user asks "What was the rate limit in January 2025?", the system can correctly return "1,000 req/s" even though the current value is 5,000 - because it tracks the validity of each fact over time.
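Answering such a time-scoped query then reduces to filtering by validity window. A sketch, reusing the attribute names from the snippets above:

```python
from datetime import date

def fact_at(triples, subject: str, predicate: str, as_of: date):
    """Return the triple whose validity window covers the query date."""
    candidates = [t for t in triples
                  if t.subject == subject and t.predicate == predicate
                  and t.valid_from <= as_of <= (t.valid_to or date.max)]
    # If several windows overlap the date, prefer the highest-authority source.
    return max(candidates, key=lambda t: t.authority, default=None)
```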
Source authority ranking
When a genuine contradiction is detected - overlapping validity windows, same subject and predicate, different objects - you need a principled way to rank which source is more likely correct. This is source authority ranking.
Assign each document source a numeric authority score based on:
- Source type hierarchy: Primary sources (official documentation, contracts, regulatory filings) rank above secondary sources (meeting notes, emails, chat transcripts), which rank above tertiary sources (third-party articles, customer-reported information). A typical numeric scale: primary = 1.0, secondary = 0.6, tertiary = 0.3.
- Recency: More recent documents receive a boost, but recency alone is not sufficient - a recent Slack message should not override an official spec document published a month earlier. Apply a recency multiplier that decays slowly: recency_score = 1.0 - 0.1 × months_since_publication, floored at 0.3.
- Author authority: Documents authored by domain owners (the team that owns the API, the legal team that drafted the contract) score higher than documents authored by downstream consumers. This requires maintaining a mapping of document authors to domain ownership, which is organizational metadata that your ingestion pipeline should capture.
The combined authority score - source_type × recency × author_authority - gives you a ranking of which conflicting fact is most likely correct. But "most likely correct" is not the same as "definitely correct." The real power of contradiction detection is not in automatically resolving conflicts but in surfacing them.
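A sketch of the combined score, using the example weights above. The 0.7 penalty for non-owner authors is an assumed value, not a canonical one:

```python
def authority_score(source_type: str, months_old: float,
                    author_owns_domain: bool) -> float:
    """Combined authority: source_type x recency x author authority."""
    type_weight = {"primary": 1.0, "secondary": 0.6, "tertiary": 0.3}[source_type]
    recency = max(1.0 - 0.1 * months_old, 0.3)   # slow decay, floored at 0.3
    author = 1.0 if author_owns_domain else 0.7  # assumed non-owner weight
    return type_weight * recency * author

# An official spec published a month earlier still outranks a fresh chat message:
spec = authority_score("primary", months_old=1, author_owns_domain=True)     # 0.9
chat = authority_score("secondary", months_old=0, author_owns_domain=False)  # 0.42
```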
Surfacing contradictions rather than hiding them
The most important design decision in a contradiction-aware RAG system is what to do when contradictions are found. The tempting approach is to silently resolve them - pick the highest-authority source and return its value as if no conflict exists. This is a mistake.
Silent resolution creates a system that appears trustworthy but is not auditable. Users trust the answer because it sounds confident. When it is wrong - and it will be wrong some fraction of the time, because source authority ranking is a heuristic, not an oracle - there is no indication that uncertainty existed. The user has no opportunity to apply their own judgment.
Instead, surface contradictions explicitly. When the retrieval layer detects conflicting facts in the evidence set, structure the LLM's context to include both versions with their sources and authority scores. Instruct the LLM to acknowledge the conflict in its response:
"According to the API specification (v2.4, March 2026), the rate limit is 5,000 req/s. However, an earlier specification (v2.1, September 2025) states 1,000 req/s. The more recent document from the API team is likely current, but you may want to verify with the platform team."
This is dramatically more useful than either "5,000 req/s" or "1,000 req/s" alone. The user gets the most likely answer, the alternative, the sources for both, and a prompt to verify. Trust in the system increases precisely because it demonstrates awareness of its own uncertainty.
For detailed strategies on how to format conflicting evidence in the context window - including XML-tagged source attribution and positional ordering - see our post on context window assembly strategies.
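As a rough illustration of the idea, the sketch below renders conflicting triples as XML-tagged sources ordered by authority. The tag names and instruction wording are ours, not a standard:

```python
def format_conflict(evidence: list) -> str:
    """Render conflicting facts as tagged sources, highest authority first,
    so the model can attribute and acknowledge the disagreement."""
    blocks = []
    for t in sorted(evidence, key=lambda t: t.authority, reverse=True):
        blocks.append(
            f'<source doc="{t.document_id}" published="{t.published_at}" '
            f'authority="{t.authority:.2f}">\n'
            f'{t.subject} {t.predicate}: {t.obj}\n</source>'
        )
    instruction = ("The sources above disagree. State the most likely current "
                   "value, cite both sources, and advise the user to verify.")
    return ("<conflicting_evidence>\n" + "\n".join(blocks)
            + "\n</conflicting_evidence>\n" + instruction)
```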
We discovered that 12% of the documents in our internal knowledge base contradicted at least one other document. After building contradiction detection, our support team's first reaction was alarm - then relief. They said: 'We always suspected the knowledge base had stale information, but we had no way to find it systematically. Now we have a queue of contradictions to resolve, and every one we fix makes the AI answers more reliable.'
Proactive contradiction detection vs. query-time detection
Contradiction detection can run in two modes, and production systems benefit from both.
- Proactive (offline) detection scans the entire knowledge base for contradictions on a scheduled basis - typically after each ingestion batch. It compares all triples with matching subjects and predicates, flags contradictions, and generates a report for content owners to review and resolve. This is the primary mechanism for improving knowledge base quality over time. The computational cost of proactive detection depends on the number of unique subject-predicate pairs and the average number of triples per pair. For most knowledge bases, the vast majority of subject-predicate pairs have only one triple (no contradiction possible), so the actual comparison workload is manageable even at scale. Index triples by canonical subject and predicate for efficient lookup, as in the sketch after this list.
- Query-time detection checks for contradictions among the specific facts retrieved for the current query. It is more focused - only examining the 5-20 chunks in the evidence set - but must complete within latency budgets. The implementation extracts triples from retrieved chunks, aligns them, and checks for conflicting objects. If contradictions are found, it restructures the LLM prompt to surface them.
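A sketch of the proactive scan, reusing helpers from the earlier snippets (`canonical`, the alias tables, `objects_conflict`, `classify`):

```python
from collections import defaultdict
from itertools import combinations

def scan_for_contradictions(triples) -> list:
    """Offline pass: only subject-predicate pairs with 2+ triples need checking."""
    index = defaultdict(list)
    for t in triples:
        key = (canonical(t.subject, SUBJECT_ALIASES),
               canonical(t.predicate, PREDICATE_ALIASES))
        index[key].append(t)

    report = []
    for key, group in index.items():
        if len(group) < 2:
            continue  # a single triple cannot contradict anything
        for a, b in combinations(group, 2):
            if objects_conflict(a.obj, b.obj) == "contradiction":
                report.append((key, a, b, classify(a, b)))
    return report  # hand off to the resolution workflow described below
```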
Query-time detection catches contradictions that proactive detection missed (because triple extraction is imperfect and some contradictions only become apparent in context). Proactive detection catches contradictions before they affect any query. Together, they provide defense in depth.
Building a contradiction resolution workflow
Detecting contradictions is only half the problem. Resolving them requires human judgment, and that means building a workflow that routes contradictions to the right people. The key components are:
- Contradiction queue: A prioritized list of detected contradictions, ranked by impact (how many queries are affected), confidence (how certain is the system that this is a genuine contradiction vs. a false positive), and age (how long has it been unresolved).
- Domain routing: Route each contradiction to the team that owns the relevant content. API contradictions go to the platform team. Policy contradictions go to legal. Product spec contradictions go to product management. This requires a domain-to-owner mapping in your metadata layer.
- Resolution actions: When a human resolves a contradiction, they should be able to: (a) confirm one version and deprecate the other, (b) mark both as valid with different temporal windows, (c) mark the contradiction as a false positive (the system incorrectly identified a conflict), or (d) escalate if they lack the authority to resolve it.
Each resolution action should update the triple store, adjust validity windows, and retrigger any downstream processes (like re-ranking affected chunks). For organizations with compliance requirements, every resolution should be recorded in an audit trail that documents who resolved it, when, and why.
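A sketch of how the four resolution actions might mutate the triple store and audit trail. The `Contradiction` fields used here (`preferred`, `other`, `id`) are hypothetical:

```python
from enum import Enum

class Resolution(Enum):
    CONFIRM_ONE = "confirm_one"        # (a) one version is correct
    SPLIT_WINDOWS = "split_windows"    # (b) both valid in different periods
    FALSE_POSITIVE = "false_positive"  # (c) the system flagged a non-conflict
    ESCALATE = "escalate"              # (d) resolver lacks authority

def resolve(contradiction, action: Resolution, resolver: str,
            reason: str, audit_log: list) -> None:
    """Apply a human decision to the triple store and record it for audit."""
    preferred, other = contradiction.preferred, contradiction.other
    if action is Resolution.CONFIRM_ONE:
        other.deprecated = True                # retire the losing triple
    elif action is Resolution.SPLIT_WINDOWS:
        other.valid_to = preferred.valid_from  # older fact ends where newer begins
    # FALSE_POSITIVE and ESCALATE leave the triples untouched.
    audit_log.append({
        "contradiction_id": contradiction.id,
        "action": action.value,
        "resolved_by": resolver,
        "reason": reason,
    })
```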
Where TypeGraph fits in
TypeGraph's data quality layer includes automated contradiction detection across your knowledge base, with both proactive batch scanning and query-time conflict identification. The system extracts SPO triples during ingestion, maintains temporal validity windows, applies configurable source authority ranking, and surfaces contradictions in LLM responses with full source attribution. Detected contradictions feed into a resolution workflow with domain-based routing and W3C PROV-compatible audit logging. The goal is to turn your knowledge base's inevitable inconsistencies from a hidden reliability risk into a visible, manageable quality improvement process.