
Matt Burgess

low brow entry to high brow topics

It Doesn't Remember

Mark Cuban posted a question this week that’s been doing the rounds: why can’t enterprise AI guarantee the same answer to the same question, every time?

The standard answer: train a specialised domain model, add human-in-the-loop verification, log the audit trail. Not wrong. But it treats the model as the primary artifact. The inconsistency problem isn’t a model quality problem. It’s a memory architecture problem. To understand why, you have to go back to the hairball.


The promise was seductive and coherent. Enterprise knowledge is relational - things connect to other things in ways that flat tables can’t represent. A customer isn’t just a row; they’re a node connected to transactions, products, complaints, locations, household members, lifetime events. Google had demonstrated the Knowledge Graph in 2012. Neo4j, TigerGraph, Amazon Neptune, Azure Cosmos DB - a whole vendor ecosystem emerged. Gartner put it on the hype curve. Consulting firms built practices around it. Enterprise architects built roadmaps with knowledge graphs at the centre.

The vision was ‘connect all your enterprise data into a unified graph’. Traverse the connections. Discover the hidden relationships that siloed systems couldn’t see. Surface the insights that lived in the space between the data, not in the data itself.

It was right about the problem: knowledge is relational. The connections do matter. The insight does live between the nodes, not in them.

The hairball

But what actually happened in the projects was this. You built the graph. You connected the nodes. You ran the queries. And you got a hairball - a visualisation so dense with connections that no human could read it, traversals so expensive that queries timed out, and insights so buried in the structure that extracting them required specialist skills almost nobody had.

The construction cost was brutal. Ontology design - deciding what the entities and relationships fundamentally are - was inherently contested. Every team has a different view of what a ‘customer’ is, what a ‘product’ is, whether a ‘transaction’ connects to a ‘customer’ or to an ‘account.’ Before you’d built the graph you had to resolve these questions, and resolving them required organisational alignment that most enterprises couldn’t produce. Projects ran 18 months before they got any value out. Most ran out of patience first.

Then the inference problem. That vision assumed that once you had the graph, you could traverse it and reason over it - the machine would find the connections that humans missed. In practice, semantic reasoning over large graphs was slow. The semantic web dream of machines reasoning over linked data never materialised at enterprise scale. What you got instead was graph traversal for well-defined queries - useful for fraud detection and recommendation engines, but not for the general ‘surface-for-me-some-hidden-insights-from-enterprise-knowledge’ promise.

[Image: Charlie Day conspiracy meme - frantically pointing at a chaotic, string-covered whiteboard.]

The specific use cases that worked were narrow. The general knowledge management use case - ‘connect everything and discover what we know’ - produced expensive infrastructure, impressive B-of-the-Bang architecture diagrams, and maybe some dashboards nobody used.

By 2021 most enterprise graph projects had either narrowed to the specific use cases where graphs genuinely worked, or were quietly shelved. The hairball became shorthand for what happens when you try to make graphs do too much.

ChatGPT lands in November 2022. Within six months the enterprise technology conversation has shifted completely. LLMs can answer questions about connected knowledge without requiring explicit graph construction. Vector databases and RAG offer semantic retrieval without ontology engineering. The narrative becomes: you don’t need to build a graph to connect your enterprise knowledge, you just embed your documents and let the model find the connections.

Graph adoption plateaus. Projects that had been limping along get cancelled. The CTO who had championed the knowledge graph in 2019 is now championing the LLM platform in 2024, and nobody wants to hear about graphs.

[Image: the two Spider-Men pointing at each other meme.]

The conclusion made complete sense. If graphs had been struggling to do the intelligence work for five years, and LLMs arrived and did it in an afternoon - why would you keep the graph? The reasoning was sound. The premise was wrong.

A retrospective

The 2019 knowledge graph was trying to be an intelligence layer. It was trying to do the reasoning, surface the insights, answer the questions. That was hard, expensive, and slow, and the intelligence it produced was brittle. It was the wrong job.

The right job for a graph is not intelligence - it’s memory. Holding the structure that makes intelligence trustworthy. These are completely different design requirements.

An LLM does what a graph couldn’t. A graph holds what an LLM can’t.

Specifically, the graph holds things the LLM structurally cannot (there’s a minimal sketch after the list):

Typed relationships. Not ‘these two things are semantically similar’ - which is all a vector embedding gives you - but ‘A-funded-B-on-date-X-in-the-context-of-decision-Y, and that-decision-was-made-by-person-Z-who-held-these-views-at-this-time’. The relationship type, its provenance, its temporal context. Semantic similarity collapses all of that into proximity in vector space. The graph preserves it as structure.

Chain of custody. The directed path from original signal to current artifact, with every transformation recorded as a named edge (a recorded connection). This signal came from this source, was interpreted by this person, was encoded in this system, and was modified at this crossing. An LLM has no memory of any of this once it’s processed the context. A graph holds it permanently.

Temporal history. What was true at time T; what changed at time T+1; what was lost in the transition. An LLM lives in a perpetual present - its context window is now, and what-came-before only exists if someone puts it in the window. A graph holds the history as structure, not as context.

Contradiction structure. Two signals pointing in different directions about the same thing, held explicitly in tension rather than collapsed into a single averaged representation. A financial frame says ‘the deal looks clean’ while an operational frame says ‘something is wrong’. In vector space these collapse into a confused proximity; in a graph they’re two nodes with different relationship types to the same entity, and the tension between them is itself a named structure.

The missing signal. What was captured and then wasn’t. What arrived and got lost in normalisation. The graph can hold an absence as a named thing - a node with no edges where edges should be, a relationship type present in one period and absent in another. An LLM cannot tell you about what isn’t there.

Cross-session continuity. A pattern confirmed in 2019 is explicitly connected to the contradicting signal emerging in 2026. An LLM session just has no access to this unless someone deliberately loads it into the context window. A graph makes it structurally accessible, because the 2019 confirmation and the 2026 signal are both nodes in the same persistent structure.
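To make that list concrete, here’s a minimal sketch of memory-as-structure. Every name in it - the Edge type, the relationship labels, the node ids - is mine, invented for illustration; it’s not Stratum’s schema or any vendor’s API. The point is just that typed edges can carry the provenance and temporal context that proximity in vector space throws away:

```python
from dataclasses import dataclass, field
from datetime import date

# Illustrative only: a hand-rolled typed edge, not any vendor's schema.
@dataclass(frozen=True)
class Edge:
    rel_type: str          # 'FUNDED', 'INTERPRETED_BY', 'CONTRADICTS', ...
    source: str            # node id
    target: str            # node id
    recorded_on: date      # temporal context: when this became true/known
    provenance: str        # where the signal came from
    context: dict = field(default_factory=dict)

edges = [
    # A typed relationship with provenance and temporal context,
    # not just 'these two things are similar'.
    Edge("FUNDED", "org:A", "org:B", date(2019, 3, 1),
         provenance="board-minutes-2019-03", context={"decision": "Y"}),
    # Chain of custody: original signal -> interpreter.
    Edge("INTERPRETED_BY", "signal:123", "person:Z", date(2019, 3, 2),
         provenance="research-notes"),
    # A contradiction held explicitly in tension, not averaged away.
    Edge("CONTRADICTS", "frame:financial-clean", "frame:ops-warning",
         date(2026, 1, 10), provenance="qbr-2026-q1",
         context={"about": "deal:42"}),
]

def history(node_id: str):
    """Everything ever asserted about a node, in temporal order -
    the cross-session continuity a context window can't hold."""
    touching = [e for e in edges if node_id in (e.source, e.target)]
    return sorted(touching, key=lambda e: e.recorded_on)

for e in history("org:B"):
    print(e.recorded_on, e.rel_type, e.source, "->", e.target, f"[{e.provenance}]")
```

Note that the contradiction is an edge of its own: the tension is a first-class thing you can query, not something smeared into an embedding.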

Wrong job. Right job

The hairball was a graph doing the wrong job. When folks (me included) tried to make the graph the intelligence layer - to force it to produce insight through its own traversal and reasoning - we needed to connect everything to everything, because nobody knows in advance which connections will be meaningful. Voilà. Hairball. Comprehensiveness becomes the design requirement, and comprehensive connection at scale is visually and computationally intractable.

[Image: the ‘I have no idea what I’m doing’ dog meme - a large dog leaning over a car engine.]

Change the design requirement from comprehensiveness to fidelity and you get a completely different graph - leaner, typed, provenance-rich. You connect what-you-actually-know-is-connected, with typed relationships, with provenance, with temporal context. The graph doesn’t need to be traversed by humans. It doesn’t need to produce visualisations. It just needs to be queryable by the LLM as a source of structured, grounded context. The LLM does the intelligence on top, reasoning over something with actual provenance rather than over whatever-arrived-in-the-session.
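What ‘queryable by the LLM’ could mean in practice is almost boring - and that’s the point. Building on the hypothetical Edge sketch above, the retrieval step just renders the graph’s typed, provenanced facts about an entity as stable text for the prompt:

```python
def grounded_context(node_id: str) -> str:
    """Render the graph's typed, provenanced facts about a node as
    plain-text context for an LLM prompt. Deterministic: the same
    node yields the same lines every time."""
    lines = [
        f"{e.recorded_on.isoformat()}: {e.source} -{e.rel_type}-> {e.target}"
        f" (source: {e.provenance})"
        for e in history(node_id)
    ]
    return "\n".join(lines)

# The prompt grounds the model in stable structure rather than in
# whatever happened to arrive in the session.
prompt = (
    "Answer using ONLY the facts below; cite the source of each claim.\n\n"
    + grounded_context("org:B")
    + "\n\nQuestion: who funded org:B, and on whose decision?"
)
```

The same node yields the same context every time, which is what gives the model something it can actually be consistent about.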

A Seven-Year Itch

The LLM won the intelligence argument. Completely. It is the most powerful processing layer ever built for enterprise knowledge. But it doesn’t remember. The graph is the memory. It just spent a decade being asked to be the intelligence too. That failure obscured what it was actually good for. An LLM needs a memory layer to be trustworthy at the specific things that matter for decisions - provenance, chain of custody, contradiction structure, temporal history, the missing signal. Better intelligence makes reliable memory more important, because the intelligence is only as trustworthy as what it’s grounded in.

An LLM doesn’t make the graph redundant. It makes it more necessary.

[Image: the Homer-retreating-into-the-bushes meme, reversed - Homer re-emerging from the same bush.]

So back to Cuban’s question. You can’t get consistent answers from a model that has nothing stable to read from. A domain-specific LLM answer still treats the model as primary. A graph-as-memory-layer answer treats the original signal as primary - and gives the model something it can actually be consistent about.

I built Stratum. If the LLM needs a memory layer to be trustworthy, the question is: what’s worth anchoring it to? My answer, after a long time looking at where signal gets lost, is the customer. Not because it’s the only domain - but because the gap between what a customer expresses and what survives to the decision meant to serve them is the widest, least visible, and most expensive gap in most organisations.

The customer’s expression - the verbatim - is the immutable anchor everything else couples to. Customer-researcher interpretation, product response, and commercial viability all coexist on the same graph in typed registers - multiple disciplines reading the same evidence without any one view replacing the others. Production teams stay tethered to the specific customer whose words shaped the work - not because she alone is the point (the firm builds for customers like her, across millions of interactions), but because decisions at scale are only as trustworthy as the specific, honest signal underneath them. Lose the verbatim, lose the ground truth.
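A sketch of what typed registers on the same evidence could look like - again, the names are mine for illustration, not Stratum’s actual design: one frozen verbatim node, with each discipline’s reading attached as its own typed record, so no interpretation can overwrite the expression or any other view.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical names - an illustration of typed registers, not Stratum's schema.
@dataclass(frozen=True)          # frozen: the verbatim can never be edited
class Verbatim:
    id: str
    text: str
    captured_on: date

@dataclass(frozen=True)
class Reading:
    verbatim_id: str
    register: str                # 'RESEARCH', 'PRODUCT', 'COMMERCIAL', ...
    claim: str
    author: str
    recorded_on: date

v = Verbatim("verbatim:7", "I gave up before the second screen.", date(2026, 2, 3))

readings = [
    # Three disciplines read the same evidence; none replaces another,
    # and none touches the verbatim itself.
    Reading(v.id, "RESEARCH", "Drop-off is effort, not intent.", "person:Z", date(2026, 2, 4)),
    Reading(v.id, "PRODUCT", "Collapse steps one and two.", "person:Q", date(2026, 2, 6)),
    Reading(v.id, "COMMERCIAL", "Fix is viable this quarter.", "person:R", date(2026, 2, 9)),
]

for r in readings:
    print(f'[{r.register}] {r.claim}  (on {v.id}: "{v.text}")')
```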

The architecture doesn’t replace a researcher’s judgment. It holds it - across every team, every crossing, every decision downstream. The architecture is the guarantee, not the model.

[Image: the ‘Why not both?’ meme - a thoughtful child in front of potted plants.]

Follow along: mattburgess.micro.blog/subscribe… · mattburgess.micro.blog/feed.xml · micro.blog/mattburge… · Mastodon @mattburgess@micro.blog