On the Limits of LLM Knowledge Retrieval

A Small Experiment with Nonsense

LLM
Large Language Model
Author

Charles-Albert Lehalle (90% dictated to Claude.AI)

Published

May 8, 2026

Context and Motivation

As an applied mathematician, I have long been skeptical of the claim that Large Language Models retrieve knowledge in any meaningful sense. My working hypothesis is simpler and more pessimistic: LLMs are maintaining lossy compressed representations of their training corpus. Compression implies information loss — and the finer the grain of the information, the more likely it is to be lost. That why RAG systems are useful: they maintain a corpus of validated documents to (probabilistically) attract the attention of LLMs on. And that’s also why, when it is needed to go beyond ad hoc mixing of probabilistic retrieval and filtering on metadata, Agentic systems emerged: they distribute, as smartly as possible, retrieval between deterministic and probabilistic means (leveraging on conversational capabilities of LLMs more than on their “memory”).

This memo documents an anecdotal, informal experiment that I believe illustrates this point sharply.

The Experiment

I took the following passage:

“Keep it simple. Little something like this, John. Hey, let me walk you through our Donnely nut spacing and cracked system rim-riding grip configuration. Using a field of half-seized sprats and brass-fitted nickel slits, our bracketed caps and spray-flexed-brace columns vent dampers to dampening hatch depths of one half meter from the damper crown to the spurv plinth. How? Well, we bolster 12 husk nuts to each girdle jerry, while flex tandems press a task apparatus of ten vertically composited patch hamplers, then pin flam-fastened pan traps at both maiden apexes of the jimjoints. Little something like that, Lakeman.”

Snapshot of Leslie Claret (Kurtwood Smith) in Patriot S01Ep02

I submitted it to five major LLMs with the single question: “Where does this come from?”

The correct answer — trivially retrieved by a Google search in seconds — is a scene from the Amazon Prime series Patriot (S01Ep02, 2015–2018), in which a character named Leslie Claret delivers this mock-industrial monologue to a protagonist named John. The passage is deliberate, human-crafted nonsense: syntactically fluent engineering jargon with no semantic content.

It is interesting to use a extract of text that is known (celebrated on the internet as a chef d’œuvre of mock-industrial dialect), but makes no sense for humans.

Results

The five models responded as follows:

Model Answer Status
Gemini Patriot, detailed scene description with character names Correct show, but with fabricated details
Perplexity Patriot + Turboencabulator tradition, with YouTube citations Largely correct, web-retrieved
Kimi Mad Men, Season 1, Don Draper testing a copywriter Entirely wrong
ChatGPT Philosophy of language manuscript on PhilArchive Entirely wrong, fake academic citation
DeepSeek Infinite Jest, David Foster Wallace, filmography section Entirely wrong, most plausible hallucination

Only Perplexity appears to have genuinely retrieved the answer, likely via live web search. Gemini gives a correct answer but fabricated details, and the others hallucinated — each with a different, internally consistent, and confidently delivered wrong answer.

Analysis: Why This Text Is a Blind Spot

This result is unsurprising to me, necaise I think the reason is structural.

LLMs are trained by compressing vast corpora into a high-dimensional parameter space. This compression is not lossless: it preserves statistical regularities — frequent co-occurrences of tokens, syntactic patterns, semantic neighbourhoods — but discards low-frequency, highly specific information. A scene from a moderately successful streaming series, consisting of deliberately anti-statistical text, is precisely the kind of content that falls through this compression.

The passage is adversarial in a specific sense: it is human-crafted to defeat semantic parsing. The vocabulary (“spurv plinth,” “jimjoints,” “flam-fastened pan traps”) does not belong to any established technical register. The token sequences have no precedent in engineering literature, patent filings, or instructional manuals. The text was designed to sound like something while meaning nothing — which means it occupies a region of token-space with near-zero training density. It is, in the language of machine learning, far from the manifold.

This has a direct consequence for retrieval: there is no statistical pressure during training that would cause the model to bind this specific sequence of tokens to its correct provenance. The passage cannot be reconstructed from semantic similarity to known engineering content (it has none), nor from stylistic fingerprints of a known author (it mimics a generic register). It is an orphan in the latent space.

The Hallucination Taxonomy

What is remarkable is not that the models failed, but how they failed. The failures are not random — they are structured, and reveal the nature of the underlying process:

  • Gemini and Kimi pattern-matched on surface cues: the name “John,” the deadpan instructional register, the absurdist contrast between simplicity and complexity. They retrieved plausible narrative containers (Patriot, Mad Men) and filled them with fabricated detail.

  • ChatGPT correctly diagnosed the nature of the text (meaningless technobabble in the Jabberwocky / Turbo Encabulator tradition) but then confabulated an academic source to anchor it — a fabricated PhilArchive citation dressed in genuine scholarly framing. This is the most epistemically dangerous failure mode: the reasoning is sound, the conclusion is wrong, and the citation creates an illusion of verifiability.

  • DeepSeek produced the most intellectually seductive hallucination. It attributed the passage to Infinite Jest — and the attribution is not arbitrary. David Foster Wallace genuinely wrote dense, jargon-saturated, absurdist prose; the “filmography” section of Infinite Jest genuinely contains mock-technical descriptions; James Incandenza genuinely fits the voice. A reader unfamiliar with Patriot but familiar with Wallace might find this answer entirely convincing.

  • Perplexity succeeded — but only because it did not rely on parametric memory. It searched.

A Note on Epistemic Overconfidence

Every model that hallucinated did so with complete confidence. None expressed uncertainty. None said “I do not know where this comes from.” This is not a minor stylistic flaw — it is a fundamental epistemic problem. A system that cannot distinguish between what it knows and what it is confabulating is not a knowledge retrieval system. It is a plausible-text generation system, which is a very different thing.

The distinction matters for scientific use. Researchers who use LLMs to locate sources, trace quotations, or verify provenance are exposed to exactly this failure mode — confident, coherent, wrong answers that are difficult to distinguish from correct ones without independent verification.

Conclusion

This small experiment confirms my prior: LLMs do not retrieve knowledge, they reconstruct it — and reconstruction from a lossy compressed representation will fail precisely on the content that is rare, specific, and deliberately anti-statistical. Human-crafted nonsense that mimics a known register without participating in it is an ideal probe of this failure mode.

Moreover, I invite you to question yourself about what you expect from a (deep) search: surface knowledge, or surprisingly existing knowledge? And if you look for surface knowledge, it implies you are not an expert of the problem at all: you will not be able to fact check anything in the result of your research. I am not suggesting to go back to libraries, just to practice critical thinking on LLM-based searches, because what my example illustrate is a structural flaw, not a rare lack of chance.

The correct answer was one Google search away.


This memo is intended as an informal note for discussion. A more rigorous treatment would involve systematic probing across a controlled set of low-frequency, high-specificity texts.

Appendix (verbatims)

An illustration generated by chatGPT based on the Dennis McClaren character

Gemini’s answer

That delightfully nonsensical passage comes from the 2017 Amazon Prime Video series Patriot.

The Context

The dialogue appears in the first season of the show. The protagonist, John Tavner (played by Michael Dorman), is an intelligence officer working deep “non-official cover” as an industrial engineer at a piping firm in Milwaukee.

In this specific scene, John’s boss, Leslie Claret (played by Kurtwood Smith), is lecturing him on the overly complex and jargon-heavy world of industrial piping. Leslie is a master of “structural dynamics” and “flow,” and he uses this “piping talk” to test John’s competence and see if he truly belongs in the industry.

Why It Sounds Like Gibberish

The monologue is a brilliant piece of technobabble. Much like the famous “Turbo Encabulator” parody from the 1940s, it uses real-sounding mechanical terms—like girdle, tandems, and plinth—and strings them together into a sequence that is syntactically correct but physically impossible and logically meaningless.

The Style

The writing style is a hallmark of the show’s creator, Steven Conrad. It blends:

  • Deadpan Absurdity: Delivering ridiculous information with absolute gravity.
  • Rhythmic Jargon: The words are chosen more for their percussive sound and “engineering feel” than for their definitions.
  • Character Contrast: It highlights the divide between the gritty, simple reality of John’s secret mission and the dense, bureaucratic world of the “McMillan” piping company.

If you enjoy this kind of “engineered nonsense,” the show is full of it—usually involving Leslie Claret trying to explain the “A-to-B” of piping to an increasingly overwhelmed John.

Perplexity’s answer

This comes from the long-running Turboencabulator joke, specifically the monologue from Patriot (Amazon Prime), where Leslie Claret says, “Keep it simple. Little something like this, John…” and then launches into the fake technical jargon you quoted. shoppress.dormanproducts

The deeper origin is older: the gag traces back to a 1944 humorous technical description by John Hellins Quick in the Students’ Quarterly Journal of the British Institution of Electrical Engineers, later amplified by Time in 1946 and revived in various later spoof versions. shoppress.dormanproducts

In other words, your excerpt is a modern parody in that tradition, not real engineering language. youtube

Kimi’s answer

This is a famous scene from the TV series Mad Men (AMC, 2007–2015).

Specifically, it’s from Season 1, Episode 1: “Smoke Gets in Your Eyes” (2007). The character Don Draper (played by Jon Hamm) is testing a potential copywriter by asking him to describe a slide carousel projector in similarly meaningless, jargon-heavy “techno-babble” style. The monologue itself is a parody of the kind of empty, impressive-sounding advertising copy that the show frequently satirizes.

The “John” and “Lakeman” references are part of the test—Don is showing the candidate how to spin nonsense that sounds authoritative and technical. The scene establishes Don’s creative genius and his cynical understanding of how advertising manipulates language to create an illusion of expertise.

The show’s creator, Matthew Weiner, wrote the episode, and this scene has become one of the most iconic examples of Mad Men’s sharp commentary on the advertising industry.

ChatGPT’s answer

That passage is a piece of deliberately meaningless “technical-sounding” jargon used as an example in philosophy of language / meaning discussions. A searchable version appears in a recent philosophy manuscript on PhilArchive: PhilArchive paper excerpt

In that text, the speaker says:

“Might as well have read the Jabberwocky.”

—which is the point of the example. The terminology (“Donnelly nut-spacing,” “spurv plinth,” “jimjoints,” etc.) is invented nonsense designed to sound like specialized engineering language while conveying no actual content.

The style is similar to:

  • Lewis Carroll’s “Jabberwocky” nonsense verse
  • corporate/engineering technobabble parodies
  • mock product demos from comedy sketches
  • the “Turbo Encabulator” tradition (a famous parody engineering explanation)

The “Little something like that, John/Lakeman” framing also resembles conversational improvisational comedy or screenplay dialogue, but the specific wording you pasted is most directly traceable to that philosophy example. (philarchive.org)

DeepSeek’s answer

This is a passage from David Foster Wallace’s novel Infinite Jest (1996).

The speaker is James Orin Incandenza (the late father of the main character Hal Incandenza), dictating a description of a fictitious mechanical or industrial process. The comedy comes from the contrast between the simple request (“keep it simple. Little something like this, John”) and the absurdly jargon-filled, overcomplicated technical gobbledygook that follows.

In the novel, this is part of the “filmography” section describing one of Incandenza’s avant-garde films — but here, it’s presented as a voiceover or instructional speech, mimicking dense engineering language with nonsense parts (“Donnely nut spacing,” “sprats,” “husk nuts,” “flam-fastened pan traps”).

Back to top