And if you overload that small window, the big reasoning collapses. Yet the model will still output big, because it has no mechanism for saying “I lost the plot.” It just keeps talking.
So the real dynamic is:
- Listens: small
- Reasons: big — if not overloaded
- Outputs: huge — even when wrong
That’s the starting point for understanding LLMs: they’re capable of expansive thought built on a narrow channel of input, and everything depends on respecting that channel.
How small?
Think Apple ][ small.
Not because the hardware is similar — it isn’t. But because if you want an LLM to behave, you must approach the prompt with the same discipline people used when they had:
- tiny working memory,
- strict ordering requirements,
- fragile execution,
- linear traversal,
- and visible failure when they exceeded limits.
You don’t need nostalgia. You need the mindset.
1. The Illusion of Abundance
LLMs look enormous because their output is enormous. They generate long reports, massive analyses, well-formed articles, and expansive arguments. It feels like you’re talking to something with a vast internal workspace.
But the visible output is not the working memory. When you overload the front of the prompt with:
- too many instructions
- too many constraints
- too many rules
- too much ontology
- too much cleverness
- too much structure
…you get:
- drift
- forgetting
- smoothing
- contradictions
- hallucinations
- overconfident nonsense in a calm voice
- and answers that sound compliant but aren’t
The model didn’t “refuse.” It simply ran out of reliable cognitive bandwidth. It listens sharply to a little, and poorly to a lot. Too much prompt? You get fluent failure.
2. Why the Apple ][ Is the Right Mental Model
The Apple ][ had 32K of RAM. If you exceeded memory, the program simply did not fit. The failure was visible.
And if you wanted a subroutine to be fast and reachable from anywhere in your program, you didn’t bury it on line 9400. You put it near line 1. That’s because programs were stored as a singly-linked list of lines, and the interpreter walked that list one line at a time from the front (head), with one efficiency trick: if the target line number was greater than the current one, it scanned forward from where it was instead of restarting at the head. (For the Apple ][ geeks out there, yes, I know it only checked the high byte of the line number.)
Early placement meant:
- shorter traversal,
- faster access,
- fewer pointer hops.
Placement mattered. Program size mattered. Structure mattered. And when things got too big, the collapse was loud and obvious: your program didn’t fit, so it simply didn’t run.
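If you want to see that cost concretely, here’s a minimal sketch of the scan in Python rather than Applesoft, simplified to compare whole line numbers instead of just the high byte. The names (`Line`, `build_program`, `goto_cost`) are mine, not anything from the ROM.

```python
# Applesoft-style line storage: a singly-linked list of program lines,
# scanned linearly on every GOTO/GOSUB.
# Simplified: compares whole line numbers, not just the high byte.

class Line:
    def __init__(self, number):
        self.number = number
        self.next = None  # pointer to the next stored program line

def build_program(line_numbers):
    """Build the singly-linked list of program lines, in order."""
    head = None
    prev = None
    for n in line_numbers:
        node = Line(n)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head

def find_line(head, number):
    node = head
    while node.number != number:
        node = node.next
    return node

def goto_cost(head, current, target):
    """Pointer hops to reach `target`, mimicking the interpreter:
    scan forward from the current line if the target is ahead of it,
    otherwise restart the scan from the head of the list."""
    node = current if target > current.number else head
    hops = 0
    while node.number != target:
        node = node.next
        hops += 1
    return hops

program = build_program(range(10, 10001, 10))  # 1000 lines: 10, 20, ..., 10000

# Subroutine placed at the front (line 10), called from deep in the program:
# the backward jump restarts at the head and finds it immediately.
print(goto_cost(program, find_line(program, 9000), 10))    # 0 hops

# Subroutine buried at line 9400, called from early in the program:
# the forward walk crosses most of the program.
print(goto_cost(program, find_line(program, 100), 9400))   # 930 hops
```

Front placement wins because every restart-from-head finds the subroutine after a hop or two, no matter where the call comes from.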
3. The Same Constraint Shape Exists in LLMs
LLMs are absurdly more capable than an Apple ][, yet the shape of their constraints is strikingly similar:
- linear processing
- sensitive ordering
- tiny effective workspace
- high cost for bloat
- fragile behavior under overload
- no graceful failure modes
- hidden collapse
- discipline required to keep control
So while the Apple ][ had limited RAM, LLMs have limited attention bandwidth and effective reasoning span. On paper, they can take huge context windows. In practice, they only process a small, high-quality slice of that context at once. Everything else is decoration, or worse, derailment.
That’s why we think Apple ][ small. Not for nostalgia — for accuracy.
4. What Actually Breaks When You Ignore These Limits
When you assume the model can “handle more,” you push it out of its stable operating range. Symptoms:
- Conflicting instructions silently override each other. There’s no error. The model just resolves the conflict its own way.
- Salience shatters. Too many high-level demands compete. The model latches onto whatever is easiest to satisfy.
- Late constraints vanish. They’re still “in the text,” but the model’s early internal state already hardened.
- Overly clever wording burns precious bandwidth. Every metaphor, flourish, or overloaded term requires extra interpretive work.
- Structured formats hijack mode. YAML, JSON, flowcharts: the model treats them as imperatives, not neutral formatting.
- Multi-step systems flatten reasoning. Each pass destroys latent structure and replaces it with surface text.
LLMs fail quietly. They generate plausible answers even when the reasoning behind them has evaporated. That’s why discipline matters.
5. Position Dominance: The Top of the Prompt Is the Real Kernel
Here is the part almost everyone gets wrong: the beginning of the prompt determines everything.
LLMs ingest prompts linearly. Early tokens:
- establish mode
- anchor context
- define the priority structure
- set the interpretation frame
- shape later salience
- decide which constraints dominate
Later tokens:
- must fight upstream
- are less authoritative
- often get ignored under load
Humans write:
- introduction
- body
- conclusion
Models don’t care about either introductions or conclusions. They care about what arrived early enough to shape the internal state.
If something must be obeyed, it belongs at the top. This is why moving one instruction from line 1 to line 70 can dramatically change the model’s behavior — not because the words changed, but because the relative position changed.
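If it helps to see the ordering as code, here’s a minimal sketch of position-first prompt assembly, assuming a plain-text prompt; the section labels and rules are illustrative, not a standard.

```python
# Minimal sketch of position-first prompt assembly.
# The section names and rules are illustrative; the only point is that
# governing rules are concatenated before anything else.

GOVERNANCE = [
    "Answer only from the provided source text.",
    "If the source does not contain the answer, say so explicitly.",
    "Keep the answer under 150 words.",
]

def build_prompt(source_text: str, question: str) -> str:
    """Assemble a prompt so the rules that must be obeyed arrive first,
    reference material second, and the actual request last."""
    sections = [
        "RULES (these take priority over everything below):",
        *[f"- {rule}" for rule in GOVERNANCE],
        "",
        "SOURCE:",
        source_text.strip(),
        "",
        "QUESTION:",
        question.strip(),
    ]
    return "\n".join(sections)

print(build_prompt("The Apple ][ shipped in 1977.", "When did the Apple ][ ship?"))
```

Moving the RULES block to the bottom of that string is the prompt equivalent of burying the subroutine at line 9400: the words are identical, but the position is not.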
6. The Apple ][ Parallel: Subroutines Up Front, Prompts Up Front
On the Apple ][:
- GOSUB and GOTO both required scanning the linked list of program lines
- backward jumps re-started from the beginning
- traversal distance mattered
- important subroutines were placed at the front (with a GOTO on line 0 to jump over them into the main program)
- shorter line numbers were literally cheaper in bytes and jumps
- placement bought you both speed and a smaller program
Now compare that to LLM prompts:
- linear ingestion
- early text determines the “kernel”
- late constraints compete with already-established state
- placement determines strength
- shorter, simpler wording has stronger influence
- the first part of the prompt controls everything that follows
Here’s the structural parallel:
| Apple ][ BASIC | LLM Prompts |
|---|---|
| Putting subroutines up front is optimal | Putting governance up front is mandatory |
| Bad ordering = slower or bigger program | Bad ordering = degraded reasoning |
| You see exceeding limits clearly: the program simply doesn’t fit or run | You see degradation only as plausible text: a hidden epistemic failure |
| Interpreter still obeys rules | Model silently drops or reshapes rules |
| Output correctness unaffected by position | Output correctness depends on position |
The lesson: Apple ][ rewarded good ordering. LLMs punish bad ordering.
7. We’re Early in the Arc — And That’s Okay
People think massive context windows will fix this. They won’t. That’s the same thinking behind:
“No one will ever need more than 640K of RAM.”
The 640K line is funny (now) not because it was wrong about RAM, but because it was wrong about which limitation mattered.
Every era of computing had constraints:
- Apple ][: memory size
- 8088: segmented addressing
- 386: protected mode and cache behavior
- Pentium: pipelining and branch penalties
- Multicore: contention and concurrency
- GPU era: memory bandwidth and parallelism quirks
Constraints never vanish. They shift.
Today’s AI systems are still early in their arc:
- effective working windows will grow
- internal representations will get more stable
- multi-hop reasoning will improve
- conflict resolution will get stronger
- salience management will mature
The arc bends upward. But arcs take time. And until the “listen small” constraint relaxes: think Apple ][ small. This is not pessimism. It’s operational realism.
8. Vocabulary Matters More Than People Realize
Even with perfect sizing and ordering, the words you choose affect the model’s behavior. Because inside the model:
- words activate neighborhoods
- some neighborhoods are hot and drift-prone
- some trigger narrative
- some trigger planning
- some inflate confidence
- some pull in moral reasoning
- some collapse under ambiguity
You aren’t writing definitions. You’re pulling levers in a semantic machine.
That’s why clever metaphors and elegant ontologies are often bad:
- they trigger too much
- they activate inconsistent meanings
- they require more interpretation
- they eat the small working window
- they destabilize the model under load
The best prompts use terms that are:
- small
- boring
- stable
- unambiguous
- non-metaphorical
That’s what buys the cheapest, most reliable internal behavior.
9. We Need an AI Ontology Dictionary
Not another human glossary. We need a dictionary of what words do inside the model, not what they mean to humans.
For each word:
- what mode does it activate?
- what failures does it cause?
- how stable is it across domains?
- does it inflate confidence?
- does it shift salience?
- does it drag in unwanted baggage?
- how much interpretive cost does it impose?
- what does it suppress?
Until we have this, prompt design is guesswork. But you can start discovering this yourself:
- probe the word in different contexts
- watch for drift
- try synonyms
- remove it and see if behavior improves
- stress-test it inside constraints
- see which meanings get activated or suppressed
Patterns emerge: some words are rock-solid, some are unstable, some are dangerous, some consume too much of the tiny window. Choose accordingly.
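Here’s a minimal sketch of what that probing loop can look like, assuming you wrap your own model endpoint; `call_model`, the contexts, and the candidate words are all placeholders, not a real API.

```python
# Minimal vocabulary probe: swap one word across otherwise identical prompts
# and compare what comes back. `call_model` is a hypothetical stand-in for
# whatever completion API you actually use.

CONTEXTS = [
    "Summarize the incident report below. Be {word}.",
    "You are reviewing a contract. Be {word} about risks.",
    "Explain this code change to a junior engineer. Be {word}.",
]

CANDIDATES = ["precise", "rigorous", "thorough", "careful"]

def call_model(prompt: str) -> str:
    raise NotImplementedError("wire this to your own model endpoint")

def probe(word: str) -> list[str]:
    """Run one candidate word through every context, changing nothing else."""
    return [call_model(template.format(word=word)) for template in CONTEXTS]

def run_probes() -> dict[str, list[str]]:
    # Collect outputs per word; judging drift, length, confidence, and
    # unwanted baggage is still a human job at this stage.
    return {word: probe(word) for word in CANDIDATES}
```

The harness doesn’t score anything for you; it just makes the comparison cheap enough that you’ll actually run it.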
10. The One Principle That Actually Matters
Strip away everything else: LLMs are tiny engines disguised as big minds. If you design for the illusion, you get plausible answers that conceal structural failure. If you design for the constraint, you get systems that behave reliably within their limits.
That means:
- Put governing logic at the top.
- Keep the prompt tight and simple.
- Remove ornamentation.
- Pick stable vocabulary.
- Avoid cleverness.
- Respect the tiny working window.
- Treat every token as a cost.
- Think like a constraints engineer.
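As a closing sketch, here’s one crude way to treat every token as a cost: a budget gate over prompt sections. The four-characters-per-token estimate is a rough heuristic, not a real tokenizer, and the ceiling and section names are illustrative.

```python
# Crude budget gate over prompt sections. Swap the heuristic for your
# model's tokenizer if you want real counts; the point is visibility.

BUDGET_TOKENS = 1500  # illustrative ceiling for the governed part of the prompt

def rough_token_count(text: str) -> int:
    return max(1, len(text) // 4)  # ~4 characters per token, very rough

def check_budget(sections: dict[str, str]) -> None:
    total = 0
    for name, text in sections.items():
        cost = rough_token_count(text)
        total += cost
        print(f"{name:12s} ~{cost:5d} tokens")
    print(f"{'total':12s} ~{total:5d} / {BUDGET_TOKENS}")
    if total > BUDGET_TOKENS:
        print("over budget: cut ornament before adding anything new")

check_budget({
    "governance": "Answer only from the source. Say so if the answer is missing.",
    "source": "<the source text goes here>",
    "question": "When did the Apple ][ ship?",
})
```

The number doesn’t need to be exact; it needs to make bloat visible before the model hides it for you.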
Because in the end: the model listens small, reasons big (if not overloaded), and outputs huge — whether it’s right or wrong. The only way to get consistent behavior is to design for the small part.
Think Apple ][ small. The future will grow larger. But right now, this is how you build things that work.