Key Concepts
Core Thesis
Production AI systems fail not because LLMs are unreliable, but because system architectures ignore the physical constraints of how transformers actually work. Treat LLM engineering like aerospace engineering: respect the physics, or crash.
The Three Fundamental Laws
1. Finite Attention — Information recall follows a U-curve. First 10% of context: ~90% recall. Middle 40-60%: ~50% recall. Final 10%: ~85% recall. Critical data buried in long contexts is effectively invisible.
2. Stochastic Accumulation — Errors compound exponentially across steps. A 2% per-step error rate yields only 36% success over 50 steps. Formula: P_success = (1-p)^N. Multi-step chains require checkpointing and retry mechanisms to be reliable.
3. Entropic Expansion — Context grows linearly with time; capacity is fixed. Without compression, every long-running workflow eventually overflows. Active summarization and priority-based eviction are mandatory, not optional.
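Law 2's compounding formula can be checked with a short sketch (the helper name is illustrative):

```python
def p_success(per_step_error: float, steps: int) -> float:
    """P_success = (1 - p)^N for a chain of N independent steps."""
    return (1.0 - per_step_error) ** steps

# A 2% per-step error rate over 50 steps:
print(round(p_success(0.02, 50), 2))  # ~0.36
```

The curve falls off fast: at 100 steps the same 2% error rate leaves roughly 13% end-to-end success, which is why long chains need checkpointing and retries rather than better prompts.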
Key Terms
| Term | Definition |
|---|---|
| Token | Atomic text unit (sub-word, not word). JSON encoding typically consumes ~2x the tokens of equivalent plain text. |
| State Amnesia | Loss of accumulated knowledge when a process terminates. LLMs are stateless—persistence must be explicit. |
| Poisoned Well | Context contaminated with error traces biases the model toward failure patterns. |
| Cognitive Offloading | Delegating deterministic tasks to code so the LLM focuses only on high-level reasoning. |
| Priority Stack | Context architecture placing critical data at attention boundaries (first/last 10%), not the middle. |
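A Priority Stack can be sketched as a context assembler that pins critical material to the high-recall attention boundaries and pushes background material into the low-recall middle. All names below are illustrative, not from any specific library:

```python
def priority_stack(system: str, critical: list[str],
                   background: list[str], task: str) -> str:
    """Assemble context so critical data sits at the attention boundaries."""
    head = [system] + critical  # first ~10%: ~90% recall zone
    middle = background         # middle: ~50% recall zone
    tail = [task]               # final ~10%: ~85% recall zone
    return "\n\n".join(head + middle + tail)
```

A real implementation would also enforce a token budget, evicting from `middle` first, since that is where recall is already weakest.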
What Managers Need to Know
- "Better prompts" won't fix production failures. Architecture determines reliability.
- Checkpoint every turn. Pod crashes should be invisible to workflow completion.
- Measure token utilization and hallucination rates. These are leading indicators of system health.
- Cognitive offloading reduces hallucinations by 60-80%. Let code handle parameters; let LLMs handle decisions.
- Long contexts aren't free. Attention is zero-sum: more tokens mean less precision per token.
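"Checkpoint every turn" can be as simple as atomically persisting workflow state after each step, so a crashed pod resumes from the last completed turn instead of restarting the chain. The file format and function names here are an illustrative sketch:

```python
import json
import os

def save_checkpoint(path: str, state: dict) -> None:
    """Write state via a temp file + atomic rename, so a crash mid-write
    never leaves a half-written checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic on POSIX and Windows

def load_checkpoint(path: str) -> dict:
    """Resume from the last saved state, or start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "results": []}
```

In a workflow loop, call `save_checkpoint` after every successful turn; recovery is then just `load_checkpoint` at startup.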
Bottom line: LLMs are probabilistic text generators. Production reliability comes from deterministic scaffolding that respects their physics.
Key Concepts — FikAi notebook for The Physics of AI Engineering.