
Implementation Takeaways

Context Management

  • Token counting is mandatory, not optional. Use tiktoken (or equivalent) to measure context size before every LLM call. Never estimate based on word count—JSON alone can double your token usage versus plain text.
  • Build a priority stack, not a log buffer. Structure context with immutable rules at positions 0–10%, compressed history in the middle, and current task at positions 85–100%. The U-shaped attention curve means middle content is effectively invisible.
  • Set a hard 80% capacity trigger. When context hits 80% of max tokens, auto-summarize the oldest 20% of logs. Don't wait for overflow—by then you're already losing critical information.
  • Never include raw stack traces. Extract root cause only. A 2,000-token error dump poisons the context and biases the model toward failure patterns.
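The bullets above can be sketched as a minimal priority-stack context builder with an 80% compression trigger. This is an illustrative sketch, not a prescribed implementation: count_tokens here is a whitespace stand-in for runnability only, and production code should use tiktoken exactly as the first bullet says; summarize stands in for an LLM summarization call; MAX_TOKENS and TRIGGER are assumed values.

```python
def count_tokens(text: str) -> int:
    # Placeholder for the sketch only. Real code must use tiktoken
    # (e.g. tiktoken.encoding_for_model(...).encode), never a word count.
    return len(text.split())

MAX_TOKENS = 1000
TRIGGER = 0.8  # hard capacity trigger: compress at 80% of max tokens

def summarize(entries: list[str]) -> str:
    # Stand-in for an LLM summarization call over the oldest log entries.
    return f"[summary of {len(entries)} older entries]"

def build_context(rules: str, history: list[str], task: str) -> str:
    # Priority stack: immutable rules first, compressed history in the
    # middle (the low-attention zone), current task last (high-attention tail).
    while (len(history) > 1
           and count_tokens("\n".join([rules, *history, task]))
               > TRIGGER * MAX_TOKENS):
        cut = max(1, len(history) // 5)  # auto-summarize the oldest 20%
        history = [summarize(history[:cut])] + history[cut:]
    return "\n".join([rules, *history, task])
```

Because compression runs at the 80% trigger rather than at overflow, the rules at the top and the task at the tail are never the parts that get squeezed.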

State Persistence

  • Checkpoint after every turn, no exceptions. Write to both fast storage (Redis) and durable storage (blob). Pod crashes should be transparent—the agent resumes exactly where it left off.
  • Use context inheritance for multi-step workflows. When Step 4 depends on Steps 2 and 3, explicitly merge their outputs and discovered variables into Step 4's initial state. Don't assume the LLM will "remember."
  • Auth tokens are P0 priority. Never store credentials only in process memory. Persist to Redis immediately upon discovery with expiration metadata.

Cognitive Offloading

  • LLM decides what, code executes how. Reduce prompts to high-level decisions. Let deterministic code handle headers, UUIDs, validation, transformations, and logging.
  • Auto-inject known parameters. If auth_token exists in state and the function signature requires it, inject it automatically—even if the LLM forgot to provide it. This alone can reduce hallucination rates by 60–80%.
  • Use temperature strategically. Planning/exploration: 0.7. Execution/code generation: 0.2. Error recovery: 0.5. Don't use the same temperature for everything.
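Auto-injection of known parameters can be sketched in a few lines: before executing a tool call, deterministic code fills any parameter the LLM omitted from known state. The call_api function and the tok-123 token are hypothetical examples for illustration.

```python
import inspect

def auto_inject(fn, llm_args: dict, state: dict) -> dict:
    """Fill any parameter the LLM omitted, using values already in state.
    The LLM decides *what* to call; code supplies the *how*."""
    args = dict(llm_args)
    for name in inspect.signature(fn).parameters:
        if name not in args and name in state:
            args[name] = state[name]  # inject even if the LLM forgot it
    return args

def call_api(endpoint: str, auth_token: str) -> str:
    # Hypothetical tool: deterministic code owns headers, auth, validation.
    return f"GET {endpoint} (authorized: {bool(auth_token)})"
```

Usage: if state holds {"auth_token": "tok-123"} and the LLM only supplied {"endpoint": "/users"}, auto_inject(call_api, ...) merges in the token so the call succeeds instead of failing on a hallucinated or missing credential.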

Error Handling

  • Retry with exponential backoff transforms reliability math. Three attempts (the original call plus two retries) convert a 2% per-step error rate to 0.0008% (0.02³ = 8×10⁻⁶). A 50-step workflow goes from 36% success (0.98⁵⁰) to 99.96% success.
  • Compress repeated failures immediately. After 3 consecutive failures, replace detailed logs with a one-line summary: "Attempts 1–N failed due to X. System restored. Safe to retry."
  • Separate historical context from current context. Use explicit markers (## HISTORICAL, ## CURRENT) so the model knows what's actionable versus what's reference material.
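The first two bullets combine into one retry wrapper: exponential backoff with jitter, plus a one-line failure summary instead of a pile of raw errors. A minimal sketch, with the injectable sleep parameter added here purely to keep the example testable:

```python
import random
import time

def retry_with_backoff(fn, max_attempts: int = 3, base_delay: float = 1.0,
                       sleep=time.sleep):
    """Retry a flaky step with exponential backoff plus jitter.
    With an independent 2% failure rate per attempt, three attempts
    drive the failure probability to 0.02**3 = 0.0008%."""
    failures = []
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            failures.append(str(exc))
            if attempt == max_attempts - 1:
                # Compress repeated failures into a one-line summary
                # rather than dumping every stack trace into context.
                raise RuntimeError(
                    f"Attempts 1-{max_attempts} failed due to "
                    f"{failures[-1]}. System restored. Safe to retry.")
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

The summary string deliberately matches the recovery format above, so whatever lands in the agent's context is one actionable line, not a poisoned well of repeated tracebacks.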

Gotchas to Avoid

  • Silent truncation is worse than hard failure. APIs that silently drop early context will cause the agent to lose its mission statement while appearing to function normally. Monitor token counts religiously.
  • The "poisoned well" is real. If your context contains 10 error messages and 1 success, the model will predict failure and may refuse to retry or hallucinate workarounds. Sanitize aggressively.
  • Long context ≠ better context. Attention is zero-sum. Doubling context length halves attention per token. Optimize for signal density, not raw capacity.

Observability Non-Negotiables

  • Emit structured trace events for every LLM call. Include: context size, token count, temperature, latency, model name. You can't debug what you don't measure.
  • Maintain human-readable scratchpads. Each turn should log reasoning, available tools, decision made, and outcome. Engineers need to understand why the agent acted, not just what it did.
  • Set alerts on spinning detection. If the agent repeats the same tool call 3+ times with identical parameters, trigger meta-cognition or escalate to human. This is a symptom of lost context.
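The trace and spin-detection bullets can be combined into one small tracer. This is a sketch under assumptions: events are printed as JSON lines where a real system would ship them to a log pipeline, and the threshold of three identical calls mirrors the bullet above.

```python
import json
import time
from collections import deque

class AgentTracer:
    """Structured trace events per LLM call, plus spinning detection:
    the same tool call repeated 3+ times with identical parameters."""

    SPIN_THRESHOLD = 3

    def __init__(self):
        self.events: list[dict] = []
        self.recent_calls = deque(maxlen=self.SPIN_THRESHOLD)

    def trace_llm_call(self, model: str, context_tokens: int,
                       temperature: float, latency_ms: float) -> None:
        event = {"ts": time.time(), "model": model,
                 "context_tokens": context_tokens,
                 "temperature": temperature, "latency_ms": latency_ms}
        self.events.append(event)
        print(json.dumps(event))  # in practice: ship to your log pipeline

    def record_tool_call(self, tool: str, params: dict) -> bool:
        """Return True when the agent appears to be spinning."""
        self.recent_calls.append((tool, json.dumps(params, sort_keys=True)))
        return (len(self.recent_calls) == self.SPIN_THRESHOLD
                and len(set(self.recent_calls)) == 1)
```

When record_tool_call returns True, that is the escalation point: trigger meta-cognition or page a human, because identical repeated calls are a symptom of lost context, not persistence.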

Team Recommendations

  • Treat token budgets like memory budgets. Just as you wouldn't deploy without memory limits, don't deploy without context size monitoring and automatic compression.
  • Build the checkpoint system first. Before adding any new agent capability, ensure state persistence works. Every other feature depends on it.
  • Test with pod kills. Regularly terminate agent pods mid-workflow during testing. If the workflow can't resume cleanly, your persistence layer has gaps.

Implementation Takeaways — FikAi notebook for The Physics of AI Engineering.