AI CAPABILITY • PRACTICE
The Practice
Where AI knowledge becomes operational capability
Context engineering, agent orchestration, skills fluency, and curated learning paths – the disciplines and resources that create real-world AI value.
AT A GLANCE
The whole practice in one view
Ten sections plus a curated learning library, grouped by discipline.
Context Engineering (4 sections)
Agents & Orchestration (3 sections)
Skills & Fluency (3 sections)
Learning Resources (curated library)
The discipline of designing what AI knows, when it knows it, and how that knowledge is structured.
MAY 2026 PRACTICE LENS
The practical shift is from prompting to staging. Small teams get leverage when they prepare the context, mark what is locked or still open, give agents clear rubrics, and preserve the useful learning after each run. Agent management is becoming a work primitive, not a specialist side activity.
In 30 Seconds
Most AI failures aren't model problems. They're context problems. Context engineering is the discipline of designing what AI knows, when it knows it, and how that knowledge is structured.
This is where strategy becomes capability. Without the right context architecture, even the best AI strategy remains a document on a shelf.
Our core expertise: This is Practice-tier work, designing and implementing the context systems that make AI useful in your specific environment. Not one-off prompts, but persistent capability.
What We've Learned in Practice
Through implementing context systems across sustainability consulting, trading operations, and client delivery, we've validated several patterns:
Compression Compounds
Each context reduction makes the next easier. Start aggressive, refine based on what breaks.
Memory Health Is Maintenance
Context architecture isn't a one-time setup. Weekly maintenance protocols prevent gradual degradation.
Navigation Beats Structure
Folder hierarchies organise files. Topic-based navigation paths tell AI what to load and when. Both are needed.
Session Continuity Is Hard
The handoff between work sessions is the least discussed problem in AI—and often the most impactful to solve.
Results from our implementations: 3× faster session startup, 40% reduction in token costs, near-zero context drift.
The Shift from Prompts to Systems
Prompt engineering asks: “How do I phrase this question?”
Context engineering asks: “What does AI need to know to answer well?”
Anthropic defines it as “designing dynamic systems that provide AI models with the right information at the right time.” It's the evolution from crafting individual queries to architecting information environments.
| | Prompt Engineering | Context Engineering |
|---|---|---|
| Focus | The question | The knowledge |
| Scope | Single query | Entire system |
| Approach | Craft better prompts | Design information flow |
| Result | Better answers | Consistent capability |
Staging Work
In the agent era, the operator's job is often no longer to produce the output directly. It is to stage the conditions under which the agent can produce the output well: context, constraints, examples, success criteria, handoff state, and review loops.
Locked
Decided. The agent should treat this as a constraint.
Provisional
Leaning this way, but open to pressure-testing.
Open
The agent should explore options and surface trade-offs.
Contested
There is a live disagreement or unresolved tension.
This is why handoff formats matter. Markdown works when the document is mostly for agents and will be edited repeatedly. HTML can work better when humans need to inspect, compare, or interact with the staging artefact. The format follows the audience, lifecycle, and time horizon.
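As an illustration of the four status markers, here is a minimal Python sketch of a staging artefact rendered as Markdown for an agent. The names `Status`, `StagedItem`, and `render_staging_markdown` are hypothetical, and ordering the output so that constraints come first is an assumption of this sketch, not a prescribed format.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    LOCKED = "Locked"            # decided; treat as a constraint
    PROVISIONAL = "Provisional"  # leaning this way; open to pressure-testing
    OPEN = "Open"                # explore options and surface trade-offs
    CONTESTED = "Contested"      # live disagreement or unresolved tension


@dataclass
class StagedItem:
    topic: str
    statement: str
    status: Status


def render_staging_markdown(items: list[StagedItem]) -> str:
    """Group items by status so the agent reads constraints before open questions."""
    lines: list[str] = []
    for status in Status:  # Enum iteration follows definition order
        group = [item for item in items if item.status is status]
        if not group:
            continue
        lines.append(f"## {status.value}")
        lines.extend(f"- {item.topic}: {item.statement}" for item in group)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


items = [
    StagedItem("Pricing model", "Per-seat or usage-based?", Status.OPEN),
    StagedItem("Launch date", "Fixed for the end of the quarter.", Status.LOCKED),
]
print(render_staging_markdown(items))
```

Empty status groups are skipped, so the agent only sees headings that carry content.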
Outcome Rubrics
Agents are producing more work than humans can comfortably review line by line. The next practice layer is to define what good looks like before the work starts: acceptance criteria, examples, failure modes, and a rubric that a separate review agent can apply after the first pass.
Practical pattern: assign the builder agent the task, assign a separate grader agent the rubric, and only bring the human in for judgment calls, exceptions, and final responsibility.
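A rough Python sketch of the builder and grader split. In a real setup the grader would be a separate agent call; here plain functions stand in for it, and `Criterion`, `grade`, and the example rubric are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    check: Callable[[str], bool]  # returns True when the draft passes
    needs_human: bool = False     # judgment calls are never auto-passed


def grade(draft: str, rubric: list[Criterion]) -> dict:
    """Apply the rubric and decide whether a human needs to look."""
    failed = [c.name for c in rubric if not c.needs_human and not c.check(draft)]
    escalate = [c.name for c in rubric if c.needs_human]
    if failed:
        verdict = "revise"        # send back to the builder agent
    elif escalate:
        verdict = "human_review"  # mechanical checks pass; judgment remains
    else:
        verdict = "accept"
    return {"verdict": verdict, "failed": failed, "escalate": escalate}


rubric = [
    Criterion("has summary", lambda d: "Summary:" in d),
    Criterion("under 200 words", lambda d: len(d.split()) <= 200),
    Criterion("tone fits client", lambda d: True, needs_human=True),
]

print(grade("Summary: revenue grew in Q2.", rubric))
```

The human only enters when the mechanical criteria pass and a judgment criterion remains, which mirrors the division of labour described above.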
Dreaming as Maintenance
Scheduled memory review is becoming a product feature. The useful habit is older: review what happened, extract durable patterns, remove stale detail, and carry forward the learnings that should shape the next session.
Practical pattern: after a meaningful run, ask what should become a skill, what should become memory, what should be forgotten, and what should be checked next time.
Interaction Lowers Context Cost
Voice, browser context, pointing, interruption, and real-time correction are not just nicer interfaces. They reduce the cost of transferring intent from a human into an agent. That matters because the bottleneck in agent work is often not model intelligence; it is the user's ability to supply enough context without turning the setup into a separate job.
Speak
Dump messy context faster than typing, then ask the agent to structure it.
Show
Use browser, screen, document, or visual state as part of the prompt, not an afterthought.
Interrupt
Correct the agent while the work is forming, before the wrong path becomes expensive.
The Context Problem
More context doesn't mean better performance.
Research shows input length alone can reduce AI accuracy by 14-85% – even when all information is relevant.
Lost in the Middle
Models favour information at the start and end of context, missing what's in between.
Context Rot
Quality degrades gradually as context accumulates and ages without maintenance.
Signal Dilution
Important information drowns in noise when everything is loaded indiscriminately.
Many teams dump everything into context. That's like answering every question by reading the encyclopaedia aloud.
The issue is not just volume; it's missing decision lineage. Context graphs help by recording approvals and exceptions so AI can retrieve the relevant rationale without loading everything.
THE INDUSTRY'S UNSOLVED PROBLEM
Context Drift
Academic research (2025-2026) identifies context drift—the gradual degradation of context quality across sessions and time—as the central unsolved challenge in AI memory systems. Most tools handle single-session context well. Multi-session, multi-day continuity remains hard. This is where operational rhythm systems (session handoffs, weekly coordination) become critical.
Four Strategies for Managing Context
Based on Anthropic's framework for effective AI systems
1. Write
Persist externally
Store information outside the context window for later retrieval. Files, databases, knowledge bases – anything that persists beyond the session.
2. Select
Load only what's relevant
Retrieve context based on the task at hand, not everything available. Dynamic retrieval, semantic search, just-in-time loading.
3. Compress
Summarise, don't accumulate
Keep context lean through intelligent summarisation. Archive old content, preserve decisions, trim the unnecessary.
4. Isolate
Separate contexts for separate concerns
Don't let different workstreams pollute each other. Multi-agent architectures, session boundaries, role-specific loading.
The Temporal Dimension
The solution to context drift
Most context engineering focuses on structure—how information is organised. We've found the rhythm matters just as much. This is how you solve context drift: not just better architecture, but better cadence.
Session Handoffs
How does context pass between work sessions? What gets carried forward, what gets compressed, what gets archived? Explicit handoff protocols prevent the “starting from scratch” problem.
Weekly Coordination
How do strategic priorities flow into daily work? How does work roll up into weekly synthesis? Coordination bridges connect strategy to execution without context overload.
Memory Maintenance
Context degrades over time. Scheduled compression, archiving cadences, and health checks keep context fresh. Without maintenance, even good architecture accumulates noise.
Why this matters: Academic research identifies context drift as the central unsolved challenge in AI memory systems. Most tools handle single sessions well. Multi-session, multi-day continuity is where operational rhythm becomes critical.
The architecture is the skeleton. The rhythm is the heartbeat.
Tiered Context Architecture
We use a budget-based approach to context layers. Each tier has a token budget and update frequency—this prevents context bloat while ensuring AI has what it needs.
| Tier | Purpose | Token Budget | Load Frequency |
|---|---|---|---|
| Tier 0 | Compressed state | ~300 tokens | Always |
| Tier 1 | Active context + navigation | ~1,000 tokens | Session start |
| Tier 2 | Domain knowledge | On-demand | When needed |
| Tier 3 | Archive | Rarely | Historical only |
Key insight: Most teams overload Tier 0-1 and underuse Tier 2-3. The result is context bloat, slower reasoning, and higher costs.
THE LOAD PATTERN
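One way to read the tier table as a load order is sketched below in Python. The `TieredContext` class and its `load` method are hypothetical; the sketch only encodes the "Load Frequency" column (always, session start, when needed, historical only) and leaves token counting aside.

```python
from dataclasses import dataclass, field


@dataclass
class TieredContext:
    tier0: str                                           # compressed state
    tier1: str                                           # active context + navigation
    tier2: dict[str, str] = field(default_factory=dict)  # topic -> domain knowledge
    tier3: dict[str, str] = field(default_factory=dict)  # topic -> archive

    def load(self, topics=(), session_start=False, include_archive=False) -> list[str]:
        parts = [self.tier0]                      # Tier 0: always
        if session_start:
            parts.append(self.tier1)              # Tier 1: session start
        # Tier 2: only for topics the task actually names
        parts += [self.tier2[t] for t in topics if t in self.tier2]
        if include_archive:
            # Tier 3: historical lookups only, and only on explicit request
            parts += [self.tier3[t] for t in topics if t in self.tier3]
        return parts


ctx = TieredContext(
    tier0="STATE: 3 active projects",
    tier1="NAV: pricing -> docs/pricing-rules.md",
    tier2={"pricing": "Pricing rules for current contracts"},
    tier3={"pricing": "Archived pricing decisions"},
)
print(ctx.load(session_start=True))
print(ctx.load(topics=["pricing"]))
```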
Context Graphs: Decision Lineage
Systems of record capture what happened. Context graphs capture why.
When AI needs to make a decision, it shouldn't just know the rule—it should know the precedents, exceptions, and reasoning that shaped it.
Approvals & Exceptions
Why was this approved? What precedent does it set? Context graphs make the reasoning retrievable.
Policy Evolution
How did we get here? What changed and why? Decision traces show the path, not just the destination.
Audit Trails
What informed this decision? Who signed off? Context graphs support governance and compliance.
We design context graphs that turn scattered decisions into searchable precedent, making institutional knowledge available to AI without loading everything.
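To make "decision lineage" concrete, here is a small Python sketch of a context graph held in memory. The `Decision` fields, the `lineage` function, and the example decisions are all hypothetical; a production graph would live in a database, but the retrieval idea is the same: fetch one decision plus the precedents behind it rather than the whole history.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    id: str
    summary: str
    rationale: str
    approved_by: str
    precedents: list[str] = field(default_factory=list)  # ids of earlier decisions


def lineage(graph: dict[str, Decision], decision_id: str) -> list[Decision]:
    """Return the decision followed by its precedents, depth-first, visiting each once."""
    seen: set[str] = set()
    ordered: list[Decision] = []
    stack = [decision_id]
    while stack:
        current = stack.pop()
        if current in seen or current not in graph:
            continue
        seen.add(current)
        ordered.append(graph[current])
        stack.extend(reversed(graph[current].precedents))
    return ordered


graph = {
    "D1": Decision("D1", "Discounts capped at 10%", "Protect margin", "CFO"),
    "D2": Decision("D2", "15% for client X", "Multi-year contract", "CFO", ["D1"]),
    "D3": Decision("D3", "15% for client Y", "Same terms as client X", "Sales lead", ["D2"]),
}

for d in lineage(graph, "D3"):
    print(f"{d.id}: {d.summary} (why: {d.rationale}; approved by {d.approved_by})")
```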
Context as Competitive Moat
STRATEGIC • NEW • MARCH 2026
Google's internal AI team made a revealing observation in early 2026: the “sum totality of an organisation's documents” creates capabilities that no AI lab can replicate from the outside. Your organisational data, decision history, and institutional knowledge are not a problem to manage; they are the competitive advantage.
Why Labs Can't Compete
Foundation models are general-purpose. Your context — client history, policy decisions, domain-specific reasoning — is what transforms general AI into your AI. No amount of training data replicates what you've accumulated through operations.
Context Engineering = Moat Building
This reframes context engineering from a technical practice to a strategic investment. Every well-structured knowledge base, every maintained decision trace, every curated context layer is an asset your competitors don't have.
The implication: Organisations investing in context architecture today aren't just improving AI performance — they're building durable competitive advantage that deepens with every interaction.
Who Benefits from Context Engineering?
Individuals
- Consistent AI results across sessions
- Build on previous work, not from scratch
- Reduce time re-explaining context
Teams
- Reduce hallucinations through better knowledge
- Enable handoffs between human and AI
- Shared context across team members
Organisations
- Multi-agent coordination without pollution
- Governance and compliance controls
- Scalable knowledge management
Our Approach Is Informed By
Anthropic's context engineering guidance
Karpathy's “RAM management” framing
Vercel's AGENTS.md evaluation research
Mei & Yao survey (1,400+ academic papers)
MemAgents architecture research
Validated through our own implementations
In 30 Seconds
AI doesn't remember. Every conversation starts fresh. Every session begins from zero. The brilliant assistant who helped you yesterday has no idea who you are today.
This isn't a bug – it's how LLMs work. But it's also why most AI implementations deliver inconsistent value. The forgetting problem is solvable.
Memory health is the practice of designing systems that give AI what it needs to know – when it needs to know it – without drowning in irrelevant information.
Why This Matters
For Individuals
Without memory, you re-explain context every session. The same background, the same preferences, the same project details – again and again.
Time saved by AI gets consumed by context-setting. The productivity promise erodes with every fresh start.
For Teams
When AI forgets, knowledge doesn't compound. Insights from one session don't inform the next. Each team member starts from scratch.
The result: inconsistent outputs, duplicated effort, and AI that never gets better at understanding your work.
The compounding cost: Every time AI forgets, you lose the value of everything it learned. Good memory design means knowledge builds over time instead of resetting to zero.
The Forgetting Problem
Understanding why AI forgets is the first step to fixing it
Context Windows Have Limits
Every AI model has a finite “context window” – the amount of text it can consider at once. When the window fills, old information gets pushed out.
The Illusion
Modern models have large context windows (100K+ tokens). This feels like plenty of memory.
The Reality
Long contexts degrade performance. Research shows accuracy drops 14-85% as context length increases – even with relevant information.
No Native Persistence
LLMs have no built-in way to store information between sessions. Unlike databases or file systems, they don't write to permanent storage. Each conversation exists in isolation.
What Users Expect
“You remember that project we discussed last week, right?”
What Actually Happens
The model has no access to previous conversations. Last week doesn't exist.
Lost in the Middle
Even within a single context window, attention isn't uniform. Models favour information at the beginning and end, often missing what's in the middle.
The Pattern
Critical information buried in the middle of long conversations gets less attention from the model.
The Impact
Important context can be effectively “forgotten” even while technically still in the window.
Key insight: The “memory problem” isn't a flaw that model improvements will fix. It's a fundamental property of the architecture, and it requires system-level solutions.
Symptoms of Poor Memory Health
Recognise these patterns? They're signs your AI system needs memory architecture.
Repetitive Context-Setting
You explain the same background information every session. “I work at X company, we do Y, the project is about Z...”
Inconsistent Outputs
The same question yields different answers in different sessions. No learning from previous interactions carries forward.
Contradictory Advice
AI suggests approaches that conflict with decisions made in previous sessions. It doesn't know what was already decided.
Context Rot
Long conversations degrade. The AI starts referencing outdated information or losing track of earlier agreements.
Knowledge Silos
Insights from one conversation can't be applied elsewhere. Each session is an island of learning that sinks after use.
The Eternal Beginner
Despite months of use, AI still asks basic questions. It never develops understanding of your domain or preferences.
These aren't AI limitations. They're architecture gaps. Every symptom has a solution – if you design for memory.
The Four Memory Strategies
Based on Anthropic's context engineering framework
1. Write: Persist Externally
Since AI has no native memory, create external storage. Files, databases, knowledge bases – anything that persists beyond the session.
Session Logs
Capture key decisions and outcomes from each conversation
Knowledge Files
Curated information that AI should always know
State Documents
Living files that track current project status
2. Select: Load Only What's Relevant
Don't load everything every time. Retrieve context based on the task at hand. Just-in-time loading beats all-the-time loading.
Dynamic Retrieval
Fetch relevant documents based on the current query
Semantic Search
Find information by meaning, not just keywords
Role-Based Loading
Different tasks load different context packages
3. Compress: Summarise, Don't Accumulate
Keep context lean. Replace long conversation history with concise summaries. Archive old content, preserve decisions, trim the unnecessary.
Conversation Summaries
Replace 50 messages with 5 key takeaways
Decision Logs
Keep what was decided, not how it was discussed
Context Pruning
Regular maintenance to remove outdated information
4. Isolate: Separate Contexts for Separate Concerns
Don't let different workstreams pollute each other. Use boundaries to keep contexts clean and focused.
Session Boundaries
Clear starts and ends for different work types
Project Isolation
Client A's context doesn't leak into Client B
Multi-Agent Design
Different agents with different specialised contexts
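The four strategies can be sketched together in a toy Python store. Everything here is an assumption made for illustration: the `ContextStore` name, keyword overlap standing in for semantic search, and the `DECISION:` prefix used to mark notes that survive compression.

```python
from collections import defaultdict


class ContextStore:
    """Toy store with one namespace per workstream (Isolate)."""

    def __init__(self) -> None:
        self._notes: dict[str, list[str]] = defaultdict(list)

    def write(self, namespace: str, note: str) -> None:
        # Write: persist outside the context window
        self._notes[namespace].append(note)

    def select(self, namespace: str, query: str) -> list[str]:
        # Select: keyword overlap stands in for semantic search
        terms = set(query.lower().split())
        notes = self._notes.get(namespace, [])
        return [n for n in notes if terms & set(n.lower().split())]

    def compress(self, namespace: str, keep_last: int = 2) -> None:
        # Compress: keep decisions and the most recent notes, drop older discussion
        notes = self._notes.get(namespace, [])
        recent_start = max(len(notes) - keep_last, 0)
        self._notes[namespace] = [
            n for i, n in enumerate(notes)
            if i >= recent_start or n.startswith("DECISION:")
        ]


store = ContextStore()
store.write("client-a", "DECISION: pricing is per-seat")
store.write("client-a", "discussed pricing options at length")
store.write("client-a", "drafted onboarding email")
store.write("client-a", "reviewed onboarding email")
store.write("client-b", "DECISION: pricing is usage-based")

store.compress("client-a", keep_last=2)
print(store.select("client-a", "pricing"))
```

After compression, the long pricing discussion is gone but the decision remains, and nothing from `client-b` can appear in a `client-a` query.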
Layered Memory Architecture
Organise memory by stability and scope
Effective AI memory isn't a single file – it's a layered architecture. Higher layers are stable and rarely change. Lower layers are ephemeral and session-specific.
The Design Principle
Load stable layers automatically. Load ephemeral layers dynamically. Don't burden every session with information that rarely changes.
The Maintenance Principle
Update each layer at appropriate intervals. Strategic memory weekly. Session context every message. Match maintenance rhythm to layer stability.
Memory Patterns in Practice
Common patterns for implementing healthy AI memory
The Handoff Document
A single file that captures: what happened, what was decided, what's next. Updated at session end, loaded at session start.
Best for:
Individuals working across multiple sessions on the same project
The Project Bible
A comprehensive reference document containing all project context. Loaded automatically when working on that project.
Best for:
Complex projects with many decisions and constraints to remember
The Skills Library
Modular knowledge files that can be loaded on demand. Different skills for different tasks, loaded as needed.
Best for:
Teams with diverse tasks requiring different domain expertise
The Weekly Bridge
A rhythm-based summary that synthesises the week's sessions. Carries forward key context without accumulating endless history.
Best for:
Ongoing operations with continuous but evolving context
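The Handoff Document pattern above is simple enough to sketch in Python. The file name `HANDOFF.md`, the function names, and the three section headings are hypothetical choices for this sketch; the only idea taken from the pattern is "updated at session end, loaded at session start".

```python
import tempfile
from datetime import date
from pathlib import Path


def write_handoff(path: Path, happened: list[str], decided: list[str], next_steps: list[str]) -> None:
    """Overwrite the handoff file at session end."""
    sections = {
        "What happened": happened,
        "What was decided": decided,
        "What's next": next_steps,
    }
    lines = [f"# Handoff ({date.today().isoformat()})"]
    for title, entries in sections.items():
        lines.append(f"\n## {title}")
        lines.extend(f"- {entry}" for entry in entries)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_handoff(path: Path) -> str:
    """Return the previous handoff, or an empty string on the first session."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


# Demo in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    handoff = Path(tmp) / "HANDOFF.md"
    write_handoff(handoff, ["Drafted report"], ["Use per-seat pricing"], ["Send to client"])
    print(load_handoff(handoff))
```

Overwriting rather than appending keeps the file a snapshot of current state; older handoffs belong in the archive layer.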
Memory Requires Maintenance
Without Maintenance
- Context files grow stale
- Outdated information contradicts current reality
- Memory becomes noise rather than signal
- AI references things that are no longer true
- The system degrades back to forgetfulness
With Maintenance
- Context stays current and accurate
- Old information gets archived, not deleted
- Each session starts with relevant, fresh context
- Knowledge compounds reliably over time
- The system gets smarter, not staler
Memory health is a practice, not a one-time setup.
The rhythm matters as much as the architecture.
How We Help
Design, implement, and maintain AI memory systems
Memory Architecture
Design the right layer structure for your context. What belongs where, what loads when, how it all connects.
Implementation
Build the files, set up the retrieval, establish the workflows. From individual setups to team-scale systems.
Maintenance Protocols
Define the rhythms and processes that keep memory healthy. What gets updated when, how staleness is prevented.
In 30 Seconds
There are two ways to give AI the information it needs: load it upfront (passive context) or fetch it when needed (on-demand retrieval). Most teams assume retrieval is smarter. The research says otherwise.
Vercel's Next.js team ran rigorous evaluations comparing these approaches. The result: passive context achieved 100% accuracy where on-demand retrieval achieved 53%.
The insight: When information is always present, there's no decision point that can fail. No retrieval logic to get wrong. No ordering issues. Just consistent availability.
The Research
Vercel AGENTS.md Evaluation (January 2026)
Vercel's Next.js team tested how AI coding agents perform with different context configurations. They compared baseline performance against various approaches for providing project-specific information.
53%
No documentation
Baseline
53%
On-demand retrieval
Skills system
79%
Retrieval + instructions
Enhanced skills
100%
Passive context
AGENTS.md file
The striking finding: On-demand retrieval performed no better than having no documentation at all. The retrieval system existed, but it didn't help. Only when information was passively present did performance improve.
Why Passive Context Wins
Three fundamental advantages over on-demand retrieval
1. No Decision Point
On-demand retrieval requires a decision: “What information do I need for this query?” That decision can be wrong. The model might not realise it needs certain context.
With passive context, there's no decision to get wrong. The information is already there. Every time.
2. Consistent Availability
Retrieval systems are probabilistic. They might return relevant documents 80% of the time, or 60%, or 40%. The quality varies by query, by phrasing, by the state of the vector database.
Passive context is deterministic. The same information is present on every turn. No variance. No “sometimes it works” frustration.
3. No Ordering Issues
With retrieval, critical information might arrive too late in the reasoning process. The model starts generating before realising it needs more context.
Passive context is present from the first token. The model reasons with full information from the start.
The Tradeoff: Why Not Load Everything?
If passive context is better, why not just load all available information? Because context windows have effective limits that are smaller than their technical limits.
The Memento Limit
Research suggests effective reasoning capacity is around 100K tokens, even when context windows are technically larger. Beyond this, performance degrades.
On that estimate, a 200K-token context window doesn't give you 200K tokens of useful reasoning space. It gives you roughly 100K of effective space, with increasing noise beyond that.
Lost in the Middle
Models pay more attention to the beginning and end of context. Information in the middle gets weighted less, even when it's critical.
More context can mean important information gets buried where the model is less likely to use it effectively.
The goal isn't maximum context. It's the right context. Passive for what matters most. Retrieval for everything else.
Tiered Context Architecture
The pattern that balances passive reliability with retrieval flexibility
| Tier | Type | Token Budget | What Goes Here |
|---|---|---|---|
| Tier 0 | Passive | ~300 tokens | Compressed state: current status, key metrics, active items |
| Tier 1 | Passive | ~1,000 tokens | Active context: navigation, recent decisions, current focus |
| Tier 2 | On-demand | Variable | Domain knowledge: loaded when topic requires it |
| Tier 3 | Retrieval | As needed | Archive: historical, rarely accessed |
Passive Foundation
Tier 0 and Tier 1 are always loaded. This is your passive context. Keep it lean (~1,300 tokens total) but ensure it contains everything AI needs to orient itself and navigate effectively.
Retrieval for Depth
Tier 2 and Tier 3 use selective retrieval. Navigation paths in Tier 1 point to relevant Tier 2 content. This gives you depth without bloat.
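A minimal Python sketch of how Tier 1 navigation might point to Tier 2 content. It assumes the navigation is stored as a two-column markdown table of topic and file path, and that a plain substring match is good enough to spot a topic in a query; `parse_navigation` and `files_for` are hypothetical helpers.

```python
def parse_navigation(table: str) -> dict[str, str]:
    """Parse a two-column markdown table of topic -> file path."""
    routes: dict[str, str] = {}
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        is_header = len(cells) == 2 and cells[0].lower() == "topic"
        is_separator = len(cells) == 2 and set(cells[0]) <= {"-", ":"}
        if len(cells) != 2 or is_header or is_separator:
            continue  # skip header, separator, and malformed rows
        routes[cells[0].lower()] = cells[1]
    return routes


def files_for(query: str, routes: dict[str, str]) -> list[str]:
    """Return the Tier 2 files whose topic appears in the query."""
    text = query.lower()
    return [path for topic, path in routes.items() if topic in text]


NAVIGATION = """
| Topic | Read |
|---|---|
| pricing | docs/pricing-rules.md |
| deployment | docs/deploy-guide.md |
"""

routes = parse_navigation(NAVIGATION)
print(files_for("How should we handle pricing for renewals?", routes))
```

The table stays small enough to sit in passive context, while the files it points to are only read when a matching topic comes up.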
Implementation Patterns
Practical approaches for passive context systems
The Project File Pattern
A single file (CLAUDE.md, AGENTS.md, or similar) at project root containing everything AI needs to work effectively in that context.
Typical contents:
- Project description and purpose
- Key decisions and constraints
- Build/test commands
- Code conventions
- Current focus areas
The MEMORY + CONTEXT Pattern
Two complementary files: MEMORY.md for compressed state (~300 tokens), CONTEXT.md for active context and navigation (~1,000 tokens).
The split:
- MEMORY: “Where are we?” (status, metrics, active items)
- CONTEXT: “How do I work here?” (navigation, decisions, focus)
The Navigation Hub Pattern
Passive context includes a navigation table: “When you need X, read Y.” This creates predictable paths from topics to relevant files.
Example:
| Topic | Read |
|---|---|
| pricing | docs/pricing-rules.md |
| deployment | docs/deploy-guide.md |
The Token Budget Pattern
Explicit limits on each passive context file. When a file exceeds its budget, compress it. Move detail to Tier 2 and keep pointers in Tier 0-1.
Enforcement:
- Tier 0: Max 300 tokens (hard limit)
- Tier 1: Max 1,000 tokens (soft limit)
- Review weekly, compress as needed
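The hard and soft limits above could be checked with a few lines of Python. The token estimate below assumes roughly four characters per token, which is a crude heuristic and not a real tokenizer; `estimate_tokens`, `check_budget`, and the `BUDGETS` table are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return (len(text) + 3) // 4


BUDGETS = {
    "tier0": (300, "hard"),   # over budget is an error
    "tier1": (1000, "soft"),  # over budget is a warning
}


def check_budget(tier: str, text: str) -> str:
    limit, kind = BUDGETS[tier]
    used = estimate_tokens(text)
    if used <= limit:
        return "ok"
    if kind == "hard":
        raise ValueError(f"{tier} uses ~{used} tokens; hard limit is {limit}")
    return f"warn: {tier} uses ~{used} tokens; soft limit is {limit}"


print(check_budget("tier0", "Status: on track. Active items: 3."))
print(check_budget("tier1", "x" * 4004))
```

A check like this fits naturally into the weekly review: a warning on Tier 1 is the cue to compress and move detail to Tier 2.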
When to Use What
Use Passive Context For
- ✓ Identity and behaviour rules (always needed)
- ✓ Current project state (changes, but always relevant)
- ✓ Navigation pointers (how to find deeper content)
- ✓ Recent decisions (context that's frequently referenced)
- ✓ Session handoff state (what to pick up from last time)
Use Retrieval For
- ✓ Large knowledge bases (too big for passive loading)
- ✓ Historical archives (rarely needed)
- ✓ Domain-specific content (only relevant for certain queries)
- ✓ Reference documentation (detailed specs, APIs)
- ✓ Content that varies by user/session
The combination is powerful: Passive foundation + selective retrieval. Reliability where it matters most. Flexibility where you need depth.
Common Mistakes
Mistake 1: No Passive Context at All
Relying entirely on retrieval. Every query starts with a search. Result: inconsistent baseline, variance in quality, and the 53% accuracy that retrieval alone scored in the Vercel evaluation.
Fix: Establish a passive foundation, even if it's just 500 tokens.
Mistake 2: Too Much Passive Context
Loading everything passively to avoid retrieval complexity. Result: bloated context, lost-in-the-middle problems, degraded reasoning.
Fix: Enforce token budgets. Compress aggressively. Move detail to Tier 2.
Mistake 3: Stale Passive Context
Setting up passive context once and never updating it. Result: AI references outdated information, makes contradictory decisions.
Fix: Weekly review cadence. Update Tier 0 after every significant change.
Mistake 4: No Navigation to Tier 2
Passive context that doesn't tell AI where to find deeper information. Result: AI either hallucinates or asks repeatedly for guidance.
Fix: Include navigation paths. “When you need X, read Y.”
Getting Started
Create a Tier 0 file
Start with ~300 tokens of compressed state. Current status, key metrics, active items. Name it MEMORY.md or include it at the top of your main context file.
Add navigation to Tier 1
Create a CONTEXT.md with ~1,000 tokens. Include a navigation table: “When topic X comes up, read file Y.” This creates predictable paths to deeper content.
Configure automatic loading
Ensure your AI tool loads Tier 0 and Tier 1 at session start. For Claude Code, this means CLAUDE.md. For other tools, AGENTS.md or equivalent.
Establish maintenance rhythm
Weekly: review passive context for staleness. After significant changes: update Tier 0. Monthly: audit token budgets and compress as needed.
In 30 Seconds
AgentOS is the persistent foundation underneath whichever AI tool you use. Plain text files at the root of your workspace describing who you are, what you know, how you work, what you remember, what you can reach, how you verify, and what you automate.
The model is the engine. The harness is the runtime (Claude Code, Cursor, Codex). The AgentOS is yours. Models change every six months. Harnesses converge over twelve to twenty-four months. The AgentOS compounds across both.
The terminology landed publicly in April 2026 via AIDB's programme on Personal Context Portfolios. Several pieces of vocabulary — PCP, Monothread, Harness Engineering, Strict-Write, Auto-Dream — now name patterns we've been using or building for years. This section maps them.
The Seven Layers
Each layer is a discipline. You don't build them all at once. You build the foundation (Identity + Context) first, then add the others as your work demands them.
1. Identity
Who you are. What you do. What you stand for. The file every other layer references.
2. Context
Your situation. What’s true now. Your operating environment. (Pandion calls this layer’s discipline Context Engineering.)
3. Skills
Procedural knowledge made portable. Agents and named capabilities that can be loaded and run.
4. Memory
What compounds across sessions. What gets remembered, summarised, archived.
5. Connections
The data sources, services, and tools your AI can reach. Trust-graded.
6. Verification
How outputs are checked, grounded, evaluated. Trust by construction, not by hope.
7. Automations
What runs without you. Scheduled jobs, triggers, agents that act on signal.
The seven layers compound. The layers below feed the layers above. The whole stack survives every harness swap.
Personal Context Portfolio (PCP)
NLW's ten-file markdown recipe for the bottom of an AgentOS. A solo operator can sit down and have version one in an afternoon. Each file lives at the root of your workspace as plain text.
PCP is a specific organising recipe for the Identity + Context (and bits of Memory + Connections) layers of AgentOS. It's not the whole AgentOS. It's a tractable starting point for layers 1, 2, 4 and 5.
Where Pandion sits: our Context Engineering methodology (MEMORY.md + CONTEXT.md + neural paths + topic-memory pattern) is a richer architecture than flat PCP for the Context layer specifically. Same job, more sophistication. PCP is a clean public recipe; CE is the upgrade path.
Monothread — one long-lived thread, not fresh chats per task
Named by Nick Bowman in mid-April. The pattern: a thread's value increases over time when context compaction is good. You keep one long-lived orchestration thread, plus specialist sub-threads spawned from it. The thread accumulates. You don't throw away context every Monday morning.
For most people who've learned AI through ChatGPT, the instinct is the opposite: fresh chat per task, one-off prompts, lose the context. Monothread inverts that: brief once, accumulate, compact. Pandion's BATON + MEMORY + MASTER-OVERVIEW filesystem pattern is monothread-as-files; the orchestration thread reads them at every session start.
What this looks like in practice: a single working thread for a project that runs across weeks, with the AgentOS files providing the persistent memory between sessions. Sub-threads spawn for narrow specialist work and report back. The orchestration thread never resets.
Harness Engineering — the named industry discipline
The lineage: prompt engineering (2023) became context engineering (2024) became harness engineering (2026). Each names a different layer of work:
- Prompt engineering — how you phrase a single request. Largely absorbed into the model.
- Context engineering — what the AI knows, when it knows it. The Context layer of AgentOS.
- Harness engineering — how the runtime is configured: tools, memory wiring, file access, agent loops, verification gates. The discipline of choosing and tuning the harness.
A useful three-layer model from Aetna Labs (April 2026): Information (what the model can see), Execution (what tools it can run), Feedback (how outputs are checked). Most disappointing AI output is a configuration problem, not a model problem.
Strict-Write and Auto-Dream — memory disciplines
Two named patterns from the Practical AI post-mortem of the Claude Code source leak (April 2026). Both apply at the Memory layer.
Strict-Write
Only record to memory after environment verification — terminal output, API confirmation, filesystem write.
Hallucination prevention at the memory layer, not the inference layer. What gets remembered must have been observed.
Auto-Dream
Periodic consolidation. Every 24 hours (or weekly), review observations and consolidate into permanent facts.
Prevents memory accumulation noise. Pandion's Friday Review is auto-dream at weekly cadence.
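Both disciplines can be sketched in a small Python class. This is not how any particular tool implements them. The `Memory` class is hypothetical, the caller is assumed to supply the verification result (for example, from checking terminal output or a file on disk), and promoting an observation to a fact once it has been seen `min_count` times is an assumed consolidation rule.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Memory:
    observations: list[str] = field(default_factory=list)  # verified, not yet consolidated
    facts: list[str] = field(default_factory=list)         # consolidated, long-lived

    def strict_write(self, claim: str, verified: bool) -> bool:
        """Strict-Write: record only what the environment confirmed."""
        if not verified:
            return False
        self.observations.append(claim)
        return True

    def auto_dream(self, min_count: int = 2) -> None:
        """Auto-Dream: promote repeated observations to facts, then clear the buffer."""
        counts = Counter(self.observations)
        for claim, seen in counts.items():
            if seen >= min_count and claim not in self.facts:
                self.facts.append(claim)
        self.observations.clear()


memory = Memory()
memory.strict_write("report.md was written", verified=False)  # not observed: dropped
memory.strict_write("tests pass on main", verified=True)
memory.strict_write("tests pass on main", verified=True)
memory.auto_dream()
print(memory.facts)
```

Run on a schedule, `auto_dream` plays the role of the periodic consolidation described above: the observation buffer never grows without bound.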
Briefing Opus 4.7 — the literal-instruction shift
Opus 4.7 (April 2026) follows instructions more literally than 4.6. Vague or hedging prompts get punished where 4.6 would guess reasonably. The pattern that works:
- Lead with the goal in one sentence.
- State the constraints (audience, length, tone, format, what to avoid).
- Define what “done” looks like — the shape and standard of the output.
- Tell the model what to verify before returning.
- Then let it run. Don't refine across ten messages.
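The first four steps above can be captured as a reusable template. This is a sketch, assuming a plain-text brief is what gets handed to the model; the `Brief` class, its field names, and the example content are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    goal: str               # one sentence
    constraints: list[str]  # audience, length, tone, format, what to avoid
    done: str               # the shape and standard of the output
    verify: list[str]       # checks to run before returning

    def render(self) -> str:
        """Lay the brief out in the order the pattern recommends."""
        lines = [f"Goal: {self.goal}", "", "Constraints:"]
        lines += [f"- {item}" for item in self.constraints]
        lines += ["", f"Done looks like: {self.done}", "", "Verify before returning:"]
        lines += [f"- {item}" for item in self.verify]
        return "\n".join(lines)


brief = Brief(
    goal="Summarise the Q2 supplier audit for the board.",
    constraints=["One page", "Plain English", "No supplier names"],
    done="Three findings, each with one recommended action.",
    verify=["Every figure matches the audit spreadsheet"],
)
text = brief.render()
print(text)
```

Filling in the four fields before sending anything is the discipline; the rendering is incidental.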
Most people's instinct, learned over two years of ChatGPT, is to throw a quick prompt in and refine it conversationally. That instinct now costs you quality and (with the cost shift coming in mid-2026) money. The matching shift at runtime: 4.7 is built to be delegated to. Write a proper brief, hand it the work, walk away.
The half-hour exercise that pays back across every prompt for the next quarter: take your most-used saved instruction or system prompt, the one you wrote against an earlier model and haven't touched in months, and tighten it. Be specific where you were vague. Add verification checks. Define done.
Don't Break the Loop
Jason Liu, on the Codex team, published a guide in May 2026 (“Codex Maxing”) listing nine tips for getting more out of Codex. Read in one go, they describe a single integrating shift: the productivity unlock with AI is no longer faster turn-taking. It is parallel work. The operator and the agent stay in motion together.
The tips map cleanly onto the AgentOS Layers we already use. Each is in service of one principle: never put the agent on pause while you think, observe, or change direction.
Vocabulary alignment: the Codex team is now articulating the patterns Pandion already names, among them mono-thread, files-not-chat memory, the harness as a work system rather than a chat replacement, voice as a way to brief richer context, and the side panel for parallel review. The AgentOS Layers framework is no longer Pandion-specific vocabulary; the practitioners shipping the harnesses are using the same words.
“A long thread can remember a lot, but that memory is trapped inside the thread unless the useful parts get serialized somewhere durable. Files force the agent to compress experience into a form that can survive the thread.”
— Jason Liu (Codex team), “Codex Maxing”, May 2026
Mono-thread
One long-lived durable thread per workstream. Compaction keeps the larger context alive across sessions. (Layer: Context, Memory.)
Voice
Brief the agent by rambling, not by typing a polished sentence. Messy input is fine; the agent helps you turn it into something clear. (Layer: Identity, Skills.)
Steer
Update the prompt while work is in progress. You don't need the perfect upfront brief; redirect in motion. (Layer: Skills, Verification.)
Files-not-chat memory
Structured memory in plain files (with an agents.md at the root telling the agent what to write down). Memory survives the thread. (Layer: Memory.)
Computer + browser use
Give the agent access to local files, the browser, and external services. It becomes an evidence gatherer, not a chat box. (Layer: Connections.)
Remote control
Steer long-running work from mobile while you do something else. Useful when tasks scale to hours, not minutes. (Layer: Skills, Automations.)
Heartbeats
Scheduled or trigger-based check-ins. The thread wakes itself up and checks email, Slack, a render, or another inbox. (Layer: Automations.)
Goals (/goal)
For work with verifiable success criteria, hand the agent the goal and let it push against it. Now in both Codex and Claude Code. (Layer: Verification.)
Side panel
Inspect and annotate artefacts while the agent keeps building. Parallel review, not turn-taking. (Layer: Verification.)
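Of the nine patterns, files-not-chat memory is the easiest to show in code. The sketch below assumes one Markdown file per topic with dated bullet notes; the `remember` helper, the directory layout, and the example notes are hypothetical, and a real setup would follow whatever the root agents.md tells the agent to write down.

```python
import tempfile
from datetime import date
from pathlib import Path


def remember(memory_dir: Path, topic: str, note: str, today: date) -> Path:
    """Append a dated note to <memory_dir>/<topic>.md and return the path."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{topic}.md"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"- {today.isoformat()}: {note}\n")
    return path


# A temporary directory stands in for the project's memory folder.
root = Path(tempfile.mkdtemp()) / "memory"
path = remember(root, "client-acme", "Prefers weekly summaries by email", date(2026, 5, 4))
remember(root, "client-acme", "Signed off the Q2 scope", date(2026, 5, 11))
print(path.read_text(encoding="utf-8"))
```

Because the notes live in a file rather than in the conversation, a new thread, or a different agent, can pick them up by reading the folder.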
What changes when you stop breaking the loop: you stop sitting and waiting for the agent to finish a thing before you can think about the next thing. The agent stops sitting and waiting for you to type the perfect next prompt. Both of you keep moving. For a solo or small-business operator, this is where the day-to-day multiplier with AI actually shows up — not in any single model upgrade, but in the shape of the working relationship.
From Knowledge to Capability
Context engineering, agent orchestration, skills architecture, fluency development, and workforce capability – these practices determine whether AI delivers consistent value or inconsistent experiments. If any of these feel uncertain, we can help you get them right.