AI CAPABILITY • LANDSCAPE
AI Landscape Navigator
The landscape is exploding. The gap between movers and waiters is widening.
More capability ships quarterly than used to ship in years. New models, tools, and platforms appear weekly. Competitive advantage is materialising now – not “someday.”
This living reference helps you understand what's out there, what matters, and where we stand. Not exhaustive – navigational.
A Living Reference
This page is updated as the landscape evolves. It reflects our current understanding and experience, not comprehensive market research. We include tools we've used, evaluated, or tracked closely. Last updated: May 2026.
MAY 2026 LANDSCAPE SHIFT
The useful question is no longer: which model is best?
For solo operators and small teams, the practical landscape question is now: which combination of tools gives reliable work at a sensible cost, with enough privacy, portability, and interaction quality for the work you actually do?
Capacity shows up as limits
Compute scarcity appears as rate limits, latency, pricing changes, outages, and tool-routing decisions.
Interaction changes adoption
Voice, screen context, interruption, and live correction make AI easier to use in real working situations.
Portability beats loyalty
The best small-business setup is rarely one model forever. It is a simple, portable harness that can route work well.
AT A GLANCE
The whole landscape in one view
Ten categories, grouped by where they sit in your day-to-day work. Click any card to open the detail below.
Models & Providers
1 categoryBuilding & Creating
2 categoriesWorking & Integrating
7 categoriesThe AI models powering everything — who builds them, what they’re good at, and how they compare.
ANTHROPIC
Claude models
Anthropic is the clearest work-AI story of 2026. In a single week in late May the picture shifted again: Andre Karpathy joined the pre-training team to use Claude to accelerate AI research itself; Anthropic reported its first profitable quarter (the first for any foundation lab) on a $44B annualised run-rate; and the SpaceX compute partnership deepened with a $45B three-year contract for Colossus 1 and Colossus 2 capacity (~$1.25B/month, ramping May-June). Capability, harness quality, and compute supply now move together, and compute access is a literal balance-sheet item rather than analyst commentary.
Examples: Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5
Strengths:
- +Frontier reasoning, vision, and document handling (Opus 4.7)
- +Practitioner-grade tools: Claude Code, Claude Design, Routines, Skills, MCPs
- +1M token context window; xhigh effort tier; Task Budgets (beta)
- +SpaceX / Colossus 1 + Colossus 2 partnership ($45B over 3 years) expands inference capacity through 2029
- +First profitable quarter of any foundation lab (Q2 2026: $10.9B revenue, ~$559M operating profit)
- +Karpathy joining pre-training team to use Claude to accelerate AI research
Considerations:
- •April 1 2026 supply-chain incident (Claude Code auto-update shipped a hostile package for 3 hours)
- •Premium pricing for frontier models
- •Even after the SpaceX deal, token demand continues to outstrip available supply
- •Profitability is partly a function of being supply-constrained, not just demand strength
Our view: Our primary platform. We run Claude Code as our business operations hub, orchestrating strategy, research, delivery, and knowledge management daily. The April product wave made Claude a practitioner-grade work platform; the May SpaceX deal turned compute capacity into product strategy; the late-May Karpathy hire and profitability disclosure shifted market expectations more than any single product release this year.
OPENAI
GPT models & ChatGPT
OpenAI pioneered the current AI era and continues its strategic pivot to work AI. GPT-5.5 is broadly comparable to Opus 4.7 on most everyday tasks and slightly cheaper. May strengthened the interaction layer: GPT Realtime 2, Realtime Translate, and Realtime Whisper make voice and live context transfer more central, while Codex added long-running goal loops and browser access. OpenAI is also formalising enterprise deployment through Deploy Co, a signal that model access alone is not enough.
Examples: GPT-5.5, GPT Realtime 2, Codex, GPT Image 2, ChatGPT Plus
Strengths:
- +Largest consumer ecosystem and integrations
- +GPT-5.5: comparable to Opus 4.7 on everyday tasks, slightly cheaper
- +GPT Image 2: legible text in images (signage, slide labels, packaging)
- +Realtime 2, Realtime Translate, Realtime Whisper, Codex /goal, and browser extension
Considerations:
- •Anthropic now ahead on ARR ($30B vs OpenAI $25B); positioning is reversed
- •Sora shutdown signalled compute scarcity trade-offs
- •Microsoft dependency remains a concentration risk
Our view: The ecosystem leader on consumer reach and a serious work-AI competitor. GPT-5.5's price-performance makes multi-model routing compelling, while Codex and Realtime push OpenAI towards persistent, interactive work systems rather than chat alone.
Gemini models + a sprawling AI product line
Google brings deep AI research heritage, the broadest distribution surface in technology, and (after Google I/O 2026) the most sprawling AI product line of any provider. The Gemini app jumped from 400M monthly active users (May 2025) to 900M (April 2026); monthly tokens processed across Google surfaces went from 480 trillion to 3.2 quadrillion in the same window. The scale advantage is real. The product clarity is not: a small-business operator now has to choose between Gemini, Gemini Advanced (AI Pro), Gemini Business (Workspace), AI Ultra, Spark, Anti Gravity 2.0, AI Studio, Jules, Flow, Veo, Omni, Nano Banana Pro, Google Pics, NotebookLM, and AI Mode in search — many overlapping, several launched without release dates, all evolving fast.
Examples: Gemini 3.5 Flash, Anti Gravity 2.0, Spark, Omni, Nano Banana Pro, AI Studio, Jules, Flow, Veo, Google Pics, NotebookLM, AI Mode (search)
Strengths:
- +Largest distribution surface in consumer AI (900M Gemini app MAU; 3.2 quadrillion tokens/month)
- +TPU compute moat — now externalised as a business line, not just internal capacity
- +Native multimodality and very long context windows
- +Omni (announced May 2026): editing-first multimodal model — a "Nano Banana for video"
- +Anti Gravity 2.0: agent-first standalone desktop app with multi-agent teams and scheduled tasks (parity with Claude Code / Codex, not yet leadership)
- +NotebookLM remains a category-defining product for research, study, and synthesis
Considerations:
- •Product sprawl is the dominant problem — the I/O 2026 lineup is genuinely hard to navigate, even for AI-fluent users
- •Gemini 3.5 Flash benchmarks well on Terminal Bench 2.0 (76.2%) and is state-of-the-art on OS World, but pricing has shifted (3x cost of previous Flash, 20x cost of 2.0 Flash) — speed is no longer paired with low cost
- •Spark and several other I/O launches announced without release dates
- •Google Ultra plan (May 2026) now uses compute-based usage limits; agentic tools (Anti Gravity, Flow) on usage-limit model — the subsidy era is ending here too
- •Strategic uncertainty: an internal split between Hassabis (world-models / robotics / continual learning) and a coding-agent-led RSI direction means priorities may shift again
Our view: Google may win consumer AI by sheer distribution: it already touches consumers everywhere, and Gemini scale numbers are remarkable. For solo and small-business operators, however, the product sprawl is the unmet need. Choosing what to use for what is now harder than using it. This is the single clearest argument all year for an AI-navigator role — a guide who can map the landscape rather than build everything inside it.
THINKING MACHINES
Interaction models
Thinking Machines Lab introduced a distinct model category in May 2026: interaction models trained from scratch for continuous, time-aware exchange rather than turn-based chat. The architecture pairs a foreground interaction model with a background model doing longer reasoning, browsing, and agentic work. The important signal is not raw benchmark performance; it is the shift from "prompt in, answer out" to AI that can notice, interrupt, translate, correct, and keep working while the human keeps talking.
Examples: TML Interaction Small, real-time video + speech, background model pairing
Strengths:
- +Real-time audio and visual proactivity
- +200ms micro-turns rather than conventional turn-based chat
- +Foreground interaction plus background reasoning architecture
- +Strong fit for meetings, training, education, coaching, and live collaboration
Considerations:
- •Early-stage lab, not yet a general platform choice
- •Frontier labs may copy the abstraction quickly
- •Commercial deployment path still unclear
Our view: A category signal more than a vendor recommendation today. Interaction is becoming capability, not interface polish. This belongs in the landscape because it changes what practitioners can expect from future harnesses.
XAI
Grok models
xAI has shifted from pure model challenger to infrastructure signal. Grok remains integrated with X, but the May 2026 Anthropic / SpaceX partnership reframed the story: xAI / SpaceX has enormous compute capacity, while Anthropic has stronger model and harness demand. Elon has also indicated xAI will be dissolved as a separate company into SpaceX AI. Treat xAI less as a dependable frontier model platform and more as a window into AI compute infrastructure.
Examples: Grok 4, Colossus 1, Colossus 2, SpaceX AI
Strengths:
- +Real-time information from X
- +Colossus 1 and Colossus 2 make SpaceX a meaningful compute actor
- +Potential path towards orbital and vertically integrated AI compute
- +Less guardrails on topics
Considerations:
- •Limited enterprise features
- •Tied to X ecosystem
- •Significant organisational fragility (9/11 co-founders departed)
- •Grok has not kept pace with the strongest model + harness combinations
Our view: Do not depend on Grok as a core work platform. Do monitor SpaceX AI as infrastructure: compute supply is now a strategic lever in the AI race, and Elon may be more consequential as a compute operator than as a model builder.
META
Llama models (open-source)
Meta's open-source approach has democratised access to capable models. Llama can be run locally or on private infrastructure, offering control and privacy that hosted APIs cannot.
Examples: Llama 4 Maverick, Llama 4 Scout, Llama 4 Behemoth (preview)
Strengths:
- +Open source and customisable
- +Can run locally/privately
- +No per-token API costs
- +Growing ecosystem
Considerations:
- •Requires technical expertise to deploy
- •Smaller models than frontier APIs
- •Self-managed infrastructure
Our view: Important for privacy-sensitive deployments and organisations with technical capability.
MISTRAL
European AI models
European-founded AI company offering competitive models with strong performance-to-cost ratios. Open-weight models available for self-hosting, with API access for convenience. Le Chat Pro is one of the privacy-first tools commonly used on the "private side" of a Public/Private wall for solo regulated practices.
Examples: Mistral Large 3, Mistral Medium 3, Mistral Small 3.1, Le Chat Pro
Strengths:
- +European data sovereignty option
- +Strong price/performance
- +Open-weight models available
- +Le Chat Pro: privacy-first option for client-confidential work
Considerations:
- •Smaller ecosystem than US providers
- •Enterprise features still developing
Our view: Good option for European data sovereignty requirements. Le Chat Pro features prominently on the private side of the Public/Private wall pattern (see /ai/foundation).
APPLE
On-device AI + Private Cloud Compute
Apple announced its CEO succession in April 2026: hardware VP John Ternus replaces Tim Cook (rather than software-side or COO Jeff Williams). The signal is structural: Apple is betting on on-device silicon plus Private Cloud Compute, not the frontier-lab race. Apple Foundation Models run inside the device for most tasks; harder workloads route to Apple's own private cloud with verifiable guarantees that data stays out of training. For solo practitioners handling protected client data, this is one of the most consequential strategic signals of 2026.
Examples: Apple Foundation Models, Apple Intelligence, Private Cloud Compute
Strengths:
- +On-device by default for most tasks (privacy by architecture)
- +Private Cloud Compute with verifiable hardware-rooted guarantees
- +Tight integration across Apple ecosystem (iOS, macOS, iCloud)
- +Hardware-first strategic positioning under Ternus
Considerations:
- •Apple ecosystem only
- •Less raw frontier capability than dedicated lab models
- •Still relatively new vs. Anthropic / OpenAI / Google
Our view: Watch closely. The on-device AI path becomes more compelling every quarter, particularly for solo regulated practitioners and personal-life users where privacy and offline capability matter. Apple Intelligence is a natural complement to Lumo, Mistral Le Chat, and Maple on the private side of a Public/Private wall.
DEEPSEEK
Chinese frontier AI at fraction of the cost
DeepSeek shook the AI industry by producing frontier-competitive models at a fraction of US lab costs. DeepSeek V4 shipped on 27 April 2026 in Pro and Flash variants, priced at less than one-seventh the cost of Opus 4.6 for roughly one-generation-behind capability. R1 (reasoning) matches earlier o1 performance; V3 rivals GPT-4o. The arithmetic is now unambiguous: for routine tasks where "good enough" is genuinely good enough, DeepSeek changes the cost calculus.
Examples: DeepSeek V4 (Pro and Flash), V3, R1
Strengths:
- +V4 ships at <1/7th the cost of Opus 4.6 for one-generation-behind capability
- +Open-weight models available (R1, V3)
- +Efficiency breakthroughs in training methodology
- +Strong coding and mathematical reasoning; natively multimodal
Considerations:
- •Chinese company — data sovereignty concerns for some organisations
- •API reliability and availability can vary
- •Censorship on certain topics (Chinese regulatory compliance)
- •Rapidly evolving — model versions shift fast
Our view: The biggest disruption in AI economics since GPT-3. DeepSeek proved that frontier capability doesn't require frontier budgets. Essential for multi-model strategy — particularly for cost-sensitive workloads where R1 or V3 can match more expensive alternatives.
How We Navigate This
With so many options, how do you choose? Here's our approach.
Start with the Problem
Don't start with “what AI should we use?” Start with “what problem are we solving?” The tool follows from the task, not the other way around.
Favour Simplicity
The simplest tool that solves the problem is usually the right choice. Complexity has ongoing costs. Start simple; add sophistication when you hit limits.
Build for Portability
The landscape changes fast. Avoid deep lock-in where you can. Use standards (MCP, OpenAI-compatible APIs) that let you switch if better options emerge.
Test with Real Work
Demos impress; production reveals. Before committing, test tools on your actual tasks. What works in a demo may struggle with your specific context.
What's Not Here
Comprehensive Coverage
This isn't a complete market survey. We focus on tools we've used or seriously evaluated. Many good options aren't listed because we haven't worked with them.
Full Enterprise Stack
We cover M365 Copilot and Graph, but not the full enterprise AI stack (Copilot Studio, Power Platform AI, Salesforce Einstein, ServiceNow, etc.). These require enterprise-specific context.
Infrastructure Deep Dives
We now track AI compute infrastructure because it explains limits, pricing, and reliability. We do not attempt a full survey of GPU providers, cloud infrastructure, power markets, or on-premise deployment. Those choices need infrastructure-specific advice.
Pricing Details
Pricing changes frequently. We mention pricing considerations but don't list specific prices. Check provider websites for current rates.
From Landscape to Practice
Understanding the landscape is step one. Making it work for your organisation is where we help.
Right-Sized Stack
What combination of these tools makes sense for your organisation type?
Adoption Journey
Where are you on the spectrum from locked-out to power user?
Context Engineering
The right information at the right time. How to design systems that give AI what it needs.
Learn more →Agents & Orchestration
One agent, infinite expertise. Skills-based AI systems that compound value.
Learn more →AI Skills & Fluency
The bottleneck isn't tools – it's people and culture. Building genuine capability.
Learn more →Why Timing Matters
The landscape is not just moving faster. Capacity, pricing, interaction, and deployment support now change what small teams can actually do with AI.
Clock Speed Reality
Features ship faster than conferences can announce them. The useful habit is not memorising every launch, but spotting which changes alter real work: better context transfer, cheaper execution, safer privacy, or more reliable delegation.
Leaders Pulling Ahead
The gap between organisations that get AI and those still experimenting is widening. Not because technology is inaccessible, but because execution speed is separating leaders from laggards. The pattern starts with documentation, research, and workflow support before it reaches core professional judgement.
Model Commoditisation
The models themselves are increasingly commoditised. Your advantage is less about choosing one winner and more about building a portable way of working: saved context, reusable instructions, clear routing, and enough fluency to move between tools when cost, limits, or quality shift.
Labs Eating the App Layer
AI labs are moving down-stack into code review, security scanning, meetings, design, and deployment support. For small teams, the question is practical: build on tools that are useful now, but keep your context, documents, and working method portable enough that a bundled feature does not strand you.
Need Help Navigating?
The landscape is overwhelming. We've been navigating it daily. Let's talk about what makes sense for your situation.