AI Landscape Navigator
The landscape is exploding. The gap between movers and waiters is widening.
More capability ships quarterly than used to ship in years. New models, tools, and platforms appear weekly. Competitive advantage is materialising now – not “someday.”
This living reference helps you understand what's out there, what matters, and where we stand. Not exhaustive – navigational.
A Living Reference
This page is updated as the landscape evolves. It reflects our current understanding and experience, not comprehensive market research. We include tools we've used, evaluated, or tracked closely. Last updated: April 2026.
The AI models powering everything — who builds them, what they’re good at, and how they compare.
ANTHROPIC
Claude models
Anthropic leads in both capability and safety. Claude models excel at nuanced reasoning, coding, and agentic work — operating autonomously across complex multi-step tasks. Claude Code alone generates $2.5B ARR, contributing to Anthropic's $19B ARR (closing fast on OpenAI's $25B). Claude Code supports skills-based orchestration, MCP tool integration, and agent teams that coordinate in parallel.
Examples: Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5
Strengths:
- Frontier reasoning and agentic capability
- Claude Code: $2.5B ARR, skills, MCPs, agent teams
- 1M token context window (Opus 4.6)
- Safety-conscious design with strong instruction following
Considerations:
- Premium pricing for frontier models
- Agent teams feature still experimental
- Ecosystem growing rapidly but newer than OpenAI's
Our view: Our primary platform. We run Claude Code as our business operations hub — not just for coding, but for orchestrating strategy, research, delivery, and knowledge management daily.
OPENAI
GPT models & ChatGPT
OpenAI pioneered the current AI era and is now in full strategic pivot. GPT 5.4 (March 2026) scored 75% on OSWorld — surpassing human performance (72.4%) — at half the price of Opus. In late March 2026, OpenAI renamed its product division to "AGI Deployment," shut down Sora to redeploy compute to Codex, and completed pre-training on a new model codenamed "Spud." CEO Sam Altman narrowed his role to capital and supply chains, putting CEO of Applications Fiji Simo in the driver's seat for product. The message is clear: knowledge work automation is now OpenAI's singular focus.
Examples: GPT 5.4, Codex, o3-pro, ChatGPT Plus
Strengths:
- Largest ecosystem and integrations ($25B ARR)
- GPT 5.4: above-human OSWorld scores at half the price of Opus
- Full strategic pivot to enterprise work AI ("AGI Deployment")
- New "Spud" model with pre-training complete – claimed to "accelerate the economy"
Considerations:
- Major organisational restructuring underway
- Sora discontinued – signals compute scarcity trade-offs
- IPO prospectus-like documents reveal concentration risk (Microsoft dependency)
Our view: The ecosystem leader, now fully focused on work AI. GPT 5.4's price-performance makes multi-model strategies compelling. The "AGI Deployment" rename and Sora shutdown signal that OpenAI sees knowledge work as THE market.
GOOGLE
Gemini models
Google brings deep AI research heritage and integration with Google ecosystem. Gemini models are multimodal from the ground up, with strong reasoning and long context capabilities.
Examples: Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro
Strengths:
- Native multimodality (text, image, video)
- Google ecosystem integration
- Very long context windows
- Strong research foundation
Considerations:
- Availability varies by region
- Enterprise features still maturing
- Less established in coding tasks
Our view: Strong option for multimodal and Google-integrated workflows.
XAI
Grok models
Elon Musk's AI venture, integrated with X (Twitter). Grok models are designed to be more direct and less filtered than competitors, with real-time access to X platform data. In March 2026, SpaceX filed for a $75B IPO with xAI merger implications — if completed, this would create an AI+space conglomerate with massive compute resources. Organisational fragility remains a factor: 9 of the original 11 co-founders have departed, and xAI has undertaken a ground-up infrastructure rebuild.
Examples: Grok 4, Grok 4 Heavy, Grok 4 Fast
Strengths:
- Real-time information from X
- SpaceX IPO filing ($75B target) could unlock massive capital
- Fewer guardrails on sensitive topics
- Fast iteration pace
Considerations:
- Limited enterprise features
- Tied to X ecosystem
- Significant organisational fragility (9 of 11 co-founders departed)
- IPO and merger complexity adds uncertainty
Our view: The SpaceX IPO changes the calculus — if completed, xAI gains access to enormous capital. But organisational fragility and the infrastructure rebuild still introduce real vendor risk. Monitor, but don't depend on.
META
Llama models (open-source)
Meta's open-source approach has democratised access to capable models. Llama can be run locally or on private infrastructure, offering control and privacy that hosted APIs cannot.
Examples: Llama 4 Maverick, Llama 4 Scout, Llama 4 Behemoth (preview)
Strengths:
- Open source and customisable
- Can run locally/privately
- No per-token API costs
- Growing ecosystem
Considerations:
- Requires technical expertise to deploy
- Smaller models than frontier APIs
- Self-managed infrastructure
Our view: Important for privacy-sensitive deployments and organisations with technical capability.
MISTRAL
European AI models
European-founded AI company offering competitive models with strong performance-to-cost ratios. Open-weight models available for self-hosting, with API access for convenience.
Examples: Mistral Large 3, Mistral Medium 3, Mistral Small 3.1
Strengths:
- European data sovereignty option
- Strong price/performance
- Open-weight models available
- Multilingual strength
Considerations:
- Smaller ecosystem than US providers
- Enterprise features still developing
Our view: Good option for European data sovereignty requirements.
DEEPSEEK
Chinese frontier AI at a fraction of the cost
DeepSeek shook the AI industry by producing frontier-competitive models at a fraction of US lab costs. DeepSeek R1's reasoning matches o1-level performance, while V3 rivals GPT-4o — both trained for an estimated $5–6M vs hundreds of millions at US labs. V4, expected mid-2026, will be natively multimodal with a 1M token context window. The efficiency breakthrough forced a rethink across the industry: if models this good can be built this cheaply, the moat is not the model — it's the application layer.
Examples: DeepSeek V3, DeepSeek R1, DeepSeek V4 (upcoming)
Strengths:
- Frontier-competitive reasoning at a fraction of the cost
- Open-weight models available (R1, V3)
- Efficiency breakthroughs in training methodology
- Strong coding and mathematical reasoning
Considerations:
- Chinese company – data sovereignty concerns for some organisations
- API reliability and availability can vary
- Censorship on certain topics (Chinese regulatory compliance)
- Rapidly evolving – model versions shift fast
Our view: The biggest disruption in AI economics since GPT-3. DeepSeek proved that frontier capability doesn't require frontier budgets. Essential for multi-model strategy — particularly for cost-sensitive workloads where R1 or V3 can match more expensive alternatives.
How We Navigate This
With so many options, how do you choose? Here's our approach.
Start with the Problem
Don't start with “what AI should we use?” Start with “what problem are we solving?” The tool follows from the task, not the other way around.
Favour Simplicity
The simplest tool that solves the problem is usually the right choice. Complexity has ongoing costs. Start simple; add sophistication when you hit limits.
Build for Portability
The landscape changes fast. Avoid deep lock-in where you can. Use standards (MCP, OpenAI-compatible APIs) that let you switch if better options emerge.
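To make portability concrete, here is a minimal Python sketch of the pattern: keep the call site identical and swap providers through configuration. The endpoint URLs and model names below are illustrative assumptions, not live recommendations – the point is that any OpenAI-compatible client needs only a different base URL, API key, and model name.

```python
# Provider portability via OpenAI-compatible endpoints: the call site stays
# the same; only configuration changes. URLs and model names are placeholders.
PROVIDERS = {
    "openai":   {"base_url": "https://api.openai.com/v1",   "model": "gpt-5.4"},
    "deepseek": {"base_url": "https://api.deepseek.com/v1", "model": "deepseek-chat"},
    "local":    {"base_url": "http://localhost:8000/v1",    "model": "llama-4-scout"},
}

def client_config(provider: str, api_key: str) -> dict:
    """Return the kwargs an OpenAI-compatible client constructor would need."""
    p = PROVIDERS[provider]
    return {"base_url": p["base_url"], "api_key": api_key, "model": p["model"]}
```

With an OpenAI-compatible client library, the returned dict maps onto the client's constructor arguments, so switching providers becomes a one-line configuration change rather than a rewrite.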
Test with Real Work
Demos impress; production reveals. Before committing, test tools on your actual tasks. What works in a demo may struggle with your specific context.
What's Not Here
Comprehensive Coverage
This isn't a complete market survey. We focus on tools we've used or seriously evaluated. Many good options aren't listed because we haven't worked with them.
Full Enterprise Stack
We cover M365 Copilot and Graph, but not the full enterprise AI stack (Copilot Studio, Power Platform AI, Salesforce Einstein, ServiceNow, etc.). These require enterprise-specific context.
Hardware & Infrastructure
GPU providers, cloud infrastructure (AWS Bedrock, Azure AI, GCP Vertex), and on-premise deployment options. These require infrastructure-specific context beyond this navigator.
Pricing Details
Pricing changes frequently. We mention pricing considerations but don't list specific prices. Check provider websites for current rates.
From Landscape to Practice
Understanding the landscape is step one. Making it work for your organisation is where we help.
Right-Sized Stack
What combination of these tools makes sense for your organisation type?
Adoption Journey
Where are you on the spectrum from locked-out to power user?
Context Engineering
The right information at the right time. How to design systems that give AI what it needs.
Agents & Orchestration
One agent, infinite expertise. Skills-based AI systems that compound value.
AI Skills & Fluency
The bottleneck isn't tools – it's people and culture. Building genuine capability.
Why Timing Matters
The landscape isn't just changing – the pace of change is accelerating.
Clock Speed Reality
Features ship faster than conferences can announce them. More capability is shipping quarterly than organisations used to deliver in 5-6 years of traditional technology change.
Leaders Pulling Ahead
The gap between organisations that “get” AI and those still experimenting is widening – not because the technology is inaccessible (everyone has access now), but because execution speed separates leaders from laggards. Healthcare illustrates the pattern: 81% of US doctors now use AI (double the 2023 figure), but only 17% use it for diagnosis. Adoption starts with documentation and research; core professional judgment comes last. The same pattern is playing out across every sector.
Model Commoditisation
The models themselves are increasingly commoditised. GPT 5.4 matches frontier capability at half the price. Your competitive advantage isn't which model you use – it's how quickly you build capability around it. Multi-model strategies are now the norm: the average user works across 3.5 different models, choosing the right tool for each task. This is an economic imperative, not just a technical preference.
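One way to operationalise a multi-model strategy is a simple task-to-model router: send each class of work to the cheapest tier that handles it well. The tier names and task mapping below are assumptions for the sketch, not recommendations.

```python
# Illustrative task-based routing for a multi-model strategy. The mapping
# encodes a policy: reserve expensive frontier tiers for work that needs them,
# and default everything else to the cheap tier.
DEFAULT_TIER = "fast-cheap"

ROUTES = {
    "summarise-docs": "fast-cheap",          # high volume, low stakes
    "draft-code":     "frontier-coding",     # correctness matters
    "strategy-memo":  "frontier-reasoning",  # deep multi-step reasoning
}

def pick_model(task_type: str) -> str:
    """Return the model tier for a task, falling back to the cheap tier."""
    return ROUTES.get(task_type, DEFAULT_TIER)
```

Even a table this small makes the economics explicit: model choice becomes a reviewable policy rather than an ad-hoc habit, and swapping a tier for a cheaper equivalent is a one-line change.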
Labs Eating the App Layer
AI labs are moving down the stack. Code review, security scanning, meetings – functions that were standalone products are being absorbed into lab offerings. The revenue gap is narrowing (Anthropic $19B vs OpenAI $25B ARR; Cursor at $2B). Platform risk is real: if you build on a capability a lab is likely to bundle, plan accordingly.
Need Help Navigating?
The landscape is overwhelming. We've been navigating it daily. Let's talk about what makes sense for your situation.