AI CAPABILITY • LANDSCAPE

AI Landscape Navigator

The landscape is exploding. The gap between movers and waiters is widening.

More capability ships quarterly than used to ship in years. New models, tools, and platforms appear weekly. Competitive advantage is materialising now – not “someday.”

This living reference helps you understand what's out there, what matters, and where we stand. Not exhaustive – navigational.

A Living Reference

This page is updated as the landscape evolves. It reflects our current understanding and experience, not comprehensive market research. We include tools we've used, evaluated, or tracked closely. Last updated: April 2026.

The AI models powering everything — who builds them, what they’re good at, and how they compare.

ANTHROPIC

Claude models

Anthropic leads in both capability and safety. Claude models excel at nuanced reasoning, coding, and agentic work — operating autonomously across complex multi-step tasks. Claude Code alone generates $2.5B ARR, contributing to Anthropic's $19B ARR (closing fast on OpenAI's $25B). Claude Code supports skills-based orchestration, MCP tool integration, and agent teams that coordinate in parallel.

Examples: Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5

Strengths:

  • Frontier reasoning and agentic capability
  • Claude Code: $2.5B ARR, skills, MCPs, agent teams
  • 1M token context window (Opus 4.6)
  • Safety-conscious design with strong instruction following

Considerations:

  • Premium pricing for frontier models
  • Agent teams feature still experimental
  • Ecosystem growing rapidly but newer than OpenAI

Our view: Our primary platform. We run Claude Code as our business operations hub — not just for coding, but for orchestrating strategy, research, delivery, and knowledge management daily.

OPENAI

GPT models & ChatGPT

OpenAI pioneered the current AI era and is now in full strategic pivot. GPT 5.4 (March 2026) scored 75% on OSWorld — surpassing human performance (72.4%) — at half the price of Opus. In late March 2026, OpenAI renamed its product division to "AGI Deployment," shut down Sora to redeploy compute to Codex, and completed pre-training on a new model codenamed "Spud." CEO Sam Altman narrowed his role to capital and supply chains, putting CEO of Applications Fiji Simo in the driver's seat for product. The message is clear: knowledge work automation is now OpenAI's singular focus.

Examples: GPT 5.4, Codex, o3-pro, ChatGPT Plus

Strengths:

  • Largest ecosystem and integrations ($25B ARR)
  • GPT 5.4: above-human OSWorld score at half the price of Opus
  • Full strategic pivot to enterprise work AI ("AGI Deployment")
  • New "Spud" model with pre-training complete — claimed to "accelerate the economy"

Considerations:

  • Major organisational restructuring underway
  • Sora discontinued — signals compute scarcity trade-offs
  • IPO prospectus-like documents reveal concentration risk (Microsoft dependency)

Our view: The ecosystem leader, now fully focused on work AI. GPT 5.4's price-performance makes multi-model strategies compelling. The "AGI Deployment" rename and Sora shutdown signal that OpenAI sees knowledge work as THE market.

GOOGLE

Gemini models

Google brings a deep AI research heritage and tight integration with the Google ecosystem. Gemini models are multimodal from the ground up, with strong reasoning and long-context capabilities.

Examples: Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro

Strengths:

  • Native multimodality (text, image, video)
  • Google ecosystem integration
  • Very long context windows
  • Strong research foundation

Considerations:

  • Availability varies by region
  • Enterprise features still maturing
  • Less established in coding tasks

Our view: Strong option for multimodal and Google-integrated workflows.

XAI

Grok models

Elon Musk's AI venture, integrated with X (Twitter). Grok models are designed to be more direct and less filtered than competitors, with real-time access to X platform data. In March 2026, SpaceX filed for a $75B IPO with xAI merger implications — if completed, this would create an AI+space conglomerate with massive compute resources. Organisational fragility remains a factor: 9 of the original 11 co-founders have departed, and xAI has undertaken a ground-up infrastructure rebuild.

Examples: Grok 4, Grok 4 Heavy, Grok 4 Fast

Strengths:

  • Real-time information from X
  • SpaceX IPO filing ($75B target) could unlock massive capital
  • Fewer guardrails on sensitive topics
  • Fast iteration pace

Considerations:

  • Limited enterprise features
  • Tied to X ecosystem
  • Significant organisational fragility (9/11 co-founders departed)
  • IPO + merger complexity adds uncertainty

Our view: The SpaceX IPO changes the calculus — if completed, xAI gains access to enormous capital. But organisational fragility and the infrastructure rebuild still introduce real vendor risk. Monitor, but don't depend on.

META

Llama models (open-source)

Meta's open-source approach has democratised access to capable models. Llama can be run locally or on private infrastructure, offering control and privacy that hosted APIs cannot match.

Examples: Llama 4 Maverick, Llama 4 Scout, Llama 4 Behemoth (preview)

Strengths:

  • Open source and customisable
  • Can run locally/privately
  • No per-token API costs
  • Growing ecosystem

Considerations:

  • Requires technical expertise to deploy
  • Smaller models than frontier APIs
  • Self-managed infrastructure

Our view: Important for privacy-sensitive deployments and organisations with technical capability.

MISTRAL

European AI models

European-founded AI company offering competitive models with strong performance-to-cost ratios. Open-weight models available for self-hosting, with API access for convenience.

Examples: Mistral Large 3, Mistral Medium 3, Mistral Small 3.1

Strengths:

  • European data sovereignty option
  • Strong price/performance
  • Open-weight models available
  • Multilingual strength

Considerations:

  • Smaller ecosystem than US providers
  • Enterprise features still developing

Our view: Good option for European data sovereignty requirements.

DEEPSEEK

Chinese frontier AI at fraction of the cost

DeepSeek shook the AI industry by producing frontier-competitive models at a fraction of US lab costs. DeepSeek R1's reasoning matches o1-level performance, while V3 rivals GPT-4o — both trained for an estimated $5–6M vs hundreds of millions at US labs. V4, expected mid-2026, will be natively multimodal with a 1M token context window. The efficiency breakthrough forced a rethink across the industry: if models this good can be built this cheaply, the moat is not the model — it's the application layer.

Examples: DeepSeek V3, DeepSeek R1, DeepSeek V4 (upcoming)

Strengths:

  • Frontier-competitive reasoning at a fraction of the cost
  • Open-weight models available (R1, V3)
  • Efficiency breakthroughs in training methodology
  • Strong coding and mathematical reasoning

Considerations:

  • Chinese company — data sovereignty concerns for some organisations
  • API reliability and availability can vary
  • Censorship on certain topics (Chinese regulatory compliance)
  • Rapidly evolving — model versions shift fast

Our view: The biggest disruption in AI economics since GPT-3. DeepSeek proved that frontier capability doesn't require frontier budgets. Essential for multi-model strategy — particularly for cost-sensitive workloads where R1 or V3 can match more expensive alternatives.

How We Navigate This

With so many options, how do you choose? Here's our approach.

Start with the Problem

Don't start with “what AI should we use?” Start with “what problem are we solving?” The tool follows from the task, not the other way around.

Favour Simplicity

The simplest tool that solves the problem is usually the right choice. Complexity has ongoing costs. Start simple; add sophistication when you hit limits.

Build for Portability

The landscape changes fast. Avoid deep lock-in where you can. Use standards (MCP, OpenAI-compatible APIs) that let you switch if better options emerge.
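One way to keep that portability concrete is to code against the OpenAI-compatible wire format rather than any vendor SDK, so switching providers is a configuration change. A minimal sketch — the base URLs and model names below are illustrative assumptions, not recommendations; check each provider's docs for current values:

```python
# Sketch: provider-agnostic chat request via the OpenAI-compatible format.
# Base URLs and model names are illustrative assumptions.
import json
import urllib.request

PROVIDERS = {
    # provider name -> (base_url, default_model)
    "openai":   ("https://api.openai.com/v1", "gpt-4o"),
    "mistral":  ("https://api.mistral.ai/v1", "mistral-large-latest"),
    "deepseek": ("https://api.deepseek.com/v1", "deepseek-chat"),
}

def build_request(provider: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a /chat/completions request; swapping providers is one string."""
    base_url, model = PROVIDERS[provider]
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# The call site never names a vendor SDK, so changing vendors is a config edit:
req = build_request("deepseek", "Summarise our Q2 pipeline.", api_key="sk-...")
```

Because every provider in the table speaks the same request shape, the switching cost is a dictionary entry rather than a rewrite.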

Test with Real Work

Demos impress; production reveals. Before committing, test tools on your actual tasks. What works in a demo may struggle with your specific context.
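"Test on your actual tasks" can be as simple as a side-by-side harness: run each candidate over a handful of real tasks and score the outputs. A minimal sketch — the stand-in callables and pass/fail judge are placeholders; in practice each model would wrap a real API call and the judge would reflect your own quality bar:

```python
# Sketch: a minimal side-by-side evaluation harness for real work tasks.
# The "models" are stand-in callables, not real API clients.

def evaluate(models, tasks, judge):
    """Run every candidate model over real tasks; return average scores."""
    scores = {name: 0.0 for name in models}
    for task in tasks:
        for name, model in models.items():
            scores[name] += judge(task, model(task["prompt"]))
    # Average over tasks so the numbers are comparable across candidates
    return {name: total / len(tasks) for name, total in scores.items()}

# Stand-in candidates — replace with real API wrappers when evaluating.
models = {
    "model_a": lambda p: p.upper(),
    "model_b": lambda p: p,
}
# Real tasks from your own backlog, each with a simple pass/fail check.
tasks = [
    {"prompt": "draft release notes", "expect": "DRAFT RELEASE NOTES"},
]
judge = lambda task, output: 1.0 if output == task["expect"] else 0.0

print(evaluate(models, tasks, judge))  # higher = better on *your* tasks
```

Even a crude harness like this surfaces the gap between demo performance and performance on your specific context.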

What's Not Here

Comprehensive Coverage

This isn't a complete market survey. We focus on tools we've used or seriously evaluated. Many good options aren't listed because we haven't worked with them.

Full Enterprise Stack

We cover M365 Copilot and Graph, but not the full enterprise AI stack (Copilot Studio, Power Platform AI, Salesforce Einstein, ServiceNow, etc.). These require enterprise-specific context.

Hardware & Infrastructure

We don't cover GPU providers, cloud infrastructure (AWS Bedrock, Azure AI, GCP Vertex), or on-premise deployment options. These require infrastructure-specific context beyond this navigator.

Pricing Details

Pricing changes frequently. We mention pricing considerations but don't list specific prices. Check provider websites for current rates.

Why Timing Matters

The landscape isn't just changing – the pace of change is accelerating.

Clock Speed Reality

Features ship faster than conferences can announce them. More capability is shipping quarterly than organisations used to deliver in 5-6 years of traditional technology change.

Leaders Pulling Ahead

The gap between organisations that “get” AI and those still experimenting is widening. Not because technology is inaccessible – everyone has access now – but because execution speed is separating leaders from laggards. Healthcare illustrates the pattern: 81% of US doctors now use AI (doubled since 2023), but only 17% for diagnosis – adoption starts with documentation and research, while core professional judgment comes last. The same pattern is playing out across every sector.

Model Commoditisation

The models themselves are increasingly commoditised. GPT 5.4 matches frontier capability at half the price. Your competitive advantage isn't which model you use – it's how quickly you build capability around it. Multi-model strategies are now the norm: the average user works across 3.5 different models, choosing the right tool for each task. This is an economic imperative, not just a technical preference.
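In practice a multi-model strategy often starts as a simple routing table: map each task kind to the model tier that fits it, and fall back to a general default. A minimal sketch — the task kinds and model names are illustrative assumptions, not recommendations:

```python
# Sketch: a rule-based task router for a multi-model strategy.
# Task kinds and model names are illustrative placeholders.

ROUTES = {
    "coding":     "frontier-coding-model",  # e.g. a Claude-class model
    "bulk":       "budget-model",           # e.g. a DeepSeek-class model
    "multimodal": "multimodal-model",       # e.g. a Gemini-class model
}
DEFAULT = "general-model"

def route(task_kind: str) -> str:
    """Pick a model per task; cost and capability drive the choice."""
    return ROUTES.get(task_kind, DEFAULT)
```

The point is economic: once models are commoditised, routing cheap tasks to cheap models and hard tasks to frontier models is where the margin lives.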

Labs Eating the App Layer

AI labs are moving down the stack. Code review, security scanning, meeting assistants — functions that were standalone products are being absorbed into lab offerings. The revenue gap is narrowing (Anthropic $19B vs OpenAI $25B ARR; Cursor at $2B). Platform risk is real: if you build on a capability a lab is likely to bundle, plan accordingly.

Need Help Navigating?

The landscape is overwhelming. We've been navigating it daily. Let's talk about what makes sense for your situation.