Hyperscaler AI agent platforms are a double-edged sword

Ran Sheinberg
Co-founder, xpander.ai
Apr 24, 2026
Product

Large enterprise platform engineering teams are shipping AI agents on three clouds simultaneously, and every one of those clouds wants to be the only platform you use. The tension between adopting managed agent services and maintaining architectural independence has become the defining infrastructure decision of 2026. This guide breaks down the problem, defines the requirement set, and evaluates the platforms that claim to solve it.

Summary

Hyperscaler agent services (AWS Bedrock AgentCore, Azure AI Foundry, Gemini Enterprise Agent Platform) give you managed runtimes locked to a single cloud. They handle inference and basic observability but leave versioning, CI/CD, rollback, and cross-cloud governance as your problem. Open-source frameworks like LangGraph and CrewAI solve agent construction but ignore everything after the build step.

An enterprise team operating across clouds needs nine capabilities in a single layer: multi-cloud deployment, multi-model support, multiple build paths, full lifecycle management, multi-agent orchestration, observability, governance, evaluation, and system integration. That layer has to be independent of any single cloud or framework to preserve optionality and centralize operations.

xpander.ai is the platform that meets all nine requirements as a cloud-agnostic standalone deployment, covering the full agent lifecycle from no-code prototyping through production governance.

The Problem: Hyperscaler Agent Services Create Fragmentation, Not Foundation

What Hyperscaler Agent Services Actually Give You

Each hyperscaler has shipped a competent managed runtime for AI agents, and each one ties you to its ecosystem in ways that compound over time.

AWS Bedrock AgentCore provides a fully managed agent runtime with runtime scaling and observability. Agent definitions, memory, and tool integrations are stored in AWS-native services (S3, DynamoDB, Lambda), and the build path is code-first only via Python SDK. There is no native CI/CD for agents, no semantic versioning, and no rollback capability.

Azure AI Foundry benefits from Microsoft's OpenAI partnership and integrates tightly with Azure Active Directory and Azure Monitor. The build path supports Python and .NET, with partial lifecycle coverage through observability and evaluation tooling. Teams already deep in the Microsoft ecosystem (M365, Copilot, Azure AD) will find the integration convenient, but portability to another cloud requires a rewrite.

Gemini Enterprise Agent Platform (formerly Vertex AI, rebranded at Google Cloud Next in April 2026) uses Google's Agent Development Kit (ADK) as its build path. It offers runtime management and observability on GCP. Of the three hyperscalers, Gemini Enterprise Agent Platform carries the deepest cloud lock-in due to Gemini model dependency and GCP-native storage and compute requirements.

All three share the same structural gaps: no multi-cloud portability, no agent versioning or rollback, no canary or blue-green deployment support, and no cross-cloud governance model.

The Fragmentation Tax

Large enterprises typically operate across two or three hyperscalers. Business units adopt different agent services independently, each with its own deployment model, access control system, and monitoring stack.

The cost is organizational, not just technical. Every new use case requires rebuilding the deployment, monitoring, and governance layer from scratch. There is no shared operational model across teams, no consistent way to audit agent behavior, and no mechanism to move an agent from one cloud to another without starting over.

Platform engineering teams end up maintaining parallel infrastructure for the same category of workload. That duplication scales linearly with every team that ships agents.

Why Frameworks Alone Don't Solve It

LangGraph and CrewAI are strong agent construction tools. Both provide orchestration primitives, tool-use abstractions, and multi-step reasoning patterns that accelerate the build phase.

Neither provides deployment pipelines, rollback mechanisms, observability infrastructure, or governance controls. As xpander.ai's analysis of the framework gap puts it: "Frameworks build agent logic. Platforms run agents safely at scale."

Teams spend months building on these frameworks before confronting the production gap. CrewAI offers a basic visual builder and LangGraph has a maturing Platform tier, but both leave you to bring your own infrastructure for everything beyond the agent code itself.

What a True Enterprise AI Foundation Layer Looks Like

The following nine capabilities represent the requirement set for a platform that can serve as a shared foundation across teams, clouds, and use cases. These are framed as requirements, not features, because the gap between "we offer observability" and "we cover the full agent lifecycle" is where most platforms fall short.

Multi-Cloud and Self-Hosted Deployment

Agents must run on AWS, Azure, GCP, or any Kubernetes cluster, including VPC and air-gapped environments. A platform that requires a specific cloud's compute or storage layer fails this requirement, regardless of how well it performs within that cloud.

Multi-Model and Multi-Framework Support

Teams need to swap models (GPT-4, Claude, Gemini, open-weight alternatives) without rewriting agent logic. The platform should integrate with existing frameworks like LangGraph and CrewAI rather than forcing a replacement.
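To make the "swap models without rewriting agent logic" requirement concrete, here is a minimal, hypothetical sketch: the agent calls models only through a uniform interface keyed by a config value, so switching providers is a one-line change. The registry entries are stand-in callables, not real provider SDK calls.

```python
# Hypothetical sketch: agent logic written against a uniform model interface,
# so the underlying provider can be swapped via configuration alone.
from dataclasses import dataclass
from typing import Callable

# In a real platform each entry would wrap a provider SDK (OpenAI, Anthropic,
# Vertex, or a self-hosted endpoint); here each is an illustrative stand-in.
MODEL_REGISTRY: dict[str, Callable[[str], str]] = {
    "gpt-4": lambda prompt: f"[gpt-4] {prompt}",
    "claude": lambda prompt: f"[claude] {prompt}",
    "local-llama": lambda prompt: f"[local-llama] {prompt}",
}

@dataclass
class Agent:
    model_id: str  # the only thing that changes when you swap models

    def run(self, task: str) -> str:
        complete = MODEL_REGISTRY[self.model_id]
        return complete(f"Plan and execute: {task}")

# Swapping models is a config change; the agent logic is untouched.
print(Agent(model_id="gpt-4").run("summarize Q3 incidents"))
print(Agent(model_id="claude").run("summarize Q3 incidents"))
```

The same indirection is what lets a platform offer open-weight or private models behind the identical agent definition.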

SDK and Visual Builder Development

Code-first engineers and no-code domain experts need to work on the same platform. A handoff path between visual prototyping and production code prevents the common pattern where a domain expert's prototype gets rebuilt from scratch by engineering.

Versioning, CI/CD, and Lifecycle Controls

Agents are software. They need semantic versioning, canary deployments, blue-green rollouts, automated rollback on health-check failure, hot-reload of prompts and models, and Git integration. Without these controls, promoting an agent from staging to production is a manual, error-prone process.
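The canary-plus-rollback control described above can be sketched in a few lines. This is an illustrative pattern, not any platform's actual API: route a slice of traffic to the new version and promote it only if its health checks pass.

```python
# Illustrative sketch of canary routing with automated rollback on
# health-check failure. AgentVersion, route, and promote_or_rollback
# are hypothetical names for the pattern, not a real platform API.
import random

class AgentVersion:
    def __init__(self, version: str, healthy: bool):
        self.version = version
        self.healthy = healthy  # simulated health state

    def health_check(self) -> bool:
        return self.healthy

def route(stable: AgentVersion, canary: AgentVersion, canary_pct: float) -> AgentVersion:
    """Send canary_pct of requests to the canary, the rest to stable."""
    return canary if random.random() < canary_pct else stable

def promote_or_rollback(stable: AgentVersion, canary: AgentVersion, checks: int = 5) -> AgentVersion:
    """Promote the canary only if every health check passes."""
    if all(canary.health_check() for _ in range(checks)):
        return canary  # canary becomes the new stable version
    return stable      # automated rollback: keep serving the stable version

stable = AgentVersion("1.4.2", healthy=True)
canary = AgentVersion("1.5.0", healthy=False)  # simulated bad release
active = promote_or_rollback(stable, canary)
print(active.version)  # the bad canary never takes over
```

Without the platform providing this loop natively, every team reimplements it, usually badly, in deploy scripts.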

Multi-Agent and Long-Running Task Orchestration

Complex workflows require stateful execution with checkpointing, retries, and human-in-the-loop pause/resume. Agents need to coordinate across non-linear graphs, not just sequential chains.
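The checkpoint-and-resume semantics above can be illustrated with a toy runner (hypothetical, not a specific framework's API): each completed step is checkpointed, a step requiring approval pauses the task, and a later call resumes from the checkpoint rather than restarting.

```python
# Illustrative sketch of stateful execution with checkpointing and a
# human-in-the-loop pause point. In production the checkpoint store
# would be durable (a database), not an in-memory dict.
checkpoints: dict[str, int] = {}  # task_id -> index of last completed step

def run_task(task_id: str, steps: list, approvals: set):
    """Resume from the last checkpoint; pause when a step needs approval."""
    start = checkpoints.get(task_id, 0)
    for i in range(start, len(steps)):
        name, needs_approval = steps[i]
        if needs_approval and name not in approvals:
            return ("paused", name)    # wait for a human to approve this step
        checkpoints[task_id] = i + 1   # checkpoint after each completed step
    return ("done", None)

steps = [("draft", False), ("send_email", True), ("log", False)]
print(run_task("t1", steps, approvals=set()))           # pauses at send_email
print(run_task("t1", steps, approvals={"send_email"}))  # resumes past the checkpoint
```

Note that the resumed call does not re-run "draft": that is the difference between checkpointed execution and simply retrying the whole workflow.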

Observability, Logging, and Health Monitoring

Execution tracing, health checks, automated rollback triggers, and audit logging must span all agents in the fleet. Observability that lives inside a single cloud's monitoring stack is insufficient when agents run across multiple environments.

Governance, Security, and Access Controls

Infrastructure-level isolation (VPC, air-gap) should serve as the primary control layer. Application-level guardrails, permissions, and audit trails are important but secondary. A governance model built entirely on application-layer policies is weaker than one rooted in infrastructure boundaries.

Evaluation and Testing

Evaluation needs to be built into the delivery lifecycle, triggered as part of CI/CD, not bolted on after an agent is already in production. Pre-deployment testing is table stakes; continuous evaluation in production is the harder requirement.
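A CI-integrated evaluation gate can be as simple as the following sketch (all names hypothetical): score the candidate agent against a fixed case suite and fail the pipeline when the score drops below a threshold.

```python
# Hedged sketch of an evaluation gate wired into CI: block promotion when
# the candidate agent's score on a fixed test suite falls below threshold.
# Real eval harnesses use graded rubrics or LLM judges; substring matching
# here is a deliberate simplification.
from typing import Callable

def evaluate(answer_fn: Callable[[str], str], cases: list) -> float:
    """Fraction of cases where the agent's answer contains the expected string."""
    passed = sum(1 for q, expected in cases if expected in answer_fn(q))
    return passed / len(cases)

def ci_gate(answer_fn: Callable[[str], str], cases: list, threshold: float = 0.9) -> bool:
    """Return False to fail the CI stage and block the deploy."""
    return evaluate(answer_fn, cases) >= threshold

cases = [("capital of France?", "Paris"), ("2+2?", "4")]
good_agent = lambda q: "Paris" if "France" in q else "4"
print(ci_gate(good_agent, cases))  # True -> promotion allowed
```

The point is where the gate runs: inside the delivery pipeline, before promotion, rather than as a dashboard consulted after an incident.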

Enterprise System and Channel Integration

Agents must be invocable from APIs, SDKs, MCP, webhooks, Slack, Teams, CI/CD pipelines, cron triggers, and other agents. A platform that only supports one invocation pattern limits where agents can be deployed.
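The multi-channel requirement usually reduces to one agent entry point with thin adapters per channel. A hypothetical sketch (adapter names and payload shapes are illustrative):

```python
# Hypothetical sketch: one agent entry point, many invocation channels.
# Each adapter normalizes its channel's payload into the same request shape,
# so webhook, chat, and scheduled invocations all hit identical agent logic.
def handle(request: dict) -> str:
    """The single agent entry point, shared by every channel."""
    return f"agent handled '{request['task']}' from {request['channel']}"

def from_webhook(body: dict) -> str:
    return handle({"task": body["event"], "channel": "webhook"})

def from_slack(message: str) -> str:
    # Strip the slash-command prefix before handing off to the agent.
    return handle({"task": message.removeprefix("/agent "), "channel": "slack"})

def from_cron(job_name: str) -> str:
    return handle({"task": job_name, "channel": "cron"})

print(from_webhook({"event": "pr.opened"}))
print(from_slack("/agent triage incident 42"))
print(from_cron("nightly-report"))
```

A platform that bakes in these adapters spares every team from rewriting the normalization layer per channel.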

Why the Abstraction Layer Has to Be Independent

The Lock-in Calculus

An abstraction layer provided by a hyperscaler is not an abstraction layer. It is a deeper integration point. The platform that sits above your clouds and frameworks must be independent of all of them, or it simply moves the lock-in one level up the stack.

Analysis from stepto.net (April 2026) frames the strategic calculus directly: "The organizations that built abstraction layers in 2025 and 2026 are on a different trajectory. The ones that did not are on a different trajectory: deepening dependency, increasing exit cost." Organizations that defer this decision do not stay in the same position; they accumulate switching costs that grow with every agent deployed on a cloud-locked service.

One Operational Model Across Teams

The organizational payoff of an independent platform layer is consistency. Shared deployment patterns mean a platform engineering team can define canary rollout policies once and apply them across every agent, regardless of which cloud hosts the compute.

Shared governance means one audit log, one access control model, one set of cost guardrails. Shared observability means a single view of agent health across AWS, Azure, and GCP workloads. Without a unifying layer, each of these concerns gets solved independently by every team, at considerable cost in engineering time and organizational coordination.

xpander.ai: Built for This Requirement Set

xpander.ai is a full-lifecycle AI agent platform that functions as an internal developer platform (IDP) for agents. It covers build, deploy, govern, monitor, and iterate in a single product, without depending on any hyperscaler's infrastructure stack.

Best for: Enterprise platform engineering teams that need one operational model for AI agents across multiple clouds, frameworks, and team skill levels.

Deployment Flexibility

xpander.ai deploys as a standalone application across AWS, Azure, GCP, and any Kubernetes cluster. VPC deployment keeps agent execution inside the customer's network perimeter. Air-gapped environments are supported natively, not through workarounds.

This is a meaningful distinction from hyperscaler services. xpander.ai's multi-cloud capability is architectural, not a marketing claim about API compatibility. The same agent definition, the same deployment pipeline, and the same governance model work identically regardless of where the compute runs. Private LLM support ensures data stays within the customer's perimeter even when using self-hosted models.

Agent Lifecycle as Software Lifecycle

xpander.ai treats agent lifecycle management with the same rigor that platform teams expect from application deployment. Semantic versioning tracks changes to configuration, prompts, tools, and code as a single versioned artifact. Canary deployments and blue-green rollouts let teams promote new agent versions gradually, with automated rollback triggered by health-check failures.

Hot-reload of prompts and models allows changes without full redeployment. CI/CD integration through Git means agent updates flow through the same pipelines as application code. No other platform in the comparison table offers this full set of lifecycle controls as native capabilities.

Orchestration Depth

Complex agent workflows in xpander.ai run as stateful, long-running tasks with checkpointing and retries. Human-in-the-loop pause/resume is built into the runtime, not implemented as a custom workaround. Non-linear multi-agent graphs support branching, merging, and parallel execution patterns that sequential chains cannot express.

xpander.ai supports both adaptive agents (where the runtime determines the execution path) and deterministic AI workflows (where the path is predefined). Agents are invocable from APIs, SDKs, MCP, webhooks, Slack, Teams, CI/CD pipelines, cron triggers, and other agents, making them composable building blocks rather than isolated endpoints.

Governance by Infrastructure, Not Just Policy

xpander.ai's governance model starts at the infrastructure layer. Self-deployment, air-gapped environments, and VPC isolation provide the primary security boundary. Application-level controls (permissions, guardrails, monitoring, approvals, audit logging, cost controls) layer on top of that foundation.

The 100+ pre-built specialized agents in xpander.ai ship with scoped operations and cost guardrails by default. Audit logging covers every agent interaction. This two-tier approach (infrastructure isolation first, application-level policy second) is a stronger security posture than relying solely on application-layer guardrails.

Build Path for Every Team

xpander.ai's Agent Studio provides a no-code visual builder for domain experts to prototype and build agents. Low-code and code-first SDK paths serve engineering teams, with full support for LangGraph, CrewAI, and other frameworks. The handoff model is explicit: a domain expert builds a working agent in Agent Studio, then an engineer integrates it via API into a product surface.

No manual data mapping is required between build paths. An agent built visually has the same runtime characteristics as one built in code, and both can be versioned, deployed, and governed through the same pipeline.

Pros:

  • Cloud-agnostic by architecture. Runs on AWS, Azure, GCP, any Kubernetes cluster, VPC, or air-gapped environment as a native standalone deployment.

  • Full lifecycle management. Semantic versioning, canary deployments, blue-green rollouts, automated rollback, and CI/CD integration are first-class platform capabilities, not add-ons.

  • Three build paths in one platform. No-code Agent Studio, low-code, and code-first SDK with framework support (LangGraph, CrewAI) serve different team profiles without fragmenting the operational model.

  • Infrastructure-first governance. VPC isolation and air-gap support provide a stronger security boundary than application-layer-only controls.

  • Broad invocation surface. API, SDK, MCP, webhooks, Slack, Teams, CI/CD pipelines, cron triggers, and agent-to-agent calls make agents composable across systems.

Cons:

  • Independent platform to operate. Teams must run and maintain xpander.ai as a separate platform layer, which adds operational overhead compared to using a hyperscaler's fully managed service.

  • Newer market presence. As an independent vendor, xpander.ai carries adoption risk relative to hyperscaler services backed by AWS, Microsoft, or Google.

Platform Comparison

| Platform | Cloud Dependency | Build Path | Lifecycle Management |
|---|---|---|---|
| xpander.ai | Cloud-agnostic (AWS, Azure, GCP, any K8s) | No-code, low-code, code-first | Full (versioning, rollback, CI/CD, canary) |
| AWS Bedrock AgentCore | AWS-locked | Code-first only | Partial (runtime scaling, observability) |
| Azure AI Foundry | Azure-locked | Code-first (Python/.NET) | Partial (observability, evaluation) |
| Gemini Enterprise Agent Platform | GCP-locked | Code-first via ADK | Partial (runtime, observability) |
| CrewAI | Bring your own infra | Visual builder + Python | None (DIY) |
| LangGraph | Bring your own infra | Code-first + visual debugger | Minimal (Platform tier, maturing) |

Getting Started

If your organization runs AI workloads across more than one cloud, or expects to within the next 12 months, the abstraction layer decision is already in front of you. Every agent deployed on a cloud-locked service without a portability path increases the cost of that future decision.

Start by mapping your current agent efforts across teams. Count the clouds, count the frameworks, count the teams solving deployment and governance independently. That inventory will tell you whether you need a unifying platform layer or whether a single hyperscaler's service is genuinely sufficient.

For teams ready to evaluate xpander.ai against this requirement set, xpander.ai's platform documentation covers the architecture in detail. Request a deployment walkthrough scoped to your infrastructure, your clouds, and your governance requirements.

    The AI Agent Platform
    for Enterprise Teams

    Connect agents to any enterprise system. Deploy on any cloud. Orchestration, security, and observability built in.

    All features ・No credit card

    © xpander.ai 2026. All rights reserved.
