How to Deploy AI Agents on Your Enterprise Data Stack

Ran Sheinberg
Co-founder, xpander.ai
Apr 4, 2026
Product

Your data team is probably already running AI agents — or evaluating them seriously. The tooling question has shifted from "should we?" to "how do we connect this to everything we already have?" That's harder than it sounds, and the vendor pitch decks are not helping.

This article covers the architecture, the real trade-offs, and what production actually requires. Skip to the section you need.

1. The Shift: Why Data Teams Are Building AI Agents Now

From consumers to builders

For most of the past decade, data teams were consumers of AI — building dashboards, writing reports, running models that other people used. The tooling assumed humans in the loop at every decision point: a data engineer writes the query, a data scientist interprets the results, an analyst builds the visualization.

That assumption is being actively dismantled. LLMs can now write syntactically correct, semantically reasonable SQL against schemas they've never seen before. They can reason about data quality issues — not just flag them, but explain why a sudden spike in null values probably means an upstream schema change, not a pipeline bug. The capability gap between "AI that assists" and "AI that acts" closed fast.

The vendor rush and its limits

Every major data platform vendor saw the same opening and launched their own AI agent product. Snowflake shipped Cortex Analyst and then Snowflake Intelligence. Databricks launched Genie. Google released Gemini-powered agents for BigQuery. Each is genuinely impressive — within its own platform.

The problem is obvious to anyone who runs real enterprise infrastructure: these are walled gardens. Snowflake's agent can query your Snowflake warehouse but has no visibility into your Databricks lakehouse. Databricks Genie understands your Unity Catalog lineage perfectly but goes blind the moment a dataset lives in BigQuery or a Postgres operational database.

No enterprise runs a single data platform

A 5,000-person company running a "modern data stack" in 2025 typically has: one or two cloud data warehouses (often including a holdover from an acquisition), a data lakehouse for ML workloads, multiple BI tools serving different business units, Airflow or Prefect for orchestration, dbt for transformation, and half a dozen operational databases that engineers query directly when things go wrong.

That's not tech debt — that's the natural outcome of a decade of growth, acquisitions, and best-of-breed tool selection. Any AI agent strategy that assumes a single data platform will fail immediately in production.

2. The Architecture: What a Data-Connected AI Agent Actually Looks Like

The four layers

A production-grade data AI agent has four distinct layers, and conflating them is where most DIY builds go wrong.

LLM reasoning layer. This is the model (GPT-4o, Claude, Gemini, Llama) doing the actual reasoning — generating SQL, interpreting query results, deciding which tool to call next, determining when to escalate to a human. The model itself is almost commodity at this point; how you orchestrate it is not.

Tool/connector layer. This is how the agent talks to your actual systems — warehouses, BI tools, orchestrators, alerting channels. Connectors need to handle authentication, query translation, result serialization, and error handling. Each connector is essentially a small integration project. More on why this becomes expensive below.

Governance layer. Every action the agent takes — every query it runs, every result it reads, every alert it sends — needs to be logged, attributable, and reviewable. This isn't optional for regulated industries; it's increasingly expected everywhere. The governance layer intercepts agent actions before they execute and records them after.

Memory/context layer. Agents operating on your data stack need persistent context: schema knowledge, historical query patterns, user preferences, incident history. A stateless agent that forgets what it learned about your Snowflake schema between sessions is much less useful than one that accumulates institutional knowledge over time.
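To make the layering concrete, here's a minimal Python sketch of how the four layers compose. Every name here (`Agent`, `Memory`, the stub `reason` function) is hypothetical for illustration, not a real platform API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Memory:                      # memory/context layer: persists across runs
    schema_notes: dict = field(default_factory=dict)

@dataclass
class Agent:
    reason: Callable[[str, dict], str]          # LLM reasoning layer
    connectors: dict                            # tool/connector layer
    audit: list = field(default_factory=list)   # governance layer
    memory: Memory = field(default_factory=Memory)

    def act(self, task: str) -> str:
        tool = self.reason(task, self.memory.schema_notes)  # model picks a tool
        self.audit.append({"task": task, "tool": tool})     # log before executing
        return self.connectors[tool](task)                  # execute via connector

agent = Agent(
    reason=lambda task, ctx: "warehouse",       # stub: always pick the warehouse
    connectors={"warehouse": lambda q: f"ran: {q}"},
)
print(agent.act("count yesterday's orders"))    # → ran: count yesterday's orders
```

The shape is the point: the reasoning layer decides, the connector layer executes, governance records every decision, and memory is the only piece that survives between runs.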

The connector challenge in practice

Connectors sound simple. They are not. To connect an agent to Snowflake, you need: a driver or REST API call, credential management (how does the agent authenticate?), query limits (what tables can it access?), result handling (how do you serialize a 10,000-row result back to the agent without blowing your context window?), and error handling when queries time out or permissions are denied.

Now multiply that by twelve data systems, each with different auth models, query interfaces, and failure modes.
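To make the result-serialization problem concrete, here's a minimal sketch of a result envelope that caps rows and tells the model what it is not seeing. The `serialize_result` name and the 50-row default are illustrative assumptions, not any platform's API:

```python
def serialize_result(rows: list[dict], max_rows: int = 50) -> dict:
    """Return at most max_rows rows plus metadata the agent can reason over."""
    payload = {
        "total_rows": len(rows),
        "truncated": len(rows) > max_rows,
        "rows": rows[:max_rows],
    }
    if payload["truncated"]:
        # Tell the model what was cut so it can choose to refine the query.
        payload["note"] = (
            f"showing {max_rows} of {len(rows)} rows; narrow the query to see more"
        )
    return payload

big = [{"id": i} for i in range(10_000)]
out = serialize_result(big)
print(out["total_rows"], out["truncated"], len(out["rows"]))  # → 10000 True 50
```

Returning the truncation note as data, rather than silently dropping rows, is what lets the agent decide to re-query instead of reasoning over an incomplete picture.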

Authentication patterns

Agents need credentials, and those credentials need to follow the same policies as human access. The patterns that work in production:

  • IAM roles (AWS/GCP/Azure): Agents running inside a VPC or cloud environment should assume IAM roles with scoped permissions. No static credentials in environment variables.

  • OAuth2 with service accounts: For BI tools and SaaS platforms, OAuth2 with short-lived tokens and automatic rotation. Service accounts should be purpose-specific — one for the monitoring agent, one for the query agent — not a shared "AI agent" account with broad access.

  • Warehouse-native access controls: In Snowflake, Databricks, and BigQuery, agents should operate as a named service principal that has been granted specific roles — read-only on production tables, no access to PII unless explicitly scoped.

The antipattern is giving your agent admin credentials "to make it easier." That's how you end up with an agent that accidentally drops a table.
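A cheap way to enforce that posture is to refuse to start an agent whose credentials look too broad. A hedged sketch, where the role and privilege names are illustrative rather than any specific warehouse's role model:

```python
# Roles an agent should never run as (illustrative list).
FORBIDDEN = {"ACCOUNTADMIN", "SYSADMIN", "OWNER", "ADMIN"}

def validate_agent_role(role: str, granted_privileges: set[str]) -> None:
    """Raise at startup if the agent's credentials exceed read-only scope."""
    if role.upper() in FORBIDDEN:
        raise PermissionError(f"agent must not run as {role}")
    # Anything beyond read/metadata privileges is a misconfiguration.
    writes = granted_privileges - {"SELECT", "USAGE", "MONITOR"}
    if writes:
        raise PermissionError(f"unexpected write privileges: {sorted(writes)}")

validate_agent_role("MONITOR_AGENT_RO", {"SELECT", "USAGE"})   # ok, boots normally
try:
    validate_agent_role("ACCOUNTADMIN", {"SELECT"})
except PermissionError as e:
    print(e)   # → agent must not run as ACCOUNTADMIN
```

Failing loudly at startup is cheaper than discovering the scope problem in an audit six months later.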

The architecture

The governance layer isn't a step in the pipeline — it's a cross-cutting concern that intercepts every connector call, regardless of which system is being accessed.
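One way to implement that interception in Python is a decorator applied to every connector function. This is a sketch of the pattern, not any platform's actual governance hook:

```python
import functools
import time

AUDIT_LOG: list[dict] = []   # in production: an append-only, queryable store

def governed(system: str):
    """Decorator that records a connector call whether it succeeds or fails."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            entry = {"system": system, "call": fn.__name__,
                     "args": repr(args), "ts": time.time()}
            try:
                result = fn(*args, **kwargs)
                entry["status"] = "ok"
                return result
            except Exception as e:
                entry["status"] = f"error: {e}"
                raise
            finally:
                AUDIT_LOG.append(entry)   # logged even when the call raises
        return inner
    return wrap

@governed("snowflake")
def run_query(sql: str) -> str:
    return f"result of {sql}"   # stand-in for a real warehouse call

run_query("SELECT 1")
print(AUDIT_LOG[-1]["status"])  # → ok
```

Because the decorator wraps the connector rather than the agent, it applies uniformly no matter which system is being accessed — exactly the cross-cutting property described above.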

3. Five Production Workflows That Actually Work

These aren't hypotheticals. They're the workflows data teams consistently deploy first because they have clear ROI and bounded blast radius.

Workflow 1: Pipeline Monitoring and Alerting

What it does: The agent watches your dbt Cloud runs and Airflow DAGs continuously. When a failure occurs, it doesn't just fire a generic "DAG failed" alert — it pulls the actual error logs, queries upstream tables to check for data freshness issues, checks whether this DAG failed last week at the same time, and then sends a Slack message to the on-call engineer with a root cause hypothesis and the three most likely fixes.

Connectors needed: dbt Cloud API, Airflow REST API or Astronomer API, Snowflake/Databricks (for upstream data checks), Slack.

Business impact: Mean time to resolution (MTTR) for pipeline incidents drops from 45-90 minutes to under 15 minutes. The agent handles the first 20 minutes of investigation automatically — checking logs, tracing lineage, ruling out obvious causes — so the human engineer who gets paged is handed context, not a blank slate.

Why it works: Pipeline failures are high-frequency, high-stakes, and follow predictable patterns. The agent gets better at diagnosis over time as it accumulates incident history.
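The first-pass diagnosis can start as simple pattern matching over the error log, with the LLM reserved for failures no pattern covers. A toy sketch — the patterns and hypotheses here are illustrative, and a real deployment would grow them from accumulated incident history:

```python
import re

# (pattern over the raw error log, root-cause hypothesis) — illustrative pairs
HYPOTHESES = [
    (r"permission denied|403", "credential or grant change upstream"),
    (r"timeout|timed out", "warehouse contention or query regression"),
    (r"column .* does not exist|invalid identifier", "upstream schema change"),
    (r"duplicate key|unique constraint", "source emitted overlapping batches"),
]

def triage(log: str) -> str:
    """Return the first matching hypothesis, or escalate."""
    for pattern, hypothesis in HYPOTHESES:
        if re.search(pattern, log, re.IGNORECASE):
            return hypothesis
    return "no known pattern; escalate with full log"

print(triage("ERROR: column order_ts does not exist in stg_orders"))
# → upstream schema change
```

The structured match handles the predictable majority; the interesting design question is what context to hand the model for the long tail.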

Workflow 2: Automated Data Quality

What it does: The agent runs configurable quality checks across multiple warehouses — row count validation, schema drift detection, referential integrity checks, statistical distribution monitoring. When it finds an anomaly, it creates a Jira ticket with the full context: which table, which metric, what the expected value was, what it found, and which downstream dashboards are likely affected.

Connectors needed: Snowflake, BigQuery, Databricks (or whichever warehouses you run), Jira, optionally Great Expectations or Monte Carlo if you already use them.

Business impact: Data quality issues surface hours before they affect business dashboards, instead of being discovered when a VP notices a revenue number looks wrong. One enterprise data team reported catching 85% of data quality issues before business users noticed them, up from ~30% caught proactively before deploying the agent.

Why it works: Quality checks are rule-based and schedulable. The LLM adds value in interpreting anomalies and generating the right ticket context — not in replacing the check logic itself.
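The null-spike check mentioned earlier reduces to comparing today's rate against a rolling baseline. A minimal sketch, assuming a simple mean/standard-deviation threshold (the 3-sigma default is an illustrative choice, not a recommendation):

```python
def null_spike(history: list[float], today: float, tolerance: float = 3.0) -> bool:
    """True if today's null rate sits far above the historical mean."""
    mean = sum(history) / len(history)
    var = sum((x - mean) ** 2 for x in history) / len(history)
    std = var ** 0.5 or 0.01   # floor, so a perfectly flat history can still alert
    return (today - mean) / std > tolerance

# ~1% nulls every day, then 18% today: probably an upstream schema change.
print(null_spike([0.01, 0.012, 0.009, 0.011], 0.18))  # → True
```

The check itself stays deterministic and schedulable; the agent's contribution is interpreting the hit and writing the Jira ticket with downstream context.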

Workflow 3: Cross-Platform SQL Optimization

What it does: The agent periodically pulls slow query logs from both Snowflake and BigQuery (or whatever combination you run), analyzes execution plans, identifies the patterns causing performance issues — missing clustering keys, inefficient joins, unnecessary full table scans — and produces specific optimization recommendations with before/after query rewrites.

Connectors needed: Snowflake Query History / Information Schema, BigQuery INFORMATION_SCHEMA.JOBS, optionally Databricks query history.

Business impact: Warehouse compute costs are often dominated by a small number of expensive queries — the 20% of queries consuming 80% of credits. An agent working through slow query logs systematically catches optimization opportunities that fall through the cracks when engineers are busy. Teams running this workflow report 15-30% reductions in warehouse compute spend within the first 90 days.

Why it works: SQL optimization is pattern-matching work with clear success criteria. The agent doesn't need to understand your business domain — it needs to understand query execution, which is learnable from documentation and examples.
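Before involving the LLM at all, cheap heuristics over the slow-query log catch the obvious offenders. A sketch — the checks and thresholds are illustrative, and real execution-plan analysis would use the warehouse's own plan output:

```python
def review(sql: str, bytes_scanned: int, bytes_returned: int) -> list[str]:
    """Cheap static checks over one slow-query log entry."""
    findings = []
    s = sql.upper()
    if "SELECT *" in s:
        findings.append("selects all columns; project only what is needed")
    if " WHERE " not in f" {s} ":
        findings.append("no WHERE clause; likely a full table scan")
    if bytes_returned and bytes_scanned / bytes_returned > 1000:
        findings.append("scan/return ratio > 1000x; add partition or cluster pruning")
    return findings

for f in review("SELECT * FROM events", 5_000_000_000, 2_000_000):
    print("-", f)
```

Heuristics triage the log; the model then does the part heuristics can't — producing the before/after rewrite for the queries that matter.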

Workflow 4: Self-Serve Dashboard Creation

What it does: A business user describes in natural language what they want to see: "I need a dashboard showing weekly revenue by region compared to the same period last year, with a breakdown by product line." The agent introspects the relevant schemas, generates the SQL, creates the dashboard in Looker, PowerBI, or Tableau, and sends a link back to the user with a brief explanation of the data sources used.

Connectors needed: Snowflake/BigQuery (for schema introspection and query execution), Looker API or PowerBI REST API or Tableau Server REST API, Slack or Teams (for the user interface).

Business impact: Reduces the BI backlog for standard reporting requests, which at many companies runs 2-4 weeks. Business users get answers in hours instead of weeks; data engineers stop spending 30% of their time on simple dashboard requests.

Why it works: Standard reporting requests follow predictable patterns. The hardest part — understanding which tables contain which data — is solvable with good schema documentation and a few-shot examples of past queries.

Workflow 5: Cost Anomaly Detection

What it does: The agent monitors warehouse spend across all platforms — Snowflake credit consumption, BigQuery slot/byte billing, Databricks DBU usage — and fires an alert when spend deviates significantly from baseline. It identifies the specific query, user, or job responsible, checks whether it's a known scheduled job or an unexpected ad-hoc query, and escalates to the relevant team lead with enough context to take action.

Connectors needed: Snowflake ACCOUNT_USAGE views, BigQuery billing export, Databricks cost APIs, Slack/PagerDuty.

Business impact: Cloud data warehouse costs can spike dramatically from a single poorly-written query or a misconfigured scheduled job. One manufacturing company caught a BigQuery job scanning 50TB per run (instead of the expected 50GB) within 20 minutes of the first run — saving approximately $4,000 per day. Without the agent, it would have run for a week before someone noticed in the monthly billing review.

Why it works: Cost anomaly detection is numerical threshold monitoring plus attribution — exactly the kind of structured reasoning LLMs handle well when connected to the right data sources.
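The core of the detection is just a baseline comparison; the agent's value is in the attribution and escalation that follow. A minimal sketch, where the trailing window and 2.5x multiplier are assumed defaults, not tuned recommendations:

```python
def spend_anomaly(trailing_days: list[float], today: float,
                  multiplier: float = 2.5) -> bool:
    """True if today's spend far exceeds the trailing-window average."""
    baseline = sum(trailing_days) / len(trailing_days)
    return today > baseline * multiplier

week = [210.0, 195.0, 205.0, 220.0, 198.0, 202.0, 215.0]  # daily credits/dollars
print(spend_anomaly(week, 212.0))   # → False (normal day)
print(spend_anomaly(week, 900.0))   # → True  (runaway job)
```

Once the threshold trips, the agent's real work begins: querying usage views to name the query, user, or job responsible before paging anyone.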

4. Build vs. Buy: Framework vs. Platform

When building makes sense

If you have a single data warehouse, a team with multiple ML engineers, a narrow use case (one workflow, not five), and the appetite to own the full stack including connector maintenance, auth rotation, and observability — building with LangChain or a similar framework is viable. You'll have full control and can move fast on the initial implementation.

This describes roughly 15% of enterprises that reach out about AI agents. For the other 85%, the build path leads to a maintenance problem that arrives about six months in.

Where DIY frameworks break

LangChain is the most popular starting point. It's flexible, well-documented, and has a large ecosystem. It's also DIY on everything that matters for enterprise deployment: you're writing your own connectors, your own auth management, your own governance hooks, your own observability. "Flexible" means "you own it."

CrewAI is excellent for multi-agent orchestration and has good developer ergonomics. It doesn't have enterprise governance primitives. There's no concept of audit logging, IAM integration, or cost controls built in. You can add these, but you're building them from scratch.

AutoGen (Microsoft) is research-grade tooling. It's genuinely impressive for complex multi-agent scenarios in controlled environments. Production enterprise deployment at scale — with proper auth, logging, and governance — requires significant additional engineering.

The hidden maintenance costs

The frameworks above will get you a demo in a week. The costs that compound over time:

  • Connector maintenance: Every time Snowflake updates their API, or Airflow changes an endpoint, you're updating code. With twelve connectors, you're doing this constantly.

  • Auth rotation: Service account keys expire. OAuth tokens expire. IAM role policies change. Managing credential lifecycle for a dozen data systems is a part-time job.

  • Prompt versioning: Agent behavior drifts when you upgrade the underlying LLM or change prompt templates. You need version control and rollback for prompts, not just code.

  • Observability: Knowing why an agent took a particular action — which tool call it made, what it returned, why it chose that path — requires instrumentation that frameworks don't provide out of the box.

A team of three ML engineers can typically maintain a DIY agent setup for 2-3 systems. Beyond that, the maintenance overhead starts competing with new development.

What a purpose-built platform provides

A platform built specifically for enterprise data agents — like xpander.ai — handles the connector and governance infrastructure so your team focuses on the agent logic. That means pre-built, maintained connectors for Snowflake, BigQuery, Databricks, Postgres, dbt, Airflow, and Slack; IAM integration that maps to your existing access controls; built-in audit logging; and deployment flexibility ranging from cloud-hosted to self-hosted VPC to air-gapped environments for regulated industries.

The honest trade-off: you give up some customization flexibility in exchange for not owning the infrastructure. For most enterprise teams, that's the right call once you're running more than two or three agents across multiple data platforms.

5. The Cross-Platform Problem: Why Vendor-Native Agents Hit a Wall

Snowflake Intelligence / Cortex Analyst

Snowflake's AI agent capabilities are genuinely strong for Snowflake data. Cortex Analyst can answer natural language questions against your Snowflake tables with reasonable accuracy. Snowflake Intelligence extends this with multi-step reasoning.

The wall is hard and immediate: if your company acquired another company running on Databricks, or if your ML team built their feature store in BigQuery because that's where the data scientists came from, Snowflake's agents can't see it. The agent has no knowledge of tables that don't live in Snowflake.

Databricks Genie

Genie is impressive within the Unity Catalog ecosystem. It understands your data catalog, respects your column-level access controls, and can reason about lineage for data that Databricks manages. For companies that are genuinely Databricks-first, it's a compelling option.

Outside Unity Catalog, it's blind. Genie can't query your Snowflake warehouse, can't check your BigQuery billing, can't inspect your Airflow DAG logs.

BigQuery Agent (Gemini in BigQuery)

Same pattern. Google's integration is deep and tight — Gemini-powered analysis within the Google Cloud ecosystem is well-executed. Cross-cloud, or even cross-tool within the same cloud, is out of scope.

The enterprise reality

Most companies with more than 1,000 employees and a data team older than three years run a heterogeneous stack that vendor-native agents can't cover. Here's where they end up:

| Capability             | Snowflake Cortex            | Databricks Genie          | BigQuery Agent    | DIY Framework          | xpander.ai        |
|------------------------|-----------------------------|---------------------------|-------------------|------------------------|-------------------|
| Multi-platform support | ❌ Snowflake only           | ❌ Databricks only        | ❌ Google only    | ⚠️ Build it yourself   | ✅ Native         |
| Built-in governance    | ⚠️ Partial (Snowflake RBAC) | ⚠️ Partial (Unity Catalog)| ⚠️ Partial (IAM)  | ❌ Build it yourself   | ✅ Infra agnostic |
| Self-hosted / air-gap  | ❌                          | ❌                        | ❌                | ✅ Possible            | ✅ Supported      |
| Native connectors      | Snowflake only              | Databricks only           | GCP only          | ❌ Build it yourself   | ✅ 20+ connectors |
| Time to production     | Days (Snowflake-only)       | Days (Databricks-only)    | Days (GCP-only)   | 2–6 months             | 1–2 weeks         |

What "cross-platform" actually means

"Cross-platform" is not a checkbox. It means a single agent that can: run a query against your Snowflake warehouse, use those results to join with a table in BigQuery, check the dbt lineage graph to understand what upstream jobs feed those tables, verify the relevant Airflow DAGs ran successfully today, and push a summary to a Looker dashboard — with every step logged under unified governance and a single audit trail.

That's not achievable with vendor-native agents, and only barely achievable with DIY frameworks after significant investment. It's the central design problem that purpose-built cross-platform platforms exist to solve.

6. Production Requirements Most Teams Forget

These requirements never appear in demos. They become urgent about 60 days into a production rollout.

Governance and audit logging

Every query an agent runs, every table it reads, every alert it sends — these need to be logged in a way that's queryable and explainable. "The agent did it" is not an acceptable answer when a compliance officer asks why certain data was accessed at 2 AM on a Sunday.

The audit log needs to capture: what action was taken, which data system was accessed, what query was executed, what was returned, which agent and user triggered it, and when. This is table stakes for any regulated industry (financial services, healthcare, government) and increasingly expected everywhere.
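In practice, each of those fields becomes one record per agent action. A sketch of what a single audit entry might look like — the field names are illustrative, not a standard schema:

```python
import json
import uuid
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    action: str          # what was done
    system: str          # which data system was accessed
    query: str           # what was executed
    rows_returned: int   # what came back
    agent_id: str        # which agent acted
    triggered_by: str    # which user or schedule triggered it
    at: str = ""         # when (filled in automatically)
    record_id: str = ""

    def __post_init__(self):
        self.at = self.at or datetime.now(timezone.utc).isoformat()
        self.record_id = self.record_id or str(uuid.uuid4())

rec = AuditRecord("query", "snowflake", "SELECT count(*) FROM orders",
                  1, "monitor-agent", "scheduler")
print(json.dumps(asdict(rec)))   # one JSON line, appended to the audit store
```

Serializing to one JSON line per action keeps the log trivially queryable — which is exactly what the 2 AM compliance question requires.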

Look for SOC 2 Type II compliance as a baseline signal that a platform takes this seriously — not as a guarantee, but as evidence that governance has been operationalized, not just documented.

Deployment flexibility

The default assumption — cloud-hosted SaaS — is fine for many workloads. It's not acceptable for:

  • Regulated industries (financial services, healthcare): Data often can't leave a VPC. Self-hosted or VPC deployment is required.

  • Defense and government: Air-gapped deployment with no external network calls. The entire agent stack, including the LLM, runs on-premises.

  • Companies with specific data residency requirements: EU data sovereignty rules, for example, may require that query results never transit servers outside specific regions.

If your vendor doesn't offer self-hosted or VPC deployment options, you're locked out of certain customer segments and potentially in violation of your own data governance policies.

IAM integration

Agents should not create shadow permission systems. The access an agent has should be derivable from and auditable within your existing IAM infrastructure.

In practice: agents running in AWS should use IAM roles, not static API keys. Agents running in GCP should use Workload Identity Federation. Agents accessing Snowflake should operate as a named service principal with Snowflake-native role bindings. The agent's permissions should be visible in the same place where you manage human engineer permissions.

Shadow permissions — where the agent has access that isn't visible in your main IAM system — create audit gaps and security risks that won't be discovered until something goes wrong.

Guardrails by default

The default posture for a data agent should be read-only. Write operations — creating tables, modifying rows, pushing to production dashboards, sending external alerts — should require explicit configuration and, for high-stakes operations, human-in-the-loop approval.

Cost guardrails matter too. An agent that can run arbitrary queries against your Snowflake warehouse can accidentally kick off a full table scan on a multi-billion-row fact table. Query cost limits — maximum bytes processed, maximum credits consumed, maximum rows returned — should be configurable per agent, per connector, and per user role.
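Enforcing those limits means checking an estimate before the query runs, not measuring cost after. A sketch, assuming the estimate comes from something like the warehouse's dry-run or EXPLAIN output; the default limits are illustrative:

```python
class QueryBudgetExceeded(Exception):
    """Raised before execution when a query would blow its budget."""

def enforce_budget(estimated_bytes: int, estimated_rows: int,
                   max_bytes: int = 10 * 1024**3,      # 10 GiB per run (example)
                   max_rows: int = 100_000) -> None:
    if estimated_bytes > max_bytes:
        raise QueryBudgetExceeded(f"{estimated_bytes} bytes > {max_bytes} limit")
    if estimated_rows > max_rows:
        raise QueryBudgetExceeded(f"{estimated_rows} rows > {max_rows} limit")

enforce_budget(2 * 1024**3, 5_000)          # within budget: proceeds silently
try:
    enforce_budget(50 * 1024**4, 10)        # a 50 TiB scan is refused up front
except QueryBudgetExceeded as e:
    print("blocked:", e)
```

The limits should be configuration, not code — per agent, per connector, and per user role, as described above.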

Observability

You need to be able to answer: "Why did the agent take that action?" For debugging, for compliance, and for improving agent behavior over time.

The minimum viable observability stack for a data agent:

  • Reasoning traces: What did the agent "think" at each step? Which tool did it decide to call and why?

  • Tool call logs: What exact API call or SQL query was executed? What was returned?

  • Performance metrics: Latency per agent run, per tool call, per workflow. Which connectors are slow?

  • Error rates: What fraction of agent runs fail, and at which step?

Without this, you're debugging a black box. Agents that seem to work fine in staging will behave differently in production with real data volumes and edge cases, and you won't know why without traces.

7. Getting Started: A Practical Roadmap

The teams that fail at this spend six weeks evaluating platforms and twelve weeks building a framework-based solution that covers three workflows. The teams that succeed pick one high-pain workflow, deploy it in two weeks, and use that momentum to expand.

Phase 1: One workflow, two data sources, two weeks

Pick pipeline monitoring. It's the highest-pain, lowest-risk workflow to start with — failures are already happening, the blast radius of the agent is low (it's reading logs, not writing data), and the ROI is visible immediately.

Connect two data sources: your orchestrator (Airflow or dbt Cloud) and one warehouse. Don't try to boil the ocean. Get the agent diagnosing pipeline failures and alerting on Slack within two weeks. If you can't do it in two weeks, your chosen approach is too complex.

Phase 2: Expand to cross-platform workflows (weeks 3-8)

Once you've proven the pattern, add more connectors. Connect your second warehouse. Add cost monitoring. The governance layer you put in place in Phase 1 should cover the new workflows without additional configuration — if it doesn't, that's a signal to address the governance gap before expanding further.

Set specific targets: MTTR for pipeline incidents, number of data quality issues caught before business users noticed them, compute cost as a percentage of revenue.

Phase 3: Governance layer review before org-wide rollout

Before you push this to 50 engineers and 200 business users, audit the governance setup. Can you answer these questions?

  • What did the agent query yesterday?

  • Who triggered each agent run?

  • What's the maximum query cost the agent can incur in a single run?

  • What happens if an agent run fails mid-workflow — does it leave data in a consistent state?

  • What's the escalation path when the agent is wrong?

If you can't answer all five confidently, fix that before expanding. The cost of fixing a governance gap scales with the number of users.

Phase 4: Measure and report

The metrics that matter:

  • MTTR for data incidents: Measure baseline before deploying, then monthly after. A well-deployed pipeline monitoring agent should cut this by 50-70% within 90 days.

  • Hours saved on manual investigation: Track the investigation time the agent absorbs, and how often its diagnosis was correct on the first try versus requiring human correction.

  • Warehouse compute cost: Track week-over-week after deploying the cost anomaly detection and SQL optimization workflows.

  • BI backlog: If you deploy the self-serve dashboard workflow, track the number of open dashboard requests and time-to-delivery.

These metrics are what justify the next phase of investment and what give you credibility when presenting to leadership.

Where to Go From Here

The architecture isn't the hard part. Neither is picking an LLM. The hard part is the connector layer — building and maintaining reliable, governed, authenticated connections to twelve different data systems across two or three cloud providers, while keeping the whole thing auditable and operable by a team that has other things to do.

If you're evaluating approaches, test against your actual multi-platform stack, not a sandbox with one warehouse. The difference between "works in a demo" and "works in production" for data agents is almost always the connector and governance layer.

xpander.ai is built specifically for this problem — native connectors for Snowflake, BigQuery, Databricks, Postgres, dbt, Airflow, and Slack, with unified governance, SOC 2 Type II compliance, and deployment options from cloud-hosted to fully air-gapped. If you want to see what a cross-platform agent deployment looks like against your actual stack, it's worth a conversation.

But regardless of what you use: start with one workflow, prove it in two weeks, and build from there. The teams getting the most value from data agents aren't the ones who planned the most elaborate architecture — they're the ones who shipped something real six months ago.

Architecture patterns and workflow descriptions are based on common enterprise deployment patterns. Specific performance metrics cited are representative of results reported by data teams in the field and will vary by environment and implementation.

    The AI Agent Platform
    for Enterprise Teams

    Connect agents to any enterprise system. Deploy on any cloud. Orchestration, security, and observability built in.

    All features ・No credit card

    © xpander.ai 2026. All rights reserved.
