AI app architecture

Ship a small AI app fast — three cloud architectures that scale.

You don't need a platform team to put an AI app in production. On every major cloud the recipe is the same: a serverless container for your code, a managed LLM behind it, identity in front, and a little storage and retrieval on the side. Here are the three reference designs — for AWS, Google Cloud and Azure — that get a small or internal app deployed in an afternoon and still scale when it catches on.

What "small AI app" means here

An internal tool, a prototype, a customer-facing feature — anything that wraps an LLM in a web app and a bit of your own data. The goal is the same on all three clouds: deploy quickly, scale to zero when idle, and grow without a re-architecture. Each design below uses only managed, serverless building blocks, so there's no cluster to babysit and you pay for what you use.

The shape is the same on every cloud

Before the service names, learn the pattern once. Every architecture on this page is four layers wired the same way — only the logos change.

1 · Edge & identity

A CDN terminates TLS and serves the front end; an identity provider signs users in. For an internal app this layer is also the gate that keeps everyone but your org out.

2 · Serverless container

Your app runs as a container that scales on request volume and drops to zero when idle. You push an image; the platform handles HTTPS, load-balancing and autoscaling.

3 · A managed LLM

The container calls a hosted model API — no GPUs to provision. The same call site does retrieval (RAG) against a vector store so answers are grounded in your data.

4 · Data, secrets & keyless trust

Object storage and a serverless database hold your data; a secrets vault holds credentials. The container gets a workload identity so it calls the LLM with no API keys at all.

That last point is the one people miss: on all three clouds the modern pattern is keyless. The container is granted an identity (an IAM role, a service account, or a managed identity) and the cloud hands it short-lived credentials at runtime — so there's no model API key to leak.

AWS · App Runner + Amazon Bedrock

The fastest path to an AI app on AWS: AWS App Runner deploys your container straight from Amazon ECR, gives you an HTTPS endpoint, and autoscales — no clusters, no load balancers to wire up. Your code calls Amazon Bedrock (Claude, Amazon Nova, Llama and more through one API), and Amazon Cognito handles sign-in. App Runner assumes an IAM role, so it calls Bedrock with no stored keys.

User → CloudFront → App Runner (signed in with Cognito) → Bedrock for generation, OpenSearch for retrieval, DynamoDB & S3 for data.

Security, the AWS way

Amazon Cognito user pools give you hosted sign-up/sign-in, OAuth 2.0 / OIDC and social or SAML federation — your app validates the JWT Cognito returns. For service-to-service trust, App Runner assumes an IAM instance role scoped to bedrock:InvokeModel plus its S3 and DynamoDB needs, so it reaches Bedrock with no static keys. Add Bedrock Guardrails for content filtering and Bedrock Knowledge Bases if you'd rather AWS manage the whole RAG pipeline. Heads-up: AWS is steering new projects to Amazon ECS Express Mode, App Runner's successor with the same one-step container deploy — the architecture above is identical on either.

Google Cloud · Cloud Run + Vertex AI (Gemini)

Cloud Run is the quickest deploy in the business — gcloud run deploy --source . builds your container and hands back an HTTPS URL that scales to zero. Your code calls Gemini through Vertex AI, retrieval runs on Vertex AI Vector Search, and access is gated either by Identity Platform (customer sign-in) or Identity-Aware Proxy (lock an internal app to your org). Cloud Run runs as a service account, so it calls Vertex AI with no keys.

User → Cloud CDN/LB → Cloud Run (signed in with Identity Platform) → Vertex AI Gemini, Vector Search, Firestore & Cloud Storage.

Security, the Google Cloud way

For customer-facing apps, Identity Platform (the enterprise upgrade of Firebase Authentication) gives you sign-in with OIDC/SAML federation and MFA. For an internal app the cleaner answer is Identity-Aware Proxy (IAP): it authenticates every request against your Google Workspace identities before traffic reaches Cloud Run, so you write zero auth code. Cloud Run runs as a service account granted roles/aiplatform.user; the SDK picks up credentials through workload identity, so Vertex AI is reached with no keys. Add Grounding or the Vertex AI RAG Engine for managed retrieval.

Azure · Container Apps + Azure OpenAI

Azure Container Apps runs your container serverlessly with KEDA autoscaling and scale-to-zero, pulling the image from Azure Container Registry. Your code calls Azure OpenAI (the GPT family and more, in Azure AI Foundry), retrieval runs on Azure AI Search, and Microsoft Entra ID handles sign-in via an app registration. A managed identity lets the container call Azure OpenAI with no API keys.

User → Front Door → Container App (signed in with Entra ID) → Azure OpenAI, AI Search, Cosmos DB & Blob Storage, all via managed identity.

Security, the Azure way

You register the app in Microsoft Entra ID (formerly Azure AD) to get a client ID and configure OAuth 2.0 / OIDC sign-in. Container Apps' built-in authentication — "Easy Auth" — runs as a sidecar that handles the whole login handshake and injects the user's identity into request headers, so there's no auth code in your app. For service trust, the Container App's managed identity is granted the Cognitive Services OpenAI User role, so it calls Azure OpenAI with no API keys — same for AI Search, Cosmos DB, Key Vault and the registry. Add Azure AI Content Safety and Prompt Shields for guardrails. (Customer-facing app? Use Entra External ID for CIAM.)

The three stacks, layer by layer

Same architecture, three sets of names. Pick the column for the cloud you're already in — the design doesn't change.

Layer	AWS	Google Cloud	Azure
Serverless compute	App Runner → ECS Express Mode	Cloud Run	Container Apps
Managed LLM	Amazon Bedrock	Vertex AI · Gemini	Azure OpenAI · AI Foundry
User sign-in	Amazon Cognito	Identity Platform	Entra ID (app registration)
Internal-only access	VPC + IAM	Identity-Aware Proxy	Easy Auth + Front Door
Keyless service → AI	IAM role	Service account	Managed identity
Vector / RAG	OpenSearch · Bedrock KB	Vertex AI Vector Search	Azure AI Search
Database	DynamoDB · Aurora	Firestore · Cloud SQL	Cosmos DB · PostgreSQL
Object storage	Amazon S3	Cloud Storage	Blob Storage
Secrets	Secrets Manager	Secret Manager	Key Vault
Container registry	Amazon ECR	Artifact Registry	Container Registry
Edge / CDN	CloudFront	Cloud CDN + LB	Azure Front Door
Observability	CloudWatch	Cloud Logging + Monitoring	Azure Monitor · App Insights

Service names current as of June 2026. Cloud providers rename things often — check the diagram against the live console before you build.

Diagram it first — then let your AI build it

Here's the workflow that makes these architectures cheap to design: sketch the stack in Outwin with the real service icons, then export it to an AI that can read every box and connection — and ask for the Terraform, the Bicep, or a security review. The picture becomes the prompt.

Draw the stack in a minute

Press / and type /aws app runner, /gcp cloud run or /azure container apps — the real icon drops onto the canvas. 1,146 AWS, Azure and GCP icons are built in, so there's no SVG hunting. Ghost-suggestion pills propose the next box and auto-layout keeps it tidy.

Hand it to ChatGPT or Claude

Export to AI-readable HTML — a file that spells out your services and how they connect in plain text, not a flat screenshot. The model sees the whole topology and writes the IaC to match, grounded in your architecture.

Private enough for real systems

Outwin runs entirely in your browser with no account and no server, and works offline after first load. Diagram a customer's production stack — or a sensitive internal one — and nothing ever leaves your machine.

One picture the whole room reads

The same board is what you whiteboard on the call, the file you feed the AI, and the diagram you drop in the design doc. No redraws, no drift — a single source of truth from prototype to production.

Why this beats a generic diagram tool. draw.io, Lucidchart and Cloudcraft export a flat image an LLM can't parse — the structure dies the moment you leave the canvas. Outwin's export keeps the architecture machine-readable, so your AI can turn it into working infrastructure. See the comparison →

Built for people shipping AI fast

AI engineers standing up an LLM feature and wiring in RAG, auth and a datastore.
Forward-deployed engineers proposing a target architecture on a customer call — see the FDE playbook.
Solutions architects picking the cloud and sketching the reference design for a review.
Founders and small teams who want production-shaped infra without a platform org behind them.

Cloud architecture diagrams Export to ChatGPT & Claude FDE playbook Outwin vs other tools FAQ

Sketch your AI architecture now

Free, no sign-up, runs in your browser. Press /, drop the real AWS, GCP or Azure icons, and hand the whole stack to your AI.

Open the canvas