AI app architecture
Ship a small AI app fast — three cloud architectures that scale.
You don't need a platform team to put an AI app in production. On every major cloud the recipe is the same: a serverless container for your code, a managed LLM behind it, identity in front, and a little storage and retrieval on the side. Here are the three reference designs — for AWS, Google Cloud and Azure — that get a small or internal app deployed in an afternoon and still scale when it catches on.
What "small AI app" means here
An internal tool, a prototype, a customer-facing feature — anything that wraps an LLM in a web app and a bit of your own data. The goal is the same on all three clouds: deploy quickly, scale to zero when idle, and grow without a re-architecture. Each design below uses only managed, serverless building blocks, so there's no cluster to babysit and you pay for what you use.
The shape is the same on every cloud
Before the service names, learn the pattern once. Every architecture on this page is four layers wired the same way — only the logos change.
1 · Edge & identity
A CDN terminates TLS and serves the front end; an identity provider signs users in. For an internal app this layer is also the gate that keeps everyone but your org out.
2 · Serverless container
Your app runs as a container that scales on request volume and drops to zero when idle. You push an image; the platform handles HTTPS, load-balancing and autoscaling.
3 · A managed LLM
The container calls a hosted model API — no GPUs to provision. The same call site does retrieval (RAG) against a vector store so answers are grounded in your data.
4 · Data, secrets & keyless trust
Object storage and a serverless database hold your data; a secrets vault holds credentials. The container gets a workload identity so it calls the LLM with no API keys at all.
That last point is the one people miss: on all three clouds the modern pattern is keyless. The container is granted an identity (an IAM role, a service account, or a managed identity) and the cloud hands it short-lived credentials at runtime — so there's no model API key to leak.
AWS · App Runner + Amazon Bedrock
The fastest path to an AI app on AWS: AWS App Runner deploys your container straight from Amazon ECR, gives you an HTTPS endpoint, and autoscales — no clusters, no load balancers to wire up. Your code calls Amazon Bedrock (Claude, Amazon Nova, Llama and more through one API), and Amazon Cognito handles sign-in. App Runner assumes an IAM role, so it calls Bedrock with no stored keys.
Security, the AWS way
Amazon Cognito user pools give you hosted sign-up/sign-in, OAuth 2.0 / OIDC and social or SAML federation — your app validates the JWT Cognito returns. For service-to-service trust, App Runner assumes an IAM instance role scoped to bedrock:InvokeModel plus its S3 and DynamoDB needs, so it reaches Bedrock with no static keys. Add Bedrock Guardrails for content filtering and Bedrock Knowledge Bases if you'd rather AWS manage the whole RAG pipeline. Heads-up: AWS is steering new projects to Amazon ECS Express Mode, App Runner's successor with the same one-step container deploy — the architecture above is identical on either.
Google Cloud · Cloud Run + Vertex AI (Gemini)
Cloud Run is the quickest deploy in the business — gcloud run deploy --source . builds your container and hands back an HTTPS URL that scales to zero. Your code calls Gemini through Vertex AI, retrieval runs on Vertex AI Vector Search, and access is gated either by Identity Platform (customer sign-in) or Identity-Aware Proxy (lock an internal app to your org). Cloud Run runs as a service account, so it calls Vertex AI with no keys.
Security, the Google Cloud way
For customer-facing apps, Identity Platform (the enterprise upgrade of Firebase Authentication) gives you sign-in with OIDC/SAML federation and MFA. For an internal app the cleaner answer is Identity-Aware Proxy (IAP): it authenticates every request against your Google Workspace identities before traffic reaches Cloud Run, so you write zero auth code. Cloud Run runs as a service account granted roles/aiplatform.user; the SDK picks up credentials through workload identity, so Vertex AI is reached with no keys. Add Grounding or the Vertex AI RAG Engine for managed retrieval.
Azure · Container Apps + Azure OpenAI
Azure Container Apps runs your container serverlessly with KEDA autoscaling and scale-to-zero, pulling the image from Azure Container Registry. Your code calls Azure OpenAI (the GPT family and more, in Azure AI Foundry), retrieval runs on Azure AI Search, and Microsoft Entra ID handles sign-in via an app registration. A managed identity lets the container call Azure OpenAI with no API keys.
Security, the Azure way
You register the app in Microsoft Entra ID (formerly Azure AD) to get a client ID and configure OAuth 2.0 / OIDC sign-in. Container Apps' built-in authentication — "Easy Auth" — runs as a sidecar that handles the whole login handshake and injects the user's identity into request headers, so there's no auth code in your app. For service trust, the Container App's managed identity is granted the Cognitive Services OpenAI User role, so it calls Azure OpenAI with no API keys — same for AI Search, Cosmos DB, Key Vault and the registry. Add Azure AI Content Safety and Prompt Shields for guardrails. (Customer-facing app? Use Entra External ID for CIAM.)
The three stacks, layer by layer
Same architecture, three sets of names. Pick the column for the cloud you're already in — the design doesn't change.
| Layer | AWS | Google Cloud | Azure |
|---|---|---|---|
| Serverless compute | App Runner → ECS Express Mode | Cloud Run | Container Apps |
| Managed LLM | Amazon Bedrock | Vertex AI · Gemini | Azure OpenAI · AI Foundry |
| User sign-in | Amazon Cognito | Identity Platform | Entra ID (app registration) |
| Internal-only access | VPC + IAM | Identity-Aware Proxy | Easy Auth + Front Door |
| Keyless service → AI | IAM role | Service account | Managed identity |
| Vector / RAG | OpenSearch · Bedrock KB | Vertex AI Vector Search | Azure AI Search |
| Database | DynamoDB · Aurora | Firestore · Cloud SQL | Cosmos DB · PostgreSQL |
| Object storage | Amazon S3 | Cloud Storage | Blob Storage |
| Secrets | Secrets Manager | Secret Manager | Key Vault |
| Container registry | Amazon ECR | Artifact Registry | Container Registry |
| Edge / CDN | CloudFront | Cloud CDN + LB | Azure Front Door |
| Observability | CloudWatch | Cloud Logging + Monitoring | Azure Monitor · App Insights |
Service names current as of June 2026. Cloud providers rename things often — check the diagram against the live console before you build.
Diagram it first — then let your AI build it
Here's the workflow that makes these architectures cheap to design: sketch the stack in Outwin with the real service icons, then export it to an AI that can read every box and connection — and ask for the Terraform, the Bicep, or a security review. The picture becomes the prompt.
Draw the stack in a minute
Press / and type /aws app runner, /gcp cloud run or /azure container apps — the real icon drops onto the canvas. 1,146 AWS, Azure and GCP icons are built in, so there's no SVG hunting. Ghost-suggestion pills propose the next box and auto-layout keeps it tidy.
Hand it to ChatGPT or Claude
Export to AI-readable HTML — a file that spells out your services and how they connect in plain text, not a flat screenshot. The model sees the whole topology and writes the IaC to match, grounded in your architecture.
Private enough for real systems
Outwin runs entirely in your browser with no account and no server, and works offline after first load. Diagram a customer's production stack — or a sensitive internal one — and nothing ever leaves your machine.
One picture the whole room reads
The same board is what you whiteboard on the call, the file you feed the AI, and the diagram you drop in the design doc. No redraws, no drift — a single source of truth from prototype to production.
Why this beats a generic diagram tool. draw.io, Lucidchart and Cloudcraft export a flat image an LLM can't parse — the structure dies the moment you leave the canvas. Outwin's export keeps the architecture machine-readable, so your AI can turn it into working infrastructure. See the comparison →
Built for people shipping AI fast
- AI engineers standing up an LLM feature and wiring in RAG, auth and a datastore.
- Forward-deployed engineers proposing a target architecture on a customer call — see the FDE playbook.
- Solutions architects picking the cloud and sketching the reference design for a review.
- Founders and small teams who want production-shaped infra without a platform org behind them.
Sketch your AI architecture now
Free, no sign-up, runs in your browser. Press /, drop the real AWS, GCP or Azure icons, and hand the whole stack to your AI.
Open the canvas