Overview
Cloudflare Workers AI lets you run AI inference, including LLMs, at the edge on Cloudflare's global network. Deploy AI-powered features without managing GPU infrastructure — models run serverlessly alongside your Workers code. It supports text generation, embeddings, image generation, and more.
Example: Text Generation
```javascript
export default {
  async fetch(request, env) {
    // Run the model via the AI binding configured in wrangler.toml
    const response = await env.AI.run(
      '@cf/meta/llama-3.1-8b-instruct',
      {
        messages: [
          { role: 'system', content: 'You are a helpful assistant.' },
          { role: 'user', content: 'Explain serverless in one sentence.' },
        ],
      }
    );
    return Response.json(response);
  },
};
```

Available Model Categories
| Category | Models | Use Case |
|---|---|---|
| Text Generation | Llama 3.1, Mistral, Gemma | Chat, summarization, code |
| Embeddings | BGE, GTE | Semantic search, RAG |
| Image Generation | Stable Diffusion | Image creation |
| Speech-to-Text | Whisper | Audio transcription |
| Translation | M2M-100 | Multi-language translation |
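For the Embeddings row above: embedding models return numeric vectors, and semantic search works by ranking documents by their similarity to a query vector. A minimal sketch of that comparison step, using toy vectors — the `cosineSimilarity` helper is illustrative, not part of the Workers AI API:

```javascript
// Cosine similarity between two embedding vectors: values near 1 mean the
// vectors point the same way (semantically similar), near 0 means unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors for illustration; real embedding models
// return much higher-dimensional vectors.
const query = [0.1, 0.9, 0.2];
const doc = [0.15, 0.85, 0.25];
console.log(cosineSimilarity(query, doc).toFixed(3)); // close to 1: similar
```

In a search or RAG pipeline you would embed all documents once, embed the query at request time, and return the documents with the highest similarity scores.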
Getting Started
- Add an AI binding to your `wrangler.toml`: an `[ai]` section with `binding = "AI"`
- Call `env.AI.run()` with a model name and input
- Deploy with `wrangler deploy` — no GPU provisioning needed
- Free tier includes 10,000 neurons per day
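The binding step above can be sketched as a minimal `wrangler.toml` — the binding name `AI` matches the `env.AI` used in the example, while the project name, entry point, and compatibility date are placeholders to adapt to your project:

```toml
name = "my-ai-worker"
main = "src/index.js"
compatibility_date = "2024-01-01"

[ai]
binding = "AI"
```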