Cloudflare Workers AI

Run LLM and AI models at the edge with Cloudflare Workers. Serverless inference for text, image, and speech models.

Overview

Cloudflare Workers AI lets you run LLM and AI inference at the edge using Cloudflare's global network. Deploy AI-powered features without managing GPU infrastructure — models run serverlessly alongside your Workers code. It supports text generation, embeddings, image generation, and more.

Example: Text Generation

TypeScript
export default {
  async fetch(request, env) {
    // env.AI is the Workers AI binding configured in wrangler.toml.
    const response = await env.AI.run(
      '@cf/meta/llama-3.1-8b-instruct',
      {
        messages: [
          { role: 'system', content: 'You are a helpful assistant.' },
          { role: 'user', content: 'Explain serverless in one sentence.' },
        ],
      }
    );
    // Text-generation models return the reply as JSON, e.g. { response: "..." }.
    return Response.json(response);
  },
};

Available Model Categories

Category          Models                      Use Case
Text Generation   Llama 3.1, Mistral, Gemma   Chat, summarization, code
Embeddings        BGE, GTE                    Semantic search, RAG
Image Generation  Stable Diffusion            Image creation
Speech-to-Text    Whisper                     Audio transcription
Translation       M2M-100                     Multi-language translation
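Beyond chat, the embedding models in the table power semantic search and RAG. As a minimal sketch: the worker below embeds two texts with a BGE model and scores how close they are. The response shape ({ data: number[][] }, one vector per input string) and the AiBinding type are assumptions for illustration.

TypeScript
```typescript
// Hypothetical shape of the AI binding; the real binding is provided by the runtime.
type AiBinding = { run: (model: string, input: unknown) => Promise<any> };

// Cosine similarity between two equal-length vectors: 1 = identical direction, 0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export default {
  async fetch(request: Request, env: { AI: AiBinding }): Promise<Response> {
    // Embed both texts in a single call; each comes back as a numeric vector.
    const texts = [
      'What is serverless computing?',
      'Serverless means you never manage servers yourself.',
    ];
    const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: texts });
    // Score how semantically close the two texts are.
    return Response.json({ similarity: cosineSimilarity(data[0], data[1]) });
  },
};
```

The same pattern scales to retrieval: embed documents once, store the vectors, then embed each query and rank documents by cosine similarity.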

Getting Started

  • Add an AI binding to your wrangler.toml: an [ai] table with binding = "AI"
  • Call env.AI.run() with a model name and input
  • Deploy with wrangler deploy — no GPU provisioning needed
  • Free tier includes 10,000 neurons per day
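The binding step above amounts to a short wrangler.toml. A minimal sketch, assuming a Worker named "my-ai-worker" with its entry point at src/index.ts (both names illustrative):

```toml
# wrangler.toml — minimal Worker config with an AI binding
name = "my-ai-worker"              # illustrative project name
main = "src/index.ts"              # illustrative entry point
compatibility_date = "2024-09-01"  # illustrative date

# Exposes the binding as env.AI inside the Worker
[ai]
binding = "AI"
```

With this in place, wrangler deploy publishes the Worker and env.AI.run() becomes available at runtime.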

Frequently Asked Questions

What is Workers AI?

Workers AI is Cloudflare's serverless AI inference platform that lets you run popular LLM and AI models directly at the edge, close to your users.

What models are available?

Workers AI supports text generation (Llama, Mistral), text embeddings, image generation, speech-to-text, translation, and more LLM and vision models.

How is Workers AI priced?

Workers AI has a free tier with daily neuron limits. Paid usage is billed per neuron consumed, which varies by model size and input/output length.