Cloudflare Workers AI

Run LLM and AI models at the edge with Cloudflare Workers. Serverless inference for text, image, and speech models.

Overview

Cloudflare Workers AI lets you run LLM and AI inference at the edge using Cloudflare's global network. Deploy AI-powered features without managing GPU infrastructure — models run serverlessly alongside your Workers code. It supports text generation, embeddings, image generation, and more.

Example: Text Generation

TypeScript
export default {
  async fetch(request, env) {
    // env.AI is the Workers AI binding configured in wrangler.toml.
    const response = await env.AI.run(
      '@cf/meta/llama-3.1-8b-instruct',
      {
        messages: [
          { role: 'system', content: 'You are a helpful assistant.' },
          { role: 'user', content: 'Explain serverless in one sentence.' },
        ],
      }
    );
    // Text-generation models return the reply as JSON, e.g. { response: "..." }.
    return Response.json(response);
  },
};

Available Model Categories

Category          Models                      Use Case
Text Generation   Llama 3.1, Mistral, Gemma   Chat, summarization, code
Embeddings        BGE, GTE                    Semantic search, RAG
Image Generation  Stable Diffusion            Image creation
Speech-to-Text    Whisper                     Audio transcription
Translation       M2M-100                     Multi-language translation
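Beyond chat, the embedding models in the table power semantic search and RAG. As a minimal sketch: the worker below embeds two texts with a BGE model and scores how close they are. The response shape ({ data: number[][] }, one vector per input string) and the AiBinding type are assumptions for illustration.

TypeScript
```typescript
// Hypothetical shape of the AI binding; the real binding is provided by the runtime.
type AiBinding = { run: (model: string, input: unknown) => Promise<any> };

// Cosine similarity between two equal-length vectors: 1 = identical direction, 0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export default {
  async fetch(request: Request, env: { AI: AiBinding }): Promise<Response> {
    // Embed both texts in a single call; each comes back as a numeric vector.
    const texts = [
      'What is serverless computing?',
      'Serverless means you never manage servers yourself.',
    ];
    const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: texts });
    // Score how semantically close the two texts are.
    return Response.json({ similarity: cosineSimilarity(data[0], data[1]) });
  },
};
```

The same pattern scales to retrieval: embed documents once, store the vectors, then embed each query and rank documents by cosine similarity.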

Getting Started

  • Add an AI binding to your wrangler.toml: an [ai] table with binding = "AI"
  • Call env.AI.run() with a model name and input
  • Deploy with wrangler deploy — no GPU provisioning needed
  • Free tier includes 10,000 neurons per day
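The binding step above amounts to a short wrangler.toml. A minimal sketch, assuming a Worker named "my-ai-worker" with its entry point at src/index.ts (both names illustrative):

```toml
# wrangler.toml — minimal Worker config with an AI binding
name = "my-ai-worker"              # illustrative project name
main = "src/index.ts"              # illustrative entry point
compatibility_date = "2024-09-01"  # illustrative date

# Exposes the binding as env.AI inside the Worker
[ai]
binding = "AI"
```

With this in place, wrangler deploy publishes the Worker and env.AI.run() becomes available at runtime.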

Frequently Asked Questions

What is Workers AI?

Workers AI is Cloudflare's serverless AI inference platform that lets you run popular LLM and AI models directly at the edge, close to your users.

What models are available?

Workers AI supports text generation (Llama, Mistral), text embeddings, image generation, speech-to-text, translation, and more LLM and vision models.

How is Workers AI priced?

Workers AI has a free tier with daily neuron limits. Paid usage is billed per neuron consumed, which varies by model size and input/output length.