env.dev

Rate Limiting: Token Bucket, Sliding Window & More

Implement rate limiting with token bucket, sliding window, fixed window, and Redis-based strategies. Code examples, edge patterns, and API gateways.

Rate limiting controls how many requests a client can make to an API within a given time window. It protects backend services from abuse, prevents resource exhaustion, and ensures fair usage across clients. Every major API — GitHub (5,000 req/hr), Stripe (100 req/sec), OpenAI (tokens-per-minute) — enforces rate limits. The choice of algorithm determines how "bursty" traffic is handled and how evenly requests are distributed. This guide covers the four main algorithms, Redis implementations, and edge-runtime patterns. For broader API design context, see REST API best practices.

How Does the Token Bucket Algorithm Work?

The token bucket is one of the most widely used rate limiting algorithms. A bucket holds tokens up to a maximum capacity. Each request consumes one token. Tokens are added at a fixed rate. If the bucket is empty, the request is rejected. This allows controlled bursts up to the bucket size while enforcing a long-term average rate.

Token bucket implementation
interface TokenBucket {
  tokens: number;
  lastRefill: number;
  capacity: number;
  refillRate: number; // tokens per second
}

function consumeToken(bucket: TokenBucket): boolean {
  const now = Date.now();
  const elapsed = (now - bucket.lastRefill) / 1000;

  // Refill tokens based on elapsed time
  bucket.tokens = Math.min(
    bucket.capacity,
    bucket.tokens + elapsed * bucket.refillRate
  );
  bucket.lastRefill = now;

  if (bucket.tokens >= 1) {
    bucket.tokens -= 1;
    return true; // Request allowed
  }
  return false; // Rate limited
}

// Example: 100 requests/minute, burst up to 20
const bucket: TokenBucket = {
  tokens: 20,
  lastRefill: Date.now(),
  capacity: 20,
  refillRate: 100 / 60, // ~1.67 tokens/sec
};
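
A rejected client usually wants to know how long to wait. The following is a minimal sketch, assuming the same TokenBucket shape as above; `secondsUntilNextToken` is a hypothetical helper name that does not appear in the original code, and it assumes the bucket was refilled just before the call.

```typescript
interface TokenBucket {
  tokens: number;
  lastRefill: number;
  capacity: number;
  refillRate: number; // tokens per second
}

// Hypothetical helper: seconds until the bucket holds at least one token.
// Assumes bucket.tokens is up to date (i.e. a refill step just ran).
function secondsUntilNextToken(bucket: TokenBucket): number {
  if (bucket.tokens >= 1) return 0;
  const missing = 1 - bucket.tokens;
  return missing / bucket.refillRate;
}

// Half a token left, refilling at 2 tokens/sec
const drained: TokenBucket = { tokens: 0.5, lastRefill: 0, capacity: 20, refillRate: 2 };
// Rounded up to whole seconds, this is a candidate value for a Retry-After header
const waitSeconds = Math.ceil(secondsUntilNextToken(drained));
```

Rounding up keeps the client from retrying a fraction of a second too early.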

How Does the Sliding Window Algorithm Work?

The sliding window log tracks the timestamp of every request and counts how many fall within the current window. It is the most accurate algorithm but requires more memory. The sliding window counter is a memory-efficient approximation that uses weighted counts from the current and previous window.

Sliding window counter
function slidingWindowCounter(
  prevCount: number,
  currCount: number,
  windowSize: number, // in milliseconds
  limit: number
): boolean {
  const now = Date.now();
  const currWindowStart = Math.floor(now / windowSize) * windowSize;
  const elapsed = now - currWindowStart;

  // Weight previous window by how much of it overlaps
  const weight = 1 - elapsed / windowSize;
  const estimatedCount = prevCount * weight + currCount;

  return estimatedCount < limit;
}

// Example: 100 requests per 60-second window
// Previous window had 80 requests, current window has 30
// We're 45 seconds into the current window
// Estimate: 80 * (1 - 45/60) + 30 = 80 * 0.25 + 30 = 50 → under limit
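
For comparison, the sliding window log described at the start of this section can be sketched in memory for a single client. This is an illustrative sketch only: the class name `SlidingWindowLog` and the injectable `now` parameter are assumptions made for the example, and a real deployment would keep the log in shared storage.

```typescript
// In-memory sliding window log for one client (illustrative only).
class SlidingWindowLog {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // `now` is injectable so the behavior can be checked deterministically.
  allow(now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Drop timestamps that have slid out of the window
    while (this.timestamps.length > 0) {
      const oldest = this.timestamps[0];
      if (oldest === undefined || oldest > windowStart) break;
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) {
      return false; // Rate limited; rejected requests are not logged
    }
    this.timestamps.push(now);
    return true;
  }
}

// 2 requests per 1-second window
const log = new SlidingWindowLog(2, 1000);
```

Memory grows with the limit because every accepted request keeps a timestamp, which is the cost the counter variant avoids.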

How Do Fixed Window and Leaky Bucket Compare?

Fixed window counts requests in discrete time windows (e.g., per minute). It is simple but allows bursts at window boundaries — a client can make 2× the limit by sending requests at the end of one window and the start of the next.
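
The boundary burst is easy to reproduce with a small in-memory counter. The sketch below is illustrative only; `FixedWindowCounter` is a hypothetical class and the timestamps are chosen by hand to straddle a window boundary.

```typescript
// In-memory fixed window counter for one client (illustrative only).
class FixedWindowCounter {
  private windowId = -1;
  private count = 0;
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(now: number = Date.now()): boolean {
    const id = Math.floor(now / this.windowMs);
    if (id !== this.windowId) {
      // A new window started: reset the counter
      this.windowId = id;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

// 5 requests per 60-second window
const counter = new FixedWindowCounter(5, 60_000);
let accepted = 0;
// 5 requests at t = 59.9 s, then 5 more at t = 60.1 s
for (let i = 0; i < 5; i++) if (counter.allow(59_900)) accepted++;
for (let i = 0; i < 5; i++) if (counter.allow(60_100)) accepted++;
// accepted is now 10: twice the limit within 200 ms
```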

Leaky bucket processes requests at a fixed rate, like water dripping from a bucket. Excess requests queue up (up to a max queue size) and are processed in order. It produces the smoothest output rate but adds latency for queued requests.
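
One way to sketch the leaky bucket is to give each accepted request a start time so that work never begins faster than the drain rate. This is a simplified model under stated assumptions: `LeakyBucket` and `schedule` are hypothetical names, and the caller is expected to delay the request by the returned number of milliseconds.

```typescript
// Leaky bucket modeled as a queue of start times (illustrative only).
class LeakyBucket {
  private nextFreeSlot = 0; // earliest time the next request may start
  private readonly intervalMs: number;
  private readonly maxQueue: number;

  constructor(ratePerSecond: number, maxQueue: number) {
    this.intervalMs = 1000 / ratePerSecond;
    this.maxQueue = maxQueue;
  }

  // Returns the delay in ms before the request should run,
  // or null if the queue is already full.
  schedule(now: number = Date.now()): number | null {
    const start = Math.max(now, this.nextFreeSlot);
    // Position this request would take in the queue (0 = runs immediately)
    const position = (start - now) / this.intervalMs;
    if (position > this.maxQueue) return null;
    this.nextFreeSlot = start + this.intervalMs;
    return start - now;
  }
}

// Drain 10 requests/sec, hold at most 2 waiting requests
const leaky = new LeakyBucket(10, 2);
```

With these settings, four simultaneous requests get delays of 0, 100, and 200 ms, and the fourth is rejected.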

| Algorithm | Burst handling | Memory | Accuracy | Best for |
| --- | --- | --- | --- | --- |
| Token bucket | Allows controlled bursts | Low (counter + timestamp) | Good | APIs, general use |
| Sliding window log | Precise per-request | High (stores each timestamp) | Exact | Billing, auditing |
| Sliding window counter | Smoothed approximation | Low (2 counters) | Good | High-scale APIs |
| Fixed window | Boundary bursts possible | Low (1 counter) | Moderate | Simple use cases |
| Leaky bucket | Queued, no bursts | Medium (queue) | Exact rate | Smoothing traffic |

How Do You Implement Rate Limiting with Redis?

Redis is the standard backing store for distributed rate limiting because of its atomic operations, sub-millisecond latency, and built-in expiration. Here are two common patterns:

Redis fixed window (INCR + EXPIRE)
import type { Redis } from 'ioredis';

async function fixedWindowLimit(
  redis: Redis,
  key: string,
  limit: number,
  windowSeconds: number
): Promise<{ allowed: boolean; remaining: number }> {
  const current = await redis.incr(key);

  if (current === 1) {
    // First request in this window — set expiration
    await redis.expire(key, windowSeconds);
  }

  return {
    allowed: current <= limit,
    remaining: Math.max(0, limit - current),
  };
}

// Usage: 100 requests per minute per user
const windowKey = `ratelimit:${userId}:${Math.floor(Date.now() / 60000)}`;
const result = await fixedWindowLimit(redis, windowKey, 100, 60);

Redis sliding window log (sorted set)
async function slidingWindowLog(
  redis: Redis,
  key: string,
  limit: number,
  windowMs: number
): Promise<{ allowed: boolean; remaining: number }> {
  const now = Date.now();
  const windowStart = now - windowMs;

  // Unique member so two requests in the same millisecond don't collide
  const member = `${now}:${Math.random()}`;

  // MULTI/EXEC runs the four commands as one atomic transaction
  const tx = redis.multi();
  // Remove expired entries
  tx.zremrangebyscore(key, 0, windowStart);
  // Count entries already in the window (before adding this request)
  tx.zcard(key);
  // Add current request
  tx.zadd(key, now, member);
  // Set key expiration
  tx.pexpire(key, windowMs);

  const results = await tx.exec();
  const count = (results?.[1]?.[1] as number) ?? 0;

  if (count >= limit) {
    // Over limit: remove only the entry we just added, not every entry
    // that happens to share this millisecond
    await redis.zrem(key, member);
    return { allowed: false, remaining: 0 };
  }

  return { allowed: true, remaining: limit - count - 1 };
}

How Should APIs Communicate Rate Limits?

The IETF draft RateLimit header fields (draft-ietf-httpapi-ratelimit-headers, revision 10 published September 2025) standardize how APIs communicate rate limit status. Many APIs already use the X-RateLimit-* convention.

Rate limit response headers
HTTP/1.1 200 OK
X-RateLimit-Limit: 100          # Max requests in the window
X-RateLimit-Remaining: 57       # Requests remaining in current window
X-RateLimit-Reset: 1704067200   # Unix timestamp when the window resets

# When limit is exceeded
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1704067200
Retry-After: 30                 # Seconds to wait (only on 429 responses)
Content-Type: application/json

{"error": "rate_limit_exceeded", "message": "Too many requests. Retry after 30 seconds."}

Order middleware carefully: rate limiting should sit after CORS preflight handling so 429s carry the right headers — see the CORS guide for the layering rules.

Express middleware example
import type { Request, Response, NextFunction } from 'express';

function rateLimitMiddleware(limit: number, windowSeconds: number) {
  return async (req: Request, res: Response, next: NextFunction) => {
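    const now = Date.now();
    const windowId = Math.floor(now / (windowSeconds * 1000));
    const windowKey = `ratelimit:${req.ip}:${windowId}`;
    // `redis` is assumed to be an ioredis client created elsewhere in the app
    const result = await fixedWindowLimit(redis, windowKey, limit, windowSeconds);

    // A fixed window resets at the next window boundary, not `windowSeconds` from now
    const resetAtSeconds = (windowId + 1) * windowSeconds;
    const retryAfterSeconds = Math.max(1, Math.ceil(resetAtSeconds - now / 1000));

    res.set('X-RateLimit-Limit', limit.toString());
    res.set('X-RateLimit-Remaining', result.remaining.toString());
    res.set('X-RateLimit-Reset', resetAtSeconds.toString());

    if (!result.allowed) {
      res.set('Retry-After', retryAfterSeconds.toString());
      res.status(429).json({
        error: 'rate_limit_exceeded',
        message: `Too many requests. Retry after ${retryAfterSeconds} seconds.`,
      });
      return;
    }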

    next();
  };
}

// Apply: 100 requests per minute
app.use('/api', rateLimitMiddleware(100, 60));

How Do API Gateways Handle Rate Limiting?

In production, rate limiting is typically handled at the API gateway layer (Nginx, Kong, AWS API Gateway, Cloudflare) rather than in application code. This offloads the work before requests reach your services. The Nginx configuration guide covers the surrounding directives (proxy_pass, upstreams, TLS termination) that pair with rate limiting.

Nginx rate limiting
# Define a rate limit zone (10 req/sec per IP, 10MB shared memory)
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /api/ {
        # Allow bursts of 20 requests, delay after 10
        limit_req zone=api burst=20 delay=10;
        limit_req_status 429;

        proxy_pass http://backend;
    }
}

Multi-tier limiting: Apply different limits at different layers — a global per-IP limit at the gateway (e.g., 1000 req/min), a per-user limit at the application layer (e.g., 100 req/min for free tier, 1000 for paid), and a per-endpoint limit for expensive operations (e.g., 10 req/min for search).
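
At the application layer, the per-user and per-endpoint tiers can be expressed as a lookup that returns every limit a request must pass. The sketch below reuses the example numbers from the paragraph above; the `Tier` type, the key format, and the `/api/search` path are hypothetical choices made for illustration.

```typescript
type Tier = 'free' | 'paid';

interface Limit {
  limit: number;
  windowSeconds: number;
}

// Hypothetical per-user limits (example numbers from the text)
const PER_USER: Record<Tier, Limit> = {
  free: { limit: 100, windowSeconds: 60 },
  paid: { limit: 1000, windowSeconds: 60 },
};

// Hypothetical per-endpoint limits for expensive operations
const PER_ENDPOINT: Record<string, Limit> = {
  '/api/search': { limit: 10, windowSeconds: 60 },
};

// Returns every application-layer limit that applies, each with its own counter key.
// The global per-IP limit is assumed to be enforced earlier, at the gateway.
function applicableLimits(userId: string, tier: Tier, path: string): Array<Limit & { key: string }> {
  const limits = [{ key: `ratelimit:user:${userId}`, ...PER_USER[tier] }];
  const endpoint = PER_ENDPOINT[path];
  if (endpoint) {
    limits.push({ key: `ratelimit:endpoint:${userId}:${path}`, ...endpoint });
  }
  return limits;
}
```

A request is allowed only if every returned limit allows it, so each entry needs its own counter.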

How Do You Rate Limit at the Edge?

Edge runtimes (Cloudflare Workers, Vercel Edge Functions, Deno Deploy) run in dozens of regions, so a single Redis primary in us-east-1 turns every check into a cross-continent round trip. Two patterns dominate in 2026: Cloudflare Durable Objects for strongly consistent per-key counters, and Upstash Redis REST for centralized counters with regional replicas.

Durable Objects pin all writes for a given key to a single object instance, giving you atomic increments without a separate Redis. The trade-off is locality: the object lives in one region, so requests from far away pay extra latency. Upstash flips it — REST calls hit the nearest replica, but cross-replica consistency is eventual.
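
A per-key counter inside a Durable Object might look roughly like the sketch below. It is an assumption-heavy illustration: `StateLike` and `StorageLike` are minimal stand-ins for the storage `get`/`put` calls on the Durable Object state, `RateLimiterObject` and `check` are hypothetical names, and a real object would expose `check` through its fetch handler. It also assumes that requests to one object instance do not interleave between the read and the write; consult the Cloudflare documentation for the exact concurrency guarantees.

```typescript
// Minimal structural types so the sketch is self-contained.
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}
interface StateLike {
  storage: StorageLike;
}

interface WindowState {
  windowId: number;
  count: number;
}

// One instance per rate-limit key (e.g. per user), so all increments
// for that key are handled in one place.
class RateLimiterObject {
  private readonly state: StateLike;
  private readonly limit = 100; // requests per window (example value)
  private readonly windowMs = 60_000;

  constructor(state: StateLike) {
    this.state = state;
  }

  async check(now: number = Date.now()): Promise<{ allowed: boolean; remaining: number }> {
    const windowId = Math.floor(now / this.windowMs);
    const saved = await this.state.storage.get<WindowState>('window');
    // Start from zero when the stored counter belongs to an older window
    const count = saved && saved.windowId === windowId ? saved.count : 0;
    if (count >= this.limit) return { allowed: false, remaining: 0 };
    await this.state.storage.put<WindowState>('window', { windowId, count: count + 1 });
    return { allowed: true, remaining: this.limit - count - 1 };
  }
}
```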

@upstash/ratelimit on Cloudflare Workers
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis/cloudflare';

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const ratelimit = new Ratelimit({
      redis: Redis.fromEnv(env),
      // Sliding window: 100 requests per 60 seconds
      limiter: Ratelimit.slidingWindow(100, '60 s'),
      analytics: true,
      prefix: 'ratelimit:api',
    });

    const ip = req.headers.get('cf-connecting-ip') ?? 'anonymous';
    const { success, limit, remaining, reset } = await ratelimit.limit(ip);

    if (!success) {
      return new Response('Rate limit exceeded', {
        status: 429,
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString(),
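          // `reset` is a millisecond timestamp; send Unix seconds to match the header convention above
          'X-RateLimit-Reset': Math.ceil(reset / 1000).toString(),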
          'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
        },
      });
    }

    return fetch(req);
  },
};

What Are the Common Rate Limiting Pitfalls?

Trusting raw client IP behind a proxy. req.ip behind a CDN or load balancer is the proxy IP, not the client. Configure trust proxy explicitly (Express: app.set('trust proxy', 'loopback, linklocal, uniquelocal')) and only honor X-Forwarded-For from known infrastructure — otherwise attackers spoof it to evade limits.
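
The idea of honoring X-Forwarded-For only from known infrastructure can be sketched as a small pure function. This is a simplified illustration: `clientIp` is a hypothetical helper, and the trusted set holds exact addresses, whereas real deployments usually match CIDR ranges.

```typescript
// Hypothetical helper: pick the client IP, trusting X-Forwarded-For only
// when the direct peer is one of our own proxies.
function clientIp(
  remoteAddr: string,
  forwardedFor: string | undefined,
  trustedProxies: Set<string>
): string {
  // If the direct peer is not a known proxy, the header is attacker-controlled: ignore it
  if (!trustedProxies.has(remoteAddr) || !forwardedFor) return remoteAddr;

  const chain = forwardedFor.split(',').map((s) => s.trim()).filter(Boolean);
  // Walk right to left: the first hop that is not a trusted proxy is the client.
  // Anything further left was supplied by that client and cannot be trusted.
  for (let i = chain.length - 1; i >= 0; i--) {
    const hop = chain[i];
    if (hop !== undefined && !trustedProxies.has(hop)) return hop;
  }
  return remoteAddr;
}

const trusted = new Set(['10.0.0.1', '10.0.0.2']);
// The client at 203.0.113.7 prepended a fake 6.6.6.6 entry to dodge its limit
const ip = clientIp('10.0.0.1', '6.6.6.6, 203.0.113.7, 10.0.0.2', trusted);
// ip is '203.0.113.7'; the spoofed leftmost entry is ignored
```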

Non-atomic counter updates. A read-modify-write outside Redis (e.g., SQL SELECT then UPDATE) races under concurrency. Use INCR, Lua scripts, or Durable Objects for atomicity.
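
As one example of the Lua option, the INCR and EXPIRE steps of a fixed window can run as a single script so the counter never exists without a TTL. This is a sketch under assumptions: `atomicFixedWindow` and `EvalClient` are hypothetical names, and the interface is modeled on the `eval(script, numKeys, ...args)` call shape used by ioredis rather than imported from it.

```typescript
// Lua scripts run atomically inside Redis
const FIXED_WINDOW_SCRIPT = `
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
`;

// Structural type covering the single call used here (an assumption, not the ioredis typings)
interface EvalClient {
  eval(script: string, numKeys: number, ...args: (string | number)[]): Promise<unknown>;
}

async function atomicFixedWindow(
  redis: EvalClient,
  key: string,
  limit: number,
  windowSeconds: number
): Promise<{ allowed: boolean; remaining: number }> {
  // One round trip: increment and, on the first hit, set the expiration
  const current = Number(await redis.eval(FIXED_WINDOW_SCRIPT, 1, key, windowSeconds));
  return {
    allowed: current <= limit,
    remaining: Math.max(0, limit - current),
  };
}
```

Compared with separate INCR and EXPIRE calls, a crash between the two commands can no longer leave a key that never expires.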

Boundary bursts on fixed windows. A naive 1-minute fixed window lets a client send the full quota in the last second of one window and the first second of the next — 2× the limit in 2 seconds. Use sliding window counter or token bucket for steady throughput.

Hard-blocking instead of returning 429 with Retry-After. Closing the connection or returning 503 leaves clients with no signal to back off. Always return 429 Too Many Requests with Retry-After so well-behaved clients pause instead of retry-storming.
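
On the client side, backing off means reading Retry-After before retrying. The header value is either a number of seconds or an HTTP date. The sketch below shows one way to turn it into a delay; `retryDelayMs` and its 1-second fallback are hypothetical choices for illustration.

```typescript
// Converts a Retry-After header value into a delay in milliseconds.
// Returns `fallbackMs` if the header is missing or cannot be parsed.
function retryDelayMs(
  retryAfter: string | null,
  now: number = Date.now(),
  fallbackMs: number = 1000
): number {
  if (!retryAfter) return fallbackMs;
  const trimmed = retryAfter.trim();

  // Form 1: delay in whole seconds, e.g. "30"
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const date = Date.parse(trimmed);
  if (Number.isNaN(date)) return fallbackMs;
  return Math.max(0, date - now);
}

// Usage with fetch: const delay = retryDelayMs(response.headers.get('Retry-After'));
```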

How Should You Layer Rate Limits in Production?

  • Token bucket is the best general-purpose algorithm — it allows bursts while enforcing an average rate
  • Sliding window counter offers the best accuracy-to-memory ratio for high-scale distributed systems
  • Use Redis for centralized rate limiting and Durable Objects or Upstash for edge runtimes; pick by where your compute runs
  • Always return X-RateLimit-* headers and 429 status codes with Retry-After so clients can back off gracefully
  • Apply rate limits at the API gateway layer for per-IP throttling and at the application layer for per-user/tier limits
  • Use different limits per endpoint: a health check and a database-heavy search should not share the same budget
  • Rate limiting protects against per-identity abuse; volumetric L3/L4 floods need DDoS scrubbing or WAF at the edge, not application counters

Frequently Asked Questions

What is rate limiting?

Rate limiting controls how many requests a client can make in a given time period. It protects APIs from abuse, prevents resource exhaustion, and ensures fair usage. Common limits: 100 requests per minute per API key.

Which rate limiting algorithm should I use?

Token bucket is the most popular: it allows bursts while enforcing an average rate. The sliding window log is more precise but uses more memory, and the sliding window counter approximates it with two counters. Fixed window is simplest but allows bursts at window boundaries.

How do I implement rate limiting with Redis?

Use Redis INCR with EXPIRE for fixed window, or sorted sets with ZADD, ZREMRANGEBYSCORE, and ZCARD for a sliding window log. Redis is ideal because it is fast, atomic, and shared across multiple application instances.

How do I rate limit on Cloudflare Workers or edge functions?

Use Cloudflare Durable Objects for strongly consistent per-key counters with atomic increments, or the @upstash/ratelimit package backed by Upstash Redis REST for centralized counters with regional replicas. A single Redis primary in one region adds cross-continent latency to every edge check, so co-locate state with compute or accept eventual consistency.

What is the difference between rate limiting and DDoS protection?

Rate limiting throttles requests per identity (IP, API key, user) at the application or gateway layer to prevent abuse and ensure fair use. DDoS protection handles volumetric L3/L4 floods and bot traffic at the edge using scrubbing, WAF rules, and anycast — application counters cannot absorb a 100 Gbps SYN flood. The two are complementary: edge DDoS mitigation keeps the pipe open; rate limiting governs what gets through.
