env.dev

Regex Patterns Cookbook: Common Patterns with Explanations

A cookbook of common regex patterns for email, URL, IP address, phone number, and date validation with detailed explanations.

Regular expressions (regex) are pattern-matching sequences used to search, validate, and transform text. Every major programming language supports them: JavaScript via RegExp, Python via the re module, PHP via PCRE, and Ruby, Java, Go, and Rust via their own engines, which share most of the core syntax but differ in advanced features. This cookbook provides practical patterns for the most common validation tasks, with explanations of how each pattern works and the regex concepts behind them. Test patterns live in the regex tester or skim the regex cheat sheet for syntax reference.

How Do You Validate Email Addresses with Regex?

Email validation is famously difficult: the full RFC 5322 grammar cannot practically be captured by a regular expression. The pattern below accepts the vast majority of real-world email addresses but is deliberately permissive. To confirm that an address actually exists, send a confirmation email instead of relying solely on regex.

Email validation
// Practical email regex (covers the vast majority of real addresses)
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

emailRegex.test('user@example.com');      // true
emailRegex.test('name+tag@sub.domain.co'); // true
emailRegex.test('invalid@.com');           // false
emailRegex.test('@missing-local.com');     // false

// Breakdown:
// ^                     Start of string
// [a-zA-Z0-9._%+-]+    Local part: letters, digits, dots, underscores, %, +, -
// @                     Literal @ sign
// [a-zA-Z0-9.-]+       Domain: letters, digits, dots, hyphens
// \.                   Literal dot before TLD
// [a-zA-Z]{2,}         TLD: at least 2 letters
// $                     End of string
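One way to apply the pattern in practice is to trim the input and cap its length before testing. The following is a minimal sketch: the helper name looksLikeEmail and the 254-character cap (a commonly cited upper bound for address length) are assumptions for illustration, not part of the pattern above.

```javascript
// Same pattern as above, under a separate name so the snippet is self-contained.
const practicalEmail = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: normalise, bound the length, then test the shape.
function looksLikeEmail(input) {
  if (typeof input !== 'string') return false;
  const candidate = input.trim();
  if (candidate.length === 0 || candidate.length > 254) return false;
  return practicalEmail.test(candidate);
}

looksLikeEmail('  user@example.com ');  // true (whitespace trimmed first)
looksLikeEmail('invalid@.com');         // false
looksLikeEmail('user@example..com');    // true: a known gap, consecutive dots are accepted
```

The last call shows why the pattern is only a first filter: it checks the general shape, not whether the domain is well formed or the mailbox exists.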

How Do You Match URLs with Regex?

URL validation
// Match HTTP/HTTPS URLs
const urlRegex = /^https?:\/\/([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(\/[^\s]*)?$/;

urlRegex.test('https://example.com');           // true
urlRegex.test('http://sub.domain.co.uk/path');  // true
urlRegex.test('ftp://not-http.com');            // false

// For extracting URLs from text (non-anchored)
const urlExtract = /https?:\/\/[^\s]+/g;
const text = 'Visit https://example.com or http://docs.dev/api for details';
text.match(urlExtract);
// → ['https://example.com', 'http://docs.dev/api']
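The anchored pattern above rejects some legitimate URLs, for example ones with a port number or a host such as localhost. When strict validation matters, the built-in URL constructor (available in browsers and Node.js) can do the parsing instead. A minimal sketch, where the helper name isHttpUrl is an assumption for illustration:

```javascript
// Hypothetical helper: let the platform parse the URL, then check the scheme.
function isHttpUrl(input) {
  try {
    const parsed = new URL(input);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    // new URL() throws a TypeError when the input cannot be parsed
    return false;
  }
}

isHttpUrl('https://example.com:8080/path?q=1'); // true
isHttpUrl('http://localhost');                  // true
isHttpUrl('ftp://not-http.com');                // false
isHttpUrl('not a url');                         // false
```

The regex remains useful for extraction from free text; the parser is the safer choice for accepting or rejecting a single value.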

How Do You Validate IP Addresses?

IPv4 and IPv6 validation
// IPv4: four octets 0-255 separated by dots
const ipv4Regex = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

ipv4Regex.test('192.168.1.1');   // true
ipv4Regex.test('10.0.0.255');    // true
ipv4Regex.test('256.1.1.1');     // false
ipv4Regex.test('192.168.1');     // false

// Breakdown of each octet group:
// 25[0-5]     → 250-255
// 2[0-4]\d   → 200-249
// [01]?\d\d? → 0-199

// Simplified IPv6 (full form only: 8 groups of 1-4 hex digits, no :: shorthand)
const ipv6Regex = /^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$/;

ipv6Regex.test('2001:0db8:85a3:0000:0000:8a2e:0370:7334'); // true
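The full-form pattern does not accept the :: shorthand (for example ::1). One possible approach is to expand the shorthand first and then reuse the same pattern. The sketch below is an illustration only: the names expandIpv6 and isIpv6 are hypothetical, and it does not handle IPv4-embedded forms or zone IDs.

```javascript
const fullIpv6 = /^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$/;

// Hypothetical helper: replace a single "::" with the zero groups it stands for.
function expandIpv6(address) {
  const parts = address.split('::');
  if (parts.length > 2) return null;       // "::" may appear at most once
  if (parts.length === 1) return address;  // already in full form
  const head = parts[0] === '' ? [] : parts[0].split(':');
  const tail = parts[1] === '' ? [] : parts[1].split(':');
  const missing = 8 - head.length - tail.length;
  if (missing < 1) return null;            // "::" must stand for at least one group
  return [...head, ...Array(missing).fill('0'), ...tail].join(':');
}

function isIpv6(address) {
  const expanded = expandIpv6(address);
  return expanded !== null && fullIpv6.test(expanded);
}

isIpv6('::1');                      // true
isIpv6('2001:db8::8a2e:370:7334');  // true
isIpv6('1::2::3');                  // false ("::" used twice)
```

In Node.js, the built-in net module's isIP function is an alternative that avoids hand-written parsing.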

How Do You Match Phone Numbers and Dates?

Phone number patterns
// US phone numbers (multiple formats)
const usPhoneRegex = /^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

usPhoneRegex.test('+1 (555) 123-4567');  // true
usPhoneRegex.test('555-123-4567');        // true
usPhoneRegex.test('5551234567');          // true

// International E.164 format
const e164Regex = /^\+[1-9]\d{6,14}$/;

e164Regex.test('+14155551234');  // true
e164Regex.test('+442071234567'); // true

Date patterns
// ISO 8601 date (YYYY-MM-DD)
const isoDateRegex = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

isoDateRegex.test('2024-01-15'); // true
isoDateRegex.test('2024-13-01'); // false (month 13)
isoDateRegex.test('2024-1-5');   // false (no leading zero)

// US date format (MM/DD/YYYY)
const usDateRegex = /^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}$/;

usDateRegex.test('01/15/2024');  // true
usDateRegex.test('12/31/2024');  // true
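Both date patterns check only the shape of the string, so an impossible date such as 2024-02-31 still passes. A calendar check can be layered on top of the capture groups. This is a minimal sketch; the names daysInMonth and isRealIsoDate are hypothetical.

```javascript
// Same shape check as isoDateRegex, with the year captured as well.
const isoShape = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Gregorian month lengths, including the leap-year rule for February.
function daysInMonth(year, month) {
  const isLeap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  return [31, isLeap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31][month - 1];
}

function isRealIsoDate(input) {
  const m = isoShape.exec(input);
  if (!m) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  return day <= daysInMonth(year, month);
}

isoShape.test('2024-02-31');   // true  (shape only)
isRealIsoDate('2024-02-31');   // false (February has no day 31)
isRealIsoDate('2024-02-29');   // true  (2024 is a leap year)
isRealIsoDate('2023-02-29');   // false
```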

What Are Lookahead and Lookbehind Assertions?

Lookarounds match a position in the string without consuming characters. They are zero-width assertions — they check a condition but do not include the matched text in the result.

Type                 Syntax      Meaning
Positive lookahead   (?=...)     Followed by ...
Negative lookahead   (?!...)     NOT followed by ...
Positive lookbehind  (?<=...)    Preceded by ...
Negative lookbehind  (?<!...)    NOT preceded by ...

Lookaround examples
// Password strength: at least 8 chars, 1 uppercase, 1 lowercase, 1 digit
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/;

strongPassword.test('MyPass1234');  // true
strongPassword.test('alllowercase'); // false (no uppercase or digit)

// Extract price amounts without the $ symbol (lookbehind)
const prices = 'Items cost $19.99 and $5.50';
prices.match(/(?<=\$)\d+\.\d{2}/g);
// → ['19.99', '5.50']

// Match .ts files that are NOT .test.ts files (negative lookahead)
const noTestFiles = /^(?!.*\.test\.ts$).+\.ts$/;
noTestFiles.test('utils.ts');      // true
noTestFiles.test('utils.test.ts'); // false
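Because lookarounds match positions rather than characters, they can also be used to insert text without replacing anything. A classic illustration is grouping digits into thousands. This is a sketch for integer strings only; the helper name groupThousands is hypothetical.

```javascript
// Hypothetical helper: insert a comma at every position that is followed by
// one or more complete groups of three digits and then no further digit.
function groupThousands(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

groupThousands('1234567'); // '1,234,567'
groupThousands('1000');    // '1,000'
groupThousands('999');     // '999'
```

The pattern consumes no characters: \B and the lookahead only pick out positions, so replace() inserts commas and leaves the digits untouched. Applied to a string with a decimal part, it would also group the digits after the point, so split off the fraction first.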

What Is the Difference Between Greedy and Lazy Matching?

By default, quantifiers (*, +, ?, {n,m}) are greedy: they match as much text as possible. Adding ? after the quantifier makes it lazy, so it matches as little as possible.

Greedy vs lazy
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: matches from first < to LAST >
html.match(/<.*>/);
// → ['<b>bold</b> and <i>italic</i>']

// Lazy: matches from first < to NEXT >
html.match(/<.*?>/g);
// → ['<b>', '</b>', '<i>', '</i>']

// Practical example: extract HTML tag contents
html.match(/<b>(.*?)<\/b>/);
// → ['<b>bold</b>', 'bold']  (group 1 = 'bold')
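An alternative to a lazy quantifier is a negated character class, which states directly which character ends the match and so gives the engine less to backtrack over. A short sketch with illustrative variable names:

```javascript
const sample = '<b>bold</b> and <i>italic</i>';

// [^>]* can never run past the closing >, so greedy matching is safe here
const tags = sample.match(/<[^>]*>/g);
// → ['<b>', '</b>', '<i>', '</i>']

// The same idea for quoted strings, capturing the text between the quotes
const quoted = 'say "hello" and "bye"';
const words = [...quoted.matchAll(/"([^"]*)"/g)].map((m) => m[1]);
// → ['hello', 'bye']
```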

How Do Named Capture Groups Work?

Named groups assign a readable label to captured matches using (?<name>...) syntax. They make regex patterns self-documenting and the results easier to work with.

Named capture groups
// Parse a date string with named groups
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2024-01-15'.match(dateRegex);

console.log(match?.groups?.year);  // '2024'
console.log(match?.groups?.month); // '01'
console.log(match?.groups?.day);   // '15'

// Parse a log line
const logRegex = /(?<timestamp>[\d:.]+) \[(?<level>\w+)] (?<message>.+)/;
const log = '14:23:01.456 [ERROR] Connection refused to db-primary';
const parsed = log.match(logRegex);

console.log(parsed?.groups?.level);   // 'ERROR'
console.log(parsed?.groups?.message); // 'Connection refused to db-primary'

// Use in string replacement
const isoDate = '2024-01-15';
isoDate.replace(dateRegex, '$<month>/$<day>/$<year>');
// → '01/15/2024'
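Named groups combine well with matchAll when the input contains many records. The sketch below parses several log lines in one pass; the sample lines and variable names are made up for illustration.

```javascript
// g = find every match, m = let ^ and $ match at each line boundary
const logLine = /^(?<timestamp>[\d:.]+) \[(?<level>\w+)\] (?<message>.+)$/gm;

const logText = [
  '14:23:01.456 [ERROR] Connection refused to db-primary',
  '14:23:02.001 [INFO] Retrying in 5s',
].join('\n');

// Copy each match's groups into a plain object
const entries = [...logText.matchAll(logLine)].map(({ groups }) => ({ ...groups }));

console.log(entries.length);     // 2
console.log(entries[0].level);   // 'ERROR'
console.log(entries[1].message); // 'Retrying in 5s'
```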

What Is Catastrophic Backtracking (ReDoS)?

Catastrophic backtracking — also known as ReDoS (Regular expression Denial of Service) — happens when a pattern with nested quantifiers forces the engine to explore an exponential number of paths before failing. A 30-character malicious input can hang a server thread for seconds or minutes. Cloudflare took down its global edge in July 2019 with a single regex containing .*(?:.*=.*); the post-mortem is required reading.

Vulnerable pattern (do not ship this)
// Nested quantifier — every "a" can be matched by either +
const evil = /^(a+)+$/;

// Matching input → instant
evil.test('aaaaaaaaaaaaaaaaaaaaaaaaa');           // true, fast

// Add a single non-matching character at the end
evil.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!');   // hangs ~seconds
evil.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!'); // hangs ~minutes

// Why? The engine tries every partition of the a's between the
// inner + and the outer + before declaring "no match". That is
// 2^n combinations — exponential in input length.

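V8 (Node.js, browsers) has no ReDoS protection enabled by default, so a bad pattern blocks the event loop. Go (the RE2-style regexp package) and Rust (the regex crate) use linear-time engines that are immune; Python, Java, and .NET use backtracking engines by default and are vulnerable like JavaScript, although .NET offers match timeouts and, since .NET 7, a RegexOptions.NonBacktracking mode. Defenses, in order of preference: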

Fixes
// 1. Validate input length first — cheap and effective
if (input.length > 256) return false;

// 2. Anchor and avoid overlapping alternation
//    Bad:  /^(a|a)+$/   — both branches match the same character
//    Good: /^a+$/       — single, deterministic quantifier
const safe = /^a+$/;

// 3. Use a possessive quantifier or atomic group (PCRE / Python 3.11+)
//    These prevent the engine from backtracking into the group:
//    ^(?>a+)+$           — atomic group, no exponential blow-up
//    ^a++$               — possessive quantifier (same effect)
//    NOTE: JavaScript supports neither. Use linear refactor instead.

// 4. Run a static analyser in CI — eslint-plugin-security has
//    detect-unsafe-regex, and safe-regex catches the obvious cases.

// 5. For untrusted input on the server, run regex in a worker with
//    a timeout, or use a linear-time library (e.g. node-re2).
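Defenses 1 and 2 can be combined in a small wrapper. The following is a sketch under stated assumptions: the helper name safeTest and the 256-character default are illustrative, and the wrapper expects a pattern without the g or y flag, because test() is stateful for those.

```javascript
const MAX_INPUT_LENGTH = 256; // hypothetical cap, tune per field

// Linear rewrite of /^(a+)+$/: one quantifier, nothing to partition
const onlyAs = /^a+$/;

// Hypothetical helper: reject oversized or non-string input before the
// regex engine ever sees it.
function safeTest(pattern, input, maxLength = MAX_INPUT_LENGTH) {
  if (typeof input !== 'string' || input.length > maxLength) return false;
  return pattern.test(input);
}

safeTest(onlyAs, 'aaa');                  // true
safeTest(onlyAs, 'a'.repeat(40) + '!');   // false, returns immediately
safeTest(onlyAs, 'a'.repeat(1000));       // false, rejected by the length cap
```

A length cap alone does not rescue an exponential pattern, since a few dozen characters are already enough to stall it. The cap bounds the damage from slower-than-linear patterns, while the rewrite removes the exponential case.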

Common pitfalls

  • Forgetting to anchor with ^ and $ — without anchors, /\d{4}/.test('abc12345xyz') is true. For validation always anchor; for extraction use the global flag instead.
  • Using . when you mean [^/]: /api/.+/users greedily swallows extra slashes. Be specific: /api/[^/]+/users.
  • Mixing PCRE-only features in JavaScript — atomic groups (?>...), possessive quantifiers a++, and Python's (?P<name>...) all throw SyntaxError in V8. JS uses (?<name>...) for named groups.
  • Not escaping . in TLDs: /example.com/ matches examplezcom. Write /example\.com/.
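The escaping pitfall is easiest to hit when a pattern is built from a string at runtime. A common remedy is to escape every metacharacter before passing the text to the RegExp constructor. In this sketch the helper name escapeRegExp is an assumption, not a built-in.

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash
// so the text is matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const exactHost = new RegExp(`^${escapeRegExp('example.com')}$`);

exactHost.test('example.com'); // true
exactHost.test('examplezcom'); // false (the dot is now literal)
```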

How Should You Use Regex Safely?

  • For email and URL validation, use a practical pattern and complement it with server-side verification
  • Use named capture groups to make complex patterns readable and maintainable
  • Prefer lazy quantifiers (*?, +?) when matching delimited content like HTML tags or quoted strings
  • Lookaheads are useful for password validation rules and conditional matching without consuming characters
  • Always anchor patterns with ^ and $ when validating entire strings to prevent partial matches
  • Test regex patterns interactively with regex101.com or the env.dev regex tester for real-time explanation of each token

Frequently Asked Questions

Should I use regex for email validation?

A basic regex can catch obvious typos, but fully validating email addresses with regex is nearly impossible (RFC 5322 is very complex). Use a simple pattern like ^[^@]+@[^@]+\.[^@]+$ and verify with a confirmation email.

How do I match a URL with regex?

A practical pattern: https?://[\w.-]+(?:/[\w./?%&=-]*)? catches most URLs. For strict RFC compliance, use a URL parser library instead. Regex is best for "good enough" extraction from text.

What is the difference between greedy and lazy matching?

Greedy (default) matches as much as possible: .* matches everything. Lazy (add ?) matches as little as possible: .*? stops at the first opportunity. Use lazy matching when you want the shortest match.

What is catastrophic backtracking and how do I avoid it?

Catastrophic backtracking (ReDoS) happens when nested quantifiers like (a+)+ force the engine into exponential-time backtracking on non-matching input. Avoid it by removing nested quantifiers, anchoring patterns with ^ and $, validating input length first, and using atomic groups (?>...) or possessive quantifiers in PCRE. V8/Node.js has no built-in ReDoS protection — bad patterns will block the event loop.

Are JavaScript and Python regex syntaxes the same?

Mostly, but not exactly. JavaScript (ECMAScript) and Python (the re module) both support character classes, quantifiers, lookaheads, and named groups, but Python uses (?P<name>...) for named groups while modern JS uses (?<name>...). Python's re module requires fixed-width lookbehind, whereas JavaScript engines that implement lookbehind (ES2018, in V8 since 2018) accept variable-width patterns. PCRE and Python 3.11+ also have atomic groups and possessive quantifiers, which JS lacks entirely.
