Regular expressions (regex) are pattern-matching sequences used to search, validate, and transform text. Every major programming language supports them: JavaScript via RegExp, Python via the re module, PHP via PCRE, and Ruby, Java, Go, and Rust via their own engines (Go and Rust deliberately omit backtracking-only features such as lookarounds and backreferences). This cookbook provides production-ready patterns for the most common validation tasks, with explanations of how each pattern works and the regex concepts behind them. Test patterns live in the regex tester, or skim the regex cheat sheet for a syntax reference.
How Do You Validate Email Addresses with Regex?
Email validation is famously difficult — the full RFC 5322 grammar cannot be captured by a regular expression. The pattern below covers 99.9% of real-world email addresses. For strict validation, always send a confirmation email instead of relying solely on regex.
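As a rough sketch of how the practical pattern might be wrapped for real use, the hypothetical `isPlausibleEmail` helper below (the name and structure are assumptions, not part of the original) applies the SMTP length limits from RFC 5321, 64 characters for the local part and 254 for the whole address, before running the regex. It only indicates whether an address looks worth sending a confirmation email to, not that the mailbox exists.

```javascript
// Same practical pattern as in this section, under a different name.
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: cheap length checks first, then the regex.
function isPlausibleEmail(candidate) {
  if (typeof candidate !== 'string') return false;
  if (candidate.length > 254) return false; // overall address limit
  const atIndex = candidate.lastIndexOf('@'); // equals the local-part length
  if (atIndex < 1 || atIndex > 64) return false; // local part: 1 to 64 characters
  return EMAIL_PATTERN.test(candidate);
}

console.log(isPlausibleEmail('user@example.com')); // true
console.log(isPlausibleEmail('a'.repeat(65) + '@example.com')); // false
console.log(isPlausibleEmail('invalid@.com')); // false
```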
// Practical email regex (covers 99.9% of real addresses)
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
emailRegex.test('user@example.com'); // true
emailRegex.test('name+tag@sub.domain.co'); // true
emailRegex.test('invalid@.com'); // false
emailRegex.test('@missing-local.com'); // false
// Breakdown:
// ^ Start of string
// [a-zA-Z0-9._%+-]+ Local part: letters, digits, dots, underscores, %, +, -
// @ Literal @ sign
// [a-zA-Z0-9.-]+ Domain: letters, digits, dots, hyphens
// \. Literal dot before TLD
// [a-zA-Z]{2,} TLD: at least 2 letters
// $ End of string
How Do You Match URLs with Regex?
// Match HTTP/HTTPS URLs
const urlRegex = /^https?:\/\/([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(\/[^\s]*)?$/;
urlRegex.test('https://example.com'); // true
urlRegex.test('http://sub.domain.co.uk/path'); // true
urlRegex.test('ftp://not-http.com'); // false
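The anchored pattern above is deliberately narrow: it rejects ports, query strings without a path, and hosts such as localhost. One possible way to get a second opinion is to pair it with the built-in `URL` constructor, which throws on input it cannot parse. The `isHttpUrl` helper below is a hypothetical sketch, not the article's method.

```javascript
// Same anchored pattern as above, under a different name.
const urlShape = /^https?:\/\/([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(\/[^\s]*)?$/;

// Hypothetical helper: the regex checks the shape, the URL parser confirms
// that the string really parses as an http(s) URL.
function isHttpUrl(candidate) {
  if (typeof candidate !== 'string' || !urlShape.test(candidate)) return false;
  try {
    const parsed = new URL(candidate);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false; // new URL() throws a TypeError on unparseable input
  }
}

console.log(isHttpUrl('https://example.com')); // true
console.log(isHttpUrl('ftp://not-http.com')); // false
console.log(isHttpUrl('https://example.com:8080/')); // false: the shape regex does not allow ports
```

If ports or query strings must be accepted, it is probably simpler to rely on the parser alone and check `protocol` and `hostname` than to keep extending the regex.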
// For extracting URLs from text (non-anchored)
const urlExtract = /https?:\/\/[^\s]+/g;
const text = 'Visit https://example.com or http://docs.dev/api for details';
text.match(urlExtract);
// → ['https://example.com', 'http://docs.dev/api']
How Do You Validate IP Addresses?
// IPv4: four octets 0-255 separated by dots
const ipv4Regex = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;
ipv4Regex.test('192.168.1.1'); // true
ipv4Regex.test('10.0.0.255'); // true
ipv4Regex.test('256.1.1.1'); // false
ipv4Regex.test('192.168.1'); // false
// Breakdown of each octet group:
// 25[0-5] → 250-255
// 2[0-4]\d → 200-249
// [01]?\d\d? → 0-199
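Note that the `[01]?\d\d?` branch also accepts leading zeros, so `01.02.03.04` passes; some parsers interpret such octets as octal. Where that matters, a stricter check can split the string and validate each octet numerically. The `isStrictIPv4` helper below is a hypothetical sketch of that approach.

```javascript
// Hypothetical helper: exactly four octets, no leading zeros, each 0-255.
function isStrictIPv4(candidate) {
  const parts = candidate.split('.');
  if (parts.length !== 4) return false;
  return parts.every(
    // "0" alone, or 1-3 digits not starting with 0; then a numeric range check
    (part) => /^(0|[1-9]\d{0,2})$/.test(part) && Number(part) <= 255
  );
}

console.log(isStrictIPv4('192.168.1.1')); // true
console.log(isStrictIPv4('256.1.1.1')); // false
console.log(isStrictIPv4('01.02.03.04')); // false (leading zeros)
```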
// Simplified IPv6 (full form, 8 groups of 4 hex digits)
const ipv6Regex = /^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$/;
ipv6Regex.test('2001:0db8:85a3:0000:0000:8a2e:0370:7334'); // true
How Do You Match Phone Numbers and Dates?
// US phone numbers (multiple formats)
const usPhoneRegex = /^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
usPhoneRegex.test('+1 (555) 123-4567'); // true
usPhoneRegex.test('555-123-4567'); // true
usPhoneRegex.test('5551234567'); // true
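Once a US number has been validated, it is often convenient to store it in one canonical form. The sketch below normalises any accepted format to a leading `+1` followed by ten digits; `toE164US` is a hypothetical helper name, and the approach (strip non-digits, drop a leading country code) is one of several possibilities.

```javascript
// Same US pattern as above, under a different name.
const usPhoneShape = /^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Hypothetical helper: returns '+1XXXXXXXXXX' or null when the input is rejected.
function toE164US(input) {
  if (!usPhoneShape.test(input)) return null;
  const digits = input.replace(/\D/g, ''); // keep digits only
  // A match yields either 10 digits or 11 digits with a leading country code 1.
  const national = digits.length === 11 ? digits.slice(1) : digits;
  return `+1${national}`;
}

console.log(toE164US('+1 (555) 123-4567')); // '+15551234567'
console.log(toE164US('555-123-4567')); // '+15551234567'
console.log(toE164US('12345')); // null
```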
// International E.164 format
const e164Regex = /^\+[1-9]\d{6,14}$/;
e164Regex.test('+14155551234'); // true
e164Regex.test('+442071234567'); // true
// ISO 8601 date (YYYY-MM-DD)
const isoDateRegex = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
isoDateRegex.test('2024-01-15'); // true
isoDateRegex.test('2024-13-01'); // false (month 13)
isoDateRegex.test('2024-1-5'); // false (no leading zero)
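The ISO pattern checks the shape of the string, not the calendar: `2024-02-31` passes because 31 is a valid day number in general. A minimal sketch of a calendar-aware check, assuming a hypothetical `isRealIsoDate` helper, builds a `Date` from the parts and confirms that nothing rolled over into the next month.

```javascript
// Same ISO pattern as above, under a different name.
const isoDateShape = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: shape check first, then a round-trip through Date.
function isRealIsoDate(candidate) {
  if (!isoDateShape.test(candidate)) return false;
  const [year, month, day] = candidate.split('-').map(Number);
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day); // out-of-range days roll into the next month
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isRealIsoDate('2024-02-29')); // true (leap year)
console.log(isRealIsoDate('2023-02-29')); // false
console.log(isRealIsoDate('2024-02-31')); // false
```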
// US date format (MM/DD/YYYY)
const usDateRegex = /^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}$/;
usDateRegex.test('01/15/2024'); // true
usDateRegex.test('12/31/2024'); // true
What Are Lookahead and Lookbehind Assertions?
Lookarounds match a position in the string without consuming characters. They are zero-width assertions — they check a condition but do not include the matched text in the result.
| Type | Syntax | Meaning |
|---|---|---|
| Positive lookahead | (?=...) | Followed by ... |
| Negative lookahead | (?!...) | NOT followed by ... |
| Positive lookbehind | (?<=...) | Preceded by ... |
| Negative lookbehind | (?<!...) | NOT preceded by ... |
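To make the zero-width idea concrete, here is a small sketch (the variable and function names are illustrative). The first match stops before the text the lookahead checked, and the second example inserts separators at positions rather than replacing characters.

```javascript
// The lookahead checks for ' USD' but does not include it in the match.
const amount = '100 USD'.match(/\d+(?= USD)/)[0];
console.log(amount); // '100'

// Insert a comma at every non-boundary position that is followed by
// one or more groups of three digits and then no further digit.
const addThousands = (digits) => digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
console.log(addThousands('1234567')); // '1,234,567'
console.log(addThousands('999')); // '999'
```

Because the replacement matches only positions, no digit is consumed or lost; the commas are inserted between existing characters.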
// Password strength: at least 8 chars, 1 uppercase, 1 lowercase, 1 digit
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/;
strongPassword.test('MyPass1234'); // true
strongPassword.test('alllowercase'); // false (no uppercase or digit)
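A single combined pattern can only answer yes or no. If the user should be told which rule failed, one option is to test each rule separately; the rule list and the `failedPasswordRules` helper below are hypothetical and simply mirror the four conditions in the pattern above.

```javascript
// One entry per condition encoded in the combined lookahead pattern.
const passwordRules = [
  { label: 'at least 8 characters', pattern: /^.{8,}$/ },
  { label: 'a lowercase letter', pattern: /[a-z]/ },
  { label: 'an uppercase letter', pattern: /[A-Z]/ },
  { label: 'a digit', pattern: /\d/ },
];

// Hypothetical helper: returns the labels of every rule the password fails.
function failedPasswordRules(password) {
  return passwordRules
    .filter(({ pattern }) => !pattern.test(password))
    .map(({ label }) => label);
}

console.log(failedPasswordRules('MyPass1234')); // []
console.log(failedPasswordRules('alllowercase')); // ['an uppercase letter', 'a digit']
```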
// Extract price amounts without the $ symbol (lookbehind)
const prices = 'Items cost $19.99 and $5.50';
prices.match(/(?<=\$)\d+\.\d{2}/g);
// → ['19.99', '5.50']
// Match .ts files whose name does NOT end in .test.ts (negative lookahead)
// The lookahead runs once at the start, so it sees the whole file name
const noTestFiles = /^(?!.*\.test\.ts$).+\.ts$/;
noTestFiles.test('utils.ts'); // true
noTestFiles.test('utils.test.ts'); // false
What Is the Difference Between Greedy and Lazy Matching?
By default, quantifiers (*, +, ?, {n,m}) are greedy: they match as much text as possible while still letting the rest of the pattern succeed. Adding ? after the quantifier makes it lazy: it matches as little as possible.
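A quick sketch with quoted strings shows the difference; the sample sentence is made up for illustration. A negated character class is included as a third option, since it reaches the same result as the lazy form without relying on backtracking.

```javascript
const sentence = 'say "hi" and "bye"';

// Greedy: runs from the first quote to the LAST quote.
const greedy = sentence.match(/".*"/)[0];
console.log(greedy); // '"hi" and "bye"'

// Lazy: stops at the NEXT quote each time.
const lazy = sentence.match(/".*?"/g);
console.log(lazy); // ['"hi"', '"bye"']

// Negated class: "anything except a quote" cannot overshoot in the first place.
const negated = sentence.match(/"[^"]*"/g);
console.log(negated); // ['"hi"', '"bye"']
```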
const html = '<b>bold</b> and <i>italic</i>';
// Greedy: matches from first < to LAST >
html.match(/<.*>/);
// → ['<b>bold</b> and <i>italic</i>']
// Lazy: matches from first < to NEXT >
html.match(/<.*?>/g);
// → ['<b>', '</b>', '<i>', '</i>']
// Practical example: extract HTML tag contents
html.match(/<b>(.*?)<\/b>/);
// → ['<b>bold</b>', 'bold'] (group 1 = 'bold')
How Do Named Capture Groups Work?
Named groups assign a readable label to captured matches using (?<name>...) syntax. They make regex patterns self-documenting and the results easier to work with.
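Named groups also work with `String.prototype.matchAll`, which requires the global flag and yields one match object per occurrence. The sketch below pulls every date out of a made-up sentence and converts the groups to numbers; the variable names are illustrative.

```javascript
// The g flag is required by matchAll.
const dateFinder = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
const releases = 'v1 shipped 2023-11-02, v2 shipped 2024-01-15';

// Each match object carries a groups property keyed by the group names.
const parsedDates = [...releases.matchAll(dateFinder)].map(({ groups }) => ({
  year: Number(groups.year),
  month: Number(groups.month),
  day: Number(groups.day),
}));

console.log(parsedDates);
// [{ year: 2023, month: 11, day: 2 }, { year: 2024, month: 1, day: 15 }]
```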
// Parse a date string with named groups
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2024-01-15'.match(dateRegex);
console.log(match?.groups?.year); // '2024'
console.log(match?.groups?.month); // '01'
console.log(match?.groups?.day); // '15'
// Parse a log line
const logRegex = /(?<timestamp>[\d:.]+) \[(?<level>\w+)\] (?<message>.+)/;
const log = '14:23:01.456 [ERROR] Connection refused to db-primary';
const parsed = log.match(logRegex);
console.log(parsed?.groups?.level); // 'ERROR'
console.log(parsed?.groups?.message); // 'Connection refused to db-primary'
// Use in string replacement
const isoDate = '2024-01-15';
isoDate.replace(dateRegex, '$<month>/$<day>/$<year>');
// → '01/15/2024'
What Is Catastrophic Backtracking (ReDoS)?
Catastrophic backtracking — also known as ReDoS (Regular expression Denial of Service) — happens when a pattern with nested quantifiers forces the engine to explore an exponential number of paths before failing. A 30-character malicious input can hang a server thread for seconds or minutes. Cloudflare took down its global edge in July 2019 with a single regex containing .*(?:.*=.*); the post-mortem is required reading.
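To observe the growth rate without hanging a process, a small timing sketch can compare a nested pattern with an equivalent flat one on short inputs. The `timeMatch` helper is hypothetical, and the code assumes a runtime where `performance.now()` is available as a global (browsers and recent Node.js versions). Input lengths are kept small on purpose; actual timings depend on the machine and engine, so none are shown.

```javascript
// Hypothetical helper: run one match and report how long it took.
function timeMatch(pattern, input) {
  const start = performance.now();
  const matched = pattern.test(input);
  return { matched, ms: performance.now() - start };
}

const nested = /^(a+)+$/; // nested quantifier, vulnerable
const flat = /^a+$/; // accepts the same strings, no nesting

for (const n of [14, 18, 22]) {
  const input = 'a'.repeat(n) + '!'; // trailing '!' forces the match to fail
  const slow = timeMatch(nested, input);
  const fast = timeMatch(flat, input);
  console.log(`n=${n} nested=${slow.ms.toFixed(1)}ms flat=${fast.ms.toFixed(1)}ms`);
}
```

On a backtracking engine the nested column should grow roughly sixteen-fold for every four extra characters, while the flat column stays near zero.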
// Nested quantifier — every "a" can be matched by either +
const evil = /^(a+)+$/;
// Linear input → instant
evil.test('aaaaaaaaaaaaaaaaaaaaaaaaa'); // true, fast
// Add a single non-matching character at the end
evil.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!'); // hangs ~seconds
evil.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!'); // hangs ~minutes
// Why? The engine tries every partition of the a's between the
// inner + and the outer + before declaring "no match". That is
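// 2^n combinations: exponential in input length.
V8 (Node.js, browsers) has no ReDoS protection enabled by default, so a bad pattern blocks the event loop. Go's regexp package (RE2-style) and Rust's regex crate use linear-time engines that are immune. .NET backtracks by default but supports a match timeout and, since .NET 7, the RegexOptions.NonBacktracking mode. Python and Java are vulnerable like JavaScript. Defenses, in order of preference: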
// 1. Validate input length first — cheap and effective
if (input.length > 256) return false;
// 2. Anchor and avoid overlapping alternation
// Bad: /^(a|a)+$/ — both branches match the same character
// Good: /^a+$/ — single, deterministic quantifier
const safe = /^a+$/;
// 3. Use a possessive quantifier or atomic group (PCRE / Python 3.11+)
// These prevent the engine from backtracking into the group:
// ^(?>a+)+$ — atomic group, no exponential blow-up
// ^a++$ — possessive quantifier (same effect)
// NOTE: JavaScript supports neither. Refactor to a linear pattern instead.
// 4. Run a static analyser in CI — eslint-plugin-security has
// detect-unsafe-regex, and safe-regex catches the obvious cases.
// 5. For untrusted input on the server, run regex in a worker with
// a timeout, or use a linear-time library (e.g. node-re2).
Common pitfalls
- Forgetting to anchor with `^` and `$`: without anchors, `/\d{4}/.test('abc12345xyz')` is `true`. For validation always anchor; for extraction use the global flag instead.
- Using `.` when you mean `[^/]`: `/api/.+/users` greedily swallows extra slashes. Be specific: `/api/[^/]+/users`.
- Mixing PCRE-only features in JavaScript: atomic groups `(?>...)`, possessive quantifiers `a++`, and Python's `(?P<name>...)` all throw `SyntaxError` in V8. JS uses `(?<name>...)` for named groups.
- Not escaping `.` in TLDs: `/example.com/` matches `examplezcom`. Write `/example\.com/`.
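Each pitfall can be reproduced in a few lines. The snippet below is a sketch; the `isValidJsRegex` helper and the sample path are hypothetical and exist only to show the behaviour.

```javascript
// 1. Anchoring: the unanchored pattern finds four digits anywhere.
console.log(/\d{4}/.test('abc12345xyz')); // true
console.log(/^\d{4}$/.test('abc12345xyz')); // false

// 2. "." versus "[^/]": the dot happily crosses path separators.
console.log(/^\/api\/.+\/users$/.test('/api/a/b/users')); // true
console.log(/^\/api\/[^/]+\/users$/.test('/api/a/b/users')); // false

// 3. PCRE-only or Python-only syntax fails to compile in JavaScript.
function isValidJsRegex(source) {
  try {
    new RegExp(source);
    return true;
  } catch {
    return false; // the RegExp constructor throws a SyntaxError
  }
}
console.log(isValidJsRegex('(?>a+)')); // false (atomic group)
console.log(isValidJsRegex('(?P<name>a)')); // false (Python named group)
console.log(isValidJsRegex('(?<name>a)')); // true

// 4. Unescaped dot matches any character.
console.log(/example.com/.test('examplezcom')); // true
console.log(/example\.com/.test('examplezcom')); // false
```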
References
- MDN, Regular expressions guide: authoritative reference for the JavaScript `RegExp` object, flags, and assertion syntax.
- Python `re` module documentation: covers PCRE-style features, named groups with `(?P<...>)`, and lookbehind (which must be fixed-width in `re`).
- OWASP, Regular expression Denial of Service (ReDoS): catalogue of evil-regex patterns, real CVEs, and detection guidance.
- regex101.com: interactive tester with token-by-token explanation, supports PCRE2, ECMAScript, Python, Go, and Java flavours.
- regular-expressions.info: deep tutorial on engine internals, backtracking, and flavour differences across languages.
How Should You Use Regex Safely?
- For email and URL validation, use a practical pattern and complement it with server-side verification
- Use named capture groups to make complex patterns readable and maintainable
- Prefer lazy quantifiers (`*?`, `+?`) when matching delimited content like HTML tags or quoted strings
- Lookaheads are useful for password validation rules and conditional matching without consuming characters
- Always anchor patterns with `^` and `$` when validating entire strings to prevent partial matches
- Test regex patterns interactively with regex101.com or the env.dev regex tester for real-time explanation of each token
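Several of these points can be combined in one small wrapper. The `makeValidator` factory below is a hypothetical sketch: it caps input length before the regex runs and refuses patterns with the `g` or `y` flag, because `test()` on such patterns keeps state in `lastIndex` between calls and can give alternating results.

```javascript
// Hypothetical factory: length guard plus a stateless, anchored pattern.
function makeValidator(pattern, maxLength = 256) {
  if (pattern.global || pattern.sticky) {
    throw new TypeError('validation patterns must not use the g or y flag');
  }
  return (input) =>
    typeof input === 'string' && input.length <= maxLength && pattern.test(input);
}

const isIsoDate = makeValidator(/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/, 10);

console.log(isIsoDate('2024-01-15')); // true
console.log(isIsoDate('2024-13-01')); // false
console.log(isIsoDate('9'.repeat(1000))); // false, rejected by the length guard
```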