PostgreSQL is one of the most advanced open-source relational databases. This cheatsheet covers essential SQL commands, JOINs with visual diagrams, subquery vs JOIN alternatives, CTEs, window functions, indexing strategies, JSON operations, and performance gotchas. Designed for developers who need a fast, accurate reference for daily PostgreSQL work — from basic CRUD to advanced query patterns.
Connecting & Database Management
| Command | Description |
|---|---|
| psql -U user -d dbname -h host | Connect to a database |
| psql -U user -d dbname -c "SQL" | Run a single SQL command |
| \l | List all databases |
| \c dbname | Switch to a database |
| \dt | List tables in current schema |
| \d tablename | Describe table structure |
| \dn | List schemas |
| \du | List roles/users |
| \q | Quit psql |
| CREATE DATABASE mydb; | Create a new database |
| DROP DATABASE mydb; | Delete a database (irreversible) |
CRUD Operations
| Operation | Syntax |
|---|---|
| INSERT | INSERT INTO users (name, email) VALUES ('Ada', 'ada@example.com'); |
| INSERT multiple | INSERT INTO users (name) VALUES ('Ada'), ('Bob'), ('Eve'); |
| INSERT ... RETURNING | INSERT INTO users (name) VALUES ('Ada') RETURNING id, name; |
| SELECT | SELECT name, email FROM users WHERE active = true; |
| UPDATE | UPDATE users SET name = 'Ada L' WHERE id = 1; |
| UPDATE ... RETURNING | UPDATE users SET name = 'Ada L' WHERE id = 1 RETURNING *; |
| UPSERT | INSERT INTO users (id, name) VALUES (1, 'Ada') ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name; |
| DELETE | DELETE FROM users WHERE id = 1; |
| TRUNCATE | TRUNCATE TABLE users; -- faster than DELETE, no row-level triggers |
Gotcha
TRUNCATE fires no row-level triggers and acquires an ACCESS EXCLUSIVE lock. By default it leaves sequences untouched; use TRUNCATE ... RESTART IDENTITY to reset them. Use DELETE if you need triggers to fire or need row-level WHERE filtering.
Table & Schema Management
| Command | Description |
|---|---|
| CREATE TABLE t (id SERIAL PRIMARY KEY, name TEXT NOT NULL); | Create table with auto-increment PK |
| CREATE TABLE t (id UUID DEFAULT gen_random_uuid() PRIMARY KEY); | UUID primary key (pg 13+) |
| ALTER TABLE t ADD COLUMN email TEXT; | Add a column |
| ALTER TABLE t DROP COLUMN email; | Remove a column |
| ALTER TABLE t RENAME COLUMN old TO new; | Rename a column |
| ALTER TABLE t ALTER COLUMN name SET NOT NULL; | Add NOT NULL constraint |
| ALTER TABLE t ADD CONSTRAINT uq UNIQUE (email); | Add unique constraint |
| DROP TABLE t; | Drop a table |
| DROP TABLE IF EXISTS t CASCADE; | Drop table + dependents |
| CREATE SCHEMA app; SET search_path TO app, public; | Create and use a schema |
Gotcha
Use IDENTITY columns instead of SERIAL for new tables: id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY. SERIAL creates an implicit sequence with separate permissions. IDENTITY is SQL-standard and safer.
How Do JOINs Work in PostgreSQL?
JOINs combine rows from two or more tables based on a related column. The type of JOIN determines which rows are included when there is no match.
| JOIN Type | Returns | Example |
|---|---|---|
| INNER JOIN | Only matching rows in both tables | SELECT * FROM orders o INNER JOIN users u ON o.user_id = u.id; |
| LEFT JOIN | All rows from left + matches from right (NULLs if no match) | SELECT * FROM users u LEFT JOIN orders o ON u.id = o.user_id; |
| RIGHT JOIN | All rows from right + matches from left | SELECT * FROM orders o RIGHT JOIN users u ON u.id = o.user_id; |
| FULL OUTER JOIN | All rows from both (NULLs where no match) | SELECT * FROM users u FULL OUTER JOIN orders o ON u.id = o.user_id; |
| CROSS JOIN | Cartesian product (every combination) | SELECT * FROM sizes CROSS JOIN colors; |
| LATERAL JOIN | Subquery can reference preceding tables | SELECT * FROM users u, LATERAL (SELECT * FROM orders WHERE user_id = u.id LIMIT 3) o; |
Gotcha
LEFT JOIN with a WHERE clause on the right table effectively becomes an INNER JOIN. Move the filter into the ON clause instead: LEFT JOIN orders o ON u.id = o.user_id AND o.status = 'active'.
Subquery vs JOIN vs CTE — Same Data, Different Approaches
These three approaches often produce identical results. Choose based on readability, performance, and whether you need to reuse the result.
Approach 1: Subquery (in WHERE)
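A sketch of the subquery form, assuming hypothetical `users` and `orders` tables (`users` with an `id`, `orders` with a `user_id`) — find every user who has placed at least one order:

```sql
-- Users with at least one order, via a membership subquery
SELECT name
FROM users
WHERE id IN (SELECT user_id FROM orders);
```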
Approach 2: JOIN
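The same result via a JOIN, assuming the same hypothetical `users`/`orders` tables. Note the DISTINCT: a user with several orders would otherwise appear once per order.

```sql
-- Users with at least one order, via a JOIN
SELECT DISTINCT u.name
FROM users u
JOIN orders o ON o.user_id = u.id;
```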
Approach 3: CTE (Common Table Expression)
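The same result via a CTE, which gives the intermediate set a name you can reuse (same hypothetical `users`/`orders` tables):

```sql
-- Users with at least one order, via a CTE
WITH ordering_users AS (
    SELECT DISTINCT user_id FROM orders
)
SELECT u.name
FROM users u
JOIN ordering_users ou ON ou.user_id = u.id;
```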
| Approach | Best When | Watch Out |
|---|---|---|
| Subquery in WHERE | Simple existence/membership checks | IN with large sets can be slow — use EXISTS instead |
| JOIN | You need columns from both tables | Can produce duplicates — use DISTINCT or aggregate |
| CTE | Complex queries with reusable intermediate results | Before pg 12, CTEs were always materialized (optimization fence) |
| EXISTS subquery | Checking existence without caring about values | Fastest for "does any matching row exist?" — stops at first match |
| Correlated subquery | Row-by-row dependent logic | Runs once per outer row — can be very slow on large tables |
EXISTS vs IN — prefer EXISTS for large datasets
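A sketch of the EXISTS form, again assuming hypothetical `users`/`orders` tables. EXISTS stops scanning the inner table as soon as one matching order is found, which is why it tends to beat IN when the inner set is large:

```sql
-- Prefer EXISTS over IN for large inner sets
SELECT name
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);
```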
Filtering, Sorting & Pagination
| Pattern | Example |
|---|---|
| WHERE ... AND / OR | WHERE age > 18 AND status = 'active' |
| BETWEEN | WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31' |
| IN | WHERE status IN ('active', 'pending') |
| LIKE / ILIKE | WHERE name ILIKE '%ada%' -- case-insensitive |
| IS NULL / IS NOT NULL | WHERE deleted_at IS NULL |
| ORDER BY | ORDER BY created_at DESC NULLS LAST |
| LIMIT + OFFSET | LIMIT 20 OFFSET 40 -- page 3 |
| Keyset pagination | WHERE created_at < '2025-06-01' ORDER BY created_at DESC LIMIT 20 |
Gotcha
OFFSET-based pagination gets slower as offset grows because PostgreSQL must scan and discard rows. For large datasets, use keyset (cursor) pagination: WHERE id > last_seen_id ORDER BY id LIMIT 20.
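When the sort column is not unique, add the primary key as a tiebreaker so rows are never skipped or repeated across page boundaries. A sketch, assuming a hypothetical `posts` table; the `(created_at, id)` pair of the last row seen is the cursor:

```sql
-- Next page after the last row seen; the row comparison
-- (created_at, id) < (...) handles ties on created_at
SELECT id, title, created_at
FROM posts
WHERE (created_at, id) < ('2025-06-01 12:00:00+00', 41987)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```

A composite index on (created_at, id) lets this query seek straight to the cursor position instead of scanning from the top.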
Aggregation & Grouping
| Function | Description | Example |
|---|---|---|
| COUNT(*) | Count all rows | SELECT COUNT(*) FROM orders; |
| COUNT(DISTINCT col) | Count unique values | SELECT COUNT(DISTINCT user_id) FROM orders; |
| SUM(col) | Sum of values | SELECT SUM(total) FROM orders WHERE status = 'paid'; |
| AVG(col) | Average value | SELECT AVG(total) FROM orders; |
| MIN(col) / MAX(col) | Minimum/maximum | SELECT MIN(created_at), MAX(created_at) FROM orders; |
| ARRAY_AGG(col) | Collect into array | SELECT user_id, ARRAY_AGG(product) FROM orders GROUP BY user_id; |
| STRING_AGG(col, sep) | Concatenate strings | SELECT STRING_AGG(name, ', ') FROM users; |
| GROUP BY | Group rows for aggregation | SELECT status, COUNT(*) FROM orders GROUP BY status; |
| HAVING | Filter groups (not rows) | SELECT user_id, COUNT(*) FROM orders GROUP BY user_id HAVING COUNT(*) > 5; |
Gotcha
WHERE filters rows before grouping. HAVING filters groups after aggregation. Don't use HAVING for conditions that could go in WHERE — it's less efficient.
What Are Window Functions?
Window functions compute a value across a set of rows related to the current row, without collapsing them into a single output row like GROUP BY does. They use the OVER() clause.
| Function | Description | Example |
|---|---|---|
| ROW_NUMBER() | Unique sequential number per partition | ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) |
| RANK() | Rank with gaps on ties | RANK() OVER (ORDER BY score DESC) |
| DENSE_RANK() | Rank without gaps on ties | DENSE_RANK() OVER (ORDER BY score DESC) |
| LAG(col, n) | Value from n rows before | LAG(revenue, 1) OVER (ORDER BY month) -- previous month |
| LEAD(col, n) | Value from n rows after | LEAD(revenue, 1) OVER (ORDER BY month) -- next month |
| SUM() OVER() | Running total | SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) |
| AVG() OVER() | Moving average | AVG(amount) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) |
| NTILE(n) | Split into n roughly equal buckets | NTILE(4) OVER (ORDER BY score) -- quartiles |
| FIRST_VALUE / LAST_VALUE | First/last in window frame | FIRST_VALUE(name) OVER (PARTITION BY dept ORDER BY salary DESC) |
Top-N per group using ROW_NUMBER()
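One way to sketch this, assuming a hypothetical `emps` table with `dept` and `salary` columns — number rows within each department by salary, then keep the top three. The window function must live in a subquery because window results cannot be filtered directly in WHERE:

```sql
-- Top 3 earners per department
SELECT *
FROM (
    SELECT name, dept, salary,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
    FROM emps
) ranked
WHERE rn <= 3;
```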
How Should You Index PostgreSQL Tables?
| Index Type | Use Case | Syntax |
|---|---|---|
| B-tree (default) | Equality and range queries (=, <, >, BETWEEN) | CREATE INDEX idx ON t (col); |
| Unique | Enforce uniqueness + fast lookups | CREATE UNIQUE INDEX idx ON t (email); |
| Composite | Queries filtering on multiple columns | CREATE INDEX idx ON t (user_id, created_at); |
| Partial | Index only a subset of rows | CREATE INDEX idx ON orders (user_id) WHERE status = 'active'; |
| GIN | Full-text search, JSONB, arrays | CREATE INDEX idx ON t USING GIN (tags); |
| GiST | Geometric, range, full-text (proximity) | CREATE INDEX idx ON t USING GiST (location); |
| BRIN | Very large tables with natural order (e.g. timestamps) | CREATE INDEX idx ON logs USING BRIN (created_at); |
| Expression | Index a computed value | CREATE INDEX idx ON t (LOWER(email)); |
| Covering (INCLUDE) | Add non-key columns to avoid table lookup | CREATE INDEX idx ON t (user_id) INCLUDE (name, email); |
| Concurrent | Create without locking writes | CREATE INDEX CONCURRENTLY idx ON t (col); |
Gotcha
Composite index column order matters. An index on (a, b, c) supports queries on (a), (a, b), and (a, b, c) — but NOT (b) or (c) alone. Put the most selective or most frequently filtered column first.
Gotcha
Unused indexes slow down writes and waste disk. Check with: SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;
Working with JSONB
| Operator / Function | Description | Example |
|---|---|---|
| -> | Get JSON object field (as JSON) | data -> 'name' |
| ->> | Get JSON field as text | data ->> 'name' |
| #> | Get nested field (as JSON) | data #> '{address,city}' |
| #>> | Get nested field as text | data #>> '{address,city}' |
| @> | Contains (for indexing) | WHERE data @> '{"role": "admin"}' |
| ? | Key exists | WHERE data ? 'email' |
| jsonb_set() | Update a nested field | jsonb_set(data, '{name}', '"Ada"') |
| jsonb_build_object() | Build JSONB from key-value pairs | jsonb_build_object('id', 1, 'name', 'Ada') |
| jsonb_agg() | Aggregate rows to JSONB array | SELECT jsonb_agg(row_to_json(u)) FROM users u; |
| jsonb_each_text() | Expand JSONB to key-value rows | SELECT * FROM jsonb_each_text(data); |
| jsonb_array_elements() | Expand JSONB array to rows | SELECT * FROM jsonb_array_elements('[1,2,3]'); |
Gotcha
Use JSONB, not JSON. JSON stores raw text and re-parses on every access. JSONB is binary, indexable with GIN, and supports containment operators (@>). The only reason to use JSON is if you need to preserve exact whitespace/key ordering.
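A minimal sketch of indexed containment, assuming a hypothetical `users` table with a JSONB `data` column — the GIN index is what makes `@>` queries fast at scale:

```sql
-- GIN index accelerates @> containment queries
CREATE INDEX idx_users_data ON users USING GIN (data);

-- Can use the index: find all admins
SELECT * FROM users WHERE data @> '{"role": "admin"}';
```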
Transactions & Locking
| Command | Description |
|---|---|
| BEGIN; | Start a transaction |
| COMMIT; | Save all changes |
| ROLLBACK; | Undo all changes since BEGIN |
| SAVEPOINT sp1; | Create a savepoint within a transaction |
| ROLLBACK TO sp1; | Undo back to the savepoint |
| SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; | Strictest isolation (prevents serialization anomalies) |
| SELECT ... FOR UPDATE; | Lock selected rows until COMMIT |
| SELECT ... FOR UPDATE SKIP LOCKED; | Skip already-locked rows (great for job queues) |
| SELECT ... FOR UPDATE NOWAIT; | Error immediately if rows are locked |
| LOCK TABLE t IN ACCESS EXCLUSIVE MODE; | Full table lock (use sparingly) |
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted* | No | Yes | Yes |
| Read Committed (default) | No | Yes | Yes |
| Repeatable Read | No | No | No** |
| Serializable | No | No | No |
* PostgreSQL treats Read Uncommitted as Read Committed. ** PostgreSQL implements REPEATABLE READ as snapshot isolation, which also prevents phantom reads — stronger than the SQL standard requires. It does not prevent all serialization anomalies (e.g. write skew); for that, use SERIALIZABLE, which adds serializable snapshot isolation (SSI).
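FOR UPDATE SKIP LOCKED is the standard building block for a Postgres-backed job queue. A sketch, assuming a hypothetical `jobs` table with `status` and `created_at` columns — concurrent workers each claim a different pending job without blocking one another:

```sql
-- Claim one pending job; workers skip rows another worker has locked
BEGIN;

UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id;

-- ... perform the work, then mark it done and:
COMMIT;
```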
Performance & EXPLAIN
| Command | Description |
|---|---|
| EXPLAIN SELECT ...; | Show query plan (estimated) |
| EXPLAIN ANALYZE SELECT ...; | Run query and show actual timings |
| EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...; | Detailed plan with buffer/IO stats |
| VACUUM ANALYZE tablename; | Reclaim space + update statistics |
| SELECT pg_size_pretty(pg_total_relation_size('t')); | Total size of table including indexes |
| SELECT * FROM pg_stat_user_tables; | Table-level stats (seq scans, inserts, etc.) |
| SELECT * FROM pg_stat_activity; | Currently running queries and connections |
| SELECT pg_cancel_backend(pid); | Cancel a running query |
| SELECT pg_terminate_backend(pid); | Kill a connection |
Gotcha
EXPLAIN ANALYZE actually runs the query — including any writes from UPDATE/DELETE. Wrap destructive queries in a transaction: BEGIN; EXPLAIN ANALYZE DELETE ...; ROLLBACK;
Common Data Types
| Type | Description | When to Use |
|---|---|---|
| INTEGER / BIGINT | 4-byte / 8-byte signed integer | IDs, counts — use BIGINT for tables that may exceed 2B rows |
| NUMERIC(p,s) | Exact decimal with precision | Money, financial data — never use FLOAT for money |
| TEXT | Variable-length string (unlimited) | Most string data — no performance penalty vs VARCHAR(n) |
| VARCHAR(n) | Variable-length with max | Only when you need to enforce a max length |
| BOOLEAN | true / false / NULL | Flags, toggles |
| TIMESTAMPTZ | Timestamp with time zone | Almost always — use this over TIMESTAMP |
| UUID | 128-bit universally unique ID | Distributed systems, public-facing IDs |
| JSONB | Binary JSON (indexable) | Semi-structured data, flexible schemas |
| TEXT[] | Array of text | Tags, labels — consider a join table for large sets |
| INET / CIDR | IPv4/IPv6 addresses | Networking data — supports containment operators |
| TSTZRANGE | Time range with timezone | Booking systems, schedules — supports overlap checks |
Gotcha
Always use TIMESTAMPTZ (with time zone), not TIMESTAMP. PostgreSQL stores TIMESTAMPTZ in UTC internally and converts on display. Plain TIMESTAMP has no zone info and leads to subtle bugs across time zones.
Useful Patterns & Recipes
| Pattern | SQL |
|---|---|
| Conditional count | SELECT COUNT(*) FILTER (WHERE status = 'paid') AS paid, COUNT(*) FILTER (WHERE status = 'pending') AS pending FROM orders; |
| Coalesce NULL | SELECT COALESCE(nickname, name, 'Anonymous') FROM users; |
| Generate series | SELECT generate_series('2025-01-01'::date, '2025-12-01', '1 month') AS month; |
| Lateral join (top-N per group) | SELECT * FROM depts d, LATERAL (SELECT * FROM emps WHERE dept_id = d.id ORDER BY salary DESC LIMIT 3) e; |
| Deduplicate rows | DELETE FROM t WHERE ctid NOT IN (SELECT MIN(ctid) FROM t GROUP BY unique_cols); |
| Bulk update from values | UPDATE t SET name = v.name FROM (VALUES (1,'Ada'),(2,'Bob')) AS v(id,name) WHERE t.id = v.id; |
| Recursive CTE (tree walk) | WITH RECURSIVE tree AS (SELECT * FROM nodes WHERE parent_id IS NULL UNION ALL SELECT n.* FROM nodes n JOIN tree t ON n.parent_id = t.id) SELECT * FROM tree; |
| Select into temp table | CREATE TEMP TABLE active_users AS SELECT * FROM users WHERE active = true; |
Frequently Asked Questions
What is the difference between VARCHAR and TEXT in PostgreSQL?
There is no performance difference between VARCHAR(n) and TEXT in PostgreSQL. Both are stored the same way internally. TEXT is generally preferred because it has no arbitrary length limit. Use VARCHAR(n) only when you need to enforce a maximum length as a data integrity constraint.
When should I use a subquery vs a JOIN?
Use a JOIN when you need columns from both tables in the result set. Use a subquery (especially EXISTS) when you only need to check for the existence of related rows. PostgreSQL's optimizer often rewrites one into the other, but EXISTS short-circuits and can be faster than IN for large datasets.
How do I improve slow queries in PostgreSQL?
Start with EXPLAIN ANALYZE to see the actual query plan. Look for sequential scans on large tables (add indexes), high row estimates vs actuals (run ANALYZE), and nested loops over large sets (consider hash joins). Common fixes: add targeted indexes, rewrite correlated subqueries as JOINs, use keyset pagination instead of OFFSET, and ensure your statistics are up to date with VACUUM ANALYZE.
Should I use SERIAL or IDENTITY for auto-incrementing IDs?
Use IDENTITY (GENERATED ALWAYS AS IDENTITY) for new tables. It is the SQL standard, has cleaner permission handling, and prevents accidental manual inserts. SERIAL creates a separate sequence with independent permissions, which can lead to subtle issues. IDENTITY has been available since PostgreSQL 10.
What is the difference between JSON and JSONB in PostgreSQL?
JSONB stores data in a binary format that is faster to process, supports indexing with GIN indexes, and allows containment queries with the @> operator. JSON stores the raw text and must be re-parsed on every access. Always use JSONB unless you need to preserve exact whitespace or duplicate keys.
How do CTEs work and when should I use them?
CTEs (WITH clauses) define named temporary result sets within a query. They improve readability for complex queries and are essential for recursive queries (WITH RECURSIVE). Since PostgreSQL 12, non-recursive CTEs are inlined by the optimizer, so they perform the same as subqueries. Use them whenever they make the query clearer.