PostgreSQL is one of the most advanced open-source relational databases. This cheatsheet covers essential SQL commands, JOINs with visual diagrams, subquery vs JOIN alternatives, CTEs, window functions, indexing strategies, JSON operations, and performance gotchas. Designed for developers who need a fast, accurate reference for daily PostgreSQL work — from basic CRUD to advanced query patterns.
Connecting & Database Management
| Command | Description |
|---|---|
| psql -U user -d dbname -h host | Connect to a database |
| psql -U user -d dbname -c "SQL" | Run a single SQL command |
| \l | List all databases |
| \c dbname | Switch to a database |
| \dt | List tables in current schema |
| \d tablename | Describe table structure |
| \dn | List schemas |
| \du | List roles/users |
| \q | Quit psql |
| CREATE DATABASE mydb; | Create a new database |
| DROP DATABASE mydb; | Delete a database (irreversible) |
CRUD Operations
| Operation | Syntax |
|---|---|
| INSERT | INSERT INTO users (name, email) VALUES ('Ada', 'ada@example.com'); |
| INSERT multiple | INSERT INTO users (name) VALUES ('Ada'), ('Bob'), ('Eve'); |
| INSERT ... RETURNING | INSERT INTO users (name) VALUES ('Ada') RETURNING id, name; |
| SELECT | SELECT name, email FROM users WHERE active = true; |
| UPDATE | UPDATE users SET name = 'Ada L' WHERE id = 1; |
| UPDATE ... RETURNING | UPDATE users SET name = 'Ada L' WHERE id = 1 RETURNING *; |
| UPSERT | INSERT INTO users (id, name) VALUES (1, 'Ada') ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name; |
| DELETE | DELETE FROM users WHERE id = 1; |
| TRUNCATE | TRUNCATE TABLE users; -- faster than DELETE, no row-level triggers |
Gotcha
TRUNCATE fires no row-level triggers and acquires an ACCESS EXCLUSIVE lock. By default it leaves sequences untouched; use TRUNCATE ... RESTART IDENTITY to reset them. Use DELETE if you need triggers to fire or need row-level WHERE filtering.
Table & Schema Management
| Command | Description |
|---|---|
| CREATE TABLE t (id SERIAL PRIMARY KEY, name TEXT NOT NULL); | Create table with auto-increment PK |
| CREATE TABLE t (id UUID DEFAULT gen_random_uuid() PRIMARY KEY); | UUID primary key (pg 13+) |
| ALTER TABLE t ADD COLUMN email TEXT; | Add a column |
| ALTER TABLE t DROP COLUMN email; | Remove a column |
| ALTER TABLE t RENAME COLUMN old TO new; | Rename a column |
| ALTER TABLE t ALTER COLUMN name SET NOT NULL; | Add NOT NULL constraint |
| ALTER TABLE t ADD CONSTRAINT uq UNIQUE (email); | Add unique constraint |
| DROP TABLE t; | Drop a table |
| DROP TABLE IF EXISTS t CASCADE; | Drop table + dependents |
| CREATE SCHEMA app; SET search_path TO app, public; | Create and use a schema |
Gotcha
Use IDENTITY columns instead of SERIAL for new tables: id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY. SERIAL creates an implicit sequence with separate permissions. IDENTITY is SQL-standard and safer.
How Do JOINs Work in PostgreSQL?
JOINs combine rows from two or more tables based on a related column. The type of JOIN determines which rows are included when there is no match.
| JOIN Type | Returns | Example |
|---|---|---|
| INNER JOIN | Only matching rows in both tables | SELECT * FROM orders o INNER JOIN users u ON o.user_id = u.id; |
| LEFT JOIN | All rows from left + matches from right (NULLs if no match) | SELECT * FROM users u LEFT JOIN orders o ON u.id = o.user_id; |
| RIGHT JOIN | All rows from right + matches from left | SELECT * FROM orders o RIGHT JOIN users u ON u.id = o.user_id; |
| FULL OUTER JOIN | All rows from both (NULLs where no match) | SELECT * FROM users u FULL OUTER JOIN orders o ON u.id = o.user_id; |
| CROSS JOIN | Cartesian product (every combination) | SELECT * FROM sizes CROSS JOIN colors; |
| LATERAL JOIN | Subquery can reference preceding tables | SELECT * FROM users u, LATERAL (SELECT * FROM orders WHERE user_id = u.id LIMIT 3) o; |
Gotcha
LEFT JOIN with a WHERE clause on the right table effectively becomes an INNER JOIN. Move the filter into the ON clause instead: LEFT JOIN orders o ON u.id = o.user_id AND o.status = 'active'.
Subquery vs JOIN vs CTE — Same Data, Different Approaches
These three approaches often produce identical results. Choose based on readability, performance, and whether you need to reuse the result.
Approach 1: Subquery (in WHERE)
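A sketch of the subquery form, assuming hypothetical `users` and `orders` tables (`users` with an `id`, `orders` with a `user_id`) — find every user who has placed at least one order:

```sql
-- Users with at least one order, via a membership subquery
SELECT name
FROM users
WHERE id IN (SELECT user_id FROM orders);
```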
Approach 2: JOIN
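The same result via a JOIN, assuming the same hypothetical `users`/`orders` tables. Note the DISTINCT: a user with several orders would otherwise appear once per order.

```sql
-- Users with at least one order, via a JOIN
SELECT DISTINCT u.name
FROM users u
JOIN orders o ON o.user_id = u.id;
```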
Approach 3: CTE (Common Table Expression)
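The same result via a CTE, which gives the intermediate set a name you can reuse (same hypothetical `users`/`orders` tables):

```sql
-- Users with at least one order, via a CTE
WITH ordering_users AS (
    SELECT DISTINCT user_id FROM orders
)
SELECT u.name
FROM users u
JOIN ordering_users ou ON ou.user_id = u.id;
```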
| Approach | Best When | Watch Out |
|---|---|---|
| Subquery in WHERE | Simple existence/membership checks | IN with large sets can be slow — use EXISTS instead |
| JOIN | You need columns from both tables | Can produce duplicates — use DISTINCT or aggregate |
| CTE | Complex queries with reusable intermediate results | Before pg 12, CTEs were always materialized (optimization fence) |
| EXISTS subquery | Checking existence without caring about values | Fastest for "does any matching row exist?" — stops at first match |
| Correlated subquery | Row-by-row dependent logic | Runs once per outer row — can be very slow on large tables |
EXISTS vs IN — prefer EXISTS for large datasets
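A sketch of the EXISTS form, again assuming hypothetical `users`/`orders` tables. EXISTS stops scanning the inner table as soon as one matching order is found, which is why it tends to beat IN when the inner set is large:

```sql
-- Prefer EXISTS over IN for large inner sets
SELECT name
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);
```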
Filtering, Sorting & Pagination
| Pattern | Example |
|---|---|
| WHERE ... AND / OR | WHERE age > 18 AND status = 'active' |
| BETWEEN | WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31' |
| IN | WHERE status IN ('active', 'pending') |
| LIKE / ILIKE | WHERE name ILIKE '%ada%' -- case-insensitive |
| IS NULL / IS NOT NULL | WHERE deleted_at IS NULL |
| ORDER BY | ORDER BY created_at DESC NULLS LAST |
| LIMIT + OFFSET | LIMIT 20 OFFSET 40 -- page 3 |
| Keyset pagination | WHERE created_at < '2025-06-01' ORDER BY created_at DESC LIMIT 20 |
Gotcha
OFFSET-based pagination gets slower as offset grows because PostgreSQL must scan and discard rows. For large datasets, use keyset (cursor) pagination: WHERE id > last_seen_id ORDER BY id LIMIT 20.
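When the sort column is not unique, add the primary key as a tiebreaker so rows are never skipped or repeated across page boundaries. A sketch, assuming a hypothetical `posts` table; the `(created_at, id)` pair of the last row seen is the cursor:

```sql
-- Next page after the last row seen; the row comparison
-- (created_at, id) < (...) handles ties on created_at
SELECT id, title, created_at
FROM posts
WHERE (created_at, id) < ('2025-06-01 12:00:00+00', 41987)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```

A composite index on (created_at, id) lets this query seek straight to the cursor position instead of scanning from the top.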
Aggregation & Grouping
| Function | Description | Example |
|---|---|---|
| COUNT(*) | Count all rows | SELECT COUNT(*) FROM orders; |
| COUNT(DISTINCT col) | Count unique values | SELECT COUNT(DISTINCT user_id) FROM orders; |
| SUM(col) | Sum of values | SELECT SUM(total) FROM orders WHERE status = 'paid'; |
| AVG(col) | Average value | SELECT AVG(total) FROM orders; |
| MIN(col) / MAX(col) | Minimum/maximum | SELECT MIN(created_at), MAX(created_at) FROM orders; |
| ARRAY_AGG(col) | Collect into array | SELECT user_id, ARRAY_AGG(product) FROM orders GROUP BY user_id; |
| STRING_AGG(col, sep) | Concatenate strings | SELECT STRING_AGG(name, ', ') FROM users; |
| GROUP BY | Group rows for aggregation | SELECT status, COUNT(*) FROM orders GROUP BY status; |
| HAVING | Filter groups (not rows) | SELECT user_id, COUNT(*) FROM orders GROUP BY user_id HAVING COUNT(*) > 5; |
Gotcha
WHERE filters rows before grouping. HAVING filters groups after aggregation. Don't use HAVING for conditions that could go in WHERE — it's less efficient.
What Are Window Functions?
Window functions compute a value across a set of rows related to the current row, without collapsing them into a single output row like GROUP BY does. They use the OVER() clause.
| Function | Description | Example |
|---|---|---|
| ROW_NUMBER() | Unique sequential number per partition | ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) |
| RANK() | Rank with gaps on ties | RANK() OVER (ORDER BY score DESC) |
| DENSE_RANK() | Rank without gaps on ties | DENSE_RANK() OVER (ORDER BY score DESC) |
| LAG(col, n) | Value from n rows before | LAG(revenue, 1) OVER (ORDER BY month) -- previous month |
| LEAD(col, n) | Value from n rows after | LEAD(revenue, 1) OVER (ORDER BY month) -- next month |
| SUM() OVER() | Running total | SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) |
| AVG() OVER() | Moving average | AVG(amount) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) |
| NTILE(n) | Split into n roughly equal buckets | NTILE(4) OVER (ORDER BY score) -- quartiles |
| FIRST_VALUE / LAST_VALUE | First/last in window frame | FIRST_VALUE(name) OVER (PARTITION BY dept ORDER BY salary DESC) |
Top-N per group using ROW_NUMBER()
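One way to sketch this, assuming a hypothetical `emps` table with `dept` and `salary` columns — number rows within each department by salary, then keep the top three. The window function must live in a subquery because window results cannot be filtered directly in WHERE:

```sql
-- Top 3 earners per department
SELECT *
FROM (
    SELECT name, dept, salary,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
    FROM emps
) ranked
WHERE rn <= 3;
```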
How Should You Index PostgreSQL Tables?
| Index Type | Use Case | Syntax |
|---|---|---|
| B-tree (default) | Equality and range queries (=, <, >, BETWEEN) | CREATE INDEX idx ON t (col); |
| Unique | Enforce uniqueness + fast lookups | CREATE UNIQUE INDEX idx ON t (email); |
| Composite | Queries filtering on multiple columns | CREATE INDEX idx ON t (user_id, created_at); |
| Partial | Index only a subset of rows | CREATE INDEX idx ON orders (user_id) WHERE status = 'active'; |
| GIN | Full-text search, JSONB, arrays | CREATE INDEX idx ON t USING GIN (tags); |
| GiST | Geometric, range, full-text (proximity) | CREATE INDEX idx ON t USING GiST (location); |
| BRIN | Very large tables with natural order (e.g. timestamps) | CREATE INDEX idx ON logs USING BRIN (created_at); |
| Expression | Index a computed value | CREATE INDEX idx ON t (LOWER(email)); |
| Covering (INCLUDE) | Add non-key columns to avoid table lookup | CREATE INDEX idx ON t (user_id) INCLUDE (name, email); |
| Concurrent | Create without locking writes | CREATE INDEX CONCURRENTLY idx ON t (col); |
Gotcha
Composite index column order matters. An index on (a, b, c) supports queries on (a), (a, b), and (a, b, c) — but NOT (b) or (c) alone. Put the most selective or most frequently filtered column first.
Gotcha
Unused indexes slow down writes and waste disk. Check with: SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;
Working with JSONB
| Operator / Function | Description | Example |
|---|---|---|
| -> | Get JSON object field (as JSON) | data -> 'name' |
| ->> | Get JSON field as text | data ->> 'name' |
| #> | Get nested field (as JSON) | data #> '{address,city}' |
| #>> | Get nested field as text | data #>> '{address,city}' |
| @> | Contains (for indexing) | WHERE data @> '{"role": "admin"}' |
| ? | Key exists | WHERE data ? 'email' |
| jsonb_set() | Update a nested field | jsonb_set(data, '{name}', '"Ada"') |
| jsonb_build_object() | Build JSONB from key-value pairs | jsonb_build_object('id', 1, 'name', 'Ada') |
| jsonb_agg() | Aggregate rows to JSONB array | SELECT jsonb_agg(row_to_json(u)) FROM users u; |
| jsonb_each_text() | Expand JSONB to key-value rows | SELECT * FROM jsonb_each_text(data); |
| jsonb_array_elements() | Expand JSONB array to rows | SELECT * FROM jsonb_array_elements('[1,2,3]'); |
Gotcha
Use JSONB, not JSON. JSON stores raw text and re-parses on every access. JSONB is binary, indexable with GIN, and supports containment operators (@>). The only reason to use JSON is if you need to preserve exact whitespace/key ordering.
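A minimal sketch of indexed containment, assuming a hypothetical `users` table with a JSONB `data` column — the GIN index is what makes `@>` queries fast at scale:

```sql
-- GIN index accelerates @> containment queries
CREATE INDEX idx_users_data ON users USING GIN (data);

-- Can use the index: find all admins
SELECT * FROM users WHERE data @> '{"role": "admin"}';
```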
Transactions & Locking
| Command | Description |
|---|---|
| BEGIN; | Start a transaction |
| COMMIT; | Save all changes |
| ROLLBACK; | Undo all changes since BEGIN |
| SAVEPOINT sp1; | Create a savepoint within a transaction |
| ROLLBACK TO sp1; | Undo back to the savepoint |
| SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; | Strictest isolation (prevents serialization anomalies) |
| SELECT ... FOR UPDATE; | Lock selected rows until COMMIT |
| SELECT ... FOR UPDATE SKIP LOCKED; | Skip already-locked rows (great for job queues) |
| SELECT ... FOR UPDATE NOWAIT; | Error immediately if rows are locked |
| LOCK TABLE t IN ACCESS EXCLUSIVE MODE; | Full table lock (use sparingly) |
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted* | No | Yes | Yes |
| Read Committed (default) | No | Yes | Yes |
| Repeatable Read | No | No | No** |
| Serializable | No | No | No |
* PostgreSQL treats Read Uncommitted as Read Committed. ** PostgreSQL implements REPEATABLE READ as snapshot isolation, which also prevents phantom reads — stronger than the SQL standard requires. It does not prevent all serialization anomalies (e.g. write skew); for that, use SERIALIZABLE, which adds serializable snapshot isolation (SSI).
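FOR UPDATE SKIP LOCKED is the standard building block for a Postgres-backed job queue. A sketch, assuming a hypothetical `jobs` table with `status` and `created_at` columns — concurrent workers each claim a different pending job without blocking one another:

```sql
-- Claim one pending job; workers skip rows another worker has locked
BEGIN;

UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id;

-- ... perform the work, then mark it done and:
COMMIT;
```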
Performance & EXPLAIN
| Command | Description |
|---|---|
| EXPLAIN SELECT ...; | Show query plan (estimated) |
| EXPLAIN ANALYZE SELECT ...; | Run query and show actual timings |
| EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...; | Detailed plan with buffer/IO stats |
| VACUUM ANALYZE tablename; | Reclaim space + update statistics |
| SELECT pg_size_pretty(pg_total_relation_size('t')); | Total size of table including indexes |
| SELECT * FROM pg_stat_user_tables; | Table-level stats (seq scans, inserts, etc.) |
| SELECT * FROM pg_stat_activity; | Currently running queries and connections |
| SELECT pg_cancel_backend(pid); | Cancel a running query |
| SELECT pg_terminate_backend(pid); | Kill a connection |
Gotcha
EXPLAIN ANALYZE actually runs the query — including any writes from UPDATE/DELETE. Wrap destructive queries in a transaction: BEGIN; EXPLAIN ANALYZE DELETE ...; ROLLBACK;
Common Data Types
| Type | Description | When to Use |
|---|---|---|
| INTEGER / BIGINT | 4-byte / 8-byte signed integer | IDs, counts — use BIGINT for tables that may exceed 2B rows |
| NUMERIC(p,s) | Exact decimal with precision | Money, financial data — never use FLOAT for money |
| TEXT | Variable-length string (unlimited) | Most string data — no performance penalty vs VARCHAR(n) |
| VARCHAR(n) | Variable-length with max | Only when you need to enforce a max length |
| BOOLEAN | true / false / NULL | Flags, toggles |
| TIMESTAMPTZ | Timestamp with time zone | Almost always — use this over TIMESTAMP |
| UUID | 128-bit universally unique ID | Distributed systems, public-facing IDs |
| JSONB | Binary JSON (indexable) | Semi-structured data, flexible schemas |
| TEXT[] | Array of text | Tags, labels — consider a join table for large sets |
| INET / CIDR | IPv4/IPv6 addresses | Networking data — supports containment operators |
| TSTZRANGE | Time range with timezone | Booking systems, schedules — supports overlap checks |
Gotcha
Always use TIMESTAMPTZ (with time zone), not TIMESTAMP. PostgreSQL stores TIMESTAMPTZ in UTC internally and converts on display. Plain TIMESTAMP has no zone info and leads to subtle bugs across time zones.
Useful Patterns & Recipes
| Pattern | SQL |
|---|---|
| Conditional count | SELECT COUNT(*) FILTER (WHERE status = 'paid') AS paid, COUNT(*) FILTER (WHERE status = 'pending') AS pending FROM orders; |
| Coalesce NULL | SELECT COALESCE(nickname, name, 'Anonymous') FROM users; |
| Generate series | SELECT generate_series('2025-01-01'::date, '2025-12-01', '1 month') AS month; |
| Lateral join (top-N per group) | SELECT * FROM depts d, LATERAL (SELECT * FROM emps WHERE dept_id = d.id ORDER BY salary DESC LIMIT 3) e; |
| Deduplicate rows | DELETE FROM t WHERE ctid NOT IN (SELECT MIN(ctid) FROM t GROUP BY unique_cols); |
| Bulk update from values | UPDATE t SET name = v.name FROM (VALUES (1,'Ada'),(2,'Bob')) AS v(id,name) WHERE t.id = v.id; |
| Recursive CTE (tree walk) | WITH RECURSIVE tree AS (SELECT * FROM nodes WHERE parent_id IS NULL UNION ALL SELECT n.* FROM nodes n JOIN tree t ON n.parent_id = t.id) SELECT * FROM tree; |
| Select into temp table | CREATE TEMP TABLE active_users AS SELECT * FROM users WHERE active = true; |
Frequently Asked Questions
What is the difference between VARCHAR and TEXT in PostgreSQL?
There is no performance difference between VARCHAR(n) and TEXT in PostgreSQL. Both are stored the same way internally. TEXT is generally preferred because it has no arbitrary length limit. Use VARCHAR(n) only when you need to enforce a maximum length as a data integrity constraint.
When should I use a subquery vs a JOIN?
Use a JOIN when you need columns from both tables in the result set. Use a subquery (especially EXISTS) when you only need to check for the existence of related rows. PostgreSQL's optimizer often rewrites one into the other, but EXISTS short-circuits and can be faster than IN for large datasets.
How do I improve slow queries in PostgreSQL?
Start with EXPLAIN ANALYZE to see the actual query plan. Look for sequential scans on large tables (add indexes), high row estimates vs actuals (run ANALYZE), and nested loops over large sets (consider hash joins). Common fixes: add targeted indexes, rewrite correlated subqueries as JOINs, use keyset pagination instead of OFFSET, and ensure your statistics are up to date with VACUUM ANALYZE.
Should I use SERIAL or IDENTITY for auto-incrementing IDs?
Use IDENTITY (GENERATED ALWAYS AS IDENTITY) for new tables. It is the SQL standard, has cleaner permission handling, and prevents accidental manual inserts. SERIAL creates a separate sequence with independent permissions, which can lead to subtle issues. IDENTITY has been available since PostgreSQL 10.
What is the difference between JSON and JSONB in PostgreSQL?
JSONB stores data in a binary format that is faster to process, supports indexing with GIN indexes, and allows containment queries with the @> operator. JSON stores the raw text and must be re-parsed on every access. Always use JSONB unless you need to preserve exact whitespace or duplicate keys.
How do CTEs work and when should I use them?
CTEs (WITH clauses) define named temporary result sets within a query. They improve readability for complex queries and are essential for recursive queries (WITH RECURSIVE). Since PostgreSQL 12, non-recursive CTEs are inlined by the optimizer, so they perform the same as subqueries. Use them whenever they make the query clearer.