
PostgreSQL Cheat Sheet


PostgreSQL is one of the most advanced open-source relational databases. This cheatsheet covers essential SQL commands, JOINs with visual diagrams, subquery vs JOIN alternatives, CTEs, window functions, indexing strategies, JSON operations, and performance gotchas. Designed for developers who need a fast, accurate reference for daily PostgreSQL work — from basic CRUD to advanced query patterns.

Connecting & Database Management

| Command | Description |
|---|---|
| psql -U user -d dbname -h host | Connect to a database |
| psql -U user -d dbname -c "SQL" | Run a single SQL command |
| \l | List all databases |
| \c dbname | Switch to a database |
| \dt | List tables in current schema |
| \d tablename | Describe table structure |
| \dn | List schemas |
| \du | List roles/users |
| \q | Quit psql |
| CREATE DATABASE mydb; | Create a new database |
| DROP DATABASE mydb; | Delete a database (irreversible) |

CRUD Operations

| Operation | Syntax |
|---|---|
| INSERT | INSERT INTO users (name, email) VALUES ('Ada', 'ada@example.com'); |
| INSERT multiple | INSERT INTO users (name) VALUES ('Ada'), ('Bob'), ('Eve'); |
| INSERT ... RETURNING | INSERT INTO users (name) VALUES ('Ada') RETURNING id, name; |
| SELECT | SELECT name, email FROM users WHERE active = true; |
| UPDATE | UPDATE users SET name = 'Ada L' WHERE id = 1; |
| UPDATE ... RETURNING | UPDATE users SET name = 'Ada L' WHERE id = 1 RETURNING *; |
| UPSERT | INSERT INTO users (id, name) VALUES (1, 'Ada') ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name; |
| DELETE | DELETE FROM users WHERE id = 1; |
| TRUNCATE | TRUNCATE TABLE users; -- faster than DELETE, no row-level triggers |

Gotcha

TRUNCATE fires no row-level triggers and, by default, does not reset sequences (add RESTART IDENTITY if you want counters reset). It also acquires an ACCESS EXCLUSIVE lock. Use DELETE if you need triggers to fire or need row-level WHERE filtering.
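A minimal sketch of the trade-off, with an assumed users table:

```sql
-- TRUNCATE keeps the sequence counter unless told otherwise
TRUNCATE TABLE users;                    -- next generated id continues
TRUNCATE TABLE users RESTART IDENTITY;   -- next generated id starts at 1

-- DELETE fires row-level triggers and supports filtering
DELETE FROM users WHERE created_at < now() - interval '90 days';
```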

Table & Schema Management

| Command | Description |
|---|---|
| CREATE TABLE t (id SERIAL PRIMARY KEY, name TEXT NOT NULL); | Create table with auto-increment PK |
| CREATE TABLE t (id UUID DEFAULT gen_random_uuid() PRIMARY KEY); | UUID primary key (pg 13+) |
| ALTER TABLE t ADD COLUMN email TEXT; | Add a column |
| ALTER TABLE t DROP COLUMN email; | Remove a column |
| ALTER TABLE t RENAME COLUMN old TO new; | Rename a column |
| ALTER TABLE t ALTER COLUMN name SET NOT NULL; | Add NOT NULL constraint |
| ALTER TABLE t ADD CONSTRAINT uq UNIQUE (email); | Add unique constraint |
| DROP TABLE t; | Drop a table |
| DROP TABLE IF EXISTS t CASCADE; | Drop table + dependents |
| CREATE SCHEMA app; SET search_path TO app, public; | Create and use a schema |

Gotcha

Use IDENTITY columns instead of SERIAL for new tables: id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY. SERIAL creates an implicit sequence with separate permissions. IDENTITY is SQL-standard and safer.
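As a sketch, the IDENTITY equivalent of a SERIAL table (table and column names are illustrative):

```sql
-- SERIAL (legacy): creates an implicit sequence with its own permissions
CREATE TABLE accounts_old (id SERIAL PRIMARY KEY, name TEXT NOT NULL);

-- IDENTITY (pg 10+): SQL-standard, sequence managed with the column
CREATE TABLE accounts (
  id   INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name TEXT NOT NULL
);

-- GENERATED ALWAYS rejects manual ids unless you use OVERRIDING SYSTEM VALUE
INSERT INTO accounts (name) VALUES ('Ada');
```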

How Do JOINs Work in PostgreSQL?

JOINs combine rows from two or more tables based on a related column. The type of JOIN determines which rows are included when there is no match.

(Venn diagrams in the original show the overlap of tables A and B for the INNER, LEFT, RIGHT, FULL, and CROSS join types listed below.)
| JOIN Type | Returns | Example |
|---|---|---|
| INNER JOIN | Only matching rows in both tables | SELECT * FROM orders o INNER JOIN users u ON o.user_id = u.id; |
| LEFT JOIN | All rows from left + matches from right (NULLs if no match) | SELECT * FROM users u LEFT JOIN orders o ON u.id = o.user_id; |
| RIGHT JOIN | All rows from right + matches from left | SELECT * FROM orders o RIGHT JOIN users u ON u.id = o.user_id; |
| FULL OUTER JOIN | All rows from both (NULLs where no match) | SELECT * FROM users u FULL OUTER JOIN orders o ON u.id = o.user_id; |
| CROSS JOIN | Cartesian product (every combination) | SELECT * FROM sizes CROSS JOIN colors; |
| LATERAL JOIN | Subquery can reference preceding tables | SELECT * FROM users u, LATERAL (SELECT * FROM orders WHERE user_id = u.id LIMIT 3) o; |

Gotcha

A LEFT JOIN with a WHERE filter on the right table's columns effectively becomes an INNER JOIN, because the NULL-extended rows for unmatched left rows fail the filter. Move the filter into the ON clause instead: LEFT JOIN orders o ON u.id = o.user_id AND o.status = 'active'.
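A minimal sketch of the difference, using the users/orders schema from the table above:

```sql
-- Looks like a LEFT JOIN, behaves like an INNER JOIN: the WHERE clause
-- discards the NULL rows produced for users with no matching orders
SELECT u.name, o.status
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.status = 'active';

-- Keeps every user; users without active orders get NULL order columns
SELECT u.name, o.status
FROM users u
LEFT JOIN orders o ON u.id = o.user_id AND o.status = 'active';
```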

Subquery vs JOIN vs CTE — Same Data, Different Approaches

These three approaches often produce identical results. Choose based on readability, performance, and whether you need to reuse the result.

Approach 1: Subquery (in WHERE)

```sql
SELECT name
FROM users
WHERE id IN (
  SELECT user_id FROM orders WHERE total > 100
);
```

Approach 2: JOIN

```sql
SELECT DISTINCT u.name
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.total > 100;
```

Approach 3: CTE (Common Table Expression)

```sql
WITH big_orders AS (
  SELECT DISTINCT user_id FROM orders WHERE total > 100
)
SELECT u.name
FROM users u
JOIN big_orders bo ON u.id = bo.user_id;
```
| Approach | Best When | Watch Out |
|---|---|---|
| Subquery in WHERE | Simple existence/membership checks | IN with large sets can be slow — use EXISTS instead |
| JOIN | You need columns from both tables | Can produce duplicates — use DISTINCT or aggregate |
| CTE | Complex queries with reusable intermediate results | Before pg 12, CTEs were always materialized (optimization fence) |
| EXISTS subquery | Checking existence without caring about values | Fastest for "does any matching row exist?" — stops at first match |
| Correlated subquery | Row-by-row dependent logic | Runs once per outer row — can be very slow on large tables |

EXISTS vs IN — prefer EXISTS for large datasets

```sql
-- Slow with a large orders table:
SELECT name FROM users WHERE id IN (SELECT user_id FROM orders);

-- Faster — stops at first match:
SELECT name
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);
```

Filtering, Sorting & Pagination

| Pattern | Example |
|---|---|
| WHERE ... AND / OR | WHERE age > 18 AND status = 'active' |
| BETWEEN | WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31' |
| IN | WHERE status IN ('active', 'pending') |
| LIKE / ILIKE | WHERE name ILIKE '%ada%' -- case-insensitive |
| IS NULL / IS NOT NULL | WHERE deleted_at IS NULL |
| ORDER BY | ORDER BY created_at DESC NULLS LAST |
| LIMIT + OFFSET | LIMIT 20 OFFSET 40 -- page 3 |
| Keyset pagination | WHERE created_at < '2025-06-01' ORDER BY created_at DESC LIMIT 20 |

Gotcha

OFFSET-based pagination gets slower as offset grows because PostgreSQL must scan and discard rows. For large datasets, use keyset (cursor) pagination: WHERE id > last_seen_id ORDER BY id LIMIT 20.
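Keyset pagination can be sketched with a composite tie-breaker, so rows sharing a timestamp are neither skipped nor repeated (events table and cursor values are illustrative):

```sql
-- First page
SELECT id, created_at
FROM events
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Next page: pass the last row's (created_at, id) from the previous page.
-- The row-value comparison compares the pair lexicographically.
SELECT id, created_at
FROM events
WHERE (created_at, id) < ('2025-06-01 12:00+00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```

An index on (created_at, id) lets both pages be served without scanning discarded rows.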

Aggregation & Grouping

| Function | Description | Example |
|---|---|---|
| COUNT(*) | Count all rows | SELECT COUNT(*) FROM orders; |
| COUNT(DISTINCT col) | Count unique values | SELECT COUNT(DISTINCT user_id) FROM orders; |
| SUM(col) | Sum of values | SELECT SUM(total) FROM orders WHERE status = 'paid'; |
| AVG(col) | Average value | SELECT AVG(total) FROM orders; |
| MIN(col) / MAX(col) | Minimum/maximum | SELECT MIN(created_at), MAX(created_at) FROM orders; |
| ARRAY_AGG(col) | Collect into array | SELECT user_id, ARRAY_AGG(product) FROM orders GROUP BY user_id; |
| STRING_AGG(col, sep) | Concatenate strings | SELECT STRING_AGG(name, ', ') FROM users; |
| GROUP BY | Group rows for aggregation | SELECT status, COUNT(*) FROM orders GROUP BY status; |
| HAVING | Filter groups (not rows) | SELECT user_id, COUNT(*) FROM orders GROUP BY user_id HAVING COUNT(*) > 5; |

Gotcha

WHERE filters rows before grouping. HAVING filters groups after aggregation. Don't use HAVING for conditions that could go in WHERE — it's less efficient.
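A short sketch showing where each filter belongs (orders table assumed, as in the examples above):

```sql
-- WHERE prunes rows before grouping; HAVING filters the aggregated groups
SELECT user_id, COUNT(*) AS paid_orders
FROM orders
WHERE status = 'paid'     -- row-level condition: belongs in WHERE
GROUP BY user_id
HAVING COUNT(*) > 5;      -- group-level condition: needs the aggregate
```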

What Are Window Functions?

Window functions compute a value across a set of rows related to the current row, without collapsing them into a single output row like GROUP BY does. They use the OVER() clause.

| Function | Description | Example |
|---|---|---|
| ROW_NUMBER() | Unique sequential number per partition | ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) |
| RANK() | Rank with gaps on ties | RANK() OVER (ORDER BY score DESC) |
| DENSE_RANK() | Rank without gaps on ties | DENSE_RANK() OVER (ORDER BY score DESC) |
| LAG(col, n) | Value from n rows before | LAG(revenue, 1) OVER (ORDER BY month) -- previous month |
| LEAD(col, n) | Value from n rows after | LEAD(revenue, 1) OVER (ORDER BY month) -- next month |
| SUM() OVER() | Running total | SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) |
| AVG() OVER() | Moving average | AVG(amount) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) |
| NTILE(n) | Split into n roughly equal buckets | NTILE(4) OVER (ORDER BY score) -- quartiles |
| FIRST_VALUE / LAST_VALUE | First/last in window frame | FIRST_VALUE(name) OVER (PARTITION BY dept ORDER BY salary DESC) |

Top-N per group using ROW_NUMBER()

```sql
-- Get the top 3 highest paid employees per department
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY department
           ORDER BY salary DESC
         ) AS rn
  FROM employees
) ranked
WHERE rn <= 3;
```

How Should You Index PostgreSQL Tables?

| Index Type | Use Case | Syntax |
|---|---|---|
| B-tree (default) | Equality and range queries (=, <, >, BETWEEN) | CREATE INDEX idx ON t (col); |
| Unique | Enforce uniqueness + fast lookups | CREATE UNIQUE INDEX idx ON t (email); |
| Composite | Queries filtering on multiple columns | CREATE INDEX idx ON t (user_id, created_at); |
| Partial | Index only a subset of rows | CREATE INDEX idx ON orders (user_id) WHERE status = 'active'; |
| GIN | Full-text search, JSONB, arrays | CREATE INDEX idx ON t USING GIN (tags); |
| GiST | Geometric, range, full-text (proximity) | CREATE INDEX idx ON t USING GIST (location); |
| BRIN | Very large tables with natural order (e.g. timestamps) | CREATE INDEX idx ON logs USING BRIN (created_at); |
| Expression | Index a computed value | CREATE INDEX idx ON t (LOWER(email)); |
| Covering (INCLUDE) | Add non-key columns to avoid table lookup | CREATE INDEX idx ON t (user_id) INCLUDE (name, email); |
| Concurrent | Create without locking writes | CREATE INDEX CONCURRENTLY idx ON t (col); |

Gotcha

Composite index column order matters. An index on (a, b, c) supports queries on (a), (a, b), and (a, b, c) — but NOT (b) or (c) alone. Put the most selective or most frequently filtered column first.
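A sketch of which queries the composite index can serve (index and table names are illustrative):

```sql
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);

-- Served by the index (leftmost-prefix match):
SELECT * FROM orders WHERE user_id = 42;
SELECT * FROM orders WHERE user_id = 42 AND created_at > '2025-01-01';

-- Not served efficiently (skips the leading column):
SELECT * FROM orders WHERE created_at > '2025-01-01';
```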

Gotcha

Unused indexes slow down writes and waste disk. Check with: SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;

Working with JSONB

| Operator / Function | Description | Example |
|---|---|---|
| -> | Get JSON object field (as JSON) | data -> 'name' |
| ->> | Get JSON field as text | data ->> 'name' |
| #> | Get nested field (as JSON) | data #> '{address,city}' |
| #>> | Get nested field as text | data #>> '{address,city}' |
| @> | Contains (for indexing) | WHERE data @> '{"role": "admin"}' |
| ? | Key exists | WHERE data ? 'email' |
| jsonb_set() | Update a nested field | jsonb_set(data, '{name}', '"Ada"') |
| jsonb_build_object() | Build JSONB from key-value pairs | jsonb_build_object('id', 1, 'name', 'Ada') |
| jsonb_agg() | Aggregate rows to JSONB array | SELECT jsonb_agg(row_to_json(u)) FROM users u; |
| jsonb_each_text() | Expand JSONB to key-value rows | SELECT * FROM jsonb_each_text(data); |
| jsonb_array_elements() | Expand JSONB array to rows | SELECT * FROM jsonb_array_elements('[1,2,3]'); |

Gotcha

Use JSONB, not JSON. JSON stores raw text and re-parses on every access. JSONB is binary, indexable with GIN, and supports containment operators (@>). The only reason to use JSON is if you need to preserve exact whitespace/key ordering.
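A minimal JSONB indexing sketch (table name and keys are illustrative):

```sql
CREATE TABLE profiles (id BIGINT PRIMARY KEY, data JSONB NOT NULL);
CREATE INDEX idx_profiles_data ON profiles USING GIN (data);

-- Containment query that can use the GIN index
SELECT id FROM profiles WHERE data @> '{"role": "admin"}';

-- ->> extracts text; this comparison will not use the GIN index unless you
-- add an expression index on (data ->> 'role')
SELECT id FROM profiles WHERE data ->> 'role' = 'admin';
```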

Transactions & Locking

| Command | Description |
|---|---|
| BEGIN; | Start a transaction |
| COMMIT; | Save all changes |
| ROLLBACK; | Undo all changes since BEGIN |
| SAVEPOINT sp1; | Create a savepoint within a transaction |
| ROLLBACK TO sp1; | Undo back to the savepoint |
| SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; | Strictest isolation (prevents phantom reads) |
| SELECT ... FOR UPDATE; | Lock selected rows until COMMIT |
| SELECT ... FOR UPDATE SKIP LOCKED; | Skip already-locked rows (great for job queues) |
| SELECT ... FOR UPDATE NOWAIT; | Error immediately if rows are locked |
| LOCK TABLE t IN ACCESS EXCLUSIVE MODE; | Full table lock (use sparingly) |
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted* | No | Yes | Yes |
| Read Committed (default) | No | Yes | Yes |
| Repeatable Read | No | No | No** |
| Serializable | No | No | No |

* PostgreSQL treats Read Uncommitted as Read Committed. ** In PostgreSQL, REPEATABLE READ is implemented as snapshot isolation and is stronger than the SQL-standard minimum, but true serializable semantics (using SSI) are provided by the SERIALIZABLE level.
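The FOR UPDATE SKIP LOCKED clause from the table above is the standard building block for a Postgres-backed job queue; a sketch, assuming a jobs table with status and created_at columns:

```sql
BEGIN;

-- Claim one pending job; concurrent workers skip rows another worker
-- has already locked instead of blocking on them
SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ... process the job, then mark it done using the id fetched above ...
UPDATE jobs SET status = 'done' WHERE id = 123;  -- id from the SELECT

COMMIT;
```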

Performance & EXPLAIN

| Command | Description |
|---|---|
| EXPLAIN SELECT ...; | Show query plan (estimated) |
| EXPLAIN ANALYZE SELECT ...; | Run query and show actual timings |
| EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...; | Detailed plan with buffer/IO stats |
| VACUUM ANALYZE tablename; | Reclaim space + update statistics |
| SELECT pg_size_pretty(pg_total_relation_size('t')); | Total size of table including indexes |
| SELECT * FROM pg_stat_user_tables; | Table-level stats (seq scans, inserts, etc.) |
| SELECT * FROM pg_stat_activity; | Currently running queries and connections |
| SELECT pg_cancel_backend(pid); | Cancel a running query |
| SELECT pg_terminate_backend(pid); | Kill a connection |

Gotcha

EXPLAIN ANALYZE actually runs the query — including any writes from UPDATE/DELETE. Wrap destructive queries in a transaction: BEGIN; EXPLAIN ANALYZE DELETE ...; ROLLBACK;

Common Data Types

| Type | Description | When to Use |
|---|---|---|
| INTEGER / BIGINT | 4-byte / 8-byte signed integer | IDs, counts — use BIGINT for tables that may exceed 2B rows |
| NUMERIC(p,s) | Exact decimal with precision | Money, financial data — never use FLOAT for money |
| TEXT | Variable-length string (unlimited) | Most string data — no performance penalty vs VARCHAR(n) |
| VARCHAR(n) | Variable-length with max | Only when you need to enforce a max length |
| BOOLEAN | true / false / NULL | Flags, toggles |
| TIMESTAMPTZ | Timestamp with time zone | Almost always — use this over TIMESTAMP |
| UUID | 128-bit universally unique ID | Distributed systems, public-facing IDs |
| JSONB | Binary JSON (indexable) | Semi-structured data, flexible schemas |
| TEXT[] | Array of text | Tags, labels — consider a join table for large sets |
| INET / CIDR | IPv4/IPv6 addresses | Networking data — supports containment operators |
| TSTZRANGE | Time range with timezone | Booking systems, schedules — supports overlap checks |

Gotcha

Always use TIMESTAMPTZ (with time zone), not TIMESTAMP. PostgreSQL stores TIMESTAMPTZ in UTC internally and converts on display. Plain TIMESTAMP has no zone info and leads to subtle bugs across time zones.
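A sketch of the difference (the session time zone is set explicitly here for illustration):

```sql
SET TIME ZONE 'America/New_York';

-- TIMESTAMPTZ: stored as UTC, rendered in the session's time zone
SELECT '2025-06-01 12:00+00'::timestamptz;  -- 2025-06-01 08:00:00-04

-- TIMESTAMP: no zone attached; the literal's +00 offset is silently ignored
SELECT '2025-06-01 12:00+00'::timestamp;    -- 2025-06-01 12:00:00
```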

Useful Patterns & Recipes

| Pattern | SQL |
|---|---|
| Conditional count | SELECT COUNT(*) FILTER (WHERE status = 'paid') AS paid, COUNT(*) FILTER (WHERE status = 'pending') AS pending FROM orders; |
| Coalesce NULL | SELECT COALESCE(nickname, name, 'Anonymous') FROM users; |
| Generate series | SELECT generate_series('2025-01-01'::date, '2025-12-01', '1 month') AS month; |
| Lateral join (top-N per group) | SELECT * FROM depts d, LATERAL (SELECT * FROM emps WHERE dept_id = d.id ORDER BY salary DESC LIMIT 3) e; |
| Deduplicate rows | DELETE FROM t a USING t b WHERE a.ctid > b.ctid AND a.unique_cols = b.unique_cols; -- keeps one row per group |
| Bulk update from values | UPDATE t SET name = v.name FROM (VALUES (1,'Ada'),(2,'Bob')) AS v(id,name) WHERE t.id = v.id; |
| Recursive CTE (tree walk) | WITH RECURSIVE tree AS (SELECT * FROM nodes WHERE parent_id IS NULL UNION ALL SELECT n.* FROM nodes n JOIN tree t ON n.parent_id = t.id) SELECT * FROM tree; |
| Select into temp table | CREATE TEMP TABLE active_users AS SELECT * FROM users WHERE active = true; |

Frequently Asked Questions

What is the difference between VARCHAR and TEXT in PostgreSQL?

There is no performance difference between VARCHAR(n) and TEXT in PostgreSQL. Both are stored the same way internally. TEXT is generally preferred because it has no arbitrary length limit. Use VARCHAR(n) only when you need to enforce a maximum length as a data integrity constraint.

When should I use a subquery vs a JOIN?

Use a JOIN when you need columns from both tables in the result set. Use a subquery (especially EXISTS) when you only need to check for the existence of related rows. PostgreSQL's optimizer often rewrites one into the other, but EXISTS short-circuits and can be faster than IN for large datasets.

How do I improve slow queries in PostgreSQL?

Start with EXPLAIN ANALYZE to see the actual query plan. Look for sequential scans on large tables (add indexes), high row estimates vs actuals (run ANALYZE), and nested loops over large sets (consider hash joins). Common fixes: add targeted indexes, rewrite correlated subqueries as JOINs, use keyset pagination instead of OFFSET, and ensure your statistics are up to date with VACUUM ANALYZE.

Should I use SERIAL or IDENTITY for auto-incrementing IDs?

Use IDENTITY (GENERATED ALWAYS AS IDENTITY) for new tables. It is the SQL standard, has cleaner permission handling, and prevents accidental manual inserts. SERIAL creates a separate sequence with independent permissions, which can lead to subtle issues. IDENTITY has been available since PostgreSQL 10.

What is the difference between JSON and JSONB in PostgreSQL?

JSONB stores data in a binary format that is faster to process, supports indexing with GIN indexes, and allows containment queries with the @> operator. JSON stores the raw text and must be re-parsed on every access. Always use JSONB unless you need to preserve exact whitespace or duplicate keys.

How do CTEs work and when should I use them?

CTEs (WITH clauses) define named temporary result sets within a query. They improve readability for complex queries and are essential for recursive queries (WITH RECURSIVE). Since PostgreSQL 12, non-recursive CTEs are inlined by the optimizer, so they perform the same as subqueries. Use them whenever they make the query clearer.