Free SQL tutorial for beginners. Learn SELECT, WHERE, JOIN, GROUP BY, ORDER BY, SUM, COUNT with real examples and practice exercises. Plain English, no jargon.
A free, plain-English SQL reference for beginners. Covers SELECT, WHERE, JOIN, GROUP BY, ORDER BY, SUM, COUNT, and more — with real examples, practice exercises, and zero jargon.
Part of the NEXUS project — a free game where you learn SQL by solving a corporate fraud mystery. This guide covers every SQL concept taught in Season 1 and works as a standalone reference even if you don’t play the game.
This isn’t a textbook. It’s the cheat sheet your future self will want when you sit down at a job and someone says, “Can you pull that data?”
What’s covered: SELECT, FROM, WHERE, COUNT, SUM, GROUP BY, ORDER BY, JOIN, IN, HAVING, primary keys, foreign keys, aliases, subqueries, NULL handling, and the SQL order of operations.
SQL (Structured Query Language) is how you talk to databases. A database stores information in tables — think spreadsheet tabs. Each table has columns (fields) and rows (records). You write queries to ask questions about the data, and the database answers.
Every query follows the same basic shape:
SELECT what_you_want
FROM which_table
WHERE some_condition;
That’s it. Everything else is a variation on that pattern.
What it is: A table stores data in rows and columns — like one tab in a spreadsheet. A database is just a collection of related tables.
Why it matters: Before you can analyze anything, you need to know what tables exist and what data lives in each one.
Syntax:
-- List all tables in the database
SELECT name FROM sqlite_master WHERE type='table';
Real-world analogy: A database is a filing cabinet. Each table is a labeled drawer — “Employees,” “Transactions,” “Vendors.”
Common mistake: Table names are case-sensitive in some databases. Always check the exact name before querying.
In the investigation: You logged into the Nexus database and found 5 tables: employees, vendors, transactions, departments, and projects. This was your map.
What it is:
SELECT retrieves data. The asterisk * means “every column.”
FROM names the table you’re pulling from.
Why it matters:
The first thing you do with any new dataset: look at it.
SELECT * is your starting point.
Syntax:
SELECT * FROM employees;
SELECT * FROM transactions;
Real-world analogy: Opening a spreadsheet and scrolling through all the rows before you decide what to focus on.
Common mistake:
SELECT * on a huge table can be slow. Add LIMIT 10 to preview
without loading millions of rows:
SELECT * FROM transactions LIMIT 10;
In the investigation: You pulled the full employee roster — 10 people including yourself, Alex Chen. This told you who works here and in which departments.
What it is:
COUNT(*) is an aggregate function. Instead of returning every
matching row, it collapses them into a single number: the count.
Why it matters: “How many customers do we have?” “How many transactions happened this month?” COUNT answers those instantly.
Syntax:
SELECT COUNT(*) FROM employees;
SELECT COUNT(*) FROM transactions WHERE vendor_id = 4;
Real-world analogy: Asking someone to count the people in a room vs. listing every person’s name. You just want the number.
Common mistake:
COUNT(*) counts all rows, including ones with NULL values.
COUNT(column_name) skips rows where that column is NULL.
They give different numbers if your data has gaps.
In the investigation:
Sam asked for a headcount. You ran COUNT(*) on employees and
got 10. Quick, clean, useful.
What it is:
WHERE limits your results to only rows that match a condition.
Everything that doesn’t match gets filtered out.
Why it matters: Real datasets have thousands or millions of rows. WHERE is how you zoom in on just what matters.
Syntax:
SELECT * FROM vendors WHERE verified = 0;
SELECT * FROM employees WHERE department = 'Finance';
Comparison operators:
| Operator | Meaning |
|———-|———|
| = | Equal to |
| != or <> | Not equal to |
| > | Greater than |
| < | Less than |
| >= | Greater than or equal to |
| <= | Less than or equal to |
Real-world analogy: Ctrl+F in a spreadsheet — but instead of highlighting matches, it hides everything that doesn’t match.
Common mistake:
Strings need quotes: WHERE name = 'Alex'.
Numbers don’t: WHERE id = 4.
Mix them up and you’ll get errors.
In the investigation:
You filtered vendors where verified = 0 and found two unverified
companies: Apex Solutions LLC and Pinnacle Strategy Group.
No address. No phone. Red flags everywhere.
What it is:
SUM() adds up all the values in a column. Like COUNT, it’s an
aggregate function — it collapses rows into a single number.
Why it matters: “How much did we spend on this vendor?” “What’s our total revenue?” SUM is the answer.
Syntax:
SELECT SUM(amount) FROM transactions;
SELECT SUM(amount) FROM transactions WHERE vendor_id = 4;
Real-world analogy: Adding up all the receipts in a folder to get a total.
Common mistake:
SUM() returns NULL if no rows match your WHERE clause.
Wrap it in COALESCE to get 0 instead:
SELECT COALESCE(SUM(amount), 0) FROM transactions WHERE vendor_id = 99;
-- Returns 0 instead of NULL when nothing matches
In the investigation: You totalled all payments to Apex Solutions — over $1.2 million to a company with no verified address.
What it is:
GROUP BY bundles rows that share a value into groups, then lets
you run aggregates (SUM, COUNT, AVG) on each group separately.
Why it matters: “How much did we spend per vendor?” “How many orders per customer?” GROUP BY splits the data into piles and lets you summarize each pile.
Syntax:
SELECT vendor_id, SUM(amount) AS total
FROM transactions
GROUP BY vendor_id
ORDER BY total DESC;
Real-world analogy: Sorting receipts by category — groceries, utilities, restaurants — and totalling each pile.
Common mistake: Every column in your SELECT must either appear in GROUP BY or be inside an aggregate function (SUM, COUNT, AVG, etc.). If you select a column without grouping or aggregating it, the database won’t know which value to show.
-- ✅ Correct
SELECT vendor_id, SUM(amount) FROM transactions GROUP BY vendor_id;
-- ❌ Wrong — 'date' isn't grouped or aggregated
SELECT vendor_id, date, SUM(amount) FROM transactions GROUP BY vendor_id;
In the investigation: You ranked vendors by total spend. Two vendor IDs — 4 and 7 — were dramatically higher than everything else. That’s the pattern that cracked the case open.
What it is:
ORDER BY sorts your results. ASC = smallest first (default).
DESC = largest first.
Why it matters: “Who are our top 10 customers?” “What’s our biggest expense?” Sorting reveals the outliers immediately.
Syntax:
SELECT * FROM departments ORDER BY budget DESC;
SELECT * FROM employees ORDER BY name ASC;
Sorting by multiple columns:
SELECT * FROM transactions ORDER BY vendor_id ASC, amount DESC;
-- First sort by vendor, then within each vendor sort by amount
Real-world analogy: Clicking a column header in a spreadsheet to flip between A→Z and Z→A.
Common mistake: ORDER BY runs after WHERE and GROUP BY. You’re sorting the final result, not the raw table. The order of operations matters:
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
In the investigation: You sorted Apex transactions by date and watched the amounts escalate: $87K → $243K in 9 months. He was getting bolder. ORDER BY made the pattern undeniable.
What it is:
JOIN combines rows from two tables where a column matches.
Think of it as a VLOOKUP that actually works properly.
Why it matters: Data is intentionally split across tables to avoid duplication. JOINs let you reassemble the pieces for analysis.
Syntax:
SELECT t.date, t.amount, v.name
FROM transactions t
JOIN vendors v ON t.vendor_id = v.id;
Table aliases:
The t and v after table names are aliases — shortcuts so you
don’t have to type transactions.date every time.
Types of JOINs:
| Type | What it keeps |
|——|—————|
| JOIN (or INNER JOIN) | Only rows that match in both tables |
| LEFT JOIN | All rows from the left table, matched or not |
| RIGHT JOIN | All rows from the right table |
| FULL OUTER JOIN | Everything from both tables |
Real-world analogy: You have a list of order IDs and a separate list of customer names. JOIN links them by the shared ID so you can see which customer made which order.
Common mistake:
If the join column has NULLs, those rows get dropped with a regular
JOIN. Use LEFT JOIN if you need to keep unmatched rows.
In the investigation: You linked transactions to vendor names and confirmed — every large unverified payment went to Apex or Pinnacle. Without the JOIN, you only had vendor IDs. With it, you had names.
What it is: A primary key is a unique ID column — no two rows share the same value. A foreign key is a column in one table that points to the primary key of another table. Together, they’re how tables connect.
Why it matters:
When you see approved_by in the transactions table with a value of
4, that’s a foreign key pointing to id = 4 in the employees table.
Understanding this link is how you follow the trail.
Syntax:
-- employees.id is a primary key (unique per employee)
SELECT * FROM employees WHERE id = 4;
-- transactions.approved_by is a foreign key → employees.id
SELECT * FROM transactions WHERE approved_by = 4;
Real-world analogy: A social security number — unique per person, used to link records across different systems (tax records, bank accounts, medical files).
Common mistake: Primary keys usually auto-increment (1, 2, 3…) but don’t assume they start at 1 or have no gaps. Deleted rows can leave holes.
In the investigation:
Every suspicious transaction had approved_by = 4. You looked up
employee #4 — Marcus Webb, the CFO. The foreign key led you straight
to the person signing off on the fraud.
What it is:
IN lets you match a column against multiple values at once.
It replaces a stack of OR conditions.
Why it matters: “Show me transactions from vendors 4 and 7.” IN is cleaner, faster, and easier to read than chaining ORs.
Syntax:
SELECT * FROM transactions
WHERE vendor_id IN (4, 7);
-- Equivalent to:
SELECT * FROM transactions
WHERE vendor_id = 4 OR vendor_id = 7;
Advanced — subqueries with IN:
SELECT * FROM transactions
WHERE vendor_id IN (
SELECT id FROM vendors WHERE verified = 0
);
-- Finds all transactions from unverified vendors
Real-world analogy: A guest list check — is this name on the list? If yes, include them.
Common mistake:
NOT IN is powerful but dangerous with NULLs. If any value in the
list is NULL, NOT IN returns no results. Filter NULLs out first.
In the investigation:
You queried both shell companies at once — WHERE vendor_id IN (4, 7).
Same approver, same pattern, thirteen months of payments.
What it is:
HAVING is like WHERE, but it filters after aggregation.
Use it when you need to filter on aggregated values (SUM, COUNT, AVG).
Why it matters: “Show me vendors where total spend exceeds $100,000.” You can’t use WHERE here because the total doesn’t exist until GROUP BY runs.
Syntax:
SELECT vendor_id, SUM(amount) AS total
FROM transactions
GROUP BY vendor_id
HAVING total > 100000;
WHERE vs. HAVING:
-- WHERE filters individual rows BEFORE grouping
SELECT vendor_id, SUM(amount) FROM transactions
WHERE amount > 1000
GROUP BY vendor_id;
-- HAVING filters groups AFTER aggregation
SELECT vendor_id, SUM(amount) AS total FROM transactions
GROUP BY vendor_id
HAVING total > 100000;
Common mistake: Using WHERE when you mean HAVING (or vice versa). Rule of thumb: if you’re filtering on a SUM, COUNT, or AVG, use HAVING.
In the investigation: You combined aggregates to calculate the total fraud amount across both shell companies: $1,869,500.
This is the order SQL actually processes your query, regardless of how you write it:
1. FROM — Which table(s)?
2. JOIN — Combine tables
3. WHERE — Filter individual rows
4. GROUP BY — Bundle rows into groups
5. HAVING — Filter groups
6. SELECT — Choose which columns to show
7. ORDER BY — Sort the results
8. LIMIT — Cap the number of rows returned
You write SELECT first, but the database processes FROM and
WHERE before it even looks at what you’re selecting.
| When you need to… | Use this |
|---|---|
| See what tables exist | SELECT name FROM sqlite_master WHERE type='table'; |
| Look at all data in a table | SELECT * FROM table_name; |
| Preview a big table | SELECT * FROM table_name LIMIT 10; |
| Count rows | SELECT COUNT(*) FROM table_name; |
| Filter rows | SELECT * FROM table_name WHERE condition; |
| Add up a column | SELECT SUM(column) FROM table_name; |
| Group and summarize | SELECT col, SUM(val) FROM table GROUP BY col; |
| Sort results | SELECT * FROM table_name ORDER BY column DESC; |
| Connect two tables | SELECT * FROM a JOIN b ON a.id = b.a_id; |
| Match a list of values | WHERE column IN (1, 2, 3) |
| Filter after grouping | HAVING SUM(amount) > 1000 |
| Rename a column in output | SELECT SUM(amount) AS total |
You finished the investigation. Now build the muscle memory. These exercises use the same NEXUS database — open the game and try them. No objectives required, just you and the SQL editor.
Build the fraud summary yourself. Write a single query that shows: vendor name, total amount paid, number of transactions, and whether they’re verified — for every vendor — sorted by total amount descending.
When you nail this one, you’ve got real analyst skills.
| Term | Definition |
|---|---|
| Query | A question you ask the database, written in SQL |
| Table | A structured set of data organized in rows and columns |
| Row (record) | One entry in a table (one employee, one transaction) |
| Column (field) | One attribute (name, amount, date) |
| Primary key | A unique identifier for each row in a table |
| Foreign key | A column that references the primary key of another table |
| Aggregate function | A function that combines multiple rows into one value (COUNT, SUM, AVG, MIN, MAX) |
| NULL | A missing or unknown value — not zero, not empty string, just absent |
| Alias | A temporary name for a column or table (AS total, FROM transactions t) |
| Subquery | A query nested inside another query |
| Schema | The structure of a database — what tables exist and what columns they have |
| SQLite | The lightweight database engine NEXUS uses. No server required — the whole database is one file |
| CRUD | Create, Read, Update, Delete — the four basic operations on data |
Season 1 covered SQL fundamentals — the tools you need for 80% of real-world data analysis work. Season 2 introduces Python alongside SQL, which is where things get interesting (and a little… strange).
Until then, the best thing you can do is practice. Open the game, run queries, break things, fix them. Every analyst you’ll ever meet got good the same way: by writing queries until the syntax stopped feeling foreign and started feeling like thinking.
Part of the NEXUS project by Michael Nocito.
If this helped, the best thing you can do is ⭐ star the repo — it helps other learners find it.
Tips are never expected, but always appreciated.
☕ Buy Me a Coffee