3.2.5.3. SQL Advanced Queries#

Up to now, we’ve learned how to retrieve data from a single table or by combining multiple tables. However, in real-world analysis, you often need to do more than just list records:

  • Find the largest or smallest values

  • Compute averages or totals

  • Select only a specific portion of the dataset

  • Create derived results with custom logic

SQL provides many built-in features for filtering, sorting, aggregation, and more advanced operations.

Filtering and Ordering#

Filtering and ordering allow you to narrow down and arrange your results.

WHERE#

Use WHERE to select rows matching specific conditions.
The condition can involve comparisons, ranges, or pattern matching.

Example:

SELECT *
FROM employees
WHERE dept_id = 2;

You can also combine multiple conditions using logical operators like AND, OR, and NOT.

Examples with multiple conditions:

-- Employees in department 2 earning more than 60,000
SELECT *
FROM employees
WHERE dept_id = 2
  AND salary > 60000;
-- Employees in department 2 or department 3
SELECT *
FROM employees
WHERE dept_id = 2
   OR dept_id = 3;
-- Employees NOT in department 1
SELECT *
FROM employees
WHERE NOT (dept_id = 1);

ORDER BY#

ORDER BY lets you sort rows by one or more columns.

Example:

SELECT *
FROM employees
ORDER BY salary DESC;

This lists employees with the highest salary first. By default, ORDER BY uses ascending order (ASC) if you don’t specify DESC.

You can also chain multiple columns to define a secondary sort order. When sorting by multiple columns, SQL first orders by the first column; if there are ties, it orders by the next column.

Example:

SELECT *
FROM employees
ORDER BY dept_id ASC, salary DESC;

This sorts rows by dept_id in ascending order. For employees in the same department, it then sorts by salary in descending order.

Note

In many databases, if you need deterministic ordering when values are identical in the sort columns, you must explicitly include all the columns you want to sort by - otherwise, the order can be arbitrary.

If you want to simulate prioritizing a secondary column first, you can first sort by it and then sort by your main column in an outer query:

Example workaround:

-- First, sort by salary ascending
SELECT *
FROM (
    SELECT *
    FROM employees
    ORDER BY salary ASC
)
ORDER BY dept_id ASC;

This way, within each department (dept_id), employees are already sorted by salary.

LIMIT and OFFSET#

LIMIT and OFFSET help you restrict the number of rows returned or skip rows.

  • LIMIT specifies how many rows to return.

  • OFFSET specifies how many rows to skip before starting to return rows.

By default:

  • If you don’t provide OFFSET, it is assumed to be 0 (start from the first row).

  • If you don’t provide LIMIT, all rows are returned.

Example:

SELECT *
FROM employees
ORDER BY emp_id
LIMIT 5 OFFSET 10;

This returns 5 rows starting after skipping the first 10 rows; in other words, rows 11 through 15.

CASE WHEN#

CASE WHEN allows you to create conditional logic directly in your SQL queries.
It’s like an IF-THEN-ELSE statement for each row.

Example: Categorize temperatures as High / Medium / Low

SELECT datetime, Celsius,
    CASE
        WHEN Celsius >= 30 THEN 'High'
        WHEN Celsius >= 15 THEN 'Medium'
        ELSE 'Low'
    END AS temp_category
FROM weather;

This query labels each temperature reading into a category.

Aggregation#

In many analyses, we don’t want to see each individual record. Instead, we want to understand trends, summaries, or boundary points of our data.

Aggregation allows you to summarize data. For example, counting rows, calculating averages, or finding minimum and maximum values.

GROUP BY#

GROUP BY is one of the most widely used SQL features for analysis. It lets you group similar records together and compute statistics for each group. Essentially, you split the data into groups and then apply aggregation functions on each group.

Example:

SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id;

This shows the average salary per department.

Note

Every column in your SELECT statement must either:

  • Appear in the GROUP BY clause, or

  • Be wrapped in an aggregation function.

This ensures the query knows how to collapse rows into groups.

There are numerous built-in aggregation functions you can use to calculate different metrics and explore trends in your data.

Aggregation Functions#

SQL provides several built-in functions you can use to calculate summary statistics:

  • MIN() – Get the smallest value in a range

  • MAX() – Get the largest value in a range

  • AVG() – Calculate the average of values

  • COUNT() – Count the number of rows/records

  • SUM() – Compute the total sum of values

Each of these functions can be used together with a GROUP BY clause or applied to the entire table without grouping.

Example without GROUP BY (summarize the whole table):

SELECT
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary,
    AVG(salary) AS avg_salary,
    COUNT(*) AS num_employees
FROM employees;

Example with GROUP BY (summarize per department):

SELECT
    dept_id,
    COUNT(*) AS num_employees,
    AVG(salary) AS avg_salary,
    MAX(salary) AS highest_salary
FROM employees
GROUP BY dept_id;

These functions help you quickly understand the distribution and scale of your data.

Window Functions#

Window functions allow you to calculate running totals, moving averages, ranks, and other metrics across a set of rows without collapsing them into groups.
They are powerful tools for advanced analysis.

OVER#

The OVER clause defines the window (the range of rows) over which the function operates.

Example: Calculate the running average temperature

SELECT datetime, Celsius,
       AVG(Celsius) OVER (
           ORDER BY datetime
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS running_avg_celsius
FROM weather;

How it works:

  • ORDER BY datetime defines the sequence of rows.

  • ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means:

    • For each row, look at the current row and the 6 rows before it.

  • AVG() computes the average within that window.

ROW_NUMBER()#

ROW_NUMBER() assigns a unique sequential number to each row, based on the specified order.

Example:

SELECT datetime, Celsius,
       ROW_NUMBER() OVER (ORDER BY datetime) AS row_num
FROM weather;

This query adds a column numbering each row in chronological order.

RANK()#

RANK() assigns a rank to each row, with ties receiving the same rank and subsequent ranks skipping ahead.

Example: Rank temperatures within each day (highest temperature gets rank 1)

SELECT datetime, Celsius,
       RANK() OVER (
           PARTITION BY DATE(datetime)
           ORDER BY Celsius DESC
       ) AS temp_rank
FROM weather;

How it works:

  • PARTITION BY DATE(datetime) restarts the ranking for each day.

  • ORDER BY Celsius DESC sorts temperatures from highest to lowest.

SUM() OVER() (Running Totals)#

SUM() OVER() computes a cumulative total across rows without grouping.

Example: Cumulative sum of Celsius temperatures over time

SELECT datetime, Celsius,
       SUM(Celsius) OVER (
           ORDER BY datetime
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_sum
FROM weather;

How it works:

  • UNBOUNDED PRECEDING means the window starts at the very first row.

  • CURRENT ROW means it ends at the current row.

  • The result is a running total of temperatures up to each moment.

Window functions are essential for many time series and ranking analyses. You will see them often in data reporting and dashboards.

Complex Queries#

For more advanced analysis, SQL supports nested queries, common table expressions (CTEs), and recursive queries.

These techniques allow you to:

  • Break down complicated problems into smaller steps

  • Reuse intermediate results

  • Create more readable and maintainable queries

Below, we’ll go over a few common types and examples.

Subqueries (Nested Queries)#

A subquery is a query inside another query.
It allows you to calculate intermediate results and then use them as conditions or inputs in the outer query.

Example: Get employees earning more than the overall average salary:

SELECT *
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
);

Here, the inner query calculates the average salary of all employees, and the outer query selects only those employees whose salary is above that average.

WITH (Common Table Expressions)#

WITH allows you to define temporary result sets, called Common Table Expressions (CTEs), that can be referenced within the main query.
This makes complex queries easier to read, maintain, and debug.

Example:

WITH dept_avg AS (
    SELECT dept_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept_id
)
SELECT e.name, e.salary, d.avg_salary
FROM employees e
JOIN dept_avg d ON e.dept_id = d.dept_id
WHERE e.salary > d.avg_salary;

In this example:

  • The dept_avg CTE calculates the average salary per department.

  • The main query then joins each employee with their department’s average.

  • Finally, it filters employees whose salary is above their department’s average.

WITH RECURSIVE#

Recursive CTEs let you repeatedly reference the same CTE to build up sequences or traverse hierarchies. They are especially useful for working with trees, graphs, or generating series of data.

Example: Generate numbers 1–5:

WITH RECURSIVE numbers(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 5
)
SELECT * FROM numbers;

Here’s how it works:

  • The anchor member SELECT 1 starts the sequence.

  • The recursive member SELECT n + 1 keeps adding 1 until the condition n < 5 is no longer true.

  • The result is a list of numbers from 1 to 5.

Recursive CTEs are a powerful tool for solving problems that require step-by-step accumulation or traversal.