視窗函數實戰
視窗函數是 SQL 中最強大的分析工具。它能在不合併列的情況下,對「視窗」內的資料進行計算。本篇將完整介紹視窗函數的各種應用。
一、 視窗函數 vs 聚合函數
| 特性 | 聚合函數 (GROUP BY) | 視窗函數 (OVER) |
|---|---|---|
| 結果列數 | 合併成一列 | 保留所有列 |
| 使用方式 | SUM(amount) | SUM(amount) OVER (...) |
| 可存取原始資料 | 否 | 是 |
sql
-- 聚合函數:每個部門一列
SELECT department, SUM(salary)
FROM employees
GROUP BY department;
-- 視窗函數:每個員工一列,但附帶部門總薪資
SELECT name, department, salary,
SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;二、 基本語法
sql
函數名稱(...) OVER (
[PARTITION BY 分組欄位]
[ORDER BY 排序欄位]
[ROWS/RANGE 框架定義]
)範例資料
sql
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
salesperson TEXT,
region TEXT,
amount NUMERIC(10, 2),
sale_date DATE
);
INSERT INTO sales (salesperson, region, amount, sale_date) VALUES
('Alice', 'North', 1000, '2024-01-15'),
('Alice', 'North', 1500, '2024-01-20'),
('Alice', 'North', 800, '2024-02-10'),
('Bob', 'North', 1200, '2024-01-18'),
('Bob', 'South', 900, '2024-02-05'),
('Charlie', 'South', 1100, '2024-01-25'),
('Charlie', 'South', 1300, '2024-02-15');三、 排名函數
ROW_NUMBER()
為每列分配唯一的序號:
sql
SELECT
salesperson,
amount,
ROW_NUMBER() OVER (ORDER BY amount DESC) AS rank
FROM sales;| salesperson | amount | rank |
|---|---|---|
| Alice | 1500 | 1 |
| Charlie | 1300 | 2 |
| Bob | 1200 | 3 |
| Charlie | 1100 | 4 |
| ... | ... | ... |
PARTITION BY:分組排名
sql
-- 每個區域內的銷售排名
SELECT
salesperson,
region,
amount,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS region_rank
FROM sales;| salesperson | region | amount | region_rank |
|---|---|---|---|
| Alice | North | 1500 | 1 |
| Bob | North | 1200 | 2 |
| Alice | North | 1000 | 3 |
| Charlie | South | 1300 | 1 |
| Charlie | South | 1100 | 2 |
| Bob | South | 900 | 3 |
RANK() 與 DENSE_RANK()
當有相同值時的處理方式:
sql
SELECT
salesperson,
amount,
ROW_NUMBER() OVER (ORDER BY amount DESC) AS row_num,
RANK() OVER (ORDER BY amount DESC) AS rank,
DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rank
FROM sales;假設有兩筆 1500:
| amount | row_num | rank | dense_rank |
|---|---|---|---|
| 1500 | 1 | 1 | 1 |
| 1500 | 2 | 1 | 1 |
| 1300 | 3 | 3 | 2 |
- ROW_NUMBER:永遠唯一,相同值也給不同序號
- RANK:相同值同名次,下一個跳過(1, 1, 3)
- DENSE_RANK:相同值同名次,下一個連續(1, 1, 2)
NTILE():分組
將結果平均分成 N 組:
sql
-- 將銷售額分成 4 組(四分位數)
SELECT
salesperson,
amount,
NTILE(4) OVER (ORDER BY amount DESC) AS quartile
FROM sales;四、 存取其他列
LAG() 與 LEAD()
存取當前列的前/後列:
sql
SELECT
salesperson,
sale_date,
amount,
LAG(amount) OVER (ORDER BY sale_date) AS prev_amount,
LEAD(amount) OVER (ORDER BY sale_date) AS next_amount
FROM sales;| sale_date | amount | prev_amount | next_amount |
|---|---|---|---|
| 2024-01-15 | 1000 | NULL | 1200 |
| 2024-01-18 | 1200 | 1000 | 1500 |
| 2024-01-20 | 1500 | 1200 | 1100 |
指定偏移量和預設值:
sql
LAG(amount, 2, 0) OVER (...) -- 往前 2 列,若無則返回 0計算變化量
sql
SELECT
salesperson,
sale_date,
amount,
amount - LAG(amount) OVER (ORDER BY sale_date) AS diff,
ROUND(
(amount - LAG(amount) OVER (ORDER BY sale_date)) /
LAG(amount) OVER (ORDER BY sale_date) * 100, 2
) AS pct_change
FROM sales;FIRST_VALUE() 與 LAST_VALUE()
sql
SELECT
salesperson,
amount,
FIRST_VALUE(amount) OVER (ORDER BY sale_date) AS first_sale,
LAST_VALUE(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_sale
FROM sales;> **LAST_VALUE 的陷阱**
預設框架只到當前列,所以 LAST_VALUE 會返回當前列的值。需要明確指定 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING。
NTH_VALUE()
取得第 N 列的值:
sql
SELECT
salesperson,
amount,
NTH_VALUE(amount, 2) OVER (ORDER BY amount DESC) AS second_highest
FROM sales;五、 聚合視窗函數
標準聚合函數加上 OVER 就變成視窗函數:
sql
SELECT
salesperson,
region,
amount,
SUM(amount) OVER () AS total, -- 全部總和
SUM(amount) OVER (PARTITION BY region) AS region_total, -- 區域總和
AVG(amount) OVER (PARTITION BY salesperson) AS person_avg -- 個人平均
FROM sales;累計計算
sql
SELECT
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) AS running_total,
AVG(amount) OVER (ORDER BY sale_date) AS running_avg,
COUNT(*) OVER (ORDER BY sale_date) AS running_count
FROM sales;| sale_date | amount | running_total | running_avg |
|---|---|---|---|
| 2024-01-15 | 1000 | 1000 | 1000.00 |
| 2024-01-18 | 1200 | 2200 | 1100.00 |
| 2024-01-20 | 1500 | 3700 | 1233.33 |
六、 框架定義 (Frame)
框架定義了視窗函數計算時包含哪些列。
語法
sql
ROWS BETWEEN 起點 AND 終點
RANGE BETWEEN 起點 AND 終點可用的起點/終點:
UNBOUNDED PRECEDING:從第一列開始n PRECEDING:往前 n 列CURRENT ROW:當前列n FOLLOWING:往後 n 列UNBOUNDED FOLLOWING:到最後一列
ROWS vs RANGE
sql
-- ROWS:以「列數」計算
SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
-- 當前列 + 前 2 列
-- RANGE:以「值」計算
SUM(amount) OVER (ORDER BY sale_date RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW)
-- 當前日期 ~ 7 天前的所有列常見框架
sql
-- 移動平均(近 3 筆)
AVG(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
-- 累計到目前為止
SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
-- 整個分區
SUM(amount) OVER (PARTITION BY region)
-- 等同於
SUM(amount) OVER (PARTITION BY region ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)預設框架
| 情況 | 預設框架 |
|---|---|
| 沒有 ORDER BY | 整個分區 |
| 有 ORDER BY | RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
七、 使用 WINDOW 子句
當多個視窗函數使用相同定義時,可以命名重複使用:
sql
SELECT
salesperson,
amount,
ROW_NUMBER() OVER w AS row_num,
SUM(amount) OVER w AS running_total,
AVG(amount) OVER w AS running_avg
FROM sales
WINDOW w AS (PARTITION BY region ORDER BY sale_date);八、 實戰範例
每組前 3 名
sql
WITH ranked AS (
SELECT
salesperson,
region,
amount,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
FROM sales
)
SELECT * FROM ranked WHERE rn <= 3;計算佔比
sql
SELECT
salesperson,
region,
amount,
ROUND(amount / SUM(amount) OVER (PARTITION BY region) * 100, 2) AS pct_of_region,
ROUND(amount / SUM(amount) OVER () * 100, 2) AS pct_of_total
FROM sales;連續天數計算
sql
WITH numbered AS (
SELECT
user_id,
login_date,
login_date - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date)::INTEGER AS grp
FROM logins
)
SELECT user_id, MIN(login_date), MAX(login_date), COUNT(*) AS consecutive_days
FROM numbered
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;同比/環比
sql
WITH monthly AS (
SELECT
DATE_TRUNC('month', sale_date) AS month,
SUM(amount) AS total
FROM sales
GROUP BY 1
)
SELECT
month,
total,
LAG(total) OVER (ORDER BY month) AS prev_month,
LAG(total, 12) OVER (ORDER BY month) AS same_month_last_year,
ROUND((total - LAG(total) OVER (ORDER BY month)) / LAG(total) OVER (ORDER BY month) * 100, 2) AS mom_pct
FROM monthly;九、 視窗函數總覽
| 函數 | 用途 |
|---|---|
ROW_NUMBER() | 唯一序號 |
RANK() | 有間隔的排名 |
DENSE_RANK() | 無間隔的排名 |
NTILE(n) | 分成 n 組 |
LAG(col, n) | 取前 n 列的值 |
LEAD(col, n) | 取後 n 列的值 |
FIRST_VALUE(col) | 視窗內第一個值 |
LAST_VALUE(col) | 視窗內最後一個值 |
NTH_VALUE(col, n) | 視窗內第 n 個值 |
SUM/AVG/COUNT OVER | 視窗內的聚合 |
總結
- 排名:ROW_NUMBER, RANK, DENSE_RANK, NTILE
- 存取其他列:LAG, LEAD, FIRST_VALUE, LAST_VALUE
- 聚合視窗:SUM, AVG, COUNT 加上 OVER
- 框架:ROWS/RANGE BETWEEN 定義計算範圍
- 命名視窗:WINDOW 子句重複使用定義
視窗函數是寫複雜報表和分析查詢的利器,務必熟練掌握!
進階挑戰
- 使用
LAG和LEAD計算每個銷售人員的「月增長率」 - 計算「連續 7 天登入」的使用者,並標記他們的連續登入起始日
- 使用
NTILE(100)計算銷售額的百分位數,找出 Top 10%