跳至主要內容
Skip to content

視窗函數實戰

視窗函數是 SQL 中最強大的分析工具。它能在不合併列的情況下,對「視窗」內的資料進行計算。本篇將完整介紹視窗函數的各種應用。


一、 視窗函數 vs 聚合函數

特性聚合函數 (GROUP BY)視窗函數 (OVER)
結果列數合併成一列保留所有列
使用方式SUM(amount)SUM(amount) OVER (...)
可存取原始資料
sql
-- 聚合函數:每個部門一列
SELECT department, SUM(salary)
FROM employees
GROUP BY department;

-- 視窗函數:每個員工一列,但附帶部門總薪資
SELECT name, department, salary,
       SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;

二、 基本語法

sql
函數名稱(...) OVER (
    [PARTITION BY 分組欄位]
    [ORDER BY 排序欄位]
    [ROWS/RANGE 框架定義]
)

範例資料

sql
CREATE TABLE sales (
    id SERIAL PRIMARY KEY,
    salesperson TEXT,
    region TEXT,
    amount NUMERIC(10, 2),
    sale_date DATE
);

INSERT INTO sales (salesperson, region, amount, sale_date) VALUES
    ('Alice', 'North', 1000, '2024-01-15'),
    ('Alice', 'North', 1500, '2024-01-20'),
    ('Alice', 'North', 800, '2024-02-10'),
    ('Bob', 'North', 1200, '2024-01-18'),
    ('Bob', 'South', 900, '2024-02-05'),
    ('Charlie', 'South', 1100, '2024-01-25'),
    ('Charlie', 'South', 1300, '2024-02-15');

三、 排名函數

ROW_NUMBER()

為每列分配唯一的序號:

sql
SELECT
    salesperson,
    amount,
    ROW_NUMBER() OVER (ORDER BY amount DESC) AS rank
FROM sales;
salespersonamountrank
Alice15001
Charlie13002
Bob12003
Charlie11004
.........

PARTITION BY:分組排名

sql
-- 每個區域內的銷售排名
SELECT
    salesperson,
    region,
    amount,
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS region_rank
FROM sales;
salespersonregionamountregion_rank
AliceNorth15001
BobNorth12002
AliceNorth10003
CharlieSouth13001
CharlieSouth11002
BobSouth9003

RANK() 與 DENSE_RANK()

當有相同值時的處理方式:

sql
SELECT
    salesperson,
    amount,
    ROW_NUMBER() OVER (ORDER BY amount DESC) AS row_num,
    RANK() OVER (ORDER BY amount DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rank
FROM sales;

假設有兩筆 1500:

amountrow_numrankdense_rank
1500111
1500211
1300332
  • ROW_NUMBER:永遠唯一,相同值也給不同序號
  • RANK:相同值同名次,下一個跳過(1, 1, 3)
  • DENSE_RANK:相同值同名次,下一個連續(1, 1, 2)

NTILE():分組

將結果平均分成 N 組:

sql
-- 將銷售額分成 4 組(四分位數)
SELECT
    salesperson,
    amount,
    NTILE(4) OVER (ORDER BY amount DESC) AS quartile
FROM sales;

四、 存取其他列

LAG() 與 LEAD()

存取當前列的前/後列:

sql
SELECT
    salesperson,
    sale_date,
    amount,
    LAG(amount) OVER (ORDER BY sale_date) AS prev_amount,
    LEAD(amount) OVER (ORDER BY sale_date) AS next_amount
FROM sales;
sale_dateamountprev_amountnext_amount
2024-01-151000NULL1200
2024-01-18120010001500
2024-01-20150012001100

指定偏移量和預設值

sql
LAG(amount, 2, 0) OVER (...)  -- 往前 2 列,若無則返回 0

計算變化量

sql
SELECT
    salesperson,
    sale_date,
    amount,
    amount - LAG(amount) OVER (ORDER BY sale_date) AS diff,
    ROUND(
        (amount - LAG(amount) OVER (ORDER BY sale_date)) /
        LAG(amount) OVER (ORDER BY sale_date) * 100, 2
    ) AS pct_change
FROM sales;

FIRST_VALUE() 與 LAST_VALUE()

sql
SELECT
    salesperson,
    amount,
    FIRST_VALUE(amount) OVER (ORDER BY sale_date) AS first_sale,
    LAST_VALUE(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_sale
FROM sales;

> **LAST_VALUE 的陷阱**

預設框架只到當前列,所以 LAST_VALUE 會返回當前列的值。需要明確指定 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

NTH_VALUE()

取得第 N 列的值:

sql
SELECT
    salesperson,
    amount,
    NTH_VALUE(amount, 2) OVER (ORDER BY amount DESC) AS second_highest
FROM sales;

五、 聚合視窗函數

標準聚合函數加上 OVER 就變成視窗函數:

sql
SELECT
    salesperson,
    region,
    amount,
    SUM(amount) OVER () AS total,                               -- 全部總和
    SUM(amount) OVER (PARTITION BY region) AS region_total,     -- 區域總和
    AVG(amount) OVER (PARTITION BY salesperson) AS person_avg   -- 個人平均
FROM sales;

累計計算

sql
SELECT
    sale_date,
    amount,
    SUM(amount) OVER (ORDER BY sale_date) AS running_total,
    AVG(amount) OVER (ORDER BY sale_date) AS running_avg,
    COUNT(*) OVER (ORDER BY sale_date) AS running_count
FROM sales;
sale_dateamountrunning_totalrunning_avg
2024-01-15100010001000.00
2024-01-18120022001100.00
2024-01-20150037001233.33

六、 框架定義 (Frame)

框架定義了視窗函數計算時包含哪些列。

語法

sql
ROWS BETWEEN 起點 AND 終點
RANGE BETWEEN 起點 AND 終點

可用的起點/終點

  • UNBOUNDED PRECEDING:從第一列開始
  • n PRECEDING:往前 n 列
  • CURRENT ROW:當前列
  • n FOLLOWING:往後 n 列
  • UNBOUNDED FOLLOWING:到最後一列

ROWS vs RANGE

sql
-- ROWS:以「列數」計算
SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
-- 當前列 + 前 2 列

-- RANGE:以「值」計算
SUM(amount) OVER (ORDER BY sale_date RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW)
-- 當前日期 ~ 7 天前的所有列

常見框架

sql
-- 移動平均(近 3 筆)
AVG(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)

-- 累計到目前為止
SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

-- 整個分區
SUM(amount) OVER (PARTITION BY region)
-- 等同於
SUM(amount) OVER (PARTITION BY region ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

預設框架

情況預設框架
沒有 ORDER BY整個分區
有 ORDER BYRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

七、 使用 WINDOW 子句

當多個視窗函數使用相同定義時,可以命名重複使用:

sql
SELECT
    salesperson,
    amount,
    ROW_NUMBER() OVER w AS row_num,
    SUM(amount) OVER w AS running_total,
    AVG(amount) OVER w AS running_avg
FROM sales
WINDOW w AS (PARTITION BY region ORDER BY sale_date);

八、 實戰範例

每組前 3 名

sql
WITH ranked AS (
    SELECT
        salesperson,
        region,
        amount,
        ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
    FROM sales
)
SELECT * FROM ranked WHERE rn <= 3;

計算佔比

sql
SELECT
    salesperson,
    region,
    amount,
    ROUND(amount / SUM(amount) OVER (PARTITION BY region) * 100, 2) AS pct_of_region,
    ROUND(amount / SUM(amount) OVER () * 100, 2) AS pct_of_total
FROM sales;

連續天數計算

sql
WITH numbered AS (
    SELECT
        user_id,
        login_date,
        login_date - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date)::INTEGER AS grp
    FROM logins
)
SELECT user_id, MIN(login_date), MAX(login_date), COUNT(*) AS consecutive_days
FROM numbered
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;

同比/環比

sql
WITH monthly AS (
    SELECT
        DATE_TRUNC('month', sale_date) AS month,
        SUM(amount) AS total
    FROM sales
    GROUP BY 1
)
SELECT
    month,
    total,
    LAG(total) OVER (ORDER BY month) AS prev_month,
    LAG(total, 12) OVER (ORDER BY month) AS same_month_last_year,
    ROUND((total - LAG(total) OVER (ORDER BY month)) / LAG(total) OVER (ORDER BY month) * 100, 2) AS mom_pct
FROM monthly;

九、 視窗函數總覽

函數用途
ROW_NUMBER()唯一序號
RANK()有間隔的排名
DENSE_RANK()無間隔的排名
NTILE(n)分成 n 組
LAG(col, n)取前 n 列的值
LEAD(col, n)取後 n 列的值
FIRST_VALUE(col)視窗內第一個值
LAST_VALUE(col)視窗內最後一個值
NTH_VALUE(col, n)視窗內第 n 個值
SUM/AVG/COUNT OVER視窗內的聚合

總結

  • 排名:ROW_NUMBER, RANK, DENSE_RANK, NTILE
  • 存取其他列:LAG, LEAD, FIRST_VALUE, LAST_VALUE
  • 聚合視窗:SUM, AVG, COUNT 加上 OVER
  • 框架:ROWS/RANGE BETWEEN 定義計算範圍
  • 命名視窗:WINDOW 子句重複使用定義

視窗函數是寫複雜報表和分析查詢的利器,務必熟練掌握!


進階挑戰

  1. 使用 LAGLEAD 計算每個銷售人員的「月增長率」
  2. 計算「連續 7 天登入」的使用者,並標記他們的連續登入起始日
  3. 使用 NTILE(100) 計算銷售額的百分位數,找出 Top 10%

延伸閱讀與資源


← 上一章:子查詢與 CTE | 返回專題首頁 | 下一章:索引原理與策略 →