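Text-to-Speech (Speech Synthesis)

Accessibility is an increasingly important concern in modern web applications. Text-to-speech (TTS) lets visually impaired users listen to page content, and it is equally useful for language learning, children's education, and similar scenarios.

This article shows how to wrap the browser's native Web Speech API in a full-featured, type-safe Vue 3 composable.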

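1. Introduction to the Web Speech API

The Web Speech API is the browser's native speech-processing interface. It has two parts:

SpeechSynthesis: converts text into spoken output
SpeechRecognition: converts speech into text

This article focuses on SpeechSynthesis. It exposes a controller through window.speechSynthesis that lets us:

Play, pause, resume, and cancel speech
Choose among the available voices
Adjust rate, pitch, and volume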
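2. SpeechSynthesis Core Concepts

2.1 Key Objects

Object	Description
`SpeechSynthesis`	The speech synthesis controller, obtained through `window.speechSynthesis`
`SpeechSynthesisUtterance`	A single speech request, holding the text to read and its settings
`SpeechSynthesisVoice`	An available voice (for example a Chinese or English voice, male or female)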

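2.2 Core Methods

typescript

// Get the speech synthesis controller
const synth = window.speechSynthesis;

// An utterance to work with (its options are covered in 2.3)
const utterance = new SpeechSynthesisUtterance("你好，世界！");

// Main methods
synth.speak(utterance); // queue the utterance and start reading
synth.pause(); // pause
synth.resume(); // resume
synth.cancel(); // clear the queue and stop the current utterance

// State properties (read-only booleans)
synth.speaking; // true while an utterance is being spoken, even if paused
synth.paused; // true while the controller is paused
synth.pending; // true if utterances are still waiting in the queue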

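2.3 Utterance Options

typescript

const utterance = new SpeechSynthesisUtterance("你好，世界！"); // "Hello, world!"

// First available voice; null until the browser has loaded its voice list
const voice = window.speechSynthesis.getVoices()[0] ?? null;

utterance.lang = "zh-TW"; // language
utterance.voice = voice; // specific voice to use
utterance.rate = 1; // speaking rate (0.1 ~ 10)
utterance.pitch = 1; // pitch (0 ~ 2)
utterance.volume = 1; // volume (0 ~ 1)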
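
The ranges in the comments above are easy to violate when values come from a slider or a query string. As a minimal sketch, the helper below clamps each setting into its documented range before it is assigned to an utterance. The names `ProsodySettings`, `clampOr`, and `normalizeProsody` are hypothetical and not part of the Web Speech API, and the fallback of 1 for every field is an assumption based on the values used in the example above.

```typescript
// Hypothetical helper types and functions; not part of the Web Speech API.
interface ProsodySettings {
  rate: number;
  pitch: number;
  volume: number;
}

// Clamp a value into [min, max]; use the fallback when it is missing or not finite.
function clampOr(
  value: number | undefined,
  fallback: number,
  min: number,
  max: number,
): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Ranges follow the comments in 2.3: rate 0.1 ~ 10, pitch 0 ~ 2, volume 0 ~ 1.
function normalizeProsody(input: Partial<ProsodySettings>): ProsodySettings {
  return {
    rate: clampOr(input.rate, 1, 0.1, 10),
    pitch: clampOr(input.pitch, 1, 0, 2),
    volume: clampOr(input.volume, 1, 0, 1),
  };
}
```

A caller could then write `Object.assign(utterance, normalizeProsody(userInput))` before passing the utterance to `speak()`. Note that a volume of 0 is preserved rather than replaced by the fallback.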

3. Designing the Composable Interface

Before wrapping the API in a composable, we first define clear TypeScript interfaces:

typescript

// types/useTTS.ts
import type { Ref, ComputedRef } from "vue";

export interface TTSOptions {
  lang?: string; // language code, e.g. 'zh-TW'
  rate?: number; // speaking rate (0.1 ~ 10)
  pitch?: number; // pitch (0 ~ 2)
  volume?: number; // volume (0 ~ 1)
  voiceName?: string; // name of the voice to use
}

export interface UseTTSReturn {
  // State
  isSupported: ComputedRef<boolean>;
  isSpeaking: Ref<boolean>;
  isPaused: Ref<boolean>;
  voices: Ref<SpeechSynthesisVoice[]>;
  currentVoice: Ref<SpeechSynthesisVoice | null>;

  // Methods
  speak: (text: string, options?: TTSOptions) => void;
  pause: () => void;
  resume: () => void;
  stop: () => void;
  setVoice: (voice: SpeechSynthesisVoice) => void;
}

4. Implementing the useTTS Composable

4.1 Full Implementation

typescript

// composables/useTTS.ts
import { ref, computed, onMounted, onUnmounted } from "vue";
import type { TTSOptions, UseTTSReturn } from "@/types/useTTS";

export function useTTS(defaultOptions: TTSOptions = {}): UseTTSReturn {
  // Feature detection: null during SSR and in browsers without the API
  const synth =
    typeof window !== "undefined" && "speechSynthesis" in window
      ? window.speechSynthesis
      : null;
  const isSupported = computed(() => synth !== null);

  // Reactive state
  const isSpeaking = ref(false);
  const isPaused = ref(false);
  const voices = ref<SpeechSynthesisVoice[]>([]);
  const currentVoice = ref<SpeechSynthesisVoice | null>(null);

  // The current utterance instance (used for event binding)
  let currentUtterance: SpeechSynthesisUtterance | null = null;

  // Load the available voices
  function loadVoices() {
    if (!synth) return;

    const availableVoices = synth.getVoices();
    voices.value = availableVoices;

    // Pick a default voice automatically
    if (!currentVoice.value && availableVoices.length > 0) {
      // Prefer the named voice if one was requested, otherwise a voice matching
      // the language, otherwise the browser's default voice
      const preferredVoice = defaultOptions.voiceName
        ? availableVoices.find((v) => v.name === defaultOptions.voiceName)
        : defaultOptions.lang
          ? availableVoices.find((v) => v.lang.startsWith(defaultOptions.lang!))
          : availableVoices.find((v) => v.default);

      currentVoice.value = preferredVoice || availableVoices[0];
    }
  }

  // Speak the given text
  function speak(text: string, options: TTSOptions = {}) {
    if (!synth || !text.trim()) return;

    // Cancel anything that is still playing or queued
    synth.cancel();

    const utterance = new SpeechSynthesisUtterance(text);

    // Merge the defaults with the per-call options
    const mergedOptions = { ...defaultOptions, ...options };

    // Compare against undefined so valid zero values (e.g. volume: 0) are kept
    if (mergedOptions.lang) utterance.lang = mergedOptions.lang;
    if (mergedOptions.rate !== undefined) utterance.rate = mergedOptions.rate;
    if (mergedOptions.pitch !== undefined) utterance.pitch = mergedOptions.pitch;
    if (mergedOptions.volume !== undefined) utterance.volume = mergedOptions.volume;
    if (currentVoice.value) utterance.voice = currentVoice.value;

    // Event handlers
    utterance.onstart = () => {
      isSpeaking.value = true;
      isPaused.value = false;
    };

    utterance.onend = () => {
      // Ignore late events from an utterance that has since been replaced
      if (currentUtterance !== utterance) return;
      isSpeaking.value = false;
      isPaused.value = false;
    };

    utterance.onerror = (event) => {
      // Same guard: the cancel() above may report an error on the old utterance
      if (currentUtterance !== utterance) return;
      console.error("TTS Error:", event.error);
      isSpeaking.value = false;
      isPaused.value = false;
    };

    utterance.onpause = () => {
      isPaused.value = true;
    };

    utterance.onresume = () => {
      isPaused.value = false;
    };

    currentUtterance = utterance;
    synth.speak(utterance);
  }

  // Pause
  function pause() {
    if (!synth) return;
    synth.pause();
  }

  // Resume
  function resume() {
    if (!synth) return;
    synth.resume();
  }

  // Stop: clear the queue and reset the reactive flags
  function stop() {
    if (!synth) return;
    synth.cancel();
    isSpeaking.value = false;
    isPaused.value = false;
  }

  // Set the voice used by subsequent speak() calls
  function setVoice(voice: SpeechSynthesisVoice) {
    currentVoice.value = voice;
  }

  // Lifecycle
  onMounted(() => {
    if (!synth) return;

    // Load the voice list once up front
    loadVoices();

    // Listen for voice list changes (some browsers load the list asynchronously)
    synth.addEventListener("voiceschanged", loadVoices);
  });

  onUnmounted(() => {
    if (!synth) return;

    // Cleanup: cancel unfinished speech and detach the listener
    synth.cancel();
    synth.removeEventListener("voiceschanged", loadVoices);
  });

  return {
    isSupported,
    isSpeaking,
    isPaused,
    voices,
    currentVoice,
    speak,
    pause,
    resume,
    stop,
    setVoice,
  };
}
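
The voice-selection logic inside `loadVoices` is the part most worth unit testing, and it can be pulled out as a pure function. The sketch below is one possible variant, not the code above: it tries the requested name, then the language prefix, then the browser default, then the first voice, so a mistyped `voiceName` still falls back to a language match. `VoiceLike`, `VoicePreference`, and `pickVoice` are hypothetical names. `VoiceLike` lists only the three `SpeechSynthesisVoice` fields the function reads, so real voices can be passed in directly.

```typescript
// Hypothetical minimal shape; SpeechSynthesisVoice objects satisfy it structurally.
interface VoiceLike {
  name: string;
  lang: string;
  default: boolean;
}

interface VoicePreference {
  voiceName?: string;
  lang?: string;
}

// Resolution order (an assumption of this sketch): name, language prefix,
// browser default, first available voice. Returns null for an empty list.
function pickVoice<V extends VoiceLike>(
  voices: readonly V[],
  preference: VoicePreference = {},
): V | null {
  if (voices.length === 0) return null;
  const { voiceName, lang } = preference;

  const byName = voiceName
    ? voices.find((v) => v.name === voiceName)
    : undefined;
  if (byName) return byName;

  const byLang = lang
    ? voices.find((v) => v.lang.startsWith(lang))
    : undefined;
  if (byLang) return byLang;

  return voices.find((v) => v.default) ?? voices[0];
}
```

Inside the composable this would reduce the selection step to `currentVoice.value = pickVoice(availableVoices, defaultOptions)`.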

4.2 State Flow Diagram
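
The transitions driven by the event handlers in 4.1 can be sketched as a small pure state machine. This is an illustrative model rather than code used by the composable: `TTSState`, `TTSEvent`, `nextState`, and `toFlags` are hypothetical names, and the three states are an assumed reading of the `isSpeaking` and `isPaused` flags.

```typescript
// Hypothetical model of the composable's state; not part of useTTS itself.
type TTSState = "idle" | "speaking" | "paused";
type TTSEvent = "start" | "pause" | "resume" | "end" | "error" | "stop";

// idle --start--> speaking --pause--> paused --resume--> speaking
// any state --end | error | stop--> idle
function nextState(state: TTSState, event: TTSEvent): TTSState {
  switch (event) {
    case "start":
      return "speaking";
    case "pause":
      return state === "speaking" ? "paused" : state;
    case "resume":
      return state === "paused" ? "speaking" : state;
    case "end":
    case "error":
    case "stop":
      return "idle";
  }
}

// Map a state back onto the two reactive flags exposed by the composable.
function toFlags(state: TTSState): { isSpeaking: boolean; isPaused: boolean } {
  return {
    isSpeaking: state !== "idle",
    isPaused: state === "paused",
  };
}
```

In this model a paused utterance still counts as speaking, which matches the handlers in 4.1: `onpause` sets `isPaused` without clearing `isSpeaking`.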

5. Using It in a Component

5.1 Basic Usage

vue

<script setup lang="ts">
import { ref } from "vue";
import { useTTS } from "@/composables/useTTS";

// Sample text in Traditional Chinese: "Hello, this is a piece of test text."
const text = ref("你好，這是一段測試文字。");
const { isSupported, isSpeaking, speak, stop } = useTTS({
  lang: "zh-TW",
  rate: 1,
});
</script>

<template>
  <div v-if="isSupported">
    <textarea v-model="text" rows="4" />
    <div class="controls">
      <button @click="speak(text)" :disabled="isSpeaking">Play</button>
      <button @click="stop" :disabled="!isSpeaking">Stop</button>
    </div>
  </div>
  <div v-else>Your browser does not support speech synthesis</div>
</template>

5.2 Advanced Usage: Voice Picker

vue

<script setup lang="ts">
import { computed } from "vue";
import { useTTS } from "@/composables/useTTS";

const { voices, currentVoice, setVoice, speak, isSpeaking } = useTTS();

// Keep only the Chinese voices
const chineseVoices = computed(() =>
  voices.value.filter((v) => v.lang.startsWith("zh")),
);

// Look up the selected voice by name and make it the current one
function onVoiceChange(event: Event) {
  const select = event.target as HTMLSelectElement;
  const voice = voices.value.find((v) => v.name === select.value);
  if (voice) setVoice(voice);
}
</script>

<template>
  <div>
    <select @change="onVoiceChange">
      <option
        v-for="voice in chineseVoices"
        :key="voice.name"
        :value="voice.name"
        :selected="voice.name === currentVoice?.name"
      >
        {{ voice.name }} ({{ voice.lang }})
      </option>
    </select>

    <!-- '測試語音' means "test voice" -->
    <button @click="speak('測試語音')">Preview</button>
  </div>
</template>

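6. Practical Notes and Compatibility

6.1 Browser Support

The speech synthesis half of the Web Speech API is widely supported:

Browser	Support
Chrome	Full support
Edge	Full support
Safari	Full support
Firefox	Supported (fewer voices to choose from)
iOS Safari	Must be triggered by user interaction

WARNING

iOS Safari restriction: on iOS devices, speechSynthesis.speak() must be triggered by a user action such as a click. Calls made from onMounted or any other autoplay-style context will fail.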

6.2 Feature Detection

typescript

// A more complete support check
function checkTTSSupport(): {
  supported: boolean;
  hasVoices: boolean;
  issues: string[];
} {
  const issues: string[] = [];

  // No window object: the code is running on the server
  if (typeof window === "undefined") {
    return { supported: false, hasVoices: false, issues: ["SSR environment"] };
  }

  if (!window.speechSynthesis) {
    issues.push("The browser does not support SpeechSynthesis");
    return { supported: false, hasVoices: false, issues };
  }

  // An empty list is not fatal: it often just means the voices are still loading
  const voices = window.speechSynthesis.getVoices();
  if (voices.length === 0) {
    issues.push("Voice list not loaded yet (may need to wait for voiceschanged)");
  }

  return {
    supported: true,
    hasVoices: voices.length > 0,
    issues,
  };
}
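
When `hasVoices` comes back false, the usual next step is to wait for `voiceschanged`. The sketch below shows one way to do that with a callback that fires exactly once, either immediately or after the first event that delivers a non-empty list. `VoiceInfo`, `SynthLike`, and `onVoicesReady` are hypothetical names. `SynthLike` declares only the three members the function uses, so `window.speechSynthesis` is the intended argument, while a plain object works in tests.

```typescript
// Hypothetical minimal shapes; the real SpeechSynthesis object provides
// all three members declared on SynthLike.
interface VoiceInfo {
  name: string;
  lang: string;
}

interface SynthLike {
  getVoices(): VoiceInfo[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
  removeEventListener(type: "voiceschanged", listener: () => void): void;
}

// Calls `callback` once with a non-empty voice list.
// Returns a function that cancels the wait (useful in onUnmounted).
function onVoicesReady(
  synth: SynthLike,
  callback: (voices: VoiceInfo[]) => void,
): () => void {
  const initial = synth.getVoices();
  if (initial.length > 0) {
    callback(initial);
    return () => {};
  }

  const handler = () => {
    const loaded = synth.getVoices();
    if (loaded.length === 0) return; // keep waiting for a real list
    synth.removeEventListener("voiceschanged", handler);
    callback(loaded);
  };

  synth.addEventListener("voiceschanged", handler);
  return () => synth.removeEventListener("voiceschanged", handler);
}
```

This sketch has no timeout. If a browser never fires `voiceschanged`, the callback never runs, so callers that need a deadline would have to add one themselves.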

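6.3 Error Handling

typescript

// The utterance that is about to be passed to speak()
const utterance = new SpeechSynthesisUtterance("你好");

utterance.onerror = (event) => {
  switch (event.error) {
    case "canceled":
      console.log("Speech was canceled");
      break;
    case "interrupted":
      console.log("Speech was interrupted");
      break;
    case "audio-busy":
      console.error("The audio device is busy");
      break;
    case "network":
      console.error("Network error (some voices require a connection)");
      break;
    case "synthesis-unavailable":
      console.error("The speech synthesis service is unavailable");
      break;
    case "not-allowed":
      console.error("Permission denied (user interaction may be required)");
      break;
    default:
      console.error("Unknown error:", event.error);
  }
};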
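
The switch above already treats some codes as routine (`console.log`) and others as real failures (`console.error`). As a sketch, that split can be made explicit with a small classifier so the UI can decide whether to stay silent, offer a retry, or show an error. The three categories and the assignment of codes to them are assumptions of this example, not something the Web Speech API defines, and `ErrorSeverity` and `classifyTTSError` are hypothetical names.

```typescript
// Hypothetical severity buckets; the grouping is an assumption of this sketch.
type ErrorSeverity = "benign" | "retryable" | "fatal";

// Raised as a side effect of calling cancel() or speaking over another utterance.
const BENIGN_CODES = new Set<string>(["canceled", "interrupted"]);

// Conditions that may clear up after a user gesture or a short wait.
const RETRYABLE_CODES = new Set<string>(["audio-busy", "network", "not-allowed"]);

// Anything not listed above, including unknown codes, is treated as fatal.
function classifyTTSError(code: string): ErrorSeverity {
  if (BENIGN_CODES.has(code)) return "benign";
  if (RETRYABLE_CODES.has(code)) return "retryable";
  return "fatal";
}
```

An `onerror` handler could then call `classifyTTSError(event.error)` and only log with `console.error` when the result is `"fatal"`.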

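6.4 Mobile Considerations

typescript

// Workaround for the iOS user-interaction requirement
function initTTSWithUserGesture() {
  // "Wake up" TTS on the first user click by speaking an empty utterance
  const utterance = new SpeechSynthesisUtterance("");
  window.speechSynthesis.speak(utterance);
  window.speechSynthesis.cancel();
}

// Hypothetical button; replace the selector with one from your own page
const button = document.querySelector<HTMLButtonElement>("#tts-start");

// Run the workaround once, on the first click
button?.addEventListener(
  "click",
  () => {
    initTTSWithUserGesture();
    // Later speak() calls should then work normally
  },
  { once: true },
);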

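Summary

Concept	Description
`SpeechSynthesis`	The browser's speech synthesis controller
`SpeechSynthesisUtterance`	A speech request that carries the text and its parameters
`getVoices()`	Returns the list of available voices
`voiceschanged`	Event fired when the voice list changes, including when it first finishes loading
Composable wrapper	Provides reactive state and type safety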

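TIP

Best practices:

Always handle the voiceschanged event, because the voice list may load asynchronously
Call cancel() in onUnmounted so speech does not keep playing after the component is gone, and remove the voiceschanged listener there so it does not leak
Provide a user-interaction fallback for iOS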

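Further Challenges

Build a TTS component with a progress indicator that shows which word is currently being read
Combine it with the SpeechRecognition API to build a voice chatbot
Build a reading system that switches languages automatically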
