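Using the Speech-to-Text API with Python

1. Overview

The Speech-to-Text API lets developers convert audio to text in over 125 languages and variants by applying powerful neural network models through an easy-to-use API.

In this tutorial, you will focus on using the Speech-to-Text API with Python.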

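What you'll learn

How to set up your environment
How to transcribe audio files in English
How to transcribe audio files with word timestamps
How to transcribe audio files in different languages

What you'll need

A Google Cloud project
A browser, such as Chrome or Firefox
Familiarity with Python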

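2. Setup and requirements

Self-paced environment setup

The Project name is the display name for this project's participants. It is a character string not used by Google APIs, and you can update it at any time.
The Project ID is unique across all Google Cloud projects and is immutable (it cannot be changed once it has been set). The Cloud Console auto-generates a unique string; usually you don't need to care what it is. In most codelabs, you'll need to reference the Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you can generate another random one, or you can try your own and see whether it is available. The ID cannot be changed after this step and remains for the duration of the project.
For your information, there is a third value, the Project number, which some APIs use. To learn more about all three of these values, see the documentation.

Next, you'll need to enable billing in the Cloud Console to use Cloud resources and APIs. Running through this codelab should cost little, if anything. To shut down resources and avoid billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.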

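Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will use Cloud Shell, a command-line environment running in the cloud.

Activate Cloud Shell

In the Cloud Console, click Activate Cloud Shell.

If this is your first time starting Cloud Shell, you'll see an intermediate screen describing what it is. If you see that screen, click Continue.

It should only take a few moments to provision and connect to Cloud Shell.

This virtual machine is loaded with all the development tools you need. It offers a persistent 5 GB home directory and runs in Google Cloud, which greatly enhances network performance and authentication. Most of your work in this codelab can be done with just a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

Run the following command in Cloud Shell to confirm that you are authenticated: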

gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:

gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If the output differs, you can set the project with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

3. Environment setup

Before you can begin using the Speech-to-Text API, run the following command in Cloud Shell to enable the API:

gcloud services enable speech.googleapis.com

You should see something like this:

Operation "operations/..." finished successfully.

Now you can use the Speech-to-Text API!

Navigate to your home directory:

cd ~

Create a Python virtual environment to isolate the dependencies:

virtualenv venv-speech

Activate the virtual environment:

source venv-speech/bin/activate

Install IPython and the Speech-to-Text API client library:

pip install ipython google-cloud-speech

You should see something like this:

...
Installing collected packages: ..., ipython, google-cloud-speech
Successfully installed ... google-cloud-speech-2.25.1 ...

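Now you're ready to use the Speech-to-Text API client library!

In the next steps, you'll use IPython, the interactive Python interpreter you installed in the previous step. Start a session by running ipython in Cloud Shell: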

ipython

You should see something like this:

Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.18.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

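You're ready to make your first request...

4. Transcribe audio files

In this section, you will transcribe an English audio file.

Copy the following code into your IPython session: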

from google.cloud import speech


def speech_to_text(
    config: speech.RecognitionConfig,
    audio: speech.RecognitionAudio,
) -> speech.RecognizeResponse:
    client = speech.SpeechClient()

    # Synchronous speech recognition request
    response = client.recognize(config=config, audio=audio)

    return response


def print_response(response: speech.RecognizeResponse):
    for result in response.results:
        print_result(result)


def print_result(result: speech.SpeechRecognitionResult):
    best_alternative = result.alternatives[0]
    print("-" * 80)
    print(f"language_code: {result.language_code}")
    print(f"transcript:    {best_alternative.transcript}")
    print(f"confidence:    {best_alternative.confidence:.0%}")

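Take a moment to study the code and see how it uses the recognize client library method to transcribe an audio file. The config parameter specifies how to process the request, and the audio parameter specifies the audio data to be recognized.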
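
The tutorial passes audio by Cloud Storage URI. As a rough sketch of how the audio parameter could also be built from a local file, the helper below picks between the uri and content fields of speech.RecognitionAudio. The name audio_kwargs is hypothetical and not part of the client library; it only returns the keyword arguments and does not call the API.

```python
from pathlib import Path


def audio_kwargs(source: str) -> dict:
    """Return keyword arguments for speech.RecognitionAudio (hypothetical helper)."""
    if source.startswith("gs://"):
        # Cloud Storage objects are passed by reference.
        return {"uri": source}
    # Anything else is treated as a local file and sent inline as bytes.
    return {"content": Path(source).read_bytes()}
```

In the IPython session, the result would be used as speech.RecognitionAudio(**audio_kwargs(source)).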

Send a request:

config = speech.RecognitionConfig(
    language_code="en",
)
audio = speech.RecognitionAudio(
    uri="gs://cloud-samples-data/speech/brooklyn_bridge.flac",
)

response = speech_to_text(config, audio)
print_response(response)

You should see the following output:

--------------------------------------------------------------------------------
language_code: en-us
transcript:    how old is the Brooklyn Bridge
confidence:    98%

Update the configuration to enable automatic punctuation and send a new request:

config.enable_automatic_punctuation = True

response = speech_to_text(config, audio)
print_response(response)

You should see the following output:

--------------------------------------------------------------------------------
language_code: en-us
transcript:    How old is the Brooklyn Bridge?
confidence:    98%

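Summary

In this step, you transcribed an English audio file with different parameters and printed the results. The Speech-to-Text documentation covers transcribing audio files in more detail.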

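5. Get word timestamps

Speech-to-Text can detect time offsets (timestamps) for the transcribed audio. Time offsets show the beginning and end of each spoken word in the supplied audio. A time offset value represents the amount of time that has elapsed since the beginning of the audio, in increments of 100 ms.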
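
The code in this section calls total_seconds() on each offset, which suggests the client library exposes offsets as datetime.timedelta values. Under that assumption, a minimal sketch for turning an offset into a player-style MM:SS.mmm label could look like this; format_offset is a hypothetical helper name, not a library function.

```python
from datetime import timedelta


def format_offset(offset: timedelta) -> str:
    """Format a time offset as MM:SS.mmm (hypothetical helper)."""
    # Work in whole milliseconds to avoid floating-point drift.
    total_ms = round(offset.total_seconds() * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    seconds, millis = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"
```

For example, an offset of 900 ms is rendered as 00:00.900.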

To transcribe an audio file with word timestamps, update your code by copying the following into your IPython session:

def print_result(result: speech.SpeechRecognitionResult):
    best_alternative = result.alternatives[0]
    print("-" * 80)
    print(f"language_code: {result.language_code}")
    print(f"transcript:    {best_alternative.transcript}")
    print(f"confidence:    {best_alternative.confidence:.0%}")
    print("-" * 80)
    for word in best_alternative.words:
        start_s = word.start_time.total_seconds()
        end_s = word.end_time.total_seconds()
        print(f"{start_s:>7.3f} | {end_s:>7.3f} | {word.word}")

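Take a moment to study the code and see how it transcribes an audio file with word timestamps. The enable_word_time_offsets parameter tells the API to return the time offsets for each word (see the documentation for details).

Send a request: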

config = speech.RecognitionConfig(
    language_code="en",
    enable_automatic_punctuation=True,
    enable_word_time_offsets=True,
)
audio = speech.RecognitionAudio(
    uri="gs://cloud-samples-data/speech/brooklyn_bridge.flac",
)

response = speech_to_text(config, audio)
print_response(response)

You should see the following output:

--------------------------------------------------------------------------------
language_code: en-us
transcript:    How old is the Brooklyn Bridge?
confidence:    98%
--------------------------------------------------------------------------------
  0.000 |   0.300 | How
  0.300 |   0.600 | old
  0.600 |   0.800 | is
  0.800 |   0.900 | the
  0.900 |   1.100 | Brooklyn
  1.100 |   1.400 | Bridge?
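
One use of word timestamps is looking up what was said in a given time window, for example to highlight words during playback. The sketch below works on plain tuples copied from the sample output above rather than on the API response itself; WordTiming and words_between are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class WordTiming(NamedTuple):
    """Hypothetical stand-in for one word entry of the API response."""
    start_s: float
    end_s: float
    word: str


# Values copied from the sample output above.
SAMPLE_WORDS = [
    WordTiming(0.0, 0.3, "How"),
    WordTiming(0.3, 0.6, "old"),
    WordTiming(0.6, 0.8, "is"),
    WordTiming(0.8, 0.9, "the"),
    WordTiming(0.9, 1.1, "Brooklyn"),
    WordTiming(1.1, 1.4, "Bridge?"),
]


def words_between(words: List[WordTiming], start_s: float, end_s: float) -> List[str]:
    """Return the words spoken entirely within [start_s, end_s]."""
    return [w.word for w in words if w.start_s >= start_s and w.end_s <= end_s]


print(words_between(SAMPLE_WORDS, 0.6, 1.1))  # ['is', 'the', 'Brooklyn']
```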

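Summary

In this step, you transcribed an English audio file with word timestamps and printed the result. The Speech-to-Text documentation covers getting word timestamps in more detail.

6. Transcribe different languages

The Speech-to-Text API recognizes more than 125 languages and variants! The list of supported languages is available in the Speech-to-Text documentation.

In this section, you will transcribe a French audio file.

To transcribe the French audio file, update your code by copying the following into your IPython session: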

config = speech.RecognitionConfig(
    language_code="fr-FR",
    enable_automatic_punctuation=True,
    enable_word_time_offsets=True,
)
audio = speech.RecognitionAudio(
    uri="gs://cloud-samples-data/speech/corbeau_renard.flac",
)

response = speech_to_text(config, audio)
print_response(response)

You should see the following output:

--------------------------------------------------------------------------------
language_code: fr-fr
transcript:    Maître corbeau sur un arbre perché Tenait dans son bec un fromage maître Renard par l'odeur alléché lui tint à peu près ce langage et bonjour monsieur du corbeau.
confidence:    94%
--------------------------------------------------------------------------------
  0.000 |   0.700 | Maître
  0.700 |   1.100 | corbeau
  1.100 |   1.300 | sur
  1.300 |   1.600 | un
  1.600 |   1.700 | arbre
  1.700 |   2.000 | perché
  2.000 |   3.000 | Tenait
  3.000 |   3.000 | dans
  3.000 |   3.200 | son
  3.200 |   3.500 | bec
  3.500 |   3.700 | un
  3.700 |   3.800 | fromage
...
 10.800 |  11.800 | monsieur
 11.800 |  11.900 | du
 11.900 |  12.100 | corbeau.
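
Judging from the sample outputs in this tutorial, the language_code in a response can differ in case and specificity from the one in the request ("en" came back as "en-us", "fr-FR" as "fr-fr"). If your code needs to check that a result matches the requested language, a plain string comparison would fail. A minimal sketch, assuming it is enough to compare the primary language subtag; same_language is a hypothetical helper, not part of the client library.

```python
def primary_subtag(code: str) -> str:
    """Return the lowercased primary language subtag, e.g. 'fr' for 'fr-FR'."""
    return code.split("-")[0].lower()


def same_language(requested: str, returned: str) -> bool:
    """Compare two language codes by primary subtag, ignoring case and region."""
    return primary_subtag(requested) == primary_subtag(returned)
```

With the values from this tutorial, same_language("en", "en-us") and same_language("fr-FR", "fr-fr") both hold, while same_language("en", "fr-fr") does not.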

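Summary

In this step, you transcribed a French audio file and printed the result. The Speech-to-Text documentation lists all supported languages.

7. Congratulations!

You learned how to use the Speech-to-Text API with Python to perform different kinds of transcription on audio files!

Clean up

To clean up your development environment, from Cloud Shell:

If you're still in your IPython session, go back to the shell: exit
Stop using the Python virtual environment: deactivate
Delete your virtual environment folder: cd ~ ; rm -rf ./venv-speech

To delete your Google Cloud project, from Cloud Shell:

Retrieve your current project ID: PROJECT_ID=$(gcloud config get-value core/project)
Make sure this is the project you want to delete: echo $PROJECT_ID
Delete the project: gcloud projects delete $PROJECT_ID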

Learn more

Test the demo in your browser: https://cloud.google.com/speech-to-text
Speech-to-Text documentation: https://cloud.google.com/speech-to-text/docs
Python on Google Cloud: https://cloud.google.com/python
Cloud Client Libraries for Python: https://github.com/googleapis/google-cloud-python

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.
