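1. Introduction
In this codelab, you'll create a Cloud Run job written in Node.js that provides a visual description of every scene in a video. First, your job uses the Video Intelligence API to detect the timestamp of each scene change. Next, your job uses a third-party binary called ffmpeg to capture a screenshot at each scene-change timestamp. Finally, Vertex AI visual captioning provides a visual description of each screenshot.
This codelab also demonstrates how to use ffmpeg within a Cloud Run job to capture an image from a video at a given timestamp. Because ffmpeg must be installed separately, this codelab shows you how to create a Dockerfile that installs ffmpeg as part of your Cloud Run job.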
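The following diagram illustrates how the Cloud Run job works:
What you'll learn
- How to create a container image using a Dockerfile to install a third-party binary
- How to follow the principle of least privilege by creating a service account that the Cloud Run job uses to call other Google Cloud services
- How to use the Video Intelligence client library from a Cloud Run job
- How to call Google APIs to get a visual description of each scene from Vertex AI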
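2. Setup and requirements
- You are logged in to the Cloud console.
- You have previously deployed a Cloud Run service. For example, you can follow the quickstart "Deploy a web service" to get started.
Activate Cloud Shell
- In the Cloud Console, click Activate Cloud Shell.
If this is your first time starting Cloud Shell, an intermediate screen describes what it is. If you see that screen, click Continue.
Provisioning and connecting to Cloud Shell should take only a few moments.
This virtual machine is loaded with all the development tools you need. It offers a persistent 5 GB home directory and runs in Google Cloud, which greatly improves network performance and simplifies authentication. Much, if not all, of your work in this codelab can be done in a browser.
Once connected to Cloud Shell, you should see that you are already authenticated and that the project is set to your project ID.
- Run the following command in Cloud Shell to confirm that you are authenticated: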
gcloud auth list
Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
- Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project
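[core]
project = <PROJECT_ID>
If the project is not set, you can set it with this command:
gcloud config set project <PROJECT_ID>
Updated property [core/project].
3. Enable APIs and set environment variables
Before you start this codelab, you need to enable several APIs. You can enable the APIs this codelab requires by running the following command: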
gcloud services enable run.googleapis.com \
    storage.googleapis.com \
    cloudbuild.googleapis.com \
    videointelligence.googleapis.com \
    aiplatform.googleapis.com
Then, set the environment variables that are used throughout this codelab.
REGION=<YOUR-REGION>
PROJECT_ID=<YOUR-PROJECT-ID>
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format='value(projectNumber)')
JOB_NAME=video-describer-job
BUCKET_ID=$PROJECT_ID-video-describer
SERVICE_ACCOUNT="cloud-run-job-video"
SERVICE_ACCOUNT_ADDRESS=$SERVICE_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com
4. Create a service account
You'll create a service account for the Cloud Run job to use when it accesses Cloud Storage, Vertex AI, and the Video Intelligence API.
gcloud iam service-accounts create $SERVICE_ACCOUNT \
  --display-name="Cloud Run Video Scene Image Describer service account"
Then grant the service account access to the Cloud Storage bucket and the Vertex AI APIs.
# to view & download storage bucket objects
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$SERVICE_ACCOUNT_ADDRESS \
  --role=roles/storage.objectViewer

# to call the Vertex AI imagetext model
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$SERVICE_ACCOUNT_ADDRESS \
  --role=roles/aiplatform.user
5. Create a Cloud Storage bucket
Create a Cloud Storage bucket where you can upload videos for the Cloud Run job to process, using the following command:
gsutil mb -l us-central1 gs://$BUCKET_ID/
[Optional] You can use this sample video by downloading it locally.
gsutil cp gs://cloud-samples-data/video/visionapi.mp4 testvideo.mp4
6. Create the Cloud Run job
First, create a directory for the source code and cd into that directory.
mkdir video-describer-job && cd $_
Then, create a package.json file with the following content:
{
  "name": "video-describer-job",
  "version": "1.0.0",
  "private": true,
  "description": "describes the image in every scene for a given video",
  "main": "app.js",
  "author": "Google LLC",
  "license": "Apache-2.0",
  "scripts": {
    "start": "node app.js"
  },
  "dependencies": {
    "@google-cloud/storage": "^7.7.0",
    "@google-cloud/video-intelligence": "^5.0.1",
    "axios": "^1.6.2",
    "fluent-ffmpeg": "^2.1.2",
    "google-auth-library": "^9.4.1"
  }
}
For readability, this app consists of several source files. First, create an app.js file with the following content:
const bucketName = "<YOUR_BUCKET_ID>";
const videoFilename = "<YOUR-VIDEO-FILENAME>";

const { captureImages } = require("./helpers/imageCapture.js");
const { detectSceneChanges } = require("./helpers/sceneDetector.js");
const { getImageCaption } = require("./helpers/imageCaptioning.js");
const storageHelper = require("./helpers/storage.js");
const authHelper = require("./helpers/auth.js");

const fs = require("fs").promises;
const path = require("path");

const main = async () => {
    try {
        // download the file locally to the Cloud Run Job instance
        let localFilename = await storageHelper.downloadVideoFile(
            bucketName,
            videoFilename
        );

        // PART 1 - Use Video Intelligence API
        // detect all the scenes in the video & save timestamps to an array

        // EXAMPLE OUTPUT
        // Detected scene changes at the following timestamps:
        // [1, 7, 11, 12]
        let timestamps = await detectSceneChanges(localFilename);
        console.log(
            "Detected scene changes at the following timestamps: ",
            timestamps
        );

        // PART 2 - Use ffmpeg via dockerfile install
        // create an image of each scene change
        // and save to a local directory called "output"
        // returns the base filename for the generated images

        // EXAMPLE OUTPUT
        // creating screenshot for scene: 1 at output/video-filename-1.png
        // creating screenshot for scene: 7 at output/video-filename-7.png
        // creating screenshot for scene: 11 at output/video-filename-11.png
        // creating screenshot for scene: 12 at output/video-filename-12.png
        // returns the base filename for the generated images
        let imageBaseName = await captureImages(localFilename, timestamps);

        // PART 3a - get Access Token to call Vertex AI APIs via REST
        // needed for the image captioning
        // since we're calling the Vertex AI APIs directly
        let accessToken = await authHelper.getAccessToken();
        console.log("got an access token");

        // PART 3b - use Image Captioning to describe each scene per screenshot
        // EXAMPLE OUTPUT
        /*
        [
            {
                timestamp: 1,
                description:
                    "an aerial view of a city with a bridge in the background"
            },
            {
                timestamp: 7,
                description:
                    "a man in a blue shirt sits in front of shelves of donuts"
            },
            {
                timestamp: 11,
                description:
                    "a black and white photo of people working in a bakery"
            },
            {
                timestamp: 12,
                description:
                    "a black and white photo of a man and woman working in a bakery"
            }
        ];
        */

        // instantiate the data structure for storing the scene description and timestamp
        // e.g. an array of json objects,
        // [{ timestamp: 5, description: "..." }, ...]
        let scenes = [];

        // for each timestamp, send the image to Vertex AI
        console.log("getting Vertex AI description for each timestamp");
        scenes = await Promise.all(
            timestamps.map(async (timestamp) => {
                let filepath = path.join(
                    "./output",
                    imageBaseName + "-" + timestamp + ".png"
                );

                // get the base64 encoded image bc sending via REST
                const encodedFile = await fs.readFile(filepath, "base64");

                // send each screenshot to Vertex AI for description
                let description = await getImageCaption(
                    accessToken,
                    encodedFile
                );

                return { timestamp: timestamp, description: description };
            })
        );

        console.log("finished collecting all the scenes");
        console.log(scenes);
    } catch (error) {
        // log the error
        console.error("received error: ", error);
    }
};

// Start script
main().catch((err) => {
    console.error(err);
});
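app.js above hard-codes the bucket and video filename as placeholders. As a minimal sketch of an alternative, the values could be read from environment variables and validated before the job starts. The variable names BUCKET_NAME and VIDEO_FILENAME and the helper readJobConfig are hypothetical and are not part of the codelab.

```javascript
// Sketch only: BUCKET_NAME and VIDEO_FILENAME are hypothetical variable
// names chosen for illustration; the codelab itself hard-codes both values.
function readJobConfig(env = process.env) {
    const bucketName = env.BUCKET_NAME;
    const videoFilename = env.VIDEO_FILENAME;

    // Collect every missing variable so the error reports all of them at once.
    const missing = [];
    if (!bucketName) missing.push("BUCKET_NAME");
    if (!videoFilename) missing.push("VIDEO_FILENAME");

    if (missing.length > 0) {
        throw new Error(
            "Missing required environment variables: " + missing.join(", ")
        );
    }
    return { bucketName, videoFilename };
}

// Example with an explicit environment object instead of process.env.
const config = readJobConfig({
    BUCKET_NAME: "my-project-video-describer",
    VIDEO_FILENAME: "testvideo.mp4"
});
console.log(config.bucketName, config.videoFilename);
```

Failing fast like this keeps a misconfigured execution from reaching the download step. If you take this approach, the variables would need to be set on the job, for example with the --set-env-vars flag when deploying.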
Next, create a Dockerfile with the following content:
# Copyright 2020 Google, LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Use the official lightweight Node.js image.
# https://hub.docker.com/_/node
FROM node:20.10.0-slim

# Create and change to the app directory.
WORKDIR /usr/src/app

RUN apt-get update && apt-get install -y ffmpeg

# Copy application dependency manifests to the container image.
# A wildcard is used to ensure both package.json AND package-lock.json are copied.
# Copying this separately prevents re-running npm install on every code change.
COPY package*.json ./

# Install dependencies.
# If you add a package-lock.json speed your build by switching to 'npm ci'.
# RUN npm ci --only=production
RUN npm install --production

# Copy local code to the container image.
COPY . .

# Run the job on container startup.
CMD [ "npm", "start" ]
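Create a file named .dockerignore with the following content:
Dockerfile
.dockerignore
node_modules
npm-debug.log
Now, create a folder named helpers. This folder will contain 5 helper files.
mkdir helpers
cd helpers
Next, create a sceneDetector.js file with the following content. This file uses the Video Intelligence API to detect when scenes change in the video.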
const fs = require("fs");
const util = require("util");
const readFile = util.promisify(fs.readFile);
const ffmpeg = require("fluent-ffmpeg");

const Video = require("@google-cloud/video-intelligence");
const client = new Video.VideoIntelligenceServiceClient();

module.exports = {
    detectSceneChanges: async function (downloadedFile) {
        // Reads a local video file and converts it to base64
        const file = await readFile(downloadedFile);
        const inputContent = file.toString("base64");

        // setup request for shot change detection
        const request = {
            inputContent: inputContent,
            features: ["SHOT_CHANGE_DETECTION"]
        };

        // Detects camera shot changes
        const [operation] = await client.annotateVideo(request);
        console.log("Shot (scene) detection in progress...");
        const [operationResult] = await operation.promise();

        // Gets shot changes
        const shotChanges = operationResult.annotationResults[0].shotAnnotations;

        console.log(
            "Shot (scene) changes detected: " + shotChanges.length
        );

        // data structure to be returned
        let sceneChanges = [];

        // for the initial scene
        sceneChanges.push(1);

        // if only one scene, keep at 1 second
        if (shotChanges.length === 1) {
            return sceneChanges;
        }

        // get length of video
        const videoLength = await getVideoLength(downloadedFile);

        shotChanges.forEach((shot, shotIndex) => {
            // fill in any offset fields missing from the API response
            if (shot.endTimeOffset === undefined) {
                shot.endTimeOffset = {};
            }
            if (shot.endTimeOffset.seconds === undefined) {
                shot.endTimeOffset.seconds = 0;
            }
            if (shot.endTimeOffset.nanos === undefined) {
                shot.endTimeOffset.nanos = 0;
            }

            // convert to a number
            let currentTimestampSecond = Number(
                shot.endTimeOffset.seconds
            );

            let sceneChangeTime = 0;
            // double-check no scenes were detected within the last second
            if (currentTimestampSecond + 1 > videoLength) {
                sceneChangeTime = currentTimestampSecond;
            } else {
                // otherwise, for simplicity, just round up to the next second
                sceneChangeTime = currentTimestampSecond + 1;
            }

            sceneChanges.push(sceneChangeTime);
        });

        return sceneChanges;
    }
};

// uses ffprobe (installed with ffmpeg) to read the video duration in seconds
async function getVideoLength(localFile) {
    let getLength = util.promisify(ffmpeg.ffprobe);
    let length = await getLength(localFile);

    console.log("video length: ", length.format.duration);
    return length.format.duration;
}
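The timestamp rounding in sceneDetector.js can be easier to follow in isolation. The sketch below restates that logic as a pure function that needs no API call; the name toSceneTimestamps and the sample shot data are hypothetical and chosen only for illustration.

```javascript
// Sketch only: a pure restatement of the rounding logic in sceneDetector.js.
// toSceneTimestamps is a hypothetical helper name, not part of the codelab.
function toSceneTimestamps(shotAnnotations, videoLengthSeconds) {
    // The first screenshot is always taken at 1 second.
    const timestamps = [1];

    // A single detected shot means the video has only one scene.
    if (shotAnnotations.length === 1) {
        return timestamps;
    }

    for (const shot of shotAnnotations) {
        // Missing offsets are treated as 0, as in the original code.
        const endSeconds = Number(shot.endTimeOffset?.seconds ?? 0);

        // Move one second past the shot boundary unless that would
        // run past the end of the video.
        timestamps.push(
            endSeconds + 1 > videoLengthSeconds ? endSeconds : endSeconds + 1
        );
    }
    return timestamps;
}

// Made-up shot data shaped like the API's shotAnnotations entries.
const sampleShots = [
    { endTimeOffset: { seconds: "6", nanos: 500000000 } },
    { endTimeOffset: { seconds: "10" } },
    { endTimeOffset: { seconds: "11", nanos: 900000000 } }
];
console.log(toSceneTimestamps(sampleShots, 12.3)); // [ 1, 7, 11, 12 ]
```

Note that the end of every shot, including the last one, produces a timestamp, so a short final shot can yield a screenshot very close to the end of the video.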
Now, create a file named imageCapture.js with the following content. This file uses the Node.js package fluent-ffmpeg to run ffmpeg commands from within the Node.js app.
const ffmpeg = require("fluent-ffmpeg");
const path = require("path");
const util = require("util");

module.exports = {
    captureImages: async function (localFile, scenes) {
        let imageBaseName = path.parse(localFile).name;

        try {
            // capture the screenshots one at a time
            for (const scene of scenes) {
                console.log("creating screenshot for scene: ", +scene);
                await createScreenshot(localFile, imageBaseName, scene);
            }
        } catch (error) {
            console.log("error gathering screenshots: ", error);
        }

        console.log("finished gathering the screenshots");
        return imageBaseName; // return the base filename for each image
    }
};

// wraps fluent-ffmpeg's event-based screenshot call in a Promise
async function createScreenshot(localFile, imageBaseName, scene) {
    return new Promise((resolve, reject) => {
        ffmpeg(localFile)
            .screenshots({
                timestamps: [scene],
                filename: `${imageBaseName}-${scene}.png`,
                folder: "output",
                size: "320x240"
            })
            .on("error", () => {
                console.log(
                    "Failed to create scene for timestamp: " + scene
                );
                return reject(
                    "Failed to create scene for timestamp: " + scene
                );
            })
            .on("end", () => {
                return resolve();
            });
    });
}
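imageCapture.js writes each screenshot as <base name>-<timestamp>.png in the output folder, and app.js later rebuilds the same path to read the file back. As a minimal sketch, that naming convention could live in one small helper so the two files cannot drift apart; screenshotPath is a hypothetical name, not part of the codelab.

```javascript
const path = require("path");

// Sketch only: screenshotPath is a hypothetical helper that mirrors the
// naming used by imageCapture.js (writer) and app.js (reader).
function screenshotPath(videoFile, timestamp, folder = "output") {
    // "testvideo.mp4" -> "testvideo"
    const baseName = path.parse(videoFile).name;
    return path.join(folder, `${baseName}-${timestamp}.png`);
}

console.log(screenshotPath("testvideo.mp4", 7));
```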
Next, create a file named imageCaptioning.js with the following content. This file uses Vertex AI to get a visual description of each scene image.
const axios = require("axios");
const { GoogleAuth } = require("google-auth-library");

const auth = new GoogleAuth({
    scopes: "https://www.googleapis.com/auth/cloud-platform"
});

module.exports = {
    getImageCaption: async function (token, encodedFile) {
        // this example shows you how to call the Vertex REST APIs directly
        // https://cloud.google.com/vertex-ai/generative-ai/docs/image/image-captioning#get-captions-short
        // https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/image-captioning

        let projectId = await auth.getProjectId();

        let config = {
            headers: {
                "Authorization": "Bearer " + token,
                "Content-Type": "application/json; charset=utf-8"
            }
        };

        // request a single English caption for the base64-encoded image
        const json = {
            "instances": [
                {
                    "image": {
                        "bytesBase64Encoded": encodedFile
                    }
                }
            ],
            "parameters": {
                "sampleCount": 1,
                "language": "en"
            }
        };

        let response = await axios.post(
            "https://us-central1-aiplatform.googleapis.com/v1/projects/" +
                projectId +
                "/locations/us-central1/publishers/google/models/imagetext:predict",
            json,
            config
        );

        return response.data.predictions[0];
    }
};
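The request in imageCaptioning.js has two parts: an endpoint URL that embeds the project ID and region, and a JSON body that carries the base64-encoded image. The sketch below builds both without sending anything, which makes the shape easy to inspect. The helper name buildCaptionRequest and the location parameter are assumptions added for illustration; the codelab hard-codes us-central1.

```javascript
// Sketch only: buildCaptionRequest is a hypothetical helper. It assembles
// the same URL and body that imageCaptioning.js sends, but makes no request.
function buildCaptionRequest(projectId, encodedImage, location = "us-central1") {
    const url =
        `https://${location}-aiplatform.googleapis.com/v1/projects/${projectId}` +
        `/locations/${location}/publishers/google/models/imagetext:predict`;

    const body = {
        instances: [{ image: { bytesBase64Encoded: encodedImage } }],
        // One English caption per image, as in the codelab.
        parameters: { sampleCount: 1, language: "en" }
    };

    return { url, body };
}

// "aGVsbG8=" is placeholder base64 data, not a real image.
const captionRequest = buildCaptionRequest("my-project", "aGVsbG8=");
console.log(captionRequest.url);
console.log(JSON.stringify(captionRequest.body, null, 2));
```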
Create a file named auth.js with the following content. This file uses the Google authentication client library to get the access token needed to call the Vertex AI endpoints directly.
const { GoogleAuth } = require("google-auth-library");

const auth = new GoogleAuth({
    scopes: "https://www.googleapis.com/auth/cloud-platform"
});

module.exports = {
    getAccessToken: async function () {
        return await auth.getAccessToken();
    }
};
Finally, create a file named storage.js with the following content. This file uses the Cloud Storage client library to download the video from Cloud Storage.
const { Storage } = require("@google-cloud/storage");

module.exports = {
    downloadVideoFile: async function (bucketName, videoFilename) {
        // Creates a client
        const storage = new Storage();

        // keep same name locally
        let localFilename = videoFilename;

        const options = {
            destination: localFilename
        };

        // Download the file
        await storage
            .bucket(bucketName)
            .file(videoFilename)
            .download(options);

        console.log(
            `gs://${bucketName}/${videoFilename} downloaded locally to ${localFilename}.`
        );

        return localFilename;
    }
};
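storage.js reuses the object name as the local filename, which works when the video sits at the top level of the bucket. If the object name contains a folder-like prefix such as uploads/2024/testvideo.mp4, the local destination directory may not exist. As a minimal sketch under that assumption, the prefix could be dropped before downloading; localDestination is a hypothetical helper, not part of the codelab.

```javascript
const path = require("path");

// Sketch only: localDestination is a hypothetical helper. Cloud Storage
// object names use "/" as a separator, so path.posix is used regardless
// of the operating system the code runs on.
function localDestination(objectName) {
    return path.posix.basename(objectName);
}

console.log(localDestination("uploads/2024/testvideo.mp4")); // testvideo.mp4
console.log(localDestination("testvideo.mp4")); // testvideo.mp4
```

The trade-off is that two objects with the same basename under different prefixes would map to the same local file, which is acceptable here because the job processes one video per execution.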
7. Deploy and execute the Cloud Run job
First, make sure you are in the root directory of this codelab, video-describer-job:
cd .. && pwd
Then, deploy the Cloud Run job with this command:
gcloud run jobs deploy $JOB_NAME --source . --region $REGION
Now you can execute the Cloud Run job by running the following command:
gcloud run jobs execute $JOB_NAME
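After the job finishes executing, you can run the following command to get a link to the log URI. (Alternatively, you can use the Cloud console and go directly to Cloud Run jobs to view the logs.)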
gcloud run jobs executions describe <JOB_EXECUTION_ID>
In the logs, you should see output similar to the following:
[{ timestamp: 1, description: 'what is google cloud vision api ? is written on a white background .'},
{ timestamp: 3, description: 'a woman wearing a google cloud vision api shirt sits at a table'},
{ timestamp: 18, description: 'a person holding a cell phone with the words what is cloud vision api on the bottom' }, ...]
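The job logs the scenes array as raw objects. As a minimal sketch, the same data could be printed as one readable line per scene with an mm:ss timestamp. The helpers formatTimestamp and formatScenes are hypothetical and not part of the codelab; the sample data is taken from the output above.

```javascript
// Sketch only: formatTimestamp and formatScenes are hypothetical helpers.
function formatTimestamp(totalSeconds) {
    const minutes = Math.floor(totalSeconds / 60);
    const seconds = Math.floor(totalSeconds % 60);
    // Zero-pad both parts so the lines align, e.g. 75 -> "01:15".
    return (
        String(minutes).padStart(2, "0") + ":" + String(seconds).padStart(2, "0")
    );
}

function formatScenes(scenes) {
    return scenes
        .map((scene) => `${formatTimestamp(scene.timestamp)}  ${scene.description}`)
        .join("\n");
}

const sampleScenes = [
    { timestamp: 3, description: "a woman wearing a google cloud vision api shirt sits at a table" },
    { timestamp: 18, description: "a person holding a cell phone with the words what is cloud vision api on the bottom" }
];
console.log(formatScenes(sampleScenes));
```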
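8. Congratulations!
Congratulations on completing this codelab!
We recommend reviewing the documentation on the Video Intelligence API, Cloud Run, and Vertex AI visual captioning.
9. Clean up
To avoid unexpected charges (for example, if this Cloud Run job is inadvertently invoked more times than your monthly Cloud Run invocation allotment in the free tier), you can either delete the Cloud Run job or delete the project you created in Step 2.
To delete the Cloud Run job, go to the Cloud Run page in the Cloud console at https://console.cloud.google.com/run/ and delete the video-describer-job job (or $JOB_NAME, if you used a different name).
If you choose to delete the entire project, go to https://console.cloud.google.com/cloud-resource-manager, select the project you created in Step 2, and choose Delete. If you delete the project, you'll need to change projects in your Cloud SDK. You can view the list of available projects by running gcloud projects list.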