Analyze YouTube videos for marketing with Gemini


About this codelab

Last updated: Apr 3, 2025

Written by: Jisub Lee, Kyungjune Shin

This page was translated by the Cloud Translation API.

1. Introduction

Last updated: March 12, 2025

Disclaimer

This is sample code that analyzes videos using the YouTube Data API and Gemini. You are responsible for how you use it. Consider carefully before using this code in a production environment. The authors are not responsible for any issues arising from its use. In addition, because of the nature of AI, results may differ from reality, so review them carefully rather than trusting them blindly.

Project goal

The main goal is to identify YouTube videos and creators that are a good fit for promoting a brand, by analyzing video content and sentiment.

Overview

This project uses the YouTube Data API to fetch video information and GCP's Vertex AI API with a Gemini model to analyze video content. It runs on Google Colab.

You can paste the code blocks that follow into Colab and run them one at a time.

What you'll learn

How to use the YouTube Data API to fetch video information.
How to use GCP's Vertex AI API with a Gemini model to analyze video content.
How to use Google Colab to run the code.
How to create a spreadsheet from the analyzed data.

What you'll need

To implement this solution, you need:

A Google Cloud Platform project.
The YouTube Data API v3, Vertex AI API, Generative Language API, Google Drive API, and Google Sheets API enabled on the project.
An API key, created on the Credentials tab, with access to the YouTube Data API v3.

This solution uses the YouTube Data API and GCP's Vertex AI API.

2. Code and explanation

First, import the libraries the code uses. Then sign in with your Google Account and grant access to Google Drive.

# library
# colab
import ipywidgets as widgets
from IPython.display import display
from google.colab import auth

# cloud
from google import genai
from google.genai.types import Part, GenerateContentConfig

# function, util
import requests, os, re, time
from pandas import DataFrame
from datetime import datetime, timedelta

auth.authenticate_user()

[Action Required]

The API key and project ID from GCP are the values you will usually need to change. The cell below holds the GCP settings.

# GCP Setting
LANGUAGE_MODEL = 'gemini-1.5-pro' # @param {type:"string"}
API_KEY = 'Please write your API_KEY' # @param {type:"string"}
PROJECT_ID = 'Please write your GCP_ID' # @param {type:"string"}
LOCATION = 'us-central1' # @param {type:"string"}
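
Hard-coding the key in a notebook cell makes it easy to leak when the notebook is shared. As one possible alternative, the sketch below reads the key from an environment variable and falls back to the placeholder. The variable name `YOUTUBE_API_KEY` and the helpers `load_setting` and `is_configured` are assumptions made for illustration; they are not part of the original code.

```python
import os

PLACEHOLDER = 'Please write your API_KEY'

def load_setting(env_name, fallback, environ=None):
    """Return the value of env_name if it is set and non-empty, else fallback."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_name, '').strip()
    return value or fallback

def is_configured(value):
    """True when the value is no longer one of the template placeholders."""
    return bool(value) and not value.startswith('Please write')

# YOUTUBE_API_KEY is a hypothetical variable name chosen for this sketch.
API_KEY = load_setting('YOUTUBE_API_KEY', PLACEHOLDER)
```

Calling `is_configured(API_KEY)` before the first request gives an early, readable failure instead of an API error later on.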

[Action Required]

Change the variable values under # Input in the code below as you review it.

Using the brand "Google" as an example, this codelab shows how to search YouTube for videos on a specific topic (for example, "Google AI") while excluding videos from the brand's own channels.

Input variables for YouTube video analysis

BRAND_NAME (Required): The brand name to analyze (for example, Google).
MY_COMPANY_INFO (Required): A short description of the brand and its context.
SEARCH_QUERY (Required): The search term for YouTube videos (for example, Google AI).
VIEWER_COUNTRY: The viewer's country code (two-letter ISO 3166-1 alpha-2 code, for example, KR).
GENERATION_LANGUAGE (Required): The language for Gemini's results (for example, Korean).
EXCEPT_CHANNEL_IDS: Comma-separated channel IDs to exclude.

You can find a channel ID on the YouTube channel page.

VIDEO_TOPIC: A YouTube topic ID used to refine the search.

You can find video topic values in Search: list | YouTube Data API | Google for Developers.

DATE_INPUT (Required): The earliest publication date of videos to include (YYYY-MM-DD).

# Input
BRAND_NAME = "Google" # @param {type:"string"}
MY_COMPANY_INFO = "Google is a multinational technology company specializing in internet-related services and products." # @param {type:"string"}
SEARCH_QUERY = 'Google AI' # @param {type:"string"}
VIEWER_COUNTRY = 'KR' # @param {type:"string"}
GENERATION_LANGUAGE = 'Korean' # @param {type:"string"}
EXCEPT_CHANNEL_IDS = 'UCK8sQmJBp8GCxrOtXWBpyEA, UCdc_SRhKUlH3grljQXA0skw' # @param {type:"string"}
VIDEO_TOPIC = '/m/07c1v' # @param {type: "string"}
DATE_INPUT = '2025-01-01' # @param {type:"date"}

# Auth Scope
SCOPE = [
    'https://www.googleapis.com/auth/youtube.readonly',
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/cloud-platform'
]

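# validation check: every input marked (Required) must be set
required_inputs = [BRAND_NAME, MY_COMPANY_INFO, SEARCH_QUERY, GENERATION_LANGUAGE, DATE_INPUT]
if not all(required_inputs):
  raise ValueError("Brand name, company info, search query, generation language, and date input are required.")

# Split the comma-separated channel IDs, dropping blanks.
EXCEPT_CHANNEL_IDS = [channel_id.strip() for channel_id in EXCEPT_CHANNEL_IDS.split(',') if channel_id.strip()]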
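
The validation check above only tests whether values are present. The input descriptions also imply formats (a two-letter country code, a YYYY-MM-DD date), so a stricter check could look like the sketch below. `validate_inputs` is a hypothetical helper, not part of the original code, and it checks only the shape of the country code, not whether the code is actually assigned to a country.

```python
import re
from datetime import datetime

def validate_inputs(country, date_text, required):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, value in required.items():
        if not str(value).strip():
            problems.append(f"{name} is required")
    # VIEWER_COUNTRY is optional, but when present it should look like
    # an ISO 3166-1 alpha-2 code (two ASCII letters).
    if country and not re.fullmatch(r"[A-Za-z]{2}", country):
        problems.append("VIEWER_COUNTRY must be a two-letter code")
    try:
        datetime.strptime(date_text or '', "%Y-%m-%d")
    except ValueError:
        problems.append("DATE_INPUT must be a valid YYYY-MM-DD date")
    return problems

problems = validate_inputs('KR', '2025-01-01', {'BRAND_NAME': 'Google'})
```

Returning a list rather than raising on the first problem lets the notebook report every invalid input in one pass.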

The following code defines the main functions for interacting with the YouTube Data API.

# YouTube API function

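def get_youtube_videos(q, viewer_country_code, topic_str, start_period):

    page_token_number = 1
    next_page_token = ''
    merged_array = []

    published_after_date = f"{start_period}T00:00:00Z"

    # Fetch at most 8 pages, stopping once more than 75 videos are collected.
    while page_token_number < 9 and len(merged_array) <= 75:
        result = search_youtube(q, topic_str, published_after_date, viewer_country_code, '', next_page_token, 50)
        if not result:
            break
        # De-duplicate while preserving the order returned by the API.
        merged_array = list(dict.fromkeys(merged_array + result['items']))
        next_page_token = result['nextPageToken']
        # Without a next page token, another request would restart at page one.
        if not next_page_token:
            break
        page_token_number += 1

    return merged_array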

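def search_youtube(query, topic_id, published_after, region_code, relevance_language, next_page_token, max_results=50):

    if not query:
        return None

    # Let requests URL-encode the values; empty optional parameters are skipped.
    params = {
        'key': API_KEY,
        'part': 'snippet',
        'q': query,
        'publishedAfter': published_after,
        'regionCode': region_code,
        'relevanceLanguage': relevance_language,
        'type': 'video',
        'topicId': topic_id,
        'maxResults': max_results,
        'pageToken': next_page_token,
        'gl': region_code.lower() if region_code else '',
    }
    params = {k: v for k, v in params.items() if v not in ('', None)}

    response = requests.get('https://www.googleapis.com/youtube/v3/search', params=params)
    # Surface quota or authentication errors instead of returning no results.
    response.raise_for_status()
    data = response.json()
    results = data.get('items', [])
    next_page_token = data.get('nextPageToken', '')
    return_results = [item['id']['videoId'] for item in results]

    # Log progress without printing the API key.
    print(f"search: q={query!r}, results={len(return_results)}")

    return {
        "nextPageToken": next_page_token,
        "items": return_results
    }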

def get_date_string(days_ago):

    # Subtract so that a positive days_ago yields a date in the past.
    date = datetime.now() - timedelta(days=days_ago)
    return date.strftime('%Y-%m-%dT00:00:00Z')

def get_video_details(video_id):

    url = f'https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails&id={video_id}&key={API_KEY}'
    response = requests.get(url)
    data = response.json()

    if data.get('items'):
        video = data['items'][0]
        snippet = video['snippet']
        content_details = video['contentDetails']

        title = snippet.get('title', 'no title')
        description = snippet.get('description', 'no description')
        duration_iso = content_details.get('duration', None)
        channel_id = snippet.get('channelId', 'no channel id')
        channel_title = snippet.get('channelTitle', 'no channel title')
        return {'title': title, 'description': description, 'duration': duration_to_seconds(duration_iso), 'channel_id': channel_id, 'channel_title': channel_title}
    else:
        return None

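def duration_to_seconds(duration_str):
  # A missing duration (None or empty string) cannot be parsed.
  if not duration_str:
    return None

  # ISO 8601 duration such as PT1H2M3S; videos of a day or longer add a day part (P1DT2H).
  match = re.fullmatch(r'P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?', duration_str)
  if not match:
    return None

  days, hours, minutes, seconds = match.groups()

  total_seconds = 0
  if days:
    total_seconds += int(days) * 86400
  if hours:
    total_seconds += int(hours) * 3600
  if minutes:
    total_seconds += int(minutes) * 60
  if seconds:
    total_seconds += int(seconds)

  return total_seconds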
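
Because every search request consumes YouTube Data API quota, it can help to exercise the paging logic offline first. The sketch below is one way to do that under stated assumptions: the page source is passed in as a function, so a fake two-page response can stand in for the API. `collect_video_ids`, `fetch_page`, and `FAKE_PAGES` are hypothetical names used only here; the page limits mirror the 8 pages and 75 items used in `get_youtube_videos`.

```python
def collect_video_ids(fetch_page, max_pages=8, max_items=75):
    """Collect unique IDs page by page, preserving first-seen order.

    fetch_page(token) must return {"items": [...], "nextPageToken": str}.
    """
    seen = {}
    token = ''
    for _ in range(max_pages):
        page = fetch_page(token)
        for video_id in page['items']:
            seen.setdefault(video_id, None)  # dict keeps insertion order
        token = page.get('nextPageToken', '')
        # Stop when there is no further page or enough items were collected.
        if not token or len(seen) > max_items:
            break
    return list(seen)

# Fake two-page API used only to exercise the loop.
FAKE_PAGES = {
    '': {'items': ['a', 'b'], 'nextPageToken': 'p2'},
    'p2': {'items': ['b', 'c'], 'nextPageToken': ''},
}
ids = collect_video_ids(lambda token: FAKE_PAGES[token])
```

The duplicate `'b'` on the second page is dropped, and the loop ends as soon as the page token is empty.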

The following code provides a prompt template that you can adjust as needed, along with the main functions for interacting with GCP's Vertex AI API.

# GCP Vertex AI API

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
model = client.models

def request_gemini(prompt, video_link):
  video_extraction_json_generation_config = GenerateContentConfig(
    temperature=0.0,
    max_output_tokens=2048,
  )

  contents = [
      Part.from_uri(
          file_uri=video_link,
          mime_type="video/mp4",
      ),
      prompt
  ]

  response = model.generate_content(
      model=LANGUAGE_MODEL,
      contents=contents,
      config=video_extraction_json_generation_config
  )

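  # response.text may be None or raise when the response has no text part.
  # Return an empty string so the caller's validation fails and can retry.
  try:
    return response.text or ''
  except Exception:
    return ''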

def create_prompt(yt_title, yt_description, yt_link):
  return f"""### Task: You are a highly specialized marketer and YouTube expert working for the brand or company, {BRAND_NAME}.
Your boss is wondering which video to use to promote the company's advertisements and which YouTuber to work with on future promotions. You are the expert who can give your boss the most suitable suggestions.
Analyze the video according to the criteria below and solve your boss's worries.

### Criteria: Now you review the video.
If you evaluate it using the following criteria, you will be able to receive a better evaluation.

1. Whether the video mentions brand, {BRAND_NAME}.
2. Whether the video views {BRAND_NAME} positively or negatively.
3. Whether the video would be suitable for marketing purposes.

### Context and Contents:
Your Company Information:
- Company Description: {MY_COMPANY_INFO}
- Brand: {BRAND_NAME}

Analysis subject:
- YouTube title: {yt_title}
- YouTube description: {yt_description}
- YouTube link: {yt_link}

### Answer Format:
brand_relevance_score: (Integer between 0 and 100 - If this video is more relevant to {BRAND_NAME}, it will score higher)
brand_positive_score: (Integer between 0 and 100 - If this video is positive about the {BRAND_NAME}, it will score higher)
brand_negative_score: (Integer between 0 and 100 - If this video is negative about the {BRAND_NAME}, it will score higher)
video_content_summary: (Summarize the content of the video like overview)
video_brand_summary: (Summarize the content about your brand, {BRAND_NAME})
opinion: (Why this video is suitable for promoting your company or product)

### Examples:
brand_relevance_score: 100
brand_positive_score: 80
brand_negative_score: 0
video_content_summary: YouTubers introduce various electronic products in their videos.
video_brand_summary: The brand products mentioned in the video have their advantages well explained by the YouTuber.
opinion: Consumers are more likely to think positively about the advantages of the product.

### Caution:
DO NOT fabricate information.
DO NOT imagine things.
DO NOT use Markdown formatting.
DO Analyze each video based on the criteria mentioned above.
DO Analyze after watching the whole video.
DO write the results for summary as {GENERATION_LANGUAGE}."""

def parse_response(response: str):
  brand_relevance_score_pattern = r"brand_relevance_score:\s*(\d{1,3})"
  brand_positive_score_pattern = r"brand_positive_score:\s*(\d{1,3})"
  brand_negative_score_pattern = r"brand_negative_score:\s*(\d{1,3})"
  video_content_summary_pattern = r"video_content_summary:\s*(.*)"
  video_brand_summary_pattern = r"video_brand_summary:\s*(.*)"
  opinion_pattern = r"opinion:\s*(.*)"
  brand_relevance_score_match = re.search( brand_relevance_score_pattern, response )
  brand_relevance_score = ( int(brand_relevance_score_match.group(1)) if brand_relevance_score_match else 0 )
  brand_positive_score_match = re.search( brand_positive_score_pattern, response )
  brand_positive_score = ( int(brand_positive_score_match.group(1)) if brand_positive_score_match else 0 )
  brand_negative_score_match = re.search( brand_negative_score_pattern, response )
  brand_negative_score = ( int(brand_negative_score_match.group(1)) if brand_negative_score_match else 0 )
  video_content_score_match = re.search( video_content_summary_pattern, response )
  video_content_summary = ( video_content_score_match.group(1) if video_content_score_match else '' )
  video_brand_summary_match = re.search( video_brand_summary_pattern, response )
  video_brand_summary = ( video_brand_summary_match.group(1) if video_brand_summary_match else '' )
  opinion_match = re.search( opinion_pattern, response )
  opinion = ( opinion_match.group(1) if opinion_match else '' )
  return ( brand_relevance_score, brand_positive_score, brand_negative_score, video_content_summary, video_brand_summary, opinion)

def request_gemini_with_retry(prompt, youtube_link='', max_retries=1):
  retries = 0
  while retries <= max_retries:
    try:
      response = request_gemini(prompt, youtube_link)
      ( brand_relevance_score,
        brand_positive_score,
        brand_negative_score,
        video_content_summary,
        video_brand_summary,
        opinion) = parse_response(response)
      if ( validate_score(brand_relevance_score) and
           validate_score(brand_positive_score) and
           validate_score(brand_negative_score) and
           validate_summary(video_content_summary) and
           validate_summary(video_brand_summary) ):

        return ( brand_relevance_score,
                 brand_positive_score,
                 brand_negative_score,
                 video_content_summary,
                 video_brand_summary,
                 opinion
              )
      else:
        # Raise so the except block below counts the retry and, once the
        # retries are exhausted, returns default values instead of None.
        raise ValueError(
            "The value may be incorrect, there may be a range issue, a parsing"
            " issue, or a response issue with Gemini: score -"
            f" {brand_relevance_score}, {brand_positive_score},"
            f" {brand_negative_score} , summary - {video_content_summary},"
            f" {video_brand_summary}" )

    except Exception as e:
      print(f"Request failed: {e}")
      retries += 1
      if retries <= max_retries:
        print(f"retry ({retries}/{max_retries})...")
      else:
        print("Maximum number of retries exceeded")
        return 0, 0, 0, "", "", ""

def validate_score(score):
  return score >= 0 and score <= 100

def validate_summary(summary):
  return len(summary) > 0
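
The patterns in `parse_response` use `(.*)`, which stops at the first line break, so a summary that spans several lines is cut off after its first line. As a sketch of one alternative, the hypothetical helper `parse_fields` below splits the response at the known field names and keeps everything up to the next field. It assumes the model follows the answer format in the prompt and puts each field name at the start of a line.

```python
import re

FIELDS = (
    'brand_relevance_score', 'brand_positive_score', 'brand_negative_score',
    'video_content_summary', 'video_brand_summary', 'opinion',
)

def parse_fields(text):
    """Split a 'key: value' response into a dict, keeping multi-line values."""
    # Match any known field name at the start of a line.
    key_pattern = re.compile(
        r'^(' + '|'.join(FIELDS) + r'):[ \t]*', re.MULTILINE)
    matches = list(key_pattern.finditer(text))
    result = {}
    for i, match in enumerate(matches):
        # A value runs until the next field name or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        result[match.group(1)] = text[match.end():end].strip()
    return result

sample = (
    "brand_relevance_score: 90\n"
    "brand_positive_score: 70\n"
    "video_content_summary: First line.\n"
    "Second line.\n"
    "opinion: Good fit."
)
parsed = parse_fields(sample)
```

Fields that are absent from the response are simply missing from the dict, so the caller can decide whether to retry or fall back to a default.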

This code block does three things: it creates a data frame, runs the Gemini analysis, and then updates the data frame with the results.

def df_youtube_videos():
  youtube_video_list = get_youtube_videos(SEARCH_QUERY, VIEWER_COUNTRY, VIDEO_TOPIC, DATE_INPUT)
  youtube_video_link_list = []
  youtube_video_title_list = []
  youtube_video_description_list = []
  youtube_video_channel_title_list = []
  youtube_video_duration_list = []

  for video_id in youtube_video_list:
    video_details = get_video_details(video_id)
    # Skip videos whose details or duration could not be retrieved.
    if not video_details or video_details['duration'] is None:
      continue
    # Keep videos shorter than 50 minutes that are not from an excluded channel.
    # https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding
    if video_details['duration'] < 50*60 and video_details['channel_id'] not in EXCEPT_CHANNEL_IDS:
      youtube_video_link_list.append(f'https://www.youtube.com/watch?v={video_id}')
      youtube_video_title_list.append(video_details['title'])
      youtube_video_description_list.append(video_details['description'])
      youtube_video_channel_title_list.append(video_details['channel_title'])
      duration_new_format = f"{video_details['duration'] // 3600:02d}:{(video_details['duration'] % 3600) // 60:02d}:{video_details['duration'] % 60:02d}" # HH:MM:SS
      youtube_video_duration_list.append(duration_new_format)

  df = DataFrame({
      'video_id': youtube_video_link_list,
      'title': youtube_video_title_list,
      'description': youtube_video_description_list,
      'channel_title': youtube_video_channel_title_list,
      'length': youtube_video_duration_list
  })
  return df

def run_gemini(df):
  for index, row in df.iterrows():
    video_title = row['title']
    video_description = row['description']
    video_link = row['video_id']
    prompt = create_prompt(video_title, video_description, video_link)
    ( brand_relevance_score,
      brand_positive_score,
      brand_negative_score,
      video_content_summary,
      video_brand_summary,
      opinion) = request_gemini_with_retry(prompt, video_link)
    df.at[index, 'gemini_brand_relevance_score'] = brand_relevance_score
    df.at[index, 'gemini_brand_positive_score'] = brand_positive_score
    df.at[index, 'gemini_brand_negative_score'] = brand_negative_score
    df.at[index, 'gemini_video_content_summary'] = video_content_summary
    df.at[index, 'gemini_video_brand_summary'] = video_brand_summary
    df.at[index, 'gemini_opinion'] = opinion
    # https://cloud.google.com/vertex-ai/generative-ai/docs/quotas
    time.sleep(1)
    print(f"Processing: {index}/{len(df)}")
    print(f"video_title: {video_title}")
  return df

This code block runs everything written so far. It fetches data from YouTube, analyzes it with Gemini, and finally produces a data frame.

# main
df = df_youtube_videos()
run_gemini(df)
df['gemini_brand_positive_score'] = df[ 'gemini_brand_positive_score' ].astype('int64')
df['gemini_brand_relevance_score'] = df[ 'gemini_brand_relevance_score' ].astype('int64')
df['gemini_brand_negative_score'] = df[ 'gemini_brand_negative_score' ].astype('int64')
df = df.sort_values( 'gemini_brand_positive_score', ascending=False )

df
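
The main block sorts by the positive score alone, so a video that is both strongly positive and strongly negative can rank above a moderately positive one. As a sketch of one alternative, the code below filters out weakly related videos and ranks the rest by positive minus negative score. The threshold of 50, the net-score formula, and the helper `rank_candidates` are assumptions for illustration, not part of the original code. It works on plain dicts, such as the rows produced by `df.to_dict('records')`.

```python
def rank_candidates(rows, min_relevance=50):
    """Filter by relevance, then sort by (positive - negative), descending."""
    kept = [r for r in rows if r['gemini_brand_relevance_score'] >= min_relevance]
    return sorted(
        kept,
        key=lambda r: (r['gemini_brand_positive_score']
                       - r['gemini_brand_negative_score']),
        reverse=True,
    )

# Made-up rows used only to show the ordering.
rows = [
    {'title': 'A', 'gemini_brand_relevance_score': 90,
     'gemini_brand_positive_score': 80, 'gemini_brand_negative_score': 10},
    {'title': 'B', 'gemini_brand_relevance_score': 30,
     'gemini_brand_positive_score': 95, 'gemini_brand_negative_score': 0},
    {'title': 'C', 'gemini_brand_relevance_score': 70,
     'gemini_brand_positive_score': 85, 'gemini_brand_negative_score': 40},
]
ranked_titles = [r['title'] for r in rank_candidates(rows)]
```

Here video B is dropped for low relevance, and A ranks above C even though C has the higher positive score, because C also carries a high negative score.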

The final step is to create a spreadsheet from the data frame. Use the URL that is printed to check the result.

import gspread
from google.auth import default

today_date = datetime.now().strftime('%Y-%m-%d')
my_spreadsheet_title = f"Partner's Video Finder, {BRAND_NAME}, {SEARCH_QUERY}, {VIEWER_COUNTRY} ({DATE_INPUT}~{today_date})"

creds, _ = default()
gc = gspread.authorize(creds)
sh = gc.create(my_spreadsheet_title)
# Use the spreadsheet just created; opening by title could match an older file with the same name.
worksheet = sh.sheet1
cell_list = df.values.tolist()
worksheet.update([df.columns.values.tolist()] + cell_list)

print("URL: ", sh.url)
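
The call to `worksheet.update` above sends `df.values.tolist()` as is. Missing values (`None` or `NaN`) may not serialize cleanly when the rows are sent to the Sheets API, so one option is to replace them with empty strings first. The helper `to_sheet_values` below is a hypothetical sketch of that step and is not part of the original code.

```python
import math

def to_sheet_values(columns, rows):
    """Build a header row plus data rows, replacing None/NaN with ''."""
    def clean(value):
        if value is None:
            return ''
        if isinstance(value, float) and math.isnan(value):
            return ''
        return value
    return [list(columns)] + [[clean(v) for v in row] for row in rows]

values = to_sheet_values(
    ['title', 'score'],
    [['A', 80], ['B', float('nan')], [None, 5]],
)
```

The result has the same header-plus-rows shape as the list passed to `worksheet.update` in the code above.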

3. References

I referred to the following materials when writing the code. If you need to modify the code or want more detail on usage, see the links below.
