1. 简介
在此 Codelab 中,您将学习如何使用 BigQuery Graph 为虚构零售公司 Cymbal Pets 构建客户 360 度视图和推荐引擎。您将利用 SQL 的强大功能直接在 BigQuery 中创建、查询和分析图数据,并将其与向量搜索相结合,以实现高级商品推荐。
借助 BigQuery Graph,您可以将数据实体(例如客户、商品和订单)之间的关系建模为图,从而轻松回答有关客户行为和商品亲和力的复杂问题。

您将执行的操作
- 为 Cymbal Pets 图创建 BigQuery 数据集和架构
- 从 Cloud Storage 加载示例数据(客户、商品、订单、商店)
- 在 BigQuery 中创建属性图,以连接这些实体
- 使用图查询直观呈现客户交易记录
- 使用向量搜索 构建商品推荐系统
- 使用“一起购买”图关系增强推荐
所需条件
- 网络浏览器,例如 Chrome
- 启用了结算功能的 Google Cloud 项目
此 Codelab 适用于各种水平的开发者,包括新手。
2. 准备工作
创建 Google Cloud 项目
- 在 Google Cloud 控制台中,选择或创建 Google Cloud 项目。
- 确保您的 Cloud 项目已启用结算功能。
启动 Cloud Shell
- 点击 Google Cloud 控制台顶部的激活 Cloud Shell 。
- 核实身份验证:
gcloud auth list
- 确认您的项目:
gcloud config get project
- 根据需要进行设置:
export PROJECT_ID=<YOUR_PROJECT_ID>
gcloud config set project $PROJECT_ID
启用 API
运行此命令以启用所需的 BigQuery API:
gcloud services enable bigquery.googleapis.com
3. 定义架构
首先,您需要创建一个数据集来存储与图相关的表,并为节点和边定义架构。
- 在此 Codelab 中,我们将执行 SQL 命令。您可以在 BigQuery Studio > SQL 编辑器 中运行这些命令,也可以在 Cloud Shell 中使用
bq query命令。
我们将假定您使用的是 BigQuery SQL 编辑器 ,以便更好地使用多行创建语句。 - 创建
cymbal_pets_demo数据集:
CREATE SCHEMA IF NOT EXISTS cymbal_pets_demo;
- 为
order_items、products、orders、stores、customers和co_related_products_for_angelica创建表。这些表将作为图的源数据。
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.order_items
(
order_id INT64,
product_id INT64,
order_item_id INT64,
quantity INT64,
price FLOAT64,
PRIMARY KEY (order_id, product_id, order_item_id) NOT ENFORCED
)
CLUSTER BY order_item_id;
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.products
(
product_id INT64,
product_name STRING,
brand STRING,
category STRING,
subcategory INT64,
animal_type INT64,
search_keywords INT64,
price FLOAT64,
description STRING,
inventory_level INT64,
supplier_id INT64,
average_rating FLOAT64,
uri STRING,
embedding ARRAY<FLOAT64>,
PRIMARY KEY (product_id) NOT ENFORCED
)
CLUSTER BY product_id;
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.orders
(
customer_id INT64,
order_id INT64,
shipping_address_city STRING,
store_id INT64,
order_date DATE,
order_type STRING,
payment_method STRING,
PRIMARY KEY (order_id) NOT ENFORCED
)
PARTITION BY order_date
CLUSTER BY order_id;
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.stores
(
store_id INT64,
store_name STRING,
address_state STRING,
address_city STRING,
latitude FLOAT64,
longitude FLOAT64,
opening_hours STRUCT<Monday STRING, Tuesday STRING, Wednesday STRING, Thursday STRING, Friday STRING, Saturday STRING, Sunday STRING>,
manager_id INT64,
PRIMARY KEY (store_id) NOT ENFORCED
)
CLUSTER BY store_id;
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.customers
(
customer_id INT64,
first_name STRING,
last_name STRING,
email STRING,
gender STRING,
address_city STRING,
address_state STRING,
loyalty_member BOOL,
PRIMARY KEY (customer_id) NOT ENFORCED
)
CLUSTER BY customer_id;
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.co_related_products_for_angelica
(
angelica_product_id INT64,
other_product_id INT64,
co_purchase_count INT64
);
您现在已定义图数据的结构。
4. 加载数据
现在,使用 Cloud Storage 中的示例数据填充表。
在 BigQuery SQL 编辑器中运行以下 LOAD DATA 语句:
LOAD DATA INTO `cymbal_pets_demo.customers`
FROM FILES (
format = 'AVRO',
uris = ['gs://sample-data-and-media/cymbal-pets/tables/customers/*.avro']
);
LOAD DATA INTO `cymbal_pets_demo.order_items`
FROM FILES (
format = 'AVRO',
uris = ['gs://sample-data-and-media/cymbal-pets/tables/order_items/*.avro']
);
LOAD DATA INTO `cymbal_pets_demo.orders`
FROM FILES (
format = 'AVRO',
uris = ['gs://sample-data-and-media/cymbal-pets/tables/orders/*.avro']
);
LOAD DATA INTO `cymbal_pets_demo.products`
FROM FILES (
format = 'AVRO',
uris = ['gs://sample-data-and-media/cymbal-pets/tables/products/*.avro']
);
LOAD DATA INTO `cymbal_pets_demo.stores`
FROM FILES (
format = 'AVRO',
uris = ['gs://sample-data-and-media/cymbal-pets/tables/stores/*.avro']
);
您应该会看到确认消息,表明行已加载到每个表中。
5. 创建属性图
加载数据后,您现在可以定义属性图 了。这会告知 BigQuery 哪些表代表节点 (例如客户、商品等实体),哪些表代表边 (例如“访问过”“已下单”“有”等关系)。

运行以下 DDL 语句:
CREATE OR REPLACE PROPERTY GRAPH cymbal_pets_demo.PetsOrderGraph
NODE TABLES (
cymbal_pets_demo.customers KEY(customer_id) LABEL Customer,
cymbal_pets_demo.products KEY(product_id) LABEL Products,
cymbal_pets_demo.stores KEY(store_id) LABEL Stores,
cymbal_pets_demo.orders KEY(order_id) LABEL Orders
)
EDGE TABLES (
cymbal_pets_demo.orders as customer_to_store_edge
KEY (order_id)
SOURCE KEY (customer_id) references customers(customer_id)
DESTINATION KEY (store_id) references stores(store_id)
LABEL Visited
PROPERTIES ALL COLUMNS,
cymbal_pets_demo.order_items
KEY (order_item_id)
SOURCE KEY (order_id) references orders(order_id)
DESTINATION KEY (product_id) references products(product_id)
LABEL Has
PROPERTIES ALL COLUMNS,
cymbal_pets_demo.orders as customer_to_orders_edge
KEY (order_id)
SOURCE KEY (customer_id) references customers(customer_id)
DESTINATION KEY (order_id) references orders(order_id)
LABEL Placed
PROPERTIES ALL COLUMNS,
cymbal_pets_demo.co_related_products_for_angelica
KEY (angelica_product_id)
SOURCE KEY (angelica_product_id) references products(product_id)
DESTINATION KEY (other_product_id) references products(product_id)
LABEL BoughtTogether
PROPERTIES ALL COLUMNS
);
这会创建图 PetsOrderGraph,以便我们使用 GRAPH_TABLE 运算符执行图遍历。
6. 直观呈现所有客户的购买历史记录
在 BigQuery Studio 中打开新建笔记本 。

对于此 Codelab 的可视化和推荐部分,我们将在 BigQuery Studio 中使用 Google Colab 笔记本。这样,我们就可以轻松直观呈现图结果。
BigQuery Graph 笔记本以 IPython 魔法命令的形式实现。通过添加带有 TO_JSON 函数的 %%bigquery 魔法命令,您可以直观呈现结果,如以下部分所示。
假设 Cymbal Pets 想要获得所有客户及其在特定时间窗口内所购商品的 360 度可视化视图。
在新单元格中运行以下命令:
%%bigquery --graph
GRAPH cymbal_pets_demo.PetsOrderGraph
# finds the customer node and then finds all
# the Orders nodes that are connected to that customer through the
# Placed relationship
MATCH (customer:Customer)-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
# filters the Orders nodes to only include those where the
# order_date is within the last 3 months.
WHERE ordr.order_date >= date('2024-11-27')
# # This line finds all the Products nodes that are connected to the
# # filtered Orders nodes through the Has relationship.
MATCH p=(customer:Customer)-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
LIMIT 40
RETURN
TO_JSON(p) as paths
您应该会看到图结果的可视化表示。

7. 直观呈现 Angelica 的购买历史记录
假设 Cymbal Pets 想要深入了解一位名为 Angelica Russell 的客户。他们想要分析 Angelica 在过去 3 个月内购买的商品以及客户访问过的商店。
%%bigquery --graph
GRAPH cymbal_pets_demo.PetsOrderGraph
# finds the customer node with the name "Angelica Russell" and then finds all
# the Orders nodes that are connected to that customer through the
# Placed relationship and all the Products nodes that are connected to the
# filtered Orders nodes through the Has relationship.
MATCH p=(customer:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
# filters the Orders nodes to only include those where the
# order_date is within the last 3 months.
WHERE ordr.order_date >= date('2024-11-27')
# finds the Stores nodes where Angelica placed order from
MATCH p2=(customer)-[visited:Visited]->(store:Stores)
RETURN
TO_JSON(p) as path, TO_JSON(p2) as path2

8. 使用向量搜索进行商品推荐
Cymbal Pets 想要根据 Angelica 最近购买的商品向她推荐商品。我们可以使用向量搜索 查找与她过去购买的商品具有相似嵌入的商品。
在新 Colab 单元格中运行以下 SQL 脚本。此脚本:
- 识别 Angelica 最近购买的商品。
- 使用
VECTOR_SEARCH从products表中查找前 4 个类似商品。
注意:此步骤假定您已运行 AI.GENERATE_EMBEDDINGS 以在商品表中创建嵌入列。
%%bigquery
DECLARE products_bought_by_angelica ARRAY<INT64>;
-- 1. Get IDs of products bought by Angelica
SET products_bought_by_angelica = (
SELECT ARRAY_AGG(product_id) FROM
GRAPH_TABLE(
cymbal_pets_demo.PetsOrderGraph
MATCH (c:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(o:Orders)
WHERE o.order_date >= date('2024-11-27')
MATCH (o)-[has_edge:Has]->(p:Products)
RETURN DISTINCT p.product_id as product_id
));
-- 2. Find similar products using vector search
SELECT
query.product_name as AngelicaBought,
base.product_name as RecommendedProducts,
base.category
FROM
VECTOR_SEARCH(
TABLE cymbal_pets_demo.products,
'embedding',
(SELECT * FROM cymbal_pets_demo.products
WHERE product_id IN UNNEST(products_bought_by_angelica)),
'embedding',
top_k => 4)
WHERE query.product_name <> base.product_name;
您应该会看到一个推荐商品列表,这些商品在语义上与 Angelica 购买的商品相似。

9. 使用“一起购买”关系进行推荐
另一种强大的推荐技术是“协同过滤” - 推荐其他用户经常一起购买的商品。我们在图中将其建模为 BoughtTogether 边。
为了推荐一起购买的商品,Cymbal Pets 执行了分析型离线图查询,以查找要为 Angelica 购买的每件商品推荐的热门商品。
%%bigquery
CREATE OR REPLACE TABLE cymbal_pets_demo.co_related_products_for_angelica AS
SELECT
angelica_product_id,
other_product_id,
co_purchase_count
FROM (
SELECT
angelicaProduct.product_id AS angelica_product_id,
otherProduct.product_id AS other_product_id,
count(otherProduct) AS co_purchase_count,
# ensures that the row numbering is done separately for each angelica_product_id
ROW_NUMBER() OVER (PARTITION BY angelicaProduct.product_id ORDER BY count(otherProduct) DESC) AS rn
FROM
GRAPH_TABLE (cymbal_pets_demo.PetsOrderGraph
MATCH (angelica:Customer {first_name: 'Angelica', last_name: 'Russell'})-[:Placed]->(o:Orders)-[:Has]->(angelicaProduct:Products)
WHERE o.order_date >= date('2024-11-27')
WITH angelica, angelicaProduct
MATCH (otherCustomer:Customer)-[:Placed]->(otherOrder:Orders)-[:Has]->(angelicaProduct) # Find orders where Angelica's products were bought
WHERE otherCustomer <> angelica # Exclude Angelica's own orders
WITH angelicaProduct, otherOrder
MATCH (otherOrder)-[:HAS]->(otherProduct:Products) # Find other products in those orders
WHERE angelicaProduct <> otherProduct # Exclude the original product.
RETURN angelicaProduct, otherProduct, otherOrder
)
GROUP BY
angelicaProduct.product_id, otherProduct.product_id
)
WHERE rn <= 3; # only keep top 3 co-related products

运行此查询以推荐通过 BoughtTogether 边直接与 Angelica 的购买商品相关联的商品:
%%bigquery
SELECT * FROM GRAPH_TABLE(
cymbal_pets_demo.PetsOrderGraph
MATCH (customer:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(ordr:Orders)
WHERE ordr.order_date >= date('2024-11-27')
MATCH (ordr)-[has:Has]->(product:Products)
MATCH (product)-[bought_together:BoughtTogether]->(recommended_product:Products)
RETURN
product.product_name AS OriginalProduct,
recommended_product.product_name AS Recommended,
bought_together.co_purchase_count AS Strength
);
此查询从客户 -> 订单 -> 商品 ->(一起购买)-> 推荐商品进行遍历,根据集体购买行为向您显示推荐。

10. 清理
为避免系统向您的 Google Cloud 账号持续收取费用,请删除在此 Codelab 期间创建的资源。
删除数据集和所有表:
DROP SCHEMA IF EXISTS cymbal_pets_demo CASCADE;
如果您为此 Codelab 创建了一个新项目,也可以删除该项目:
gcloud projects delete $PROJECT_ID
11. 恭喜
恭喜!您已使用 BigQuery Graph 成功构建了客户 360 度视图和推荐引擎。
您学到的内容
- 如何在 BigQuery 中创建属性图。
- 如何将数据加载到图节点和边。
- 如何使用
GRAPH_TABLE和MATCH查询图模式。 - 如何将图查询与向量搜索相结合,以实现混合推荐。
后续步骤
- 探索 BigQuery Graph 文档。
- 详细了解 BigQuery 中的 向量搜索。