使用 BigQuery Graph 构建 Customer 360 推荐应用

1. 简介

在此 Codelab 中,您将学习如何使用 BigQuery Graph 为虚构零售公司 Cymbal Pets 构建客户 360 度视图和推荐引擎。您将利用 SQL 的强大功能直接在 BigQuery 中创建、查询和分析图数据,并将其与向量搜索相结合,以实现高级商品推荐。

借助 BigQuery Graph,您可以将数据实体(例如客户、商品和订单)之间的关系建模为图,从而轻松回答有关客户行为和商品亲和力的复杂问题。

使用场景图示

您将执行的操作

  • 为 Cymbal Pets 图创建 BigQuery 数据集和架构
  • 从 Cloud Storage 加载示例数据(客户、商品、订单、商店)
  • 在 BigQuery 中创建属性图,以连接这些实体
  • 使用图查询直观呈现客户交易记录
  • 使用向量搜索 构建商品推荐系统
  • 使用“一起购买”图关系增强推荐

所需条件

  • 网络浏览器,例如 Chrome
  • 启用了结算功能的 Google Cloud 项目

此 Codelab 适用于各种水平的开发者,包括新手。

2. 准备工作

创建 Google Cloud 项目

  1. Google Cloud 控制台中,选择或创建 Google Cloud 项目
  2. 确保您的 Cloud 项目已启用结算功能。

启动 Cloud Shell

  1. 点击 Google Cloud 控制台顶部的激活 Cloud Shell
  2. 核实身份验证:
gcloud auth list
  1. 确认您的项目:
gcloud config get project
  1. 根据需要进行设置:
export PROJECT_ID=<YOUR_PROJECT_ID>
gcloud config set project $PROJECT_ID

启用 API

运行此命令以启用所需的 BigQuery API:

gcloud services enable bigquery.googleapis.com

3. 定义架构

首先,您需要创建一个数据集来存储与图相关的表,并为节点和边定义架构。

  1. 在此 Codelab 中,我们将执行 SQL 命令。您可以在 BigQuery Studio > SQL 编辑器 中运行这些命令,也可以在 Cloud Shell 中使用 bq query 命令。新建 SQL 查询我们将假定您使用的是 BigQuery SQL 编辑器 ,以便更好地使用多行创建语句。
  2. 创建 cymbal_pets_demo 数据集:
CREATE SCHEMA IF NOT EXISTS cymbal_pets_demo;
  1. order_itemsproductsordersstorescustomersco_related_products_for_angelica 创建表。这些表将作为图的源数据。
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.order_items
(
  order_id INT64,
  product_id INT64,
  order_item_id INT64,
  quantity INT64,
  price FLOAT64,
  PRIMARY KEY (order_id, product_id, order_item_id) NOT ENFORCED
)
CLUSTER BY order_item_id;

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.products
(
  product_id INT64,
  product_name STRING,
  brand STRING,
  category STRING,
  subcategory INT64,
  animal_type INT64,
  search_keywords INT64,
  price FLOAT64,
  description STRING,
  inventory_level INT64,
  supplier_id INT64,
  average_rating FLOAT64,
  uri STRING,
  embedding ARRAY<FLOAT64>,
  PRIMARY KEY (product_id) NOT ENFORCED
)
CLUSTER BY product_id;

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.orders
(
  customer_id INT64,
  order_id INT64,
  shipping_address_city STRING,
  store_id INT64,
  order_date DATE,
  order_type STRING,
  payment_method STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
)
PARTITION BY order_date
CLUSTER BY order_id;

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.stores
(
  store_id INT64,
  store_name STRING,
  address_state STRING,
  address_city STRING,
  latitude FLOAT64,
  longitude FLOAT64,
  opening_hours STRUCT<Monday STRING, Tuesday STRING, Wednesday STRING, Thursday STRING, Friday STRING, Saturday STRING, Sunday STRING>,
  manager_id INT64,
  PRIMARY KEY (store_id) NOT ENFORCED
)
CLUSTER BY store_id;

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.customers
(
  customer_id INT64,
  first_name STRING,
  last_name STRING,
  email STRING,
  gender STRING,
  address_city STRING,
  address_state STRING,
  loyalty_member BOOL,
  PRIMARY KEY (customer_id) NOT ENFORCED
)
CLUSTER BY customer_id;

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.co_related_products_for_angelica
(
  angelica_product_id INT64,
  other_product_id INT64,
  co_purchase_count INT64
);

您现在已定义图数据的结构。

4. 加载数据

现在,使用 Cloud Storage 中的示例数据填充表。

在 BigQuery SQL 编辑器中运行以下 LOAD DATA 语句:

LOAD DATA INTO `cymbal_pets_demo.customers`
FROM FILES (
    format = 'AVRO',
    uris = ['gs://sample-data-and-media/cymbal-pets/tables/customers/*.avro']
);

LOAD DATA INTO `cymbal_pets_demo.order_items`
FROM FILES (
    format = 'AVRO',
    uris = ['gs://sample-data-and-media/cymbal-pets/tables/order_items/*.avro']
);

LOAD DATA INTO `cymbal_pets_demo.orders`
FROM FILES (
    format = 'AVRO',
    uris = ['gs://sample-data-and-media/cymbal-pets/tables/orders/*.avro']
);

LOAD DATA INTO `cymbal_pets_demo.products`
FROM FILES (
    format = 'AVRO',
    uris = ['gs://sample-data-and-media/cymbal-pets/tables/products/*.avro']
);

LOAD DATA INTO `cymbal_pets_demo.stores`
FROM FILES (
    format = 'AVRO',
    uris = ['gs://sample-data-and-media/cymbal-pets/tables/stores/*.avro']
);

您应该会看到确认消息,表明行已加载到每个表中。

5. 创建属性图

加载数据后,您现在可以定义属性图 了。这会告知 BigQuery 哪些表代表节点 (例如客户、商品等实体),哪些表代表 (例如“访问过”“已下单”“有”等关系)。

图表架构

运行以下 DDL 语句:

CREATE OR REPLACE PROPERTY GRAPH cymbal_pets_demo.PetsOrderGraph
NODE TABLES (
  cymbal_pets_demo.customers KEY(customer_id) LABEL Customer,
  cymbal_pets_demo.products KEY(product_id) LABEL Products,
  cymbal_pets_demo.stores KEY(store_id) LABEL Stores,
  cymbal_pets_demo.orders KEY(order_id) LABEL Orders
)
EDGE TABLES (
  cymbal_pets_demo.orders as customer_to_store_edge
    KEY (order_id)
    SOURCE KEY (customer_id) references customers(customer_id)
    DESTINATION KEY (store_id) references stores(store_id)
    LABEL Visited
    PROPERTIES ALL COLUMNS,

  cymbal_pets_demo.order_items
    KEY (order_item_id)
    SOURCE KEY (order_id) references orders(order_id)
    DESTINATION KEY (product_id) references products(product_id)
    LABEL Has
    PROPERTIES ALL COLUMNS,

  cymbal_pets_demo.orders as customer_to_orders_edge
    KEY (order_id)
    SOURCE KEY (customer_id) references customers(customer_id)
    DESTINATION KEY (order_id) references orders(order_id)
    LABEL Placed
    PROPERTIES ALL COLUMNS,

  cymbal_pets_demo.co_related_products_for_angelica
    KEY (angelica_product_id)
    SOURCE KEY (angelica_product_id) references products(product_id)
    DESTINATION KEY (other_product_id) references products(product_id)
    LABEL BoughtTogether
    PROPERTIES ALL COLUMNS
);

这会创建图 PetsOrderGraph,以便我们使用 GRAPH_TABLE 运算符执行图遍历。

6. 直观呈现所有客户的购买历史记录

在 BigQuery Studio 中打开新建笔记本

创建新笔记本

对于此 Codelab 的可视化和推荐部分,我们将在 BigQuery Studio 中使用 Google Colab 笔记本。这样,我们就可以轻松直观呈现图结果。

BigQuery Graph 笔记本以 IPython 魔法命令的形式实现。通过添加带有 TO_JSON 函数的 %%bigquery 魔法命令,您可以直观呈现结果,如以下部分所示。

假设 Cymbal Pets 想要获得所有客户及其在特定时间窗口内所购商品的 360 度可视化视图。

在新单元格中运行以下命令:

%%bigquery --graph

GRAPH cymbal_pets_demo.PetsOrderGraph
  # finds the customer node and then finds all
  # the Orders nodes that are connected to that customer through the
  # Placed relationship
  MATCH (customer:Customer)-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
  # filters the Orders nodes to only include those where the
  # order_date is within the last 3 months.
  WHERE ordr.order_date >= date('2024-11-27')
  # # This line finds all the Products nodes that are connected to the
  # # filtered Orders nodes through the Has relationship.
  MATCH p=(customer:Customer)-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
  LIMIT 40
  RETURN 
    TO_JSON(p) as paths

您应该会看到图结果的可视化表示。

所有客户的购买历史记录

7. 直观呈现 Angelica 的购买历史记录

假设 Cymbal Pets 想要深入了解一位名为 Angelica Russell 的客户。他们想要分析 Angelica 在过去 3 个月内购买的商品以及客户访问过的商店。

%%bigquery --graph

GRAPH cymbal_pets_demo.PetsOrderGraph
  # finds the customer node with the name "Angelica Russell" and then finds all
  # the Orders nodes that are connected to that customer through the
  # Placed relationship and all the Products nodes that are connected to the
  # filtered Orders nodes through the Has relationship.
   MATCH p=(customer:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
  # filters the Orders nodes to only include those where the
  # order_date is within the last 3 months.
  WHERE ordr.order_date >= date('2024-11-27')
  # finds the Stores nodes where Angelica placed order from
  MATCH p2=(customer)-[visited:Visited]->(store:Stores)
  RETURN
    TO_JSON(p) as path, TO_JSON(p2) as path2

Angelica 的购买记录

8. 使用向量搜索进行商品推荐

Cymbal Pets 想要根据 Angelica 最近购买的商品向她推荐商品。我们可以使用向量搜索 查找与她过去购买的商品具有相似嵌入的商品。

在新 Colab 单元格中运行以下 SQL 脚本。此脚本:

  1. 识别 Angelica 最近购买的商品。
  2. 使用 VECTOR_SEARCHproducts 表中查找前 4 个类似商品。

注意:此步骤假定您已运行 AI.GENERATE_EMBEDDINGS 以在商品表中创建嵌入列。

%%bigquery
DECLARE products_bought_by_angelica ARRAY<INT64>;

-- 1. Get IDs of products bought by Angelica
SET products_bought_by_angelica = (
  SELECT ARRAY_AGG(product_id) FROM
   GRAPH_TABLE(
    cymbal_pets_demo.PetsOrderGraph
      MATCH (c:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(o:Orders)
      WHERE o.order_date >= date('2024-11-27')
      MATCH (o)-[has_edge:Has]->(p:Products)
      RETURN DISTINCT p.product_id as product_id
  ));

-- 2. Find similar products using vector search
SELECT 
  query.product_name as AngelicaBought, 
  base.product_name as RecommendedProducts, 
  base.category
FROM
  VECTOR_SEARCH(
    TABLE cymbal_pets_demo.products,
    'embedding',
    (SELECT * FROM cymbal_pets_demo.products
     WHERE product_id IN UNNEST(products_bought_by_angelica)),
    'embedding',
    top_k => 4)
WHERE query.product_name <> base.product_name;

您应该会看到一个推荐商品列表,这些商品在语义上与 Angelica 购买的商品相似。

Vector Search 结果

9. 使用“一起购买”关系进行推荐

另一种强大的推荐技术是“协同过滤” - 推荐其他用户经常一起购买的商品。我们在图中将其建模为 BoughtTogether 边。

为了推荐一起购买的商品,Cymbal Pets 执行了分析型离线图查询,以查找要为 Angelica 购买的每件商品推荐的热门商品。

%%bigquery
CREATE OR REPLACE TABLE cymbal_pets_demo.co_related_products_for_angelica AS
SELECT
    angelica_product_id,
    other_product_id,
    co_purchase_count
FROM (
    SELECT
        angelicaProduct.product_id AS angelica_product_id,
        otherProduct.product_id AS other_product_id,
        count(otherProduct) AS co_purchase_count,
        # ensures that the row numbering is done separately for each angelica_product_id
        ROW_NUMBER() OVER (PARTITION BY angelicaProduct.product_id ORDER BY count(otherProduct) DESC) AS rn
    FROM
        GRAPH_TABLE (cymbal_pets_demo.PetsOrderGraph
          MATCH (angelica:Customer {first_name: 'Angelica', last_name: 'Russell'})-[:Placed]->(o:Orders)-[:Has]->(angelicaProduct:Products)
          WHERE o.order_date >= date('2024-11-27')
          WITH angelica, angelicaProduct
          MATCH (otherCustomer:Customer)-[:Placed]->(otherOrder:Orders)-[:Has]->(angelicaProduct) # Find orders where Angelica's products were bought
          WHERE otherCustomer <> angelica # Exclude Angelica's own orders
          WITH angelicaProduct, otherOrder
          MATCH (otherOrder)-[:HAS]->(otherProduct:Products) # Find other products in those orders
          WHERE angelicaProduct <> otherProduct # Exclude the original product.
          RETURN angelicaProduct, otherProduct, otherOrder
        )
    GROUP BY
        angelicaProduct.product_id, otherProduct.product_id
)
WHERE rn <= 3; # only keep top 3 co-related products

推荐逻辑

运行此查询以推荐通过 BoughtTogether 边直接与 Angelica 的购买商品相关联的商品:

%%bigquery
SELECT * FROM GRAPH_TABLE(
  cymbal_pets_demo.PetsOrderGraph
  MATCH (customer:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(ordr:Orders)
  WHERE ordr.order_date >= date('2024-11-27')
  MATCH (ordr)-[has:Has]->(product:Products)
  MATCH (product)-[bought_together:BoughtTogether]->(recommended_product:Products)
  RETURN 
    product.product_name AS OriginalProduct,
    recommended_product.product_name AS Recommended,
    bought_together.co_purchase_count AS Strength
);

此查询从客户 -> 订单 -> 商品 ->(一起购买)-> 推荐商品进行遍历,根据集体购买行为向您显示推荐。

一起购买

10. 清理

为避免系统向您的 Google Cloud 账号持续收取费用,请删除在此 Codelab 期间创建的资源。

删除数据集和所有表:

DROP SCHEMA IF EXISTS cymbal_pets_demo CASCADE;

如果您为此 Codelab 创建了一个新项目,也可以删除该项目:

gcloud projects delete $PROJECT_ID

11. 恭喜

恭喜!您已使用 BigQuery Graph 成功构建了客户 360 度视图和推荐引擎。

您学到的内容

  • 如何在 BigQuery 中创建属性图。
  • 如何将数据加载到图节点和边。
  • 如何使用 GRAPH_TABLEMATCH 查询图模式。
  • 如何将图查询与向量搜索相结合,以实现混合推荐。

后续步骤