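Build a Customer 360 Recommendation Application with BigQuery Graph

1. Introduction

In this codelab, you learn how to use BigQuery Graph to build a customer 360 view and a recommendation engine for Cymbal Pets, a fictional retailer. Using SQL, you create, query, and analyze graph data directly in BigQuery, then combine it with vector search to deliver more relevant product recommendations.

With BigQuery Graph, you model the relationships between data entities such as customers, products, and orders as a graph, which makes it easier to answer complex questions about customer behavior and product preferences.

Figure: Use case diagram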

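What you'll do

Create a BigQuery dataset and schema for the Cymbal Pets graph
Load sample data (customers, products, orders, stores) from Cloud Storage
Create a property graph in BigQuery that connects these entities
Visualize customer purchase history with graph queries
Build a product recommendation system with vector search
Improve recommendations with the "bought together" graph relationship

What you'll need

A web browser, such as Chrome
A Google Cloud project with billing enabled

This codelab is intended for developers of all levels, including beginners.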

2. Before you begin

Create a Google Cloud project

In the Google Cloud console, select or create a Google Cloud project.
Make sure that billing is enabled for your Cloud project.

Start Cloud Shell

At the top of the Google Cloud console, click Activate Cloud Shell.
Verify authentication:

gcloud auth list

Confirm the active project:

gcloud config get project

If the project is not set, or is not the one you want, set it:

export PROJECT_ID=<YOUR_PROJECT_ID>
gcloud config set project $PROJECT_ID

Enable the API

Run the following command to enable the required BigQuery API:

gcloud services enable bigquery.googleapis.com

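3. Define the schema

First, create a dataset to hold the graph-related tables, then define the schemas for the nodes and edges.

You run SQL commands throughout this codelab. You can run them in the SQL editor in BigQuery Studio, or with the bq query command in Cloud Shell. The BigQuery SQL editor is recommended because it handles multi-line CREATE statements more conveniently.
Create the cymbal_pets_demo dataset: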

CREATE SCHEMA IF NOT EXISTS cymbal_pets_demo;

Create the order_items, products, orders, stores, customers, and co_related_products_for_angelica tables. These tables serve as the source data for the graph.

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.order_items
(
  order_id INT64,
  product_id INT64,
  order_item_id INT64,
  quantity INT64,
  price FLOAT64,
  PRIMARY KEY (order_id, product_id, order_item_id) NOT ENFORCED
)
CLUSTER BY order_item_id;

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.products
(
  product_id INT64,
  product_name STRING,
  brand STRING,
  category STRING,
  subcategory INT64,
  animal_type INT64,
  search_keywords INT64,
  price FLOAT64,
  description STRING,
  inventory_level INT64,
  supplier_id INT64,
  average_rating FLOAT64,
  uri STRING,
  embedding ARRAY<FLOAT64>,
  PRIMARY KEY (product_id) NOT ENFORCED
)
CLUSTER BY product_id;

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.orders
(
  customer_id INT64,
  order_id INT64,
  shipping_address_city STRING,
  store_id INT64,
  order_date DATE,
  order_type STRING,
  payment_method STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
)
PARTITION BY order_date
CLUSTER BY order_id;

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.stores
(
  store_id INT64,
  store_name STRING,
  address_state STRING,
  address_city STRING,
  latitude FLOAT64,
  longitude FLOAT64,
  opening_hours STRUCT<Monday STRING, Tuesday STRING, Wednesday STRING, Thursday STRING, Friday STRING, Saturday STRING, Sunday STRING>,
  manager_id INT64,
  PRIMARY KEY (store_id) NOT ENFORCED
)
CLUSTER BY store_id;

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.customers
(
  customer_id INT64,
  first_name STRING,
  last_name STRING,
  email STRING,
  gender STRING,
  address_city STRING,
  address_state STRING,
  loyalty_member BOOL,
  PRIMARY KEY (customer_id) NOT ENFORCED
)
CLUSTER BY customer_id;

CREATE TABLE IF NOT EXISTS cymbal_pets_demo.co_related_products_for_angelica
(
  angelica_product_id INT64,
  other_product_id INT64,
  co_purchase_count INT64
);

You have now defined the structure of the graph data.

4. Load the data

Next, populate the tables with sample data from Cloud Storage.

Run the following LOAD DATA statements in the BigQuery SQL editor:

LOAD DATA INTO `cymbal_pets_demo.customers`
FROM FILES (
    format = 'AVRO',
    uris = ['gs://sample-data-and-media/cymbal-pets/tables/customers/*.avro']
);

LOAD DATA INTO `cymbal_pets_demo.order_items`
FROM FILES (
    format = 'AVRO',
    uris = ['gs://sample-data-and-media/cymbal-pets/tables/order_items/*.avro']
);

LOAD DATA INTO `cymbal_pets_demo.orders`
FROM FILES (
    format = 'AVRO',
    uris = ['gs://sample-data-and-media/cymbal-pets/tables/orders/*.avro']
);

LOAD DATA INTO `cymbal_pets_demo.products`
FROM FILES (
    format = 'AVRO',
    uris = ['gs://sample-data-and-media/cymbal-pets/tables/products/*.avro']
);

LOAD DATA INTO `cymbal_pets_demo.stores`
FROM FILES (
    format = 'AVRO',
    uris = ['gs://sample-data-and-media/cymbal-pets/tables/stores/*.avro']
);

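You see a confirmation that rows were loaded into each table. The co_related_products_for_angelica table has no load statement and stays empty until section 9 populates it.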

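5. Create the property graph

With the data loaded, you can now define the property graph. The definition tells BigQuery which tables represent nodes (entities such as customers and products) and which tables represent edges (relationships such as Visited, Placed, and Has).

Figure: Graph schema

Run the following DDL statement: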

CREATE OR REPLACE PROPERTY GRAPH cymbal_pets_demo.PetsOrderGraph
NODE TABLES (
  cymbal_pets_demo.customers KEY(customer_id) LABEL Customer,
  cymbal_pets_demo.products KEY(product_id) LABEL Products,
  cymbal_pets_demo.stores KEY(store_id) LABEL Stores,
  cymbal_pets_demo.orders KEY(order_id) LABEL Orders
)
EDGE TABLES (
  cymbal_pets_demo.orders as customer_to_store_edge
    KEY (order_id)
    SOURCE KEY (customer_id) references customers(customer_id)
    DESTINATION KEY (store_id) references stores(store_id)
    LABEL Visited
    PROPERTIES ALL COLUMNS,

  cymbal_pets_demo.order_items
    KEY (order_item_id)
    SOURCE KEY (order_id) references orders(order_id)
    DESTINATION KEY (product_id) references products(product_id)
    LABEL Has
    PROPERTIES ALL COLUMNS,

  cymbal_pets_demo.orders as customer_to_orders_edge
    KEY (order_id)
    SOURCE KEY (customer_id) references customers(customer_id)
    DESTINATION KEY (order_id) references orders(order_id)
    LABEL Placed
    PROPERTIES ALL COLUMNS,

  cymbal_pets_demo.co_related_products_for_angelica
    KEY (angelica_product_id, other_product_id)
    SOURCE KEY (angelica_product_id) references products(product_id)
    DESTINATION KEY (other_product_id) references products(product_id)
    LABEL BoughtTogether
    PROPERTIES ALL COLUMNS
);

This creates the graph PetsOrderGraph, which you can traverse using the GRAPH_TABLE operator.
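To build intuition for what the node and edge definitions mean, the following is a minimal Python sketch, not BigQuery's implementation. It treats a few rows of orders and order_items as edge tables, indexes them by their source key, and follows the Customer-Placed-Orders-Has-Products path. The sample rows and the helper names build_edges and products_bought are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for the orders and order_items tables.
orders = [
    {"order_id": 1, "customer_id": 10, "store_id": 100},
    {"order_id": 2, "customer_id": 10, "store_id": 101},
    {"order_id": 3, "customer_id": 11, "store_id": 100},
]
order_items = [
    {"order_item_id": 1, "order_id": 1, "product_id": 500},
    {"order_item_id": 2, "order_id": 1, "product_id": 501},
    {"order_item_id": 3, "order_id": 2, "product_id": 500},
    {"order_item_id": 4, "order_id": 3, "product_id": 502},
]


def build_edges(rows, source_key, destination_key):
    """Index edge rows by source key, mirroring SOURCE KEY / DESTINATION KEY."""
    adjacency = defaultdict(list)
    for row in rows:
        adjacency[row[source_key]].append(row[destination_key])
    return adjacency


# The same orders table backs two edges, as in the DDL above.
placed = build_edges(orders, "customer_id", "order_id")    # Customer -> Orders
visited = build_edges(orders, "customer_id", "store_id")   # Customer -> Stores
has = build_edges(order_items, "order_id", "product_id")   # Orders -> Products


def products_bought(customer_id):
    """Follow (Customer)-[Placed]->(Orders)-[Has]->(Products)."""
    return sorted({p for o in placed[customer_id] for p in has[o]})


print(products_bought(10))
print(sorted(set(visited[10])))
```

The point of the sketch is that one source table can back several edges: orders supplies both Placed and Visited simply by pairing customer_id with a different destination column.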

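6. Visualize the purchase history of all customers

Open a new notebook in BigQuery Studio.

Figure: Create a new notebook

The visualization and recommendation parts of this codelab use a Google Colab notebook in BigQuery Studio, which makes it easy to visualize graph results.

BigQuery graph notebooks are implemented as IPython magics. When you run a query under the %%bigquery --graph magic command and return paths with the TO_JSON function, the notebook renders the result as a graph, as shown in the following sections.

Suppose Cymbal Pets wants a 360-degree visualization of all customers and their purchases over a specific period.

Run the following in a new cell: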

%%bigquery --graph

GRAPH cymbal_pets_demo.PetsOrderGraph
  # Match every path from a customer, through the Placed edge, to an order,
  # and through the Has edge to the products in that order.
  MATCH p=(customer:Customer)-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
  # Keep only orders placed on or after 2024-11-27, a fixed cutoff that
  # stands in for "the last 3 months" in the sample data.
  WHERE ordr.order_date >= date('2024-11-27')
  LIMIT 40
  RETURN 
    TO_JSON(p) as paths

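The graph result is rendered visually.

Figure: Purchase history of all customers

7. Visualize Angelica's purchase history

Suppose Cymbal Pets wants to take a closer look at a customer named Angelica Russell, analyzing the products she purchased in the last three months and the stores she visited.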

%%bigquery --graph

GRAPH cymbal_pets_demo.PetsOrderGraph
  # Find the customer node for Angelica Russell, the Orders nodes connected to
  # her through the Placed relationship, and the Products nodes connected to
  # those orders through the Has relationship.
  MATCH p=(customer:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
  # Keep only orders placed on or after 2024-11-27, the fixed cutoff used
  # for "the last 3 months".
  WHERE ordr.order_date >= date('2024-11-27')
  # Find the Stores nodes for the stores Angelica ordered from.
  MATCH p2=(customer)-[visited:Visited]->(store:Stores)
  RETURN
    TO_JSON(p) as path, TO_JSON(p2) as path2

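Figure: Angelica's purchase history

8. Recommend products using vector search

Cymbal Pets wants to recommend products based on what Angelica bought recently. Vector search can find products whose embeddings are similar to those of her past purchases.

Run the following SQL script in a new Colab cell. The script does the following:

Identifies the products that Angelica purchased recently.
Uses VECTOR_SEARCH to find the top 4 most similar products in the products table for each of them.

Note: This step assumes that you have already run AI.GENERATE_EMBEDDINGS to populate the embedding column of the products table.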

%%bigquery
DECLARE products_bought_by_angelica ARRAY<INT64>;

-- 1. Get IDs of products bought by Angelica
SET products_bought_by_angelica = (
  SELECT ARRAY_AGG(product_id) FROM
   GRAPH_TABLE(
    cymbal_pets_demo.PetsOrderGraph
      MATCH (c:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(o:Orders)
      WHERE o.order_date >= date('2024-11-27')
      MATCH (o)-[has_edge:Has]->(p:Products)
      RETURN DISTINCT p.product_id as product_id
  ));

-- 2. Find similar products using vector search
SELECT 
  query.product_name as AngelicaBought, 
  base.product_name as RecommendedProducts, 
  base.category
FROM
  VECTOR_SEARCH(
    TABLE cymbal_pets_demo.products,
    'embedding',
    (SELECT * FROM cymbal_pets_demo.products
     WHERE product_id IN UNNEST(products_bought_by_angelica)),
    'embedding',
    top_k => 4)
WHERE query.product_name <> base.product_name;

You see a list of recommended products that are semantically similar to the products Angelica bought.

Figure: Vector search results
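The idea behind the vector search step can be sketched in plain Python: rank every product by the distance between its embedding and the embedding of a purchased product, keep the k nearest, and drop the purchased product itself. This is a conceptual sketch only. The product names and 3-dimensional vectors are made up, and Euclidean distance is an assumption; the distance that the real query uses depends on the VECTOR_SEARCH options or vector index in your project.

```python
import math

# Hypothetical 3-dimensional embeddings; real product embeddings are much longer.
products = {
    "Dog Kibble": [1.0, 0.0, 0.0],
    "Puppy Kibble": [0.9, 0.1, 0.0],
    "Dog Treats": [0.8, 0.3, 0.0],
    "Cat Litter": [0.0, 1.0, 0.0],
    "Fish Flakes": [0.0, 0.0, 1.5],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_similar(query_name, k):
    """Return the k nearest products to query_name, excluding the product itself."""
    query_vec = products[query_name]
    ranked = sorted(products, key=lambda name: euclidean(query_vec, products[name]))
    nearest = ranked[:k]
    # Mirrors WHERE query.product_name <> base.product_name in the SQL script:
    # the query product is its own nearest neighbor, so it is filtered out
    # after the top-k cut and fewer than k rows can remain.
    return [name for name in nearest if name != query_name]


print(top_k_similar("Dog Kibble", 4))
```

Because the filter runs after the top-k cut, asking for 4 neighbors typically yields 3 recommendations per purchased product, which matches the shape of the SQL script above.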

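9. Recommend products using the "bought together" relationship

Another powerful recommendation technique is collaborative filtering: recommending products that other users frequently buy together. In this graph, that relationship is modeled as the BoughtTogether edge.

To recommend products that are bought together, Cymbal Pets ran an offline analytical graph query that finds, for each product Angelica purchased, the most popular products to recommend alongside it.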

%%bigquery
CREATE OR REPLACE TABLE cymbal_pets_demo.co_related_products_for_angelica AS
SELECT
    angelica_product_id,
    other_product_id,
    co_purchase_count
FROM (
    SELECT
        angelicaProduct.product_id AS angelica_product_id,
        otherProduct.product_id AS other_product_id,
        count(otherProduct) AS co_purchase_count,
        # ensures that the row numbering is done separately for each angelica_product_id
        ROW_NUMBER() OVER (PARTITION BY angelicaProduct.product_id ORDER BY count(otherProduct) DESC) AS rn
    FROM
        GRAPH_TABLE (cymbal_pets_demo.PetsOrderGraph
          MATCH (angelica:Customer {first_name: 'Angelica', last_name: 'Russell'})-[:Placed]->(o:Orders)-[:Has]->(angelicaProduct:Products)
          WHERE o.order_date >= date('2024-11-27')
          WITH angelica, angelicaProduct
          MATCH (otherCustomer:Customer)-[:Placed]->(otherOrder:Orders)-[:Has]->(angelicaProduct) # Find orders where Angelica's products were bought
          WHERE otherCustomer <> angelica # Exclude Angelica's own orders
          WITH angelicaProduct, otherOrder
          MATCH (otherOrder)-[:Has]->(otherProduct:Products) # Find other products in those orders
          WHERE angelicaProduct <> otherProduct # Exclude the original product.
          RETURN angelicaProduct, otherProduct, otherOrder
        )
    GROUP BY
        angelicaProduct.product_id, otherProduct.product_id
)
WHERE rn <= 3; # only keep top 3 co-related products
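The counting logic of this query can be sketched in plain Python on a handful of made-up orders. The sketch collects the target customer's products, scans other customers' orders that contain one of them, counts every other product in those orders, and keeps the top 3 per product. The order data and the function name co_purchases are hypothetical. Breaking ties alphabetically is a choice made here for determinism; ROW_NUMBER in the SQL does not guarantee any particular tie order.

```python
from collections import Counter, defaultdict

# Hypothetical orders: (order_id, customer, set of product ids).
orders = [
    (1, "angelica", {"A", "B"}),
    (2, "bob", {"A", "C", "D"}),
    (3, "carol", {"A", "C"}),
    (4, "dave", {"B", "C", "E"}),
    (5, "erin", {"A", "D", "E", "F"}),
]


def co_purchases(target_customer, top_n=3):
    """For each product the target bought, rank products co-purchased by others."""
    # Step 1: products bought by the target customer.
    target_products = set()
    for _, customer, items in orders:
        if customer == target_customer:
            target_products |= items

    # Step 2: in other customers' orders, count products bought alongside each one.
    counts = defaultdict(Counter)
    for _, customer, items in orders:
        if customer == target_customer:
            continue  # Exclude the target's own orders.
        for bought in target_products & items:
            for other in items - {bought}:  # Exclude the original product.
                counts[bought][other] += 1

    # Step 3: keep the top_n per product (count descending, then name ascending).
    return {
        product: sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        for product, counter in sorted(counts.items())
    }


for product, ranked in co_purchases("angelica").items():
    print(product, ranked)
```

Each (product, other product, count) triple in the result corresponds to one row of co_related_products_for_angelica, and therefore to one BoughtTogether edge in the graph.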

10. Clean up

To avoid ongoing charges to your Google Cloud account, delete the resources you created in this codelab.

Delete the dataset and all of its tables:

DROP SCHEMA IF EXISTS cymbal_pets_demo CASCADE;

If you created a new project for this codelab, you can also delete the project:

gcloud projects delete $PROJECT_ID

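11. Wrap-up

Congratulations! You have successfully built a customer 360 view and a recommendation engine using BigQuery Graph.

What you learned

How to create a property graph in BigQuery
How to load data into graph nodes and edges
How to query graph patterns using GRAPH_TABLE and MATCH
How to combine graph queries with vector search to build hybrid recommendations

Next steps

Explore the BigQuery Graph documentation.
Learn more about vector search in BigQuery.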