
Anthos Service Mesh Workshop: Lab Guide

1. Alpha Workshop

Link to the workshop codelab: bit.ly/asm-workshop

2. Overview

Architecture Diagram

This workshop is a hands-on, immersive experience that walks through how to set up globally distributed services on GCP in production. The main technologies used are Google Kubernetes Engine (GKE) for compute and the Istio service mesh for secure connectivity, observability, and advanced traffic shaping. All the practices and tools used in this workshop are the ones you would use in production.

Agenda

Module 0 - Introduction and Platform Setup
Introduction and architecture
Introduction to Service Mesh and Istio/ASM
Lab: Infrastructure Setup: User workflow
Break
QnA
Module 1 - Install, secure and monitor applications with ASM
Repo model: Infrastructure and Kubernetes repos explained
Lab: Deploy sample application
Distributed Services and Observability
Lunch
Lab: Observability with Stackdriver
QnA
Module 2 - DevOps - Canary rollouts, policy/RBAC
Multi-cluster service discovery and security/policy
Lab: Mutual TLS
Canary Deployments
Lab: Canary Deployments
Secure multi-cluster global load balancing
Break
Lab: Authorization Policy
QnA
Module 3 - Infra Ops - Platform upgrades
Distributed Service building blocks
Lab: Infrastructure Scaling
Next steps

Slides

The slides for this workshop can be found at the following link:

ASM Workshop Slides

Prerequisites

The following are required before you proceed with this workshop:

A GCP Organization node
A billing account ID (your user must be Billing Admin on this billing account)
The Organization Administrator IAM role at the Org level for your user

3. Infrastructure Setup - Admin Workflow

Bootstrap workshop script explained

A script called bootstrap_workshop.sh sets up the initial environment for the workshop. You can use it to set up a single environment for yourself, or multiple environments for multiple users if you are delivering this workshop as training.

The bootstrap workshop script requires the following inputs:

Organization name (for example yourcompany.com) - The organization in which you create environments for the workshop.
Billing ID (for example 12345-12345-12345) - The billing ID used to bill all resources used during the workshop.
Workshop number (for example 01) - A two-digit number. It is used when you run multiple workshops on the same day and want to track them separately. Workshop numbers are also used to derive project IDs, so separate workshop numbers make it easier to get unique project IDs each time. In addition to the workshop number, the current date (formatted as YYMMDD) is also used in project IDs. The combination of date and workshop number provides unique project IDs.
Start user number (for example 1) - The number of the first user in the workshop. For example, to create a workshop for 10 users, you might use a start user number of 1 and an end user number of 10.
End user number (for example 10) - The number of the last user in the workshop. For example, to create a workshop for 10 users, you might use a start user number of 1 and an end user number of 10. If you are setting up a single environment (for example for yourself), make the start and end user numbers the same. This creates a single environment.
Admin GCS bucket (for example my-gcs-bucket-name) - A GCS bucket used to store workshop-related information. The cleanup_workshop.sh script uses this information to cleanly delete all resources created by the bootstrap workshop script. Admins creating workshops must have read/write permissions to this bucket.

The bootstrap workshop script uses the values provided above and acts as a wrapper that calls the setup-terraform-admin-project.sh script. The setup-terraform-admin-project.sh script creates the workshop environment for a single user.
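The input rules described above (a two-digit workshop number, and a start user number that does not exceed the end user number) can be checked before calling the bootstrap script. The following is a minimal sketch under those stated rules; the function name validate_workshop_inputs is hypothetical and is not part of the workshop repo, and the bootstrap script may perform its own checks.

```shell
# Hypothetical helper for illustration; it is not part of the workshop repo.
validate_workshop_inputs() {
  workshop_num="$1"; start_user="$2"; end_user="$3"

  # The workshop number must be exactly two digits (for example 01).
  case "$workshop_num" in
    [0-9][0-9]) ;;
    *) echo "workshop number must be two digits" >&2; return 1 ;;
  esac

  # Both user numbers must be plain non-negative integers.
  for n in "$start_user" "$end_user"; do
    case "$n" in
      ''|*[!0-9]*) echo "user numbers must be integers" >&2; return 1 ;;
    esac
  done

  # The end user number must not be smaller than the start user number.
  if [ "$start_user" -gt "$end_user" ]; then
    echo "end user number must be >= start user number" >&2
    return 1
  fi
}

# Example: workshop 01 for users 1 through 10.
validate_workshop_inputs 01 1 10 && echo "inputs look valid"
```

For a single environment, passing the same value for the start and end user numbers satisfies the check.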

Admin permissions required for bootstrapping the workshop

There are two types of users in this workshop. An ADMIN_USER creates and deletes resources for this workshop. A MY_USER performs the steps in the workshop. MY_USER has access only to its own resources. ADMIN_USER has access to all user setups. If you are creating this setup for yourself, ADMIN_USER and MY_USER are the same. If you are an instructor creating this workshop for multiple students, your ADMIN_USER and MY_USER are different.

The following organization-level permissions are required for the ADMIN_USER:

Owner - Project Owner permission on all projects in the organization.
Folder Admin - Ability to create and delete folders in the organization. Every user gets a single folder with all of their resources inside the project.
Organization Administrator
Project Creator - Ability to create projects in the organization.
Project Deleter - Ability to delete projects in the organization.
Project IAM Admin - Ability to create IAM rules in all projects in the organization.

In addition to these, ADMIN_USER must also be the Billing Administrator for the billing ID used for the workshop.

User schema and permissions for performing the workshop

If you plan to create this workshop for users (other than yourself) in your organization, you must follow a specific user naming scheme for MY_USERs. When you run the bootstrap_workshop.sh script, you provide a start and an end user number. These numbers are used to create the following usernames:

user<3 digit user number>@<organization_name>

For example, if you run the bootstrap workshop script with a start user number of 1 and an end user number of 3 in your organization called yourcompany.com, the workshop environments are created for the following users:

user001@yourcompany.com
user002@yourcompany.com
user003@yourcompany.com

These usernames are assigned Project Owner roles for their specific projects created by the setup_terraform_admin_project.sh script. You must adhere to this user naming scheme when using the bootstrap script. Refer to how to add multiple users at once in GSuite.
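The naming scheme user<3 digit user number>@<organization_name> can be reproduced with a short loop, which is handy for preparing the list of accounts to create. This is a sketch for illustration; the function name workshop_usernames is hypothetical and is not part of the workshop repo.

```shell
# Hypothetical helper: print one username per user number in the range.
workshop_usernames() {
  org="$1"; start="$2"; end="$3"
  i="$start"
  while [ "$i" -le "$end" ]; do
    # %03d zero-pads the user number to three digits.
    printf 'user%03d@%s\n' "$i" "$org"
    i=$((i + 1))
  done
}

# Example: start user number 1, end user number 3.
workshop_usernames yourcompany.com 1 3
```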

Tools required for the workshop

This workshop is intended to be bootstrapped from Cloud Shell. The following tools are required for this workshop.

gcloud (version >= 270)
kubectl
sed (works with sed on Cloud Shell/Linux, not Mac OS)
git (make sure it is up to date)
sudo apt update
sudo apt install git
jq
envsubst
kustomize
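Before bootstrapping, it can help to confirm that the tools listed above are on the PATH. The sketch below only checks for presence (it does not verify the gcloud version); the function name missing_tools is hypothetical and not part of the workshop repo.

```shell
# Hypothetical helper: print the name of every tool that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools named in the list above.
missing="$(missing_tools gcloud kubectl sed git jq envsubst kustomize)"
if [ -n "$missing" ]; then
  echo "Missing tools:"
  echo "$missing"
else
  echo "All required tools found."
fi
```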

Set up the workshop for yourself (single-user setup)

Open Cloud Shell and perform all the actions below in Cloud Shell. Click the link below.

Cloud Shell

Verify that you are logged in to gcloud with the intended admin user.

gcloud config list

Create a WORKDIR and clone the workshop repo.

mkdir asm-workshop
cd asm-workshop
export WORKDIR=`pwd`
git clone https://github.com/GoogleCloudPlatform/anthos-service-mesh-workshop.git asm

Define your organization name, billing ID, workshop number, and an admin GCS bucket to use for the workshop. Review the permissions required for setting up the workshop in the sections above.

gcloud organizations list
export ORGANIZATION_NAME=<ORGANIZATION NAME>

gcloud beta billing accounts list
export ADMIN_BILLING_ID=<ADMIN_BILLING ID>

export WORKSHOP_NUMBER=<two digit number for example 01>

export ADMIN_STORAGE_BUCKET=<ADMIN CLOUD STORAGE BUCKET>

Run the bootstrap_workshop.sh script. This script can take a few minutes to complete.

cd asm
./scripts/bootstrap_workshop.sh --org-name ${ORGANIZATION_NAME} --billing-id ${ADMIN_BILLING_ID} --workshop-num ${WORKSHOP_NUMBER} --admin-gcs-bucket ${ADMIN_STORAGE_BUCKET} --set-up-for-admin

After the bootstrap_workshop.sh script completes, a GCP folder is created for each user within the organization. Inside the folder, a terraform admin project is created. The terraform admin project is used to create the rest of the GCP resources required for this workshop. You enable the required APIs in the terraform admin project. You use Cloud Build to apply Terraform plans. You give the Cloud Build service account the proper IAM roles so that it can create resources on GCP. Finally, you configure a remote backend in a Google Cloud Storage (GCS) bucket to store Terraform states for all GCP resources.

To view the Cloud Build jobs in the terraform admin project, you need the terraform admin project ID. It is stored in the vars/vars.sh file under your asm directory. This directory is persisted only if you are setting up the workshop for yourself as the admin.

Source the variables file to set the environment variables.

echo "export WORKDIR=$WORKDIR" >> $WORKDIR/asm/vars/vars.sh
source $WORKDIR/asm/vars/vars.sh

Set up the workshop for multiple users (multi-user setup)

Open Cloud Shell and perform all the actions below in Cloud Shell. Click the link below.

Cloud Shell

Verify that you are logged in to gcloud with the intended admin user.

gcloud config list

Create a WORKDIR and clone the workshop repo.

mkdir asm-workshop
cd asm-workshop
export WORKDIR=`pwd`
git clone https://github.com/GoogleCloudPlatform/anthos-service-mesh-workshop.git asm

Define your organization name, billing ID, workshop number, start and end user numbers, and an admin GCS bucket to use for the workshop. Review the permissions required for setting up the workshop in the sections above.

gcloud organizations list
export ORGANIZATION_NAME=<ORGANIZATION NAME>

gcloud beta billing accounts list
export ADMIN_BILLING_ID=<BILLING ID>

export WORKSHOP_NUMBER=<two digit number for example 01>

export START_USER_NUMBER=<number for example 1>

export END_USER_NUMBER=<number greater or equal to START_USER_NUM>

export ADMIN_STORAGE_BUCKET=<ADMIN CLOUD STORAGE BUCKET>

Run the bootstrap_workshop.sh script. This script can take a few minutes to complete.

cd asm
./scripts/bootstrap_workshop.sh --org-name ${ORGANIZATION_NAME} --billing-id ${ADMIN_BILLING_ID} --workshop-num ${WORKSHOP_NUMBER} --start-user-num ${START_USER_NUMBER} --end-user-num ${END_USER_NUMBER} --admin-gcs-bucket ${ADMIN_STORAGE_BUCKET}

Get the workshop.txt file from the admin GCS bucket to retrieve the terraform project IDs.

export WORKSHOP_ID="$(date '+%y%m%d')-${WORKSHOP_NUMBER}"
gsutil cp gs://${ADMIN_STORAGE_BUCKET}/${ORGANIZATION_NAME}/${WORKSHOP_ID}/workshop.txt .

4. Lab Setup and Prep

Choose your lab path

The labs in this workshop can be performed in one of two ways:

The "easy fast track interactive scripts" way
The "manual copy and paste each instruction" way

The fast track scripts method lets you run a single interactive script for each lab that walks you through the lab by automatically running its commands. The commands are run in batches with concise explanations of each step and what it accomplishes. After each batch, you are prompted to move on to the next batch of commands. This way you can run the labs at your own pace. The fast track scripts are idempotent, meaning that you can run them multiple times and get the same result.

The fast track scripts appear at the top of every lab in a green box.

The copy-and-paste method is the traditional way of copying and pasting individual command blocks, along with explanations of the commands. This method is intended to be run only once. There is no guarantee that re-running commands in this method gives you the same results.

When performing the labs, please pick one of the two methods.

Fast Track Script Setup

Get User Info

This workshop is performed using a temporary user account (or lab account) created by the workshop administrator. The lab account owns all projects in the workshop. The workshop administrator provides the lab account credentials (username and password) to the user performing the workshop. All of the user's projects are prefixed by the username of the lab account; for example, for lab account user001@yourcompany.com, the terraform admin project ID would be user001-200131-01-tf-abcde, and so on for the rest of the projects. Each user must log in with the lab account provided by the workshop administrator and perform the workshop using the lab account.
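Because every project ID starts with the lab account's username, the prefix can be derived from the account email and used to recognise your own projects. The sketch below assumes only the prefix convention described above; the function names lab_user_prefix and is_lab_project are hypothetical and not part of the workshop repo.

```shell
# Hypothetical helpers for illustration only.
lab_user_prefix() {
  # Strip everything from the first "@" onward:
  # user001@yourcompany.com -> user001
  printf '%s\n' "${1%%@*}"
}

is_lab_project() {
  # Succeeds when project ID $2 starts with the prefix of lab account $1.
  case "$2" in
    "$(lab_user_prefix "$1")"-*) return 0 ;;
    *) return 1 ;;
  esac
}

lab_user_prefix "user001@yourcompany.com"
is_lab_project "user001@yourcompany.com" "user001-200131-01-tf-abcde" \
  && echo "project belongs to this lab account"
```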

Open Cloud Shell by clicking the link below.

Cloud Shell

Log in with the lab account credentials (do not log in with your corporate or personal account). The lab account looks like userXYZ@<workshop_domain>.com.
Since this is a new account, you are prompted to accept the Google Terms of Service. Click Accept.

On the next screen, select the checkbox to agree to the Google Terms of Service and click Start Cloud Shell.

This step provisions a small Linux Debian VM for you to use to access GCP resources. Each account gets a Cloud Shell VM. Logging in with the lab account provisions the VM and logs you in using the lab account credentials. In addition to Cloud Shell, a code editor is also provisioned, making it easier to edit configuration files (terraform, YAML, and so on). By default, the Cloud Shell screen is split into the Cloud Shell shell environment (at the bottom) and the Cloud Code Editor (at the top). The pencil and shell prompt icons in the top right corner let you toggle between the two (shell and code editor). You can also drag the middle separator bar (up or down) to change the size of each window manually.

Create a WORKDIR for this workshop. The WORKDIR is a folder from which you perform all the labs for this workshop. Run the following commands in Cloud Shell to create the WORKDIR.

mkdir -p ${HOME}/asm-workshop
cd ${HOME}/asm-workshop
export WORKDIR=`pwd`

Export the lab account user as a variable to use for this workshop. This is the same account you logged in to Cloud Shell with.

export MY_USER=<LAB ACCOUNT EMAIL PROVIDED BY THE WORKSHOP ADMIN>
# For example export MY_USER=user001@gcpworkshops.com

Echo the WORKDIR and MY_USER variables by running the following commands to make sure both are set correctly.

echo "WORKDIR set to ${WORKDIR}" && echo "MY_USER set to ${MY_USER}"

Clone the workshop repo.

git clone https://github.com/GoogleCloudPlatform/anthos-service-mesh-workshop.git ${WORKDIR}/asm

5. Infrastructure Setup - User Workflow

Objective: Verify infrastructure and Istio installation

Install workshop tools
Clone workshop repo
Verify Infrastructure install
Verify k8s-repo install
Verify Istio installation

Copy-and-Paste Method Lab Instructions

Get User Info

The administrator setting up the workshop needs to provide the username and password to the user. All of the user's projects are prefixed by the username; for example, for user user001@yourcompany.com, the terraform admin project ID would be user001-200131-01-tf-abcde, and so on for the rest of the projects. Each user only has access to their own workshop environment.

Tools required for the workshop

gcloud (version >= 270)
kubectl
sed (works with sed on Cloud Shell/Linux, not Mac OS)
git (make sure it is up to date)
sudo apt update
sudo apt install git
jq
envsubst
kustomize
pv

Accessing the terraform admin project

After the bootstrap_workshop.sh script completes, a GCP folder is created for each user within the organization. Inside the folder, a terraform admin project is created. The terraform admin project is used to create the rest of the GCP resources required for this workshop. The setup-terraform-admin-project.sh script enables the required APIs in the terraform admin project. Cloud Build is used to apply Terraform plans. Through the script, the Cloud Build service account is given the proper IAM roles so that it can create resources on GCP. Finally, a remote backend is configured in a Google Cloud Storage (GCS) bucket to store Terraform states for all GCP resources.

To view the Cloud Build jobs in the terraform admin project, you need the terraform admin project ID. It is stored in the admin GCS bucket that was specified in the bootstrap script. If you run the bootstrap script for multiple users, all terraform admin project IDs are in the GCS bucket.

Open Cloud Shell (if it is not already open from the Lab Setup and Prep section) by clicking the link below.

Cloud Shell

Install kustomize (if not already installed) in the $HOME/bin folder and add the $HOME/bin folder to $PATH.

mkdir -p $HOME/bin
cd $HOME/bin
curl -s "https://raw.githubusercontent.com/\
kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"  | bash
cd $HOME
export PATH=$PATH:${HOME}/bin
echo "export PATH=\$PATH:\$HOME/bin" >> $HOME/.bashrc

Install pv and move it to $HOME/bin/pv.

sudo apt-get update && sudo apt-get -y install pv
sudo mv /usr/bin/pv ${HOME}/bin/pv

Update your bash prompt.

cp $WORKDIR/asm/scripts/krompt.bash $HOME/.krompt.bash
echo "export PATH=\$PATH:\$HOME/bin" >> $HOME/.asm-workshop.bash
echo "source $HOME/.krompt.bash" >> $HOME/.asm-workshop.bash

echo "alias asm-init='source \$HOME/.asm-workshop.bash'" >> $HOME/.bashrc
echo "source $HOME/.asm-workshop.bash" >> $HOME/.bashrc
source $HOME/.bashrc

Verify that you are logged in to gcloud with the intended user account.

echo "Check logged in user output from the next command is $MY_USER"
gcloud config list account --format=json | jq -r .core.account

Get your Terraform admin project ID by running the following command:

export TF_ADMIN=$(gcloud projects list | grep tf- | awk '{ print $1 }')
echo $TF_ADMIN
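The grep tf- filter above assumes that exactly one visible project contains tf- in its listing. If the account can see none, or more than one, TF_ADMIN ends up empty or holds several IDs. The following is a sketch of a stricter selection; the function name pick_tf_admin_project is hypothetical, and the commented usage line relies on the gcloud output format value(projectId).

```shell
# Hypothetical helper: read project IDs on stdin (one per line) and print
# the single ID containing "tf-"; fail if there is not exactly one match.
pick_tf_admin_project() {
  matches="$(grep 'tf-' || true)"
  count="$(printf '%s' "$matches" | grep -c . || true)"
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one tf- project, found $count" >&2
    return 1
  fi
  printf '%s\n' "$matches"
}

# Usage in the lab environment (requires gcloud):
#   export TF_ADMIN="$(gcloud projects list --format='value(projectId)' | pick_tf_admin_project)"

# Offline example with made-up project IDs following the workshop's naming.
printf 'user001-200131-01-tf-abcde\nuser001-200131-01-ops-abcde\n' | pick_tf_admin_project
```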

All resources associated with the workshop are stored as variables in a vars.sh file, which is kept in a GCS bucket in the terraform admin project. Get the vars.sh file for your terraform admin project.

mkdir $WORKDIR/asm/vars
gsutil cp gs://$TF_ADMIN/vars/vars.sh $WORKDIR/asm/vars/vars.sh
echo "export WORKDIR=$WORKDIR" >> $WORKDIR/asm/vars/vars.sh

Click the displayed link to open the Cloud Build page for the Terraform admin project and verify that the build completed successfully.

source $WORKDIR/asm/vars/vars.sh
echo "https://console.cloud.google.com/cloud-build/builds?project=${TF_ADMIN}"

If you are accessing the Cloud Console for the first time, agree to the Google Terms of Service.

Now that you are looking at the Cloud Build page, click the History link in the left-hand navigation and click the latest build to view the details of the initial Terraform apply. The following resources are created as part of the Terraform script. You can also refer to the architecture diagram above.

4 GCP projects in the organization. The provided billing account is associated with each project.
One project is the network host project for the shared VPC. No other resources are created in this project.
One project is the ops project, used for the Istio control plane GKE clusters.
Two projects represent two different development teams working on their respective services.
Two GKE clusters are created in each of the three ops, dev1 and dev2 projects.
A CSR repo named k8s-repo is created, containing six folders for Kubernetes manifest files, one folder per GKE cluster. This repo is used to deploy Kubernetes manifests to the clusters in a GitOps fashion.
A Cloud Build trigger is created so that any time there is a commit to the main branch of k8s-repo, it deploys the Kubernetes manifests to the GKE clusters from their respective folders.

Once the build completes in the terraform admin project, another build starts in the ops project. Click the displayed link to open the Cloud Build page for the ops project and verify that the k8s-repo Cloud Build finished successfully.

echo "https://console.cloud.google.com/cloud-build/builds?project=${TF_VAR_ops_project_name}"

Verify installation

Create kubeconfig files for all clusters. Run the following script.

$WORKDIR/asm/scripts/setup-gke-vars-kubeconfig.sh

This script creates a new kubeconfig file called kubemesh in the gke folder.

Change the KUBECONFIG variable to point to the new kubeconfig file.

source $WORKDIR/asm/vars/vars.sh
export KUBECONFIG=$WORKDIR/asm/gke/kubemesh

Add the vars.sh and KUBECONFIG variables to the .bashrc file in Cloud Shell so that they are sourced every time Cloud Shell restarts.

echo "source ${WORKDIR}/asm/vars/vars.sh" >> $HOME/.bashrc
echo "export KUBECONFIG=${WORKDIR}/asm/gke/kubemesh" >> $HOME/.bashrc
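Running the two echo commands above more than once appends duplicate lines to .bashrc. If you expect to repeat the step, an append that first checks for the exact line keeps the file clean. This is a sketch of that idea; the function name append_once is hypothetical and is not part of the workshop repo.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
  line="$1"; file="$2"
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage in the lab environment:
#   append_once "source ${WORKDIR}/asm/vars/vars.sh" "$HOME/.bashrc"
#   append_once "export KUBECONFIG=${WORKDIR}/asm/gke/kubemesh" "$HOME/.bashrc"
```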

List your cluster contexts. You should see six clusters.

kubectl config view -ojson | jq -r '.clusters[].name'

    `Output (do not copy)`

gke_tf05-01-ops_us-central1_gke-asm-2-r2-prod
gke_tf05-01-ops_us-west1_gke-asm-1-r1-prod
gke_tf05-02-dev1_us-west1-a_gke-1-apps-r1a-prod
gke_tf05-02-dev1_us-west1-b_gke-2-apps-r1b-prod
gke_tf05-03-dev2_us-central1-a_gke-3-apps-r2a-prod
gke_tf05-03-dev2_us-central1-b_gke-4-apps-r2b-prod
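Rather than counting the cluster names by eye, the expected count of six can be asserted in the shell. The sketch below counts non-empty lines on standard input; the function name expect_line_count is hypothetical and not part of the workshop repo.

```shell
# Hypothetical helper: succeed only if stdin has the expected number of
# non-empty lines.
expect_line_count() {
  expected="$1"
  actual="$(grep -c . || true)"
  if [ "$actual" -ne "$expected" ]; then
    echo "expected $expected entries, found $actual" >&2
    return 1
  fi
}

# Usage in the lab environment (requires kubectl and jq):
#   kubectl config view -ojson | jq -r '.clusters[].name' | expect_line_count 6

# Offline example with two placeholder names.
printf 'cluster-a\ncluster-b\n' | expect_line_count 2 && echo "count matches"
```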

Verify Istio Installation

Ensure Istio is installed on both ops clusters by checking that all pods are running and jobs have completed.

kubectl --context ${OPS_GKE_1} get pods -n istio-system
kubectl --context ${OPS_GKE_2} get pods -n istio-system

    `Output (do not copy)`

NAME                                      READY   STATUS    RESTARTS   AGE
grafana-5f798469fd-z9f98                  1/1     Running   0          6m21s
istio-citadel-568747d88-qdw64             1/1     Running   0          6m26s
istio-egressgateway-8f454cf58-ckw7n       1/1     Running   0          6m25s
istio-galley-6b9495645d-m996v             2/2     Running   0          6m25s
istio-ingressgateway-5df799fdbd-8nqhj     1/1     Running   0          2m57s
istio-pilot-67fd786f65-nwmcb              2/2     Running   0          6m24s
istio-policy-74cf89cb66-4wrpl             2/2     Running   1          6m25s
istio-sidecar-injector-759bf6b4bc-mw4vf   1/1     Running   0          6m25s
istio-telemetry-77b6dfb4ff-zqxzz          2/2     Running   1          6m24s
istio-tracing-cd67ddf8-n4d7k              1/1     Running   0          6m25s
istiocoredns-5f7546c6f4-g7b5c             2/2     Running   0          6m39s
kiali-7964898d8c-5twln                    1/1     Running   0          6m23s
prometheus-586d4445c7-xhn8d               1/1     Running   0          6m25s

    `Output (do not copy)`

NAME                                      READY   STATUS    RESTARTS   AGE
grafana-5f798469fd-2s8k4                  1/1     Running   0          59m
istio-citadel-568747d88-87kdj             1/1     Running   0          59m
istio-egressgateway-8f454cf58-zj9fs       1/1     Running   0          60m
istio-galley-6b9495645d-qfdr6             2/2     Running   0          59m
istio-ingressgateway-5df799fdbd-2c9rc     1/1     Running   0          60m
istio-pilot-67fd786f65-nzhx4              2/2     Running   0          59m
istio-policy-74cf89cb66-4bc7f             2/2     Running   3          59m
istio-sidecar-injector-759bf6b4bc-grk24   1/1     Running   0          59m
istio-telemetry-77b6dfb4ff-6zr94          2/2     Running   4          60m
istio-tracing-cd67ddf8-grs9g              1/1     Running   0          60m
istiocoredns-5f7546c6f4-gxd66             2/2     Running   0          60m
kiali-7964898d8c-nhn52                    1/1     Running   0          59m
prometheus-586d4445c7-xr44v               1/1     Running   0          59m

Ensure Istio is installed on both dev1 clusters. Only Citadel, sidecar-injector and coredns run in the dev1 clusters. They share an Istio control plane running in the ops-1 cluster.

kubectl --context ${DEV1_GKE_1} get pods -n istio-system
kubectl --context ${DEV1_GKE_2} get pods -n istio-system

Ensure Istio is installed on both dev2 clusters. Only Citadel, sidecar-injector and coredns run in the dev2 clusters. They share an Istio control plane running in the ops-2 cluster.

kubectl --context ${DEV2_GKE_1} get pods -n istio-system
kubectl --context ${DEV2_GKE_2} get pods -n istio-system

    `Output (do not copy)`

NAME                                      READY   STATUS    RESTARTS   AGE
istio-citadel-568747d88-4lj9b             1/1     Running   0          66s
istio-sidecar-injector-759bf6b4bc-ks5br   1/1     Running   0          66s
istiocoredns-5f7546c6f4-qbsqm             2/2     Running   0          78s
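The pod listings above can also be checked mechanically. The sketch below reads the default kubectl get pods table and prints the name of any pod that is neither Completed nor Running with all of its containers ready; an empty result means the check passed. The function name pods_not_ready is hypothetical, and the column positions assume the default output format shown above.

```shell
# Hypothetical helper: print pods that are not fully ready.
pods_not_ready() {
  awk '
    NF == 0           { next }   # skip blank lines
    $1 == "NAME"      { next }   # skip header lines
    $3 == "Completed" { next }   # finished jobs are fine
    {
      # READY looks like 1/1 or 2/2: ready containers / total containers.
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}

# Usage in the lab environment (requires kubectl):
#   kubectl --context ${DEV1_GKE_1} get pods -n istio-system | pods_not_ready

# Offline example with made-up pod rows.
printf 'NAME READY STATUS RESTARTS AGE\nok-pod 1/1 Running 0 5m\nbad-pod 1/2 Running 0 5m\n' | pods_not_ready
```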

Verify service discovery for shared control planes

Optionally, verify that the secrets are deployed.

kubectl --context ${OPS_GKE_1} get secrets -l istio/multiCluster=true -n istio-system
kubectl --context ${OPS_GKE_2} get secrets -l istio/multiCluster=true -n istio-system

    `Output (do not copy)`

For OPS_GKE_1:
NAME                  TYPE     DATA   AGE
gke-1-apps-r1a-prod   Opaque   1      8m7s
gke-2-apps-r1b-prod   Opaque   1      8m7s
gke-3-apps-r2a-prod   Opaque   1      44s
gke-4-apps-r2b-prod   Opaque   1      43s

For OPS_GKE_2:
NAME                  TYPE     DATA   AGE
gke-1-apps-r1a-prod   Opaque   1      40s
gke-2-apps-r1b-prod   Opaque   1      40s
gke-3-apps-r2a-prod   Opaque   1      8m4s
gke-4-apps-r2b-prod   Opaque   1      8m4s

In this workshop, you use a single shared VPC in which all GKE clusters are created. To discover services across clusters, you use the kubeconfig files (one for each of the application clusters) created as secrets in the ops clusters. Pilot uses these secrets to discover services by querying the Kube API server of the application clusters (authenticated via the secrets above). You can see that both ops clusters can authenticate to all app clusters using the kubeconfig-created secrets. The ops clusters can discover services automatically using the kubeconfig files as a secret method. This requires that Pilot in the ops clusters has access to the Kube API server of all other clusters. If Pilot cannot reach the Kube API servers, you manually add remote services as ServiceEntries. You can think of ServiceEntries as DNS entries in your service registry. A ServiceEntry defines a service using a fully qualified DNS name (FQDN) and an IP address at which it can be reached. See the Istio Multicluster docs for more information.
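To make the ServiceEntry fallback more concrete, the sketch below writes an illustrative manifest to a temporary file. It is not taken from the workshop repo: the resource name, host name and IP addresses are all hypothetical placeholders, and the fields follow the networking.istio.io/v1alpha3 ServiceEntry schema. The workshop itself relies on the kubeconfig secrets, so this manifest is not needed for the labs.

```shell
# Write an illustrative ServiceEntry manifest (all names and IPs are
# hypothetical placeholders, not values from the workshop).
manifest="${TMPDIR:-/tmp}/remote-service-entry.yaml"

cat > "$manifest" <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: remote-frontend
spec:
  hosts:
  - frontend.remote.example.internal   # FQDN the service is known by
  location: MESH_INTERNAL
  resolution: STATIC
  ports:
  - number: 80
    name: http
    protocol: HTTP
  endpoints:
  - address: 10.0.0.5                   # IP at which the service is reachable
    ports:
      http: 80
EOF

echo "wrote $manifest"
# To apply it in a lab cluster (requires kubectl):
#   kubectl --context ${OPS_GKE_1} apply -f "$manifest"
```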

6. Infrastructure Repo Explained

Infrastructure Cloud Build

The GCP resources for the workshop are built using Cloud Build and an infrastructure CSR repo. You just ran a bootstrap script (located at scripts/bootstrap_workshop.sh) from your local terminal. The bootstrap script creates a GCP folder, a terraform admin project, and the appropriate IAM permissions for the Cloud Build service account. The Terraform admin project is used to store terraform states, logs, and miscellaneous scripts. It contains the infrastructure and k8s_repo CSR repos. These repos are explained in detail in the next section. No other workshop resources are built in the terraform admin project. The Cloud Build service account in the terraform admin project is used to build resources for the workshop.

A cloudbuild.yaml file located in the infrastructure folder is used to build GCP resources for the workshop. It creates a custom builder image with all the tools required to create GCP resources. These tools include the gcloud SDK, terraform, and other utilities such as python, git, and jq. The custom builder image runs terraform plan and apply for each resource. Each resource's terraform files are located in separate folders (details in the next section). The resources are built one at a time, in the order in which they would typically be built (for example, a GCP project is built before resources are created in the project). Please review the cloudbuild.yaml file for more details.

Cloud Build is triggered any time there is a commit to the infrastructure repo. Any change made to the infrastructure is stored as Infrastructure as Code (IaC) and committed to the repo. The state of your workshop is always stored in this repo.

Folder structure - teams, environments and resources

The infrastructure repo sets up the GCP infrastructure resources for the workshop. It is structured into folders and subfolders. The base folders within the repo represent the team that owns specific GCP resources. The next layer of folders represents the specific environment for the team (for example dev, stage, prod). The next layer of folders within the environment represents the specific resource (for example host_project, gke_clusters, and so on). The required scripts and terraform files exist within the resource folders.

The following four types of teams are represented in this workshop:

infrastructure - Represents the cloud infrastructure team. They are responsible for creating the GCP resources for all other teams. They use the Terraform admin project for their resources. The infrastructure repo itself is in the Terraform admin project, as are the Terraform state files (explained below). These resources are created by a bash script during the bootstrap process (see Module 0 - Administrator Workflow for details).
network - Represents the networking team. They are responsible for VPC and networking resources. They own the following GCP resources.
    host project - Represents the shared VPC host project.
    shared VPC - Represents the shared VPC, subnets, secondary IP ranges, routes and firewall rules.
ops - Represents the operations/devops team. They own the following resources.
    ops project - Represents a project for all ops resources.
    gke clusters - An ops GKE cluster per region. The Istio control plane is installed in each of the ops GKE clusters.
    k8s-repo - A CSR repo that contains GKE manifests for all GKE clusters.
apps - Represents the application teams. This workshop simulates two teams, app1 and app2. They own the following resources.
    app projects - Every app team gets its own set of projects. This lets them control billing and IAM for their specific project.
    gke clusters - These are the application clusters where the application containers/Pods run.
    gce instances - Optionally, if they have applications that run on GCE instances. In this workshop, app1 has a couple of GCE instances where part of the application runs.

In this workshop, the same app (the Hipster shop app) represents both app1 and app2.

Provider, states and outputs - backends and shared states

The google and google-beta providers are located at gcp/[environment]/gcp/provider.tf. The provider.tf file is symlinked in every resource folder. This lets you change the provider in one place instead of managing providers individually for each resource.

Every resource contains a backend.tf file that defines the location of the resource's tfstate file. This backend.tf file is generated from a template (located at templates/backend.tf_tmpl) using a script (located at scripts/setup_terraform_admin_project) and then placed in the respective resource folder. Google Cloud Storage (GCS) buckets are used for backends. The GCS bucket folder name matches the resource name. All resource backends reside in the terraform admin project.

Resources with interdependent values contain an output.tf file. The required output values are stored in the tfstate file defined in the backend for that particular resource. For example, to create a GKE cluster in a project, you need to know the project ID. The project ID is output via output.tf to the tfstate file, which can be used via a terraform_remote_state data source in the GKE cluster resource.

The shared_state file is a terraform_remote_state data source pointing to a resource's tfstate file. A shared_state_[resource_name].tf file (or files) exists in the resource folders that require outputs from other resources. For example, in the ops_gke resource folder, there are shared_state files from the ops_project and shared_vpc resources, because you need the project ID and VPC details to create GKE clusters in the ops project. The shared_state files are generated from a template (located at templates/shared_state.tf_tmpl) using a script (located at scripts/setup_terraform_admin_project). All resources' shared_state files are placed in the gcp/[environment]/shared_states folder. The required shared_state files are symlinked in the respective resource folders. Placing all shared_state files in one folder and symlinking them into the appropriate resource folders makes it easy to manage all state files in a single place.

Variables

All resource values are stored as environment variables. These variables are stored (as export statements) in a file called vars.sh located in a GCS bucket in the terraform admin project. This file contains the organization ID, billing account, project IDs, GKE cluster details, and so on. You can download and source vars.sh from any terminal to get the values for your setup.

Terraform variables are stored in vars.sh as TF_VAR_[variable name]. These variables are used to generate a variables.tfvars file in the respective resource folder. The variables.tfvars file contains all of the variables with their values. It is generated from a template file in the same folder using a script (located at scripts/setup_terraform_admin_project).
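To illustrate the relationship between TF_VAR_ environment variables and tfvars entries, the sketch below prints every TF_VAR_[variable name] in the current environment as a name = "value" line. This is a different technique from the repo's script, which renders a template file; the function name tf_vars_to_tfvars and the example value demo-ops are hypothetical, and values containing newlines or double quotes are not handled.

```shell
# Hypothetical helper: print TF_VAR_* environment variables in tfvars form.
tf_vars_to_tfvars() {
  env | grep '^TF_VAR_' | sort | while IFS='=' read -r name value; do
    # Drop the TF_VAR_ prefix; everything after the first "=" is the value.
    printf '%s = "%s"\n' "${name#TF_VAR_}" "$value"
  done
}

# Example with a made-up value for a variable name used in this workshop.
export TF_VAR_ops_project_name="demo-ops"
tf_vars_to_tfvars
```

After sourcing vars.sh, the same function lists the Terraform variables that the workshop setup defined.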

K8s repo explained

k8s_repo is a CSR repo (separate from the infrastructure repo) located in the ops project. It is used to store and apply GKE manifests to all GKE clusters. k8s_repo is created by the infrastructure Cloud Build (refer to the previous section for details). During the initial infrastructure Cloud Build process, a total of six GKE clusters are created. In k8s_repo , six folders are created. Each folder (named after its GKE cluster) corresponds to one GKE cluster and contains that cluster's resource manifest files. As with the infrastructure build, Cloud Build is used to apply the Kubernetes manifests to all GKE clusters using k8s_repo . Cloud Build is triggered any time there is a commit to the k8s_repo repo. As with infrastructure, all Kubernetes manifests are stored as code in the k8s_repo repository, and the state of each GKE cluster is always stored in its respective folder.

As part of the initial infrastructure build, k8s_repo is created and Istio is installed on all clusters.

Projects, GKE clusters and namespaces

The resources in this workshop are divided into different GCP projects. Projects should match your company's organizational (or team) structure. Teams (in your organization) responsible for different projects/products/resources use different GCP projects. Having separate projects allows you to create separate sets of IAM permissions and manage billing at the project level. Additionally, quotas are also managed at the project level.

Five teams are represented in this workshop, each with its own project.

The infrastructure team that builds GCP resources uses the Terraform admin project . They manage infrastructure as code in a CSR repo (called infrastructure ) and store all Terraform state information for resources built in GCP in GCS buckets. They control access to the CSR repo and the Terraform state GCS buckets.
The network team that builds the shared VPC uses the host project . This project contains the VPC, subnets, routes and firewall rules. Having a shared VPC allows them to centrally manage networking for GCP resources. All projects use this shared VPC for networking.
The ops/platform team that builds GKE clusters and ASM/Istio control planes uses the ops project . They manage the lifecycle of the GKE clusters and the service mesh. They are responsible for hardening the clusters and for managing the resiliency and scale of the Kubernetes platform. In this workshop, you use the gitops method to deploy resources to Kubernetes. A CSR repo (called k8s_repo ) exists in the ops project.
Lastly, the dev1 and dev2 teams (representing two development teams) that build applications use their own dev1 and dev2 projects . These are the applications and services you provide to your customers. They are built on the platform that the ops team manages. The resources (Deployments, Services etc) are pushed to the k8s_repo and get deployed to the appropriate clusters. It is important to note that this workshop does not focus on CI/CD best practices and tooling. You use Cloud Build to automate deploying Kubernetes resources to the GKE clusters directly. In real world production scenarios, you would use a proper CI/CD solution to deploy applications to GKE clusters.

There are two types of GKE clusters in this workshop.

Ops clusters - used by the ops team to run devops tools. In this workshop, they run the ASM/Istio control plane to manage the service mesh.
Application (apps) clusters - used by the development teams to run applications. In this workshop, the Hipster shop app is used.

Separating the ops/admin tooling from the clusters running the application allows you to manage the life cycle of each resource independently. The two types of clusters also live in different projects, each belonging to the team/product that uses them, which also makes IAM permissions easier to manage.

There are a total of six GKE clusters. Two regional ops clusters are created in the ops project. ASM/Istio control plane is installed on both ops clusters. Each ops cluster is in a different region. In addition, there are four zonal application clusters. These are created in their own projects. This workshop simulates two development teams each with their own projects. Each project contains two app clusters. App clusters are zonal clusters in different zones. The four app clusters are located in two regions and four zones. This way you get regional and zonal redundancy.

The application used in this workshop, the Hipster Shop app, is deployed on all four app clusters. Each microservice lives in its own namespace in every app cluster. Hipster shop app Deployments (Pods) are not deployed on the ops clusters. However, the namespaces and Service resources for all microservices are also created in the ops clusters. ASM/Istio control plane uses the Kubernetes service registries for service discovery. In the absence of Services (in the ops clusters), you would have to manually create ServiceEntries for each service running in the app cluster.

You deploy a 10-tier microservices application in this workshop. The application is a web-based e-commerce app called " Hipster Shop " where users can browse items, add them to the cart, and purchase them.

Kubernetes manifests and k8s_repo

You use the k8s_repo to add Kubernetes resources to all GKE clusters. You do this by copying Kubernetes manifests and committing to the k8s_repo . All commits to the k8s_repo trigger a Cloud Build job which deploys the Kubernetes manifests to the respective cluster. Each cluster's manifest is located in a separate folder named the same as the cluster name.

The six cluster names are:

gke-asm-1-r1-prod - the regional ops cluster in region 1
gke-asm-2-r2-prod - the regional ops cluster in region 2
gke-1-apps-r1a-prod - the app cluster in region 1 zone a
gke-2-apps-r1b-prod - the app cluster in region 1 zone b
gke-3-apps-r2a-prod - the app cluster in region 2 zone a
gke-4-apps-r2b-prod - the app cluster in region 2 zone b

The k8s_repo has folders corresponding to these clusters. Any manifest placed in one of these folders gets applied to the corresponding GKE cluster. Manifests for each cluster are placed in sub-folders (within the cluster's main folder) for ease of management. In this workshop, you use Kustomize to keep track of resources that get deployed. Please refer to the Kustomize official documentation for more details.

7. Deploy the Sample App

Objective: Deploy Hipster shop app on apps clusters

Clone k8s-repo
Copy Hipster shop manifests to all apps clusters
Create Services for Hipster shop app in the ops clusters
Set up loadgenerators in the ops clusters to test global connectivity
Verify secure connectivity to the Hipster shop app

Copy-and-Paste Method Lab Instructions

Clone the ops project source repo

As part of the initial Terraform infrastructure build, the k8s-repo is already created in the ops project.

Create an empty directory for git repo:

mkdir $WORKDIR/k8s-repo

Initialize the git repo and add the remote:

cd $WORKDIR/k8s-repo
git init && git remote add origin \
https://source.developers.google.com/p/$TF_VAR_ops_project_name/r/k8s-repo

Set the local git configuration and pull master from the remote repo.

git config --local user.email $MY_USER
git config --local user.name "K8s repo user"
git config --local \
credential.'https://source.developers.google.com'.helper gcloud.sh
git pull origin master

Copy manifests, commit and push

Copy the Hipster Shop namespaces and services to the source repo for all clusters.

cp -r $WORKDIR/asm/k8s_manifests/prod/app/namespaces \
$WORKDIR/k8s-repo/$DEV1_GKE_1_CLUSTER/app/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app/namespaces \
$WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/app/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app/namespaces \
$WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app/namespaces \
$WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/app/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app/namespaces \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app/namespaces \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app/.

cp -r $WORKDIR/asm/k8s_manifests/prod/app/services \
$WORKDIR/k8s-repo/$DEV1_GKE_1_CLUSTER/app/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app/services \
$WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/app/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app/services \
$WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app/services \
$WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/app/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app/services \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app/services \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app/.
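The twelve copy commands above follow one pattern, so they can also be written as a loop. The sketch below wraps that loop in a function; copy_to_clusters and its argument order are names made up for this example, and it is demonstrated against a scratch directory with placeholder cluster names rather than the real repo.

```shell
# Sketch only: copy each listed manifest directory into <repo>/<cluster>/app/.
# Usage: copy_to_clusters SRC_APP_DIR REPO_DIR "dir1 dir2" cluster1 cluster2 ...
copy_to_clusters() {
  src=$1; repo=$2; dirs=$3
  shift 3
  for cluster in "$@"; do
    for d in $dirs; do
      cp -r "$src/$d" "$repo/$cluster/app/."
    done
  done
}

# Demonstration in a throwaway directory with two placeholder clusters.
demo=$(mktemp -d)
mkdir -p "$demo/src/namespaces" "$demo/src/services" \
  "$demo/repo/cluster-a/app" "$demo/repo/cluster-b/app"
touch "$demo/src/namespaces/ns.yaml" "$demo/src/services/svc.yaml"
copy_to_clusters "$demo/src" "$demo/repo" "namespaces services" cluster-a cluster-b
find "$demo/repo" -type f | sort

# In the lab, the equivalent call would look like:
#   copy_to_clusters $WORKDIR/asm/k8s_manifests/prod/app $WORKDIR/k8s-repo \
#     "namespaces services" $DEV1_GKE_1_CLUSTER $DEV1_GKE_2_CLUSTER \
#     $DEV2_GKE_1_CLUSTER $DEV2_GKE_2_CLUSTER $OPS_GKE_1_CLUSTER $OPS_GKE_2_CLUSTER
```

The explicit commands in the lab are easier to audit line by line; the loop is mainly useful if you repeat the step often.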

Copy the app folder kustomization.yaml to all clusters.

cp $WORKDIR/asm/k8s_manifests/prod/app/kustomization.yaml \
$WORKDIR/k8s-repo/$DEV1_GKE_1_CLUSTER/app/
cp $WORKDIR/asm/k8s_manifests/prod/app/kustomization.yaml \
$WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/app/
cp $WORKDIR/asm/k8s_manifests/prod/app/kustomization.yaml \
$WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app/
cp $WORKDIR/asm/k8s_manifests/prod/app/kustomization.yaml \
$WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/app/
cp $WORKDIR/asm/k8s_manifests/prod/app/kustomization.yaml \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app/
cp $WORKDIR/asm/k8s_manifests/prod/app/kustomization.yaml \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app/

Copy the Hipster Shop Deployments, RBAC and PodSecurityPolicy to the source repo for the apps clusters.

cp -r $WORKDIR/asm/k8s_manifests/prod/app/deployments \
$WORKDIR/k8s-repo/$DEV1_GKE_1_CLUSTER/app/
cp -r $WORKDIR/asm/k8s_manifests/prod/app/deployments \
$WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/app/
cp -r $WORKDIR/asm/k8s_manifests/prod/app/deployments \
$WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app/
cp -r $WORKDIR/asm/k8s_manifests/prod/app/deployments \
$WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/app/

cp -r $WORKDIR/asm/k8s_manifests/prod/app/rbac \
$WORKDIR/k8s-repo/$DEV1_GKE_1_CLUSTER/app/
cp -r $WORKDIR/asm/k8s_manifests/prod/app/rbac \
$WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/app/
cp -r $WORKDIR/asm/k8s_manifests/prod/app/rbac \
$WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app/
cp -r $WORKDIR/asm/k8s_manifests/prod/app/rbac \
$WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/app/
cp -r $WORKDIR/asm/k8s_manifests/prod/app/podsecuritypolicies \
$WORKDIR/k8s-repo/$DEV1_GKE_1_CLUSTER/app/
cp -r $WORKDIR/asm/k8s_manifests/prod/app/podsecuritypolicies \
$WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/app/
cp -r $WORKDIR/asm/k8s_manifests/prod/app/podsecuritypolicies \
$WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app/
cp -r $WORKDIR/asm/k8s_manifests/prod/app/podsecuritypolicies \
$WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/app/

Remove the cartservice deployment, rbac and podsecuritypolicy from all but one dev cluster. Hipster Shop was not built for multi-cluster deployment, so to avoid inconsistent results, we use just one cartservice.

rm $WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/app/deployments/app-cart-service.yaml
rm $WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/app/podsecuritypolicies/cart-psp.yaml
rm $WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/app/rbac/cart-rbac.yaml

rm $WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app/deployments/app-cart-service.yaml
rm $WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app/podsecuritypolicies/cart-psp.yaml
rm $WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app/rbac/cart-rbac.yaml

rm $WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/app/deployments/app-cart-service.yaml
rm $WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/app/podsecuritypolicies/cart-psp.yaml
rm $WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/app/rbac/cart-rbac.yaml

Add cartservice deployment, rbac and podsecuritypolicy to kustomization.yaml in the first dev cluster only.

cd ${WORKDIR}/k8s-repo/${DEV1_GKE_1_CLUSTER}/app
cd deployments && kustomize edit add resource app-cart-service.yaml
cd ../podsecuritypolicies && kustomize edit add resource cart-psp.yaml
cd ../rbac && kustomize edit add resource cart-rbac.yaml
cd ${WORKDIR}/asm

Remove podsecuritypolicies, deployments and rbac directories from ops clusters kustomization.yaml

sed -i -e '/- deployments\//d' -e '/- podsecuritypolicies\//d' \
  -e '/- rbac\//d' \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app/kustomization.yaml
sed -i -e '/- deployments\//d' -e '/- podsecuritypolicies\//d' \
  -e '/- rbac\//d' \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app/kustomization.yaml
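To see what those sed expressions do, here is a small sketch that applies them to a sample file and prints the result instead of editing in place. The sample kustomization.yaml content is an assumption made for the illustration; the real file in the repo may use a different key or list different entries.

```shell
# Sketch only: a made-up kustomization.yaml for demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
resources:
- namespaces/
- deployments/
- podsecuritypolicies/
- rbac/
- services/
EOF

# Same expressions as above, without -i, so the result goes to stdout.
filtered=$(sed -e '/- deployments\//d' -e '/- podsecuritypolicies\//d' \
  -e '/- rbac\//d' "$sample")
echo "$filtered"
```

Only the namespaces/ and services/ entries survive, which matches the intent that the ops clusters carry the namespaces and Services but no workloads.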

Replace the PROJECT_ID in the RBAC manifests.

sed -i 's/\${PROJECT_ID}/'${TF_VAR_dev1_project_name}'/g' \
${WORKDIR}/k8s-repo/${DEV1_GKE_1_CLUSTER}/app/rbac/*
sed -i 's/\${PROJECT_ID}/'${TF_VAR_dev1_project_name}'/g' \
${WORKDIR}/k8s-repo/${DEV1_GKE_2_CLUSTER}/app/rbac/*
sed -i 's/\${PROJECT_ID}/'${TF_VAR_dev2_project_name}'/g' \
${WORKDIR}/k8s-repo/${DEV2_GKE_1_CLUSTER}/app/rbac/*
sed -i 's/\${PROJECT_ID}/'${TF_VAR_dev2_project_name}'/g' \
${WORKDIR}/k8s-repo/${DEV2_GKE_2_CLUSTER}/app/rbac/*
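The quoting in these commands is easy to get wrong: the placeholder must reach sed as the literal text ${PROJECT_ID}, while the replacement must be expanded by the shell. Below is a minimal sketch of the same substitution on a sample line, with the shell variable additionally double-quoted. The sample line and the project name are placeholders invented for this example.

```shell
# Sketch only: sample input and project name are made up.
project="example-dev1-project"
line='project: ${PROJECT_ID}'

# Single quotes keep ${PROJECT_ID} literal for sed; \$ matches a literal "$".
# The double-quoted "$project" is expanded by the shell before sed runs.
result=$(printf '%s\n' "$line" | sed 's/\${PROJECT_ID}/'"$project"'/g')
echo "$result"
```

Running a substitution like this on one line first is a cheap way to confirm the pattern matches before using sed -i on the manifests.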

Copy the IngressGateway and VirtualService manifests to the source repo for the ops clusters.

cp -r $WORKDIR/asm/k8s_manifests/prod/app-ingress/* \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-ingress/
cp -r $WORKDIR/asm/k8s_manifests/prod/app-ingress/* \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app-ingress/

Copy the Config Connector resources to one of the clusters in each project.

cp -r $WORKDIR/asm/k8s_manifests/prod/app-cnrm/* \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-cnrm/
cp -r $WORKDIR/asm/k8s_manifests/prod/app-cnrm/* \
$WORKDIR/k8s-repo/$DEV1_GKE_1_CLUSTER/app-cnrm/
cp -r $WORKDIR/asm/k8s_manifests/prod/app-cnrm/* \
$WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app-cnrm/

Replace the PROJECT_ID in the Config Connector manifests.

sed -i 's/${PROJECT_ID}/'$TF_VAR_ops_project_name'/g' \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-cnrm/*
sed -i 's/${PROJECT_ID}/'$TF_VAR_dev1_project_name'/g' \
$WORKDIR/k8s-repo/$DEV1_GKE_1_CLUSTER/app-cnrm/*
sed -i 's/${PROJECT_ID}/'$TF_VAR_dev2_project_name'/g' \
$WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/app-cnrm/*

Copy loadgenerator manifests (Deployment, PodSecurityPolicy and RBAC) to the ops clusters. The Hipster shop app is exposed using a global Google Cloud Load Balancer (GCLB). GCLB receives client traffic (destined for frontend ) and sends it to the closest instance of the Service. Putting loadgenerator on both ops clusters ensures that traffic is sent to both Istio Ingress gateways running in the ops clusters. Load balancing is explained in detail in the following section.

cp -r $WORKDIR/asm/k8s_manifests/prod/app-loadgenerator/. \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-loadgenerator/.
cp -r $WORKDIR/asm/k8s_manifests/prod/app-loadgenerator/. \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app-loadgenerator/.

Replace the ops project ID in the loadgenerator manifests for both ops clusters.

sed -i 's/OPS_PROJECT_ID/'$TF_VAR_ops_project_name'/g'  \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-loadgenerator/loadgenerator-deployment.yaml
sed -i 's/OPS_PROJECT_ID/'$TF_VAR_ops_project_name'/g' \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-loadgenerator/loadgenerator-rbac.yaml
sed -i 's/OPS_PROJECT_ID/'$TF_VAR_ops_project_name'/g' \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app-loadgenerator/loadgenerator-deployment.yaml
sed -i 's/OPS_PROJECT_ID/'$TF_VAR_ops_project_name'/g' \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app-loadgenerator/loadgenerator-rbac.yaml

Add the loadgenerator resources to kustomization.yaml for both ops clusters.

cd $WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-loadgenerator/
kustomize edit add resource loadgenerator-psp.yaml
kustomize edit add resource loadgenerator-rbac.yaml
kustomize edit add resource loadgenerator-deployment.yaml

cd $WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app-loadgenerator/
kustomize edit add resource loadgenerator-psp.yaml
kustomize edit add resource loadgenerator-rbac.yaml
kustomize edit add resource loadgenerator-deployment.yaml

Commit to k8s-repo .

cd $WORKDIR/k8s-repo
git add . && git commit -am "create app namespaces and install hipster shop"
git push --set-upstream origin master

View the status of the Ops project Cloud Build in a previously opened tab or by clicking the following link:

echo "https://console.cloud.google.com/cloud-build/builds?project=${TF_VAR_ops_project_name}"

Verify Application deployment

Verify pods in all application namespaces except cart are in Running state in all dev clusters.

for ns in ad checkout currency email frontend payment product-catalog recommendation shipping; do
  kubectl --context $DEV1_GKE_1 get pods -n $ns;
  kubectl --context $DEV1_GKE_2 get pods -n $ns;
  kubectl --context $DEV2_GKE_1 get pods -n $ns;
  kubectl --context $DEV2_GKE_2 get pods -n $ns;
done;

Output (do not copy)

NAME                               READY   STATUS    RESTARTS   AGE
currencyservice-5c5b8876db-pvc6s   2/2     Running   0          13m
NAME                               READY   STATUS    RESTARTS   AGE
currencyservice-5c5b8876db-xlkl9   2/2     Running   0          13m
NAME                               READY   STATUS    RESTARTS   AGE
currencyservice-5c5b8876db-zdjkg   2/2     Running   0          115s
NAME                               READY   STATUS    RESTARTS   AGE
currencyservice-5c5b8876db-l748q   2/2     Running   0          82s

NAME                            READY   STATUS    RESTARTS   AGE
emailservice-588467b8c8-gk92n   2/2     Running   0          13m
NAME                            READY   STATUS    RESTARTS   AGE
emailservice-588467b8c8-rvzk9   2/2     Running   0          13m
NAME                            READY   STATUS    RESTARTS   AGE
emailservice-588467b8c8-mt925   2/2     Running   0          117s
NAME                            READY   STATUS    RESTARTS   AGE
emailservice-588467b8c8-klqn7   2/2     Running   0          84s

NAME                        READY   STATUS    RESTARTS   AGE
frontend-64b94cf46f-kkq7d   2/2     Running   0          13m
NAME                        READY   STATUS    RESTARTS   AGE
frontend-64b94cf46f-lwskf   2/2     Running   0          13m
NAME                        READY   STATUS    RESTARTS   AGE
frontend-64b94cf46f-zz7xs   2/2     Running   0          118s
NAME                        READY   STATUS    RESTARTS   AGE
frontend-64b94cf46f-2vtw5   2/2     Running   0          85s

NAME                              READY   STATUS    RESTARTS   AGE
paymentservice-777f6c74f8-df8ml   2/2     Running   0          13m
NAME                              READY   STATUS    RESTARTS   AGE
paymentservice-777f6c74f8-bdcvg   2/2     Running   0          13m
NAME                              READY   STATUS    RESTARTS   AGE
paymentservice-777f6c74f8-jqf28   2/2     Running   0          117s
NAME                              READY   STATUS    RESTARTS   AGE
paymentservice-777f6c74f8-95x2m   2/2     Running   0          86s

NAME                                     READY   STATUS    RESTARTS   AGE
productcatalogservice-786dc84f84-q5g9p   2/2     Running   0          13m
NAME                                     READY   STATUS    RESTARTS   AGE
productcatalogservice-786dc84f84-n6lp8   2/2     Running   0          13m
NAME                                     READY   STATUS    RESTARTS   AGE
productcatalogservice-786dc84f84-gf9xl   2/2     Running   0          119s
NAME                                     READY   STATUS    RESTARTS   AGE
productcatalogservice-786dc84f84-v7cbr   2/2     Running   0          86s

NAME                                     READY   STATUS    RESTARTS   AGE
recommendationservice-5fdf959f6b-2ltrk   2/2     Running   0          13m
NAME                                     READY   STATUS    RESTARTS   AGE
recommendationservice-5fdf959f6b-dqd55   2/2     Running   0          13m
NAME                                     READY   STATUS    RESTARTS   AGE
recommendationservice-5fdf959f6b-jghcl   2/2     Running   0          119s
NAME                                     READY   STATUS    RESTARTS   AGE
recommendationservice-5fdf959f6b-kkspz   2/2     Running   0          87s

NAME                              READY   STATUS    RESTARTS   AGE
shippingservice-7bd5f569d-qqd9n   2/2     Running   0          13m
NAME                              READY   STATUS    RESTARTS   AGE
shippingservice-7bd5f569d-xczg5   2/2     Running   0          13m
NAME                              READY   STATUS    RESTARTS   AGE
shippingservice-7bd5f569d-wfgfr   2/2     Running   0          2m
NAME                              READY   STATUS    RESTARTS   AGE
shippingservice-7bd5f569d-r6t8v   2/2     Running   0          88s
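Scanning this much output by eye is error-prone. One possible helper, sketched here under the assumption that the output has the default kubectl get pods columns shown above, prints only the pods whose STATUS is not Running. The function name not_running and the CrashLoopBackOff row in the canned sample are made up for this example.

```shell
# Sketch only: read `kubectl get pods` output on stdin and print the names
# of pods whose STATUS column (third field) is anything other than Running.
not_running() {
  awk '$1 != "NAME" && NF >= 3 && $3 != "Running" { print $1 }'
}

# Demonstration on canned text; in the lab you would pipe kubectl into it:
#   kubectl --context $DEV1_GKE_1 get pods -n frontend | not_running
sample='NAME                        READY   STATUS             RESTARTS   AGE
frontend-64b94cf46f-kkq7d   2/2     Running            0          13m
frontend-64b94cf46f-zz7xs   1/2     CrashLoopBackOff   3          118s'
printf '%s\n' "$sample" | not_running
```

Empty output means every listed pod is Running, but it is also what you get when a namespace has no pods at all, so check at least once that the pods exist.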

Verify pods in cart namespace are in Running state in first dev cluster only.

kubectl --context $DEV1_GKE_1 get pods -n cart;

Output (do not copy)

NAME                           READY   STATUS    RESTARTS   AGE
cartservice-659c9749b4-vqnrd   2/2     Running   0          17m

Access the Hipster Shop app

Global load balancing

You now have Hipster Shop app deployed to all four app clusters. These clusters are in two regions and four zones. Clients can access the Hipster shop app by accessing the frontend service. The frontend service runs on all four app clusters. A Google Cloud Load Balancer ( GCLB ) is used to get client traffic to all four instances of the frontend service.

Istio Ingress gateways only run in the ops clusters and act as a regional load balancer to the two zonal application clusters within the region. GCLB uses the two Istio ingress gateways (running in the two ops clusters) as backends to the global frontend service. The Istio Ingress gateways receive the client traffic from the GCLB and then send the client traffic onwards to the frontend Pods running in the application clusters.

Alternatively, you can put Istio Ingress gateways on the application clusters directly and the GCLB can use those as backends.

GKE Autoneg controller

Istio Ingress gateway Kubernetes Service registers itself as a backend to the GCLB using Network Endpoint Groups (NEGs). NEGs allow for container-native load balancing using GCLBs. NEGs are created through a special annotation on a Kubernetes Service, so it can register itself to the NEG Controller. Autoneg controller is a special GKE controller that automates the creation of NEGs as well as assigning them as backends to a GCLB using Service annotations. Istio control planes including the Istio ingress gateways are deployed during the initial infrastructure Terraform Cloud Build. The GCLB and autoneg configuration is done as part of the initial Terraform infrastructure Cloud Build.

Secure Ingress using Cloud Endpoints and managed certs

GCP Managed certs are used to secure the client traffic to the frontend GCLB service. GCLB uses managed certs for the global frontend service, and TLS is terminated at the GCLB. In this workshop, you use Cloud Endpoints as the domain for the managed cert. Alternatively, you can use your own domain and a DNS name for the frontend to create GCP managed certs.

To access the Hipster shop, click on the link output of the following command.

echo "https://frontend.endpoints.$TF_VAR_ops_project_name.cloud.goog"

You can check that the certificate is valid by clicking the lock symbol in the URL bar of your Chrome tab.

Verify global load balancing

As part of the application deployment, load generators were deployed in both ops clusters that generate test traffic to the GCLB Hipster shop Cloud Endpoints link. Verify that the GCLB is receiving traffic and sending to both Istio Ingress gateways.

Get the GCLB > Monitoring link for the ops project where the Hipster shop GCLB is created.

echo "https://console.cloud.google.com/net-services/loadbalancing/details/http/istio-ingressgateway?project=$TF_VAR_ops_project_name&cloudshell=false&tab=monitoring&duration=PT1H"

Change from All backends to istio-ingressgateway from the Backend dropdown menu as shown below.

Note traffic going to both istio-ingressgateways .

There are three NEGs created per istio-ingressgateway . Since the ops clusters are regional clusters, one NEG is created for each zone in the region. The istio-ingressgateway Pods, however, run in a single zone per region. Traffic is shown going to the istio-ingressgateway Pods.

Load generators are running in both ops clusters, simulating client traffic from the two regions they are in. The load generated in the ops cluster in region 1 is sent to the istio-ingressgateway in region 1. Likewise, the load generated in the ops cluster in region 2 is sent to the istio-ingressgateway in region 2.

8. Observability with Stackdriver

Objective: Connect Istio telemetry to Stackdriver and validate.

Install istio-telemetry resources
Create/update Istio Services dashboards
View container logs
View distributed tracing in Stackdriver

Copy-and-Paste Method Lab Instructions

One of Istio's major features is built-in observability ("o11y"). This means that even with black-box, uninstrumented containers, operators can still observe the traffic going in and out of these containers, providing services to customers. This observation takes the shape of a few different methods: metrics, logs, and traces.

We will also use the built-in load generation system in Hipster Shop. Observability doesn't work very well in a static system with no traffic, so load generation helps us see how it works. This load is already running; now we will be able to see it.

Install the istio to stackdriver config file.

cd $WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/istio-telemetry
kustomize edit add resource istio-telemetry.yaml

cd $WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/istio-telemetry
kustomize edit add resource istio-telemetry.yaml

Commit to k8s-repo.

cd $WORKDIR/k8s-repo
git add . && git commit -am "Install istio to stackdriver configuration"
git push

View the status of the Ops project Cloud Build in a previously opened tab or by clicking the following link:

echo "https://console.cloud.google.com/cloud-build/builds?project=${TF_VAR_ops_project_name}"

Verify the Istio → Stackdriver integration

Get the Stackdriver Handler CRD.

kubectl --context $OPS_GKE_1 get handler -n istio-system

The output should show a handler named stackdriver:

NAME            AGE
kubernetesenv   12d
prometheus      12d
stackdriver     69s      # <== NEW!
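If you prefer a scripted check over reading the table, a sketch like the following tests whether a handler with a given name appears in that output. has_handler is a hypothetical helper name; it only parses text in the format shown above.

```shell
# Sketch only: succeed if stdin (output of `kubectl get handler`) contains
# a row whose first column equals the given handler name.
has_handler() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# In the lab:
#   kubectl --context $OPS_GKE_1 get handler -n istio-system | has_handler stackdriver
sample='NAME            AGE
kubernetesenv   12d
prometheus      12d
stackdriver     69s'
if printf '%s\n' "$sample" | has_handler stackdriver; then
  echo "stackdriver handler present"
fi
```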

Verify that the Istio metrics export to Stackdriver is working. Click the link output from this command:

echo "https://console.cloud.google.com/monitoring/metrics-explorer?cloudshell=false&project=$TF_VAR_ops_project_name"

You will be prompted to create a new Workspace named after the Ops project; just choose OK. If it prompts you about the new UI, dismiss the dialog.

In the Metrics Explorer, under "Find resource type and metric" type " istio " to see there are options like "Server Request Count" on the "Kubernetes Container" resource type. This shows us that the metrics are flowing from the mesh into Stackdriver.

(You will have to Group By destination_service_name label if you want to see the lines below.)

Visualizing metrics with Dashboards:

Now that our metrics are in the Stackdriver APM system, we want a way to visualize them. In this section, we will install a pre-built dashboard which shows us three of the four " Golden Signals " of metrics: Traffic (Requests per second), Latency (in this case, 99th and 50th percentile), and Errors (we're excluding Saturation in this example).

Istio's Envoy proxy gives us several metrics , but these are a good set to start with (the exhaustive list is here ). Note that each metric has a set of labels that can be used for filtering and aggregating, such as destination_service, source_workload_namespace, and response_code.

Now let's add our pre-canned metrics dashboard . We are going to use the Dashboard API directly. You wouldn't normally hand-generate API calls like this; it would be part of an automation system, or you would build the dashboard manually in the web UI. This will get us started quickly:

sed -i 's/OPS_PROJECT/'${TF_VAR_ops_project_name}'/g' \
$WORKDIR/asm/k8s_manifests/prod/app-telemetry/services-dashboard.json
OAUTH_TOKEN=$(gcloud auth application-default print-access-token)
curl -X POST -H "Authorization: Bearer $OAUTH_TOKEN" -H "Content-Type: application/json" \
https://monitoring.googleapis.com/v1/projects/$TF_VAR_ops_project_name/dashboards \
 -d @$WORKDIR/asm/k8s_manifests/prod/app-telemetry/services-dashboard.json

Navigate to the output link below to view the newly added "Services dashboard".

echo "https://console.cloud.google.com/monitoring/dashboards/custom/servicesdash?cloudshell=false&project=$TF_VAR_ops_project_name"

We could edit the dashboard in place using the UI, but in our case we are going to quickly add a new graph using the API. To do that, pull down the latest version of the dashboard, apply your edits, then push it back up using the HTTP PATCH method.

You can get an existing dashboard by querying the monitoring API. Get the existing dashboard that was just added:

curl -X GET -H "Authorization: Bearer $OAUTH_TOKEN" -H "Content-Type: application/json" \
https://monitoring.googleapis.com/v1/projects/$TF_VAR_ops_project_name/dashboards/servicesdash > /tmp/services-dashboard.json

Add a new graph (50th %ile latency): [ API reference ]

Now we can add a new graph widget to our dashboard in code. This change can be reviewed by peers and checked into version control. Here is a widget to add that shows 50th percentile latency (median latency).

Try editing the dashboard you just got, adding a new stanza:

NEW_CHART=${WORKDIR}/asm/k8s_manifests/prod/app-telemetry/new-chart.json
jq --argjson newChart "$(<$NEW_CHART)" '.gridLayout.widgets += [$newChart]' /tmp/services-dashboard.json > /tmp/patched-services-dashboard.json
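To make the jq step concrete, here is the same filter run on a tiny stand-in dashboard. Both JSON documents are invented for the illustration and are far smaller than real dashboard and widget definitions; only the .gridLayout.widgets += [...] filter is taken from the command above.

```shell
# Sketch only: minimal made-up dashboard and widget JSON.
dash=$(mktemp)
chart=$(mktemp)
echo '{"displayName":"demo","gridLayout":{"widgets":[{"title":"existing"}]}}' > "$dash"
echo '{"title":"new chart"}' > "$chart"

# Append the new widget to the end of the widgets array (-c prints compact JSON).
patched=$(jq -c --argjson newChart "$(cat "$chart")" \
  '.gridLayout.widgets += [$newChart]' "$dash")
echo "$patched"
```

The filter leaves every other field of the document untouched, so values returned by the GET request are carried into the patched file unchanged.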

Update the existing services dashboard:

curl -X PATCH -H "Authorization: Bearer $OAUTH_TOKEN" -H "Content-Type: application/json" \
https://monitoring.googleapis.com/v1/projects/$TF_VAR_ops_project_name/dashboards/servicesdash \
 -d @/tmp/patched-services-dashboard.json

View the updated dashboard by navigating to the following output link:

echo "https://console.cloud.google.com/monitoring/dashboards/custom/servicesdash?cloudshell=false&project=$TF_VAR_ops_project_name"

Do some simple Logs Analysis.

Istio provides a set of structured logs for all in-mesh network traffic and uploads them to Stackdriver Logging to allow cross-cluster analysis in one powerful tool. Logs are annotated with service-level metadata such as the cluster, container, app, connection_id, etc.

An example log entry (in this case, Envoy proxy's accesslog) might look like this (trimmed):

*** DO NOT PASTE *** 
 logName: "projects/PROJECTNAME-11932-01-ops/logs/server-tcp-accesslog-stackdriver.instance.istio-system" 
labels: {
  connection_id: "fbb46826-96fd-476c-ac98-68a9bd6e585d-1517191"   
  destination_app: "redis-cart"   
  destination_ip: "10.16.1.7"   
  destination_name: "redis-cart-6448dcbdcc-cj52v"   
  destination_namespace: "cart"   
  destination_owner: "kubernetes://apis/apps/v1/namespaces/cart/deployments/redis-cart"   
  destination_workload: "redis-cart"   
  source_ip: "10.16.2.8"   
  total_received_bytes: "539"   
  total_sent_bytes: "569" 
...  
 }

View your logs here:

echo "https://console.cloud.google.com/logs/viewer?cloudshell=false&project=$TF_VAR_ops_project_name"

You can view Istio's control plane logs by selecting Resource > Kubernetes Container, and searching on "pilot".

Here, we can see the Istio Control Plane pushing proxy config to the sidecar proxies for each sample app service. "CDS," "LDS," and "RDS" represent different Envoy APIs ( more information ).

Beyond Istio's logs, you can also find container logs as well as infrastructure or other GCP services logs all in the same interface. Here are some sample logs queries for GKE. The logs viewer also allows you to create metrics out of logs (eg: "count every error that matches some string") which can be used on a dashboard or as part of an alert. Logs can also be streamed to other analysis tools such as BigQuery.

Some sample filters for hipster shop:

resource.type="k8s_container" labels.destination_app="productcatalogservice"

resource.type="k8s_container" resource.labels.namespace_name="cart"
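The same filters can be composed in a script. The sketch below only builds the filter string; namespace_filter is a made-up helper name. Passing the result to gcloud logging read is shown as a comment because it needs an authenticated gcloud session.

```shell
# Sketch only: build a Logging filter for container logs in one namespace.
namespace_filter() {
  printf 'resource.type="k8s_container" AND resource.labels.namespace_name="%s"' "$1"
}

namespace_filter cart; echo

# With gcloud configured, the filter could be used like this:
#   gcloud logging read "$(namespace_filter cart)" \
#     --project "$TF_VAR_ops_project_name" --limit 10
```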

Check out Distributed Traces.

Now that you're working with a distributed system, debugging needs a new tool: Distributed Tracing. This tool allows you to discover statistics about how your services are interacting (such as finding outlying slow events in the picture below), as well as dive into raw sample traces to investigate the details of what is really going on.

The Timeline View shows all requests over time, graphed by their latency: the time from the initial request, through the Hipster stack, to the final response to the end user. The higher the dot, the slower (and less happy!) the user's experience.

You can click on a dot to find the detailed Waterfall View of that particular request. This ability to find the raw details of a particular request (not just aggregate statistics) is vital to understanding the interplay between services, especially when hunting down rare, but bad, interactions between services.

The Waterfall View should be familiar to anyone who has used a debugger, but in this case instead of showing time spent in different processes of a single application, this is showing time spent traversing our mesh, between services, running in separate containers.

Here you can find your Traces:

echo "https://console.cloud.google.com/traces/overview?cloudshell=false&project=$TF_VAR_ops_project_name"

An example screenshot of the tool:

9. Mutual TLS Authentication

Objective: Secure connectivity between microservices (AuthN).

Enable mesh wide mTLS
Verify mTLS by inspecting logs

Copy-and-Paste Method Lab Instructions

Now that our apps are installed and Observability is set up, we can start securing the connections between services and make sure everything keeps working.

For example, we can see on the Kiali dashboard that our services are not using mTLS (no "lock" icon). But the traffic is flowing and the system is working fine. Our Stackdriver Golden Metrics dashboard is giving us some peace of mind that things are working, overall.

Check MeshPolicy in the ops clusters. Note that mTLS is PERMISSIVE, which allows both mTLS and plaintext traffic.

kubectl --context $OPS_GKE_1 get MeshPolicy -o json | jq '.items[].spec'
kubectl --context $OPS_GKE_2 get MeshPolicy -o json | jq '.items[].spec'

Output (do not copy)

{
  "peers": [
    {
      "mtls": {
        "mode": "PERMISSIVE"
      }
    }
  ]
}

Istio is configured on all clusters using the Istio operator, which uses the IstioControlPlane custom resource (CR). We will configure mTLS in all clusters by updating the IstioControlPlane CR and updating the k8s-repo. Setting global > mTLS > enabled: true in the IstioControlPlane CR results in the following two changes to the Istio control plane:

MeshPolicy is set to turn on mTLS mesh wide for all Services running in all clusters.
A DestinationRule is created to allow ISTIO_MUTUAL traffic between Services running in all clusters.

We will apply a kustomize patch to the IstioControlPlane CR to enable mTLS cluster wide. For each cluster, copy the patch into the relevant directory and add it as a kustomize patch.

cp -r $WORKDIR/asm/k8s_manifests/prod/app-mtls/mtls-kustomize-patch-replicated.yaml \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/istio-controlplane/mtls-kustomize-patch.yaml
cd $WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/istio-controlplane
kustomize edit add patch mtls-kustomize-patch.yaml

cp -r $WORKDIR/asm/k8s_manifests/prod/app-mtls/mtls-kustomize-patch-replicated.yaml \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/istio-controlplane/mtls-kustomize-patch.yaml
cd $WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/istio-controlplane
kustomize edit add patch mtls-kustomize-patch.yaml

cp -r $WORKDIR/asm/k8s_manifests/prod/app-mtls/mtls-kustomize-patch-shared.yaml \
$WORKDIR/k8s-repo/$DEV1_GKE_1_CLUSTER/istio-controlplane/mtls-kustomize-patch.yaml
cd $WORKDIR/k8s-repo/$DEV1_GKE_1_CLUSTER/istio-controlplane
kustomize edit add patch mtls-kustomize-patch.yaml

cp -r $WORKDIR/asm/k8s_manifests/prod/app-mtls/mtls-kustomize-patch-shared.yaml \
$WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/istio-controlplane/mtls-kustomize-patch.yaml
cd $WORKDIR/k8s-repo/$DEV1_GKE_2_CLUSTER/istio-controlplane
kustomize edit add patch mtls-kustomize-patch.yaml

cp -r $WORKDIR/asm/k8s_manifests/prod/app-mtls/mtls-kustomize-patch-shared.yaml \
$WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/istio-controlplane/mtls-kustomize-patch.yaml
cd $WORKDIR/k8s-repo/$DEV2_GKE_1_CLUSTER/istio-controlplane
kustomize edit add patch mtls-kustomize-patch.yaml

cp -r $WORKDIR/asm/k8s_manifests/prod/app-mtls/mtls-kustomize-patch-shared.yaml \
$WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/istio-controlplane/mtls-kustomize-patch.yaml
cd $WORKDIR/k8s-repo/$DEV2_GKE_2_CLUSTER/istio-controlplane
kustomize edit add patch mtls-kustomize-patch.yaml
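The six stanzas above repeat the same three steps for each cluster, so they can also be expressed as a loop. The following is a sketch and not part of the workshop repo: the function names `add_mtls_patch` and `add_mtls_patch_all` are hypothetical, and it assumes the same `WORKDIR` and `*_CLUSTER` variables used above are set.

```shell
# Copy one patch file into a cluster's istio-controlplane directory and
# register it with kustomize (the same three steps as each stanza above).
add_mtls_patch() {
  patch_src="$1"
  target_dir="$2/istio-controlplane"
  cp "$patch_src" "$target_dir/mtls-kustomize-patch.yaml" &&
    (cd "$target_dir" && kustomize edit add patch mtls-kustomize-patch.yaml)
}

# Ops clusters take the "replicated" patch; app clusters take the "shared" one.
add_mtls_patch_all() {
  src="$WORKDIR/asm/k8s_manifests/prod/app-mtls"
  for c in "$OPS_GKE_1_CLUSTER" "$OPS_GKE_2_CLUSTER"; do
    add_mtls_patch "$src/mtls-kustomize-patch-replicated.yaml" \
      "$WORKDIR/k8s-repo/$c" || return 1
  done
  for c in "$DEV1_GKE_1_CLUSTER" "$DEV1_GKE_2_CLUSTER" \
           "$DEV2_GKE_1_CLUSTER" "$DEV2_GKE_2_CLUSTER"; do
    add_mtls_patch "$src/mtls-kustomize-patch-shared.yaml" \
      "$WORKDIR/k8s-repo/$c" || return 1
  done
}
```

Run `add_mtls_patch_all` instead of the individual commands, not in addition to them, so that each patch is registered with kustomize only once.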

Commit to k8s-repo.

cd $WORKDIR/k8s-repo
git add . && git commit -am "turn mTLS on"
git push

View the status of the Ops project Cloud Build in a previously opened tab or by clicking the following link:

echo "https://console.cloud.google.com/cloud-build/builds?project=${TF_VAR_ops_project_name}"

Verify mTLS

Check MeshPolicy once more in the ops clusters. Note that mTLS is no longer PERMISSIVE and now allows only mTLS traffic.

kubectl --context $OPS_GKE_1 get MeshPolicy -o json | jq '.items[].spec'
kubectl --context $OPS_GKE_2 get MeshPolicy -o json | jq '.items[].spec'

Output (do not copy):

{
  "peers": [
    {
      "mtls": {}
    }
  ]
}
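The difference between the two MeshPolicy outputs (an explicit PERMISSIVE mode before, an empty mtls block after) is easy to miss by eye. The following sketch classifies a spec read from standard input. The function name `mtls_mode` is hypothetical, and it assumes, as the text above states, that an mtls peer without an explicit mode allows only mTLS traffic.

```shell
# Classify a MeshPolicy spec (JSON on stdin) by its mTLS mode.
# Assumption: an "mtls" peer with no explicit mode means STRICT.
mtls_mode() {
  # Strip whitespace so pretty-printed and compact JSON match the same pattern.
  spec=$(tr -d ' \n\t')
  case "$spec" in
    *'"mode":"PERMISSIVE"'*) echo "PERMISSIVE" ;;
    *'"mtls"'*)              echo "STRICT" ;;
    *)                       echo "NONE" ;;
  esac
}

# Example with the spec shown above.
printf '%s\n' '{ "peers": [ { "mtls": {} } ] }' | mtls_mode
```

To use it against a cluster, pipe the output of the kubectl and jq command above into `mtls_mode`.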

Describe the DestinationRule created by the Istio operator controller.

kubectl --context $OPS_GKE_1 get DestinationRule default -n istio-system -o json | jq '.spec'
kubectl --context $OPS_GKE_2 get DestinationRule default -n istio-system -o json | jq '.spec'

Output (do not copy):

{
  "host": "*.local",
  "trafficPolicy": {
    "tls": {
      "mode": "ISTIO_MUTUAL"
    }
  }
}

We can also see the move from HTTP to HTTPS in the logs.

We can expose this particular field from the logs in the UI by clicking on one log entry and then clicking on the value of the field you want to display. In our case, click on "http" next to "protocol":

This provides a nice way to visualize the changeover:

10. Canary Deployments

Objective: Roll out a new version of the frontend Service.

Roll out frontend-v2 (next production version) Service in one region
Use DestinationRules and VirtualServices to slowly steer traffic to frontend-v2
Verify GitOps deployment pipeline by inspecting series of commits to the k8s-repo

Copy-and-Paste Method Lab Instructions

A canary deployment is a progressive rollout of a new service. In a canary deployment, you send an increasing amount of traffic to the new version, while still sending the remaining percentage to the current version. A common pattern is to perform a canary analysis at each stage of traffic splitting, and compare the "golden signals" of the new version (latency, error rate, saturation) against a baseline. This helps prevent outages and ensures the stability of the new "v2" service at every stage of traffic splitting.

In this section, you will learn how to use Cloud Build and Istio traffic policies to create a basic canary deployment for a new version of the frontend service.

First, we'll run the Canary pipeline in the DEV1 region (us-west1), and roll out frontend v2 on both clusters in that region. Second, we'll run the Canary pipeline in the DEV2 region (us-central1), and deploy v2 onto both clusters in that region. Running the pipeline one region at a time, rather than in parallel across all regions, helps avoid global outages caused by bad configuration or by bugs in the v2 app itself.

Note: we'll manually trigger the Canary pipeline in both regions, but in production, you would use an automated trigger, for instance based on a new Docker image tag pushed to a registry.

From Cloud Shell, define some env variables to simplify running the rest of the commands.

CANARY_DIR="$WORKDIR/asm/k8s_manifests/prod/app-canary/"
K8S_REPO="$WORKDIR/k8s-repo"

Run the repo-setup.sh script to copy the baseline manifests into k8s-repo.

$CANARY_DIR/repo-setup.sh

The following manifests are copied:

frontend-v2 deployment
frontend-v1 patch (to include the "v1" label, and an image with a "/version" endpoint)
respy, a small pod that will print HTTP response distribution, and help us visualize the canary deployment in real time.
frontend Istio DestinationRule - splits the frontend Kubernetes Service into two subsets, v1 and v2, based on the "version" deployment label
frontend Istio VirtualService - routes 100% of traffic to frontend v1. This overrides the Kubernetes Service default round-robin behavior, which would immediately send 50% of all Dev1 regional traffic to frontend v2.
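To make the last two items concrete, the following sketch generates a VirtualService that splits traffic between the v1 and v2 subsets by weight. It is an illustration, not the manifest shipped in the workshop repo: the function name `render_frontend_vs` is hypothetical, and the resource name, namespace, and host are assumptions.

```shell
# Hypothetical generator for a frontend VirtualService that splits traffic
# between the v1 and v2 subsets. Weights are percentages and must sum to 100.
render_frontend_vs() {
  v1_weight="$1"
  v2_weight="$2"
  if [ $((v1_weight + v2_weight)) -ne 100 ]; then
    echo "weights must sum to 100" >&2
    return 1
  fi
  # Resource name, namespace, and host below are assumptions.
  cat <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: frontend
  namespace: frontend
spec:
  hosts:
  - frontend
  http:
  - route:
    - destination:
        host: frontend
        subset: v1
      weight: ${v1_weight}
    - destination:
        host: frontend
        subset: v2
      weight: ${v2_weight}
EOF
}

render_frontend_vs 100 0   # baseline: all traffic stays on v1
```

The subset names only resolve if a DestinationRule defines them, which is why the two resources are deployed together.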

Commit changes to k8s-repo:

cd $K8S_REPO 
git add . && git commit -am "frontend canary setup"
git push

View the status of the Ops project Cloud Build in a previously opened tab or by clicking the following link:

echo "https://console.cloud.google.com/cloud-build/builds?project=${TF_VAR_ops_project_name}"

Navigate to Cloud Build in the console for the ops project. Wait for the Cloud Build pipeline to complete, then get pods in the frontend namespace in the DEV1 clusters (the command below checks the first cluster; repeat it with $DEV1_GKE_2 for the second). You should see the following:

watch -n 1 kubectl --context $DEV1_GKE_1 get pods -n frontend

Output (do not copy)

NAME                           READY   STATUS    RESTARTS   AGE
frontend-578b5c5db6-h9567      2/2     Running   0          59m
frontend-v2-54b74fc75b-fbxhc   2/2     Running   0          2m26s
respy-5f4664b5f6-ff22r         2/2     Running   0          2m26s

We will use tmux to split our Cloud Shell window into 2 panes:

The bottom pane will be running the watch command to observe the HTTP response distribution for the frontend service.
The top pane will be running the actual canary pipeline script.

Run the command to split the cloud shell window and execute the watch command in the bottom pane.

RESPY_POD=$(kubectl --context $DEV1_GKE_1 get pod \
-n frontend -l app=respy -o jsonpath='{..metadata.name}')
export TMUX_SESSION=$(tmux display-message -p '#S')
tmux split-window -d -t $TMUX_SESSION:0 -p33 \
-v "export KUBECONFIG=$WORKDIR/asm/gke/kubemesh; \
kubectl --context $DEV1_GKE_1 exec -n frontend -it \
$RESPY_POD -c respy /bin/sh -- -c 'watch -n 1 ./respy \
--u http://frontend:80/version --c 10 --n 500'; sleep 2"

Output (do not copy)

500 requests to http://frontend:80/version...
+----------+-------------------+
| RESPONSE | % OF 500 REQUESTS |
+----------+-------------------+
| v1       | 100.0%            |
|          |                   |
+----------+-------------------+

Execute the canary pipeline on the Dev1 region. We provide a script that updates the frontend-v2 traffic percentage in the VirtualService (setting the weight to 20%, 50%, 80%, then 100%). Between updates, the script waits for the Cloud Build pipeline to complete. Note: this script takes about 10 minutes to complete.

K8S_REPO=$K8S_REPO CANARY_DIR=$CANARY_DIR \
OPS_DIR=$OPS_GKE_1_CLUSTER OPS_CONTEXT=$OPS_GKE_1 \
${CANARY_DIR}/auto-canary.sh

You can see traffic splitting in real time in the bottom window where you're running the respy command. For instance, at the 20% mark:

Output (do not copy)

500 requests to http://frontend:80/version...
+----------+-------------------+
| RESPONSE | % OF 500 REQUESTS |
+----------+-------------------+
| v1       | 79.4%             |
|          |                   |
| v2       | 20.6%             |
|          |                   |
+----------+-------------------+
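Because respy samples 500 requests, the observed split will hover around the configured weight instead of matching it exactly. The following sketch reads a respy table and checks whether a version's share is close to its target. The function names and the tolerance of 5 percentage points are assumptions chosen for illustration.

```shell
# Extract the percentage reported for one version from respy's table
# (read from stdin). Prints nothing if the version is absent.
respy_share() {
  awk -F'|' -v ver="$1" '
    { gsub(/[ %]/, "", $2); gsub(/[ %]/, "", $3) }
    $2 == ver { print $3 }
  '
}

# Succeed when the observed share is within `tol` points of the target.
within_tolerance() {
  awk -v got="$1" -v want="$2" -v tol="$3" \
    'BEGIN { d = got - want; if (d < 0) d = -d; exit !(d <= tol) }'
}

# Example with the two data rows from the output above.
table='| v1       | 79.4%             |
| v2       | 20.6%             |'
share=$(printf '%s\n' "$table" | respy_share v2)
within_tolerance "$share" 20 5 && echo "v2 share ${share}% is on target"
```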

Once the Dev1 rollout completes for frontend-v2, you should see a success message at the end of the script:

Output (do not copy)

✅ 100% successfully deployed
🌈 frontend-v2 Canary Complete for gke-asm-1-r1-prod

And all frontend traffic from a Dev1 pod should be going to frontend-v2:

Output (do not copy)

500 requests to http://frontend:80/version...
+----------+-------------------+
| RESPONSE | % OF 500 REQUESTS |
+----------+-------------------+
| v2       | 100.0%            |
|          |                   |
+----------+-------------------+

Close the split pane.

tmux respawn-pane -t ${TMUX_SESSION}:0.1 -k 'exit'

Navigate to Cloud Source Repos at the link generated.

echo https://source.developers.google.com/p/$TF_VAR_ops_project_name/r/k8s-repo

You should see a separate commit for each traffic percentage, with the most recent commit at the top of the list:

Now, you will repeat the same process for the Dev2 region. Note that the Dev2 region is still "locked" on v1. This is because in the baseline repo-setup.sh script, we pushed a VirtualService to explicitly send all traffic to v1. This way, we were able to safely do a regional canary on Dev1, and make sure it ran successfully before rolling out the new version globally.

Run the command to split the cloud shell window and execute the watch command in the bottom pane.

RESPY_POD=$(kubectl --context $DEV2_GKE_1 get pod \
-n frontend -l app=respy -o jsonpath='{..metadata.name}')
export TMUX_SESSION=$(tmux display-message -p '#S')
tmux split-window -d -t $TMUX_SESSION:0 -p33 \
-v "export KUBECONFIG=$WORKDIR/asm/gke/kubemesh; \
kubectl --context $DEV2_GKE_1 exec -n frontend -it \
$RESPY_POD -c respy /bin/sh -- -c 'watch -n 1 ./respy \
--u http://frontend:80/version --c 10 --n 500'; sleep 2"

Output (do not copy)

500 requests to http://frontend:80/version...
+----------+-------------------+
| RESPONSE | % OF 500 REQUESTS |
+----------+-------------------+
| v1       | 100.0%            |
|          |                   |
+----------+-------------------+

Execute the canary pipeline on the Dev2 region. We provide a script that updates the frontend-v2 traffic percentage in the VirtualService (setting the weight to 20%, 50%, 80%, then 100%). Between updates, the script waits for the Cloud Build pipeline to complete. Note: this script takes about 10 minutes to complete.

K8S_REPO=$K8S_REPO CANARY_DIR=$CANARY_DIR \
OPS_DIR=$OPS_GKE_2_CLUSTER OPS_CONTEXT=$OPS_GKE_2 \
${CANARY_DIR}/auto-canary.sh

Output (do not copy)

500 requests to http://frontend:80/version...
+----------+-------------------+
| RESPONSE | % OF 500 REQUESTS |
+----------+-------------------+
| v1       | 100.0%            |
|          |                   |
+----------+-------------------+

From the Respy pod in Dev2, watch traffic from Dev2 pods move progressively from frontend v1 to v2. Once the script completes, you should see:

Output (do not copy)

500 requests to http://frontend:80/version...
+----------+-------------------+
| RESPONSE | % OF 500 REQUESTS |
+----------+-------------------+
| v2       | 100.0%            |
|          |                   |
+----------+-------------------+

Close the split pane.

tmux respawn-pane -t ${TMUX_SESSION}:0.1 -k 'exit'

This section introduced how to use Istio for regional canary deployments. In production, instead of a manual script, you might automatically trigger this canary script as a Cloud Build pipeline, using a trigger such as a new tagged image pushed to a container registry. You would also want to add canary analysis in between each step, analyzing v2's latency and error rate against a predefined safety threshold, before sending over more traffic.
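The canary analysis step mentioned above can be sketched as a simple gate that runs after each traffic shift. Everything in this example is illustrative: the function name `canary_gate`, both thresholds, and the metric values are assumptions, and in practice the metrics would come from your monitoring system.

```shell
# Hypothetical canary gate: succeed only if v2's error rate and p99 latency
# stay under fixed thresholds. Both thresholds are illustrative assumptions.
MAX_ERROR_RATE_PCT=1      # at most 1% of requests may fail
MAX_P99_LATENCY_MS=500    # p99 latency budget in milliseconds

canary_gate() {
  awk -v err="$1" -v lat="$2" \
      -v max_err="$MAX_ERROR_RATE_PCT" -v max_lat="$MAX_P99_LATENCY_MS" \
    'BEGIN { exit !(err <= max_err && lat <= max_lat) }'
}

# Walk the same weights the workshop script uses, stopping at the first failure.
for weight in 20 50 80 100; do
  echo "shift ${weight}% of traffic to v2"
  # Placeholder metrics (0.2% errors, 310 ms p99), not real measurements.
  if ! canary_gate 0.2 310; then
    echo "gate failed at ${weight}%, halting rollout" >&2
    break
  fi
done
```

Halting on the first failed gate keeps a bad release limited to the share of traffic it had already received.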

11. Authorization Policies

Objective: Set up RBAC between microservices (AuthZ).

Create AuthorizationPolicy to DENY access to a microservice
Create AuthorizationPolicy to ALLOW specific access to a microservice

Copy-and-Paste Method Lab Instructions

Unlike a monolithic application that might be running in one place, globally-distributed microservices apps make calls across network boundaries. This means more points of entry into your applications, and more opportunities for malicious attacks. And because Kubernetes pods have transient IPs, traditional IP-based firewall rules are no longer adequate to secure access between workloads. In a microservices architecture, a new approach to security is needed. Building on Kubernetes security building blocks like service accounts, Istio provides a flexible set of security policies for your applications.

Istio policies cover both authentication and authorization. Authentication verifies identity (is this server who it says it is?), and authorization verifies permissions (is this client allowed to do that?). We covered Istio authentication in the Mutual TLS Authentication section (MeshPolicy). In this section, we will learn how to use Istio authorization policies to control access to one of our application workloads, currencyservice.

First, we'll deploy an AuthorizationPolicy across all 4 Dev clusters, closing off all access to currencyservice, and triggering an error in the frontend. Then, we will allow only the frontend service to access currencyservice.

Inspect the contents of currency-deny-all.yaml. This policy uses Deployment label selectors to restrict access to the currencyservice. Notice that the spec has no rules field: with no rules to match, this policy denies all access to the selected service.

cat $WORKDIR/asm/k8s_manifests/prod/app-authorization/currency-deny-all.yaml

Output (do not copy)

apiVersion: "security.istio.io/v1beta1"
kind: "AuthorizationPolicy"
metadata:
  name: "currency-policy"
  namespace: currency
spec:
  selector:
    matchLabels:
      app: currencyservice

Copy the currency policy into k8s-repo, for the ops clusters in both regions.

cp $WORKDIR/asm/k8s_manifests/prod/app-authorization/currency-deny-all.yaml \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-authorization/currency-policy.yaml
cd $WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-authorization
kustomize edit add resource currency-policy.yaml
cp $WORKDIR/asm/k8s_manifests/prod/app-authorization/currency-deny-all.yaml \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app-authorization/currency-policy.yaml
cd $WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app-authorization
kustomize edit add resource currency-policy.yaml

Push changes.

cd $WORKDIR/k8s-repo 
git add . && git commit -am "AuthorizationPolicy - currency: deny all"
git push

Check the status of the Ops project Cloud Build in a previously opened tab or by clicking the following link:

echo "https://console.cloud.google.com/cloud-build/builds?project=$TF_VAR_ops_project_name"

After the build finishes successfully, try to reach the hipstershop frontend in a browser on the following link:

echo "https://frontend.endpoints.$TF_VAR_ops_project_name.cloud.goog"

You should see an Authorization error from currencyservice:

Let's investigate how the currency service is enforcing this AuthorizationPolicy. First, enable trace-level logs on the Envoy proxy for one of the currency pods, since blocked authorization calls aren't logged by default.

CURRENCY_POD=$(kubectl --context $DEV1_GKE_2 get pod -n currency | grep currency| awk '{ print $1 }')
kubectl --context $DEV1_GKE_2 exec -it $CURRENCY_POD -n \
currency -c istio-proxy -- curl -X POST \
"http://localhost:15000/logging?level=trace"

Get the RBAC (authorization) logs from the currency service's sidecar proxy. You should see an "enforced denied" message, indicating that the currencyservice is set to block all inbound requests.

kubectl --context $DEV1_GKE_2 logs -n currency $CURRENCY_POD \
-c istio-proxy | grep -m 3 rbac

Output (do not copy)

[Envoy (Epoch 0)] [2020-01-30 00:45:50.815][22][debug][rbac] [external/envoy/source/extensions/filters/http/rbac/rbac_filter.cc:67] checking request: remoteAddress: 10.16.5.15:37310, localAddress: 10.16.3.8:7000, ssl: uriSanPeerCertificate: spiffe://cluster.local/ns/frontend/sa/frontend, subjectPeerCertificate: , headers: ':method', 'POST'
[Envoy (Epoch 0)] [2020-01-30 00:45:50.815][22][debug][rbac] [external/envoy/source/extensions/filters/http/rbac/rbac_filter.cc:118] enforced denied
[Envoy (Epoch 0)] [2020-01-30 00:45:50.815][22][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1354] [C115][S17310331589050212978] Sending local reply with details rbac_access_denied
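The first log line carries the identity of the caller that was denied, in the uriSanPeerCertificate field. The following sketch extracts and counts those identities from the proxy logs. The function name `rbac_callers` is hypothetical, and it assumes the log format shown above.

```shell
# Pull the caller's SPIFFE identity out of Envoy RBAC debug lines (stdin)
# and count how many times each identity appears.
rbac_callers() {
  grep -o 'uriSanPeerCertificate: spiffe://[^, ]*' |
    sed 's/^uriSanPeerCertificate: //' |
    sort | uniq -c
}
```

To use it, pipe the output of the `kubectl logs` command above into `rbac_callers` in place of the grep. The identity it prints, for example spiffe://cluster.local/ns/frontend/sa/frontend, encodes the namespace and service account of the calling workload.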

Now, let's allow the frontend – but not the other backend services – to access currencyservice. Open currency-allow-frontend.yaml and inspect its contents. Note that we've added the following rule:

cat ${WORKDIR}/asm/k8s_manifests/prod/app-authorization/currency-allow-frontend.yaml

Output (do not copy)

rules:
 - from:
   - source:
       principals: ["cluster.local/ns/frontend/sa/frontend"]

Here, we are whitelisting a specific source.principal (client) to access currency service. This source.principal is defined by its Kubernetes Service Account. In this case, the service account we are whitelisting is the frontend service account in the frontend namespace.

Note: when using Kubernetes Service Accounts in Istio AuthorizationPolicies, you must first enable cluster-wide mutual TLS, as we did in the Mutual TLS Authentication section. This ensures that the caller's service account identity is attached to each request.

Copy over the updated currency policy

cp $WORKDIR/asm/k8s_manifests/prod/app-authorization/currency-allow-frontend.yaml \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-authorization/currency-policy.yaml
cp $WORKDIR/asm/k8s_manifests/prod/app-authorization/currency-allow-frontend.yaml \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app-authorization/currency-policy.yaml

Push changes.

cd $WORKDIR/k8s-repo
git add . && git commit -am "AuthorizationPolicy - currency: allow frontend"
git push

View the status of the Ops project Cloud Build in a previously opened tab or by clicking the following link:

echo "https://console.cloud.google.com/cloud-build/builds?project=$TF_VAR_ops_project_name"

After the build finishes successfully, open the Hipstershop frontend again. This time you should see no errors on the homepage - this is because the frontend is explicitly allowed to access the currency service.
Now, try to execute a checkout, by adding items to your cart and clicking "place order." This time, you should see a price-conversion error from currency service - this is because we have only whitelisted the frontend, so the checkoutservice is still unable to access currencyservice.

Finally, let's allow the checkout service access to currency, by adding another rule to our currencyservice AuthorizationPolicy. Note that we are only opening up currency access to the two services that need to access it - frontend and checkout. The other backends will still be blocked.
Open currency-allow-frontend-checkout.yaml and inspect its contents. Notice that the list of rules functions as a logical OR - currency will accept only requests from workloads with either of these two service accounts.

cat ${WORKDIR}/asm/k8s_manifests/prod/app-authorization/currency-allow-frontend-checkout.yaml

Output (do not copy)

apiVersion: "security.istio.io/v1beta1"
kind: "AuthorizationPolicy"
metadata:
  name: "currency-policy"
  namespace: currency
spec:
  selector:
    matchLabels:
      app: currencyservice
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/frontend/sa/frontend"]
  - from:
    - source:
        principals: ["cluster.local/ns/checkout/sa/checkout"]
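Each rule in the policy follows the same pattern: one principal built from a namespace and a service account. The following sketch generates the policy from a list of callers. The function name `render_currency_policy` is hypothetical, and the `namespace/serviceaccount` argument format is an assumption made for this example.

```shell
# Hypothetical generator: emit the currency AuthorizationPolicy with one
# rule per "namespace/serviceaccount" argument.
render_currency_policy() {
  cat <<EOF
apiVersion: "security.istio.io/v1beta1"
kind: "AuthorizationPolicy"
metadata:
  name: "currency-policy"
  namespace: currency
spec:
  selector:
    matchLabels:
      app: currencyservice
EOF
  [ "$#" -gt 0 ] || return 0   # no callers: emit the deny-all form
  echo "  rules:"
  for caller in "$@"; do
    ns=${caller%%/*}   # text before the first slash
    sa=${caller#*/}    # text after the first slash
    echo "  - from:"
    echo "    - source:"
    echo "        principals: [\"cluster.local/ns/${ns}/sa/${sa}\"]"
  done
}

render_currency_policy frontend/frontend checkout/checkout
```

Called with no arguments, the function emits the same deny-all form as currency-deny-all.yaml. Each argument adds one rule, and any one matching rule is enough to allow a request.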

Copy the final authorization policy to k8s-repo.

cp $WORKDIR/asm/k8s_manifests/prod/app-authorization/currency-allow-frontend-checkout.yaml \
$WORKDIR/k8s-repo/$OPS_GKE_1_CLUSTER/app-authorization/currency-policy.yaml
cp $WORKDIR/asm/k8s_manifests/prod/app-authorization/currency-allow-frontend-checkout.yaml \
$WORKDIR/k8s-repo/$OPS_GKE_2_CLUSTER/app-authorization/currency-policy.yaml

Push changes

cd $WORKDIR/k8s-repo 
git add . && git commit -am "AuthorizationPolicy - currency: allow frontend and checkout"
git push

View the status of the Ops project Cloud Build in a previously opened tab or by clicking the following link:

echo "https://console.cloud.google.com/cloud-build/builds?project=$TF_VAR_ops_project_name"

After the build finishes successfully, try to execute a checkout - it should work successfully.

This section walked through how to use Istio Authorization Policies to enforce granular access control at the per-service level. In production, you might create one AuthorizationPolicy per service, and (for instance) use an allow-all policy to let all workloads in the same namespace access each other.

12. Infrastructure Scaling

Objective: Scale infrastructure by adding a new region, project, and clusters.

Clone the infrastructure repo
Update the terraform files to create new resources
2 subnets in the new region (one for the ops project and one for the new project)
New ops cluster in new region (in the new subnet)
New Istio control plane for the new region
2 apps clusters in the new project in the new region
Commit to infrastructure repo
Verify installation

Copy-and-Paste Method Lab Instructions

There are a number of ways to scale a platform. You can add more compute by adding nodes to existing clusters. You can add more clusters in a region. Or you can add more regions to the platform. The decision on what aspect of the platform to scale depends upon the requirements. For example, if you have clusters in all three zones in a region, adding more nodes (or node pools) to the existing clusters may suffice. However, if you have clusters in two of three zones in a single region, then adding a new cluster in the third zone gives you scaling and an additional fault domain (i.e., a new zone). Another reason for adding a new cluster in a region might be the need to create a single-tenant cluster for regulatory or compliance reasons (for example PCI, or a database cluster that houses PII). As your business and services expand, adding new regions becomes inevitable to provide services closer to the clients.

The current platform consists of two regions and clusters in two zones per region. You can think of scaling the platform in two ways:

Vertically - within each region by adding more compute. This is done either by adding more nodes (or node pools) to existing clusters or by adding new clusters within the region. This is done via the infrastructure repo. The simplest path is adding nodes to existing clusters. No additional configuration is required. Adding new clusters may require additional subnets (and secondary ranges), adding appropriate firewall rules, adding the new clusters to the regional ASM/Istio service mesh control plane and deploying application resources to the new clusters.
Horizontally - by adding more regions. The current platform gives you a regional template. It consists of a regional ops cluster where the ASM/Istio control plane resides and two (or more) zonal application clusters where application resources are deployed.

In this workshop, you scale the platform "horizontally", as this encompasses the steps of the vertical use case as well. To scale the platform horizontally by adding a new region (r3), the following resources need to be added:

Subnets in the host project shared VPC in region r3 for the new ops and application clusters.
A regional ops cluster in region r3 where the ASM/Istio control plane resides.
Two zonal application clusters in two zones in region r3.
Update to the k8s-repo:
Deploy ASM/Istio control plane resources to the ops cluster in region r3.
Deploy ASM/Istio shared control plane resources to the app clusters in region r3.
While you don't need to create a new project, the steps in the workshop demonstrate adding a new project dev3 to cover the use case of adding a new team to the platform.

The infrastructure repo is used to add the new resources listed above.

In Cloud Shell, navigate to WORKDIR and clone the infrastructure repo.

mkdir -p $WORKDIR/infra-repo
cd $WORKDIR/infra-repo
git init && git remote add origin https://source.developers.google.com/p/${TF_ADMIN}/r/infrastructure
git config --local user.email ${MY_USER}
git config --local user.name "infra repo user"
git config --local credential.'https://source.developers.google.com'.helper gcloud.sh
git pull origin master

Clone the workshop source repo add-proj branch into the add-proj-repo directory.

cd $WORKDIR
git clone https://github.com/GoogleCloudPlatform/anthos-service-mesh-workshop.git add-proj-repo -b add-proj

Copy files from the add-proj branch in the source workshop repo. The add-proj branch contains the changes for this section.

cp -r $WORKDIR/add-proj-repo/infrastructure/* $WORKDIR/infra-repo/

Replace the infrastructure directory in the add-proj repo directory with a symlink to the infra-repo directory to allow the scripts on the branch to run.

rm -rf $WORKDIR/add-proj-repo/infrastructure
ln -s $WORKDIR/infra-repo $WORKDIR/add-proj-repo/infrastructure

Run the add-project.sh script to copy the shared states and vars to the new project directory structure.

$WORKDIR/add-proj-repo/scripts/add-project.sh app3 $WORKDIR/asm $WORKDIR/infra-repo

Commit and push changes to create new project

cd $WORKDIR/infra-repo
git add .
git status
git commit -m "add new project" && git push origin master

The commit triggers the infrastructure repo to deploy the infrastructure with the new resources. View the Cloud Build progress by opening the link printed by the following command and navigating to the latest build at the top.

echo "https://console.cloud.google.com/cloud-build/builds?project=${TF_ADMIN}"

The last step of the infrastructure Cloud Build creates new Kubernetes resources in the k8s-repo . This triggers the Cloud Build in the k8s-repo (in the ops project). The new Kubernetes resources are for the three new clusters added in the previous step. ASM/Istio control plane and shared control plane resources are added to the new clusters with the k8s-repo Cloud Build.

After the infrastructure Cloud Build successfully finishes, navigate to the k8s-repo latest Cloud Build run by clicking on the following output link.

echo "https://console.cloud.google.com/cloud-build/builds?project=${TF_VAR_ops_project_name}"

Run the following script to add the new clusters to the vars and kubeconfig file.

$WORKDIR/add-proj-repo/scripts/setup-gke-vars-kubeconfig-add-proj.sh $WORKDIR/asm

Change the KUBECONFIG variable to point to the new kubeconfig file.

source $WORKDIR/asm/vars/vars.sh
export KUBECONFIG=$WORKDIR/asm/gke/kubemesh

List your cluster contexts. You should see nine clusters.

kubectl config view -ojson | jq -r '.clusters[].name'

Output (do not copy)

gke_user001-200204-05-dev1-49tqc4_us-west1-a_gke-1-apps-r1a-prod
gke_user001-200204-05-dev1-49tqc4_us-west1-b_gke-2-apps-r1b-prod
gke_user001-200204-05-dev2-49tqc4_us-central1-a_gke-3-apps-r2a-prod
gke_user001-200204-05-dev2-49tqc4_us-central1-b_gke-4-apps-r2b-prod
gke_user001-200204-05-dev3-49tqc4_us-east1-b_gke-5-apps-r3b-prod
gke_user001-200204-05-dev3-49tqc4_us-east1-c_gke-6-apps-r3c-prod
gke_user001-200204-05-ops-49tqc4_us-central1_gke-asm-2-r2-prod
gke_user001-200204-05-ops-49tqc4_us-east1_gke-asm-3-r3-prod
gke_user001-200204-05-ops-49tqc4_us-west1_gke-asm-1-r1-prod
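Each name in the list follows the pattern gke_PROJECT_LOCATION_CLUSTER, so the location can be read from the third underscore-separated field. The following sketch uses that to confirm that every region, including the new one, has its three clusters. The function names are hypothetical, and it assumes that project IDs and cluster names contain no underscores.

```shell
# Split a cluster name of the form gke_PROJECT_LOCATION_CLUSTER and print
# its location (a zone for app clusters, a region for ops clusters).
cluster_location() {
  printf '%s\n' "$1" | cut -d_ -f3
}

# Count clusters per region from a list of names on stdin.
clusters_per_region() {
  cut -d_ -f3 |
    sed 's/-[a-z]$//' |   # drop a trailing zone letter: us-west1-a -> us-west1
    sort | uniq -c
}
```

To use it, pipe the output of the `kubectl config view` command above into `clusters_per_region`. With the nine clusters listed, each of the three regions should show a count of 3.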

Verify Istio Installation

Ensure Istio is installed on the new ops cluster by checking all pods are running and jobs have completed.

kubectl --context $OPS_GKE_3 get pods -n istio-system

Output (do not copy)

NAME                                      READY   STATUS    RESTARTS   AGE
grafana-5f798469fd-72g6w                  1/1     Running   0          5h12m
istio-citadel-7d8595845-hmmvj             1/1     Running   0          5h12m
istio-egressgateway-779b87c464-rw8bg      1/1     Running   0          5h12m
istio-galley-844ddfc788-zzpkl             2/2     Running   0          5h12m
istio-ingressgateway-59ccd6574b-xfj98     1/1     Running   0          5h12m
istio-pilot-7c8989f5cf-5plsg              2/2     Running   0          5h12m
istio-policy-6674bc7678-2shrk             2/2     Running   3          5h12m
istio-sidecar-injector-7795bb5888-kbl5p   1/1     Running   0          5h12m
istio-telemetry-5fd7cbbb47-c4q7b          2/2     Running   2          5h12m
istio-tracing-cd67ddf8-2qwkd              1/1     Running   0          5h12m
istiocoredns-5f7546c6f4-qhj9k             2/2     Running   0          5h12m
kiali-7964898d8c-l74ww                    1/1     Running   0          5h12m
prometheus-586d4445c7-x9ln6               1/1     Running   0          5h12m
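
To spot problems without reading the whole table, the pod listing can be filtered. This is a sketch under assumptions: unhealthy_pods is a hypothetical helper, and it treats a pod as healthy only when READY is n/n with STATUS Running, or when STATUS is Completed (finished jobs).

```shell
# Hypothetical helper: reads `kubectl get pods` table output on stdin and
# prints NAME, READY and STATUS for every pod that is not fully ready.
unhealthy_pods() {
  awk '
    NF < 3 || $1 == "NAME" { next }   # skip blank lines and the header row
    $3 == "Completed"      { next }   # finished jobs are fine
    {
      split($2, ready, "/")           # READY looks like "2/2"
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}

# Usage (assumes the context variables from vars.sh are set):
#   kubectl --context $OPS_GKE_3 get pods -n istio-system | unhealthy_pods
```

An empty result means every pod passed the check, as would be the case for the listing above.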

Ensure Istio is installed on both dev3 clusters. Only Citadel, sidecar-injector and coredns run in the dev3 clusters. They share an Istio control plane running in the ops-3 cluster.

kubectl --context $DEV3_GKE_1 get pods -n istio-system
kubectl --context $DEV3_GKE_2 get pods -n istio-system

Output (do not copy)

NAME                                      READY   STATUS    RESTARTS   AGE
istio-citadel-568747d88-4lj9b             1/1     Running   0          66s
istio-sidecar-injector-759bf6b4bc-ks5br   1/1     Running   0          66s
istiocoredns-5f7546c6f4-qbsqm             2/2     Running   0          78s

Verify service discovery for shared control planes

Verify the secrets are deployed in all ops clusters for all six application clusters.

kubectl --context $OPS_GKE_1 get secrets -l istio/multiCluster=true -n istio-system
kubectl --context $OPS_GKE_2 get secrets -l istio/multiCluster=true -n istio-system
kubectl --context $OPS_GKE_3 get secrets -l istio/multiCluster=true -n istio-system

Output (do not copy)

NAME                  TYPE     DATA   AGE
gke-1-apps-r1a-prod   Opaque   1      14h
gke-2-apps-r1b-prod   Opaque   1      14h
gke-3-apps-r2a-prod   Opaque   1      14h
gke-4-apps-r2b-prod   Opaque   1      14h
gke-5-apps-r3b-prod   Opaque   1      5h12m
gke-6-apps-r3c-prod   Opaque   1      5h12m
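
Since the same six secrets must exist in each of the three ops clusters, a small check can report any that are absent. A sketch only: missing_secrets is a hypothetical helper that takes the expected secret names as arguments and reads the kubectl table on stdin.

```shell
# Hypothetical helper: reads `kubectl get secrets` table output on stdin and
# prints each expected secret name (passed as arguments) that is absent.
missing_secrets() {
  found=$(awk 'NF > 0 && $1 != "NAME" { print $1 }')
  for name in "$@"; do
    # -x matches the whole line, -F treats the name as a literal string.
    printf '%s\n' "$found" | grep -qxF "$name" || echo "$name"
  done
}

# Usage (repeat for each ops cluster context):
#   kubectl --context $OPS_GKE_1 get secrets -l istio/multiCluster=true -n istio-system \
#     | missing_secrets gke-1-apps-r1a-prod gke-2-apps-r1b-prod gke-3-apps-r2a-prod \
#                       gke-4-apps-r2b-prod gke-5-apps-r3b-prod gke-6-apps-r3c-prod
```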

13. Circuit Breaking

Objective: Implement a Circuit Breaker for the shipping Service.

Create a DestinationRule for the shipping Service to implement a circuit breaker
Use fortio (a load gen utility) to validate circuit breaker for the shipping Service by force tripping the circuit

Fast Track Script Lab Instructions

Fast Track Script Lab is coming soon!!

Copy-and-Paste Method Lab Instructions

Now that we've learned some basic monitoring and troubleshooting strategies for Istio-enabled services, let's look at how Istio helps you improve the resilience of your services, reducing the amount of troubleshooting you'll have to do in the first place.

A microservices architecture introduces the risk of cascading failures, where the failure of one service can propagate to its dependencies, and the dependencies of those dependencies, causing a "ripple effect" outage that can potentially affect end-users. Istio provides a Circuit Breaker traffic policy to help you isolate services, protecting downstream (client-side) services from waiting on failing services, and protecting upstream (server-side) services from a sudden flood of downstream traffic when they do come back online. Overall, using Circuit Breakers can help you avoid all your services failing their SLOs because of one backend service that is hanging.

The Circuit Breaker pattern is named for an electrical switch that can "trip" when too much electricity flows through, protecting devices from overload. In an Istio setup, this means that Envoy is the circuit breaker, keeping track of the number of pending requests for a service. In the default closed state, requests flow through Envoy uninterrupted.

But when the number of pending requests exceeds your defined threshold, the circuit breaker trips (opens), and Envoy immediately returns an error. This allows the server to fail fast for the client, and prevents the server application code from receiving the client's request when overloaded.

Then, after your defined timeout, Envoy moves to a half open state, where the server can start receiving requests again in a probationary way, and if it can successfully respond to requests, the circuit breaker closes again, and requests to the server begin to flow again.
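
The closed, open and half-open states described above can be sketched as a small transition function. This is a conceptual illustration only and does not reflect how Envoy is implemented; the function name and the event names (overload, timeout, success, failure) are invented for this sketch.

```shell
# Conceptual sketch only: prints the next circuit breaker state for a given
# current state and event. Unknown combinations leave the state unchanged.
next_state() {
  state="$1"
  event="$2"
  case "$state:$event" in
    closed:overload)   echo open ;;       # threshold exceeded, trip the breaker
    open:timeout)      echo half-open ;;  # wait period elapsed, allow a trial
    half-open:success) echo closed ;;     # trial request succeeded
    half-open:failure) echo open ;;       # trial request failed, trip again
    *)                 echo "$state" ;;
  esac
}

# Example: walk one full cycle.
#   next_state closed overload     # open
#   next_state open timeout        # half-open
#   next_state half-open success   # closed
```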

This diagram summarizes the Istio circuit breaker pattern. The blue rectangles represent Envoy, the blue-filled circle represents the client, and the white-filled circles represent the server container:

You can define Circuit Breaker policies using Istio DestinationRules. In this section, we'll apply the following policy to enforce a circuit breaker for the shipping service:

Output (do not copy)

apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "shippingservice-shipping-destrule"
  namespace: "shipping"
spec:
  host: "shippingservice.shipping.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 10s
      maxEjectionPercent: 100

There are two DestinationRule fields to note here. connectionPool defines the number of connections this service will allow. The outlierDetection field is where we configure how Envoy will determine the threshold at which to open the circuit breaker. Here, every second (interval), Envoy will count the number of errors it received from the server container. If it exceeds the consecutiveErrors threshold, the Envoy circuit breaker will open, and 100% of shippingservice pods will be shielded from new client requests for 10 seconds. Once the Envoy circuit breaker is open (i.e. active), clients will receive 503 (Service Unavailable) errors. Let's see this in action.

Set environment variables for the k8s-repo and asm dir to simplify commands.

export K8S_REPO="${WORKDIR}/k8s-repo"
export ASM="${WORKDIR}/asm"

Update the k8s-repo

cd $WORKDIR/k8s-repo
git pull
cd $WORKDIR

Update the shipping service DestinationRule on both Ops clusters.

cp $ASM/k8s_manifests/prod/istio-networking/app-shipping-circuit-breaker.yaml ${K8S_REPO}/${OPS_GKE_1_CLUSTER}/istio-networking/app-shipping-circuit-breaker.yaml
cp $ASM/k8s_manifests/prod/istio-networking/app-shipping-circuit-breaker.yaml ${K8S_REPO}/${OPS_GKE_2_CLUSTER}/istio-networking/app-shipping-circuit-breaker.yaml

cd ${K8S_REPO}/${OPS_GKE_1_CLUSTER}/istio-networking/; kustomize edit add resource app-shipping-circuit-breaker.yaml
cd ${K8S_REPO}/${OPS_GKE_2_CLUSTER}/istio-networking/; kustomize edit add resource app-shipping-circuit-breaker.yaml

Copy a Fortio load generator pod into the GKE_1 cluster in the Dev1 region. This is the client pod we'll use to "trip" the circuit breaker for shippingservice.

cp $ASM/k8s_manifests/prod/app/deployments/app-fortio.yaml ${K8S_REPO}/${DEV1_GKE_1_CLUSTER}/app/deployments/
cd ${K8S_REPO}/${DEV1_GKE_1_CLUSTER}/app/deployments; kustomize edit add resource app-fortio.yaml

Commit changes.

cd $K8S_REPO 
git add . && git commit -am "Circuit Breaker: shippingservice"
git push
cd $ASM

Wait for Cloud Build to complete.
Back in Cloud Shell, use the fortio pod to send gRPC traffic to shippingservice with 1 concurrent connection, 1000 requests total - this will not trip the circuit breaker, because we have not exceeded the connectionPool settings yet.

FORTIO_POD=$(kubectl --context ${DEV1_GKE_1} get pod -n shipping | grep fortio | awk '{ print $1 }')

kubectl --context ${DEV1_GKE_1} exec -it $FORTIO_POD -n shipping -c fortio -- /usr/bin/fortio load -grpc -c 1 -n 1000 -qps 0 shippingservice.shipping.svc.cluster.local:50051

Output (do not copy)

Health SERVING : 1000
All done 1000 calls (plus 0 warmup) 4.968 ms avg, 201.2 qps

Now run fortio again, increasing the number of concurrent connections to 2, but keeping the total number of requests constant. We should see up to two-thirds of the requests return an "overflow" error, because the circuit breaker has been tripped: in the policy we defined, connectionPool allows only 1 concurrent connection and 1 pending request.

kubectl --context ${DEV1_GKE_1} exec -it $FORTIO_POD -n shipping -c fortio -- /usr/bin/fortio load -grpc -c 2 -n 1000 -qps 0 shippingservice.shipping.svc.cluster.local:50051

Output (do not copy)

18:46:16 W grpcrunner.go:107> Error making grpc call: rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: overflow
...

Health ERROR : 625
Health SERVING : 375
All done 1000 calls (plus 0 warmup) 12.118 ms avg, 96.1 qps
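
To relate the Health counts to the "up to two-thirds" figure, the error rate can be computed from the fortio summary. A minimal sketch, assuming the output contains "Health <STATUS> : <count>" lines as shown above; fortio_error_percent is a hypothetical helper name.

```shell
# Hypothetical helper: reads fortio gRPC health-check output on stdin and
# prints the percentage of calls that did not return SERVING.
fortio_error_percent() {
  awk '
    /^Health/ {
      # Lines look like "Health ERROR : 625"; the count is the last field.
      total += $NF
      if ($2 != "SERVING") errors += $NF
    }
    END {
      if (total > 0) printf "%.1f\n", 100 * errors / total
      else print "no Health lines found"
    }'
}

# Usage: save or pipe the fortio output, for example
#   fortio_error_percent < fortio-run.txt
```

For the sample output above (625 errors out of 1000 calls) this prints 62.5.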

Envoy uses the upstream_rq_pending_overflow metric to track the number of requests it rejected while the circuit breaker was active. Let's find this in the fortio pod:

kubectl --context ${DEV1_GKE_1} exec -it $FORTIO_POD -n shipping -c istio-proxy  -- sh -c 'curl localhost:15000/stats' | grep shipping | grep pending

Output (do not copy)

cluster.outbound|50051||shippingservice.shipping.svc.cluster.local.circuit_breakers.default.rq_pending_open: 0
cluster.outbound|50051||shippingservice.shipping.svc.cluster.local.circuit_breakers.high.rq_pending_open: 0
cluster.outbound|50051||shippingservice.shipping.svc.cluster.local.upstream_rq_pending_active: 0
cluster.outbound|50051||shippingservice.shipping.svc.cluster.local.upstream_rq_pending_failure_eject: 9
cluster.outbound|50051||shippingservice.shipping.svc.cluster.local.upstream_rq_pending_overflow: 565
cluster.outbound|50051||shippingservice.shipping.svc.cluster.local.upstream_rq_pending_total: 1433
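
If you only need the overflow counter, it can be pulled out of the stats text directly. A sketch under assumptions: pending_overflow is a hypothetical helper, and it expects the "name: value" stats format shown above.

```shell
# Hypothetical helper: reads Envoy /stats text on stdin and prints the value
# of upstream_rq_pending_overflow for clusters whose name contains $1.
pending_overflow() {
  awk -v svc="$1" '
    index($0, svc) && /upstream_rq_pending_overflow: / {
      sub(/\r$/, "")   # drop a trailing carriage return if a TTY added one
      print $NF        # the value is the last whitespace-separated field
    }'
}

# Usage (the -it flags are left out so no TTY is allocated for the pipe):
#   kubectl --context ${DEV1_GKE_1} exec $FORTIO_POD -n shipping -c istio-proxy \
#     -- sh -c 'curl -s localhost:15000/stats' | pending_overflow shippingservice
```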

Clean up by removing the circuit breaker policy from both regions.

kubectl --context ${OPS_GKE_1} delete destinationrule shippingservice-circuit-breaker -n shipping 
rm ${K8S_REPO}/${OPS_GKE_1_CLUSTER}/istio-networking/app-shipping-circuit-breaker.yaml
cd ${K8S_REPO}/${OPS_GKE_1_CLUSTER}/istio-networking/; kustomize edit remove resource app-shipping-circuit-breaker.yaml
 

kubectl --context ${OPS_GKE_2} delete destinationrule shippingservice-circuit-breaker -n shipping 
rm ${K8S_REPO}/${OPS_GKE_2_CLUSTER}/istio-networking/app-shipping-circuit-breaker.yaml
cd ${K8S_REPO}/${OPS_GKE_2_CLUSTER}/istio-networking/; kustomize edit remove resource app-shipping-circuit-breaker.yaml
cd $K8S_REPO; git add .; git commit -m "Circuit Breaker: cleanup"; git push origin master

This section demonstrated how to set up a single circuit breaker policy for a service. A best practice is to set up a circuit breaker for any upstream (backend) service that has the potential to hang. By applying Istio circuit breaker policies, you help isolate your microservices, build fault tolerance into your architecture, and reduce the risk of cascading failures under high load.

14. Fault Injection

Objective: Test the resilience of the recommendation Service by introducing delays (before it is pushed to production).

Create a VirtualService for the recommendation Service to introduce a 5s delay
Test the delay using fortio load generator
Remove the delay in the VirtualService and validate

Fast Track Script Lab Instructions

Fast Track Script Lab is coming soon!!

Copy-and-Paste Method Lab Instructions

Adding circuit breaker policies to your services is one way to build resilience into services in production. But circuit breaking results in faults, potentially user-facing errors, which is not ideal. To get ahead of these error cases, and better predict how your downstream services might respond when backends do return errors, you can adopt chaos testing in a staging environment. Chaos testing is the practice of deliberately breaking your services, in order to analyze weak points in the system and improve fault tolerance. You can also use chaos testing to identify ways to mitigate user-facing errors when backends fail - for instance, by displaying a cached result in a frontend.

Using Istio for fault injection is helpful because you can use your production release images, and add the fault at the network layer, instead of modifying source code. In production, you might use a full-fledged chaos testing tool to test resilience at the Kubernetes/compute layer in addition to the network layer.

You can use Istio for chaos testing by applying a VirtualService with the "fault" field. Istio supports two kinds of faults: delay faults (inject a timeout) and abort faults (inject HTTP errors). In this example, we'll inject a 5-second delay fault into the recommendation service. But this time instead of using a circuit breaker to "fail fast" against this hanging service, we will force downstream services to endure the full timeout.
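
The workshop only exercises the delay fault. For comparison, the sketch below prints what an abort fault might look like, using the fault.abort fields (percentage.value and httpStatus) of the VirtualService API. The render_abort_fault function and the resource name recommendation-abort-fault are hypothetical and do not exist in the workshop repo.

```shell
# Hypothetical generator: prints a VirtualService that aborts a percentage
# of requests to a host with a given HTTP status. Illustration only.
render_abort_fault() {
  host="$1"; percent="$2"; status="$3"
  cat <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation-abort-fault
spec:
  hosts:
  - ${host}
  http:
  - route:
    - destination:
        host: ${host}
    fault:
      abort:
        percentage:
          value: ${percent}
        httpStatus: ${status}
EOF
}

# Usage: abort half of the requests with a 503.
#   render_abort_fault recommendationservice.recommendation.svc.cluster.local 50 503
```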

Navigate into the fault injection directory.

export K8S_REPO="${WORKDIR}/k8s-repo"
export ASM="${WORKDIR}/asm"
cd $ASM

Open k8s_manifests/prod/istio-networking/app-recommendation-vs-fault.yaml to inspect its contents. Notice that Istio has an option to inject the fault into a percentage of the requests - here, we'll introduce the delay into all recommendationservice requests.

Output (do not copy)

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation-delay-fault
spec:
  hosts:
  - recommendationservice.recommendation.svc.cluster.local
  http:
  - route:
    - destination:
        host: recommendationservice.recommendation.svc.cluster.local
    fault:
      delay:
        percentage:
          value: 100
        fixedDelay: 5s

Copy the VirtualService into k8s_repo. We'll inject the fault globally, across both regions.

cp $ASM/k8s_manifests/prod/istio-networking/app-recommendation-vs-fault.yaml ${K8S_REPO}/${OPS_GKE_1_CLUSTER}/istio-networking/app-recommendation-vs-fault.yaml
cd ${K8S_REPO}/${OPS_GKE_1_CLUSTER}/istio-networking/; kustomize edit add resource app-recommendation-vs-fault.yaml

cp $ASM/k8s_manifests/prod/istio-networking/app-recommendation-vs-fault.yaml ${K8S_REPO}/${OPS_GKE_2_CLUSTER}/istio-networking/app-recommendation-vs-fault.yaml
cd ${K8S_REPO}/${OPS_GKE_2_CLUSTER}/istio-networking/; kustomize edit add resource app-recommendation-vs-fault.yaml

Push changes

cd $K8S_REPO 
git add . && git commit -am "Fault Injection: recommendationservice"
git push
cd $ASM

Wait for Cloud Build to complete.
Exec into the fortio pod deployed in the circuit breaker section, and send some traffic to recommendationservice.

FORTIO_POD=$(kubectl --context ${DEV1_GKE_1} get pod -n shipping | grep fortio | awk '{ print $1 }')

kubectl --context ${DEV1_GKE_1} exec -it $FORTIO_POD -n shipping -c fortio -- /usr/bin/fortio load -grpc -c 100 -n 100 -qps 0 recommendationservice.recommendation.svc.cluster.local:8080

Once the fortio command is complete, you should see responses averaging 5s:

Output (do not copy)

Ended after 5.181367359s : 100 calls. qps=19.3
Aggregated Function Time : count 100 avg 5.0996506 +/- 0.03831 min 5.040237641 max 5.177559818 sum 509.965055
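
To confirm the injected delay without reading the numbers by hand, the average can be extracted from the summary line. A sketch under assumptions: check_avg_delay is a hypothetical helper, and it relies on the "Aggregated Function Time : ... avg <seconds> ..." line format shown above.

```shell
# Hypothetical helper: reads fortio output on stdin, extracts the average
# from the "Aggregated Function Time" line, and checks that it is at least
# the minimum delay (in seconds) passed as the first argument.
check_avg_delay() {
  awk -v min="$1" '
    /^Aggregated Function Time/ {
      # The average is the field that follows the literal word "avg".
      for (i = 1; i < NF; i++) if ($i == "avg") avg = $(i + 1)
    }
    END {
      if (avg == "") { print "no average found"; exit 1 }
      if (avg + 0 >= min + 0) { print "OK: avg " avg "s"; exit 0 }
      print "TOO FAST: avg " avg "s < " min "s"; exit 1
    }'
}

# Usage: with the 5s fault applied, expect an average of at least 5 seconds.
#   check_avg_delay 5 < fortio-run.txt
```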

Another way to see the fault we injected in action is to open the frontend in a web browser, and click on any product. A product page should take 5 extra seconds to load, since it fetches the recommendations that are displayed at the bottom of the page.
Clean up by removing the fault injection service from both Ops clusters.

kubectl --context ${OPS_GKE_1} delete virtualservice recommendation-delay-fault -n recommendation 
rm ${K8S_REPO}/${OPS_GKE_1_CLUSTER}/istio-networking/app-recommendation-vs-fault.yaml
cd ${K8S_REPO}/${OPS_GKE_1_CLUSTER}/istio-networking/; kustomize edit remove resource app-recommendation-vs-fault.yaml

kubectl --context ${OPS_GKE_2} delete virtualservice recommendation-delay-fault -n recommendation 
rm ${K8S_REPO}/${OPS_GKE_2_CLUSTER}/istio-networking/app-recommendation-vs-fault.yaml
cd ${K8S_REPO}/${OPS_GKE_2_CLUSTER}/istio-networking/; kustomize edit remove resource app-recommendation-vs-fault.yaml

Push changes:

cd $K8S_REPO 
git add . && git commit -am "Fault Injection cleanup / restore"
git push
cd $ASM

15. Monitoring the Istio Control Plane

ASM installs four important control plane components: Pilot, Mixer, Galley and Citadel. Each sends its relevant monitoring metrics to Prometheus, and ASM ships with Grafana dashboards that let operators visualize this monitoring data and assess the health and performance of the control plane.

Viewing the Dashboards

Port-forward your Grafana service installed with Istio

kubectl --context ${OPS_GKE_1} -n istio-system port-forward svc/grafana 3000:3000 >> /dev/null

Open Grafana in your browser
Click on the "Web Preview" icon on the top right corner of your Cloud Shell Window
Click Preview on port 3000 (Note: if the port is not 3000, click on change port and select port 3000)
This will open a tab in your browser with a URL similar to "BASE_URL/?orgId=1&authuser=0&environment_id=default"
View available dashboards
Modify the URL to "BASE_URL/dashboard"
Click on "istio" folder to view available dashboards
Click on any of those dashboards to view the performance of that component. We'll look at the important metrics for each component in the following sections.

Monitoring Pilot

Pilot is the control plane component that distributes networking and policy configuration to the data plane (the Envoy proxies). Pilot tends to scale with the number of workloads and deployments, although not necessarily with the amount of traffic to those workloads. An unhealthy Pilot can:

consume more resources than necessary (CPU and/or RAM)
result in delays in pushing updated configuration information to Envoys

Note: if Pilot is down, or if there are delays, your workloads still serve traffic.

Navigate to "BASE_URL/dashboard/db/istio-pilot-dashboard" in your browser to view Pilot metrics.

Important monitored metrics

Resource Usage

Use the Istio Performance and Scalability page as your guide for acceptable usage numbers. Contact GCP support if you see significantly more sustained resource usage than this.

Pilot Push Information

This section monitors Pilot's pushes of configuration to your Envoy proxies.

Pilot Pushes shows the type of configuration pushed at any given time.
ADS Monitoring shows the number of Virtual Services, Services and Connected Endpoints in the system.
Clusters with no known endpoints shows endpoints that have been configured but do not have any instances running (which may indicate external services, such as *.googleapis.com).
Pilot Errors shows the number of errors encountered over time.
Conflicts shows the number of conflicts, which are ambiguous configurations on listeners.

If you have Errors or Conflicts, you have bad or inconsistent configuration for one or more of your services. See Troubleshooting the data plane for information.

Envoy Information

This section contains information about the Envoy proxies contacting the control plane. Contact GCP support if you see repeated XDS Connection Failures.

Monitoring Mixer

Mixer is the component that funnels telemetry from the Envoy proxies to telemetry backends (typically Prometheus, Stackdriver, etc). In this capacity, it is not in the data plane. It is deployed as two Kubernetes Deployments (both running Mixer) with two different service names (istio-telemetry and istio-policy).

Mixer can also be used to integrate with policy systems. In this capacity, Mixer does affect the data plane, as policy checks to Mixer that fail block access to your services.

Mixer tends to scale with volume of traffic.

Navigate to "BASE_URL/dashboard/db/istio-mixer-dashboard" in your browser to view Mixer metrics.

Important monitored metrics

Resource Usage

Use the Istio Performance and Scalability page as your guide for acceptable usage numbers. Contact GCP support if you see significantly more sustained resource usage than this.

Mixer Overview

Response Duration is an important metric. While reports to Mixer telemetry are not in the datapath, if these latencies are high it will definitely slow down sidecar proxy performance. You should expect the 90th percentile to be in the single-digit milliseconds, and the 99th percentile to be under 100ms.

Adapter Dispatch Duration indicates the latency Mixer is experiencing in calling adapters (through which it sends information to telemetry and logging systems). High latencies here will absolutely affect performance on the mesh. Again, p90 latencies should be under 10ms.

Monitoring Galley

Galley is Istio's configuration validation, ingestion, processing and distribution component. It conveys configuration from the Kubernetes API server to Pilot. Like Pilot, it tends to scale with the number of services and endpoints in the system.

Navigate to "BASE_URL/dashboard/db/istio-galley-dashboard" in your browser to view Galley metrics.

Important monitored metrics

Resource Validation

This is the most important metric to follow. It indicates the number of resources of various types, such as DestinationRules, Gateways and ServiceEntries, that are passing or failing validation.

Connected clients

Indicates how many clients are connected to Galley; typically this will be 3 (pilot, istio-telemetry, istio-policy) and will scale as those components scale.

16. Troubleshooting Istio

Troubleshooting the data plane

If your Pilot dashboard indicates that you have configuration issues, you should examine Pilot logs or use istioctl to find configuration problems.

To examine Pilot logs, run kubectl -n istio-system logs istio-pilot-69db46c598-45m44 discovery, replacing istio-pilot-... with the pod identifier for the Pilot instance you want to troubleshoot.

In the resulting log, search for a Push Status message. For example:

2019-11-07T01:16:20.451967Z        info        ads        Push Status: {
    "ProxyStatus": {
        "pilot_conflict_outbound_listener_tcp_over_current_tcp": {
            "0.0.0.0:443": {
                "proxy": "cartservice-7555f749f-k44dg.hipster",
                "message": "Listener=0.0.0.0:443 AcceptedTCP=accounts.google.com,*.googleapis.com RejectedTCP=edition.cnn.com TCPServices=2"
            }
        },
        "pilot_duplicate_envoy_clusters": {
            "outbound|15443|httpbin|istio-egressgateway.istio-system.svc.cluster.local": {
                "proxy": "sleep-6c66c7765d-9r85f.default",
                "message": "Duplicate cluster outbound|15443|httpbin|istio-egressgateway.istio-system.svc.cluster.local found while pushing CDS"
            },
            "outbound|443|httpbin|istio-egressgateway.istio-system.svc.cluster.local": {
                "proxy": "sleep-6c66c7765d-9r85f.default",
                "message": "Duplicate cluster outbound|443|httpbin|istio-egressgateway.istio-system.svc.cluster.local found while pushing CDS"
            },
            "outbound|80|httpbin|istio-egressgateway.istio-system.svc.cluster.local": {
                "proxy": "sleep-6c66c7765d-9r85f.default",
                "message": "Duplicate cluster outbound|80|httpbin|istio-egressgateway.istio-system.svc.cluster.local found while pushing CDS"
            }
        },
        "pilot_eds_no_instances": {
            "outbound_.80_._.frontend-external.hipster.svc.cluster.local": {},
            "outbound|443||*.googleapis.com": {},
            "outbound|443||accounts.google.com": {},
            "outbound|443||metadata.google.internal": {},
            "outbound|80||*.googleapis.com": {},
            "outbound|80||accounts.google.com": {},
            "outbound|80||frontend-external.hipster.svc.cluster.local": {},
            "outbound|80||metadata.google.internal": {}
        },
        "pilot_no_ip": {
            "loadgenerator-778c8489d6-bc65d.hipster": {
                "proxy": "loadgenerator-778c8489d6-bc65d.hipster"
            }
        }
    },
    "Version": "o1HFhx32U4s="
}

The Push Status will indicate any issues that occurred when trying to push the configuration to Envoy proxies – in this case, we see several "Duplicate cluster" messages, which indicate duplicate upstream destinations.
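
In a long log it can help to list only which issue types appear in the Push Status. A minimal sketch, assuming the issue types are JSON keys that start with pilot_ as in the example above; push_status_issue_types is a hypothetical helper. It uses grep rather than a JSON parser because the log line begins with a timestamp and is not valid JSON on its own.

```shell
# Hypothetical helper: reads Pilot log text on stdin and prints each distinct
# pilot_* issue type that appears as a JSON object key.
push_status_issue_types() {
  grep -o '"pilot_[a-z_]*": {' | cut -d'"' -f2 | LC_ALL=C sort -u
}

# Usage (replace the pod name as described above):
#   kubectl -n istio-system logs istio-pilot-69db46c598-45m44 discovery \
#     | push_status_issue_types
```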

For assistance in diagnosing problems, contact Google Cloud support.

Finding configuration errors

To analyze your configuration with istioctl, run istioctl experimental analyze -k --context $OPS_GKE_1. This analyzes the configuration in your system and reports any problems, along with suggested changes. See the documentation for a full list of configuration errors that this command can detect.

17. Cleanup

An administrator runs the cleanup_workshop.sh script to delete resources created by the bootstrap_workshop.sh script. You need the following pieces of information for the cleanup script to run.

Organization name - for example yourcompany.com
Workshop ID - in the form YYMMDD-NN for example 200131-01
Admin GCS bucket - defined in the bootstrap script.

Open Cloud Shell by clicking the link below. Perform all of the following actions in Cloud Shell.

CLOUD SHELL

Verify you are logged into gcloud with the intended Admin user.

gcloud config list

Navigate to the asm folder.

cd ${WORKDIR}/asm

Define the organization name, the ID of the workshop to be deleted, and the admin Cloud Storage bucket.

export ORGANIZATION_NAME=<ORGANIZATION NAME>
export ASM_WORKSHOP_ID=<WORKSHOP ID>
export ADMIN_STORAGE_BUCKET=<ADMIN CLOUD STORAGE BUCKET>

Run the cleanup script as follows.

./scripts/cleanup_workshop.sh --workshop-id ${ASM_WORKSHOP_ID} --admin-gcs-bucket ${ADMIN_STORAGE_BUCKET} --org-name ${ORGANIZATION_NAME}
