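TensorFlow.js - Making predictions from 2D data

1. Introduction

In this codelab you will train a model to make predictions from numerical data describing a set of cars.

This exercise demonstrates steps common to training many different kinds of models, but uses a small dataset and a simple (shallow) model. The primary aim is to help you get familiar with the basic terminology, concepts and syntax around training models with TensorFlow.js, and to provide a stepping stone for further exploration and learning.

Because we are training a model to predict continuous numbers, this task is sometimes referred to as a regression task. We will train the model by showing it many examples of inputs along with the correct output. This is referred to as supervised learning.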

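What you will build

You will make a webpage that uses TensorFlow.js to train a model in the browser. Given "Horsepower" for a car, the model will learn to predict "Miles per Gallon" (MPG).

To do this you will:

Load the data and prepare it for training.
Define the architecture of the model.
Train the model and monitor its performance as it trains.
Evaluate the trained model by making some predictions.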

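What you'll learn

Best practices for data preparation for machine learning, including shuffling and normalization.
TensorFlow.js syntax for creating models using the tf.layers API.
How to monitor in-browser training using the tfjs-vis library.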

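What you'll need

A recent version of Chrome or another modern browser.
A text editor, either running locally on your machine or on the web via something like Codepen or Glitch.
Knowledge of HTML, CSS, JavaScript, and Chrome DevTools (or your preferred browser's devtools).
A high-level conceptual understanding of neural networks. If you need an introduction or refresher, consider watching the video by 3blue1brown or the video "Deep Learning in JavaScript" by Ashi Krishnan.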

2. Get set up

Create an HTML page and include the JavaScript

Copy the following code into an HTML file called index.html

<!DOCTYPE html>
<html>
<head>
  <title>TensorFlow.js Tutorial</title>

  <!-- Import TensorFlow.js -->
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
  <!-- Import tfjs-vis -->
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-vis@1.0.2/dist/tfjs-vis.umd.min.js"></script>
</head>
<body>
  <!-- Import the main script file -->
  <script src="script.js"></script>
</body>
</html>

Create the JavaScript file for the code

In the same folder as the HTML file above, create a file called script.js and put the following code in it.

console.log('Hello TensorFlow');

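Test it out

Now that you have the HTML and JavaScript files created, test them out. Open the index.html file in your browser and open the devtools console.

If everything is working, two global variables should have been created and be available in the devtools console:

tf is a reference to the TensorFlow.js library
tfvis is a reference to the tfjs-vis library

You should also see a message that says Hello TensorFlow in the console output. If so, you are ready to move on to the next step.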

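3. Load, format and visualize the input data

As a first step, let us load, format and visualize the data we want to train the model on.

We will load the "cars" dataset. It contains many different features about each given car. For this tutorial, we want to extract only the data about Horsepower and Miles per Gallon.

Add the following code to your script.js file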

/**
 * Get the car data reduced to just the variables we are interested
 * and cleaned of missing data.
 */
async function getData() {
  const carsDataResponse = await fetch('https://storage.googleapis.com/tfjs-tutorials/carsData.json');
  const carsData = await carsDataResponse.json();
  const cleaned = carsData.map(car => ({
    mpg: car.Miles_per_Gallon,
    horsepower: car.Horsepower,
  }))
  .filter(car => (car.mpg != null && car.horsepower != null));

  return cleaned;
}

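This also removes any entries that do not have either miles per gallon or horsepower defined. Let's also plot this data in a scatterplot to see what it looks like.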
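To make the cleaning step above concrete, here is a minimal sketch that applies the same map-then-filter logic to a few in-memory records instead of the fetched file. The sample records and the helper name cleanCars are hypothetical and exist only for illustration.

```javascript
// Hypothetical sample records, shaped like the entries in carsData.json.
const sampleCars = [
  {Name: 'car a', Miles_per_Gallon: 18, Horsepower: 130},
  {Name: 'car b', Miles_per_Gallon: null, Horsepower: 115},
  {Name: 'car c', Miles_per_Gallon: 25, Horsepower: null},
  {Name: 'car d', Miles_per_Gallon: 16, Horsepower: 150},
];

// Same map-then-filter steps as getData(), applied to the in-memory sample.
function cleanCars(cars) {
  return cars
    .map(car => ({mpg: car.Miles_per_Gallon, horsepower: car.Horsepower}))
    // `!= null` (loose inequality) rejects both null and undefined.
    .filter(car => car.mpg != null && car.horsepower != null);
}

const cleanedSample = cleanCars(sampleCars);
console.log(cleanedSample.length); // 2
```

Only the records where both values are present survive, which is why the loose `!= null` comparison is used rather than a strict check against a single value.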

Add the following code to the bottom of your script.js file.

async function run() {
  // Load and plot the original input data that we are going to train on.
  const data = await getData();
  const values = data.map(d => ({
    x: d.horsepower,
    y: d.mpg,
  }));

  tfvis.render.scatterplot(
    {name: 'Horsepower v MPG'},
    {values},
    {
      xLabel: 'Horsepower',
      yLabel: 'MPG',
      height: 300
    }
  );

  // More code will be added below
}

document.addEventListener('DOMContentLoaded', run);

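Refresh the page. You should see a panel on the left-hand side of the page with a scatterplot of the data.

This panel is known as the visor and is provided by tfjs-vis. It provides a convenient place to display visualizations.

Generally, when working with data it is a good idea to find ways to take a look at your data and clean it if necessary. In this case, we had to remove entries from carsData that did not have all the required fields. Visualizing the data can give us a sense of whether there is any structure in the data that the model can learn.

The scatterplot shows a negative correlation between horsepower and MPG; that is, as horsepower goes up, cars generally get fewer miles per gallon.

Conceptualize our task

Our input data will now look like this.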

...
{
  "mpg":15,
  "horsepower":165,
},
{
  "mpg":18,
  "horsepower":150,
},
{
  "mpg":16,
  "horsepower":150,
},
...

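Our goal is to train a model that takes one number, Horsepower, and learns to predict one number, Miles per Gallon. Remember that one-to-one mapping, as it will be important for the next section.

We are going to feed these examples, the horsepower and the MPG, to a neural network that will learn from them a formula (or function) to predict MPG given horsepower. Learning from examples for which we have the correct answers is called supervised learning.

4. Define the model architecture

In this section we will write code to describe the model architecture. Model architecture is just a fancy way of saying "which functions will the model run when it is executing", or alternatively "what algorithm will our model use to compute its answers".

ML models are algorithms that take an input and produce an output. When using neural networks, the algorithm is a set of layers of neurons with "weights" (numbers) governing their output. The training process learns the ideal values for those weights.

Add the following function to your script.js file to define the model architecture.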

function createModel() {
  // Create a sequential model
  const model = tf.sequential();

  // Add a single input layer
  model.add(tf.layers.dense({inputShape: [1], units: 1, useBias: true}));

  // Add an output layer
  model.add(tf.layers.dense({units: 1, useBias: true}));

  return model;
}

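This is one of the simplest models we can define in tensorflow.js. Let us break down each line a bit.

Instantiate the model

const model = tf.sequential();

This instantiates a tf.Sequential model object (a kind of tf.LayersModel). The model is sequential because its inputs flow straight down to its output. Other kinds of models can have branches, or even multiple inputs and outputs, but in many cases your models will be sequential. Sequential models also have an easier-to-use API.

Add layers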

model.add(tf.layers.dense({inputShape: [1], units: 1, useBias: true}));

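This adds an input layer to our network, which is automatically connected to a dense layer with one hidden unit. A dense layer is a type of layer that multiplies its inputs by a matrix (called weights) and then adds a number (called the bias) to the result. As this is the first layer of the network, we need to define our inputShape. The inputShape is [1] because we have 1 number as our input (the horsepower of a given car).

units sets how big the weight matrix will be in the layer. By setting it to 1 here we are saying there will be 1 weight for each of the input features of the data.

model.add(tf.layers.dense({units: 1, useBias: true}));

The code above creates our output layer. We set units to 1 because we want to output 1 number.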
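The arithmetic a single-unit dense layer performs can be sketched in plain JavaScript. This is an illustration only, not how TensorFlow.js implements layers; the function names and the weight and bias values are made up, whereas in the real model the weights and biases are learned during training.

```javascript
// A dense layer with one input feature and one unit computes
// output = input * weight + bias.
function denseUnit(input, weight, bias) {
  return input * weight + bias;
}

// Two stacked single-unit layers with no activation function,
// mirroring the shape of createModel(). Weights and biases are made up.
function twoLayerModel(x) {
  const hidden = denseUnit(x, 2, 1);   // hidden = 2x + 1
  return denseUnit(hidden, -0.5, 3);   // output = -0.5 * hidden + 3
}

// The composition is still a straight line: -0.5 * (2x + 1) + 3 = -x + 2.5
console.log(twoLayerModel(0), twoLayerModel(1)); // 2.5 1.5
```

Because neither layer applies an activation function, stacking them still yields a straight line, which is consistent with the model behaving as a linear regression later in the codelab.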

Create an instance

Add the following code to the run function we defined earlier.

// Create the model
const model = createModel();
tfvis.show.modelSummary({name: 'Model Summary'}, model);

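This creates an instance of the model and shows a summary of the layers on the webpage.

5. Prepare the data for training

To get the performance benefits of TensorFlow.js that make training machine learning models practical, we need to convert our data to tensors. We will also perform a number of transformations on our data that are best practices, namely shuffling and normalization.

Add the following code to your script.js file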

/**
 * Convert the input data to tensors that we can use for machine
 * learning. We will also do the important best practices of _shuffling_
 * the data and _normalizing_ the data.
 */
function convertToTensor(data) {
  // Wrapping these calculations in a tidy will dispose any
  // intermediate tensors.

  return tf.tidy(() => {
    // Step 1. Shuffle the data
    tf.util.shuffle(data);

    // Step 2. Convert data to Tensor
    const inputs = data.map(d => d.horsepower)
    const labels = data.map(d => d.mpg);

    const inputTensor = tf.tensor2d(inputs, [inputs.length, 1]);
    const labelTensor = tf.tensor2d(labels, [labels.length, 1]);

    //Step 3. Normalize the data to the range 0 - 1 using min-max scaling
    const inputMax = inputTensor.max();
    const inputMin = inputTensor.min();
    const labelMax = labelTensor.max();
    const labelMin = labelTensor.min();

    const normalizedInputs = inputTensor.sub(inputMin).div(inputMax.sub(inputMin));
    const normalizedLabels = labelTensor.sub(labelMin).div(labelMax.sub(labelMin));

    return {
      inputs: normalizedInputs,
      labels: normalizedLabels,
      // Return the min/max bounds so we can use them later.
      inputMax,
      inputMin,
      labelMax,
      labelMin,
    }
  });
}

Let's break down what's going on here.

Shuffle the data

// Step 1. Shuffle the data
tf.util.shuffle(data);

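Here we randomize the order of the examples we will feed to the training algorithm. Shuffling is important because during training the dataset is typically broken up into smaller subsets, called batches, that the model is trained on. Shuffling helps each batch contain a variety of data from across the data distribution. By doing so we help the model:

Not learn things that are purely dependent on the order the data was fed in
Not be sensitive to the structure in subgroups (for example, if it only sees high-horsepower cars for the first half of its training, it may learn a relationship that does not apply across the rest of the dataset).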
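The effect of shuffling on batches can be sketched in plain JavaScript. The block below is an illustrative Fisher-Yates shuffle and a simple batching helper, not the TensorFlow.js implementation; the function names shuffleInPlace and toBatches and the horsepower values are hypothetical.

```javascript
// Fisher-Yates shuffle: walk backwards and swap each element with a
// randomly chosen element at or before it. Shuffles in place.
function shuffleInPlace(array) {
  for (let i = array.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [array[i], array[j]] = [array[j], array[i]];
  }
  return array;
}

// Split an array into consecutive batches of at most batchSize items.
function toBatches(array, batchSize) {
  const batches = [];
  for (let start = 0; start < array.length; start += batchSize) {
    batches.push(array.slice(start, start + batchSize));
  }
  return batches;
}

// Hypothetical horsepower values, sorted from low to high.
const sortedHorsepower = [50, 60, 70, 80, 150, 160, 170, 180];
const shuffled = shuffleInPlace([...sortedHorsepower]);

console.log(toBatches(sortedHorsepower, 4)); // first batch holds only low values
console.log(toBatches(shuffled, 4));         // batches typically mix low and high values
```

Without shuffling, the first batch contains only low-horsepower cars; after shuffling, each batch is far more likely to sample the whole range.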

Convert to tensors

// Step 2. Convert data to Tensor
const inputs = data.map(d => d.horsepower)
const labels = data.map(d => d.mpg);

const inputTensor = tf.tensor2d(inputs, [inputs.length, 1]);
const labelTensor = tf.tensor2d(labels, [labels.length, 1]);

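Here we make two arrays, one for our input examples (the horsepower entries), and another for the true output values (which are known as labels in machine learning).

We then convert each array to a 2d tensor. The tensor will have a shape of [num_examples, num_features_per_example]. Here we have inputs.length examples and each example has 1 input feature (the horsepower).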
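As a rough picture of the [num_examples, num_features_per_example] shape described above, the following plain-JavaScript sketch lays a flat array out as rows of one value each. It uses nested arrays rather than real tensors, and the horsepower values are hypothetical.

```javascript
// Hypothetical horsepower values for three cars.
const horsepowerValues = [130, 165, 150];

// A shape of [num_examples, num_features_per_example] = [3, 1] means
// three rows with one value each.
const asRows = horsepowerValues.map(value => [value]);
const shape = [asRows.length, asRows[0].length];

console.log(asRows); // [[130], [165], [150]]
console.log(shape);  // [3, 1]
```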

Normalize the data

//Step 3. Normalize the data to the range 0 - 1 using min-max scaling
const inputMax = inputTensor.max();
const inputMin = inputTensor.min();
const labelMax = labelTensor.max();
const labelMin = labelTensor.min();

const normalizedInputs = inputTensor.sub(inputMin).div(inputMax.sub(inputMin));
const normalizedLabels = labelTensor.sub(labelMin).div(labelMax.sub(labelMin));

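Next we apply another best practice for machine learning training: we normalize the data. Here we normalize the data into the numerical range 0-1 using min-max scaling. Normalization is important because the internals of many machine learning models you will build with tensorflow.js are designed to work with numbers that are not too big. Common ranges to normalize data to include 0 to 1 or -1 to 1. You will have more success training your models if you get into the habit of normalizing your data to some reasonable range.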
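The min-max formula used above, (value - min) / (max - min), can be checked by hand with a small plain-JavaScript sketch. The helper name minMaxNormalize and the sample values are hypothetical; the codelab itself performs the same arithmetic on tensors.

```javascript
// Min-max scaling: (value - min) / (max - min) maps min to 0 and max to 1.
function minMaxNormalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return {
    normalized: values.map(v => (v - min) / (max - min)),
    // Keep the bounds so the scaling can be reversed later.
    min,
    max,
  };
}

// Hypothetical horsepower values.
const scaledHorsepower = minMaxNormalize([50, 100, 150, 250]);
console.log(scaledHorsepower.normalized); // [0, 0.25, 0.5, 1]
```

Note that this sketch assumes the values are not all identical; if max equals min, the division would be by zero.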

Return the data and the normalization bounds

return {
  inputs: normalizedInputs,
  labels: normalizedLabels,
  // Return the min/max bounds so we can use them later.
  inputMax,
  inputMin,
  labelMax,
  labelMin,
}

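We want to keep the values we used for normalization during training so that we can un-normalize the outputs to get them back into our original scale, and so that we can normalize future input data the same way.

6. Train the model

With our model instance created and our data represented as tensors, we have everything in place to start the training process.

Copy the following function into your script.js file.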

async function trainModel(model, inputs, labels) {
  // Prepare the model for training.
  model.compile({
    optimizer: tf.train.adam(),
    loss: tf.losses.meanSquaredError,
    metrics: ['mse'],
  });

  const batchSize = 32;
  const epochs = 50;

  return await model.fit(inputs, labels, {
    batchSize,
    epochs,
    shuffle: true,
    callbacks: tfvis.show.fitCallbacks(
      { name: 'Training Performance' },
      ['loss', 'mse'],
      { height: 200, callbacks: ['onEpochEnd'] }
    )
  });
}

Let's break this down.

Prepare for training

// Prepare the model for training.
model.compile({
  optimizer: tf.train.adam(),
  loss: tf.losses.meanSquaredError,
  metrics: ['mse'],
});

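We have to "compile" the model before we train it. To do so, we have to specify a number of very important things:

optimizer: This is the algorithm that governs the updates to the model as it sees examples. There are many optimizers available in TensorFlow.js. Here we have picked the adam optimizer as it is quite effective in practice and requires no configuration.
loss: This is a function that tells the model how well it is doing on learning each of the batches (data subsets) that it is shown. Here we use meanSquaredError to compare the predictions made by the model with the true values.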
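To illustrate what a mean squared error loss measures, here is a plain-JavaScript sketch of the formula: the average of the squared differences between predictions and labels. It is a hand-written illustration, not the tf.losses.meanSquaredError implementation, and the sample numbers are hypothetical.

```javascript
// Mean squared error: the average of (prediction - label)^2.
function meanSquaredError(predictions, labels) {
  if (predictions.length !== labels.length) {
    throw new Error('predictions and labels must have the same length');
  }
  const total = predictions.reduce(
      (sum, pred, i) => sum + (pred - labels[i]) ** 2, 0);
  return total / predictions.length;
}

// Hypothetical normalized predictions and labels.
// Squared errors are 0.25, 0.25 and 0, so the mean is about 0.1667.
console.log(meanSquaredError([0.5, 0.5, 1], [0, 1, 1]));
```

Squaring makes every error positive and penalizes large misses more than small ones, so a falling value means predictions are moving closer to the labels.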

const batchSize = 32;
const epochs = 50;

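Next we pick a batchSize and a number of epochs:

batchSize refers to the size of the data subsets that the model sees on each iteration of training. Common batch sizes tend to be in the range 32-512. There isn't really an ideal batch size for all problems, and describing the mathematical motivations for various batch sizes is beyond the scope of this tutorial.
epochs refers to the number of times the model looks at the entire dataset that you provide it. Here we take 50 iterations through the dataset.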
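The relationship between batchSize and epochs described above can be worked through numerically. The sketch below assumes one model update per batch and a smaller final batch when the dataset does not divide evenly; the helper name countUpdates and the example count of 392 are hypothetical, not figures from this codelab.

```javascript
// Number of batches and model updates for a given dataset size,
// batch size and number of epochs (assuming one update per batch).
function countUpdates(numExamples, batchSize, epochs) {
  // The last batch of an epoch may be smaller than batchSize, hence ceil.
  const batchesPerEpoch = Math.ceil(numExamples / batchSize);
  return {batchesPerEpoch, totalUpdates: batchesPerEpoch * epochs};
}

// 392 is a hypothetical example count.
console.log(countUpdates(392, 32, 50));
// { batchesPerEpoch: 13, totalUpdates: 650 }
```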

Start the train loop

return await model.fit(inputs, labels, {
  batchSize,
  epochs,
  shuffle: true,
  callbacks: tfvis.show.fitCallbacks(
    { name: 'Training Performance' },
    ['loss', 'mse'],
    { height: 200, callbacks: ['onEpochEnd'] }
  )
});

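model.fit is the function we call to start the training loop. It is an asynchronous function, so we return the promise it gives us so that the caller can determine when training is complete.

To monitor training progress we pass some callbacks to model.fit. We use tfvis.show.fitCallbacks to generate functions that plot charts for the "loss" and "mse" metrics we specified earlier.

Put it all together

Now we have to call the functions we have defined from our run function.

Add the following code to the bottom of your run function.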

// Convert the data to a form we can use for training.
const tensorData = convertToTensor(data);
const {inputs, labels} = tensorData;

// Train the model
await trainModel(model, inputs, labels);
console.log('Done Training');

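Refresh the page. After a few seconds you should see the training charts updating.

These charts are created by the callbacks we created earlier. They display the loss and mse, averaged over the whole dataset, at the end of each epoch.

When training a model we want to see the loss go down. In this case, because our metric is a measure of error, we want to see it go down as well.

7. Make predictions

Now that our model is trained, we want to make some predictions. Let's evaluate the model by seeing what it predicts for a uniform range of horsepower values from low to high.

Add the following function to your script.js file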

function testModel(model, inputData, normalizationData) {
  const {inputMax, inputMin, labelMin, labelMax} = normalizationData;

  // Generate predictions for a uniform range of numbers between 0 and 1;
  // We un-normalize the data by doing the inverse of the min-max scaling
  // that we did earlier.
  const [xs, preds] = tf.tidy(() => {

    const xsNorm = tf.linspace(0, 1, 100);
    const predictions = model.predict(xsNorm.reshape([100, 1]));

    // Un-normalize the data
    const unNormXs = xsNorm
      .mul(inputMax.sub(inputMin))
      .add(inputMin);

    const unNormPreds = predictions
      .mul(labelMax.sub(labelMin))
      .add(labelMin);

    // Copy the un-normalized values out of the tensors as typed arrays
    return [unNormXs.dataSync(), unNormPreds.dataSync()];
  });


  const predictedPoints = Array.from(xs).map((val, i) => {
    return {x: val, y: preds[i]}
  });

  const originalPoints = inputData.map(d => ({
    x: d.horsepower, y: d.mpg,
  }));


  tfvis.render.scatterplot(
    {name: 'Model Predictions vs Original Data'},
    {values: [originalPoints, predictedPoints], series: ['original', 'predicted']},
    {
      xLabel: 'Horsepower',
      yLabel: 'MPG',
      height: 300
    }
  );
}

A few things to notice in the function above.

const xsNorm = tf.linspace(0, 1, 100);
const predictions = model.predict(xsNorm.reshape([100, 1]));

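We generate 100 new "examples" to feed to the model. model.predict is how we feed those examples into the model. Note that they need to have the same kind of shape ([num_examples, num_features_per_example]) as the data we used for training.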
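The idea of generating evenly spaced inputs and arranging them one example per row can be sketched in plain JavaScript. This is an illustrative stand-in for the tensor operations, not the tf.linspace implementation; the helper name linspace and the variable names are hypothetical, and the helper assumes num is at least 2.

```javascript
// Evenly spaced values from start to stop, including both endpoints.
// Assumes num >= 2.
function linspace(start, stop, num) {
  const step = (stop - start) / (num - 1);
  return Array.from({length: num}, (_, i) => start + i * step);
}

const xsNormSketch = linspace(0, 1, 100);

// Equivalent in spirit to reshape([100, 1]): one row per example,
// one feature per row.
const asColumn = xsNormSketch.map(x => [x]);
console.log(asColumn.length, asColumn[0].length); // 100 1
```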

// Un-normalize the data
const unNormXs = xsNorm
  .mul(inputMax.sub(inputMin))
  .add(inputMin);

const unNormPreds = predictions
  .mul(labelMax.sub(labelMin))
  .add(labelMin);

To get the data back to our original range (rather than 0-1), we use the values we calculated while normalizing, but just invert the operations.
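Inverting min-max scaling means multiplying by the range and adding the minimum back. A plain-JavaScript sketch of that inverse follows; the helper name unNormalize and the bounds are hypothetical stand-ins for labelMin and labelMax.

```javascript
// Inverse of min-max scaling: value = normalized * (max - min) + min.
function unNormalize(normalizedValues, min, max) {
  return normalizedValues.map(v => v * (max - min) + min);
}

// Hypothetical bounds, standing in for labelMin and labelMax.
const mpgMin = 10;
const mpgMax = 50;
console.log(unNormalize([0, 0.25, 1], mpgMin, mpgMax)); // [10, 20, 50]
```

A normalized 0 maps back to the minimum and a normalized 1 maps back to the maximum, which is why the bounds from training have to be kept.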

return [unNormXs.dataSync(), unNormPreds.dataSync()];

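.dataSync() is a method we can use to get a typedarray of the values stored in a tensor. This allows us to process those values in regular JavaScript. It is a synchronous version of the .data() method, and .data() is the one that is generally preferred.

Finally we use tfjs-vis to plot the original data and the predictions from the model.

Add the following code to your run function.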

// Make some predictions using the model and compare them to the
// original data
testModel(model, data, tensorData);

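Refresh the page. Once the model finishes training, you should see a scatterplot comparing the original data with the model's predictions.

Congratulations! You have just trained a simple machine learning model. It currently performs what is known as linear regression, which tries to fit a line to the trend present in the input data.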
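For comparison, a straight line can also be fitted directly with closed-form ordinary least squares. This is a different technique from the gradient-based training used in this codelab and is shown only to illustrate what "fitting a line to the trend" means; the helper name fitLine and the sample points are hypothetical.

```javascript
// Closed-form ordinary least squares for a single input feature.
function fitLine(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let covXY = 0;
  let varX = 0;
  for (let i = 0; i < n; i++) {
    covXY += (xs[i] - meanX) * (ys[i] - meanY);
    varX += (xs[i] - meanX) ** 2;
  }
  const slope = covXY / varX;
  return {slope, intercept: meanY - slope * meanX};
}

// Hypothetical points that lie exactly on y = -0.1x + 40.
const line = fitLine([100, 150, 200], [30, 25, 20]);
console.log(line); // slope close to -0.1, intercept close to 40
```

The negative slope matches the negative correlation between horsepower and MPG seen in the scatterplot.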

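8. Main takeaways

The steps in training a machine learning model include:

Formulate your task:

Is it a regression problem or a classification one?
Can this be done with supervised learning or unsupervised learning?
What is the shape of the input data? What should the output data look like?

Prepare your data:

Clean your data and manually inspect it for patterns when possible
Shuffle your data before using it for training
Normalize your data into a reasonable range for the neural network. Usually 0 to 1 or -1 to 1 are good ranges for numerical data.
Convert your data into tensors

Build and run your model:

Define your model using tf.sequential or tf.model, then add layers to it using tf.layers.*
Choose an optimizer (adam is usually a good one), and parameters like batch size and number of epochs.
Choose an appropriate loss function for your problem, and an accuracy metric to help you evaluate progress. meanSquaredError is a common loss function for regression problems.
Monitor training to see whether the loss is going down

Evaluate your model

Choose an evaluation metric for your model that you can monitor while training. Once it is trained, try making some test predictions to get a sense of prediction quality.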

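9. Extra credit: Things to try

Experiment with changing the number of epochs. How many epochs do you need before the graph flattens out?
Try increasing the number of units in the hidden layer.
Try adding more hidden layers in between the first hidden layer we added and the final output layer. The code for these extra layers should look something like this.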

model.add(tf.layers.dense({units: 50, activation: 'sigmoid'}));

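The most important new thing about these hidden layers is that they introduce a non-linear activation function, in this case sigmoid activation. To learn more about activation functions, see this article.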
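The sigmoid activation mentioned above has a simple formula, 1 / (1 + e^(-x)). A plain-JavaScript sketch of it, and of a hidden unit that applies it, is below. This is an illustration rather than the TensorFlow.js implementation; the function name hiddenUnit and the weight and bias values are made up.

```javascript
// Sigmoid squashes any real number into the range (0, 1).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// A hidden unit with an activation: sigmoid(input * weight + bias).
// The weight and bias here are made-up values for illustration.
function hiddenUnit(input, weight, bias) {
  return sigmoid(input * weight + bias);
}

console.log(sigmoid(0));             // 0.5
console.log(hiddenUnit(0.5, 4, -2)); // sigmoid(0) = 0.5
```

Because sigmoid is curved rather than straight, layers that use it let the model represent relationships that a single line cannot.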

See if you can get the model to produce predictions that are no longer a straight line.
