Part 2, Model Training and Optimization (5): Validating the Optimized Model

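Taking two typical scenarios as examples, the TFLite Python Interpreter on a PC and an embedded platform (STM32Cube.AI), this section explains in detail how to validate the inference accuracy and performance of a quantized (.tflite) model.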

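I. Running Inference on a PC with the TFLite Python Interpreter

1. Install the TFLite Python Interpreter

Install with pip. This is intended for common Windows/macOS/Linux environments, although prebuilt tflite-runtime wheels are not available for every platform and Python version: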
```
pip install tflite-runtime
```
- Alternatively, install the full tensorflow package and use its built-in tf.lite.Interpreter (TF 2.5+ releases generally include it).
Verify the installation:
```
python -c "import tflite_runtime; print(tflite_runtime.__version__)"
```
If the command prints a version number without errors, the installation succeeded.
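Because either package can provide the interpreter, a script can pick whichever one is installed. The following is a minimal sketch rather than part of the original script: the helper name `find_interpreter_class` is hypothetical, and it assumes the class is exposed as `tflite_runtime.interpreter.Interpreter` or as `tf.lite.Interpreter`.

```python
import importlib


def find_interpreter_class():
    """Return (Interpreter class, source name), or (None, None) if neither package is installed."""
    # Prefer the lightweight runtime when it is available.
    try:
        module = importlib.import_module("tflite_runtime.interpreter")
        return module.Interpreter, "tflite_runtime"
    except ImportError:
        pass
    # Fall back to the interpreter bundled with full TensorFlow.
    try:
        tf = importlib.import_module("tensorflow")
        return tf.lite.Interpreter, "tensorflow"
    except ImportError:
        return None, None


if __name__ == "__main__":
    interpreter_cls, source = find_interpreter_class()
    if interpreter_cls is None:
        print("Neither tflite_runtime nor tensorflow is installed.")
    else:
        print(f"Using Interpreter from {source}")
```

Trying the small runtime first keeps the import cheap when it is present; the full TensorFlow import is used only as a fallback.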

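2. Write the Inference Validation Script

A Python script such as the following (for example, test_tflite_inference.py) loads the model and runs inference on the test set.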

```
import numpy as np
import tensorflow as tf  # used here only to load the MNIST dataset
import tflite_runtime.interpreter as tflite
# With full TensorFlow instead of tflite_runtime, drop the import above
# and use tf.lite.Interpreter in place of tflite.Interpreter.


def load_mnist_data():
    # Load the MNIST test set, scale pixels to [0, 1], flatten to 784 values
    (_, _), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_test = x_test.astype("float32") / 255.0
    x_test = x_test.reshape(-1, 28 * 28)
    return x_test, y_test


def evaluate_tflite_model(tflite_model_path):
    # 1. Load the TFLite interpreter
    interpreter = tflite.Interpreter(model_path=tflite_model_path)
    interpreter.allocate_tensors()

    # 2. Get the input and output tensor details
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    input_dtype = input_details[0]["dtype"]  # e.g. np.float32 or np.uint8
    scale, zero_point = input_details[0]["quantization"]

    # 3. Load the MNIST test data
    x_test, y_test = load_mnist_data()

    correct = 0
    total = len(x_test)
    for i in range(total):
        # Take the i-th test sample
        input_data = x_test[i].reshape(1, 28 * 28)
        if input_dtype in (np.uint8, np.int8):
            # Integer-quantized input: map floats into the integer range
            info = np.iinfo(input_dtype)
            input_data = np.clip(np.round(input_data / scale + zero_point), info.min, info.max)
        input_data = input_data.astype(input_dtype)

        # 4. Copy the data into the model's input tensor
        interpreter.set_tensor(input_details[0]["index"], input_data)

        # 5. Run inference
        interpreter.invoke()

        # 6. Read the output tensor and predict the label
        output_data = interpreter.get_tensor(output_details[0]["index"])  # shape: [1, 10]
        predicted_label = np.argmax(output_data[0])
        if predicted_label == y_test[i]:
            correct += 1

    accuracy = correct / total
    print(f"TFLite model test-set accuracy: {accuracy:.4f}")


if __name__ == "__main__":
    evaluate_tflite_model("mnist_model_quant.tflite")
```

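Script notes:

Importing the TFLite Interpreter
- If tflite_runtime is installed, import tflite_runtime.interpreter as tflite is all that is needed.
- If the full TensorFlow package is installed instead, comment out that import and use tf.lite.Interpreter (after import tensorflow as tf).
Loading the test data
- The script loads the MNIST test set directly with tf.keras.datasets.mnist and applies a simple normalization (x_test / 255.0).
Test loop
- Each x_test[i] is fed to the TFLite Interpreter one at a time; the output is read back and the predicted label is compared with the true label.
Checking the accuracy
- At the end the script prints the TFLite model's test-set accuracy. Comparing it with the original Keras float32 model shows how much accuracy was lost to quantization.
Running the script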
```
python test_tflite_inference.py
```
- If everything works, the script prints an accuracy value, for example 0.9740.
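To make the comparison with the float32 Keras model concrete, the predicted labels of both models can be compared on the same test samples. A minimal sketch in plain Python follows; the function names and the short label lists are hypothetical illustrations, not results from the MNIST model.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def compare_models(float_preds, quant_preds, labels):
    """Summarize how quantization changed the predictions."""
    float_acc = accuracy(float_preds, labels)
    quant_acc = accuracy(quant_preds, labels)
    # Samples where the two models disagree, regardless of which one is right.
    disagreements = [
        i for i, (f, q) in enumerate(zip(float_preds, quant_preds)) if f != q
    ]
    return {
        "float_accuracy": float_acc,
        "quant_accuracy": quant_acc,
        "accuracy_drop": float_acc - quant_acc,
        "disagreements": disagreements,
    }


if __name__ == "__main__":
    # Made-up labels for illustration only.
    labels = [7, 2, 1, 0, 4]
    float_preds = [7, 2, 1, 0, 4]
    quant_preds = [7, 2, 1, 0, 9]
    print(compare_models(float_preds, quant_preds, labels))
```

Listing the disagreeing indices, in addition to the accuracy drop, makes it easy to inspect exactly which images the quantized model handles differently.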

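3. Performance Test (Optional)

To compare inference speed, record the time immediately before and after the for loop, then look at the average inference time per image or the total time for the whole run.
TFLite performance also varies across CPU, GPU, and specific accelerator environments.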
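The timing approach described above can be sketched as follows. This is a minimal example under stated assumptions: `time_inference` is a hypothetical helper, and the workload is a stand-in function so that the snippet runs without a model; in practice it would wrap the set_tensor() and invoke() calls from the validation script.

```python
import time


def time_inference(run_once, n_runs=100, warmup=10):
    """Return (total_seconds, average_milliseconds_per_run) for run_once()."""
    # Warm-up runs are excluded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    total = time.perf_counter() - start
    return total, total / n_runs * 1000.0


if __name__ == "__main__":
    # Stand-in workload; replace it with set_tensor() + invoke() on a real interpreter.
    def fake_inference():
        sum(i * i for i in range(1000))

    total_s, avg_ms = time_inference(fake_inference, n_runs=200)
    print(f"total: {total_s:.4f} s, average: {avg_ms:.4f} ms per run")
```

time.perf_counter() is used because it is a monotonic, high-resolution clock intended for measuring short intervals.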

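II. Validating on an Embedded Platform (STM32Cube.AI)

When a quantized .tflite model needs to be deployed on a microcontroller such as an STM32, the main workflow is:

Install and use STM32CubeMX or STM32CubeIDE
- Under "Additional Software", select and install the STM32Cube.AI (X-CUBE-AI) package.
Create or open an STM32CubeMX project
- Select the target MCU or development board (for example STM32F4, STM32H7, or STM32L4).
- Enable the peripherals you need (for example UART for debugging, or an SD card interface).
- Open the AI tab and import your .tflite model.
Convert the model and generate C code with STM32Cube.AI
- In the AI options in STM32CubeMX, click "Import Model" and select mnist_model_quant.tflite.
- Select a quantization option (for example 8-bit), or keep the default.
- Click "Generate Code" to generate the corresponding C source and header files (for example network.c/h and network_data.c/h). The AI inference entry points (such as MX_X_CUBE_AI_Init() and MX_X_CUBE_AI_Process()) are added to the project automatically.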

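Integrate into the STM32 project

In the generated CubeIDE project, the AI inference functions and model data files have already been added.

Write application-level code that calls the AI inference API, for example: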

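```
#include "network.h"
#include "network_data.h"

// Input/output buffers
static ai_float in_data[784];    // MNIST 28x28
static ai_float out_data[10];    // 10 classes

void MX_X_CUBE_AI_Process(void)
{
    // Prepare the input data (demo only; fill in real pixel values in practice)
    for (int i = 0; i < 784; i++) {
        in_data[i] = 0.0f; // scaled to 0~1
    }

    // Run inference. ai_run() and AI_HANDLE_OK are placeholders: check the
    // generated network.h for the actual function name and return convention.
    if (ai_run(in_data, out_data) != AI_HANDLE_OK) {
        // error handling
    }

    // Find the label with the highest score
    int max_i = 0;
    float max_val = out_data[0];
    for (int i = 1; i < 10; i++) {
        if (out_data[i] > max_val) {
            max_val = out_data[i];
            max_i = i;
        }
    }

    // Report the classification result max_i over UART or with LEDs
}
```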

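ai_run stands for the model inference function generated by STM32Cube.AI; the actual name varies by project (for example ai_network_run or ai_mnist_model_run).

Build and download to the board
- In STM32CubeIDE, click Build & Run to download the program to the board through ST-Link.
- Once running, the program performs inference with the quantized model on the 784 float input values and outputs the prediction.
Inference speed and memory usage
- Call ai_run() repeatedly in the main loop and record timestamps to compare inference time before and after quantization.
- Measure inference time with HAL_GetTick() or a hardware cycle counter such as DWT->CYCCNT (on cores that provide it), or inspect task execution time in the SWV/ITM trace.
- Check the .map file or the debugger to see RAM/Flash usage.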
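The arithmetic behind these timing measurements can be sketched on the host side. The helper names below are hypothetical; the sketch assumes a 32-bit tick counter with 1 ms resolution (the default for HAL_GetTick()) that may wrap around, and, for cycle counts, a known core clock frequency. The readings are made-up values.

```python
def elapsed_ticks(start, end, bits=32):
    """Elapsed ticks between two readings of a wrapping unsigned counter."""
    return (end - start) % (1 << bits)


def average_ms_per_inference(start_tick, end_tick, n_runs, ms_per_tick=1.0):
    """Average inference time when n_runs inferences ran between two tick readings."""
    return elapsed_ticks(start_tick, end_tick) * ms_per_tick / n_runs


def cycles_to_ms(cycles, core_hz):
    """Convert a CPU cycle count to milliseconds for a given core clock."""
    return cycles / core_hz * 1000.0


if __name__ == "__main__":
    # Made-up readings: 100 inferences ran between tick 1000 and tick 1250.
    print(average_ms_per_inference(1000, 1250, n_runs=100))
    # Made-up example: 840,000 cycles on a 168 MHz core.
    print(cycles_to_ms(840_000, 168_000_000))
```

Averaging over many runs matters with a 1 ms tick, since a single inference may take less than one tick and would otherwise read as zero.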

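Comparing Accuracy on the Embedded Platform

We usually do not run all 10,000 MNIST images on real hardware, because storage and time are limited. Instead:
1. Randomly sample a few dozen or a few hundred images, send them to the STM32 over UART or from external storage, record the inference result for each, and compute the accuracy.
2. Use single-image tests as a demo to check that each digit is actually recognized correctly.
3. Rely on the offline result: the PC run already gives an accuracy estimate of roughly 97% or higher, so the embedded platform only needs partial validation to confirm the results are consistent (small deviations are possible).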
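The subset approach can be sketched on the PC side as follows. This is a minimal illustration with hypothetical helper names: it picks a reproducible random subset of test indices and, once the board's predictions have been collected, reports the subset accuracy together with its binomial standard error, which indicates how much a small sample can deviate from the full-test-set figure. The prediction lists in the demo are made up.

```python
import math
import random


def sample_indices(total, subset_size, seed=0):
    """Pick a reproducible random subset of test-set indices."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total), subset_size))


def subset_accuracy(board_preds, labels):
    """Accuracy on the subset and its binomial standard error."""
    n = len(labels)
    correct = sum(1 for p, y in zip(board_preds, labels) if p == y)
    acc = correct / n
    std_err = math.sqrt(acc * (1.0 - acc) / n)
    return acc, std_err


if __name__ == "__main__":
    indices = sample_indices(10000, 100, seed=42)
    print(f"selected {len(indices)} test images, first five: {indices[:5]}")

    # Made-up board results: 97 of 100 predictions match the labels.
    labels = [1] * 100
    board_preds = [1] * 97 + [0] * 3
    acc, std_err = subset_accuracy(board_preds, labels)
    print(f"subset accuracy: {acc:.3f} +/- {std_err:.3f}")
```

Fixing the seed means the same images can be run through the PC script and the board, so the two sets of predictions are directly comparable.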

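III. Summary

Validation on a PC
- Install tflite_runtime or use tf.lite.Interpreter.
- Write a script that loads the .tflite model and runs inference on the test set to check accuracy and inference speed.
- Suited to quick validation; it does not depend on the characteristics of the target hardware.
Validation on an STM32 or other MCU
- Import the .tflite or .h5 model with STM32Cube.AI.
- Generate C code and integrate it into the firmware project.
- On real hardware, test resource usage, inference speed, and whether the predictions match expectations.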