
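Benchmarking Alibaba's MNN inference engine on the rk3399 with CPU and GPU: is OpenCL underperforming?
Background
MNN is Alibaba's open-source inference engine. Today I test how it benchmarks on the rk3399 platform.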
 alibaba/MNN: MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba (github.com)
First, git clone the repository
git clone git@github.com:alibaba/MNN.git
Create the build directory
cd MNN
mkdir build
cd build
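Configure with cmake
Pay attention to the cross compiler and to how the OpenCL library is used: either link against the system OpenCL library, or load it with dlopen through the wrapper. The configuration here sets -DMNN_USE_SYSTEM_LIB=ON. After configuring, build with make.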
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DMNN_BUILD_DEMO=ON \
-DMNN_BUILD_BENCHMARK=true \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_VERSION=1 \
-DCMAKE_SYSTEM_PROCESSOR=aarch64 \
-DMNN_OPENCL=ON \
-DMNN_USE_SYSTEM_LIB=ON \
-DCMAKE_C_COMPILER=${cross_compile_toolchain}/bin/aarch64-linux-gnu-gcc \
-DCMAKE_CXX_COMPILER=${cross_compile_toolchain}/bin/aarch64-linux-gnu-g++
make -j32
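Before deciding between the system library and the dlopen wrapper, it can help to check which OpenCL libraries are actually present on the board (or in the cross-compile sysroot). The following is only a rough sketch: the function name find_opencl_lib is made up for this post, and the file name patterns are assumptions (libOpenCL.so is the usual loader name, and libmali is the name that shows up in the GPU log further down).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list OpenCL-related shared libraries under the given directories.
# The name patterns are assumptions; adjust them to match your vendor's driver package.
find_opencl_lib() {
    find "$@" -maxdepth 2 \( -name 'libOpenCL.so*' -o -name 'libmali*.so*' \) 2>/dev/null
}
```

On the board this could be called as `find_opencl_lib /usr/lib /lib`. An empty result would suggest that a build linked against the system OpenCL library will not find it at run time.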
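Deployment
Next, put libMNN.so and benchmark.out from the build directory in the same place as the models from the benchmark directory one level up. libMNN.so also has to be copied into the lib directory on the rk3399.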
sudo cp libMNN.so /lib
sudo cp -rf ../benchmark/models .
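Then run the benchmark. The second argument is the loop count, the third is the warmup count (the log reports it as warmup = 0), and the fourth selects the backend: 0 for CPU, 3 for OpenCL.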
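Since the backend is picked by a bare number, a small wrapper makes the runs easier to read. This is a sketch, not part of MNN: forward_type and benchmark_cmd are hypothetical helper names, only the two codes used in this post (0 and 3) are covered, and the argument order is taken from the runs shown here (model directory, loop, warmup, forward type).

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a backend name to the forward-type code used in this post.
# Only 0 (CPU) and 3 (OpenCL) are handled; anything else is rejected.
forward_type() {
    case "$1" in
        cpu)    echo 0 ;;
        opencl) echo 3 ;;
        *)      echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the benchmark command line for a backend.
# Defaults match the runs in this post: loop = 1, warmup = 0.
benchmark_cmd() {
    local backend="$1" loops="${2:-1}" warmup="${3:-0}"
    local code
    code="$(forward_type "$backend")" || return 1
    echo "./benchmark.out models/ ${loops} ${warmup} ${code}"
}
```

For example, `benchmark_cmd opencl` prints `./benchmark.out models/ 1 0 3`, which is the GPU command used in this post. Printing the command instead of running it keeps the helper easy to check before handing it to sudo.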
CPU test
firefly@firefly:~/MNN$ sudo ./benchmark.out models/ 1 0 0
MNN benchmark
Forward type: CPU thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 1, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] SqueezeNetV1.0.mnn          max =   86.128 ms  min =   86.128 ms  avg =   86.128 ms
[ - ] MobileNetV2_224.mnn         max =   42.041 ms  min =   42.041 ms  avg =   42.041 ms
[ - ] inception-v3.mnn            max =  505.111 ms  min =  505.111 ms  avg =  505.111 ms
[ - ] mobilenetV3.mnn             max =   13.533 ms  min =   13.533 ms  avg =   13.533 ms
[ - ] nasnet.mnn                  max =  145.489 ms  min =  145.489 ms  avg =  145.489 ms
[ - ] mobilenet-v1-1.0.mnn        max =   66.624 ms  min =   66.624 ms  avg =   66.624 ms
[ - ] squeezenetv1.1.mnn          max =   40.437 ms  min =   40.437 ms  avg =   40.437 ms
[ - ] resnet-v2-50.mnn            max =  308.836 ms  min =  308.836 ms  avg =  308.836 ms
GPU test
firefly@firefly:~/MNN$ sudo ./benchmark.out models/ 1 0 3
MNN benchmark
Forward type: OpenCL thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 1, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
arm_release_ver of this libmali is 'r18p0-01rel0', rk_so_ver is '4'.
[ - ] SqueezeNetV1.0.mnn          max =  159.619 ms  min =  159.619 ms  avg =  159.619 ms
[ - ] MobileNetV2_224.mnn         max =  126.671 ms  min =  126.671 ms  avg =  126.671 ms
[ - ] inception-v3.mnn            max =  800.436 ms  min =  800.436 ms  avg =  800.436 ms
[ - ] mobilenetV3.mnn             max =   61.661 ms  min =   61.661 ms  avg =   61.661 ms
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
[ - ] nasnet.mnn                  max =  140.189 ms  min =  140.189 ms  avg =  140.189 ms
[ - ] mobilenet-v1-1.0.mnn        max =   98.918 ms  min =   98.918 ms  avg =   98.918 ms
[ - ] squeezenetv1.1.mnn          max =  121.158 ms  min =  121.158 ms  avg =  121.158 ms
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
[ - ] resnet-v2-50.mnn            max =  428.075 ms  min =  428.075 ms  avg =  428.075 ms
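Conclusion
As the results show, the GPU (OpenCL) path is slow on the rk3399. It loses to the CPU on seven of the eight models, by roughly 1.4x to 4.6x, and only nasnet comes out marginally faster (140.189 ms vs 145.489 ms). The log is also full of "Map error scalePtrCL == nullptr" and "Map error biasPtrCL == nullptr" messages, which points to operator problems. Keep in mind that each figure is a single run with no warmup. On the rk3568, by contrast, the OpenCL test runs smoothly and without problems. I'll leave this as an open question and look into it later.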
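To put numbers on the gap between the two runs, the two logs can be saved to files and compared per model. The sketch below assumes the result lines keep the "[ - ] name ... avg = X ms" layout shown above. The function name compare_logs and the log file names are made up for illustration. It prints each model followed by the GPU/CPU ratio of the avg times, so a value above 1 means OpenCL was slower.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare avg times from two saved benchmark logs.
# $1 = CPU log, $2 = GPU log. Output: "<model> <gpu_avg / cpu_avg>" per model.
# Lines without a "[ - ]" result marker (e.g. "Map error ...") are skipped.
compare_logs() {
    awk '
        /\[ - \]/ {
            model = ""
            # The model name is the field right after the closing "]",
            # even if other output is fused in front of the marker.
            for (i = 1; i <= NF; i++) if ($i == "]") { model = $(i + 1); break }
            avg = $(NF - 1)                     # the number before the final "ms"
            if (FNR == NR) cpu[model] = avg     # first file: remember CPU time
            else if ((model in cpu) && cpu[model] > 0)
                printf "%s %.2f\n", model, avg / cpu[model]
        }
    ' "$1" "$2"
}
```

The logs could be captured with something like `sudo ./benchmark.out models/ 1 0 0 2>&1 | tee cpu.log` and the same for the GPU run, then compared with `compare_logs cpu.log gpu.log`.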