TensorFlow Lite Pitfall Notes (3)

TensorFlow Lite Code Walkthrough – Model Loading and Execution

1. Overall Flow

The core class of the Java API is Interpreter.java; its concrete implementation lives in NativeInterpreterWrapper.java, which ultimately calls into the native nativeinterpreterwrapper_jni.cc, and from there the logic is implemented in C++.
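The sections below walk through the native side of this call chain. For reference, the same load / build / allocate / invoke sequence can also be driven directly from the C++ API; here is a minimal sketch (the model path, the float tensor types, and the newer tensorflow/lite include layout are assumptions, not part of the original call chain):

// Minimal sketch: drive the whole pipeline from the C++ API.
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load and parse the FlatBuffers model file (section 2 below).
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;
  // Register the built-in ops and build the Interpreter (InterpreterBuilder).
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  if (tflite::InterpreterBuilder(*model, resolver)(&interpreter) != kTfLiteOk)
    return 1;
  if (interpreter->AllocateTensors() != kTfLiteOk) return 1;
  // Fill the input, run every op in the execution plan, read the output.
  float* input = interpreter->typed_input_tensor<float>(0);
  input[0] = 0.0f;  // ... fill the rest of the input here
  if (interpreter->Invoke() != kTfLiteOk) return 1;
  float* output = interpreter->typed_output_tensor<float>(0);
  (void)output;
  return 0;
}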

2. Loading and Parsing the Model File

(1) Loading the model file

// nativeinterpreterwrapper_jni.cc: createModel() is the entry point for
// loading and parsing the model file.
JNIEXPORT jlong JNICALL
Java_org_tensorflow_lite_NativeInterpreterWrapper_createModel(
    JNIEnv* env, jclass clazz, jstring model_file, jlong error_handle) {
  BufferErrorReporter* error_reporter =
      convertLongToErrorReporter(env, error_handle);
  if (error_reporter == nullptr) return 0;
  const char* path = env->GetStringUTFChars(model_file, nullptr);
  std::unique_ptr<tflite::TfLiteVerifier> verifier;
  verifier.reset(new JNIFlatBufferVerifier());
  // Key call: read, verify, and parse the model file.
  auto model = tflite::FlatBufferModel::VerifyAndBuildFromFile(
      path, verifier.get(), error_reporter);
  if (!model) {
    throwException(env, kIllegalArgumentException,
                   "Contents of %s does not encode a valid "
                   "TensorFlowLite model: %s",
                   path, error_reporter->CachedErrorMessage());
    env->ReleaseStringUTFChars(model_file, path);
    return 0;
  }
  env->ReleaseStringUTFChars(model_file, path);
  return reinterpret_cast<jlong>(model.release());
}

The most important part of the logic above is:

auto model = tflite::FlatBufferModel::VerifyAndBuildFromFile(path, verifier.get(), error_reporter);

This line reads and parses the model file:

// model.cc
std::unique_ptr<FlatBufferModel> FlatBufferModel::VerifyAndBuildFromFile(
    const char* filename, TfLiteVerifier* verifier,
    ErrorReporter* error_reporter) {
  error_reporter = ValidateErrorReporter(error_reporter);

  std::unique_ptr<FlatBufferModel> model;
  // Read the file (memory-mapped).
  auto allocation = GetAllocationFromFile(filename, /*mmap_file=*/true,
                                          error_reporter, /*use_nnapi=*/true);
  if (verifier &&
      !verifier->Verify(static_cast<const char*>(allocation->base()),
                        allocation->bytes(), error_reporter)) {
    return model;
  }
  // Parse the file with the FlatBuffers library.
  model.reset(new FlatBufferModel(allocation.release(), error_reporter));
  if (!model->initialized()) model.reset();
  return model;
}
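The verifier argument lets the caller reject malformed files before they are parsed. The JNIFlatBufferVerifier used above essentially runs the FlatBuffers structural verifier over the raw buffer; a minimal sketch of a comparable custom TfLiteVerifier (the class name is hypothetical, and the include paths assume the newer tensorflow/lite layout):

#include "flatbuffers/flatbuffers.h"
#include "tensorflow/lite/model.h"                    // TfLiteVerifier
#include "tensorflow/lite/schema/schema_generated.h"  // VerifyModelBuffer

// Hypothetical verifier: accept the buffer only if it passes the generated
// FlatBuffers verification for the tflite::Model schema.
class MyFlatBufferVerifier : public tflite::TfLiteVerifier {
 public:
  bool Verify(const char* data, int length,
              tflite::ErrorReporter* reporter) override {
    flatbuffers::Verifier verifier(reinterpret_cast<const uint8_t*>(data),
                                   static_cast<size_t>(length));
    if (!tflite::VerifyModelBuffer(verifier)) {
      reporter->Report("The model is not a valid FlatBuffer buffer");
      return false;
    }
    return true;
  }
};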

(2) Parsing the model file

The flow above reads the model file into the FlatBuffers Model data structure (the schema is defined in schema.fbs). Next, the data in the file has to be mapped onto executable op data structures. This work is done mainly by InterpreterBuilder.
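Before looking at createInterpreter() below, here is a minimal sketch (not from the original post) that walks the parsed FlatBuffers Model through the generated accessors in schema_generated.h, to show the raw graph data InterpreterBuilder consumes; the helper name is hypothetical:

#include <cstdio>

#include "tensorflow/lite/model.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Hypothetical helper: dump the raw graph structure that InterpreterBuilder
// later turns into nodes and tensors.
void DumpModelStructure(const tflite::FlatBufferModel& fb_model) {
  const tflite::Model* model = fb_model.GetModel();  // root FlatBuffers table
  const tflite::SubGraph* subgraph = (*model->subgraphs())[0];
  std::printf("tensors: %u, operators: %u\n", subgraph->tensors()->size(),
              subgraph->operators()->size());
  // Each Operator stores an index into the model-level operator_codes table;
  // that index is what the OpResolver maps to a concrete kernel.
  for (const tflite::Operator* op : *subgraph->operators()) {
    const tflite::OperatorCode* code =
        (*model->operator_codes())[op->opcode_index()];
    std::printf("builtin_code = %d\n", static_cast<int>(code->builtin_code()));
  }
}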

// nativeinterpreterwrapper_jni.cc: createInterpreter() is the entry point
// that maps the model file onto executable ops.
JNIEXPORT jlong JNICALL
Java_org_tensorflow_lite_NativeInterpreterWrapper_createInterpreter(
    JNIEnv* env, jclass clazz, jlong model_handle, jlong error_handle,
    jint num_threads) {
  tflite::FlatBufferModel* model = convertLongToModel(env, model_handle);
  if (model == nullptr) return 0;
  BufferErrorReporter* error_reporter =
      convertLongToErrorReporter(env, error_handle);
  if (error_reporter == nullptr) return 0;
  // Register the ops first, associating each op implementation with its
  // index in the FlatBuffers schema.
  auto resolver = ::tflite::CreateOpResolver();
  std::unique_ptr<tflite::Interpreter> interpreter;
  // Parse the FlatBuffers content and map it onto executable op instances.
  TfLiteStatus status = tflite::InterpreterBuilder(*model, *(resolver.get()))(
      &interpreter, static_cast<int>(num_threads));
  if (status != kTfLiteOk) {
    throwException(env, kIllegalArgumentException,
                   "Internal error: Cannot create interpreter: %s",
                   error_reporter->CachedErrorMessage());
    return 0;
  }
  // Allocates memory.
  status = interpreter->AllocateTensors();
  if (status != kTfLiteOk) {
    throwException(env, kIllegalStateException,
                   "Internal error: Unexpected failure when preparing tensor "
                   "allocations: %s",
                   error_reporter->CachedErrorMessage());
    return 0;
  }
  return reinterpret_cast<jlong>(interpreter.release());
}

Here, ::tflite::CreateOpResolver() is called first to register the ops, mapping each op implementation to its index in the FlatBuffers schema; the concrete indices are the BuiltinOperator enum in schema_generated.h.

// builtin_ops_jni.cc
std::unique_ptr<OpResolver> CreateOpResolver() {  // NOLINT
  return std::unique_ptr<OpResolver>(
      new tflite::ops::builtin::BuiltinOpResolver());
}

// register.cc
BuiltinOpResolver::BuiltinOpResolver() {
  AddBuiltin(BuiltinOperator_RELU, Register_RELU());
  AddBuiltin(BuiltinOperator_RELU_N1_TO_1, Register_RELU_N1_TO_1());
  AddBuiltin(BuiltinOperator_RELU6, Register_RELU6());
  AddBuiltin(BuiltinOperator_TANH, Register_TANH());
  AddBuiltin(BuiltinOperator_LOGISTIC, Register_LOGISTIC());
  AddBuiltin(BuiltinOperator_AVERAGE_POOL_2D, Register_AVERAGE_POOL_2D());
  AddBuiltin(BuiltinOperator_MAX_POOL_2D, Register_MAX_POOL_2D());
  AddBuiltin(BuiltinOperator_L2_POOL_2D, Register_L2_POOL_2D());
  AddBuiltin(BuiltinOperator_CONV_2D, Register_CONV_2D());
  AddBuiltin(BuiltinOperator_DEPTHWISE_CONV_2D, Register_DEPTHWISE_CONV_2D());
  ...
}
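What each Register_*() helper returns is a TfLiteRegistration whose function pointers (init/free/prepare/invoke) are the op's implementation; the resolver simply maps a BuiltinOperator value or a custom-op name to such a registration. A minimal sketch with hypothetical names (the include path for TfLiteRegistration varies across TensorFlow versions):

#include "tensorflow/lite/c/common.h"         // TfLiteRegistration (path varies by version)
#include "tensorflow/lite/kernels/register.h"

// Hypothetical no-op kernel, just to show the shape of an op implementation.
TfLiteStatus MyCustomOpPrepare(TfLiteContext* context, TfLiteNode* node) {
  return kTfLiteOk;
}
TfLiteStatus MyCustomOpEval(TfLiteContext* context, TfLiteNode* node) {
  return kTfLiteOk;
}

TfLiteRegistration* Register_MY_CUSTOM_OP() {
  static TfLiteRegistration r = {/*init=*/nullptr, /*free=*/nullptr,
                                 /*prepare=*/MyCustomOpPrepare,
                                 /*invoke=*/MyCustomOpEval};
  return &r;
}

// Registered next to the builtins, so InterpreterBuilder can resolve it by
// name when it appears in the model's operator_codes table:
//   tflite::ops::builtin::BuiltinOpResolver resolver;
//   resolver.AddCustom("MyCustomOp", Register_MY_CUSTOM_OP());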

InterpreterBuilder is then responsible for building the Interpreter object from the FlatBuffers content.

TfLiteStatus InterpreterBuilder::operator()(
    std::unique_ptr<Interpreter>* interpreter, int num_threads) {
  .....
  // Flatbuffer model schemas define a list of opcodes independent of the
  // graph. We first map those to registrations. This reduces string lookups
  // for custom ops since we only do it once per custom op rather than once
  // per custom op invocation in the model graph.
  // Construct interpreter with correct number of tensors and operators.
  auto* subgraphs = model_->subgraphs();
  auto* buffers = model_->buffers();
  if (subgraphs->size() != 1) {
    error_reporter_->Report("Only 1 subgraph is currently supported.\n");
    return cleanup_and_error();
  }
  const tflite::SubGraph* subgraph = (*subgraphs)[0];
  auto operators = subgraph->operators();
  auto tensors = subgraph->tensors();
  if (!operators || !tensors || !buffers) {
    error_reporter_->Report(
        "Did not get operators, tensors, or buffers in input flat buffer.\n");
    return cleanup_and_error();
  }
  interpreter->reset(new Interpreter(error_reporter_));
  if ((**interpreter).AddTensors(tensors->Length()) != kTfLiteOk) {
    return cleanup_and_error();
  }
  // Set num threads
  (**interpreter).SetNumThreads(num_threads);
  // Parse inputs/outputs
  (**interpreter).SetInputs(FlatBufferIntArrayToVector(subgraph->inputs()));
  (**interpreter).SetOutputs(FlatBufferIntArrayToVector(subgraph->outputs()));
  // Finally setup nodes and tensors
  if (ParseNodes(operators, interpreter->get()) != kTfLiteOk)
    return cleanup_and_error();
  if (ParseTensors(buffers, tensors, interpreter->get()) != kTfLiteOk)
    return cleanup_and_error();

  std::vector<int> variables;
  for (int i = 0; i < (*interpreter)->tensors_size(); ++i) {
    auto* tensor = (*interpreter)->tensor(i);
    if (tensor->is_variable) {
      variables.push_back(i);
    }
  }
  (**interpreter).SetVariables(std::move(variables));
  return kTfLiteOk;
}

(3) Model execution
After InterpreterBuilder has done its work, the contents of the model file have been parsed into executable ops, stored in the nodes_and_registration_ list in interpreter.cc. What remains is to loop over the execution plan and call each op's invoke interface.

// interpreter.cc
TfLiteStatus Interpreter::Invoke() {
  .....
  // Invocations are always done in node order.
  // Note that calling Invoke repeatedly will cause the original memory plan to
  // be reused, unless either ResizeInputTensor() or AllocateTensors() has been
  // called.
  // TODO(b/71913981): we should force recalculation in the presence of dynamic
  // tensors, because they may have new value which in turn may affect shapes
  // and allocations.
  for (int execution_plan_index = 0;
       execution_plan_index < execution_plan_.size(); execution_plan_index++) {
    if (execution_plan_index == next_execution_plan_index_to_prepare_) {
      TF_LITE_ENSURE_STATUS(PrepareOpsAndTensors());
      TF_LITE_ENSURE(&context_, next_execution_plan_index_to_prepare_ >=
                                    execution_plan_index);
    }
    int node_index = execution_plan_[execution_plan_index];
    TfLiteNode& node = nodes_and_registration_[node_index].first;
    const TfLiteRegistration& registration =
        nodes_and_registration_[node_index].second;
    SCOPED_OPERATOR_PROFILE(profiler_, node_index);

    // TODO(ycling): This is an extra loop through inputs to check if the data
    // need to be copied from Delegate buffer to raw memory, which is often not
    // needed. We may want to cache this in prepare to know if this needs to be
    // done for a node or not.
    for (int i = 0; i < node.inputs->size; ++i) {
      int tensor_index = node.inputs->data[i];
      if (tensor_index == kOptionalTensor) {
        continue;
      }
      TfLiteTensor* tensor = &tensors_[tensor_index];
      if (tensor->delegate && tensor->delegate != node.delegate &&
          tensor->data_is_stale) {
        EnsureTensorDataIsReadable(tensor_index);
      }
    }

    EnsureTensorsVectorCapacity();
    tensor_resized_since_op_invoke_ = false;
    // Invoke the ops one by one.
    if (OpInvoke(registration, &node) == kTfLiteError) {
      status = ReportOpError(&context_, node, registration, node_index,
                             "failed to invoke");
    }

    // Force execution prep for downstream ops if the latest op triggered the
    // resize of a dynamic tensor.
    if (tensor_resized_since_op_invoke_ &&
        HasDynamicTensor(context_, node.outputs)) {
      next_execution_plan_index_to_prepare_ = execution_plan_index + 1;
    }
  }
  if (!allow_buffer_handle_output_) {
    for (int tensor_index : outputs_) {
      EnsureTensorDataIsReadable(tensor_index);
    }
  }
  return status;
}
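As the comment in Invoke() notes, repeated calls reuse the existing memory plan unless ResizeInputTensor() or AllocateTensors() has been called in between. A minimal sketch of that resize path, assuming the interpreter built earlier and a purely illustrative new input shape:

// Re-plan memory after changing the first input's shape (shape values are
// illustrative only), then run inference again.
std::vector<int> new_shape = {1, 320, 320, 3};
if (interpreter->ResizeInputTensor(interpreter->inputs()[0], new_shape) !=
    kTfLiteOk) {
  return;
}
if (interpreter->AllocateTensors() != kTfLiteOk) return;  // rebuild the plan
if (interpreter->Invoke() != kTfLiteOk) return;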

3. GPU delegate

  1. TensorFlow Lite's GPU acceleration is built on OpenGL ES (Android) and Metal (Apple).

  2. GpuDelegate.java wraps the Java-side interface; the source for this part lives under lite/delegates/gpu.

  3. When running computation on the GPU, data often has to be transferred from CPU memory to GPU memory, and these copies can be expensive. TensorFlow Lite offers a way around this: it can read from and write to the hardware buffer in GPU memory directly, bypassing avoidable memory copies:

Assuming the camera input is in the GPU memory as GL_TEXTURE_2D, it must be first converted to a shader storage buffer object (SSBO) for OpenGL or to a MTLBuffer object for Metal. One can associate a TfLiteTensor with a user-prepared SSBO or MTLBuffer with TfLiteGpuDelegateBindBufferToTensor() or TfLiteMetalDelegateBindBufferToTensor(), respectively.


// Prepare GPU delegate.
auto* delegate = TfLiteGpuDelegateCreate(nullptr);
interpreter->SetAllowBufferHandleOutput(true);  // disable default gpu->cpu copy
#if defined(__ANDROID__)
if (TfLiteGpuDelegateBindBufferToTensor(delegate, user_provided_input_buffer, interpreter->inputs()[0]) != kTfLiteOk) return;
if (TfLiteGpuDelegateBindBufferToTensor(delegate, user_provided_output_buffer, interpreter->outputs()[0]) != kTfLiteOk) return;
#elif defined(__APPLE__)
if (TfLiteMetalDelegateBindBufferToTensor(delegate, user_provided_input_buffer, interpreter->inputs()[0]) != kTfLiteOk) return;
if (TfLiteMetalDelegateBindBufferToTensor(delegate, user_provided_output_buffer, interpreter->outputs()[0]) != kTfLiteOk) return;
#endif
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return;

// Run inference.
if (interpreter->Invoke() != kTfLiteOk) return;

Note that the calls above must be made before Interpreter::ModifyGraphWithDelegate().
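For comparison, the default path without the zero-copy bindings looks roughly as follows (a sketch against the GL-backend delegate API of that era; the delegate copies tensors between CPU and GPU memory as needed):

// Prepare and attach the GPU delegate; inputs/outputs stay in CPU-visible
// tensors, so no buffer binding is required.
auto* delegate = TfLiteGpuDelegateCreate(/*options=*/nullptr);
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return;

// Fill interpreter->typed_input_tensor<float>(0), then run inference.
if (interpreter->Invoke() != kTfLiteOk) return;
// Read interpreter->typed_output_tensor<float>(0), then clean up.
TfLiteGpuDelegateDelete(delegate);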

