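Introduction to TensorFlow
TensorFlow is a machine learning framework whose architecture is split into three main parts: Client, Master, and Worker. This decoupled design makes it highly flexible and easy to deploy on a cluster of machines.
TensorFlow code architecture
The overall architecture of TensorFlow is shown below (figure from the official site).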
Client
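The Client is the part that algorithm engineers work with directly. It is available in several languages, including Python, C++, and Java. Its main responsibilities are:
- Defining the computation as a computation graph. Machine learning frameworks mainly follow one of two programming models, imperative or declarative. The imperative model is the ordinary way we program. The declarative model resembles RxJava: you first build a data pipeline, and data is fed in and processed only when an event triggers it. TensorFlow uses the declarative model. Algorithm engineers use the Client API to build a computation graph.
- Providing the Session interface for executing the computation graph.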
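The define-then-run idea described above can be illustrated with a toy sketch. This is not the TensorFlow API: `Node`, `constant`, `placeholder`, `add`, `mul`, and `ToySession` are hypothetical names invented for this example. It only shows that building the graph and executing it are two separate steps.

```python
# Toy illustration of the declarative (define-then-run) model.
# This is NOT the TensorFlow API; every name here is hypothetical.

class Node:
    """One vertex of the computation graph: an op name plus its inputs."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value  # only used by "const" nodes


def constant(value):
    return Node("const", value=value)

def placeholder():
    return Node("placeholder")

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))


class ToySession:
    """Evaluates a node by recursively evaluating its inputs."""
    def run(self, node, feed=None):
        feed = feed or {}
        if node in feed:
            return feed[node]
        if node.op == "const":
            return node.value
        if node.op == "placeholder":
            raise ValueError("placeholder was not fed")
        args = [self.run(i, feed) for i in node.inputs]
        if node.op == "add":
            return args[0] + args[1]
        if node.op == "mul":
            return args[0] * args[1]
        raise ValueError("unknown op: " + node.op)


# Building the graph computes nothing yet.
x = placeholder()
y = add(mul(x, constant(3)), constant(1))

# Values flow only when the session runs the graph.
result = ToySession().run(y, feed={x: 2})
print(result)
```

The same graph object `y` can be run again with a different feed, which is the practical benefit of separating definition from execution.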
Distributed Master
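- Splits the computation graph into smaller subgraphs.
- Splits the subgraphs further into smaller pieces so that they can run in parallel in different processes or even on different devices.
- Distributes the pieces to different Workers.
- Triggers the Workers to execute the work assigned to them.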
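The splitting step can be pictured with a small sketch. It assumes a made-up graph layout (a dict from node name to device and inputs) and is not TensorFlow's real partitioner. It groups nodes by device and lists the edges that cross a device boundary, which are the places where one Worker would have to send a result and another would have to receive it.

```python
# Toy sketch of splitting one graph into per-device pieces.
# Hypothetical data layout; not TensorFlow's real partitioner.
from collections import defaultdict

# node name -> (device, list of input node names)
graph = {
    "w":      ("/cpu:0", []),
    "x":      ("/cpu:0", []),
    "matmul": ("/gpu:0", ["w", "x"]),
    "bias":   ("/cpu:0", []),
    "add":    ("/gpu:0", ["matmul", "bias"]),
}


def partition(graph):
    """Group nodes by device and list the edges that cross devices."""
    pieces = defaultdict(list)
    cross_edges = []
    for name, (device, inputs) in graph.items():
        pieces[device].append(name)
        for src in inputs:
            src_device = graph[src][0]
            if src_device != device:
                # The producer and consumer live on different devices,
                # so the value has to be transferred between them.
                cross_edges.append((src, name))
    return dict(pieces), cross_edges


pieces, cross_edges = partition(graph)
print(pieces)
print(cross_edges)
```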
Worker Services
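- Invokes TensorFlow kernels to execute the graph pieces according to the available hardware.
- Communicates with other Workers to send and receive computation results.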
Kernel Implementations
- Provides fine-grained, independent computation functions (operations), such as addition, subtraction, and string splitting.
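TensorFlow on mobile
Running a model directly on the device saves bandwidth, responds quickly, stays stable regardless of network quality or outages, and is more secure because no data has to be transmitted. There is therefore real demand for on-device inference. Running TensorFlow on mobile or other embedded devices shifts the priorities compared with the cloud: lower power consumption, higher speed, and smaller binary size matter most. Two solutions currently target mobile devices, TensorFlow Mobile and TensorFlow Lite. TensorFlow Mobile came out earlier and is more stable, but it was not heavily optimized for mobile in terms of performance. It is no longer recommended and is expected to be deprecated by early 2019.
According to the official site, the main differences between TensorFlow Mobile and TensorFlow Lite are:
- TensorFlow Lite is the evolution of TensorFlow Mobile. In most cases, TensorFlow Lite has a smaller binary size, fewer dependencies, and better performance.
- TensorFlow Lite is still under development, so some features may be missing. The TensorFlow team has, however, committed to stepping up development.
- TensorFlow Lite supports a limited set of ops; TensorFlow Mobile's coverage is more complete by comparison.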
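Differences as seen in the source code
That is the official description, but it remains rather vague. What exactly does TensorFlow Mobile strip out, and which ops does it support? How does TensorFlow Lite actually differ in its implementation? The only way to answer these questions is to read the source.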
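TensorFlow source directory overview
The tensorflow/core directory contains the code of TF's core modules.
- public: header files for the public API, defining the interfaces called from outside, mainly session.h and tensor_c_api.h.
- client: implementation files for the API.
- platform: OS-related interfaces, such as file system and env.
- protobuf: .proto files only, used to serialize structures for data transfer.
- common_runtime: the common runtime, including session, executor, threadpool, rendezvous, memory management, and device placement algorithms.
- distributed_runtime: the distributed execution module, such as rpc session, rpc master, rpc worker, and graph manager.
- framework: basic building blocks, such as log, memory, and tensor.
- graph: computation graph operations, such as construct, partition, optimize, and execute.
- kernels: the core op implementations, such as matmul, conv2d, argmax, and batch_norm.
- lib: common base libraries, such as gif, gtl (Google template library), hash, and histogram.
- ops: basic op definitions, op gradients, IO-related ops, and control-flow and data-flow operations.
The tensorflow/stream_executor directory is a parallel computing framework, developed by Google's StreamExecutor team.
The tensorflow/contrib directory holds contributor-developed code. Its android subdirectory contains the Android version of TensorFlow Mobile, and its lite subdirectory contains the source of TensorFlow Lite.
The tensorflow/python directory contains the Python API client scripts.
The tensorflow/tensorboard directory is the visualization and analysis tool, which can visualize the model and also monitor how model parameters change.
The third_party directory holds TF's third-party dependencies.
- eigen3: the Eigen matrix library, called by TF's basic ops.
- gpus: wrappers around the CUDA/cuDNN libraries.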
What does TensorFlow Mobile strip out?
TensorFlow is built with Bazel, so we can find the differences by examining the build files.
TensorFlow's default build configuration
===== /tensorflow/BUILD =====
tf_cc_shared_object(
name = "libtensorflow.so",
linkopts = select({
"//tensorflow:darwin": [
"-Wl,-exported_symbols_list", # This line must be directly followed by the exported_symbols.lds file
"$(location //tensorflow/c:exported_symbols.lds)",
"-Wl,-install_name,@rpath/libtensorflow.so",
],
"//tensorflow:windows": [],
"//conditions:default": [
"-z defs",
"-Wl,--version-script", # This line must be directly followed by the version_script.lds file
"$(location //tensorflow/c:version_script.lds)",
],
}),
visibility = ["//visibility:public"],
deps = [
"//tensorflow/c:c_api",
"//tensorflow/c:c_api_experimental",
"//tensorflow/c:exported_symbols.lds",
"//tensorflow/c:version_script.lds",
"//tensorflow/c/eager:c_api",
"//tensorflow/core:tensorflow",
],
)
===== /tensorflow/c/BUILD =====
tf_cuda_library(
name = "c_api",
srcs = [
"c_api.cc",
"c_api_function.cc",
],
hdrs = [
"c_api.h",
],
copts = tf_copts(),
visibility = ["//visibility:public"],
deps = select({
"//tensorflow:android": [
":c_api_internal",
"//tensorflow/core:android_tensorflow_lib_lite",
],
"//conditions:default": [
":c_api_internal",
"//tensorflow/cc/saved_model:loader",
"//tensorflow/cc:gradients",
"//tensorflow/cc:ops",
"//tensorflow/cc:grad_ops",
"//tensorflow/cc:scope_internal",
"//tensorflow/cc:while_loop",
"//tensorflow/core:core_cpu",
"//tensorflow/core:core_cpu_internal",
"//tensorflow/core:framework",
"//tensorflow/core:op_gen_lib",
"//tensorflow/core:protos_all_cc",
"//tensorflow/core:lib",
"//tensorflow/core:lib_internal",
],
}) + select({
"//tensorflow:with_xla_support": [
"//tensorflow/compiler/tf2xla:xla_compiler",
"//tensorflow/compiler/jit",
],
"//conditions:default": [],
}),
)
tf_cuda_library(
name = "c_api_experimental",
srcs = [
"c_api_experimental.cc",
],
hdrs = [
"c_api_experimental.h",
],
copts = tf_copts(),
visibility = ["//visibility:public"],
deps = [
":c_api",
":c_api_internal",
"//tensorflow/c/eager:c_api",
"//tensorflow/compiler/jit/legacy_flags:mark_for_compilation_pass_flags",
"//tensorflow/contrib/tpu:all_ops",
"//tensorflow/core:core_cpu",
"//tensorflow/core:framework",
"//tensorflow/core:lib",
"//tensorflow/core:lib_platform",
"//tensorflow/core:protos_all_cc",
],
)
===== /tensorflow/c/eager/BUILD =====
tf_cuda_library(
name = "c_api",
srcs = [
"c_api.cc",
"c_api_debug.cc",
"c_api_internal.h",
],
hdrs = ["c_api.h"],
copts = tf_copts() + tfe_xla_copts(),
visibility = ["//visibility:public"],
deps = select({
"//tensorflow:android": [
"//tensorflow/core:android_tensorflow_lib_lite",
],
"//conditions:default": [
"//tensorflow/c:c_api",
"//tensorflow/c:c_api_internal",
"//tensorflow/core:core_cpu",
"//tensorflow/core/common_runtime/eager:attr_builder",
"//tensorflow/core/common_runtime/eager:context",
"//tensorflow/core/common_runtime/eager:eager_executor",
"//tensorflow/core/common_runtime/eager:execute",
"//tensorflow/core/common_runtime/eager:kernel_and_device",
"//tensorflow/core/common_runtime/eager:tensor_handle",
"//tensorflow/core/common_runtime/eager:copy_to_device_node",
"//tensorflow/core:core_cpu_internal",
"//tensorflow/core:framework",
"//tensorflow/core:framework_internal",
"//tensorflow/core:lib",
"//tensorflow/core:lib_internal",
"//tensorflow/core:protos_all_cc",
],
}) + select({
"//tensorflow:with_xla_support": [
"//tensorflow/compiler/tf2xla:xla_compiler",
"//tensorflow/compiler/jit",
"//tensorflow/compiler/jit:xla_device",
],
"//conditions:default": [],
}) + [
"//tensorflow/core/common_runtime/eager:eager_operation",
"//tensorflow/core/distributed_runtime/eager:eager_client",
"//tensorflow/core/distributed_runtime/rpc/eager:grpc_eager_client",
"//tensorflow/core/distributed_runtime/rpc:grpc_channel",
"//tensorflow/core/distributed_runtime/rpc:grpc_server_lib",
"//tensorflow/core/distributed_runtime/rpc:grpc_worker_cache",
"//tensorflow/core/distributed_runtime/rpc:grpc_worker_service",
"//tensorflow/core/distributed_runtime/rpc:rpc_rendezvous_mgr",
"//tensorflow/core/distributed_runtime:remote_device",
"//tensorflow/core/distributed_runtime:server_lib",
"//tensorflow/core/distributed_runtime:worker_env",
"//tensorflow/core:gpu_runtime",
],
)
===== /tensorflow/core/BUILD =====
cc_library(
name = "tensorflow",
visibility = ["//visibility:public"],
deps = [
":tensorflow_opensource",
"//tensorflow/core/platform/default/build_config:tensorflow_platform_specific",
],
)
tf_cuda_library(
name = "tensorflow_opensource",
copts = tf_copts(),
visibility = ["//visibility:public"],
deps = [
":all_kernels",
":core",
":direct_session",
":example_parser_configuration",
":gpu_runtime",
":lib",
],
)
cc_library(
name = "all_kernels",
visibility = ["//visibility:public"],
deps = if_dynamic_kernels(
[],
otherwise = [":all_kernels_statically_linked"],
),
)
# This is a link-only library to provide a DirectSession
# implementation of the Session interface.
tf_cuda_library(
name = "direct_session",
copts = tf_copts(),
linkstatic = 1,
visibility = ["//visibility:public"],
deps = [
":direct_session_internal",
],
alwayslink = 1,
)
filegroup(
name = "example_parser_configuration_testdata",
srcs = [
"example/testdata/parse_example_graph_def.pbtxt",
],
)
cc_library(
name = "core",
visibility = ["//visibility:public"],
deps = [
":core_cpu",
":gpu_runtime",
":sycl_runtime",
],
)
cc_library(
name = "lib",
hdrs = [
"lib/bfloat16/bfloat16.h",
"lib/core/arena.h",
"lib/core/bitmap.h",
"lib/core/bits.h",
"lib/core/casts.h",
"lib/core/coding.h",
"lib/core/errors.h",
"lib/core/notification.h",
"lib/core/raw_coding.h",
"lib/core/status.h",
"lib/core/stringpiece.h",
"lib/core/threadpool.h",
"lib/gtl/array_slice.h",
"lib/gtl/cleanup.h",
"lib/gtl/compactptrset.h",
"lib/gtl/flatmap.h",
"lib/gtl/flatset.h",
"lib/gtl/inlined_vector.h",
"lib/gtl/optional.h",
"lib/gtl/priority_queue_util.h",
"lib/hash/crc32c.h",
"lib/hash/hash.h",
"lib/histogram/histogram.h",
"lib/io/buffered_inputstream.h",
"lib/io/compression.h",
"lib/io/inputstream_interface.h",
"lib/io/path.h",
"lib/io/proto_encode_helper.h",
"lib/io/random_inputstream.h",
"lib/io/record_reader.h",
"lib/io/record_writer.h",
"lib/io/table.h",
"lib/io/table_builder.h",
"lib/io/table_options.h",
"lib/math/math_util.h",
"lib/monitoring/collected_metrics.h",
"lib/monitoring/collection_registry.h",
"lib/monitoring/counter.h",
"lib/monitoring/gauge.h",
"lib/monitoring/metric_def.h",
"lib/monitoring/sampler.h",
"lib/random/distribution_sampler.h",
"lib/random/philox_random.h",
"lib/random/random_distributions.h",
"lib/random/simple_philox.h",
"lib/strings/numbers.h",
"lib/strings/proto_serialization.h",
"lib/strings/str_util.h",
"lib/strings/strcat.h",
"lib/strings/stringprintf.h",
":platform_base_hdrs",
":platform_env_hdrs",
":platform_file_system_hdrs",
":platform_other_hdrs",
":platform_port_hdrs",
":platform_protobuf_hdrs",
],
visibility = ["//visibility:public"],
deps = [
":lib_internal",
"@com_google_absl//absl/container:inlined_vector",
"@com_google_absl//absl/strings",
"@com_google_absl//absl/types:optional",
],
)
# This includes implementations of all kernels built into TensorFlow.
cc_library(
name = "all_kernels_statically_linked",
visibility = ["//visibility:private"],
deps = [
"//tensorflow/core/kernels:array",
"//tensorflow/core/kernels:audio",
"//tensorflow/core/kernels:batch_kernels",
"//tensorflow/core/kernels:bincount_op",
"//tensorflow/core/kernels:boosted_trees_ops",
"//tensorflow/core/kernels:candidate_sampler_ops",
"//tensorflow/core/kernels:checkpoint_ops",
"//tensorflow/core/kernels:collective_ops",
"//tensorflow/core/kernels:control_flow_ops",
"//tensorflow/core/kernels:ctc_ops",
"//tensorflow/core/kernels:cudnn_rnn_kernels",
"//tensorflow/core/kernels:data_flow",
"//tensorflow/core/kernels:dataset_ops",
"//tensorflow/core/kernels:decode_proto_op",
"//tensorflow/core/kernels:encode_proto_op",
"//tensorflow/core/kernels:fake_quant_ops",
"//tensorflow/core/kernels:function_ops",
"//tensorflow/core/kernels:functional_ops",
"//tensorflow/core/kernels:grappler",
"//tensorflow/core/kernels:histogram_op",
"//tensorflow/core/kernels:image",
"//tensorflow/core/kernels:io",
"//tensorflow/core/kernels:linalg",
"//tensorflow/core/kernels:list_kernels",
"//tensorflow/core/kernels:lookup",
"//tensorflow/core/kernels:logging",
"//tensorflow/core/kernels:manip",
"//tensorflow/core/kernels:math",
"//tensorflow/core/kernels:multinomial_op",
"//tensorflow/core/kernels:nn",
"//tensorflow/core/kernels:parameterized_truncated_normal_op",
"//tensorflow/core/kernels:parsing",
"//tensorflow/core/kernels:partitioned_function_ops",
"//tensorflow/core/kernels:random_ops",
"//tensorflow/core/kernels:random_poisson_op",
"//tensorflow/core/kernels:remote_fused_graph_ops",
"//tensorflow/core/kernels:required",
"//tensorflow/core/kernels:resource_variable_ops",
"//tensorflow/core/kernels:rpc_op",
"//tensorflow/core/kernels:scoped_allocator_ops",
"//tensorflow/core/kernels:sdca_ops",
"//tensorflow/core/kernels:searchsorted_op",
"//tensorflow/core/kernels:set_kernels",
"//tensorflow/core/kernels:sparse",
"//tensorflow/core/kernels:state",
"//tensorflow/core/kernels:stateless_random_ops",
"//tensorflow/core/kernels:string",
"//tensorflow/core/kernels:summary_kernels",
"//tensorflow/core/kernels:training_ops",
"//tensorflow/core/kernels:word2vec_kernels",
] + tf_additional_cloud_kernel_deps() + if_not_windows([
"//tensorflow/core/kernels:fact_op",
"//tensorflow/core/kernels:array_not_windows",
"//tensorflow/core/kernels:math_not_windows",
"//tensorflow/core/kernels:quantized_ops",
"//tensorflow/core/kernels/neon:neon_depthwise_conv_op",
]) + if_mkl([
"//tensorflow/core/kernels:mkl_concat_op",
"//tensorflow/core/kernels:mkl_conv_op",
"//tensorflow/core/kernels:mkl_cwise_ops_common",
"//tensorflow/core/kernels:mkl_fused_batch_norm_op",
"//tensorflow/core/kernels:mkl_identity_op",
"//tensorflow/core/kernels:mkl_input_conversion_op",
"//tensorflow/core/kernels:mkl_lrn_op",
"//tensorflow/core/kernels:mkl_pooling_ops",
"//tensorflow/core/kernels:mkl_relu_op",
"//tensorflow/core/kernels:mkl_reshape_op",
"//tensorflow/core/kernels:mkl_slice_op",
"//tensorflow/core/kernels:mkl_softmax_op",
"//tensorflow/core/kernels:mkl_transpose_op",
"//tensorflow/core/kernels:mkl_tfconv_op",
"//tensorflow/core/kernels:mkl_aggregate_ops",
]) + if_cuda([
"//tensorflow/core/grappler/optimizers:gpu_swapping_kernels",
"//tensorflow/core/grappler/optimizers:gpu_swapping_ops",
]),
)
TensorFlow Mobile's build configuration
===== tensorflow/contrib/android/BUILD =====
cc_binary(
name = "libtensorflow_inference.so",
srcs = [],
copts = tf_copts() + [
"-ffunction-sections",
"-fdata-sections",
],
linkopts = if_android([
"-landroid",
"-latomic",
"-ldl",
"-llog",
"-lm",
"-z defs",
"-s",
"-Wl,--gc-sections",
"-Wl,--version-script", # This line must be directly followed by LINKER_SCRIPT.
"$(location {})".format(LINKER_SCRIPT),
]),
linkshared = 1,
linkstatic = 1,
tags = [
"manual",
"notap",
],
deps = [
":android_tensorflow_inference_jni",
"//tensorflow/core:android_tensorflow_lib",
LINKER_SCRIPT,
],
)
cc_library(
name = "android_tensorflow_inference_jni",
srcs = if_android([":android_tensorflow_inference_jni_srcs"]),
copts = tf_copts(),
visibility = ["//visibility:public"],
deps = [
"//tensorflow/core:android_tensorflow_lib_lite",
"//tensorflow/java/src/main/native",
],
alwayslink = 1,
)
===== tensorflow/core/BUILD =====
cc_library(
name = "android_tensorflow_lib",
srcs = if_android([":android_op_registrations_and_gradients"]),
copts = tf_copts(),
tags = [
"manual",
"notap",
],
visibility = ["//visibility:public"],
deps = [
":android_tensorflow_lib_lite",
":protos_all_cc_impl",
"//tensorflow/core/kernels:android_tensorflow_kernels",
"//third_party/eigen3",
"@protobuf_archive//:protobuf",
],
alwayslink = 1,
)
cc_library(
name = "android_tensorflow_lib_lite",
srcs = if_android(["//tensorflow/core:android_srcs"]),
copts = tf_copts(android_optimization_level_override = None),
linkopts = ["-lz"],
tags = [
"manual",
"notap",
],
visibility = ["//visibility:public"],
deps = [
":mobile_additional_lib_deps",
":protos_all_cc_impl",
":stats_calculator_portable",
"//third_party/eigen3",
"@double_conversion//:double-conversion",
"@nsync//:nsync_cpp",
"@protobuf_archive//:protobuf",
],
alwayslink = 1,
)
alias(
name = "android_srcs",
actual = ":mobile_srcs",
visibility = ["//visibility:public"],
)
filegroup(
name = "mobile_srcs",
srcs = [
":mobile_srcs_no_runtime",
":mobile_srcs_only_runtime",
],
visibility = ["//visibility:public"],
)
# Core sources for Android builds.
filegroup(
name = "mobile_srcs_no_runtime",
srcs = [
":protos_all_proto_text_srcs",
":error_codes_proto_text_srcs",
"//tensorflow/core/platform/default/build_config:android_srcs",
] + glob(
[
"client/**/*.cc",
"framework/**/*.h",
"framework/**/*.cc",
"lib/**/*.h",
"lib/**/*.cc",
"platform/**/*.h",
"platform/**/*.cc",
"public/**/*.h",
"util/**/*.h",
"util/**/*.cc",
],
exclude = [
"**/*test.*",
"**/*testutil*",
"**/*testlib*",
"**/*main.cc",
"debug/**/*",
"framework/op_gen_*",
"lib/jpeg/**/*",
"lib/png/**/*",
"lib/gif/**/*",
"util/events_writer.*",
"util/stats_calculator.*",
"util/reporter.*",
"platform/**/cuda_libdevice_path.*",
"platform/default/test_benchmark.*",
"platform/cuda.h",
"platform/google/**/*",
"platform/hadoop/**/*",
"platform/gif.h",
"platform/jpeg.h",
"platform/png.h",
"platform/stream_executor.*",
"platform/windows/**/*",
"user_ops/**/*.cu.cc",
"util/ctc/*.h",
"util/ctc/*.cc",
"util/tensor_bundle/*.h",
"util/tensor_bundle/*.cc",
"common_runtime/gpu/**/*",
"common_runtime/eager/*",
"common_runtime/gpu_device_factory.*",
],
),
visibility = ["//visibility:public"],
)
filegroup(
name = "mobile_srcs_only_runtime",
srcs = [
"//tensorflow/core/kernels:android_srcs",
"//tensorflow/core/util/ctc:android_srcs",
"//tensorflow/core/util/tensor_bundle:android_srcs",
] + glob(
[
"common_runtime/**/*.h",
"common_runtime/**/*.cc",
"graph/**/*.h",
"graph/**/*.cc",
],
exclude = [
"**/*test.*",
"**/*testutil*",
"**/*testlib*",
"**/*main.cc",
"common_runtime/gpu/**/*",
"common_runtime/eager/*",
"common_runtime/gpu_device_factory.*",
"graph/dot.*",
],
),
visibility = ["//visibility:public"],
)
cc_library(
name = "stats_calculator_portable",
srcs = [
"util/stat_summarizer_options.h",
"util/stats_calculator.cc",
],
hdrs = [
"util/stats_calculator.h",
],
copts = tf_copts(),
)
cc_library(
name = "mobile_additional_lib_deps",
deps = tf_additional_lib_deps() + [
"@com_google_absl//absl/strings",
],
)
===== tensorflow/core/kernels/BUILD =====
cc_library(
name = "android_tensorflow_kernels",
srcs = select({
"//tensorflow:android": [
"//tensorflow/core/kernels:android_core_ops",
"//tensorflow/core/kernels:android_extended_ops",
],
"//conditions:default": [],
}),
copts = tf_copts(),
linkopts = select({
"//tensorflow:android": [
"-ldl",
],
"//conditions:default": [],
}),
tags = [
"manual",
"notap",
],
visibility = ["//visibility:public"],
deps = [
"//tensorflow/core:android_tensorflow_lib_lite",
"//tensorflow/core:protos_all_cc_impl",
"//third_party/eigen3",
"//third_party/fft2d:fft2d_headers",
"@fft2d",
"@gemmlowp",
"@protobuf_archive//:protobuf",
],
alwayslink = 1,
)
# Core kernels we want on Android. Only a subset of kernels to keep
# base library small.
filegroup(
name = "android_core_ops",
srcs = [
"aggregate_ops.cc",
"aggregate_ops.h",
"aggregate_ops_cpu.h",
"assign_op.h",
"bias_op.cc",
"bias_op.h",
"bounds_check.h",
"cast_op.cc",
"cast_op.h",
"cast_op_impl.h",
"cast_op_impl_bfloat.cc",
"cast_op_impl_bool.cc",
"cast_op_impl_complex128.cc",
"cast_op_impl_complex64.cc",
"cast_op_impl_double.cc",
"cast_op_impl_float.cc",
"cast_op_impl_half.cc",
"cast_op_impl_int16.cc",
"cast_op_impl_int32.cc",
"cast_op_impl_int64.cc",
"cast_op_impl_int8.cc",
"cast_op_impl_uint16.cc",
"cast_op_impl_uint32.cc",
"cast_op_impl_uint64.cc",
"cast_op_impl_uint8.cc",
"concat_lib.h",
"concat_lib_cpu.cc",
"concat_lib_cpu.h",
"concat_op.cc",
"constant_op.cc",
"constant_op.h",
"cwise_ops.h",
"cwise_ops_common.cc",
"cwise_ops_common.h",
"cwise_ops_gradients.h",
"dense_update_functor.cc",
"dense_update_functor.h",
"dense_update_ops.cc",
"example_parsing_ops.cc",
"fill_functor.cc",
"fill_functor.h",
"function_ops.cc",
"function_ops.h",
"gather_functor.h",
"gather_nd_op.cc",
"gather_nd_op.h",
"gather_nd_op_cpu_impl.h",
"gather_nd_op_cpu_impl_0.cc",
"gather_nd_op_cpu_impl_1.cc",
"gather_nd_op_cpu_impl_2.cc",
"gather_nd_op_cpu_impl_3.cc",
"gather_nd_op_cpu_impl_4.cc",
"gather_nd_op_cpu_impl_5.cc",
"gather_nd_op_cpu_impl_6.cc",
"gather_nd_op_cpu_impl_7.cc",
"gather_op.cc",
"identity_n_op.cc",
"identity_n_op.h",
"identity_op.cc",
"identity_op.h",
"immutable_constant_op.cc",
"immutable_constant_op.h",
"matmul_op.cc",
"matmul_op.h",
"no_op.cc",
"no_op.h",
"non_max_suppression_op.cc",
"non_max_suppression_op.h",
"one_hot_op.cc",
"one_hot_op.h",
"ops_util.h",
"pack_op.cc",
"pooling_ops_common.h",
"reshape_op.cc",
"reshape_op.h",
"reverse_sequence_op.cc",
"reverse_sequence_op.h",
"sendrecv_ops.cc",
"sendrecv_ops.h",
"sequence_ops.cc",
"shape_ops.cc",
"shape_ops.h",
"slice_op.cc",
"slice_op.h",
"slice_op_cpu_impl.h",
"slice_op_cpu_impl_1.cc",
"slice_op_cpu_impl_2.cc",
"slice_op_cpu_impl_3.cc",
"slice_op_cpu_impl_4.cc",
"slice_op_cpu_impl_5.cc",
"slice_op_cpu_impl_6.cc",
"slice_op_cpu_impl_7.cc",
"softmax_op.cc",
"softmax_op_functor.h",
"split_lib.h",
"split_lib_cpu.cc",
"split_op.cc",
"split_v_op.cc",
"strided_slice_op.cc",
"strided_slice_op.h",
"strided_slice_op_impl.h",
"strided_slice_op_inst_0.cc",
"strided_slice_op_inst_1.cc",
"strided_slice_op_inst_2.cc",
"strided_slice_op_inst_3.cc",
"strided_slice_op_inst_4.cc",
"strided_slice_op_inst_5.cc",
"strided_slice_op_inst_6.cc",
"strided_slice_op_inst_7.cc",
"unpack_op.cc",
"variable_ops.cc",
"variable_ops.h",
],
)
# Other kernels we may want on Android.
#
# The kernels can be consumed as a whole or in two groups for
# supporting separate compilation. Note that the split into groups
# is entirely for improving compilation time, and not for
# organizational reasons; you should not depend on any
# of those groups independently.
filegroup(
name = "android_extended_ops",
srcs = [
":android_extended_ops_group1",
":android_extended_ops_group2",
":android_quantized_ops",
],
visibility = ["//visibility:public"],
)
filegroup(
name = "android_extended_ops_headers",
srcs = [
"argmax_op.h",
"avgpooling_op.h",
"batch_matmul_op_impl.h",
"batch_norm_op.h",
"control_flow_ops.h",
"conv_2d.h",
"conv_ops.h",
"data_format_ops.h",
"depthtospace_op.h",
"depthwise_conv_op.h",
"fake_quant_ops_functor.h",
"fused_batch_norm_op.h",
"gemm_functors.h",
"image_resizer_state.h",
"initializable_lookup_table.h",
"lookup_table_init_op.h",
"lookup_table_op.h",
"lookup_util.h",
"maxpooling_op.h",
"mfcc.h",
"mfcc_dct.h",
"mfcc_mel_filterbank.h",
"mirror_pad_op.h",
"mirror_pad_op_cpu_impl.h",
"pad_op.h",
"random_op.h",
"reduction_ops.h",
"reduction_ops_common.h",
"relu_op.h",
"relu_op_functor.h",
"reshape_util.h",
"resize_bilinear_op.h",
"resize_nearest_neighbor_op.h",
"reverse_op.h",
"save_restore_tensor.h",
"segment_reduction_ops.h",
"softplus_op.h",
"softsign_op.h",
"spacetobatch_functor.h",
"spacetodepth_op.h",
"spectrogram.h",
"string_util.h",
"tensor_array.h",
"tile_functor.h",
"tile_ops_cpu_impl.h",
"tile_ops_impl.h",
"topk_op.h",
"training_op_helpers.h",
"training_ops.h",
"transpose_functor.h",
"transpose_op.h",
"where_op.h",
"xent_op.h",
],
)
filegroup(
name = "android_extended_ops_group1",
srcs = [
"argmax_op.cc",
"avgpooling_op.cc",
"batch_matmul_op_real.cc",
"batch_norm_op.cc",
"bcast_ops.cc",
"check_numerics_op.cc",
"control_flow_ops.cc",
"conv_2d.h",
"conv_grad_filter_ops.cc",
"conv_grad_input_ops.cc",
"conv_grad_ops.cc",
"conv_grad_ops.h",
"conv_ops.cc",
"conv_ops_fused.cc",
"conv_ops_using_gemm.cc",
"crop_and_resize_op.cc",
"crop_and_resize_op.h",
"cwise_op_abs.cc",
"cwise_op_add_1.cc",
"cwise_op_add_2.cc",
"cwise_op_bitwise_and.cc",
"cwise_op_bitwise_or.cc",
"cwise_op_bitwise_xor.cc",
"cwise_op_div.cc",
"cwise_op_equal_to_1.cc",
"cwise_op_equal_to_2.cc",
"cwise_op_not_equal_to_1.cc",
"cwise_op_not_equal_to_2.cc",
"cwise_op_exp.cc",
"cwise_op_floor.cc",
"cwise_op_floor_div.cc",
"cwise_op_floor_mod.cc",
"cwise_op_greater.cc",
"cwise_op_greater_equal.cc",
"cwise_op_invert.cc",
"cwise_op_isfinite.cc",
"cwise_op_isnan.cc",
"cwise_op_left_shift.cc",
"cwise_op_less.cc",
"cwise_op_less_equal.cc",
"cwise_op_log.cc",
"cwise_op_logical_and.cc",
"cwise_op_logical_not.cc",
"cwise_op_logical_or.cc",
"cwise_op_maximum.cc",
"cwise_op_minimum.cc",
"cwise_op_mul_1.cc",
"cwise_op_mul_2.cc",
"cwise_op_neg.cc",
"cwise_op_pow.cc",
"cwise_op_reciprocal.cc",
"cwise_op_right_shift.cc",
"cwise_op_round.cc",
"cwise_op_rsqrt.cc",
"cwise_op_select.cc",
"cwise_op_sigmoid.cc",
"cwise_op_sign.cc",
"cwise_op_sqrt.cc",
"cwise_op_square.cc",
"cwise_op_squared_difference.cc",
"cwise_op_sub.cc",
"cwise_op_tanh.cc",
"cwise_op_xlogy.cc",
"cwise_op_xdivy.cc",
"data_format_ops.cc",
"decode_wav_op.cc",
"deep_conv2d.cc",
"deep_conv2d.h",
"depthwise_conv_op.cc",
"dynamic_partition_op.cc",
"encode_wav_op.cc",
"fake_quant_ops.cc",
"fifo_queue.cc",
"fifo_queue_op.cc",
"fused_batch_norm_op.cc",
"listdiff_op.cc",
"population_count_op.cc",
"population_count_op.h",
"winograd_transform.h",
":android_extended_ops_headers",
] + select({
":xsmm_convolutions": [
"xsmm_conv2d.h",
"xsmm_conv2d.cc",
],
"//conditions:default": [],
}),
)
filegroup(
name = "android_extended_ops_group2",
srcs = [
"batchtospace_op.cc",
"ctc_decoder_ops.cc",
"decode_bmp_op.cc",
"depthtospace_op.cc",
"dynamic_stitch_op.cc",
"in_topk_op.cc",
"initializable_lookup_table.cc",
"logging_ops.cc",
"lookup_table_init_op.cc",
"lookup_table_op.cc",
"lookup_util.cc",
"lrn_op.cc",
"maxpooling_op.cc",
"mfcc.cc",
"mfcc_dct.cc",
"mfcc_mel_filterbank.cc",
"mfcc_op.cc",
"mirror_pad_op.cc",
"mirror_pad_op_cpu_impl_1.cc",
"mirror_pad_op_cpu_impl_2.cc",
"mirror_pad_op_cpu_impl_3.cc",
"mirror_pad_op_cpu_impl_4.cc",
"mirror_pad_op_cpu_impl_5.cc",
"pad_op.cc",
"padding_fifo_queue.cc",
"padding_fifo_queue_op.cc",
"queue_base.cc",
"queue_op.cc",
"queue_ops.cc",
"random_op.cc",
"reduction_ops_all.cc",
"reduction_ops_any.cc",
"reduction_ops_common.cc",
"reduction_ops_max.cc",
"reduction_ops_mean.cc",
"reduction_ops_min.cc",
"reduction_ops_prod.cc",
"reduction_ops_sum.cc",
"relu_op.cc",
"reshape_util.cc",
"resize_bilinear_op.cc",
"resize_nearest_neighbor_op.cc",
"restore_op.cc",
"reverse_op.cc",
"save_op.cc",
"save_restore_tensor.cc",
"save_restore_v2_ops.cc",
"segment_reduction_ops.cc",
"session_ops.cc",
"softplus_op.cc",
"softsign_op.cc",
"spacetobatch_functor.cc",
"spacetobatch_op.cc",
"spacetodepth_op.cc",
"sparse_fill_empty_rows_op.cc",
"sparse_reshape_op.cc",
"sparse_to_dense_op.cc",
"spectrogram.cc",
"spectrogram_op.cc",
"stack_ops.cc",
"string_join_op.cc",
"string_util.cc",
"summary_op.cc",
"tensor_array.cc",
"tensor_array_ops.cc",
"tile_functor_cpu.cc",
"tile_ops.cc",
"tile_ops_cpu_impl_1.cc",
"tile_ops_cpu_impl_2.cc",
"tile_ops_cpu_impl_3.cc",
"tile_ops_cpu_impl_4.cc",
"tile_ops_cpu_impl_5.cc",
"tile_ops_cpu_impl_6.cc",
"tile_ops_cpu_impl_7.cc",
"topk_op.cc",
"training_op_helpers.cc",
"training_ops.cc",
"transpose_functor_cpu.cc",
"transpose_op.cc",
"unique_op.cc",
"where_op.cc",
"xent_op.cc",
":android_extended_ops_headers",
],
)
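TensorFlow Mobile uses build options to trim the full TensorFlow: it keeps the core functionality and removes code that is unnecessary on mobile, such as the distributed execution logic, the Windows compatibility logic, and the GPU computation logic.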
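Much of this trimming is expressed through the `glob(..., exclude = [...])` calls in the `mobile_srcs_*` filegroups above. The following sketch approximates how such include and exclude patterns select files. It is a simplification written for this article, not Bazel's implementation: it handles only `**/` and `*`, and the file list is a hypothetical sample rather than a real directory listing.

```python
import re


def glob_to_regex(pattern):
    """Translate a simplified Bazel-style glob into a compiled regex.

    Supported: '**/' (zero or more directories) and '*' (any run of
    characters inside one path segment). Other glob features are not handled.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def select_sources(files, include, exclude):
    """Keep files matching any include pattern and no exclude pattern."""
    inc = [glob_to_regex(p) for p in include]
    exc = [glob_to_regex(p) for p in exclude]
    return [
        f for f in files
        if any(r.fullmatch(f) for r in inc)
        and not any(r.fullmatch(f) for r in exc)
    ]


# Hypothetical sample of paths relative to tensorflow/core.
files = [
    "common_runtime/executor.cc",
    "common_runtime/executor_test.cc",
    "common_runtime/gpu/gpu_device.cc",
    "graph/graph.cc",
    "graph/dot.cc",
    "distributed_runtime/master.cc",
]

# Patterns copied from the mobile_srcs_only_runtime filegroup.
include = ["common_runtime/**/*.cc", "graph/**/*.cc"]
exclude = ["**/*test.*", "common_runtime/gpu/**/*", "graph/dot.*"]

kept = select_sources(files, include, exclude)
print(kept)
```

In this sample the test file, the GPU file, and `graph/dot.cc` are dropped by the exclude list, while the distributed runtime file is never picked up because no include pattern covers it.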
Is TensorFlow Mobile's op support complete?
TensorFlow Mobile does not include all ops, only a core set of essential ones. See android_core_ops and android_extended_ops above for details.
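In practice this means a model should be checked against the ops the mobile runtime provides before it is shipped. A minimal sketch of such a check follows. The op names and the `SUPPORTED_OPS` set are illustrative assumptions; the real supported set is whatever the `android_core_ops` and `android_extended_ops` filegroups compile in.

```python
# Hypothetical op names for illustration only. The real set is determined
# by the kernels compiled in via android_core_ops / android_extended_ops.
SUPPORTED_OPS = {"Const", "Identity", "MatMul", "BiasAdd", "Relu", "Softmax"}


def missing_ops(model_ops, supported=SUPPORTED_OPS):
    """Return, sorted, the ops a model uses that the runtime lacks."""
    return sorted(set(model_ops) - set(supported))


# Op types used by an imaginary model.
model_ops = ["Const", "MatMul", "BiasAdd", "Relu", "MatrixInverse", "Softmax"]
unsupported = missing_ops(model_ops)
print(unsupported)
```

An empty result would suggest the model can run on the trimmed runtime; any name in the list points at a kernel that would have to be added to the build.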
How does TensorFlow Lite differ in its implementation?
The TensorFlow Lite source lives in the tensorflow/contrib/lite directory. Its core build logic is as follows:
===== tensorflow/contrib/lite/BUILD =====
cc_library(
name = "framework",
srcs = [
"allocation.cc",
"graph_info.cc",
"interpreter.cc",
"model.cc",
"mutable_op_resolver.cc",
"optional_debug_tools.cc",
"stderr_reporter.cc",
] + select({
"//tensorflow:android": [
"nnapi_delegate.cc",
"mmap_allocation.cc",
],
"//tensorflow:windows": [
"nnapi_delegate_disabled.cc",
"mmap_allocation_disabled.cc",
],
"//conditions:default": [
"nnapi_delegate_disabled.cc",
"mmap_allocation.cc",
],
}),
hdrs = [
"allocation.h",
"context.h",
"context_util.h",
"error_reporter.h",
"graph_info.h",
"interpreter.h",
"model.h",
"mutable_op_resolver.h",
"nnapi_delegate.h",
"op_resolver.h",
"optional_debug_tools.h",
"stderr_reporter.h",
],
copts = tflite_copts(),
linkopts = [
] + select({
"//tensorflow:android": [
"-llog",
],
"//conditions:default": [
],
}),
deps = [
":arena_planner",
":graph_info",
":memory_planner",
":schema_fbs_version",
":simple_memory_arena",
":string",
":util",
"//tensorflow/contrib/lite/c:c_api_internal",
"//tensorflow/contrib/lite/core/api",
"//tensorflow/contrib/lite/kernels:eigen_support",
"//tensorflow/contrib/lite/kernels:gemm_support",
"//tensorflow/contrib/lite/nnapi:nnapi_lib",
"//tensorflow/contrib/lite/profiling:profiler",
"//tensorflow/contrib/lite/schema:schema_fbs",
],
)
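Whereas TensorFlow Mobile is a trimmed-down build of the full TensorFlow, TensorFlow Lite is essentially a reimplementation. Internally, even the most basic data structures of the TensorFlow core, such as the op and the context, are new. Externally, the model file format changes from Protocol Buffers (PB) to FlatBuffers, the binary size is greatly reduced, down to about 300 KB, and a converter is provided to turn an ordinary TensorFlow model into the format TensorFlow Lite requires. From every angle, then, TensorFlow Lite is a new implementation.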
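The `framework` target above lists `model.cc`, `interpreter.cc`, and `mutable_op_resolver.cc` among its sources, which suggests a design built from a flat model, a table of op implementations, and an interpreter that walks the model. The toy sketch below illustrates that idea under those assumptions. It is not the TensorFlow Lite API: `ToyOpResolver`, `ToyInterpreter`, and the dict-based model layout are hypothetical.

```python
# Toy sketch of the "flat model + op resolver + interpreter" idea.
# All names are hypothetical; this is not the TensorFlow Lite API.

class ToyOpResolver:
    """Maps op names to plain Python functions (the 'kernels')."""
    def __init__(self):
        self._kernels = {}

    def add_op(self, name, fn):
        self._kernels[name] = fn

    def find_op(self, name):
        if name not in self._kernels:
            raise KeyError("op not registered: " + name)
        return self._kernels[name]


class ToyInterpreter:
    """Runs a flat list of ops over a table of tensors indexed by integer."""
    def __init__(self, model, resolver):
        self.model = model
        self.resolver = resolver
        self.tensors = [None] * model["num_tensors"]

    def set_input(self, index, value):
        self.tensors[index] = value

    def invoke(self):
        # Ops are stored in execution order, so one linear pass suffices.
        for op in self.model["ops"]:
            kernel = self.resolver.find_op(op["name"])
            args = [self.tensors[i] for i in op["inputs"]]
            self.tensors[op["output"]] = kernel(*args)

    def get_output(self, index):
        return self.tensors[index]


# A "converted" model: relu(x * w + b) flattened into three ops.
model = {
    "num_tensors": 6,
    "ops": [
        {"name": "MUL",  "inputs": [0, 1], "output": 3},
        {"name": "ADD",  "inputs": [3, 2], "output": 4},
        {"name": "RELU", "inputs": [4],    "output": 5},
    ],
}

resolver = ToyOpResolver()
resolver.add_op("MUL", lambda a, b: a * b)
resolver.add_op("ADD", lambda a, b: a + b)
resolver.add_op("RELU", lambda a: max(a, 0.0))

interp = ToyInterpreter(model, resolver)
interp.set_input(0, 2.0)   # x
interp.set_input(1, -3.0)  # w
interp.set_input(2, 1.0)   # b
interp.invoke()
output = interp.get_output(5)
print(output)
```

Because only the ops registered in the resolver are available, a binary built this way carries just the kernels it registers, which is one plausible reason such a design can stay small.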
References
TensorFlow Architecture
TensorFlow Mobile VS TensorFlow Lite
TensorFlow Code Analysis (TensorFlow代码解析)
TensorFlow Lite