TensorFlow VS TensorFlow Mobile VS TensorFlow Lite


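Introduction to TensorFlow

TensorFlow is a machine learning framework whose overall architecture is divided into three main parts: the Client, the Master and the Worker. This decoupled architecture makes it highly flexible and easy to deploy on clusters of machines.

TensorFlow Code Architecture

The overall architecture of TensorFlow is as follows (image from the official website).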

Client

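The Client is the part that algorithm engineers work with directly. It is available in several languages, such as Python, C++ and Java. Its main responsibilities are:

  • Defining the computation as a computation graph. Machine learning frameworks mainly follow one of two programming models: imperative or declarative. The imperative model is the ordinary way we write programs. The declarative model resembles RxJava: you first build a data pipeline, and data is fed in and processed only when an event triggers it. TensorFlow uses the declarative model: algorithm engineers use the Client API to build a computation graph.
  • Providing the Session interface to execute the computation graph.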
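To make the "build first, run later" idea concrete, here is a minimal sketch in plain Python that does not use TensorFlow at all. Node, Session, placeholder, add and mul are hypothetical names chosen for illustration; they only mimic the declarative model and are not TensorFlow's API or implementation.

```python
from typing import Callable, Dict, List, Optional


class Node:
    """A node in a hypothetical toy graph; creating it computes nothing."""

    def __init__(self, op: str, inputs: List["Node"],
                 fn: Optional[Callable[..., float]] = None, name: str = ""):
        self.op = op
        self.inputs = inputs
        self.fn = fn
        self.name = name


def placeholder(name: str) -> Node:
    return Node("Placeholder", [], name=name)


def add(a: Node, b: Node) -> Node:
    return Node("Add", [a, b], lambda x, y: x + y)


def mul(a: Node, b: Node) -> Node:
    return Node("Mul", [a, b], lambda x, y: x * y)


class Session:
    """Toy session: evaluates a node only when run() is called."""

    def run(self, fetch: Node, feed_dict: Dict[str, float]) -> float:
        cache: Dict[int, float] = {}

        def evaluate(node: Node) -> float:
            if id(node) in cache:
                return cache[id(node)]
            if node.op == "Placeholder":
                value = feed_dict[node.name]  # data enters only at run time
            else:
                value = node.fn(*(evaluate(i) for i in node.inputs))
            cache[id(node)] = value
            return value

        return evaluate(fetch)


# Build phase: only describes the computation (x + y) * y.
x = placeholder("x")
y = placeholder("y")
z = mul(add(x, y), y)

# Run phase: data is fed in and the graph is actually executed.
print(Session().run(z, {"x": 2.0, "y": 3.0}))  # 15.0
```

In TensorFlow 1.x the equivalent program would build the graph with tf.placeholder and tf.add/tf.multiply, and execute it with tf.Session().run(z, feed_dict={...}).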
Distributed Master
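  • Prunes the computation graph down to the subgraph required by the arguments of the Session.run() call.
  • Partitions that subgraph into smaller pieces so that they can run in parallel in different processes and even on different devices.
  • Distributes the graph pieces to the different Workers.
  • Triggers the Workers to execute the pieces assigned to them.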
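The partitioning step can be pictured with the toy function below. It assumes a hypothetical graph format (a dict mapping each node name to its assigned device and its input names) and only shows the core idea: nodes are grouped by device, and every edge that crosses devices becomes a Send node on the producer side and a Recv node on the consumer side, which is how TensorFlow's architecture documentation describes cross-device edges. It is a simplified sketch, not the Master's actual algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy graph format: node name -> (assigned device, input node names).
Graph = Dict[str, Tuple[str, List[str]]]


def partition(graph: Graph) -> Dict[str, List[str]]:
    """Groups nodes by device; each cross-device edge becomes a send/recv pair."""
    parts: Dict[str, List[str]] = defaultdict(list)
    for name, (device, inputs) in graph.items():
        for src in inputs:
            src_device = graph[src][0]
            if src_device != device:
                # Producer's partition sends the tensor; consumer's partition receives it.
                parts[src_device].append(f"send:{src}->{name}")
                parts[device].append(f"recv:{src}->{name}")
        parts[device].append(name)
    return dict(parts)


example_graph: Graph = {
    "a": ("/cpu:0", []),
    "b": ("/cpu:0", ["a"]),
    "c": ("/gpu:0", ["b"]),
}
for device, nodes in partition(example_graph).items():
    print(device, nodes)
# /cpu:0 ['a', 'b', 'send:b->c']
# /gpu:0 ['recv:b->c', 'c']
```

Each resulting partition is what would then be handed to one Worker.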
Worker Services
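  • Invokes TensorFlow kernels to execute the graph pieces on the available hardware (CPU, GPU, etc.).
  • Communicates with other Workers, sending and receiving computation results.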
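The sending and receiving between Workers can be sketched as a keyed mailbox: the receiver blocks until the sender has deposited a value under the agreed key. TensorFlow's runtime calls this mechanism a rendezvous (it appears under common_runtime in the directory overview later in this article), but the Rendezvous class below is a hypothetical, single-process simplification built on Python threads, not TensorFlow's implementation.

```python
import threading
from typing import Any, Dict


class Rendezvous:
    """Hypothetical toy mailbox: send() stores a value, recv() blocks until it arrives."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._cond = threading.Condition()

    def send(self, key: str, value: Any) -> None:
        with self._cond:
            self._values[key] = value
            self._cond.notify_all()  # wake any receiver waiting for this key

    def recv(self, key: str, timeout: float = 5.0) -> Any:
        with self._cond:
            if not self._cond.wait_for(lambda: key in self._values, timeout=timeout):
                raise TimeoutError(f"no value for {key!r}")
            return self._values.pop(key)


rendezvous = Rendezvous()
results: Dict[str, int] = {}


def consumer_worker() -> None:
    # The consuming worker waits for the tensor produced on the other device.
    results["c"] = rendezvous.recv("b->c") * 10


t = threading.Thread(target=consumer_worker)
t.start()
rendezvous.send("b->c", 4)  # the producing worker
t.join()
print(results["c"])  # 40
```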
Kernel Implementations
  • Provide fine-grained, self-contained computations (operations), such as addition, subtraction and string splitting.
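How a Worker picks a kernel "according to the available hardware" can be sketched as a registry keyed by (op type, device type). In TensorFlow's C++ code, kernels are registered with the REGISTER_KERNEL_BUILDER macro; the Python functions register_kernel and find_kernel below are hypothetical stand-ins that only illustrate the lookup idea, including the failure case when no kernel was compiled in for an op.

```python
from typing import Callable, Dict, List, Tuple

Kernel = Callable[[float, float], float]

# Hypothetical registry: (op type, device type) -> kernel function.
_REGISTRY: Dict[Tuple[str, str], Kernel] = {}


def register_kernel(op: str, device: str, fn: Kernel) -> None:
    _REGISTRY[(op, device)] = fn


def find_kernel(op: str, available_devices: List[str]) -> Tuple[str, Kernel]:
    """Returns the first registered kernel, trying devices in preference order."""
    for device in available_devices:
        if (op, device) in _REGISTRY:
            return device, _REGISTRY[(op, device)]
    raise KeyError(f"no kernel registered for op {op!r} on {available_devices}")


register_kernel("Add", "CPU", lambda a, b: a + b)
register_kernel("Sub", "CPU", lambda a, b: a - b)
register_kernel("Add", "GPU", lambda a, b: a + b)  # pretend this one runs on a GPU

device, kernel = find_kernel("Sub", ["GPU", "CPU"])
print(device, kernel(5, 3))  # CPU 2
```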

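TensorFlow on Mobile

Running models directly on the device has several advantages: it saves bandwidth, responds promptly, is more stable because it does not depend on network quality or connectivity, and is more secure because no data needs to be transmitted. So there is real demand for on-device inference. Running TensorFlow on mobile or other embedded devices has different priorities from running it in the cloud: lower power consumption, higher speed and a smaller binary size matter most. There are currently two solutions for mobile devices: TensorFlow Mobile and TensorFlow Lite. TensorFlow Mobile came out earlier and is relatively stable, but it has not been heavily optimized for mobile in performance and other respects. It is no longer recommended and is expected to be deprecated by early 2019.
According to the official website, the main differences between TensorFlow Mobile and TensorFlow Lite are:

  • TensorFlow Lite is the evolution of TensorFlow Mobile. In most cases, TensorFlow Lite has a smaller binary size, fewer dependencies and better performance.
  • TensorFlow Lite is still under development, so some features may still be missing. However, the TensorFlow team has committed to stepping up its development.
  • TensorFlow Lite supports a relatively limited set of ops, whereas TensorFlow Mobile's op coverage is more comprehensive.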

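Looking at the Differences in the Source Code

That is the official description, but it is still rather vague. What exactly does TensorFlow Mobile strip out, and which ops does it support? How does TensorFlow Lite differ in its implementation? The only way to answer these questions is to analyze the source code.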

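TensorFlow Source Directory Layout

The tensorflow/core directory contains the code of TF's core modules:
public: API header files that define the interfaces for external callers, mainly session.h and tensor_c_api.h.
client: implementation files for the API.
platform: OS-related interfaces, such as the file system and env.
protobuf: all .proto files, used to serialize structures for data transfer.
common_runtime: the common runtime library, including session, executor, threadpool, rendezvous, memory management and device placement algorithms.
distributed_runtime: the distributed execution module, e.g. rpc session, rpc master, rpc worker and graph manager.
framework: basic functional modules such as log, memory and tensor.
graph: computation-graph operations such as construct, partition, optimize and execute.
kernels: core op implementations such as matmul, conv2d, argmax and batch_norm.
lib: common base libraries such as gif, gtl (Google template library), hash and histogram.
ops: basic ops, op gradients, I/O-related ops, and control-flow and data-flow operations.

The tensorflow/stream_executor directory is the parallel computing framework, developed by Google's StreamExecutor team.
The tensorflow/contrib directory holds contributed code; its android subdirectory contains the Android build of TensorFlow Mobile, and its lite subdirectory contains the TensorFlow Lite source code.
The tensorflow/python directory contains the Python API client scripts.
The tensorflow/tensorboard directory contains the visualization and analysis tool, which can visualize models and also monitor how model parameters change.
The third_party directory contains TF's third-party dependencies:
eigen3: the Eigen matrix library, called by TF's basic ops.
gpus: wrappers for the CUDA/cuDNN libraries.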

What Does TensorFlow Mobile Strip Out?

TensorFlow is built with Bazel, so we can analyze the differences by examining the build files.

TensorFlow's Default Build Configuration
===== /tensorflow/BUILD ===== 
tf_cc_shared_object(
    name = "libtensorflow.so",
    linkopts = select({
        "//tensorflow:darwin": [
            "-Wl,-exported_symbols_list",  # This line must be directly followed by the exported_symbols.lds file
            "$(location //tensorflow/c:exported_symbols.lds)",
            "-Wl,-install_name,@rpath/libtensorflow.so",
        ],
        "//tensorflow:windows": [],
        "//conditions:default": [
            "-z defs",
            "-Wl,--version-script",  #  This line must be directly followed by the version_script.lds file
            "$(location //tensorflow/c:version_script.lds)",
        ],
    }),
    visibility = ["//visibility:public"],
    deps = [
        "//tensorflow/c:c_api",
        "//tensorflow/c:c_api_experimental",
        "//tensorflow/c:exported_symbols.lds",
        "//tensorflow/c:version_script.lds",
        "//tensorflow/c/eager:c_api",
        "//tensorflow/core:tensorflow",
    ],
)

===== /tensorflow/c/BUILD ===== 
tf_cuda_library(
    name = "c_api",
    srcs = [
        "c_api.cc",
        "c_api_function.cc",
    ],
    hdrs = [
        "c_api.h",
    ],
    copts = tf_copts(),
    visibility = ["//visibility:public"],
    deps = select({
        "//tensorflow:android": [
            ":c_api_internal",
            "//tensorflow/core:android_tensorflow_lib_lite",
        ],
        "//conditions:default": [
            ":c_api_internal",
            "//tensorflow/cc/saved_model:loader",
            "//tensorflow/cc:gradients",
            "//tensorflow/cc:ops",
            "//tensorflow/cc:grad_ops",
            "//tensorflow/cc:scope_internal",
            "//tensorflow/cc:while_loop",
            "//tensorflow/core:core_cpu",
            "//tensorflow/core:core_cpu_internal",
            "//tensorflow/core:framework",
            "//tensorflow/core:op_gen_lib",
            "//tensorflow/core:protos_all_cc",
            "//tensorflow/core:lib",
            "//tensorflow/core:lib_internal",
        ],
    }) + select({
        "//tensorflow:with_xla_support": [
            "//tensorflow/compiler/tf2xla:xla_compiler",
            "//tensorflow/compiler/jit",
        ],
        "//conditions:default": [],
    }),
)


tf_cuda_library(
    name = "c_api_experimental",
    srcs = [
        "c_api_experimental.cc",
    ],
    hdrs = [
        "c_api_experimental.h",
    ],
    copts = tf_copts(),
    visibility = ["//visibility:public"],
    deps = [
        ":c_api",
        ":c_api_internal",
        "//tensorflow/c/eager:c_api",
        "//tensorflow/compiler/jit/legacy_flags:mark_for_compilation_pass_flags",
        "//tensorflow/contrib/tpu:all_ops",
        "//tensorflow/core:core_cpu",
        "//tensorflow/core:framework",
        "//tensorflow/core:lib",
        "//tensorflow/core:lib_platform",
        "//tensorflow/core:protos_all_cc",
    ],
)


===== /tensorflow/c/eager/BUILD ===== 
tf_cuda_library(
    name = "c_api",
    srcs = [
        "c_api.cc",
        "c_api_debug.cc",
        "c_api_internal.h",
    ],
    hdrs = ["c_api.h"],
    copts = tf_copts() + tfe_xla_copts(),
    visibility = ["//visibility:public"],
    deps = select({
        "//tensorflow:android": [
            "//tensorflow/core:android_tensorflow_lib_lite",
        ],
        "//conditions:default": [
            "//tensorflow/c:c_api",
            "//tensorflow/c:c_api_internal",
            "//tensorflow/core:core_cpu",
            "//tensorflow/core/common_runtime/eager:attr_builder",
            "//tensorflow/core/common_runtime/eager:context",
            "//tensorflow/core/common_runtime/eager:eager_executor",
            "//tensorflow/core/common_runtime/eager:execute",
            "//tensorflow/core/common_runtime/eager:kernel_and_device",
            "//tensorflow/core/common_runtime/eager:tensor_handle",
            "//tensorflow/core/common_runtime/eager:copy_to_device_node",
            "//tensorflow/core:core_cpu_internal",
            "//tensorflow/core:framework",
            "//tensorflow/core:framework_internal",
            "//tensorflow/core:lib",
            "//tensorflow/core:lib_internal",
            "//tensorflow/core:protos_all_cc",
        ],
    }) + select({
        "//tensorflow:with_xla_support": [
            "//tensorflow/compiler/tf2xla:xla_compiler",
            "//tensorflow/compiler/jit",
            "//tensorflow/compiler/jit:xla_device",
        ],
        "//conditions:default": [],
    }) + [
        "//tensorflow/core/common_runtime/eager:eager_operation",
        "//tensorflow/core/distributed_runtime/eager:eager_client",
        "//tensorflow/core/distributed_runtime/rpc/eager:grpc_eager_client",
        "//tensorflow/core/distributed_runtime/rpc:grpc_channel",
        "//tensorflow/core/distributed_runtime/rpc:grpc_server_lib",
        "//tensorflow/core/distributed_runtime/rpc:grpc_worker_cache",
        "//tensorflow/core/distributed_runtime/rpc:grpc_worker_service",
        "//tensorflow/core/distributed_runtime/rpc:rpc_rendezvous_mgr",
        "//tensorflow/core/distributed_runtime:remote_device",
        "//tensorflow/core/distributed_runtime:server_lib",
        "//tensorflow/core/distributed_runtime:worker_env",
        "//tensorflow/core:gpu_runtime",
    ],
)

===== /tensorflow/core/BUILD ===== 
cc_library(
    name = "tensorflow",
    visibility = ["//visibility:public"],
    deps = [
        ":tensorflow_opensource",
        "//tensorflow/core/platform/default/build_config:tensorflow_platform_specific",
    ],
)


tf_cuda_library(
    name = "tensorflow_opensource",
    copts = tf_copts(),
    visibility = ["//visibility:public"],
    deps = [
        ":all_kernels",
        ":core",
        ":direct_session",
        ":example_parser_configuration",
        ":gpu_runtime",
        ":lib",
    ],
)


cc_library(
    name = "all_kernels",
    visibility = ["//visibility:public"],
    deps = if_dynamic_kernels(
        [],
        otherwise = [":all_kernels_statically_linked"],
    ),
)


# This is a link-only library to provide a DirectSession
# implementation of the Session interface.
tf_cuda_library(
    name = "direct_session",
    copts = tf_copts(),
    linkstatic = 1,
    visibility = ["//visibility:public"],
    deps = [
        ":direct_session_internal",
    ],
    alwayslink = 1,
)

filegroup(
    name = "example_parser_configuration_testdata",
    srcs = [
        "example/testdata/parse_example_graph_def.pbtxt",
    ],
)

cc_library(
    name = "core",
    visibility = ["//visibility:public"],
    deps = [
        ":core_cpu",
        ":gpu_runtime",
        ":sycl_runtime",
    ],
)


cc_library(
    name = "lib",
    hdrs = [
        "lib/bfloat16/bfloat16.h",
        "lib/core/arena.h",
        "lib/core/bitmap.h",
        "lib/core/bits.h",
        "lib/core/casts.h",
        "lib/core/coding.h",
        "lib/core/errors.h",
        "lib/core/notification.h",
        "lib/core/raw_coding.h",
        "lib/core/status.h",
        "lib/core/stringpiece.h",
        "lib/core/threadpool.h",
        "lib/gtl/array_slice.h",
        "lib/gtl/cleanup.h",
        "lib/gtl/compactptrset.h",
        "lib/gtl/flatmap.h",
        "lib/gtl/flatset.h",
        "lib/gtl/inlined_vector.h",
        "lib/gtl/optional.h",
        "lib/gtl/priority_queue_util.h",
        "lib/hash/crc32c.h",
        "lib/hash/hash.h",
        "lib/histogram/histogram.h",
        "lib/io/buffered_inputstream.h",
        "lib/io/compression.h",
        "lib/io/inputstream_interface.h",
        "lib/io/path.h",
        "lib/io/proto_encode_helper.h",
        "lib/io/random_inputstream.h",
        "lib/io/record_reader.h",
        "lib/io/record_writer.h",
        "lib/io/table.h",
        "lib/io/table_builder.h",
        "lib/io/table_options.h",
        "lib/math/math_util.h",
        "lib/monitoring/collected_metrics.h",
        "lib/monitoring/collection_registry.h",
        "lib/monitoring/counter.h",
        "lib/monitoring/gauge.h",
        "lib/monitoring/metric_def.h",
        "lib/monitoring/sampler.h",
        "lib/random/distribution_sampler.h",
        "lib/random/philox_random.h",
        "lib/random/random_distributions.h",
        "lib/random/simple_philox.h",
        "lib/strings/numbers.h",
        "lib/strings/proto_serialization.h",
        "lib/strings/str_util.h",
        "lib/strings/strcat.h",
        "lib/strings/stringprintf.h",
        ":platform_base_hdrs",
        ":platform_env_hdrs",
        ":platform_file_system_hdrs",
        ":platform_other_hdrs",
        ":platform_port_hdrs",
        ":platform_protobuf_hdrs",
    ],
    visibility = ["//visibility:public"],
    deps = [
        ":lib_internal",
        "@com_google_absl//absl/container:inlined_vector",
        "@com_google_absl//absl/strings",
        "@com_google_absl//absl/types:optional",
    ],
)

# This includes implementations of all kernels built into TensorFlow.
cc_library(
    name = "all_kernels_statically_linked",
    visibility = ["//visibility:private"],
    deps = [
        "//tensorflow/core/kernels:array",
        "//tensorflow/core/kernels:audio",
        "//tensorflow/core/kernels:batch_kernels",
        "//tensorflow/core/kernels:bincount_op",
        "//tensorflow/core/kernels:boosted_trees_ops",
        "//tensorflow/core/kernels:candidate_sampler_ops",
        "//tensorflow/core/kernels:checkpoint_ops",
        "//tensorflow/core/kernels:collective_ops",
        "//tensorflow/core/kernels:control_flow_ops",
        "//tensorflow/core/kernels:ctc_ops",
        "//tensorflow/core/kernels:cudnn_rnn_kernels",
        "//tensorflow/core/kernels:data_flow",
        "//tensorflow/core/kernels:dataset_ops",
        "//tensorflow/core/kernels:decode_proto_op",
        "//tensorflow/core/kernels:encode_proto_op",
        "//tensorflow/core/kernels:fake_quant_ops",
        "//tensorflow/core/kernels:function_ops",
        "//tensorflow/core/kernels:functional_ops",
        "//tensorflow/core/kernels:grappler",
        "//tensorflow/core/kernels:histogram_op",
        "//tensorflow/core/kernels:image",
        "//tensorflow/core/kernels:io",
        "//tensorflow/core/kernels:linalg",
        "//tensorflow/core/kernels:list_kernels",
        "//tensorflow/core/kernels:lookup",
        "//tensorflow/core/kernels:logging",
        "//tensorflow/core/kernels:manip",
        "//tensorflow/core/kernels:math",
        "//tensorflow/core/kernels:multinomial_op",
        "//tensorflow/core/kernels:nn",
        "//tensorflow/core/kernels:parameterized_truncated_normal_op",
        "//tensorflow/core/kernels:parsing",
        "//tensorflow/core/kernels:partitioned_function_ops",
        "//tensorflow/core/kernels:random_ops",
        "//tensorflow/core/kernels:random_poisson_op",
        "//tensorflow/core/kernels:remote_fused_graph_ops",
        "//tensorflow/core/kernels:required",
        "//tensorflow/core/kernels:resource_variable_ops",
        "//tensorflow/core/kernels:rpc_op",
        "//tensorflow/core/kernels:scoped_allocator_ops",
        "//tensorflow/core/kernels:sdca_ops",
        "//tensorflow/core/kernels:searchsorted_op",
        "//tensorflow/core/kernels:set_kernels",
        "//tensorflow/core/kernels:sparse",
        "//tensorflow/core/kernels:state",
        "//tensorflow/core/kernels:stateless_random_ops",
        "//tensorflow/core/kernels:string",
        "//tensorflow/core/kernels:summary_kernels",
        "//tensorflow/core/kernels:training_ops",
        "//tensorflow/core/kernels:word2vec_kernels",
    ] + tf_additional_cloud_kernel_deps() + if_not_windows([
        "//tensorflow/core/kernels:fact_op",
        "//tensorflow/core/kernels:array_not_windows",
        "//tensorflow/core/kernels:math_not_windows",
        "//tensorflow/core/kernels:quantized_ops",
        "//tensorflow/core/kernels/neon:neon_depthwise_conv_op",
    ]) + if_mkl([
        "//tensorflow/core/kernels:mkl_concat_op",
        "//tensorflow/core/kernels:mkl_conv_op",
        "//tensorflow/core/kernels:mkl_cwise_ops_common",
        "//tensorflow/core/kernels:mkl_fused_batch_norm_op",
        "//tensorflow/core/kernels:mkl_identity_op",
        "//tensorflow/core/kernels:mkl_input_conversion_op",
        "//tensorflow/core/kernels:mkl_lrn_op",
        "//tensorflow/core/kernels:mkl_pooling_ops",
        "//tensorflow/core/kernels:mkl_relu_op",
        "//tensorflow/core/kernels:mkl_reshape_op",
        "//tensorflow/core/kernels:mkl_slice_op",
        "//tensorflow/core/kernels:mkl_softmax_op",
        "//tensorflow/core/kernels:mkl_transpose_op",
        "//tensorflow/core/kernels:mkl_tfconv_op",
        "//tensorflow/core/kernels:mkl_aggregate_ops",
    ]) + if_cuda([
        "//tensorflow/core/grappler/optimizers:gpu_swapping_kernels",
        "//tensorflow/core/grappler/optimizers:gpu_swapping_ops",
    ]),
)
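The listings above rely on Bazel's select() to pick different dependencies per platform: on Android, //tensorflow/c:c_api and //tensorflow/c/eager:c_api depend only on //tensorflow/core:android_tensorflow_lib_lite instead of the full core libraries. The Python sketch below models how such a select() is resolved. It is a simplification under stated assumptions: conditions are plain strings rather than config_setting targets, and any multiple match is treated as an error, whereas real Bazel accepts a multiple match when one condition specializes the others. The function names are hypothetical.

```python
from typing import Dict, List

# A select() is modeled as {condition label: list of dependency labels}.
# Assumption: conditions are plain strings here, not real config_setting targets.
Select = Dict[str, List[str]]

DEFAULT = "//conditions:default"


def resolve(select: Select, active: List[str]) -> List[str]:
    """Returns the branch whose condition is active, else the default branch."""
    matches = [cond for cond in select if cond != DEFAULT and cond in active]
    if len(matches) > 1:
        # Simplification: real Bazel allows this when one condition specializes the others.
        raise ValueError(f"ambiguous select(): {matches}")
    if matches:
        return select[matches[0]]
    if DEFAULT in select:
        return select[DEFAULT]
    raise ValueError("no condition matched and there is no default branch")


# Abridged from the c_api rule in /tensorflow/c/BUILD above.
c_api_platform_deps: Select = {
    "//tensorflow:android": [
        ":c_api_internal",
        "//tensorflow/core:android_tensorflow_lib_lite",
    ],
    DEFAULT: [
        ":c_api_internal",
        "//tensorflow/cc/saved_model:loader",
        "//tensorflow/core:core_cpu",
        "//tensorflow/core:framework",
    ],
}
c_api_xla_deps: Select = {
    "//tensorflow:with_xla_support": [
        "//tensorflow/compiler/tf2xla:xla_compiler",
        "//tensorflow/compiler/jit",
    ],
    DEFAULT: [],
}


def c_api_deps(active: List[str]) -> List[str]:
    # `select(...) + select(...)` in a BUILD file concatenates the two resolved lists.
    return resolve(c_api_platform_deps, active) + resolve(c_api_xla_deps, active)


print(c_api_deps(["//tensorflow:android"]))
print(c_api_deps(["//tensorflow:with_xla_support"]))
```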
TensorFlow Mobile's Build Configuration
===== tensorflow/contrib/android/BUILD =====
cc_binary(
    name = "libtensorflow_inference.so",
    srcs = [],
    copts = tf_copts() + [
        "-ffunction-sections",
        "-fdata-sections",
    ],
    linkopts = if_android([
        "-landroid",
        "-latomic",
        "-ldl",
        "-llog",
        "-lm",
        "-z defs",
        "-s",
        "-Wl,--gc-sections",
        "-Wl,--version-script",  # This line must be directly followed by LINKER_SCRIPT.
        "$(location {})".format(LINKER_SCRIPT),
    ]),
    linkshared = 1,
    linkstatic = 1,
    tags = [
        "manual",
        "notap",
    ],
    deps = [
        ":android_tensorflow_inference_jni",
        "//tensorflow/core:android_tensorflow_lib",
        LINKER_SCRIPT,
    ],
)


cc_library(
    name = "android_tensorflow_inference_jni",
    srcs = if_android([":android_tensorflow_inference_jni_srcs"]),
    copts = tf_copts(),
    visibility = ["//visibility:public"],
    deps = [
        "//tensorflow/core:android_tensorflow_lib_lite",
        "//tensorflow/java/src/main/native",
    ],
    alwayslink = 1,
)


===== tensorflow/core/BUILD ===== 
cc_library(
    name = "android_tensorflow_lib",
    srcs = if_android([":android_op_registrations_and_gradients"]),
    copts = tf_copts(),
    tags = [
        "manual",
        "notap",
    ],
    visibility = ["//visibility:public"],
    deps = [
        ":android_tensorflow_lib_lite",
        ":protos_all_cc_impl",
        "//tensorflow/core/kernels:android_tensorflow_kernels",
        "//third_party/eigen3",
        "@protobuf_archive//:protobuf",
    ],
    alwayslink = 1,
)


cc_library(
    name = "android_tensorflow_lib_lite",
    srcs = if_android(["//tensorflow/core:android_srcs"]),
    copts = tf_copts(android_optimization_level_override = None),
    linkopts = ["-lz"],
    tags = [
        "manual",
        "notap",
    ],
    visibility = ["//visibility:public"],
    deps = [
        ":mobile_additional_lib_deps",
        ":protos_all_cc_impl",
        ":stats_calculator_portable",
        "//third_party/eigen3",
        "@double_conversion//:double-conversion",
        "@nsync//:nsync_cpp",
        "@protobuf_archive//:protobuf",
    ],
    alwayslink = 1,
)

alias(
    name = "android_srcs",
    actual = ":mobile_srcs",
    visibility = ["//visibility:public"],
)

filegroup(
    name = "mobile_srcs",
    srcs = [
        ":mobile_srcs_no_runtime",
        ":mobile_srcs_only_runtime",
    ],
    visibility = ["//visibility:public"],
)

# Core sources for Android builds.
filegroup(
    name = "mobile_srcs_no_runtime",
    srcs = [
        ":protos_all_proto_text_srcs",
        ":error_codes_proto_text_srcs",
        "//tensorflow/core/platform/default/build_config:android_srcs",
    ] + glob(
        [
            "client/**/*.cc",
            "framework/**/*.h",
            "framework/**/*.cc",
            "lib/**/*.h",
            "lib/**/*.cc",
            "platform/**/*.h",
            "platform/**/*.cc",
            "public/**/*.h",
            "util/**/*.h",
            "util/**/*.cc",
        ],
        exclude = [
            "**/*test.*",
            "**/*testutil*",
            "**/*testlib*",
            "**/*main.cc",
            "debug/**/*",
            "framework/op_gen_*",
            "lib/jpeg/**/*",
            "lib/png/**/*",
            "lib/gif/**/*",
            "util/events_writer.*",
            "util/stats_calculator.*",
            "util/reporter.*",
            "platform/**/cuda_libdevice_path.*",
            "platform/default/test_benchmark.*",
            "platform/cuda.h",
            "platform/google/**/*",
            "platform/hadoop/**/*",
            "platform/gif.h",
            "platform/jpeg.h",
            "platform/png.h",
            "platform/stream_executor.*",
            "platform/windows/**/*",
            "user_ops/**/*.cu.cc",
            "util/ctc/*.h",
            "util/ctc/*.cc",
            "util/tensor_bundle/*.h",
            "util/tensor_bundle/*.cc",
            "common_runtime/gpu/**/*",
            "common_runtime/eager/*",
            "common_runtime/gpu_device_factory.*",
        ],
    ),
    visibility = ["//visibility:public"],
)

filegroup(
    name = "mobile_srcs_only_runtime",
    srcs = [
        "//tensorflow/core/kernels:android_srcs",
        "//tensorflow/core/util/ctc:android_srcs",
        "//tensorflow/core/util/tensor_bundle:android_srcs",
    ] + glob(
        [
            "common_runtime/**/*.h",
            "common_runtime/**/*.cc",
            "graph/**/*.h",
            "graph/**/*.cc",
        ],
        exclude = [
            "**/*test.*",
            "**/*testutil*",
            "**/*testlib*",
            "**/*main.cc",
            "common_runtime/gpu/**/*",
            "common_runtime/eager/*",
            "common_runtime/gpu_device_factory.*",
            "graph/dot.*",
        ],
    ),
    visibility = ["//visibility:public"],
)

cc_library(
    name = "stats_calculator_portable",
    srcs = [
        "util/stat_summarizer_options.h",
        "util/stats_calculator.cc",
    ],
    hdrs = [
        "util/stats_calculator.h",
    ],
    copts = tf_copts(),
)

cc_library(
    name = "mobile_additional_lib_deps",
    deps = tf_additional_lib_deps() + [
        "@com_google_absl//absl/strings",
    ],
)


===== tensorflow/core/kernels/BUILD ===== 
cc_library(
    name = "android_tensorflow_kernels",
    srcs = select({
        "//tensorflow:android": [
            "//tensorflow/core/kernels:android_core_ops",
            "//tensorflow/core/kernels:android_extended_ops",
        ],
        "//conditions:default": [],
    }),
    copts = tf_copts(),
    linkopts = select({
        "//tensorflow:android": [
            "-ldl",
        ],
        "//conditions:default": [],
    }),
    tags = [
        "manual",
        "notap",
    ],
    visibility = ["//visibility:public"],
    deps = [
        "//tensorflow/core:android_tensorflow_lib_lite",
        "//tensorflow/core:protos_all_cc_impl",
        "//third_party/eigen3",
        "//third_party/fft2d:fft2d_headers",
        "@fft2d",
        "@gemmlowp",
        "@protobuf_archive//:protobuf",
    ],
    alwayslink = 1,
)


# Core kernels we want on Android. Only a subset of kernels to keep
# base library small.
filegroup(
    name = "android_core_ops",
    srcs = [
        "aggregate_ops.cc",
        "aggregate_ops.h",
        "aggregate_ops_cpu.h",
        "assign_op.h",
        "bias_op.cc",
        "bias_op.h",
        "bounds_check.h",
        "cast_op.cc",
        "cast_op.h",
        "cast_op_impl.h",
        "cast_op_impl_bfloat.cc",
        "cast_op_impl_bool.cc",
        "cast_op_impl_complex128.cc",
        "cast_op_impl_complex64.cc",
        "cast_op_impl_double.cc",
        "cast_op_impl_float.cc",
        "cast_op_impl_half.cc",
        "cast_op_impl_int16.cc",
        "cast_op_impl_int32.cc",
        "cast_op_impl_int64.cc",
        "cast_op_impl_int8.cc",
        "cast_op_impl_uint16.cc",
        "cast_op_impl_uint32.cc",
        "cast_op_impl_uint64.cc",
        "cast_op_impl_uint8.cc",
        "concat_lib.h",
        "concat_lib_cpu.cc",
        "concat_lib_cpu.h",
        "concat_op.cc",
        "constant_op.cc",
        "constant_op.h",
        "cwise_ops.h",
        "cwise_ops_common.cc",
        "cwise_ops_common.h",
        "cwise_ops_gradients.h",
        "dense_update_functor.cc",
        "dense_update_functor.h",
        "dense_update_ops.cc",
        "example_parsing_ops.cc",
        "fill_functor.cc",
        "fill_functor.h",
        "function_ops.cc",
        "function_ops.h",
        "gather_functor.h",
        "gather_nd_op.cc",
        "gather_nd_op.h",
        "gather_nd_op_cpu_impl.h",
        "gather_nd_op_cpu_impl_0.cc",
        "gather_nd_op_cpu_impl_1.cc",
        "gather_nd_op_cpu_impl_2.cc",
        "gather_nd_op_cpu_impl_3.cc",
        "gather_nd_op_cpu_impl_4.cc",
        "gather_nd_op_cpu_impl_5.cc",
        "gather_nd_op_cpu_impl_6.cc",
        "gather_nd_op_cpu_impl_7.cc",
        "gather_op.cc",
        "identity_n_op.cc",
        "identity_n_op.h",
        "identity_op.cc",
        "identity_op.h",
        "immutable_constant_op.cc",
        "immutable_constant_op.h",
        "matmul_op.cc",
        "matmul_op.h",
        "no_op.cc",
        "no_op.h",
        "non_max_suppression_op.cc",
        "non_max_suppression_op.h",
        "one_hot_op.cc",
        "one_hot_op.h",
        "ops_util.h",
        "pack_op.cc",
        "pooling_ops_common.h",
        "reshape_op.cc",
        "reshape_op.h",
        "reverse_sequence_op.cc",
        "reverse_sequence_op.h",
        "sendrecv_ops.cc",
        "sendrecv_ops.h",
        "sequence_ops.cc",
        "shape_ops.cc",
        "shape_ops.h",
        "slice_op.cc",
        "slice_op.h",
        "slice_op_cpu_impl.h",
        "slice_op_cpu_impl_1.cc",
        "slice_op_cpu_impl_2.cc",
        "slice_op_cpu_impl_3.cc",
        "slice_op_cpu_impl_4.cc",
        "slice_op_cpu_impl_5.cc",
        "slice_op_cpu_impl_6.cc",
        "slice_op_cpu_impl_7.cc",
        "softmax_op.cc",
        "softmax_op_functor.h",
        "split_lib.h",
        "split_lib_cpu.cc",
        "split_op.cc",
        "split_v_op.cc",
        "strided_slice_op.cc",
        "strided_slice_op.h",
        "strided_slice_op_impl.h",
        "strided_slice_op_inst_0.cc",
        "strided_slice_op_inst_1.cc",
        "strided_slice_op_inst_2.cc",
        "strided_slice_op_inst_3.cc",
        "strided_slice_op_inst_4.cc",
        "strided_slice_op_inst_5.cc",
        "strided_slice_op_inst_6.cc",
        "strided_slice_op_inst_7.cc",
        "unpack_op.cc",
        "variable_ops.cc",
        "variable_ops.h",
    ],
)

# Other kernels we may want on Android.
#
# The kernels can be consumed as a whole or in two groups for
# supporting separate compilation. Note that the split into groups
# is entirely for improving compilation time, and not for
# organizational reasons; you should not depend on any
# of those groups independently.
filegroup(
    name = "android_extended_ops",
    srcs = [
        ":android_extended_ops_group1",
        ":android_extended_ops_group2",
        ":android_quantized_ops",
    ],
    visibility = ["//visibility:public"],
)

filegroup(
    name = "android_extended_ops_headers",
    srcs = [
        "argmax_op.h",
        "avgpooling_op.h",
        "batch_matmul_op_impl.h",
        "batch_norm_op.h",
        "control_flow_ops.h",
        "conv_2d.h",
        "conv_ops.h",
        "data_format_ops.h",
        "depthtospace_op.h",
        "depthwise_conv_op.h",
        "fake_quant_ops_functor.h",
        "fused_batch_norm_op.h",
        "gemm_functors.h",
        "image_resizer_state.h",
        "initializable_lookup_table.h",
        "lookup_table_init_op.h",
        "lookup_table_op.h",
        "lookup_util.h",
        "maxpooling_op.h",
        "mfcc.h",
        "mfcc_dct.h",
        "mfcc_mel_filterbank.h",
        "mirror_pad_op.h",
        "mirror_pad_op_cpu_impl.h",
        "pad_op.h",
        "random_op.h",
        "reduction_ops.h",
        "reduction_ops_common.h",
        "relu_op.h",
        "relu_op_functor.h",
        "reshape_util.h",
        "resize_bilinear_op.h",
        "resize_nearest_neighbor_op.h",
        "reverse_op.h",
        "save_restore_tensor.h",
        "segment_reduction_ops.h",
        "softplus_op.h",
        "softsign_op.h",
        "spacetobatch_functor.h",
        "spacetodepth_op.h",
        "spectrogram.h",
        "string_util.h",
        "tensor_array.h",
        "tile_functor.h",
        "tile_ops_cpu_impl.h",
        "tile_ops_impl.h",
        "topk_op.h",
        "training_op_helpers.h",
        "training_ops.h",
        "transpose_functor.h",
        "transpose_op.h",
        "where_op.h",
        "xent_op.h",
    ],
)

filegroup(
    name = "android_extended_ops_group1",
    srcs = [
        "argmax_op.cc",
        "avgpooling_op.cc",
        "batch_matmul_op_real.cc",
        "batch_norm_op.cc",
        "bcast_ops.cc",
        "check_numerics_op.cc",
        "control_flow_ops.cc",
        "conv_2d.h",
        "conv_grad_filter_ops.cc",
        "conv_grad_input_ops.cc",
        "conv_grad_ops.cc",
        "conv_grad_ops.h",
        "conv_ops.cc",
        "conv_ops_fused.cc",
        "conv_ops_using_gemm.cc",
        "crop_and_resize_op.cc",
        "crop_and_resize_op.h",
        "cwise_op_abs.cc",
        "cwise_op_add_1.cc",
        "cwise_op_add_2.cc",
        "cwise_op_bitwise_and.cc",
        "cwise_op_bitwise_or.cc",
        "cwise_op_bitwise_xor.cc",
        "cwise_op_div.cc",
        "cwise_op_equal_to_1.cc",
        "cwise_op_equal_to_2.cc",
        "cwise_op_not_equal_to_1.cc",
        "cwise_op_not_equal_to_2.cc",
        "cwise_op_exp.cc",
        "cwise_op_floor.cc",
        "cwise_op_floor_div.cc",
        "cwise_op_floor_mod.cc",
        "cwise_op_greater.cc",
        "cwise_op_greater_equal.cc",
        "cwise_op_invert.cc",
        "cwise_op_isfinite.cc",
        "cwise_op_isnan.cc",
        "cwise_op_left_shift.cc",
        "cwise_op_less.cc",
        "cwise_op_less_equal.cc",
        "cwise_op_log.cc",
        "cwise_op_logical_and.cc",
        "cwise_op_logical_not.cc",
        "cwise_op_logical_or.cc",
        "cwise_op_maximum.cc",
        "cwise_op_minimum.cc",
        "cwise_op_mul_1.cc",
        "cwise_op_mul_2.cc",
        "cwise_op_neg.cc",
        "cwise_op_pow.cc",
        "cwise_op_reciprocal.cc",
        "cwise_op_right_shift.cc",
        "cwise_op_round.cc",
        "cwise_op_rsqrt.cc",
        "cwise_op_select.cc",
        "cwise_op_sigmoid.cc",
        "cwise_op_sign.cc",
        "cwise_op_sqrt.cc",
        "cwise_op_square.cc",
        "cwise_op_squared_difference.cc",
        "cwise_op_sub.cc",
        "cwise_op_tanh.cc",
        "cwise_op_xlogy.cc",
        "cwise_op_xdivy.cc",
        "data_format_ops.cc",
        "decode_wav_op.cc",
        "deep_conv2d.cc",
        "deep_conv2d.h",
        "depthwise_conv_op.cc",
        "dynamic_partition_op.cc",
        "encode_wav_op.cc",
        "fake_quant_ops.cc",
        "fifo_queue.cc",
        "fifo_queue_op.cc",
        "fused_batch_norm_op.cc",
        "listdiff_op.cc",
        "population_count_op.cc",
        "population_count_op.h",
        "winograd_transform.h",
        ":android_extended_ops_headers",
    ] + select({
        ":xsmm_convolutions": [
            "xsmm_conv2d.h",
            "xsmm_conv2d.cc",
        ],
        "//conditions:default": [],
    }),
)

filegroup(
    name = "android_extended_ops_group2",
    srcs = [
        "batchtospace_op.cc",
        "ctc_decoder_ops.cc",
        "decode_bmp_op.cc",
        "depthtospace_op.cc",
        "dynamic_stitch_op.cc",
        "in_topk_op.cc",
        "initializable_lookup_table.cc",
        "logging_ops.cc",
        "lookup_table_init_op.cc",
        "lookup_table_op.cc",
        "lookup_util.cc",
        "lrn_op.cc",
        "maxpooling_op.cc",
        "mfcc.cc",
        "mfcc_dct.cc",
        "mfcc_mel_filterbank.cc",
        "mfcc_op.cc",
        "mirror_pad_op.cc",
        "mirror_pad_op_cpu_impl_1.cc",
        "mirror_pad_op_cpu_impl_2.cc",
        "mirror_pad_op_cpu_impl_3.cc",
        "mirror_pad_op_cpu_impl_4.cc",
        "mirror_pad_op_cpu_impl_5.cc",
        "pad_op.cc",
        "padding_fifo_queue.cc",
        "padding_fifo_queue_op.cc",
        "queue_base.cc",
        "queue_op.cc",
        "queue_ops.cc",
        "random_op.cc",
        "reduction_ops_all.cc",
        "reduction_ops_any.cc",
        "reduction_ops_common.cc",
        "reduction_ops_max.cc",
        "reduction_ops_mean.cc",
        "reduction_ops_min.cc",
        "reduction_ops_prod.cc",
        "reduction_ops_sum.cc",
        "relu_op.cc",
        "reshape_util.cc",
        "resize_bilinear_op.cc",
        "resize_nearest_neighbor_op.cc",
        "restore_op.cc",
        "reverse_op.cc",
        "save_op.cc",
        "save_restore_tensor.cc",
        "save_restore_v2_ops.cc",
        "segment_reduction_ops.cc",
        "session_ops.cc",
        "softplus_op.cc",
        "softsign_op.cc",
        "spacetobatch_functor.cc",
        "spacetobatch_op.cc",
        "spacetodepth_op.cc",
        "sparse_fill_empty_rows_op.cc",
        "sparse_reshape_op.cc",
        "sparse_to_dense_op.cc",
        "spectrogram.cc",
        "spectrogram_op.cc",
        "stack_ops.cc",
        "string_join_op.cc",
        "string_util.cc",
        "summary_op.cc",
        "tensor_array.cc",
        "tensor_array_ops.cc",
        "tile_functor_cpu.cc",
        "tile_ops.cc",
        "tile_ops_cpu_impl_1.cc",
        "tile_ops_cpu_impl_2.cc",
        "tile_ops_cpu_impl_3.cc",
        "tile_ops_cpu_impl_4.cc",
        "tile_ops_cpu_impl_5.cc",
        "tile_ops_cpu_impl_6.cc",
        "tile_ops_cpu_impl_7.cc",
        "topk_op.cc",
        "training_op_helpers.cc",
        "training_ops.cc",
        "transpose_functor_cpu.cc",
        "transpose_op.cc",
        "unique_op.cc",
        "where_op.cc",
        "xent_op.cc",
        ":android_extended_ops_headers",
    ],
)

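TensorFlow Mobile is produced by trimming the full TensorFlow build through compile-time build options. It keeps TensorFlow's core functionality and drops code that a device does not need, such as the distributed-execution logic, the Windows compatibility layer and GPU computation support.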
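One way to picture this trimming is to filter the full source tree against exclusion patterns before compiling, similar in spirit to a Bazel `glob(..., exclude=[...])`. The following is a minimal sketch under stated assumptions. The file paths and the exclusion patterns are hypothetical illustrations, and TensorFlow's real BUILD rules are considerably more involved.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns standing in for the features a mobile
# build drops: distributed runtime, Windows support and GPU kernels.
MOBILE_EXCLUDES = (
    "*distributed_runtime/*",
    "*windows*",
    "*_gpu*",
)


def mobile_sources(all_sources, excludes=MOBILE_EXCLUDES):
    """Keep only the source files that match none of the exclusion patterns.

    fnmatchcase's "*" also matches "/", so one pattern can span directories,
    and matching is case-sensitive on every OS.
    """
    return [src for src in all_sources
            if not any(fnmatchcase(src, pattern) for pattern in excludes)]


if __name__ == "__main__":
    # Illustrative file list, not a real TensorFlow checkout.
    full_tree = [
        "core/common_runtime/executor.cc",
        "core/distributed_runtime/rpc/grpc_session.cc",
        "core/platform/windows/env.cc",
        "core/kernels/relu_op.cc",
        "core/kernels/relu_op_gpu.cu.cc",
    ]
    print(mobile_sources(full_tree))
```

Running it keeps only `executor.cc` and `relu_op.cc`. Everything that matches a distributed, Windows or GPU pattern is left out of the build.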

Does TensorFlow Mobile support every OP?

No. TensorFlow Mobile does not include every OP. It ships only a set of core, essential ones; see android_core_ops and android_extended_ops above for the exact list.
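Only a subset of OPs is compiled in, so a model that uses anything outside that subset cannot run on the device. A quick pre-flight check is the set difference between the op types in a graph and the supported set. In a real `GraphDef`, each `NodeDef` carries its op type in the `op` field. The sketch below is a minimal version built on several assumptions. The supported set is a small hypothetical sample, not the real list for android_core_ops and android_extended_ops, and `MyCustomOp` is a made-up custom op.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical, abbreviated set of op types that have kernels in a mobile
# build. A real check would use the op list matching the kernels actually linked.
MOBILE_SUPPORTED_OPS = frozenset({
    "Const", "Placeholder", "Identity", "Conv2D", "BiasAdd", "Relu",
    "MaxPool", "MatMul", "Reshape", "Softmax", "MirrorPad", "TopKV2",
})


def unsupported_ops(op_types: Iterable[str],
                    supported: frozenset[str] = MOBILE_SUPPORTED_OPS) -> list[str]:
    """Return the sorted, de-duplicated op types that lack a mobile kernel."""
    return sorted(set(op_types) - supported)


if __name__ == "__main__":
    # Op types as they might be collected from node.op for each node in a graph.
    graph_ops = ["Placeholder", "Conv2D", "Relu", "MyCustomOp", "Conv2D"]
    print(unsupported_ops(graph_ops))
```

An empty result means every op in the graph has a kernel in the assumed mobile build. Any name that is returned would have to be added to the build or removed from the model.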

How does TensorFlow Lite differ in its implementation?

The TensorFlow Lite source code lives under tensorflow/contrib/lite. Its core build rule is:

### tensorflow/contrib/lite/BUILD
cc_library(
    name = "framework",
    srcs = [
        "allocation.cc",
        "graph_info.cc",
        "interpreter.cc",
        "model.cc",
        "mutable_op_resolver.cc",
        "optional_debug_tools.cc",
        "stderr_reporter.cc",
    ] + select({
        "//tensorflow:android": [
            "nnapi_delegate.cc",
            "mmap_allocation.cc",
        ],
        "//tensorflow:windows": [
            "nnapi_delegate_disabled.cc",
            "mmap_allocation_disabled.cc",
        ],
        "//conditions:default": [
            "nnapi_delegate_disabled.cc",
            "mmap_allocation.cc",
        ],
    }),
    hdrs = [
        "allocation.h",
        "context.h",
        "context_util.h",
        "error_reporter.h",
        "graph_info.h",
        "interpreter.h",
        "model.h",
        "mutable_op_resolver.h",
        "nnapi_delegate.h",
        "op_resolver.h",
        "optional_debug_tools.h",
        "stderr_reporter.h",
    ],
    copts = tflite_copts(),
    linkopts = [
    ] + select({
        "//tensorflow:android": [
            "-llog",
        ],
        "//conditions:default": [
        ],
    }),
    deps = [
        ":arena_planner",
        ":graph_info",
        ":memory_planner",
        ":schema_fbs_version",
        ":simple_memory_arena",
        ":string",
        ":util",
        "//tensorflow/contrib/lite/c:c_api_internal",
        "//tensorflow/contrib/lite/core/api",
        "//tensorflow/contrib/lite/kernels:eigen_support",
        "//tensorflow/contrib/lite/kernels:gemm_support",
        "//tensorflow/contrib/lite/nnapi:nnapi_lib",
        "//tensorflow/contrib/lite/profiling:profiler",
        "//tensorflow/contrib/lite/schema:schema_fbs",
    ],
)
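Each `select()` in this rule picks exactly one branch per target platform. On Android the build gets the real NNAPI delegate and mmap-based allocation. On Windows both are replaced by stubs. Every other platform keeps mmap but stubs out NNAPI. The Python sketch below mimics that resolution for the `srcs` attribute only. The platform keys `"android"`, `"windows"` and `"default"` are simplified stand-ins for the Bazel config labels and are not Bazel's own API.

```python
from __future__ import annotations

# Sources compiled on every platform (copied from the rule above).
COMMON_SRCS = [
    "allocation.cc",
    "graph_info.cc",
    "interpreter.cc",
    "model.cc",
    "mutable_op_resolver.cc",
    "optional_debug_tools.cc",
    "stderr_reporter.cc",
]

# Platform-specific branches of the select(). "default" plays the role of
# //conditions:default; the keys are simplified stand-ins for config labels.
PLATFORM_SRCS = {
    "android": ["nnapi_delegate.cc", "mmap_allocation.cc"],
    "windows": ["nnapi_delegate_disabled.cc", "mmap_allocation_disabled.cc"],
    "default": ["nnapi_delegate_disabled.cc", "mmap_allocation.cc"],
}


def framework_srcs(platform: str) -> list[str]:
    """Resolve srcs the way the select() above would for one target platform."""
    branch = PLATFORM_SRCS.get(platform, PLATFORM_SRCS["default"])
    return COMMON_SRCS + branch  # new list; the module-level lists are untouched


if __name__ == "__main__":
    for target in ("android", "windows", "linux"):
        print(target, framework_srcs(target)[-2:])
```

Only the Android build links the NNAPI delegate, because NNAPI (the Android Neural Networks API) exists only on Android. The `linkopts` select follows the same pattern and adds `-llog` for Android only.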

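TensorFlow Mobile is a trimmed-down version of the full TensorFlow, whereas TensorFlow Lite is essentially a reimplementation. Internally, even the most fundamental data structures of the TensorFlow kernel, such as OP and Context, are new. Externally, the model file format changes from Protocol Buffers (PB) to FlatBuffers, and the binary size is cut dramatically, to about 300 KB. A converter is also provided to turn ordinary TensorFlow models into the format TensorFlow Lite expects. From every angle, then, TensorFlow Lite is a new implementation.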
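One visible consequence of the switch to FlatBuffers is that a TensorFlow Lite model file can be recognized from its header. The TensorFlow Lite schema declares the file identifier `"TFL3"`. FlatBuffers stores a file identifier in bytes 4 to 7, immediately after the 4-byte offset to the root table, whereas a serialized Protocol Buffers `GraphDef` has no such magic bytes. The sketch below relies only on that layout. It is a heuristic rather than a full validation, and the example bytes are synthetic, not taken from a real model.

```python
TFLITE_FILE_IDENTIFIER = b"TFL3"  # declared via file_identifier in the TFLite schema


def looks_like_tflite(data: bytes) -> bool:
    """Heuristically check whether `data` starts like a TensorFlow Lite FlatBuffer.

    A FlatBuffer written with a file identifier begins with a 4-byte
    little-endian offset to the root table, followed by the 4-byte identifier.
    """
    return len(data) >= 8 and data[4:8] == TFLITE_FILE_IDENTIFIER


def describe_model_file(path: str) -> str:
    """Read the first 8 bytes of a model file and report its apparent format."""
    with open(path, "rb") as f:
        header = f.read(8)
    if looks_like_tflite(header):
        return "TensorFlow Lite model (FlatBuffers)"
    return "not a TensorFlow Lite FlatBuffer (for example a .pb GraphDef)"


if __name__ == "__main__":
    fake_tflite = b"\x18\x00\x00\x00TFL3" + bytes(16)  # synthetic header only
    fake_pb = b"\x0a\x0bconv1/Conv2D"                  # synthetic protobuf-like bytes
    print(looks_like_tflite(fake_tflite), looks_like_tflite(fake_pb))
```

A check like this is only a sanity test before handing a file to the interpreter. Actual parsing and verification are still done by the TensorFlow Lite runtime.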

References

TensorFlow Architecture
TensorFlow Mobile VS TensorFlow Lite
TensorFlow代码解析 (TensorFlow Code Analysis)
TensorFlow Lite
