1. Measuring a model's computational complexity

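FLOPS (floating-point operations per second, i.e. peak speed per second) is the number of floating-point operations a machine executes each second. It is commonly used to estimate computer performance, especially in scientific computing, which relies heavily on floating-point arithmetic. The trailing S in FLOPS stands for "second", not a plural, so it cannot be dropped.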

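FLOPs: with a lowercase s, this is the number of floating-point operations, that is, the amount of computation. It can be used to measure the complexity of an algorithm or model. Papers commonly report GFLOPs ($1\ \text{GFLOPs} = 10^9\ \text{FLOPs}$).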
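To keep the two terms apart, here is a minimal sketch that treats FLOPs as an amount of work and FLOPS as a rate. The model size and device speed are hypothetical numbers chosen only for illustration, and the time computed is an idealized lower bound that assumes the device runs at its peak rate.

```python
def to_gflops(flops):
    """Convert a raw operation count (FLOPs) to GFLOPs."""
    return flops / 1e9


def ideal_seconds(flops, peak_flops_per_second):
    """Lower bound on run time: work (FLOPs) divided by peak rate (FLOPS)."""
    return flops / peak_flops_per_second


model_flops = 4_000_000_000   # hypothetical model: 4e9 operations per forward pass
device_peak = 2e12            # hypothetical device: 2 TFLOPS peak

print(to_gflops(model_flops))                   # 4.0
print(ideal_seconds(model_flops, device_peak))  # 0.002
```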

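MADD, MACC (multiply-accumulate operations): the number of multiply-then-add operations. These are also used to measure the complexity of an algorithm or model, and the count is roughly half the FLOPs.

A simple example: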

y = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + ... + w[n-1]*x[n-1]

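The expression above contains $n$ floating-point multiplications and $n-1$ floating-point additions, so the number of floating-point operations is $\text{FLOPs} = 2n-1$.

The number of multiply-then-add operations is $n$, so $\text{MACC} = n$.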
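The two counts for the dot product can be written as a small helper. This is only a sketch of the counting rule stated above; the function name is made up for illustration.

```python
def dot_product_cost(n):
    """Return (FLOPs, MACC) for a dot product of two length-n vectors."""
    multiplications = n      # one product per element pair
    additions = n - 1        # summing n products takes n - 1 additions
    flops = multiplications + additions
    macc = n                 # each multiply-accumulate covers one element pair
    return flops, macc


print(dot_product_cost(4))  # (7, 4)
```

For large n the ratio of MACC to FLOPs approaches one half, which is where the "roughly half" rule of thumb comes from.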

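2. How complexity is computed for typical layers

2.1 Complexity of a fully connected layer

For y = matmul(X, W) + b, assume $X$ has shape $[m, p]$, $W$ has shape $[p, n]$, and $b$ has shape $[n]$. Each of the $m \times n$ outputs needs $p$ multiplications and $p-1$ additions, plus one more addition for the bias, so the layer performs $\text{FLOPs} = m \times n \times (2p-1) + m \times n$ floating-point operations and $\text{MACC} = m \times n \times p$ multiply-accumulate operations.

Example: for a fully connected layer with a 100-dimensional input vector and a 200-dimensional output vector ($m = 1$, $p = 100$, $n = 200$), the number of floating-point operations is $\text{FLOPs} = 200 \times (2 \times 100 - 1) + 200 = 40000$.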
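The fully connected formulas translate directly into code. The sketch below assumes a single input vector (m = 1) for the 100-to-200 example; the function name is hypothetical.

```python
def dense_cost(m, p, n):
    """Return (FLOPs, MACC) for y = matmul(X, W) + b with X:[m, p], W:[p, n], b:[n]."""
    matmul_flops = m * n * (2 * p - 1)  # p multiplications and p - 1 additions per output
    bias_flops = m * n                  # one addition per output for the bias
    macc = m * n * p                    # one multiply-accumulate per (output, input) pair
    return matmul_flops + bias_flops, macc


print(dense_cost(1, 100, 200))  # (40000, 20000)
```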

2.2 Complexity of a convolutional layer

For a convolutional layer, let the input have shape $[batch, H, W, C]$ and let there be $G$ kernels of size $[f, f]$, with padding = SAME and stride = 1. Then $\text{MACC} = batch \times H \times W \times f \times f \times C \times G$.
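A short sketch of the convolution count follows. It relies on the stated setting (SAME padding, stride 1), so the output keeps the input's spatial size. The layer dimensions in the example are hypothetical, and the FLOPs figure uses the rule of thumb that FLOPs are about twice the MACC.

```python
def conv2d_macc(batch, height, width, in_channels, kernel_size, num_filters):
    """MACC of a conv layer with SAME padding and stride 1."""
    # With SAME padding and stride 1 there is one output value per input position and filter.
    outputs = batch * height * width * num_filters
    # Each output value sums over an f x f window across all input channels.
    macc_per_output = kernel_size * kernel_size * in_channels
    return outputs * macc_per_output


macc = conv2d_macc(batch=1, height=32, width=32, in_channels=3, kernel_size=3, num_filters=16)
print(macc)      # 442368
print(2 * macc)  # 884736, approximate FLOPs
```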

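3. Fully connected layer in TensorFlow

We define a function that reports the floating-point operation count (FLOPs) and the parameter count of a TensorFlow graph.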

import tensorflow as tf  # TensorFlow 1.x API (tf.profiler, tf.get_variable)

def stats_graph(graph):
    # Total floating-point operations of all ops in the graph.
    flops = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.float_operation())
    # Total number of trainable parameters in the graph.
    params = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.trainable_variables_parameter())
    print('FLOPs: {};    Trainable params: {}'.format(flops.total_float_ops, params.total_parameters))

We define the operation y = matmul(X, W) + b and analyze its FLOPs and parameter count. Assume $X$ has shape $[25, 16]$, $W$ has shape $[16, 9]$, and $b$ has shape $[9]$, so

$$\text{FLOPs} = 25 \times 9 \times (2 \times 16 - 1) + 25 \times 9 = 7200$$

$$\text{parameters} = 25 \times 16 + 16 \times 9 + 9 = 553$$

$$\text{FLOPs (in TF style)} = 25 \times 9 \times (2 \times 16) + 25 \times 9 = 7425$$

The TF-style figure counts $2 \times 16$ rather than $2 \times 16 - 1$ operations per output element.
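The three figures can be reproduced in plain Python, without TensorFlow. This is a sketch of the arithmetic only. The function names are hypothetical, and X is counted as a parameter because the programs in this section create it as a variable.

```python
def matmul_bias_flops(m, p, n, tf_style=False):
    """FLOPs of matmul(X, W) + b with X:[m, p], W:[p, n], b:[n]."""
    # Textbook count: p multiplications + (p - 1) additions per output element.
    # TF-style count: 2 * p operations per output element.
    per_output = 2 * p if tf_style else 2 * p - 1
    return m * n * per_output + m * n  # the last term is the bias addition


def parameter_count(m, p, n):
    """Elements of X, W and b, all of which are variables in this example."""
    return m * p + p * n + n


print(matmul_bias_flops(25, 16, 9))                 # 7200
print(parameter_count(25, 16, 9))                   # 553
print(matmul_bias_flops(25, 16, 9, tf_style=True))  # 7425
```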

Program that initializes the variables from a normal distribution:

with tf.Graph().as_default() as graph:
    X = tf.get_variable(initializer=tf.random_normal_initializer(dtype=tf.float32), shape=(25, 16), name='X')
    W = tf.get_variable(initializer=tf.random_normal_initializer(dtype=tf.float32), shape=(16, 9), name='W')
    b = tf.get_variable(initializer=tf.random_normal_initializer(dtype=tf.float32),shape=(9,),name="b")
    C = tf.matmul(X, W, name='output') + b
    stats_graph(graph)

Result:

FLOPs: 8531;    Trainable params: 553

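Initializing the variables also requires floating-point operations, so the reported figure is larger than 7425.

Program that initializes the variables with constant initializers: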
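A quick check on the size of that overhead, using only the numbers reported above. The gap works out to exactly two operations per parameter. Attributing this to the random normal initializer scaling and shifting each element is an assumption, not something the profiler output states.

```python
reported_flops = 8531    # profiler output with random normal initializers
graph_flops = 7425       # TF-style count for matmul(X, W) + b
trainable_params = 553   # elements of X, W and b

overhead = reported_flops - graph_flops
print(overhead)                     # 1106
print(overhead / trainable_params)  # 2.0
```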

with tf.Graph().as_default() as graph:
    X = tf.get_variable(initializer=tf.constant_initializer(value=1, dtype=tf.float32), shape=(25, 16), name='X')
    W = tf.get_variable(initializer=tf.zeros_initializer(dtype=tf.float32), shape=(16, 9), name='W')
    b = tf.get_variable(initializer=tf.zeros_initializer(dtype=tf.float32),shape=(9,),name="b")
    C = tf.matmul(X, W, name='output') + b
    stats_graph(graph)

Result:

FLOPs: 7425;    Trainable params: 553

From this we can see that initializing variables with constant initializers consumes no FLOPs.

4. GraphDef

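The code that TensorFlow runs, or rather the computation expressed in Python, actually describes a computation graph made up of operation nodes and the tensors they compute on. GraphDef is the serialized representation of that Graph. The graph described in Python does not simply stay as it is once TensorFlow starts a Session: the real computation is dispatched to multiple CPUs, GPUs, ARM devices, or other heterogeneous hardware for high performance, and Python alone could not carry it out efficiently. TensorFlow's actual execution therefore works as follows:

TensorFlow first converts the graph described by the Python code into a Protocol Buffer made up of many NodeDef messages (that is, it serializes the graph). Conceptually, a NodeDef corresponds to an Operation in the Python Graph. The graph defined by the Protocol Buffer is then executed by C/C++/CUDA code.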

a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
d = a*b

The GraphDef corresponding to the Python graph above:

node {
  name: 'Placeholder'  # a node named 'Placeholder'
  op: 'Placeholder'
  attr { key: 'dtype' value { type: DT_FLOAT } }
  attr { key: 'shape' value { shape { unknown_rank: true } } }
}
node {
  name: 'Placeholder_1'  # a node named 'Placeholder_1'
  op: 'Placeholder'
  attr { key: 'dtype' value { type: DT_FLOAT } }
  attr { key: 'shape' value { shape { unknown_rank: true } } }
}
node {
  name: 'mul'  # a Mul (multiplication) operation
  op: 'Mul'
  input: 'Placeholder'    # uses the nodes above (Placeholder and Placeholder_1)
  input: 'Placeholder_1'  # as this node's inputs
  attr { key: 'T' value { type: DT_FLOAT } }
}

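The three NodeDefs above define two Placeholders and one Multiply. A Placeholder uses attr (short for attribute) to define its data type and tensor shape. The Multiply uses its input fields to name the two placeholders as its inputs. Neither the Placeholder nor the Multiply carries any information about outputs: in TensorFlow, the connections between nodes are defined entirely through input.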
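Because edges are recorded only on the consuming side, the outgoing edges of a node have to be recovered by inverting the input lists. The sketch below uses a toy `Node` class as a stand-in for NodeDef. It is not the real protobuf type, and `consumers` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Toy stand-in for a NodeDef: a name, an op type, and input names only."""
    name: str
    op: str
    inputs: List[str] = field(default_factory=list)


nodes = [
    Node("Placeholder", "Placeholder"),
    Node("Placeholder_1", "Placeholder"),
    Node("mul", "Mul", inputs=["Placeholder", "Placeholder_1"]),
]


def consumers(nodes: List[Node]) -> Dict[str, List[str]]:
    """Recover each node's outgoing edges by inverting the input lists."""
    result: Dict[str, List[str]] = {node.name: [] for node in nodes}
    for node in nodes:
        for source in node.inputs:
            result[source].append(node.name)
    return result


print(consumers(nodes))
# {'Placeholder': ['mul'], 'Placeholder_1': ['mul'], 'mul': []}
```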

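As shown above, everything a GraphDef defines is an Operation. There are no Variable values here, because a GraphDef does not store the contents of any Variable, so building a graph from a graph_def and resuming training from it will not work. In practice, however, inference usually does use a GraphDef. If a GraphDef holds no Variable values, how are the weights stored? A GraphDef cannot store a Variable, but it can store a Constant: tf.constant lets the weights be stored directly in a NodeDef. TensorFlow 1.3.0 also provides a tool called freeze_graph that automatically replaces the Variables in a graph with constants stored in the GraphDef and exports the graph as a Proto.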
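The idea of freezing can be illustrated with a toy model in plain Python. This is not how freeze_graph is implemented: the dictionaries, the `"Variable"` op label, and the `freeze` function are all hypothetical, and they show only the substitution of a variable node by a constant node that carries the value.

```python
def freeze(nodes, variable_values):
    """Return a copy of `nodes` with every toy 'Variable' node turned into a 'Const' node."""
    frozen = []
    for node in nodes:
        if node["op"] == "Variable":
            # The constant keeps the variable's name so downstream inputs stay valid.
            frozen.append({"name": node["name"], "op": "Const",
                           "inputs": [], "value": variable_values[node["name"]]})
        else:
            frozen.append(dict(node))
    return frozen


nodes = [
    {"name": "x", "op": "Placeholder", "inputs": []},
    {"name": "W", "op": "Variable", "inputs": []},
    {"name": "matmul", "op": "MatMul", "inputs": ["x", "W"]},
]
frozen = freeze(nodes, {"W": [[1.0, 2.0], [3.0, 4.0]]})
print([node["op"] for node in frozen])  # ['Placeholder', 'Const', 'MatMul']
```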

5. Freeze graph

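The convert_variables_to_constants function saves the values of the graph's variables as constants. When saving the model file we export only the GraphDef, which records the computation from the input layer to the output layer. convert_variables_to_constants takes the names of the nodes to keep, not tensor names: "add:0" is a tensor name, whereas "add" is a node name.

Implementation: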
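The relationship between the two kinds of name can be sketched with a small helper. The function is hypothetical. It assumes the usual form of a tensor name, a node name followed by a colon and an output index, and treats a bare node name as output 0.

```python
def split_tensor_name(tensor_name):
    """Split 'node:index' into (node_name, output_index)."""
    node_name, separator, index = tensor_name.rpartition(":")
    if not separator:
        # No colon: the string is already a node name.
        return tensor_name, 0
    return node_name, int(index)


print(split_tensor_name("add:0"))  # ('add', 0)
print(split_tensor_name("add"))    # ('add', 0)
```

The first element of the result is the form expected in the list of output node names.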

from tensorflow.python.framework import graph_util
import tensorflow as tf
def stats_graph(graph):
    flops = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.float_operation())
    params = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.trainable_variables_parameter())
    print('FLOPs: {};    Trainable params: {}'.format(flops.total_float_ops, params.total_parameters))
def load_pb(pb):
    with tf.gfile.GFile(pb, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph
with tf.Graph().as_default() as graph:
    # ***** (1) Create Graph *****
    X = tf.get_variable(initializer=tf.random_normal_initializer(dtype=tf.float32), shape=(25, 16), name='X')
    W = tf.get_variable(initializer=tf.random_normal_initializer(dtype=tf.float32), shape=(16, 9), name='W')
    b = tf.get_variable(initializer=tf.random_normal_initializer(dtype=tf.float32),shape=(9,),name="b")
    C = tf.matmul(X, W) + b
    print('stats before freezing')
    stats_graph(graph)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ***** (2) freeze graph *****
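        # graph.as_graph_def() exports the GraphDef of the current graph.
        # Keep only the operations needed for 'add' and store the variables as constants.
        output_graph = graph_util.convert_variables_to_constants(sess, graph.as_graph_def(), ['add'])
        # Write the frozen graph to a model file.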
        with tf.gfile.GFile('graph.pb', "wb") as f:
            f.write(output_graph.SerializeToString())
# ***** (3) Load frozen graph *****
graph = load_pb('./graph.pb')
print('stats after freezing')
stats_graph(graph)

Result:

stats before freezing
FLOPs: 8531;    Trainable params: 553
stats after freezing
FLOPs: 7425;    Trainable params: 0
