【霍罗维兹数据结构】数据抽象化 | 时间复杂度 | 性能分析与性能度量-阿里云开发者社区

Ⅰ. 数据抽象化 - DATA ABSTRACTION

0x00 C语言基本数据类型

char, int, float, double, short, long, unsigned...

0x01 将数据分组的机制

数组和结构体

int arr[10];
struct student {
    char name[10];
    int id;
    char grade;
};

0x02 指针数据类型

数据类型都可以用指针指代：

pointer-to-an-int, pointer-to-a-real, pointer-to-a-char, and pointer-to-a-float.

int* i, *pi

预定义数据类型 / 用户定义的数据类型 predefined data types / user-defined data types

Ⅱ. 什么是数据结构 - What is a data type

0x00 定义

"数据结构是一种具有一定逻辑关系，在计算机中应用某种存储结构，并且封装了相应操作的数据元素的集合。"

A data type is a collection of objects and a set of operations that act on those objects.

0x01 对象的规格

💬 比如 int 类型：

{0, +1, -1, +2, -2, ... , INT_MAX, INT_MIN}

0x02 ADT的定义

抽象数据类型（AbstractDataType，ADT）是计算机科学中具有类似行为的特定类别的数据结构的数学模型；或者具有类似语义的一种或多种程序设计语言的数据类型。抽象数据类型是间接定义的，通过其上的可执行的操作以及这些操作的效果的数学约束（与可能的代价）。

一个抽象的数据类型是独立于实现的，操作的规范包括操作的名称、参数的类型和结果的类型。

同时，说明该函式是做什么的，而不求助于内部的代表性的细节。

package in Ada

class in C++

0x03 对数据类型的操作进行分类

Creator/constructor

Transformers

Observers/reporters

0x04 例子：抽象数据类型 Natural_Number

ADT Natural_Number 是：

① 对象：一个有序的整数子范围，从零开始，到计算机上的最大整数（INT_MAX）结束。

② 功能：对于所有的 x,y Nat_Number, True, False, Boolean。

Nat_No Zero()               ::= 0
Boolean Is_Zero(x)          ::= if (x) return FALSE
 else return TRUE
Nat_No Add(x, y)            ::= if ((x+y)<= INT_MAX) return x+y
 else return INT_MAX
Boolean Equal(x, y)         ::= if (x==y) return TRUE
 else return FALSE
Nat_No Successor(x)         ::= if (x == INT_MAX) return x
 else return x+1
Nat_No Subtract(x,y)        ::= if (x<y) return 0
 else return x-y

end Natural_Number

0x05 性能分析 - PERFORMANCE ANALYSIS

判断一个项目的标准 - Criteria of judging a program:

① 程序是否符合任务的原始规范？

② 它是否能正常工作？

③ 程序是否有很好的文档记录？

④ 程序是否有效地使用函数来创建逻辑单元？

⑤ 该程序的代码是否可读？

绩效评估 - Performance Evaluation

程序是否有效地使用主存储和辅助存储？

对此任务来说该程序的运行时间是否可以接受？

性能分析 - Performance Analysis

estimates of time and space that are machine independent.

对时间和空间的估计，是与机器无关的。

绩效衡量 - Performance Measurement

obtaining machine-dependent running times.

used to identify inefficient code segments.

① 获得与机器有关的运行时间

② 用来识别低效的代码段。

定义 - Definition

The space complexity of a program is the amount of memory that it needs to run to completion. The time complexity of a program is the amount of computer time that it needs to run to completion.

① 一个程序的空间复杂度是指其运行到完成时所需的内存量

② 一个程序的时间复杂度是指其运行到完成所需的时间。

Ⅲ. 时间复杂度 - Time Complexity

0x00 固定空间要求 - Fixed space requirements

independent from the number and size of the program's inputs and outputs, e.g., the instruction space, space for simple variables, fixed-size structured variables, and constants.

独立于程序的输入和输出的数量和大小，

例如：指令空间、简单变量空间、固定大小的结构化变量和常量。

0x01 可变空间要求 - Variable space requirements

space needed by structured variables whose size depends on the particular instance, I, of the problem being solved.

结构化变量所需要的时间，其大小取决于所解决的问题的特性实例

0x02 例1.6：simple arithmetic function

[Program 1.9]

float abc(float a, float b, float c) {
    return a + b + b*c + (a+b-c) / (a+b) + 4.00;
}

0x03 例1.7：adding a list of numbers iteratively

[Program 1.10]

float sum(float list[], int n) {
    float tempsum = 0;
    int i;
    for (i=0; i<n; i++)
        tempsum += list[i];
    return tempsum;
}

（如果参数是按值传递的）

（如果是通过引用传递的）

0x04 adding a list of numbers recursively

[Program 1.11]

float rsum(float list[], int n) {
    if (n) return rsum(list, n-1) + list[n-1];
    return 0;
}

(一个递归调用所需时间)

时间复杂度 - Time Complexity

① 编译时间（Compile Time)

② 运行时间（Execution (Running) Time）

实际上，我们只关注程序的运行时间。

0x00 确定执行时间

执行每项操作所需的时间

对给定实例进行的每个操作的数量（取决于编译器）

计算程序详细地运行时间好像没有什么意义，

计算程序的操作的数量也不会给我们带来什么实质性的东西。

0x01 定义 - Definition

"A program step is a syntactically or semantically meaningful program segment whose execution time is independent of the instance characteristics. "

一个程序步骤是一个语法上或语义上有意义的程序段，其执行时间于特例特征无关。

cnt 计数器：

通过创建一个全局变量 count，count++ 就可以确定一个程序或一个函数解决某一个特定问题所需的步骤数。

0x02 [Example 1.9] [Iterative summing of a list of numbers]

float sum(float list[], int n) {
    float tempsum = 0; count++; /*for assignment*/
    int i;
    for (i=0; i<n; i++) {
        count++; /*for the for loop */
        tempsum += list[i]; count++; /*for assignment*/
    }
    count++; /* last execution of for */
    count++; /* for return */
    return tempsum;
}

float sum(float list[], int n) {
    float tempsum = 0;
    int i;
    for (i=0; i<n; i++)
        count += 2;   /* for the for loop */
    count += 3;
    return 0;
}

如果 count 的初始值是 0，其最终结果将是 2n + 3 。

0x03 [Example 1.10] [Recursive summing of a list of numbers]

[Program 1.14]

float rsum(float list[], int n) {
    count++; /* for if conditional */
    if (n) {
        count++; /* for return and rsum invocation */
        return rsum(list, n-1) + list[n-1];
    }
    count++;
    return list[0];
}

count 结果为 2n + 2 。

0x04 [Example 1.11] : [Matrix addition]

void add ( 
    int a[][MAX_SIZE], 
    int b[][MAX_SIZE], 
    int c[][MAX_SIZE], 
    int rows, 
    int cols
    )
{
    int i, j;
    for (i=0; i<rows; i++)
        for (j=0; j<cols; j++)
    c[i][j] = a[i][j] + b[i][j];
}

void add (
    int a[][MAX_SIZE], 
    int b[][MAX_SIZE], 
    int c[][MAX_SIZE], 
    int rows, 
    int cols
    )
{
    int i, j;
    for (i=0; i<rows; i++) {
        count++; /* for i for loop */
        for (j=0; j<cols; j++) {
            count++; /* for j for loop */
            c[i][j] = a[i][j] + b[i][j];
            count++; /* for assignment statement */
        }
        count++; /* last time of j for loop */
    }
    count++; /* last time of i for loop */
}

void add (
    int a[][MAX_SIZE], 
    int b[][MAX_SIZE],
    int c[][MAX_SIZE], 
    int rows, 
    int cols
    )
{
    int i, j;
    for (i=0; i<rows; i++) {
        for (j=0; j<cols; j++)
            count += 2;
        count += 2;
    }
    count++;
}

count 结果为

表格表示：步骤 / 执行

[Example 1.13]

[Example 1.14]

0x05 摘要

一个程序的时间复杂度是由步骤的数量决定的，程序在计算它所编写的函数时所需要的时间。

步骤的数量本身是实例的一个函数特点，例如输入的数量，输出的数量，输入和输出的大小等等。

在确定一个程序的步数之前，我们需要确切的知道问题的哪些特征要使用。

对于许多程序来说，时间的复杂性并不完全取决于特性的特征上。

对于相同大小的不同输入，步数也是不同的：

最好情况、平均情况、最差情况。

例子：Binary Search、Insertion Sort

渐进符号 - Asymptotic Notation (Ο, Ω, Θ)

0x00 渐进符号（）

是我们确定步数的 "动机"

比较两个程序的时间复杂度功能，以及预测运行时间随着实力特征的变化而增长。

确定切确的步数（无论是最好情况、平均情况还是最差情况）

证明一个程序的成功与否是一项极其困难的任务。

花费巨大的精力来准确确定步数并不是值得，因为步骤的概念本身就是不准确的。

由于步数代表的意义不确切，对于比较而言，计算出准确的步数并没有什么卵用。

对于大多数情况，步数计算可以表示为作为实例特征的一个函数，例如：

或

如果两个步数的差值很大怎么办？例如：与

如果两个步骤的计数顺序不同呢？例如：与

0x01 盈亏平衡点（break even point）

准确的盈亏平衡点不能用分析法确定。

这些程序必须在计算机上跑，才能确定盈亏平衡点。

0x02 一些术语

定义： [ Big "oh" ] 大O渐进表示法

如果存在正的常数和，比如，对于所有的

为了使声明具有信息量，应当是一个尽可能小的的函数，于。

定理 1.3：

If then

证明：

定义：[Omega]

如果存在正的常数并使为所有

为了使声明，使其具有信息量，应该尽可能地和函数一样大，使为真。

定理 1.3：

If and ， then

定义：[Theta]

如果存在正的常数，和使得，适用于所有。

定理 1.4：

If and ， then

例子：Complexity of matrix addition

例子：二分查找

数组中的元素数量，while 循环每次迭代需要时间。

while 循环最多被迭代次。

最坏结果：循环被迭代次

最好结果：

例子：Magic square

幻方是一个由 1 到的整数组成的矩阵，使得每一行和每一列以及两条主对角线的总和是相同的，当时，共同的和是 65 。

#include <stdio.h>
#define MAX_SIZE 15 /* maximum size of square */
void main(void)
/* construct a magic square, iteratively */
{
    static int square[MAX_SIZE] [MAX_SIZE];
    int i, j, row, column; /* indices */
    int count; /* counter */
    int size; /* Square size */
    printf ("Enter the size of the square: ");
    scanf("%d", &size);
    /* check for input errors */
    if (size<1 || size>MAX_SIZE+1) {
        fprintf(stderr, "Error! Size is out of range\n");
        exit(1);
    }
    if (!(size % 2)) {
        fprintf(stderr, "Error! Size is even");
        exit(1);
    }
    for (i=0; i<size; i++)
        for (j=0; j<size; j++)
            square[i][j] = 0;
    square[0][(size-1)/2] = 1; /* middle of first row */
    /* i and j are current position */
    i = 0;
    j = (size-1) / 2;
    for (count = 2; count <= size * size; count++) {
        row = (i-1 < 0) ? (size-1) : (i-1); /* up */
        column = (j-1 < 0) ? (size-1) : (j-1); /* left */
        if (square[row][column]) /* down */
            i = (++i) % size;
        else { /* square is unoccupied */
            i = row;
            j = column;
        }
        square[i][j] = count;
    }
    /* output the magic square */
    printf("Magic Square of the size %d : \n\n", size);
    for (i = 0; i < size; i++) {
        for (j = 0; j < size; j++)
            printf ("%5d", square[i][j]);
        printf("\n");
    }
    printf("\n\n");
}

Ⅳ. 实际复杂性 - Practical Complexities

0x00 例子

一个程序的时间复杂度通常是以下一些函数实例的特征。

这种复杂的功能：

① 在确定时间要求如何变化方面非常有用，随着实例特征的变化。

② 也可以用于比较执行相同任务的两个程序 P 和 Q。

我们假设 程序P 的复杂度为并且 程序Q 的复杂度为。

我们就可以的断言 —— 对于足够大的，程序P 比 程序Q 快。

0x01 常见的复杂度表

0x02 大O对比图

0x03 每秒10亿条指令在计算机上的时间

Ⅴ. 性能度量 - PERFORMANCE MEASUREMENT

0x00 如何衡量真正的执行时间

使用 C 的标准库

函数是通过语句访问的：

#include <time.h>

对于小数据可能会产生不准确的结果（例如我们计算机上 CLK_TCK 的值是18，那么 n < 500 的时钟跳动次数就小于10）。

0x01 生成测试数据

生成一个数据集，结果是最坏的情况一个项目的表现并不总是容易的。

我们可以生成适当数量的随机测试数据。

获得平均案例数据通常要难得多。

最好是分析被测试的算法，以确定应该为实验产生的数据类别--算法的具体任务。

参考资料：

Fundamentals of Data Structures in C

本章完。

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

【霍罗维兹数据结构】数据抽象化 | 时间复杂度 | 性能分析与性能度量