Python for Finance, 2nd Edition (GPT Retranslation) (Part 2) (2) - Alibaba Cloud Developer Community

Python for Finance, 2nd Edition (GPT Retranslation) (Part 2) (1): https://developer.aliyun.com/article/1559301

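Structured Arrays

In addition to regular arrays, NumPy provides structured (record) arrays, which describe and handle table-like data structures in which each (named) column can have its own data type. They bring SQL-table-like data structures to Python while keeping most of the benefits of regular ndarray objects (syntax, methods, performance).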
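As a minimal sketch of the idea, the following defines a small structured array. The field names (Name, Age, Height) and the two records are made up for illustration and do not come from the text.

```python
import numpy as np

# Each field has a name and its own data type (hypothetical field names).
dt = np.dtype([('Name', 'U10'), ('Age', 'i4'), ('Height', 'f8')])

# Records are given as tuples, one per row.
s = np.array([('Smith', 45, 1.83), ('Jones', 53, 1.72)], dtype=dt)

names = s['Name']                  # select a whole column by field name
mean_height = s['Height'].mean()   # regular ndarray methods work per column
second_age = s[1]['Age']           # select a record first, then a field

print(names, mean_height, second_age)
```

Selecting by field name returns a regular ndarray for that column, which is why the usual methods such as mean() remain available.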

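Vectorization of Code

Vectorization of code is a strategy for getting more compact code that may also execute faster. The basic idea is to apply an operation or a function to a complex object "at once" rather than looping over its individual elements. In Python, functional programming tools such as map and filter provide some basic means of vectorization. NumPy, however, has vectorization built in deep down in its core.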
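The contrast between the three styles can be sketched as follows. The input values and the squaring and filtering operations are arbitrary choices for illustration only.

```python
import numpy as np

values = list(range(8))

# 1) Explicit Python loop over the single elements.
squares_loop = []
for v in values:
    squares_loop.append(v ** 2)

# 2) Functional tools: the loop is hidden, but still runs in Python.
squares_map = list(map(lambda v: v ** 2, values))
evens = list(filter(lambda v: v % 2 == 0, values))

# 3) NumPy: the operation is applied to the whole array at once.
arr = np.array(values)
squares_np = arr ** 2
evens_np = arr[arr % 2 == 0]   # Boolean mask instead of filter()

print(squares_np, evens_np)
```

All three variants produce the same numbers; only the place where the looping happens differs.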

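Basic Vectorization

As we learned in the previous section, simple mathematical operations, such as calculating the sum of all elements, can be implemented directly on ndarray objects (via methods or universal functions). More general vectorized operations are also possible. For example, two NumPy arrays can be added element-wise as follows: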

In [117]: np.random.seed(100)
          r = np.arange(12).reshape((4, 3))  # ①
          s = np.arange(12).reshape((4, 3)) * 0.5  # ②
In [118]: r  # ①
Out[118]: array([[ 0,  1,  2],
                 [ 3,  4,  5],
                 [ 6,  7,  8],
                 [ 9, 10, 11]])
In [119]: s  # ②
Out[119]: array([[ 0. ,  0.5,  1. ],
                 [ 1.5,  2. ,  2.5],
                 [ 3. ,  3.5,  4. ],
                 [ 4.5,  5. ,  5.5]])
In [120]: r + s  # ③
Out[120]: array([[  0. ,   1.5,   3. ],
                 [  4.5,   6. ,   7.5],
                 [  9. ,  10.5,  12. ],
                 [ 13.5,  15. ,  16.5]])

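①

The first ndarray object (consecutive integers generated with np.arange; the seeded random number generator is not actually used here).

②

The second ndarray object (the same integers scaled by 0.5).

③

Element-wise addition as a vectorized operation (no looping).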

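NumPy also supports what is called broadcasting. This allows objects of different shapes to be combined in a single operation. We have already made use of this before. Consider the following examples: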

In [121]: r + 3  # ①
Out[121]: array([[ 3,  4,  5],
                 [ 6,  7,  8],
                 [ 9, 10, 11],
                 [12, 13, 14]])
In [122]: 2 * r  # ②
Out[122]: array([[ 0,  2,  4],
                 [ 6,  8, 10],
                 [12, 14, 16],
                 [18, 20, 22]])
In [123]: 2 * r + 3  # ③
Out[123]: array([[ 3,  5,  7],
                 [ 9, 11, 13],
                 [15, 17, 19],
                 [21, 23, 25]])

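①

During scalar addition, the scalar is broadcast and added to every element.

②

During scalar multiplication, the scalar is also broadcast and multiplied with every element.

③

This linear transformation combines both operations.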

These operations also work with differently shaped ndarray objects, up to a certain point:

In [124]: r
Out[124]: array([[ 0,  1,  2],
                 [ 3,  4,  5],
                 [ 6,  7,  8],
                 [ 9, 10, 11]])
In [125]: r.shape
Out[125]: (4, 3)
In [126]: s = np.arange(0, 12, 4)  # ①
          s  # ①
Out[126]: array([0, 4, 8])
In [127]: r + s  # ②
Out[127]: array([[ 0,  5, 10],
                 [ 3,  8, 13],
                 [ 6, 11, 16],
                 [ 9, 14, 19]])
In [128]: s = np.arange(0, 12, 3)  # ③
          s  # ③
Out[128]: array([0, 3, 6, 9])
In [129]: # r + s # ④
In [130]: r.transpose() + s  # ⑤
Out[130]: array([[ 0,  6, 12, 18],
                 [ 1,  7, 13, 19],
                 [ 2,  8, 14, 20]])
In [131]: sr = s.reshape(-1, 1)  # ⑥
          sr
Out[131]: array([[0],
                 [3],
                 [6],
                 [9]])
In [132]: sr.shape  # ⑥
Out[132]: (4, 1)
In [133]: r + s.reshape(-1, 1)  # ⑥
Out[133]: array([[ 0,  1,  2],
                 [ 6,  7,  8],
                 [12, 13, 14],
                 [18, 19, 20]])

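①

A new one-dimensional ndarray object of length 3.

②

The r (matrix) and s (vector) objects can be added directly.

③

Another one-dimensional ndarray object, this time of length 4.

④

The length of the new s (vector) object now differs from the length of the second dimension of the r object, so the addition raises a ValueError (which is why the line is commented out).

⑤

Transposing the r object makes the vectorized addition possible again.

⑥

Alternatively, the shape of s can be changed to (4, 1) to make the addition work (but with different results).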
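The shape compatibility behind these examples can be checked directly. The sketch below rebuilds r and the two vectors under the hypothetical names s3 and s4 and captures the error that the commented-out line would raise.

```python
import numpy as np

r = np.arange(12).reshape((4, 3))
s3 = np.arange(0, 12, 4)   # shape (3,), matches the last dimension of r
s4 = np.arange(0, 12, 3)   # shape (4,), does not match the last dimension

ok = r + s3                # (4, 3) + (3,): s3 is added to every row

try:
    r + s4                 # (4, 3) + (4,): trailing dimensions 3 and 4 differ
    failed = False
except ValueError as err:
    failed = True
    print('broadcasting failed:', err)

# Adding a new axis turns s4 into a column of shape (4, 1),
# which is then broadcast across the three columns of r.
col = r + s4[:, np.newaxis]
print(col)
```

Shapes are compared from the last dimension backward, and two dimensions are compatible when they are equal or when one of them is 1; this is why the (4, 1) version works while the (4,) version does not.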

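Custom-defined Python functions often work with numpy.ndarray objects as well. If the implementation allows it, arrays can be used with functions just as int or float objects can. Consider the following function: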

In [134]: def f(x):
              return 3 * x + 5  # ①
In [135]: f(0.5)  # ②
Out[135]: 6.5
In [136]: f(r)  # ③
Out[136]: array([[ 5,  8, 11],
                 [14, 17, 20],
                 [23, 26, 29],
                 [32, 35, 38]])

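①

A simple Python function implementing a linear transformation on the parameter x.

②

The function f applied to a Python float object.

③

The same function applied to an ndarray object, resulting in a vectorized, element-by-element evaluation of the function.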

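What NumPy does is simply apply the function f to the object element by element. In that sense, such operations do not avoid loops; they only avoid them at the Python level and delegate the looping to NumPy. At the NumPy level, looping over the ndarray object is handled by highly optimized code, most of it written in C, and is therefore generally much faster than pure Python. This explains the "secret" behind the performance benefits of using NumPy for array-based use cases.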
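A rough way to see the difference is to time both variants with the standard library timeit module. The array size and the number of repetitions below are arbitrary assumptions, and the measured times depend on the machine, so no particular figures are claimed.

```python
import timeit
import numpy as np

def f(x):
    return 3 * x + 5

data = np.arange(100_000, dtype=float)

def python_level_loop():
    # The loop runs in the Python interpreter, one element at a time.
    return [f(v) for v in data.tolist()]

def numpy_level_loop():
    # The loop is delegated to NumPy's compiled code.
    return f(data)

t_py = timeit.timeit(python_level_loop, number=5)
t_np = timeit.timeit(numpy_level_loop, number=5)
print(f'Python-level loop: {t_py:.4f} s, NumPy-level loop: {t_np:.4f} s')

# Both approaches compute the same values.
same = np.allclose(python_level_loop(), numpy_level_loop())
```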

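Memory Layout

When we first initialized numpy.ndarray objects with np.zeros, we provided an optional argument for the memory layout. Roughly speaking, this argument specifies which elements of an array get stored next to each other (contiguously) in memory. When working with small arrays, this has hardly any measurable impact on the performance of array operations. However, when arrays get large, and depending on the (financial) algorithm to be implemented on them, the story might be different. This is when memory layout comes into play (see, for instance, "Memory Layout of Multi-Dimensional Arrays").

To illustrate the potential importance of the memory layout of arrays in science and finance, consider the following construction of multidimensional ndarray objects: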

In [137]: x = np.random.standard_normal((1000000, 5))  # ①
In [138]: y = 2 * x + 3  # ②
In [139]: C = np.array((x, y), order='C')  # ③
In [140]: F = np.array((x, y), order='F')  # ④
In [141]: x = 0.0; y = 0.0  # ⑤
In [142]: C[:2].round(2)  # ⑥
Out[142]: array([[[-1.75,  0.34,  1.15, -0.25,  0.98],
                  [ 0.51,  0.22, -1.07, -0.19,  0.26],
                  [-0.46,  0.44, -0.58,  0.82,  0.67],
                  ...,
                  [-0.05,  0.14,  0.17,  0.33,  1.39],
                  [ 1.02,  0.3 , -1.23, -0.68, -0.87],
                  [ 0.83, -0.73,  1.03,  0.34, -0.46]],
                 [[-0.5 ,  3.69,  5.31,  2.5 ,  4.96],
                  [ 4.03,  3.44,  0.86,  2.62,  3.51],
                  [ 2.08,  3.87,  1.83,  4.63,  4.35],
                  ...,
                  [ 2.9 ,  3.28,  3.33,  3.67,  5.78],
                  [ 5.04,  3.6 ,  0.54,  1.65,  1.26],
                  [ 4.67,  1.54,  5.06,  3.69,  2.07]]])

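①

An ndarray object with a large asymmetry between its two dimensions.

②

A linear transformation of the original object's data.

③

This creates a three-dimensional ndarray object of shape (2, 1000000, 5) with C order (row-major).

④

This creates a three-dimensional ndarray object of the same shape with F order (column-major).

⑤

The names x and y are rebound so that the memory of the original arrays can be freed (subject to garbage collection).

⑥

Some numbers from the C object.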

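Let us look at some fundamental examples and use cases for both types of ndarray objects and consider the speed with which they are executed given the different memory layouts: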

In [143]: %timeit C.sum()  # ①
          4.65 ms ± 73.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [144]: %timeit F.sum()  # ①
          4.56 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [145]: %timeit C.sum(axis=0)  # ②
          20.9 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [146]: %timeit C.sum(axis=1)  # ③
          38.5 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [147]: %timeit F.sum(axis=0)  # ②
          87.5 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [148]: %timeit F.sum(axis=1)  # ③
          81.6 ms ± 1.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [149]: F = 0.0; C = 0.0

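①

Calculates the sum of all elements.

②

Calculates the sums per row ("many"); with axis=0, the result has shape (1000000, 5).

③

Calculates the sums per column ("few"); with axis=1, the result has shape (2, 5).

We can summarize the performance results as follows:

When calculating the sum of all elements, the memory layout does not really matter.
Summing over the C-ordered ndarray object is faster both over rows and over columns (an absolute speed advantage).
With the C-ordered (row-major) ndarray object, summing over rows is relatively faster than summing over columns.
With the F-ordered (column-major) ndarray object, summing over columns is relatively faster than summing over rows.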
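The layouts themselves can be inspected through the flags and strides attributes of an ndarray object. The sketch below uses a much smaller array (1,000 rows instead of 1,000,000, an assumption made only to keep it quick) and assumes the default float64 data type with 8 bytes per element.

```python
import numpy as np

x = np.random.standard_normal((1000, 5))
y = 2 * x + 3

C = np.array((x, y), order='C')   # row-major: last index varies fastest
F = np.array((x, y), order='F')   # column-major: first index varies fastest

print(C.shape)                    # three-dimensional: (2, 1000, 5)
print(C.flags['C_CONTIGUOUS'], F.flags['F_CONTIGUOUS'])

# Strides give the number of bytes to step along each axis.
print(C.strides)                  # large step along axis 0, 8 bytes along axis 2
print(F.strides)                  # 8 bytes along axis 0, large step along axis 2

# The values are identical; only their arrangement in memory differs.
identical = np.array_equal(C, F)
```

A reduction that walks along an axis with a small stride touches neighboring memory locations, which is one plausible reason why the timings above depend on both the layout and the axis.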

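Conclusion

NumPy is the package of choice for numerical computing in Python. The ndarray class is specifically designed to be convenient and efficient in handling (large) numerical data. Powerful methods and NumPy universal functions allow for vectorized code that mostly avoids slow loops at the Python level. Many of the approaches introduced in this chapter carry over to pandas and its DataFrame class as well (see Chapter 5).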

Chapter 5: Data Analysis with pandas

Data! Data! Data! I can't make bricks without clay!

Sherlock Holmes

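Introduction

This chapter is about pandas, a library for data analysis with a focus on tabular data. pandas has become a powerful tool in recent years; it not only provides powerful classes and functionality but also wraps existing functionality from other packages well. The result is a user interface that makes data analysis, and in particular financial analysis, a convenient and efficient task.

At the core of pandas and of this chapter is the DataFrame, a class for efficiently handling data in tabular form, that is, data organized in columns. To this end, the DataFrame class provides column labels as well as flexible indexing for the rows (records) of a data set, similar to a table in a relational database or an Excel spreadsheet.

This chapter covers the following fundamental data structures: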

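Object type	Meaning	Used for
`DataFrame`	Two-dimensional data object with index	Tabular data organized in columns
`Series`	One-dimensional data object with index	Single (time) series of data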

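The chapter is organized as follows:

"The DataFrame Class"

The chapter starts by exploring the basic characteristics and capabilities of the DataFrame class of pandas with simple and small data sets; it then proceeds by taking a NumPy ndarray object and transforming it into a DataFrame object.

"Basic Analytics" and "Basic Visualization"

Basic analytics and visualization capabilities are also introduced in this chapter, although later chapters go deeper in this regard.

"The Series Class"

This section briefly covers the Series class of pandas, which in a sense represents a special case of the DataFrame class with a single column of data only.

"GroupBy Operations"

One of the strengths of the DataFrame class lies in grouping data according to a single column or multiple columns.

"Complex Selection"

The use of (complex) conditions allows for easy selection of data from a DataFrame object.

"Concatenation, Joining, and Merging"

Combining different data sets into one is an important operation in data analysis. pandas provides a number of options to accomplish such a task.

"Performance Aspects"

As with Python in general, pandas often provides multiple options to accomplish the same goal. This section briefly looks at potential performance differences.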

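The DataFrame Class

This section covers some fundamental aspects of the DataFrame class. The class is complex and powerful, and only a fraction of its capabilities can be presented here. Subsequent chapters provide more examples and shed light on different aspects.

First Steps with the DataFrame Class

On a rather fundamental level, the DataFrame class is designed to manage indexed and labeled data, not too different from a SQL database table or a worksheet in a spreadsheet application. Consider the following creation of a DataFrame object: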

In [1]: import pandas as pd  # ①
In [2]: df = pd.DataFrame([10, 20, 30, 40],  # ②
                          columns=['numbers'],  # ③
                          index=['a', 'b', 'c', 'd'])  # ④
In [3]: df  # ⑤
Out[3]:    numbers
        a       10
        b       20
        c       30
        d       40

①

Imports pandas.

②

Defines the data as a list object.

③

Specifies the column label.

④

Specifies the index values/labels.

⑤

Shows the data as well as the column and index labels of the DataFrame object.

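This simple example already shows some major features of the DataFrame class when it comes to storing data:

Data

The data itself can be provided in different shapes and types (list, tuple, ndarray, and dict objects are candidates).

Labels

The data is organized in columns, which can have custom names.

Index

There is an index that can take on different formats (e.g., numbers, strings, time information).

Working with such a DataFrame object is in general quite convenient and efficient, for example compared to regular ndarray objects, which are more specialized and more restricted when one wants to, say, enlarge an existing object. The following are simple examples of typical operations on a DataFrame object: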

In [4]: df.index  # ①
Out[4]: Index(['a', 'b', 'c', 'd'], dtype='object')
In [5]: df.columns  # ②
Out[5]: Index(['numbers'], dtype='object')
In [6]: df.loc['c']  # ③
Out[6]: numbers    30
        Name: c, dtype: int64
In [7]: df.loc[['a', 'd']]  # ④
Out[7]:    numbers
        a       10
        d       40
In [8]: df.iloc[1:3]  # ⑤
Out[8]:    numbers
        b       20
        c       30
In [9]: df.sum()  # ⑥
Out[9]: numbers    100
        dtype: int64
In [10]: df.apply(lambda x: x ** 2)  # ⑦
Out[10]:    numbers
         a      100
         b      400
         c      900
         d     1600
In [11]: df ** 2  # ⑧
Out[11]:    numbers
         a      100
         b      400
         c      900
         d     1600

①

The index attribute and Index object.

②

The columns attribute and Index object.

③

Selects the value corresponding to index c.

④

Selects the two values corresponding to indices a and d.

⑤

Selects the second and third rows via their index positions.

⑥

Calculates the sum of the single column.

⑦

Uses the apply() method to calculate squares in vectorized fashion.

⑧

Applies vectorization directly, as with ndarray objects.

In contrast to NumPy ndarray objects, DataFrame objects can be enlarged in both dimensions:

In [12]: df['floats'] = (1.5, 2.5, 3.5, 4.5)  # ①
In [13]: df
Out[13]:    numbers  floats
         a       10     1.5
         b       20     2.5
         c       30     3.5
         d       40     4.5
In [14]: df['floats']  # ②
Out[14]: a    1.5
         b    2.5
         c    3.5
         d    4.5
         Name: floats, dtype: float64

①

Adds a new column with float objects provided as a tuple object.

②

Selects this column and shows its data and index labels.

A whole DataFrame object can also be used to define a new column. In such a case, the indices are aligned automatically:

In [15]: df['names'] = pd.DataFrame(['Yves', 'Sandra', 'Lilli', 'Henry'],
                                    index=['d', 'a', 'b', 'c'])  # ①
In [16]: df
Out[16]:    numbers  floats   names
         a       10     1.5  Sandra
         b       20     2.5   Lilli
         c       30     3.5   Henry
         d       40     4.5    Yves

①

Another new column is created based on a DataFrame object.
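Index alignment also determines what happens when labels are missing. The following sketch uses a Series object instead of a DataFrame object and deliberately leaves out the label c; both choices are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'numbers': [10, 20, 30, 40]},
                  index=['a', 'b', 'c', 'd'])

# The new data comes in a different order and has no entry for 'c'.
names = pd.Series(['Yves', 'Sandra', 'Lilli'], index=['d', 'a', 'b'])

# Values are matched by index label, not by position.
df['names'] = names
print(df)
```

Rows are matched by label, so Sandra ends up in row a although it is second in the input, and the row without a matching label receives a missing value.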

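Appending data works similarly. However, the following example shows a side effect that should usually be avoided: the index gets replaced by a simple range index. (The transcript uses the DataFrame.append() method, which was deprecated in pandas 1.4 and removed in pandas 2.0; with current versions, pd.concat() serves the same purpose.)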

In [17]: df.append({'numbers': 100, 'floats': 5.75, 'names': 'Jil'},
                        ignore_index=True)  # ①
Out[17]:    numbers  floats   names
         0       10    1.50  Sandra
         1       20    2.50   Lilli
         2       30    3.50   Henry
         3       40    4.50    Yves
         4      100    5.75     Jil
In [18]: df = df.append(pd.DataFrame({'numbers': 100, 'floats': 5.75,
                                      'names': 'Jil'}, index=['y',]))  # ②
In [19]: df
Out[19]:    floats   names  numbers
         a    1.50  Sandra       10
         b    2.50   Lilli       20
         c    3.50   Henry       30
         d    4.50    Yves       40
         y    5.75     Jil      100
In [20]: df = df.append(pd.DataFrame({'names': 'Liz'}, index=['z',]))  # ③
In [21]: df
Out[21]:    floats   names  numbers
         a    1.50  Sandra     10.0
         b    2.50   Lilli     20.0
         c    3.50   Henry     30.0
         d    4.50    Yves     40.0
         y    5.75     Jil    100.0
         z     NaN     Liz      NaN
In [22]: df.dtypes  # ④
Out[22]: floats     float64
         names       object
         numbers    float64
         dtype: object

①

Appends a new row via a dict object; this is a temporary operation during which index information gets lost.

②

Appends the row based on a DataFrame object with index information; the original index information is preserved.

③

Appends an incomplete data row to the DataFrame object, resulting in NaN values.

④

Returns the different dtypes of the single columns; this is similar to what is possible with structured (record) arrays in NumPy.
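For pandas 2.0 and later, where DataFrame.append() is no longer available, the same steps can be sketched with pd.concat(). This is a substitute technique rather than the code from the transcript; the names new_row and partial are hypothetical, and the column order of the result may differ from the output shown above.

```python
import pandas as pd

df = pd.DataFrame({'numbers': [10, 20, 30, 40],
                   'floats': [1.5, 2.5, 3.5, 4.5],
                   'names': ['Sandra', 'Lilli', 'Henry', 'Yves']},
                  index=['a', 'b', 'c', 'd'])

# A complete row with its own index label; the label is preserved.
new_row = pd.DataFrame({'numbers': 100, 'floats': 5.75, 'names': 'Jil'},
                       index=['y'])
df = pd.concat([df, new_row])

# An incomplete row; the missing columns are filled with NaN.
partial = pd.DataFrame({'names': 'Liz'}, index=['z'])
df = pd.concat([df, partial])

print(df)
print(df.dtypes)
```

Passing ignore_index=True to pd.concat() would reproduce the side effect from the first step of the transcript, that is, a fresh range index in place of the labels.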

Although there are now missing values, the majority of method calls will still work. For example:

In [23]: df[['numbers', 'floats']].mean()  # ①
Out[23]: numbers    40.00
         floats      3.55
         dtype: float64
In [24]: df[['numbers', 'floats']].std()  # ②
Out[24]: numbers    35.355339
         floats      1.662077
         dtype: float64

①

Calculates the mean over the two columns specified (ignoring rows with NaN values).

②

Calculates the standard deviation over the two columns specified (ignoring rows with NaN values).
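That NaN values are skipped is the default behavior of these methods and can be switched off with the skipna parameter. The sketch below rebuilds the two numeric columns by hand (an assumption, so that it runs on its own) and compares both settings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'numbers': [10, 20, 30, 40, 100, np.nan],
                   'floats': [1.5, 2.5, 3.5, 4.5, 5.75, np.nan]},
                  index=list('abcdyz'))

mean_default = df.mean()              # NaN values are skipped
mean_strict = df.mean(skipna=False)   # any NaN makes the result NaN
mean_dropped = df.dropna().mean()     # drop incomplete rows explicitly first

print(mean_default)
print(mean_strict)
```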

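Second Steps with the DataFrame Class

The examples in this subsection are based on an ndarray object with standard normally distributed random numbers. It explores further features, such as a DatetimeIndex to manage time series data.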

In [25]: import numpy as np
In [26]: np.random.seed(100)
In [27]: a = np.random.standard_normal((9, 4))
In [28]: a
Out[28]: array([[-1.74976547,  0.3426804 ,  1.1530358 , -0.25243604],
                [ 0.98132079,  0.51421884,  0.22117967, -1.07004333],
                [-0.18949583,  0.25500144, -0.45802699,  0.43516349],
                [-0.58359505,  0.81684707,  0.67272081, -0.10441114],
                [-0.53128038,  1.02973269, -0.43813562, -1.11831825],
                [ 1.61898166,  1.54160517, -0.25187914, -0.84243574],
                [ 0.18451869,  0.9370822 ,  0.73100034,  1.36155613],
                [-0.32623806,  0.05567601,  0.22239961, -1.443217  ],
                [-0.75635231,  0.81645401,  0.75044476, -0.45594693]])

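Although DataFrame objects can be constructed more directly (as seen before), using an ndarray object is generally a good choice, since pandas retains the basic structure and "only" adds metainformation (e.g., index values). It also represents a typical use case for financial applications and scientific research in general. For example: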

In [29]: df = pd.DataFrame(a)  # ①
In [30]: df
Out[30]:           0         1         2         3
         0 -1.749765  0.342680  1.153036 -0.252436
         1  0.981321  0.514219  0.221180 -1.070043
         2 -0.189496  0.255001 -0.458027  0.435163
         3 -0.583595  0.816847  0.672721 -0.104411
         4 -0.531280  1.029733 -0.438136 -1.118318
         5  1.618982  1.541605 -0.251879 -0.842436
         6  0.184519  0.937082  0.731000  1.361556
         7 -0.326238  0.055676  0.222400 -1.443217
         8 -0.756352  0.816454  0.750445 -0.455947

①

Creates a DataFrame object from the ndarray object.

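Table 5-1 lists the parameters that the DataFrame() function takes. In the table, "array-like" means a data structure similar to an ndarray object, such as a list. Index is an instance of the pandas Index class.

Table 5-1. Parameters of the DataFrame() function

Parameter	Format	Description
`data`	`ndarray`/`dict`/`DataFrame`	Data for the `DataFrame`; a `dict` can contain `Series`, `ndarray`, and `list` objects
`index`	`Index`/array-like	Index to use; defaults to `range(n)`
`columns`	`Index`/array-like	Column headers to use; defaults to `range(n)`
`dtype`	`dtype`, default `None`	Data type to use/force; otherwise, it is inferred
`copy`	`bool`, default `None`	Copy data from inputs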
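A short sketch of how these parameters interact, using a small made-up array and hypothetical labels:

```python
import numpy as np
import pandas as pd

a = np.arange(6).reshape(3, 2)   # integers 0..5 in three rows, two columns

# Without index and columns, both default to a range starting at 0.
default = pd.DataFrame(a)

# With explicit labels and a forced data type.
labeled = pd.DataFrame(data=a,
                       index=['x', 'y', 'z'],
                       columns=['No1', 'No2'],
                       dtype=float)

print(default)
print(labeled)
print(labeled.dtypes)
```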

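As with structured arrays, and as seen before, DataFrame objects have column names that can be defined directly by assigning a list object with the right number of elements. This illustrates that one can define or change the attributes of a DataFrame object on the fly: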

In [31]: df.columns = ['No1', 'No2', 'No3', 'No4']  # ①
In [32]: df
Out[32]:         No1       No2       No3       No4
         0 -1.749765  0.342680  1.153036 -0.252436
         1  0.981321  0.514219  0.221180 -1.070043
         2 -0.189496  0.255001 -0.458027  0.435163
         3 -0.583595  0.816847  0.672721 -0.104411
         4 -0.531280  1.029733 -0.438136 -1.118318
         5  1.618982  1.541605 -0.251879 -0.842436
         6  0.184519  0.937082  0.731000  1.361556
         7 -0.326238  0.055676  0.222400 -1.443217
         8 -0.756352  0.816454  0.750445 -0.455947
In [33]: df['No2'].mean()  # ②
Out[33]: 0.70103309414564585

①

Specifies the column labels via a list object.

②

Picking a column is now easy.

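To work with financial time series data efficiently, one must be able to handle time indices well. This can also be considered a major strength of pandas. For example, assume that the nine data entries in the four columns correspond to month-end data, beginning in January 2019. A DatetimeIndex object is then generated with the date_range() function as follows: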

In [34]: dates = pd.date_range('2019-1-1', periods=9, freq='M')  # ①
In [35]: dates
Out[35]: DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
                        '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',
                        '2019-09-30'],
                       dtype='datetime64[ns]', freq='M')

①

Creates a DatetimeIndex object.

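Table 5-2 lists the parameters that the date_range() function takes.

Table 5-2. Parameters of the date_range() function

Parameter	Format	Description
`start`	`string`/`datetime`	Left bound for generating dates
`end`	`string`/`datetime`	Right bound for generating dates
`periods`	`integer`/`None`	Number of periods (if `start` or `end` is `None`)
`freq`	`string`/`DateOffset`	Frequency string, e.g., `5D` for 5 days
`tz`	`string`/`None`	Time zone name for a localized index
`normalize`	`bool`, default `False`	Normalizes `start` and `end` to midnight
`name`	`string`, default `None`	Name of the resulting index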
The following code defines the just-created DatetimeIndex object as the relevant index object, turning the original data set into a time series:

In [36]: df.index = dates
In [37]: df
Out[37]:                  No1       No2       No3       No4
         2019-01-31 -1.749765  0.342680  1.153036 -0.252436
         2019-02-28  0.981321  0.514219  0.221180 -1.070043
         2019-03-31 -0.189496  0.255001 -0.458027  0.435163
         2019-04-30 -0.583595  0.816847  0.672721 -0.104411
         2019-05-31 -0.531280  1.029733 -0.438136 -1.118318
         2019-06-30  1.618982  1.541605 -0.251879 -0.842436
         2019-07-31  0.184519  0.937082  0.731000  1.361556
         2019-08-31 -0.326238  0.055676  0.222400 -1.443217
         2019-09-30 -0.756352  0.816454  0.750445 -0.455947

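When generating a DatetimeIndex object with the date_range() function, there are a number of choices for the frequency parameter freq. Table 5-3 lists the available options.

Table 5-3. Frequency parameter values for the date_range() function

Alias	Description
`B`	Business day frequency
`C`	Custom business day frequency (experimental)
`D`	Calendar day frequency
`W`	Weekly frequency
`M`	Month end frequency
`BM`	Business month end frequency
`MS`	Month start frequency
`BMS`	Business month start frequency
`Q`	Quarter end frequency
`BQ`	Business quarter end frequency
`QS`	Quarter start frequency
`BQS`	Business quarter start frequency
`A`	Year end frequency
`BA`	Business year end frequency
`AS`	Year start frequency
`BAS`	Business year start frequency
`H`	Hourly frequency
`T`	Minutely frequency
`S`	Secondly frequency
`L`	Milliseconds
`U`	Microseconds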
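In some circumstances, it pays off to have access to the original data set in the form of the ndarray object. The values attribute provides direct access to it: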

In [38]: df.values
Out[38]: array([[-1.74976547,  0.3426804 ,  1.1530358 , -0.25243604],
                [ 0.98132079,  0.51421884,  0.22117967, -1.07004333],
                [-0.18949583,  0.25500144, -0.45802699,  0.43516349],
                [-0.58359505,  0.81684707,  0.67272081, -0.10441114],
                [-0.53128038,  1.02973269, -0.43813562, -1.11831825],
                [ 1.61898166,  1.54160517, -0.25187914, -0.84243574],
                [ 0.18451869,  0.9370822 ,  0.73100034,  1.36155613],
                [-0.32623806,  0.05567601,  0.22239961, -1.443217  ],
                [-0.75635231,  0.81645401,  0.75044476, -0.45594693]])
In [39]: np.array(df)
Out[39]: array([[-1.74976547,  0.3426804 ,  1.1530358 , -0.25243604],
                [ 0.98132079,  0.51421884,  0.22117967, -1.07004333],
                [-0.18949583,  0.25500144, -0.45802699,  0.43516349],
                [-0.58359505,  0.81684707,  0.67272081, -0.10441114],
                [-0.53128038,  1.02973269, -0.43813562, -1.11831825],
                [ 1.61898166,  1.54160517, -0.25187914, -0.84243574],
                [ 0.18451869,  0.9370822 ,  0.73100034,  1.36155613],
                [-0.32623806,  0.05567601,  0.22239961, -1.443217  ],
                [-0.75635231,  0.81645401,  0.75044476, -0.45594693]])
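Besides the values attribute and np.array(), pandas also offers the DataFrame.to_numpy() method for the same purpose. The sketch below uses small made-up data and additionally shows what happens with columns of mixed types (the frame named mixed is a hypothetical example).

```python
import numpy as np
import pandas as pd

a = np.arange(6.0).reshape(3, 2)
df = pd.DataFrame(a, columns=['No1', 'No2'])

v1 = df.values        # attribute access
v2 = df.to_numpy()    # method with the same result here
v3 = np.array(df)     # conversion via NumPy

# With mixed column types, a common data type has to be found.
mixed = pd.DataFrame({'numbers': [10, 20], 'names': ['a', 'b']})
mixed_dtype = mixed.to_numpy().dtype

print(v2)
print(mixed_dtype)
```

For homogeneous numeric data, all three approaches return the same float array; for mixed columns, the result falls back to the generic object data type, which gives up most of the performance benefits of ndarray objects.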

Python for Finance, 2nd Edition (GPT Retranslation) (Part 2) (3): https://developer.aliyun.com/article/1559306
