NumPy

See the Wikipedia article on NumPy.

NumPy

Type: module


Provides

  1. An array object of arbitrary homogeneous items
  2. Fast mathematical operations over arrays
  3. Linear Algebra, Fourier Transforms, Random Number Generation

How to use the documentation


Documentation is available in two forms: docstrings provided
with the code, and a loose standing reference guide, available from
the NumPy homepage http://www.scipy.org.

We recommend exploring the docstrings using
IPython http://ipython.scipy.org, an advanced Python shell with
TAB-completion and introspection capabilities.

For some objects, np.info(obj) may provide additional help (it prints information about a function, class, or module). This is
particularly true if you see the line "Help on ufunc object:" at the top
of the help() page. Ufuncs are implemented in C, not Python, for speed.
The native Python help() does not know how to view their help, but our
np.info() function does.

To search for documents containing a keyword, do::

import numpy as np
np.lookfor('keyword')

General-purpose documents like a glossary and help on the basic concepts
of numpy are available under the doc sub-module::

from numpy import doc
help(doc)

Available subpackages
---------------------
doc
    Topical documentation on broadcasting, indexing, etc.
lib
    Basic functions used by several sub-packages.
random
    Core Random Tools
linalg
    Core Linear Algebra Tools
fft
    Core FFT routines
polynomial
    Polynomial tools
testing
    NumPy testing tools
f2py
    Fortran to Python Interface Generator.
distutils
    Enhancements to distutils with support for
    Fortran compilers and more.

Utilities
---------
test
    Run numpy unittests
show_config
    Show numpy build configuration
dual
    Overwrite certain functions with high-performance Scipy tools
matlib
    Make everything matrices.
__version__
    NumPy version string

Here are a few examples:

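import numpy as np
from numpy import doc   # needed so that the name doc is defined

help(doc)

help(doc.creation)

doc.basics?             # IPython-only syntax for showing an object's documentation

help(np.lib)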

ndarray overview

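Translated from the NumPy Quickstart tutorial.
NumPy's main object is the homogeneous multidimensional array. In NumPy, dimensions are called axes, and the number of axes is the rank.

For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of rank 1: it has one axis, and that axis has length 3. By contrast, the array

[[ 1., 0., 0.],
 [ 0., 1., 2.]]

has rank 2 (it is 2-dimensional). Its first dimension (axis) has length 2, and its second has length 3.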
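
The terminology above can be checked directly. This is a minimal sketch; the variable names point and m are chosen here for illustration and do not appear in the original tutorial.

```python
import numpy as np

point = np.array([1, 2, 1])          # a point in 3D space: one axis of length 3
m = np.array([[1., 0., 0.],
              [0., 1., 2.]])         # two axes, of length 2 and 3

print(point.ndim, point.shape)       # 1 (3,)
print(m.ndim, m.shape)               # 2 (2, 3)
```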

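NumPy's array class is called ndarray. Its most important attributes are:

  • ndarray.ndim: the number of axes (dimensions) of the array.
  • ndarray.shape: the dimensions of the array, given as a tuple of integers holding the length of each axis.
    For a matrix with n rows and m columns, shape is (n, m).
  • ndarray.size: the total number of elements in the array, equal to the product of the entries of shape.
  • ndarray.dtype: an object describing the type of the elements in the array.
  • ndarray.itemsize: the size in bytes of each element of the array. For example, an array of float64 elements has itemsize 8 (=64/8), and an array of int32 elements has itemsize 4 (=32/8).
  • ndarray.data: the buffer containing the actual elements of the array. Normally, we won't need to use this attribute because we access the elements of an array using indexing facilities.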

Some examples:

z = np.array([[ 0,  1,  2,  3,  4],
              [ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14]])
t = np.array([z, 2 * z + 1])
t
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[ 1,  3,  5,  7,  9],
        [11, 13, 15, 17, 19],
        [21, 23, 25, 27, 29]]])
print('z.ndim = ', z.ndim)
print('t.ndim = ', t.ndim)
z.ndim =  2
t.ndim =  3
print('z.shape = ',z.shape)
print('t.shape = ',t.shape)
z.shape =  (3, 5)
t.shape =  (2, 3, 5)
print('z.size = ',z.size)
print('t.size = ',t.size)
z.size =  15
t.size =  30
t.dtype.name
'int32'
t.itemsize
4
type(t)
numpy.ndarray

ndarray indexing

z
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
z[0]  # the first row
array([0, 1, 2, 3, 4])
z[0, 2] # the third element of the first row
2
t[0]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
t[0][2]
array([10, 11, 12, 13, 14])
t[0, 2]
array([10, 11, 12, 13, 14])
t[0, 2, 3]
13
t[0, :2, 2:4]
array([[2, 3],
       [7, 8]])
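
The two spellings t[0][2] and t[0, 2] above select the same row; the second does it in a single indexing step. Basic slices such as t[0, :2, 2:4] return views rather than copies, so writing through a slice changes the original array. A small sketch, rebuilding z and t so that it runs on its own (the name block is introduced here for illustration):

```python
import numpy as np

z = np.arange(15).reshape(3, 5)
t = np.array([z, 2 * z + 1])

# Chained indexing and tuple indexing select the same row.
same_row = np.array_equal(t[0][2], t[0, 2])
print(same_row)      # True

# A basic slice is a view: modifying it also modifies t.
block = t[0, :2, 2:4]
block[0, 0] = -1
print(t[0, 0, 2])    # -1
```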

For comparison, with nested Python lists:

e = [1, 2, 3, 4]
p = [e, e]
p[0][0]
1
p[0,0]  # invalid: a list cannot be indexed with a tuple
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-300-d527d1725556> in <module>()
----> 1 p[0,0]  # invalid: a list cannot be indexed with a tuple

TypeError: list indices must be integers or slices, not tuple

ndarray supports vectorized operations

Operations applied to every element

z
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
z.sum()  # sum of all elements
105
z.sum(axis = 0)    # sum along axis 0, i.e. column-wise sum (one value per column, like a row vector)
array([15, 18, 21, 24, 27])
z.sum(axis = 1)   # sum along axis 1, i.e. row-wise sum (one value per row, like a column vector)
array([10, 35, 60])
z.std()  # standard deviation of all elements
4.3204937989385739
z.std(axis = 0)
array([ 4.0824829,  4.0824829,  4.0824829,  4.0824829,  4.0824829])
z.cumsum()  # cumulative sum of all elements
array([  0,   1,   3,   6,  10,  15,  21,  28,  36,  45,  55,  66,  78,
        91, 105], dtype=int32)
z * 2   # like multiplying a matrix by a scalar
array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])
z ** 2  
array([[  0,   1,   4,   9,  16],
       [ 25,  36,  49,  64,  81],
       [100, 121, 144, 169, 196]], dtype=int32)
np.sqrt(z)
array([[ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ],
       [ 2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ],
       [ 3.16227766,  3.31662479,  3.46410162,  3.60555128,  3.74165739]])
y = np.arange(10)  # like Python's range, but returns an array
y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a = np.array([1, 2, 3, 6])
b = np.linspace(0, 2, 4)  # 4 evenly spaced points from 0 to 2, endpoints included
c = a - b
c
array([ 1.        ,  1.33333333,  1.66666667,  4.        ])
# module-level functions
a = np.linspace(-np.pi, np.pi, 100) 
b = np.sin(a)
c = np.cos(a)
b = np.array([1,2,3,4])
a = np.array([4,5,6,7])
print('a + b = ', a + b)
print('a - b = ', a - b)
print('a * b = ', a * b)
print('a / b = ', a / b)
print('a // b = ', a // b)
print('a % b = ', a % b)
a + b =  [ 5  7  9 11]
a - b =  [3 3 3 3]
a * b =  [ 4 10 18 28]
a / b =  [ 4.    2.5   2.    1.75]
a // b =  [4 2 2 1]
a % b =  [0 1 0 3]
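
In the axis examples above, the axis argument names the axis that is summed over, so that axis disappears from the result. The sketch below rebuilds z and prints the resulting shapes; keepdims is a standard parameter of ndarray.sum that keeps the collapsed axis with length 1. The variable names are chosen here for illustration.

```python
import numpy as np

z = np.arange(15).reshape(3, 5)

col_sums = z.sum(axis=0)                     # collapses the 3 rows    -> shape (5,)
row_sums = z.sum(axis=1)                     # collapses the 5 columns -> shape (3,)
row_sums_2d = z.sum(axis=1, keepdims=True)   # keeps a length-1 axis   -> shape (3, 1)

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)   # (5,) (3,) (3, 1)
```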

Non-numeric arrays

a = np.array(list('python'))
a
array(['p', 'y', 't', 'h', 'o', 'n'],
      dtype='<U1')
b = np.array(list('numpy'))
b
array(['n', 'u', 'm', 'p', 'y'],
      dtype='<U1')
a + b
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-153-f96fb8f649b6> in <module>()
----> 1 a + b


TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
list(a) + list(b)
['p', 'y', 't', 'h', 'o', 'n', 'n', 'u', 'm', 'p', 'y']
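
Note that list(a) + list(b) concatenates the two sequences, which is a different operation from element-wise addition (and a and b above have different lengths, 6 and 5, so they would not line up element by element anyway). For element-wise string concatenation NumPy provides np.char.add. A minimal sketch with two equal-length arrays, whose names first and second are made up for this example:

```python
import numpy as np

first = np.array(['num', 'sci'])
second = np.array(['py', 'py'])

joined = np.char.add(first, second)   # element-wise string concatenation
print(joined)                         # ['numpy' 'scipy']
```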

Linear algebra

from numpy.random import rand
from numpy.linalg import solve, inv
a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
a.transpose()
array([[ 1. ,  3. ,  5. ],
       [ 2. ,  4. ,  9. ],
       [ 3. ,  6.7,  5. ]])
inv(a)
array([[-2.27683616,  0.96045198,  0.07909605],
       [ 1.04519774, -0.56497175,  0.1299435 ],
       [ 0.39548023,  0.05649718, -0.11299435]])
b =  np.array([3, 2, 1])
solve(a, b)  # solve the linear system a x = b
array([-4.83050847,  2.13559322,  1.18644068])
c = rand(3, 3)  # create a 3x3 random matrix
c
array([[ 0.98539238,  0.62602057,  0.63592577],
       [ 0.84697864,  0.86223698,  0.20982139],
       [ 0.15532627,  0.53992238,  0.65312854]])
np.dot(a, c)  # matrix multiplication
array([[  3.14532847,   3.97026167,   3.01495417],
       [  7.38477771,   8.94448958,   7.1230241 ],
       [ 13.32640097,  13.58984759,   8.33366406]])
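
The results of solve and inv above can be verified numerically. The sketch below reuses the same a and b and compares with np.allclose, since floating-point results are only approximately exact; the @ operator is matrix multiplication, equivalent to np.dot for 2-D arrays.

```python
import numpy as np
from numpy.linalg import solve, inv

a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
b = np.array([3, 2, 1])

x = solve(a, b)
print(np.allclose(a @ x, b))                 # True: x satisfies a x = b
print(np.allclose(inv(a) @ a, np.eye(3)))    # True: inv(a) is the inverse of a
print(np.allclose(inv(a) @ b, x))            # True: same solution via the inverse
```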

Creating arrays

See np.doc.creation? (IPython syntax).
There are 5 general mechanisms for creating arrays:

  1. Conversion from other Python structures (e.g., lists, tuples)
  2. Intrinsic numpy array creation objects (e.g., arange, ones, zeros,
    etc.)
  3. Reading arrays from disk, either from standard or custom formats
  4. Creating arrays from raw bytes through the use of strings or buffers
  5. Use of special library functions (e.g., random)

import numpy as np
x = np.array([2,3,1,0])
x1 = np.array([[1,2.0],[0,0],(1+1j,3.)]) # note mix of tuple and lists, and types
x2 = np.array([[ 1.+0.j, 2.+0.j], [ 0.+0.j, 0.+0.j], [ 1.+1.j, 3.+0.j]])

y = np.zeros((2, 3))
y1 = np.ones((2,3))
y2 = np.arange(10)
y3 = np.arange(2, 10, dtype=float)  # the np.float alias was removed from recent NumPy releases
y4 = np.arange(2, 10, 0.2)
y5 = np.linspace(1., 4., 6)  # 6 evenly spaced points from 1 to 4, endpoints included

z = np.indices((3,3))

r = [x, x1, x2, y, y1, y2, y3, y4, y5, z]
s = 'x, x1, x2, y, y1, y2, y3, y4, y5, z'.split(', ')

for name, value in zip(s, r):
    print('%s =  ' % name)
    print('')
    print(value)
    print(75 * '=')
x =  

[2 3 1 0]
===========================================================================
x1 =  

[[ 1.+0.j  2.+0.j]
 [ 0.+0.j  0.+0.j]
 [ 1.+1.j  3.+0.j]]
===========================================================================
x2 =  

[[ 1.+0.j  2.+0.j]
 [ 0.+0.j  0.+0.j]
 [ 1.+1.j  3.+0.j]]
===========================================================================
y =  

[[ 0.  0.  0.]
 [ 0.  0.  0.]]
===========================================================================
y1 =  

[[ 1.  1.  1.]
 [ 1.  1.  1.]]
===========================================================================
y2 =  

[0 1 2 3 4 5 6 7 8 9]
===========================================================================
y3 =  

[ 2.  3.  4.  5.  6.  7.  8.  9.]
===========================================================================
y4 =  

[ 2.   2.2  2.4  2.6  2.8  3.   3.2  3.4  3.6  3.8  4.   4.2  4.4  4.6  4.8
  5.   5.2  5.4  5.6  5.8  6.   6.2  6.4  6.6  6.8  7.   7.2  7.4  7.6  7.8
  8.   8.2  8.4  8.6  8.8  9.   9.2  9.4  9.6  9.8]
===========================================================================
y5 =  

[ 1.   1.6  2.2  2.8  3.4  4. ]
===========================================================================
z =  

[[[0 0 0]
  [1 1 1]
  [2 2 2]]

 [[0 1 2]
  [0 1 2]
  [0 1 2]]]
===========================================================================
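
The code above covers mechanisms 1 and 2 from the list. Mechanisms 3 and 4 can be sketched as follows; an in-memory io.BytesIO buffer stands in for a file on disk (an assumption made so the example needs no real file), and the variable names are chosen for illustration.

```python
import io
import numpy as np

# Mechanism 4: interpret raw bytes as an array.
raw = bytes([1, 2, 3, 4])
from_bytes = np.frombuffer(raw, dtype=np.uint8)
print(from_bytes)                 # [1 2 3 4]

# Mechanism 3: write an array in NumPy's .npy format and read it back.
buffer = io.BytesIO()             # in-memory stand-in for a file on disk
np.save(buffer, np.arange(5))
buffer.seek(0)                    # rewind before reading
from_disk = np.load(buffer)
print(from_disk)                  # [0 1 2 3 4]
```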

Tip: the order parameter

order sets the order in which elements are stored in memory: C means C-like (row-major), and F means Fortran-like (column-major).

g = np.ones((2,3,4), dtype = 'i', order = 'C')  # np.zeros() works the same way
g
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int32)
# Another array can be passed in; the result is an all-ones array with the same shape
h = np.ones_like(g, dtype = 'float16', order = 'C')  # np.zeros_like() works the same way
h
array([[[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]],

       [[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]]], dtype=float16)
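
The effect of order is not visible when printing or indexing an array, only in how the elements are laid out in memory. A minimal sketch that makes the difference visible, using np.asfortranarray and ravel(order='K') (which reads elements in the order they occur in memory); the names m and f are chosen for this example:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)        # C order (row-major) by default
f = np.asfortranarray(m)              # same values, column-major storage

print(m.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])   # True True
print(np.array_equal(m, f))           # True: indexing is unaffected
print(m.ravel(order='K'))             # [0 1 2 3 4 5]
print(f.ravel(order='K'))             # [0 3 1 4 2 5]
```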

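Notes:

  1. The composition, length, and size of an array are homogeneous along every dimension.
  2. Only one data type (numpy.dtype) is allowed for the whole array.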
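
One consequence of the second note is that mixed input values are converted to a single common type when the array is created. A short sketch (variable names are illustrative):

```python
import numpy as np

mixed_numbers = np.array([1, 2.5])            # int and float      -> float64
mixed_complex = np.array([1, 2.5, 1 + 1j])    # plus a complex one -> complex128

print(mixed_numbers.dtype, mixed_complex.dtype)   # float64 complex128
print(mixed_numbers)                              # [1.  2.5]
```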

NumPy dtype objects

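dtype   Description               Example
t       bit field                 t4 (4 bits)
b       Boolean                   b (True or False)
i       integer                   i8 (64 bits)
u       unsigned integer          u8 (64 bits)
f       floating point            f8 (64 bits)
c       complex floating point    c16 (128 bits)
O       object                    O (pointer to an object)
S, a    string                    S24 (24 characters)
U       Unicode                   U24 (24 Unicode characters)
V       other                     V12 (12-byte data block)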
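
Several of the type codes listed above can be passed straight to np.dtype. The sketch below prints the kind character and the size in bytes of each; it covers only a subset of the codes, picked here for illustration.

```python
import numpy as np

codes = ['i8', 'u8', 'f8', 'c16', 'S24', 'U24', 'V12']
sizes = {}
for code in codes:
    dt = np.dtype(code)
    sizes[code] = dt.itemsize          # size of one element in bytes
    print(code, dt.kind, dt.itemsize)
```

Note that U24 occupies 96 bytes, because NumPy stores each Unicode character in 4 bytes.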

Structured arrays

Structured arrays let us use a different NumPy data type for each column (field).

np.info(np.dtype)
 dtype()

dtype(obj, align=False, copy=False)

Create a data type object.

A numpy array is homogeneous, and contains elements described by a
dtype object. A dtype object can be constructed from different
combinations of fundamental numeric types.

Parameters
----------
obj
    Object to be converted to a data type object.
align : bool, optional
    Add padding to the fields to match what a C compiler would output
    for a similar C-struct. Can be ``True`` only if `obj` is a dictionary
    or a comma-separated string. If a struct dtype is being created,
    this also sets a sticky alignment flag ``isalignedstruct``.
copy : bool, optional
    Make a new copy of the data-type object. If ``False``, the result
    may just be a reference to a built-in data-type object.

See also
--------
result_type

Examples
--------
Using array-scalar type:

>>> np.dtype(np.int16)
dtype('int16')

Structured type, one field name 'f1', containing int16:

>>> np.dtype([('f1', np.int16)])
dtype([('f1', '<i2')])

Structured type, one field named 'f1', in itself containing a structured
type with one field:

>>> np.dtype([('f1', [('f1', np.int16)])])
dtype([('f1', [('f1', '<i2')])])

Structured type, two fields: the first field contains an unsigned int, the
second an int32:

>>> np.dtype([('f1', np.uint), ('f2', np.int32)])
dtype([('f1', '<u4'), ('f2', '<i4')])

Using array-protocol type strings:

>>> np.dtype([('a','f8'),('b','S10')])
dtype([('a', '<f8'), ('b', '|S10')])

Using comma-separated field formats.  The shape is (2,3):

>>> np.dtype("i4, (2,3)f8")
dtype([('f0', '<i4'), ('f1', '<f8', (2, 3))])

Using tuples.  ``int`` is a fixed type, 3 the field's shape.  ``void``
is a flexible type, here of size 10:

>>> np.dtype([('hello',(np.int,3)),('world',np.void,10)])
dtype([('hello', '<i4', 3), ('world', '|V10')])

Subdivide ``int16`` into 2 ``int8``'s, called x and y.  0 and 1 are
the offsets in bytes:

>>> np.dtype((np.int16, {'x':(np.int8,0), 'y':(np.int8,1)}))
dtype(('<i2', [('x', '|i1'), ('y', '|i1')]))

Using dictionaries.  Two fields named 'gender' and 'age':

>>> np.dtype({'names':['gender','age'], 'formats':['S1',np.uint8]})
dtype([('gender', '|S1'), ('age', '|u1')])

Offsets in bytes, here 0 and 25:

>>> np.dtype({'surname':('S25',0),'age':(np.uint8,25)})
dtype([('surname', '|S25'), ('age', '|u1')])


Methods:

  newbyteorder  --  newbyteorder(new_order='S')
dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', 2)])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)
s
array([(b'Smith', 45,  1.83000004, [0, 1]),
       (b'Jones', 53,  1.72000003, [2, 2])],
      dtype=[('Name', 'S10'), ('Age', '<i4'), ('Height', '<f4'), ('Children/Pets', '<i4', (2,))])
s['Name']
array([b'Smith', b'Jones'],
      dtype='|S10')
s['Age']
array([45, 53])
s["Height"].mean()
1.7750001
s[1]
(b'Jones', 53,  1.72000003, [2, 2])
s[1]['Age']
53
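
Fields of a structured array can also drive filtering and sorting. The following is a small sketch along the lines of the example above, with a shortened dtype and an array named people (both chosen here for illustration):

```python
import numpy as np

dt = np.dtype([('Name', 'S10'), ('Age', 'i4'), ('Height', 'f')])
people = np.array([(b'Smith', 45, 1.83), (b'Jones', 53, 1.72)], dtype=dt)

older = people[people['Age'] > 50]            # boolean mask built from one field
by_height = np.sort(people, order='Height')   # sort the records by a field

print(older['Name'])        # [b'Jones']
print(by_height['Name'])    # [b'Jones' b'Smith']
```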

Code vectorization

r = np.array([[1,2,3],[2,3,4],[3,4,5],[4,5,6]])
s = np.array([[2,3,4],[3,4,5],[4,5,6],[6,7,8]])

Simple arithmetic

r + s    
array([[ 3,  5,  7],
       [ 5,  7,  9],
       [ 7,  9, 11],
       [10, 12, 14]])
r * s
array([[ 2,  6, 12],
       [ 6, 12, 20],
       [12, 20, 30],
       [24, 35, 48]])
r % s
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]], dtype=int32)
s // r
array([[2, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]], dtype=int32)

Broadcasting

For more, see http://www.cnblogs.com/lyon2014/p/4696989.html

r
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])
2 * r + 3
array([[ 5,  7,  9],
       [ 7,  9, 11],
       [ 9, 11, 13],
       [11, 13, 15]])
f = np.array([9,8,7])
f
array([9, 8, 7])
r + f
array([[10, 10, 10],
       [11, 11, 11],
       [12, 12, 12],
       [13, 13, 13]])
# r.T is the transpose, the same as r.transpose()
np.shape(r.T)
(3, 4)
def f(x):
    return 3 * x + 5
f(r.T)
array([[ 8, 11, 14, 17],
       [11, 14, 17, 20],
       [14, 17, 20, 23]])
np.sin(r)
array([[ 0.84147098,  0.90929743,  0.14112001],
       [ 0.90929743,  0.14112001, -0.7568025 ],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.7568025 , -0.95892427, -0.2794155 ]])
np.sin(np.pi)
1.2246467991473532e-16
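
The broadcasting shown above (r + f with shapes (4, 3) and (3,)) follows a simple rule: shapes are compared from the last axis backwards, and two lengths are compatible when they are equal or one of them is 1. A minimal sketch, with helper arrays per_column and per_row introduced here for illustration:

```python
import numpy as np

r = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])   # shape (4, 3)
per_column = np.array([9, 8, 7])                              # shape (3,)
per_row = np.array([[10], [20], [30], [40]])                  # shape (4, 1)

print((r + per_column).shape)   # (4, 3): (3,) is stretched across the 4 rows
print((r + per_row).shape)      # (4, 3): (4, 1) is stretched across the 3 columns

failed = False
try:
    r + np.array([1, 2, 3, 4])  # shape (4,) does not match the last axis (3)
except ValueError as exc:
    failed = True
    print('cannot broadcast:', exc)
```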

ufunc

http://docs.scipy.org/doc/numpy/reference/ufuncs.html

Memory layout

x = np.random.standard_normal((5, 10000000))
y = 2 * x + 3  # linear equation y = a * x + b
C = np.array((x, y), order='C')
F = np.array((x, y), order='F')
x = 0.0; y = 0.0  # memory clean-up
C[:2].round(2)
array([[[ 0.67,  0.29,  1.54, ...,  0.07,  2.64, -0.65],
        [ 0.4 , -0.63,  1.43, ...,  1.11,  0.93, -0.52],
        [-0.41,  2.23, -1.16, ..., -1.66,  0.07,  0.21],
        [ 1.46,  1.22,  0.2 , ..., -0.56,  2.36, -1.65],
        [-0.39,  1.73, -0.24, ..., -1.45,  0.43, -0.41]],

       [[ 4.34,  3.58,  6.08, ...,  3.15,  8.28,  1.69],
        [ 3.79,  1.73,  5.86, ...,  5.22,  4.87,  1.97],
        [ 2.17,  7.46,  0.67, ..., -0.32,  3.15,  3.42],
        [ 5.93,  5.44,  3.4 , ...,  1.89,  7.72, -0.3 ],
        [ 2.22,  6.46,  2.51, ...,  0.1 ,  3.85,  2.18]]])
%timeit C.sum()
135 ms ± 2.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit F.sum()
134 ms ± 499 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

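When summing all elements, the two memory layouts show no significant difference. In the following cases, however, the difference is substantial.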

%timeit C[0].sum(axis=0)
128 ms ± 894 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit C[0].sum(axis=1)
66.5 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit F.sum(axis=0)
1.06 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit F.sum(axis=1)
2.12 s ± 35.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
F = 0.0; C = 0.0  # memory clean-up

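From the timings above we can see:
Operating on a few large vectors performs better than operating on a large number of small vectors.
The elements of the few large vectors are stored in adjacent memory locations, which explains the relative performance advantage.
However, operations on the Fortran-ordered array F are much slower overall than on the C-ordered array.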

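Choosing a suitable memory layout can speed up code substantially. In the timings above the C-ordered sums are roughly 8 to 30 times faster, although they cover only C[0] (half of the data held in F), so the comparison is not exactly like for like.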
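
The layout difference behind these timings can be inspected through the strides attribute, which gives the number of bytes to step in memory when moving by one position along each axis. The sketch below uses a much smaller array than the timing runs above (an assumption made so that it runs quickly) and does not attempt to reproduce the timings.

```python
import numpy as np

x = np.random.standard_normal((5, 1000))
C = np.array((x, 2 * x + 3), order='C')
F = np.array((x, 2 * x + 3), order='F')

print(C.shape, F.shape)     # (2, 5, 1000) (2, 5, 1000)
print(C.strides)            # (40000, 8000, 8): the last axis is contiguous
print(F.strides)            # (8, 16, 80): the first axis is contiguous
print(np.allclose(C.sum(axis=1), F.sum(axis=1)))   # True: same values, different layout
```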

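Conclusion:

  1. Basic data types (integers, floats, strings) provide the primitive building blocks.
  2. Standard data structures (tuples, lists, dictionaries, sets) provide a range of operations on collections of data.
  3. Arrays (the numpy.ndarray class) provide vectorized operations, which make code more concise, more convenient, and faster.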

Further reading:
