3. Other aggregation operations
Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true

np.power: exponentiation (applied element-wise)
np.nan  # this value represents missing data; its type is float
type(np.nan)
# any arithmetic operation involving nan yields nan
float
np.nan + 10
nan
np.nan * 10
nan
nd2 = np.array([12, 23, np.nan, 34, np.nan, 90])
nd2
array([ 12.,  23.,  nan,  34.,  nan,  90.])
# aggregate nd2
nd2.sum(axis=0)
nan
nd2.max()
nan
Ordinary aggregations are thrown off by arrays that contain missing values, so the NaN-safe versions are needed.
np.nansum(nd2)
159.0
np.nanmean(nd2)
39.75
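One way to see what the NaN-safe functions do is to drop the missing entries by hand and compare. The sketch below reuses the nd2 array from above; the names valid, manual_sum and manual_mean are introduced here only for illustration.

```python
import numpy as np

nd2 = np.array([12, 23, np.nan, 34, np.nan, 90])

# Boolean mask that is True where a value is present (not NaN)
valid = ~np.isnan(nd2)

# Aggregating only the valid entries should match the nan* functions
manual_sum = nd2[valid].sum()
manual_mean = nd2[valid].mean()

print(manual_sum, np.nansum(nd2))    # 159.0 159.0
print(manual_mean, np.nanmean(nd2))  # 39.75 39.75

# nanargmax reports the position in the original array, NaNs included
print(np.nanmax(nd2), np.nanargmax(nd2))  # 90.0 5
```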
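Aggregation operations:
1) axis specifies which dimension is aggregated. By default no axis is given, which means a full aggregation (all elements are reduced to a single scalar). If axis names a dimension, that dimension disappears and is replaced by the aggregated result.
2) NumPy's aggregation functions come in two versions, with and without the nan prefix. The nan versions drop the missing entries before aggregating.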
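The two points above can be sketched on a small 2-D array. The array a and the variable names below are made up for this illustration; they do not come from the original notebook.

```python
import numpy as np

a = np.array([[1.0, 2.0, np.nan],
              [4.0, 5.0, 6.0]])    # shape (2, 3)

total = np.nansum(a)               # no axis: everything collapses to one scalar
by_col = np.nansum(a, axis=0)      # axis 0 disappears, leaving shape (3,)
by_row = np.nansum(a, axis=1)      # axis 1 disappears, leaving shape (2,)

print(total)           # 18.0
print(by_col.shape)    # (3,)
print(by_row.shape)    # (2,)

# The plain version propagates NaN into any slice that contains one
plain_by_row = a.sum(axis=1)
print(np.isnan(plain_by_row[0]), plain_by_row[1])  # True 15.0
```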
Exercise: how do you sort the rows of a 5*5 matrix by column 3 (index 3, i.e. the fourth column when counting from 1)?
nd = np.random.randint(0, 100, size=(5, 5))
nd
array([[70, 76, 87, 23, 68],
       [34,  3, 59, 93, 71],
       [71, 64, 98, 31, 70],
       [59, 17, 71, 99, 50],
       [86, 58, 91, 22, 18]])
Sorting
np.sort(nd,axis=0)
array([[34,  3, 59, 22, 18],
       [59, 17, 71, 23, 50],
       [70, 58, 87, 31, 68],
       [71, 64, 91, 93, 70],
       [86, 76, 98, 99, 71]])
np.sort(nd[:,3])
array([22, 23, 31, 93, 99])
nd[[4,0,2,1,3]]
array([[86, 58, 91, 22, 18],
       [70, 76, 87, 23, 68],
       [71, 64, 98, 31, 70],
       [34,  3, 59, 93, 71],
       [59, 17, 71, 99, 50]])
ind = np.argsort(nd[:,3])  # sort in ascending order and return the indices of the elements in that order
ind
array([4, 0, 2, 1, 3], dtype=int64)
nd[ind]
array([[86, 58, 91, 22, 18],
       [70, 76, 87, 23, 68],
       [71, 64, 98, 31, 70],
       [34,  3, 59, 93, 71],
       [59, 17, 71, 99, 50]])
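The argsort approach can be wrapped in a small helper. This is only a sketch: sort_rows_by_column and its descending flag are hypothetical names introduced here, and the array is the same one printed above, hard-coded so the result is reproducible.

```python
import numpy as np

nd = np.array([[70, 76, 87, 23, 68],
               [34,  3, 59, 93, 71],
               [71, 64, 98, 31, 70],
               [59, 17, 71, 99, 50],
               [86, 58, 91, 22, 18]])

def sort_rows_by_column(arr, col, descending=False):
    """Return a copy of arr whose rows are reordered by the values in column col."""
    order = np.argsort(arr[:, col])   # row indices in ascending order of that column
    if descending:
        order = order[::-1]           # reverse the order for a descending sort
    return arr[order]                 # fancy indexing moves whole rows together

asc = sort_rows_by_column(nd, 3)
desc = sort_rows_by_column(nd, 3, descending=True)
print(asc[:, 3])    # [22 23 31 93 99]
print(desc[:, 3])   # [99 93 31 23 22]
```

Unlike np.sort(nd, axis=0), which sorts every column independently, indexing with the argsort result keeps each row intact.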
V. Matrix operations on ndarrays
1. Basic matrix operations
1) Arithmetic operations (addition, subtraction, multiplication, division)
nd = np.random.randint(0, 10, size=(3, 3))
nd
array([[7, 4, 6],
       [4, 5, 1],
       [0, 2, 5]])
nd + nd
array([[14,  8, 12],
       [ 8, 10,  2],
       [ 0,  4, 10]])
nd + 2  # here the scalar 2 is expanded into a 3*3 matrix whose entries are all 2
array([[9, 6, 8],
       [6, 7, 3],
       [2, 4, 7]])
nd - 2
array([[ 5,  2,  4],
       [ 2,  3, -1],
       [-2,  0,  3]])
In mathematics, a matrix can be multiplied or divided by a scalar.
nd * 4
array([[28, 16, 24],
       [16, 20,  4],
       [ 0,  8, 20]])
nd / 4
array([[ 1.75,  1.  ,  1.5 ],
       [ 1.  ,  1.25,  0.25],
       [ 0.  ,  0.5 ,  1.25]])
1/nd
C:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
  """Entry point for launching an IPython kernel.
array([[ 0.14285714,  0.25      ,  0.16666667],
       [ 0.25      ,  0.2       ,  1.        ],
       [        inf,  0.5       ,  0.2       ]])
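The warning above comes from the 0 in the last row. Two common ways to deal with it are sketched below, using the same nd values hard-coded; the names recip_inf and recip_safe are introduced here for illustration, and filling the skipped positions with 0.0 is an assumption about what the caller wants.

```python
import numpy as np

nd = np.array([[7, 4, 6],
               [4, 5, 1],
               [0, 2, 5]])

# Option 1: silence the warning locally and accept inf in the result
with np.errstate(divide="ignore"):
    recip_inf = 1 / nd

# Option 2: divide only where the denominator is nonzero;
# skipped positions keep whatever is already in `out` (0.0 here)
recip_safe = np.divide(1.0, nd, out=np.zeros(nd.shape), where=nd != 0)

print(np.isinf(recip_inf[2, 0]))   # True
print(recip_safe[2, 0])            # 0.0
```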
2) Matrix product
nd1 = np.random.randint(0, 10, size=(2, 3))
nd2 = np.random.randint(0, 10, size=(3, 3))
print(nd1)
print(nd2)
[[8 3 5]
 [3 3 5]]
[[4 1 0]
 [1 3 0]
 [7 6 7]]
np.dot(nd1,nd2)
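array([[70, 47, 35],
       [50, 42, 35]])
When two matrices A and B are multiplied (A times B), mathematics requires the number of columns of A to equal the number of rows of B, because each row of A is multiplied with each column of B.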
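The shape rule can be checked directly. The sketch below hard-codes the two matrices printed above as A and B (names chosen here for illustration) and also shows what happens when the order is reversed.

```python
import numpy as np

A = np.array([[8, 3, 5],
              [3, 3, 5]])      # shape (2, 3)
B = np.array([[4, 1, 0],
              [1, 3, 0],
              [7, 6, 7]])      # shape (3, 3)

C = A @ B                      # for 2-D arrays this is the same as np.dot(A, B)
print(C.shape)                 # (2, 3): rows of A by columns of B

# Entry (0, 0) is row 0 of A multiplied with column 0 of B: 8*4 + 3*1 + 5*7
c00 = sum(A[0, k] * B[k, 0] for k in range(A.shape[1]))
print(c00, C[0, 0])            # 70 70

# Reversed order: B has 3 columns but A has only 2 rows, so the product is undefined
try:
    B @ A
except ValueError as exc:
    print("shape mismatch:", exc)
```

Note that A * B would be an element-wise product governed by broadcasting, not the matrix product.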
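2. Broadcasting
The two rules of ndarray broadcasting:
- 1. Pad missing dimensions with 1 (the shape with fewer dimensions gets 1s added on the left)
- 2. Fill the missing elements with the existing values (a dimension of length 1 is stretched by repeating its values)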
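The two rules can be written out as a small shape calculation. broadcast_shape below is a hypothetical helper written for this illustration, not a NumPy function; NumPy's own answer is read from np.broadcast for comparison.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape under the two rules, or ValueError."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: left-pad the shorter shape with 1s
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        # Rule 2: a dimension of length 1 is stretched to match the other one
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((3, 3), (3,)))     # (3, 3)
print(broadcast_shape((3, 3), (3, 1)))   # (3, 3)

# Compare with NumPy's own result for the same shapes
print(np.broadcast(np.empty((3, 3)), np.empty((3,))).shape)  # (3, 3)

try:
    broadcast_shape((3, 3), (2, 3))
except ValueError as exc:
    print(exc)
```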
nd + nd1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-243-1efd3ade59a4> in <module>()
----> 1 nd + nd1

ValueError: operands could not be broadcast together with shapes (3,3) (2,3)
nd
array([[7, 4, 6],
       [4, 5, 1],
       [0, 2, 5]])
nd1 = np.random.randint(0, 10, size=3)
nd1
array([1, 8, 6])
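In mathematics, adding or subtracting a matrix and a vector, a matrix and a scalar, or a vector and a scalar is not defined.
In code these operations work because of broadcasting, which expands the lower-dimensional operand to the same shape as the higher-dimensional one.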
nd + nd1
array([[ 8, 12, 12],
       [ 5, 13,  7],
       [ 1, 10, 11]])
nd1 + 3
array([ 4, 11,  9])
nd2 = np.random.randint(0, 10, size=4)
nd2
array([8, 5, 1, 7])
nd1+nd2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-249-99c1f2f85312> in <module>()
----> 1 nd1+nd2

ValueError: operands could not be broadcast together with shapes (3,) (4,)
nd + nd2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-250-434995cd4e14> in <module>()
----> 1 nd + nd2

ValueError: operands could not be broadcast together with shapes (3,3) (4,)
nd3 = np.random.randint(0, 10, size=(3, 1))
nd3
array([[6],
       [8],
       [6]])
nd + nd3  # nd3 is a column vector
array([[13, 10, 12],
       [12, 13,  9],
       [ 6,  8, 11]])
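Principles of broadcasting:
1) Broadcasting fills in the missing rows or columns so that the shapes match.
2) A scalar can be broadcast to any matrix or vector; the scalar fills the entire expanded array.
3) A vector can be broadcast to a matrix of compatible shape (for example, a row vector to a matrix with the same number of columns); the vector's row (or column) is repeated to fill the expanded matrix.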
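These principles can be made visible with np.broadcast_to, which returns the expanded view that an operation would implicitly use. The sketch below hard-codes the nd, nd1 and nd3 values from the cells above; the names row, col, row_full, col_full and two_full are introduced here for illustration.

```python
import numpy as np

nd = np.array([[7, 4, 6],
               [4, 5, 1],
               [0, 2, 5]])
row = np.array([1, 8, 6])           # shape (3,), acts as a row vector
col = np.array([[6], [8], [6]])     # shape (3, 1), a column vector

two_full = np.broadcast_to(2, nd.shape)     # scalar fills the whole matrix
row_full = np.broadcast_to(row, nd.shape)   # the row is repeated down the rows
col_full = np.broadcast_to(col, nd.shape)   # the column is repeated across the columns

print(row_full)
print(col_full)

# Adding the explicitly expanded arrays matches the implicit broadcast
print(np.array_equal(nd + 2, nd + two_full))     # True
print(np.array_equal(nd + row, nd + row_full))   # True
print(np.array_equal(nd + col, nd + col_full))   # True
```

np.broadcast_to returns a read-only view rather than a copy, so the expansion does not allocate a full-size array.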