1. The numpy module
1.1 Creating arrays
1.1.1 Arrays: array()
One-dimensional array:
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array(['Product ID', 'Units Sold', 'Unit Price', 'Sales Amount'])
print(a)
print(b)
[1 2 3 4]
['Product ID' 'Units Sold' 'Unit Price' 'Sales Amount']
Two-dimensional array:
import numpy as np
c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(c)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
The parameters are listed in the table below.
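array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
Parameter | Description |
object | Required. An array-like object such as a list or tuple (possibly nested), or an existing array |
dtype | Optional. The data type of the array elements; inferred from the data when omitted |
copy | Optional. Whether to copy the input object; defaults to True |
order | Optional. The memory layout of the array, for example 'C' (row-major) or 'F' (column-major) |
subok | Optional. If False (the default), the result is always a base-class array; if True, subclasses are passed through |
ndmin | Optional. The minimum number of dimensions the resulting array should have |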
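To make the table more concrete, here is a minimal sketch that exercises dtype, ndmin and copy. The variable names (f, m, src, dup) are illustrative only and do not come from the original notes.

```python
import numpy as np

# dtype: force the element type instead of letting NumPy infer it
f = np.array([1, 2, 3], dtype=float)
print(f)        # [1. 2. 3.]
print(f.dtype)  # float64

# ndmin: prepend axes until the array has at least 2 dimensions
m = np.array([1, 2, 3], ndmin=2)
print(m.shape)  # (1, 3)

# copy=True (the default): the new array does not share data with the source
src = np.array([1, 2, 3])
dup = np.array(src)
dup[0] = 99
print(src)      # [1 2 3]
```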
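1.1.2 Evenly spaced arrays: arange()
The start value defaults to 0, and the interval is half-open: the left endpoint is included, the right endpoint is not.
Three arguments:
import numpy as np
d = np.arange(1, 20, 4)  # from 1 up to (but not including) 20 in steps of 4; the step is optional and defaults to 1
print(d)
[ 1  5  9 13 17]
Two arguments:
import numpy as np
d = np.arange(1, 20)
print(d)
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
One argument:
import numpy as np
d = np.arange(20)
print(d)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]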
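The parameters are described in the table below.
arange(start, stop, step, dtype=None)
Parameter | Description |
start | Optional. The start value; defaults to 0 when omitted |
stop | Required. The end value; it is not included in the generated array |
step | Optional. The step size; defaults to 1 when omitted. If step is given positionally, start must be given as well |
dtype | Optional. The data type of the array elements. Defaults to None, in which case the type is inferred from the other arguments |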
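The parameter table can be checked with a few extra calls. The sketch below is illustrative (the variable names are not from the original notes) and shows a negative step, an explicit dtype, and how the half-open interval determines the element count.

```python
import math
import numpy as np

# A negative step counts down; the stop value (0) is still excluded
down = np.arange(10, 0, -3)
print(down)       # [10  7  4  1]

# dtype can be set explicitly; here the integers are stored as floats
as_float = np.arange(3, dtype=float)
print(as_float)   # [0. 1. 2.]

# For integer arguments the element count is ceil((stop - start) / step)
count = len(np.arange(1, 20, 4))
print(count, math.ceil((20 - 1) / 4))   # 5 5
```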
1.1.3 Random arrays: rand(), randn(), randint()
1. rand(): generates random numbers uniformly distributed over [0, 1). The values differ on every run.
One-dimensional:
import numpy as np
e = np.random.rand(3)
print(e)
[0.8412559  0.63220568 0.7395547 ]
Two-dimensional:
import numpy as np
e = np.random.rand(2, 3)
print(e)
[[0.03654404 0.33348249 0.30089453]
 [0.35291365 0.56683093 0.41812811]]
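2. randn(): generates random numbers drawn from the standard normal distribution (mean 0, standard deviation 1). The values are not confined to [0, 1) and can be negative or greater than 1.
One-dimensional:
import numpy as np
e = np.random.randn(3)
print(e)
[ 0.36809175 -0.07224965 -0.33366574]
Two-dimensional:
import numpy as np
e = np.random.randn(3, 3)
print(e)
[[ 1.14014499 -0.95577809 -0.94003745]
 [-2.61768236 -0.6565676   0.74041531]
 [-0.3138474   0.68276791  0.17315121]]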
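The difference between the value ranges of rand() and randn() is easy to verify empirically. The following is a rough sketch; the fixed seed and the sample size of 100,000 are arbitrary choices made for this illustration.

```python
import numpy as np

np.random.seed(0)  # fix the seed so the run is repeatable

u = np.random.rand(100_000)    # uniform on [0, 1)
n = np.random.randn(100_000)   # standard normal

# rand() never leaves [0, 1)
print(u.min() >= 0, u.max() < 1)

# randn() produces negative values and values above 1
print(n.min() < 0, n.max() > 1)

# The sample mean and standard deviation should be close to 0 and 1
print(abs(n.mean()) < 0.05, abs(n.std() - 1) < 0.05)
```

With a sample this large, all six checks should print True.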
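3. randint(): generates random integers in a given range; the lower bound is included, the upper bound is not.
One-dimensional:
import numpy as np
e = np.random.randint(1, 5, 10)  # 10 random integers in [1, 5)
print(e)
[4 1 2 4 1 3 2 4 1 2]
Two-dimensional: the third argument specifies the shape.
import numpy as np
e = np.random.randint(1, 10, (4, 2))  # random integers in [1, 10), shape (4, 2)
print(e)
[[4 4]
 [2 5]
 [3 3]
 [7 8]]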
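1.2 Inspecting array attributes
1.2.1 Number of rows and columns: shape
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6]])
print(arr.shape)
(3, 2)
To get only the number of rows or only the number of columns:
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6]])
print(arr.shape[0])  # number of rows
print(arr.shape[1])  # number of columns
3
2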
1.2.2 Number of elements: size
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6]])
print(arr.size)
6
1.2.3 Data type of the elements: dtype
import numpy as np
arr = np.array([[1.3, 2, 3.6, 4], [5, 6, 7.8, 8]])
print(arr.dtype)
float64
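1.2.4 Converting the element data type: astype()
import numpy as np
arr = np.array([[1.3, 2, 3.6, 4], [5, 6, 7.8, 8]])
arr1 = arr.astype(int)  # the fractional part is discarded
print(arr1)
print(arr1.dtype)
[[1 2 3 4]
 [5 6 7 8]]
int32
The integer type printed depends on the platform and NumPy version; on many systems it is int64.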
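The output above shows 3.6 becoming 3 and 7.8 becoming 7, so astype(int) truncates rather than rounds. A small sketch of the difference follows; np.round() is brought in here only for comparison and is not part of the original notes.

```python
import numpy as np

arr = np.array([1.9, -1.9, 2.5])

truncated = arr.astype(int)          # drops the fractional part (toward zero)
rounded = np.round(arr).astype(int)  # round first (halves go to the even neighbour), then convert

print(truncated)  # [ 1 -1  2]
print(rounded)    # [ 2 -2  2]
print(arr.dtype)  # float64: astype() returns a new array, arr is unchanged
```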
1.2.5 Number of dimensions: ndim
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6]])
print(arr.ndim)
2
1.3 Selecting array elements
1.3.1 One-dimensional arrays
1. Selecting a single element
import numpy as np
arr = np.array([12, 2, 40, 64, 56, 6, 57, 18, 95, 17, 21, 12])
print(arr[0])
print(arr[5])
print(arr[-1])  # last element
print(arr[-4])  # fourth element from the end
12
6
12
95
2. Selecting consecutive elements
import numpy as np
arr = np.array([12, 2, 40, 64, 56, 6, 57, 18, 95, 17, 21, 12])
print(arr[1:6])
print(arr[3:-2])
print(arr[:3])
print(arr[:-3])
print(arr[3:])
print(arr[-3:])
[ 2 40 64 56  6]
[64 56  6 57 18 95 17]
[12  2 40]
[12  2 40 64 56  6 57 18 95]
[64 56  6 57 18 95 17 21 12]
[17 21 12]
3. Selecting non-consecutive elements
import numpy as np
arr = np.array([12, 2, 40, 64, 56, 6, 57, 18, 95, 17, 21, 12])
print(arr[1:5:2])   # indices in [1, 5) with a step of 2
print(arr[5:1:-2])  # the third value is the step; a negative step walks backwards from index 5 down to (but not including) index 1
print(arr[::3])
print(arr[3::])
print(arr[:3:])
[ 2 64]
[ 6 64]
[12 64 57 17]
[64 56  6 57 18 95 17 21 12]
[12  2 40]
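Two consequences of the indexing rules above are worth spelling out: a negative index counts from the end, and a step of -1 reverses the array. The sketch below reuses the same example array; the names reversed_arr and every_other are illustrative.

```python
import numpy as np

arr = np.array([12, 2, 40, 64, 56, 6, 57, 18, 95, 17, 21, 12])

# A negative index i refers to position len(arr) + i
print(arr[-4] == arr[len(arr) - 4])   # True

# Omitting start and stop with a step of -1 reverses the array
reversed_arr = arr[::-1]
print(reversed_arr)   # [12 21 17 95 18 57  6 56 64 40  2 12]

# Start at index 1 and take every second element
every_other = arr[1::2]
print(every_other)    # [ 2 64  6 18 17 12]
```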
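1.3.2 Two-dimensional arrays
Use two indices separated by a comma: the row index first, then the column index.
1. Selecting a single element
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(arr[1, 2])  # row index 1, column index 2; indices start at 0, so this is the second row, third column
6
2. Selecting a single row or a single column
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(arr[2])     # the row at index 2
print(arr[:, 1])  # the column at index 1
[7 8 9]
[ 2  5  8 11]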
3. Selecting several rows or several columns
Several rows:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(arr[1:3])
print(arr[:2])
print(arr[2:])
[[4 5 6]
 [7 8 9]]
[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
Several columns:
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
print(arr[:, 1:3])
print(arr[:, :2])
print(arr[:, 2:])
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
[[ 1  2]
 [ 5  6]
 [ 9 10]
 [13 14]]
[[ 3  4]
 [ 7  8]
 [11 12]
 [15 16]]
4. Selecting rows and columns at the same time
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
print(arr[0:2, 1:3])
[[2 3]
 [6 7]]
1.4 Reshaping and transposing arrays: reshape()
1.4.1 One-dimensional arrays
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
a = arr.reshape(2, 4)  # the number of elements must be the same before and after reshaping
b = arr.reshape(4, 2)
print(a)
print(b)
[[1 2 3 4]
 [5 6 7 8]]
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
1.4.2 Multi-dimensional arrays
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
c = arr.reshape(4, 3)
d = arr.reshape(2, 6)
print(c)
print(d)
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
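The rule that the element count must match before and after reshaping can be seen directly. The sketch below also uses -1 as a dimension, which asks NumPy to infer that size from the element count; this shortcut is an addition to the original notes.

```python
import numpy as np

arr = np.arange(1, 9)   # [1 2 3 4 5 6 7 8], 8 elements

# -1 lets NumPy work out the missing dimension
two_rows = arr.reshape(2, -1)
two_cols = arr.reshape(-1, 2)
print(two_rows.shape)   # (2, 4)
print(two_cols.shape)   # (4, 2)

# 8 elements cannot fill a 3 x 3 array, so reshape raises ValueError
failed = False
try:
    arr.reshape(3, 3)
except ValueError as exc:
    failed = True
    print("reshape failed:", exc)
```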
1.4.3 Reshaping a multi-dimensional array to one dimension: flatten(), ravel()
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(arr.flatten())  # reduce to one dimension
print(arr.ravel())
[ 1  2  3  4  5  6  7  8  9 10 11 12]
[ 1  2  3  4  5  6  7  8  9 10 11 12]
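Although flatten() and ravel() print the same result here, they differ in how they treat memory: flatten() always returns a copy, while ravel() returns a view when the memory layout allows it. The following sketch illustrates this for a simple contiguous array; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat = arr.flatten()   # always a copy
rav = arr.ravel()      # a view here, because arr is contiguous in memory

flat[0] = 100
after_flat = arr[0, 0]
print(after_flat)      # 1: the copy is independent of arr

rav[0] = 100
after_rav = arr[0, 0]
print(after_rav)       # 100: the view shares memory with arr

print(np.shares_memory(arr, flat), np.shares_memory(arr, rav))  # False True
```

Prefer flatten() when the result will be modified independently of the original, and ravel() when avoiding a copy matters.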
1.4.4 Transposing an array: the T attribute, transpose()
1. The T attribute
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(arr)
print(arr.T)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[[ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]
 [ 4  8 12]]
2. The transpose() function
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
arr1 = np.transpose(arr)
print(arr1)
[[ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]
 [ 4  8 12]]
1.5 Working with arrays
1.5.1 Adding elements: append(), insert()
1. The append() function
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr1 = np.append(arr, [[7, 8, 9]])
print(arr1)
[1 2 3 4 5 6 7 8 9]
As you can see, when append() adds elements to a two-dimensional array without an axis argument, the result is flattened into a one-dimensional array.
To keep the result two-dimensional, set the axis parameter to append by row or by column.
axis=0: append as new rows
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr1 = np.append(arr, [[7, 8, 9]], axis=0)
print(arr1)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
axis=1: append as new columns
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr1 = np.append(arr, [[7, 8], [9, 10]], axis=1)
print(arr1)
[[ 1  2  3  7  8]
 [ 4  5  6  9 10]]
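The parameters of append() are as follows.
append(arr, values, axis=None)
Parameter | Description |
arr | Required. The array to append to |
values | Required. The elements to append |
axis | Optional, defaults to None. When omitted, both inputs are flattened and the values are appended to the end of the resulting one-dimensional array. 0 appends rows; 1 appends columns |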
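Two details follow from the table: append() returns a new array rather than modifying arr, and when axis is given the shapes must be compatible. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# np.append returns a new array; arr itself keeps its shape
out = np.append(arr, [[7, 8, 9]], axis=0)
print(arr.shape, out.shape)   # (2, 3) (3, 3)

# With axis=0 the new rows must have the same number of columns as arr
errors = 0
try:
    np.append(arr, [[7, 8]], axis=0)
except ValueError as exc:
    errors += 1
    print("append failed:", exc)

# With axis=0, values must also be two-dimensional: a flat list is rejected
try:
    np.append(arr, [7, 8, 9], axis=0)
except ValueError as exc:
    errors += 1
    print("append failed:", exc)
```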
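2. The insert() function
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6]])
arr1 = np.insert(arr, 1, [7, 8])
print(arr1)
[1 7 8 2 3 4 5 6]
As you can see, the two-dimensional array is first flattened to one dimension, and the elements are then inserted at index 1.
To insert while keeping two dimensions, set the axis parameter as before.
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6]])
arr1 = np.insert(arr, 1, [7, 8], axis=0)     # insert a new row at row index 1
arr2 = np.insert(arr, 1, [7, 8, 9], axis=1)  # insert a new column at column index 1
print(arr1)
print(arr2)
[[1 2]
 [7 8]
 [3 4]
 [5 6]]
[[1 7 2]
 [3 8 4]
 [5 9 6]]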
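The parameters of insert() are as follows.
insert(arr, obj, values, axis=None)
Parameter | Description |
arr | Required. The array to insert into |
obj | Required. The index that marks the insertion position |
values | Required. The elements to insert |
axis | Optional. When omitted, the array is flattened and treated as one-dimensional. 0 inserts rows; 1 inserts columns |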
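1.5.2 Deleting elements: delete()
Elements can likewise be deleted by row or by column.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr1 = np.delete(arr, 2)          # no axis: flatten, then remove the element at index 2
arr2 = np.delete(arr, 2, axis=0)  # remove the row at index 2
arr3 = np.delete(arr, 2, axis=1)  # remove the column at index 2
print(arr1)
print(arr2)
print(arr3)
[1 2 4 5 6 7 8 9]
[[1 2 3]
 [4 5 6]]
[[1 2]
 [4 5]
 [7 8]]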
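delete() also accepts a list of indices, which removes several rows or columns in one call. The sketch below is an illustrative extension of the example above; like append() and insert(), delete() returns a new array.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A list of indices removes several rows or columns at once
rows_removed = np.delete(arr, [0, 2], axis=0)
cols_removed = np.delete(arr, [0, 1], axis=1)
print(rows_removed)   # [[4 5 6]]
print(cols_removed)
# [[3]
#  [6]
#  [9]]

# The original array is untouched
print(arr.shape)      # (3, 3)
```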
1.5.3 Handling missing values: isnan()
Flagging missing values with isnan():
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, np.nan, 8, 9])
print(arr)
print(np.isnan(arr))
[ 1.  2.  3.  4.  5.  6. nan  8.  9.]
[False False False False False False  True False False]
Filling missing values:
arr[np.isnan(arr)] = 0  # replace every NaN with 0
print(arr)
[1. 2. 3. 4. 5. 6. 0. 8. 9.]
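Filling with 0 is only one option. As an alternative sketch, the missing entry can be replaced by the mean of the remaining values; np.nanmean(), which ignores NaN when averaging, is used here as an addition to the original notes.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, np.nan, 8, 9])

mask = np.isnan(arr)      # True where the value is missing
missing = int(mask.sum())
print(missing)            # 1

fill = np.nanmean(arr)    # mean of the non-NaN values: 38 / 8
arr[mask] = fill
print(fill)               # 4.75
print(arr)
```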
1.5.4 Handling duplicate values: unique()
import numpy as np
arr = np.array([8, 4, 2, 3, 5, 2, 5, 5, 6, 8, 8, 9])
arr1 = np.unique(arr)  # sorted unique values only
arr1, arr2 = np.unique(arr, return_counts=True)  # two return values: arr1 holds the unique values, arr2 how often each one occurs
print(arr1)
print(arr2)
[2 3 4 5 6 8 9]
[2 1 1 3 1 3 1]
1.5.5 Joining arrays: concatenate(), hstack(), vstack()
1. The concatenate() function
As before, the axis parameter specifies whether to join along rows or along columns.
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
arr3 = np.concatenate((arr1, arr2), axis=0)  # join along the row direction (more rows)
arr4 = np.concatenate((arr1, arr2), axis=1)  # join along the column direction (more columns)
print(arr3)
print(arr4)
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]
2. hstack(): joins arrays by stacking them horizontally
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
arr3 = np.hstack((arr1, arr2))
print(arr3)
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]
3. vstack(): joins arrays by stacking them vertically
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
arr3 = np.vstack((arr1, arr2))
print(arr3)
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
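The outputs above suggest that, for two-dimensional inputs, hstack() matches concatenate() with axis=1 and vstack() matches concatenate() with axis=0. The sketch below checks this with np.array_equal(); it covers only the two-dimensional case shown in these notes.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

same_h = np.array_equal(np.hstack((arr1, arr2)),
                        np.concatenate((arr1, arr2), axis=1))
same_v = np.array_equal(np.vstack((arr1, arr2)),
                        np.concatenate((arr1, arr2), axis=0))
print(same_h, same_v)   # True True

print(np.hstack((arr1, arr2)).shape)   # (2, 6)
print(np.vstack((arr1, arr2)).shape)   # (4, 3)
```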
1.5.6 Splitting arrays: split(), hsplit(), vsplit()
1. The split() function
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
arr1 = np.split(arr, 2)  # the second argument gives the number of equal parts
arr2 = np.split(arr, 4)
print(arr1)
print(arr2)
[array([1, 2, 3, 4, 5, 6]), array([ 7,  8,  9, 10, 11, 12])]
[array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9]), array([10, 11, 12])]
The second argument can also be a list of indices that specifies where to split.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
arr3 = np.split(arr, [2, 6])         # split before index 2 and before index 6
arr4 = np.split(arr, [2, 3, 8, 10])
print(arr3)
print(arr4)
[array([1, 2]), array([3, 4, 5, 6]), array([ 7,  8,  9, 10, 11, 12])]
[array([1, 2]), array([3]), array([4, 5, 6, 7, 8]), array([ 9, 10]), array([11, 12])]
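When the second argument is an integer, split() requires the array to divide evenly. The sketch below shows the failure case and, as an aside that goes beyond the original notes, np.array_split(), which tolerates parts of unequal length.

```python
import numpy as np

arr = np.arange(1, 13)   # 12 elements

# 12 elements cannot be divided into 5 equal parts
failed = False
try:
    np.split(arr, 5)
except ValueError as exc:
    failed = True
    print("split failed:", exc)

# np.array_split allows parts of unequal length
parts = np.array_split(arr, 5)
lengths = [len(p) for p in parts]
print(lengths)   # [3, 3, 2, 2, 2]
```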
2. The hsplit() and vsplit() functions
hsplit(): splits the array horizontally (column-wise, along axis 1) into several arrays
vsplit(): splits the array vertically (row-wise, along axis 0) into several arrays
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
arr5 = np.hsplit(arr, 2)
arr6 = np.vsplit(arr, 2)
print(arr5)
print(arr6)
[array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14]]), array([[ 3,  4],
       [ 7,  8],
       [11, 12],
       [15, 16]])]
[array([[1, 2, 3, 4],
       [5, 6, 7, 8]]), array([[ 9, 10, 11, 12],
       [13, 14, 15, 16]])]
1.6 Array operations
1.6.1 Arithmetic
Operations between two arrays:
import numpy as np
arr1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
arr2 = np.array([[9, 10, 11, 12], [13, 14, 15, 16]])
arr3 = arr1 + arr2  # add the elements at matching positions
arr4 = arr1 * arr2  # multiply the elements at matching positions
print(arr3)
print(arr4)
[[10 12 14 16]
 [18 20 22 24]]
[[  9  20  33  48]
 [ 65  84 105 128]]
Operations between an array and a number:
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
arr5 = arr + 5   # add 5 to every element
arr6 = arr * 10  # multiply every element by 10
print(arr5)
print(arr6)
[[ 6  7  8  9]
 [10 11 12 13]]
[[10 20 30 40]
 [50 60 70 80]]
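Subtraction and division follow the same element-wise rule as the addition and multiplication shown above. A brief sketch, reusing the same arrays with illustrative result names:

```python
import numpy as np

arr1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
arr2 = np.array([[9, 10, 11, 12], [13, 14, 15, 16]])

diff = arr2 - arr1
print(diff)
# [[8 8 8 8]
#  [8 8 8 8]]

halves = arr1 / 2      # true division produces floats
floors = arr1 // 2     # floor division keeps integers
print(halves.dtype)    # float64
print(floors)
# [[0 1 1 2]
#  [2 3 3 4]]

# Element-wise operations need compatible shapes; (2, 4) and (2, 3) are not
failed = False
try:
    arr1 + np.ones((2, 3))
except ValueError as exc:
    failed = True
    print("addition failed:", exc)
```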
1.6.2 Statistics: sum(), mean(), max()
1. Sum: sum()
The sum can be taken over the whole array, per column, or per row.
axis=0: sum the elements of each column
axis=1: sum the elements of each row
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
arr1 = arr.sum()
arr2 = arr.sum(axis=0)  # sum of each column
arr3 = arr.sum(axis=1)  # sum of each row
print(arr1)
print(arr2)
print(arr3)
78
[15 18 21 24]
[10 26 42]
2. Mean: mean()
axis=0: mean of each column
axis=1: mean of each row
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
arr1 = arr.mean()
arr2 = arr.mean(axis=0)  # mean of each column
arr3 = arr.mean(axis=1)  # mean of each row
print(arr1)
print(arr2)
print(arr3)
6.5
[5. 6. 7. 8.]
[ 2.5  6.5 10.5]
3. Maximum: max()
axis=0: maximum of each column
axis=1: maximum of each row
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
arr1 = arr.max()
arr2 = arr.max(axis=0)  # maximum of each column
arr3 = arr.max(axis=1)  # maximum of each row
print(arr1)
print(arr2)
print(arr3)
12
[ 9 10 11 12]
[ 4  8 12]
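The minimum works the same way as the maximum. The sketch below uses min() on the same array and also prints the result shapes, which makes the meaning of axis easier to remember: axis=0 yields one value per column, axis=1 one value per row.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(arr.min())         # 1
print(arr.min(axis=0))   # [1 2 3 4]
print(arr.min(axis=1))   # [1 5 9]

# One entry per column for axis=0, one entry per row for axis=1
col_shape = arr.max(axis=0).shape
row_shape = arr.max(axis=1).shape
print(col_shape, row_shape)   # (4,) (3,)
```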