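Preface
I find that most pandas summaries online are not detailed enough, so here is a fairly thorough one of my own. It covers the parts of pandas that come up in data analysis and artificial intelligence work, and it also serves as a summary of what I have learned.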
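Introduction to pandas usage
I will not go into installation in detail: once pip is available, pip install pandas does the job. On Ubuntu you may get a permission error, in which case prefix the command with sudo. Everything below assumes Python 3 (Python 2 is similar); on Python 3, install with sudo pip3 install pandas.
pandas is a Python data analysis module created for data analysis tasks. It incorporates a large number of libraries and standard data models, and provides the tools needed to work efficiently with large data sets.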
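Data structures in pandas:
- Series: a one-dimensional array, similar to Python's built-in list. The difference is that a Series has a single data type for all of its elements, which uses memory more efficiently and speeds up computation. It is like a column in a database.
- DataFrame: a two-dimensional, table-like data structure. Many of its features resemble R's data.frame. A DataFrame can be thought of as a container of Series.
- Panel: a three-dimensional array that can be thought of as a container of DataFrames. Note that Panel was deprecated and has been removed from recent pandas releases, so it only applies to older versions.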
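To make the relationship between Series and DataFrame a little more concrete, here is a minimal sketch. The values are a few numbers borrowed from the food data set purely for illustration, and the variable names calories and foods are my own, not part of pandas.

```python
import pandas as pd

# A Series holds one column of values with a single dtype.
calories = pd.Series([717, 876, 353], name="Energ_Kcal")

# A DataFrame is a table whose columns are Series sharing one index.
foods = pd.DataFrame({
    "Shrt_Desc": ["BUTTER WITH SALT", "BUTTER OIL ANHYDROUS", "CHEESE BLUE"],
    "Energ_Kcal": calories,
})

# Selecting one column of a DataFrame gives back a Series.
column = foods["Energ_Kcal"]
print(type(column))
print(foods.shape)  # (3, 2)
```

Selecting a single column with square brackets is the usual way to get from a DataFrame back to a Series.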
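For a more detailed introduction to pandas, see: http://pandas.pydata.org/pandas-docs/stable/10min.html
If you are interested, you can also read my earlier summary of numpy usage. Much of pandas works like numpy, so comparing the two makes things easier to understand.
Below I use a data set called food_info.csv to walk through the basics of pandas. If you need the data file, add me as a friend and message me privately, or email your request to i_love_sjtu@qq.com. Thank you for your support and understanding.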
1.read_csv
pandas.read_csv("") reads the contents of a CSV data file into a DataFrame so that the data can be processed.
For example:
    import pandas

    food_info = pandas.read_csv("food_info.csv")
    print(type(food_info))
    print(food_info.dtypes)
Output:
    <class 'pandas.core.frame.DataFrame'>
    NDB_No               int64
    Shrt_Desc           object
    Water_(g)          float64
    Energ_Kcal           int64
    Protein_(g)        float64
    Lipid_Tot_(g)      float64
    Ash_(g)            float64
    Carbohydrt_(g)     float64
    Fiber_TD_(g)       float64
    Sugar_Tot_(g)      float64
    Calcium_(mg)       float64
    Iron_(mg)          float64
    Magnesium_(mg)     float64
    Phosphorus_(mg)    float64
    Potassium_(mg)     float64
    Sodium_(mg)        float64
    Zinc_(mg)          float64
    Copper_(mg)        float64
    Manganese_(mg)     float64
    Selenium_(mcg)     float64
    Vit_C_(mg)         float64
    Thiamin_(mg)       float64
    Riboflavin_(mg)    float64
    Niacin_(mg)        float64
    Vit_B6_(mg)        float64
    Vit_B12_(mcg)      float64
    Vit_A_IU           float64
    Vit_A_RAE          float64
    Vit_E_(mg)         float64
    Vit_D_mcg          float64
    Vit_D_IU           float64
    Vit_K_(mcg)        float64
    FA_Sat_(g)         float64
    FA_Mono_(g)        float64
    FA_Poly_(g)        float64
    Cholestrl_(mg)     float64
    dtype: object
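Let me explain the usage above. read_csv takes the path of the data file as its first argument, here food_info.csv.
The delimiter is controlled by the sep parameter (delimiter is accepted as an alias). It defaults to a comma, which matches this data set, so it does not need to be passed. The data type of each column is inferred automatically, and the dtype parameter can be used to override it.
print(type(food_info)) prints the type of the object returned by read_csv, which is a DataFrame.
print(food_info.dtypes) prints the data type of each column.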
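Since food_info.csv may not be at hand, here is a small sketch of the same call on an in-memory CSV. The three-column text in csv_text is a made-up stand-in that only imitates the real file's layout, and passing sep and dtype explicitly is just to show where those parameters go.

```python
import io

import pandas as pd

# A tiny stand-in for food_info.csv (hypothetical excerpt, not the real file).
csv_text = (
    "NDB_No,Shrt_Desc,Water_(g)\n"
    "1001,BUTTER WITH SALT,15.87\n"
    "1002,BUTTER WHIPPED WITH SALT,\n"
)

# read_csv accepts a path or any file-like object.
# sep="," is the default; dtype overrides the inferred type for chosen columns.
sample = pd.read_csv(io.StringIO(csv_text), sep=",", dtype={"NDB_No": "int64"})

print(type(sample))
print(sample.dtypes)
```

The empty field at the end of the last row is read as a missing value (NaN), which is why that column comes out as a floating-point type.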
2.shape
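xxx.shape gives the dimensions of the table as a tuple of (number of rows, number of columns).
For example:
    import pandas

    food_info = pandas.read_csv("food_info.csv")
    print(food_info.shape)
Output:
    (8618, 36)
This shows that the table has 8618 rows and 36 columns.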
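Because shape is an ordinary tuple, it can be unpacked directly. A minimal sketch on a made-up three-row table (the name table and its columns are only for illustration):

```python
import pandas as pd

table = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is (rows, columns); unpack it like any tuple.
n_rows, n_cols = table.shape
print(n_rows, n_cols)  # 3 2

# len() of a DataFrame is its number of rows.
print(len(table))  # 3

# A single column is one-dimensional, so its shape has one entry.
column_shape = table["a"].shape
print(column_shape)  # (3,)
```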
3.info()
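xxx.info() prints basic information about the table (dimensions, column names, data types, memory usage and so on).
For example:
    import pandas

    food_info = pandas.read_csv("food_info.csv")
    # info() prints the summary itself and returns None,
    # so wrapping it in print() would add a stray "None" line.
    food_info.info()
Output: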
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 8618 entries, 0 to 8617
    Data columns (total 36 columns):
    NDB_No             8618 non-null int64
    Shrt_Desc          8618 non-null object
    Water_(g)          8612 non-null float64
    Energ_Kcal         8618 non-null int64
    Protein_(g)        8618 non-null float64
    Lipid_Tot_(g)      8618 non-null float64
    Ash_(g)            8286 non-null float64
    Carbohydrt_(g)     8618 non-null float64
    Fiber_TD_(g)       7962 non-null float64
    Sugar_Tot_(g)      6679 non-null float64
    Calcium_(mg)       8264 non-null float64
    Iron_(mg)          8471 non-null float64
    Magnesium_(mg)     7936 non-null float64
    Phosphorus_(mg)    8046 non-null float64
    Potassium_(mg)     8208 non-null float64
    Sodium_(mg)        8535 non-null float64
    Zinc_(mg)          7917 non-null float64
    Copper_(mg)        7363 non-null float64
    Manganese_(mg)     6478 non-null float64
    Selenium_(mcg)     6868 non-null float64
    Vit_C_(mg)         7826 non-null float64
    Thiamin_(mg)       7939 non-null float64
    Riboflavin_(mg)    7961 non-null float64
    Niacin_(mg)        7937 non-null float64
    Vit_B6_(mg)        7677 non-null float64
    Vit_B12_(mcg)      7427 non-null float64
    Vit_A_IU           7932 non-null float64
    Vit_A_RAE          7089 non-null float64
    Vit_E_(mg)         5613 non-null float64
    Vit_D_mcg          5319 non-null float64
    Vit_D_IU           5320 non-null float64
    Vit_K_(mcg)        4969 non-null float64
    FA_Sat_(g)         8274 non-null float64
    FA_Mono_(g)        7947 non-null float64
    FA_Poly_(g)        7954 non-null float64
    Cholestrl_(mg)     8250 non-null float64
    dtypes: float64(33), int64(2), object(1)
    memory usage: 2.4+ MB
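info() writes its report to standard output and returns None. If the text is needed as a string (for logging, say), one possible approach is to pass a buffer through the buf parameter. The sketch below uses a made-up one-column table; the names table, buffer and summary are my own.

```python
import io

import pandas as pd

table = pd.DataFrame({"Water_(g)": [15.87, None, 0.24]})

# Redirect the report into an in-memory text buffer instead of stdout.
buffer = io.StringIO()
result = table.info(buf=buffer)
summary = buffer.getvalue()

print(result)   # None: info() has no return value
print(summary)  # the same report that info() would normally print
```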
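4.dtypes and astype
xxx.dtypes shows the data type of each column. To check a single column, select it first and use dtype, for example xxx['NDB_No'].dtype.
For example:
    import pandas

    food_info = pandas.read_csv("food_info.csv")
    print(food_info.dtypes)
Output: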
    NDB_No               int64
    Shrt_Desc           object
    Water_(g)          float64
    Energ_Kcal           int64
    Protein_(g)        float64
    Lipid_Tot_(g)      float64
    Ash_(g)            float64
    Carbohydrt_(g)     float64
    Fiber_TD_(g)       float64
    Sugar_Tot_(g)      float64
    Calcium_(mg)       float64
    Iron_(mg)          float64
    Magnesium_(mg)     float64
    Phosphorus_(mg)    float64
    Potassium_(mg)     float64
    Sodium_(mg)        float64
    Zinc_(mg)          float64
    Copper_(mg)        float64
    Manganese_(mg)     float64
    Selenium_(mcg)     float64
    Vit_C_(mg)         float64
    Thiamin_(mg)       float64
    Riboflavin_(mg)    float64
    Niacin_(mg)        float64
    Vit_B6_(mg)        float64
    Vit_B12_(mcg)      float64
    Vit_A_IU           float64
    Vit_A_RAE          float64
    Vit_E_(mg)         float64
    Vit_D_mcg          float64
    Vit_D_IU           float64
    Vit_K_(mcg)        float64
    FA_Sat_(g)         float64
    FA_Mono_(g)        float64
    FA_Poly_(g)        float64
    Cholestrl_(mg)     float64
    dtype: object
To convert the data type of a particular column, use astype.
For example:
    import pandas

    food_info = pandas.read_csv("food_info.csv")
    print(food_info['NDB_No'].astype('float64'))
Output:
    0        1001.0
    1        1002.0
    2        1003.0
    3        1004.0
    4        1005.0
    5        1006.0
    6        1007.0
    7        1008.0
    8        1009.0
    9        1010.0
    10       1011.0
    11       1012.0
    12       1013.0
    13       1014.0
    14       1015.0
    15       1016.0
    16       1017.0
    17       1018.0
    18       1019.0
    19       1020.0
    20       1021.0
    21       1022.0
    22       1023.0
    23       1024.0
    24       1025.0
    25       1026.0
    26       1027.0
    27       1028.0
    28       1029.0
    29       1030.0
             ...
    8588    43544.0
    8589    43546.0
    8590    43550.0
    8591    43566.0
    8592    43570.0
    8593    43572.0
    8594    43585.0
    8595    43589.0
    8596    43595.0
    8597    43597.0
    8598    43598.0
    8599    44005.0
    8600    44018.0
    8601    44048.0
    8602    44055.0
    8603    44061.0
    8604    44074.0
    8605    44110.0
    8606    44158.0
    8607    44203.0
    8608    44258.0
    8609    44259.0
    8610    44260.0
    8611    48052.0
    8612    80200.0
    8613    83110.0
    8614    90240.0
    8615    90480.0
    8616    90560.0
    8617    93600.0
    Name: NDB_No, Length: 8618, dtype: float64
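NDB_No was originally int64, and the Series returned by astype is float64. Note that astype returns a new Series and leaves food_info itself unchanged.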
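One point that is easy to miss is that astype does not modify the table in place. A small sketch of making the conversion stick by assigning the result back, using a made-up three-row table (table, dtype_before and dtype_after are illustrative names):

```python
import pandas as pd

table = pd.DataFrame({"NDB_No": [1001, 1002, 1003]})
dtype_before = table["NDB_No"].dtype

# astype returns a converted copy; the original column keeps its type...
converted = table["NDB_No"].astype("float64")
dtype_unchanged = table["NDB_No"].dtype

# ...until the result is assigned back to the column.
table["NDB_No"] = converted
dtype_after = table["NDB_No"].dtype

print(dtype_before, dtype_unchanged, dtype_after)
```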
5.isnull()
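xxx.isnull() checks whether each value in the table, or in a single column, is missing. It returns an object of the same shape in which True marks a missing value.
For example:
    import pandas

    food_info = pandas.read_csv("food_info.csv")
    print(food_info.isnull())
Output: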
The printout is abridged here: pandas wraps the 36 columns into four blocks and, within each block, prints only rows 0 to 29 and 8588 to 8617. The first block begins and ends like this:

          NDB_No  Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  Lipid_Tot_(g)  \
    0      False      False      False       False        False          False
    1      False      False      False       False        False          False
    2      False      False      False       False        False          False
    ...      ...        ...        ...         ...          ...            ...
    8615   False      False      False       False        False          False
    8616   False      False      False       False        False          False
    8617   False      False      False       False        False          False

    [8618 rows x 36 columns]

Among the values actually printed, everything is False except the following, which are True (missing): Sugar_Tot_(g), Vit_E_(mg), Vit_D_mcg, Vit_D_IU and Vit_K_(mcg) in rows 7, 9, 20 and 8604, plus Vit_A_RAE in row 8604.
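A full table of True and False is hard to read, so in practice isnull() is usually combined with sum() or used as a filter. The sketch below runs on a made-up three-row table rather than the real file; missing_per_column and rows_with_missing are names I chose for illustration.

```python
import pandas as pd

table = pd.DataFrame({
    "Water_(g)": [15.87, None, 0.24],
    "Energ_Kcal": [717, 717, 876],
})

# True counts as 1 when summed, so this gives the number of missing
# values in each column.
missing_per_column = table.isnull().sum()
print(missing_per_column)

# A boolean Series can be used to keep only the rows with a missing value.
rows_with_missing = table[table["Water_(g)"].isnull()]
print(rows_with_missing)
```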
6.columns
xxx.columns shows the names of the columns in the table.
For example:
    import pandas

    food_info = pandas.read_csv("food_info.csv")
    print(food_info.columns)
Output:
    Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)',
           'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)',
           'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)',
           'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)',
           'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)',
           'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)',
           'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg',
           'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)',
           'Cholestrl_(mg)'],
          dtype='object')
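The column names can be turned into a plain Python list and then used to pick out a subset of columns. As an illustration only, the sketch below keeps the columns measured in grams from a made-up table with three of the column names above; the "_(g)" suffix test is my own choice of filter.

```python
import pandas as pd

table = pd.DataFrame({
    "Water_(g)": [15.87, 0.24],
    "Iron_(mg)": [0.02, 0.00],
    "Ash_(g)": [2.11, 0.00],
})

# Convert the Index of column names to a regular list.
names = table.columns.tolist()

# Keep only the names that end in "_(g)", then select those columns.
gram_columns = [name for name in names if name.endswith("_(g)")]
gram_table = table[gram_columns]

print(gram_columns)  # ['Water_(g)', 'Ash_(g)']
print(gram_table.shape)  # (2, 2)
```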
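7.head() and tail()
xxx.head() shows the first 5 rows by default,
and xxx.tail() shows the last 5 rows by default.
Both accept a parameter n: head(n) shows the first n rows and tail(n) shows the last n rows.
For example:
    import pandas

    food_info = pandas.read_csv("food_info.csv")
    print(food_info.head())
    print(food_info.tail())
Output: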
       NDB_No                 Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  \
    0    1001          BUTTER WITH SALT      15.87         717         0.85
    1    1002  BUTTER WHIPPED WITH SALT      15.87         717         0.85
    2    1003      BUTTER OIL ANHYDROUS       0.24         876         0.28
    3    1004               CHEESE BLUE      42.41         353        21.40
    4    1005              CHEESE BRICK      41.11         371        23.24

       Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  \
    0          81.11     2.11            0.06           0.0           0.06
    1          81.11     2.11            0.06           0.0           0.06
    2          99.48     0.00            0.00           0.0           0.00
    3          28.74     5.11            2.34           0.0           0.50
    4          29.68     3.18            2.79           0.0           0.51

       ...  Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  \
    0  ...    2499.0      684.0        2.32        1.5      60.0
    1  ...    2499.0      684.0        2.32        1.5      60.0
    2  ...    3069.0      840.0        2.80        1.8      73.0
    3  ...     721.0      198.0        0.25        0.5      21.0
    4  ...    1080.0      292.0        0.26        0.5      22.0

       Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)
    0          7.0      51.368       21.021        3.043           215.0
    1          7.0      50.489       23.426        3.012           219.0
    2          8.6      61.924       28.732        3.694           256.0
    3          2.4      18.669        7.778        0.800            75.0
    4          2.5      18.764        8.598        0.784            94.0

    [5 rows x 36 columns]

          NDB_No                   Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  \
    8613   83110             MACKEREL SALTED      43.00         305        18.50
    8614   90240  SCALLOP (BAY&SEA) CKD STMD      70.25         111        20.54
    8615   90480                  SYRUP CANE      26.00         269         0.00
    8616   90560                   SNAIL RAW      79.20          90        16.10
    8617   93600            TURTLE GREEN RAW      78.50          89        19.80

          Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  \
    8613          25.10    13.40            0.00           0.0            0.0
    8614           0.84     2.97            5.41           0.0            0.0
    8615           0.00     0.86           73.14           0.0           73.2
    8616           1.40     1.30            2.00           0.0            0.0
    8617           0.50     1.20            0.00           0.0            0.0

          ...  Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  \
    8613  ...     157.0       47.0        2.38       25.2    1006.0
    8614  ...       5.0        2.0        0.00        0.0       2.0
    8615  ...       0.0        0.0        0.00        0.0       0.0
    8616  ...     100.0       30.0        5.00        0.0       0.0
    8617  ...     100.0       30.0        0.50        0.0       0.0

          Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)
    8613          7.8       7.148        8.320        6.210            95.0
    8614          0.0       0.218        0.082        0.222            41.0
    8615          0.0       0.000        0.000        0.000             0.0
    8616          0.1       0.361        0.259        0.252            50.0
    8617          0.1       0.127        0.088        0.170            50.0

    [5 rows x 36 columns]
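The printout above has 5 rows for each call, which is the default for head() and tail(). A short sketch of passing the row count explicitly, on a made-up 20-row table (the column name n and the variable names are just placeholders):

```python
import pandas as pd

table = pd.DataFrame({"n": range(20)})

# With no argument, head() returns the first 5 rows.
first_default = table.head()

# Pass a number to choose how many rows to take from either end.
first_three = table.head(3)
last_two = table.tail(2)

print(len(first_default))        # 5
print(first_three["n"].tolist())  # [0, 1, 2]
print(last_two["n"].tolist())     # [18, 19]
```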