南大《探索数据的奥秘》课件示例代码笔记04

简介: 南大《探索数据的奥秘》课件示例代码笔记04

Chp5-1

2019 年 12 月 20 日

In [3]: import pandas as pd
my_data = pd.read_csv("C:\Python\Scripts\my_data\Titanic.csv")
my_data
Out[3]: PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
5 6 0 3
6 7 0 1
7 8 0 3
8 9 1 3
9 10 1 2
10 11 1 3
11 12 1 1
12 13 0 3
13 14 0 3
14 15 0 3
15 16 1 2
16 17 0 3
17 18 1 2
18 19 0 3
19 20 1 3
20 21 0 2
21 22 1 2
22 23 1 3
23 24 1 1
24 25 0 3
25 26 1 3
26 27 0 3
27 28 0 1
28 29 1 3
29 30 0 3
.. … … …
861 862 0 2
862 863 1 1
863 864 0 3
864 865 0 2
865 866 1 2
866 867 1 2
867 868 0 1
868 869 0 3
869 870 1 3
870 871 0 3
871 872 1 1
872 873 0 1
873 874 0 3
874 875 1 2
875 876 1 3
876 877 0 3
877 878 0 3
878 879 0 3
879 880 1 1
880 881 1 2
881 882 0 3
882 883 0 3
883 884 0 2
884 885 0 3
885 886 0 3
886 887 0 2
887 888 1 1
888 889 0 3
889 890 1 1
890 891 0 3
Name Sex Age SibSp \
0 Braund, Mr. Owen Harris male 22.0 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1
2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0
5 Moran, Mr. James male NaN 0
6 McCarthy, Mr. Timothy J male 54.0 0
7 Palsson, Master. Gosta Leonard male 2.0 3
8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0
9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1
10 Sandstrom, Miss. Marguerite Rut female 4.0 1
11 Bonnell, Miss. Elizabeth female 58.0 0
12 Saundercock, Mr. William Henry male 20.0 0
13 Andersson, Mr. Anders Johan male 39.0 1
14 Vestrom, Miss. Hulda Amanda Adolfina female 14.0 0
15 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0
16 Rice, Master. Eugene male 2.0 4
17 Williams, Mr. Charles Eugene male NaN 0
18 Vander Planke, Mrs. Julius (Emelia Maria Vande… female 31.0 1
19 Masselmani, Mrs. Fatima female NaN 0
20 Fynney, Mr. Joseph J male 35.0 0
21 Beesley, Mr. Lawrence male 34.0 0
22 McGowan, Miss. Anna "Annie" female 15.0 0
23 Sloper, Mr. William Thompson male 28.0 0
24 Palsson, Miss. Torborg Danira female 8.0 3
25 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia… female 38.0 1
26 Emir, Mr. Farred Chehab male NaN 0
27 Fortune, Mr. Charles Alexander male 19.0 3
28 O'Dwyer, Miss. Ellen "Nellie" female NaN 0
29 Todoroff, Mr. Lalio male NaN 0
.. … … … …
861 Giles, Mr. Frederick Edward male 21.0 1
862 Swift, Mrs. Frederick Joel (Margaret Welles Ba… female 48.0 0
863 Sage, Miss. Dorothy Edith "Dolly" female NaN 8
864 Gill, Mr. John William male 24.0 0
865 Bystrom, Mrs. (Karolina) female 42.0 0
866 Duran y More, Miss. Asuncion female 27.0 1
867 Roebling, Mr. Washington Augustus II male 31.0 0
868 van Melkebeke, Mr. Philemon male NaN 0
869 Johnson, Master. Harold Theodor male 4.0 1
870 Balkic, Mr. Cerin male 26.0 0
871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47.0 1
872 Carlsson, Mr. Frans Olof male 33.0 0
873 Vander Cruyssen, Mr. Victor male 47.0 0
874 Abelson, Mrs. Samuel (Hannah Wizosky) female 28.0 1
875 Najib, Miss. Adele Kiamie "Jane" female 15.0 0
876 Gustafsson, Mr. Alfred Ossian male 20.0 0
877 Petroff, Mr. Nedelio male 19.0 0
878 Laleff, Mr. Kristo male NaN 0
879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0
880 Shelley, Mrs. William (Imanita Parrish Hall) female 25.0 0
881 Markun, Mr. Johann male 33.0 0
882 Dahlberg, Miss. Gerda Ulrika female 22.0 0
883 Banfield, Mr. Frederick James male 28.0 0
884 Sutehall, Mr. Henry Jr male 25.0 0
885 Rice, Mrs. William (Margaret Norton) female 39.0 0
886 Montvila, Rev. Juozas male 27.0 0
887 Graham, Miss. Margaret Edith female 19.0 0
888 Johnston, Miss. Catherine Helen "Carrie" female NaN 1
889 Behr, Mr. Karl Howell male 26.0 0
890 Dooley, Mr. Patrick male 32.0 0
Parch Ticket Fare Cabin Embarked
0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
5 0 330877 8.4583 NaN Q
6 0 17463 51.8625 E46 S
7 1 349909 21.0750 NaN S
8 2 347742 11.1333 NaN S
9 0 237736 30.0708 NaN C
10 1 PP 9549 16.7000 G6 S
11 0 113783 26.5500 C103 S
12 0 A/5. 2151 8.0500 NaN S
13 5 347082 31.2750 NaN S
14 0 350406 7.8542 NaN S
15 0 248706 16.0000 NaN S
16 1 382652 29.1250 NaN Q
17 0 244373 13.0000 NaN S
18 0 345763 18.0000 NaN S
19 0 2649 7.2250 NaN C
20 0 239865 26.0000 NaN S
21 0 248698 13.0000 D56 S
22 0 330923 8.0292 NaN Q
23 0 113788 35.5000 A6 S
24 1 349909 21.0750 NaN S
25 5 347077 31.3875 NaN S
26 0 2631 7.2250 NaN C
27 2 19950 263.0000 C23 C25 C27 S
28 0 330959 7.8792 NaN Q
29 0 349216 7.8958 NaN S
.. … … … … …
861 0 28134 11.5000 NaN S
862 0 17466 25.9292 D17 S
863 2 CA. 2343 69.5500 NaN S
864 0 233866 13.0000 NaN S
865 0 236852 13.0000 NaN S
866 0 SC/PARIS 2149 13.8583 NaN C
867 0 PC 17590 50.4958 A24 S
868 0 345777 9.5000 NaN S
869 1 347742 11.1333 NaN S
870 0 349248 7.8958 NaN S
871 1 11751 52.5542 D35 S
872 0 695 5.0000 B51 B53 B55 S
873 0 345765 9.0000 NaN S
874 0 P/PP 3381 24.0000 NaN C
875 0 2667 7.2250 NaN C
876 0 7534 9.8458 NaN S
877 0 349212 7.8958 NaN S
878 0 349217 7.8958 NaN S
879 1 11767 83.1583 C50 C
880 1 230433 26.0000 NaN S
881 0 349257 7.8958 NaN S
882 0 7552 10.5167 NaN S
883 0 C.A./SOTON 34068 10.5000 NaN S
884 0 SOTON/OQ 392076 7.0500 NaN S
885 5 382652 29.1250 NaN Q
886 0 211536 13.0000 NaN S
887 0 112053 30.0000 B42 S
888 2 W./C. 6607 23.4500 NaN S
889 0 111369 30.0000 C148 C
890 0 370376 7.7500 NaN Q
[891 rows x 12 columns]
In [ ]: import numpy as np
from scipy import stats
from matplotlib import pyplot as plt
In [3]: my_data.head(5)
Out[3]: PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
Name Sex Age SibSp \
0 Braund, Mr. Owen Harris male 22.0 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1
2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0
Parch Ticket Fare Cabin Embarked
0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
In [2]: my_dummy=pd.get_dummies((my_data[['Embarked']]),prefix='Embarked')
my_dummy.head(5)
Out[2]: Embarked_C Embarked_Q Embarked_S
0 0 0 1
1 1 0 0
2 0 0 1
3 0 0 1
4 0 0 1
In [4]: print(my_data.info()) # 这里可以看到, dataframe 的 info 方法能返回对数据的一些总结
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
None
目录
相关文章
南大《探索数据的奥秘》课件示例代码笔记12
南大《探索数据的奥秘》课件示例代码笔记12
62 0
南大《探索数据的奥秘》课件示例代码笔记11
南大《探索数据的奥秘》课件示例代码笔记11
71 0
南大《探索数据的奥秘》课件示例代码笔记01
南大《探索数据的奥秘》课件示例代码笔记01
40 0
南大《探索数据的奥秘》课件示例代码笔记05
南大《探索数据的奥秘》课件示例代码笔记05
64 0
南大《探索数据的奥秘》课件示例代码笔记10
南大《探索数据的奥秘》课件示例代码笔记10
64 0
南大《探索数据的奥秘》课件示例代码笔记02
南大《探索数据的奥秘》课件示例代码笔记02
60 0
南大《探索数据的奥秘》课件示例代码笔记08
南大《探索数据的奥秘》课件示例代码笔记08
48 0
南大《探索数据的奥秘》课件示例代码笔记14
南大《探索数据的奥秘》课件示例代码笔记14
62 0
南大《探索数据的奥秘》课件示例代码笔记17
南大《探索数据的奥秘》课件示例代码笔记17
40 0
南大《探索数据的奥秘》课件示例代码笔记18
南大《探索数据的奥秘》课件示例代码笔记18
53 0