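pandas data operations
String methods
A Series object provides a set of string-processing methods through its str attribute. Each method is applied to every element of the Series, and missing values (NaN) are passed through unchanged instead of raising an error.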
import numpy as np
import pandas as pd
t = pd.Series(['a_b_c_d','c_d_e',np.nan,'f_g_h'])
t
0 a_b_c_d
1 c_d_e
2 NaN
3 f_g_h
dtype: object
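To make the element-wise behaviour concrete, here is a minimal sketch that compares a .str method with a hand-written loop. str.upper is used only as an illustrative method, and the loop is an approximation of what the accessor does, not pandas' actual implementation.

```python
import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])

# Vectorized: the .str accessor applies str.upper to every element
# and leaves the missing value as NaN instead of raising an error.
upper_vec = t.str.upper()

# Rough manual equivalent: call the Python string method element by
# element and pass non-string (missing) values through unchanged.
upper_loop = pd.Series([s.upper() if isinstance(s, str) else s for s in t])
```

Without the isinstance check, the loop would fail on the NaN element (a float has no upper method), which is exactly the case the accessor handles for you.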
t.str.cat(['A','B','C','D'],sep=',')
0 a_b_c_d,A
1 c_d_e,B
2 NaN
3 f_g_h,D
dtype: object
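str.cat also accepts an na_rep argument, which substitutes a placeholder for missing values instead of producing NaN, and when it is called without others it joins the whole Series into one string. A small sketch; the '-' placeholder and '|' separator are arbitrary choices:

```python
import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])

# Element-wise concatenation; na_rep fills in for the missing value,
# so row 2 becomes '-,C' instead of NaN.
paired = t.str.cat(['A', 'B', 'C', 'D'], sep=',', na_rep='-')

# With no `others`, cat collapses the Series into a single string.
# Missing values are skipped because na_rep is not given here.
joined = t.str.cat(sep='|')
```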
t.str.split('_')
0 [a, b, c, d]
1 [c, d, e]
2 NaN
3 [f, g, h]
dtype: object
t.str.get(0)
0 a
1 c
2 NaN
3 f
dtype: object
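The lists produced by split can be indexed with .str[i], which behaves like .str.get(i); alternatively, expand=True returns the pieces as DataFrame columns. A short sketch using the same t as above:

```python
import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])

# .str[0] on the lists returned by split is shorthand for .str.get(0).
first = t.str.split('_').str[0]

# expand=True spreads the pieces into columns; rows with fewer pieces
# are padded with missing values, and the NaN row stays all-missing.
parts = t.str.split('_', expand=True)
```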
t.str.replace("_", ".")
0 a.b.c.d
1 c.d.e
2 NaN
3 f.g.h
dtype: object
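Whether the pattern in str.replace is treated as a regular expression depends on the regex argument, whose default changed between pandas versions (it defaults to False in pandas 2.x). Passing it explicitly avoids surprises; the sample strings below are made up for illustration:

```python
import pandas as pd

s = pd.Series(['a.b.c', 'x.y'])

# regex=False: '.' is matched literally, so only the dots are replaced.
literal = s.str.replace('.', '_', regex=False)

# regex=True: '.' is the regex wildcard, so every character is replaced.
wildcard = s.str.replace('.', '_', regex=True)

# A character class replaces runs of '_' or '.' with a single '/'.
t = pd.Series(['a_b.c', 'd__e'])
mixed = t.str.replace(r'[_.]+', '/', regex=True)
```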
t.str.pad(10, fillchar="?")
0 ???a_b_c_d
1 ?????c_d_e
2 NaN
3 ?????f_g_h
dtype: object
t.str.pad(10, side="right", fillchar="?")
0 a_b_c_d???
1 c_d_e?????
2 NaN
3 f_g_h?????
dtype: object
t.str.center(10, fillchar="?")
0 ?a_b_c_d??
1 ??c_d_e???
2 NaN
3 ??f_g_h???
dtype: object
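pad is a general form of the justification methods: side='left' corresponds to rjust, side='right' to ljust, and side='both' to center. A sketch checking that correspondence on the same data; strings that are already at least width characters long are left unchanged:

```python
import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])

# Fill on the left (right-justify), the same as rjust.
left = t.str.pad(10, side='left', fillchar='?')
rjust = t.str.rjust(10, fillchar='?')

# Fill on the right (left-justify), the same as ljust.
right = t.str.pad(10, side='right', fillchar='?')
ljust = t.str.ljust(10, fillchar='?')

# Fill on both sides, the same as center.
both = t.str.pad(10, side='both', fillchar='?')
centered = t.str.center(10, fillchar='?')

# A width shorter than the string leaves it as it is.
unchanged = t.str.pad(3, fillchar='?')
```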
t.str.find('d')
0 6.0
1 2.0
2 NaN
3 -1.0
dtype: float64
t.str.rfind('d')
0 6.0
1 2.0
2 NaN
3 -1.0
dtype: float64
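In the output above, find and rfind agree because each string contains at most one 'd', and the result is float64 only because the NaN element forces a float dtype. With repeated occurrences and no missing values, the difference shows up and the result is typically an integer Series. The strings below are invented for illustration:

```python
import pandas as pd

s = pd.Series(['d_a_d', 'abc', 'dd'])

first = s.str.find('d')             # lowest index of 'd', or -1 if absent
last = s.str.rfind('d')             # highest index of 'd', or -1 if absent
after_1 = s.str.find('d', start=1)  # search only from position 1 onward
```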
Transposing data (swapping rows and columns)
dates = pd.date_range('20130101',periods=10)
dates
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
'2013-01-09', '2013-01-10'],
dtype='datetime64[ns]', freq='D')
df = pd.DataFrame(np.random.randn(10,4),index=dates,columns=['A','B','C','D'])
df.head()
                   A         B         C         D
2013-01-01 -0.665173  0.516813  0.745156 -0.303295
2013-01-02 -0.953574  2.125147  0.238382 -0.400209
2013-01-03 -0.233966  2.066662  0.331000 -2.802471
2013-01-04  2.038273  0.982127 -1.096000 -1.051818
2013-01-05 -1.438657 -1.208042 -0.375673  0.384522
df.head().T
   2013-01-01 00:00:00  2013-01-02 00:00:00  2013-01-03 00:00:00  2013-01-04 00:00:00  2013-01-05 00:00:00
A            -0.665173            -0.953574            -0.233966             2.038273            -1.438657
B             0.516813             2.125147             2.066662             0.982127            -1.208042
C             0.745156             0.238382             0.331000            -1.096000            -0.375673
D            -0.303295            -0.400209            -2.802471            -1.051818             0.384522
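.T is shorthand for DataFrame.transpose(). The sketch below uses a seeded np.random.default_rng generator (an assumption made only so the result is reproducible) and checks that the index and columns swap and that transposing twice restores the original frame:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame(rng.standard_normal((10, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

head = df.head()   # 5 rows x 4 columns
flipped = head.T   # 4 rows x 5 columns: labels A-D become the index

# .T and .transpose() return the same result.
same = flipped.equals(head.transpose())

# Transposing twice restores the original (all columns are float64 here).
round_trip = flipped.T.equals(head)
```

For a frame whose columns have different dtypes, the transposed result generally has object dtype, so the exact round trip shown here holds only for homogeneous frames like this one.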
Applying a function to the data
df.head().apply(np.cumsum)
                   A         B         C         D
2013-01-01 -0.665173  0.516813  0.745156 -0.303295
2013-01-02 -1.618747  2.641960  0.983537 -0.703504
2013-01-03 -1.852713  4.708622  1.314537 -3.505975
2013-01-04  0.185560  5.690749  0.218537 -4.557793
2013-01-05 -1.253098  4.482707 -0.157135 -4.173271
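apply calls the function once per column by default (axis=0), or once per row with axis=1. The sketch below uses a lambda rather than np.cumsum and compares the result with the built-in DataFrame.cumsum; the seeded random data is an assumption for reproducibility:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
dates = pd.date_range('20130101', periods=5)
df = pd.DataFrame(rng.standard_normal((5, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# axis=0 (default): the function receives each column as a Series,
# and returning a Series of the same length keeps the frame's shape.
col_cumsum = df.apply(lambda col: col.cumsum())

# The built-in method gives the same running totals and is usually faster.
builtin = df.cumsum()

# axis=1: the function receives each row; returning a scalar per row
# produces a Series indexed by the dates.
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)
```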
Frequencies
Count how many times each distinct value occurs, similar to a histogram:
s = pd.Series(np.random.randint(0, 7, size=10))
s
0 3
1 3
2 1
3 6
4 3
5 3
6 5
7 2
8 1
9 0
dtype: int32
s.value_counts()
3 4
1 2
6 1
5 1
2 1
0 1
dtype: int64
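value_counts also supports normalize=True for relative frequencies and bins= for histogram-style grouping of numeric data, while sort_index() orders the result by value rather than by count. A sketch reusing the ten values printed above, typed in explicitly so the result does not depend on the random draw:

```python
import pandas as pd

# The ten values from the random Series shown above.
s = pd.Series([3, 3, 1, 6, 3, 3, 5, 2, 1, 0])

counts = s.value_counts()                    # absolute counts, most frequent first
freq = s.value_counts(normalize=True)        # relative frequencies summing to 1
by_value = counts.sort_index()               # ordered by value, like histogram bins
binned = s.value_counts(bins=3, sort=False)  # counts per equal-width interval
```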