pandas Data Operations

2017-10-08

String methods

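A Series object exposes a set of string-processing methods through its str attribute, which makes it easy to apply an operation to every element of the array. Missing values (NaN) pass through unchanged, as the outputs in this section show.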

import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d','c_d_e',np.nan,'f_g_h'])
t

0    a_b_c_d
1      c_d_e
2        NaN
3      f_g_h
dtype: object

t.str.cat(['A','B','C','D'],sep=',') # concatenate strings element-wise, joined by sep

0    a_b_c_d,A
1      c_d_e,B
2          NaN
3      f_g_h,D
dtype: object
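
The NaN in row 2 above comes from the missing value in t. As a minimal sketch, assuming you would rather keep that row, the na_rep parameter of str.cat substitutes a placeholder for missing values before joining. The placeholder '-' is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])

# Default behaviour: a missing value on either side gives NaN in the result.
default_result = t.str.cat(['A', 'B', 'C', 'D'], sep=',')

# With na_rep, missing values are replaced by the placeholder before joining.
# '-' is an arbitrary placeholder chosen for this sketch.
filled_result = t.str.cat(['A', 'B', 'C', 'D'], sep=',', na_rep='-')

print(filled_result.tolist())
# ['a_b_c_d,A', 'c_d_e,B', '-,C', 'f_g_h,D']
```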

t.str.split('_') # split each string on the separator

0    [a, b, c, d]
1       [c, d, e]
2             NaN
3       [f, g, h]
dtype: object
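
The split above returns one list per row. If separate columns are more convenient, a possible variation (a sketch, not something the original post covers) is to pass expand=True, which returns a DataFrame, and optionally n to limit the number of splits. The variable names parts and first_two are made up for this example.

```python
import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])

# expand=True spreads the pieces over columns; shorter rows are padded
# with missing values and a NaN input row stays entirely missing.
parts = t.str.split('_', expand=True)
print(parts)

# n=1 splits only at the first separator, giving at most two pieces.
first_two = t.str.split('_', n=1, expand=True)
print(first_two)
```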

t.str.get(0) # get the character at the given position in each string

0      a
1      c
2    NaN
3      f
dtype: object

t.str.replace("_", ".") # replace every occurrence of a substring

0    a.b.c.d
1      c.d.e
2        NaN
3      f.g.h
dtype: object

t.str.pad(10, fillchar="?") # pad on the left (the default side) to width 10

0    ???a_b_c_d
1    ?????c_d_e
2           NaN
3    ?????f_g_h
dtype: object

t.str.pad(10, side="right", fillchar="?") # pad on the right to width 10

0    a_b_c_d???
1    c_d_e?????
2           NaN
3    f_g_h?????
dtype: object

t.str.center(10, fillchar="?") # pad on both sides to center the string

0    ?a_b_c_d??
1    ??c_d_e???
2           NaN
3    ??f_g_h???
dtype: object

t.str.find('d') # lowest index of the substring, searching from the left; -1 if absent

0    6.0
1    2.0
2    NaN
3   -1.0
dtype: float64

t.str.rfind('d') # highest index of the substring, searching from the right; -1 if absent

0    6.0
1    2.0
2    NaN
3   -1.0
dtype: float64
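
In the example above find and rfind return the same positions because 'd' occurs at most once in each string. The following sketch uses a hypothetical Series u, not taken from the original post, in which the searched character repeats, so the two methods differ. Since u has no missing values, the results here are integers rather than floats.

```python
import pandas as pd

# Hypothetical data chosen so that 'a' appears more than once per string.
u = pd.Series(['a_b_a', 'banana', 'xyz'])

left = u.str.find('a')    # index of the first 'a' in each string
right = u.str.rfind('a')  # index of the last 'a' in each string

print(left.tolist())   # [0, 1, -1]
print(right.tolist())  # [4, 5, -1]
```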

Transposing data (swapping rows and columns)

dates = pd.date_range('20130101',periods=10)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
               '2013-01-09', '2013-01-10'],
              dtype='datetime64[ns]', freq='D')

df = pd.DataFrame(np.random.randn(10,4),index=dates,columns=['A','B','C','D'])
df.head()

	A	B	C	D
2013-01-01	-0.665173	0.516813	0.745156	-0.303295
2013-01-02	-0.953574	2.125147	0.238382	-0.400209
2013-01-03	-0.233966	2.066662	0.331000	-2.802471
2013-01-04	2.038273	0.982127	-1.096000	-1.051818
2013-01-05	-1.438657	-1.208042	-0.375673	0.384522

df.head().T # transpose: rows become columns and columns become rows

	2013-01-01 00:00:00	2013-01-02 00:00:00	2013-01-03 00:00:00	2013-01-04 00:00:00	2013-01-05 00:00:00
A	-0.665173	-0.953574	-0.233966	2.038273	-1.438657
B	0.516813	2.125147	2.066662	0.982127	-1.208042
C	0.745156	0.238382	0.331000	-1.096000	-0.375673
D	-0.303295	-0.400209	-2.802471	-1.051818	0.384522
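
To make the effect of .T easier to check, here is a small sketch with deterministic values instead of random ones. The frame small and its contents are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=3)
# Deterministic 3x4 frame (values 0..11) so the result is easy to verify.
small = pd.DataFrame(np.arange(12).reshape(3, 4),
                     index=dates, columns=['A', 'B', 'C', 'D'])

flipped = small.T

print(small.shape)    # (3, 4)
print(flipped.shape)  # (4, 3)

# The old column labels become the new index, and the dates become columns.
print(list(flipped.index))  # ['A', 'B', 'C', 'D']

# Transposing twice gives back the original frame.
print(flipped.T.equals(small))  # True
```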

Applying a function to the data

df.head().apply(np.cumsum) # cumulative sum down each column

	A	B	C	D
2013-01-01	-0.665173	0.516813	0.745156	-0.303295
2013-01-02	-1.618747	2.641960	0.983537	-0.703504
2013-01-03	-1.852713	4.708622	1.314537	-3.505975
2013-01-04	0.185560	5.690749	0.218537	-4.557793
2013-01-05	-1.253098	4.482707	-0.157135	-4.173271
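
By default apply passes each column to the function. The sketch below, on a small made-up frame, shows a per-column lambda, the axis=1 variant that passes each row instead, and a check that a per-column cumulative sum matches the built-in DataFrame.cumsum. The names col_range, row_sum and running are hypothetical.

```python
import pandas as pd

# Small made-up frame so the results can be verified by hand.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# axis=0 (the default): the function receives one column at a time.
col_range = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives one row at a time.
row_sum = df.apply(lambda row: row.sum(), axis=1)

# A per-column cumulative sum gives the same result as df.cumsum().
running = df.apply(lambda col: col.cumsum())

print(col_range.tolist())           # [2.0, 20.0]
print(row_sum.tolist())             # [11.0, 22.0, 33.0]
print(running.equals(df.cumsum()))  # True
```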

Frequency counts

Count how many times each value occurs, similar to a histogram.

s = pd.Series(np.random.randint(0, 7, size=10))
s

0    3
1    3
2    1
3    6
4    3
5    3
6    5
7    2
8    1
9    0
dtype: int32

s.value_counts()

3    4
1    2
6    1
5    1
2    1
0    1
dtype: int64
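
Two variations on value_counts are often useful: relative frequencies via normalize=True, and ordering by value rather than by count via sort_index(). The sketch below hard-codes the ten values printed above so that it is reproducible without a random seed.

```python
import pandas as pd

# The same ten values as in the output above, written out explicitly.
s = pd.Series([3, 3, 1, 6, 3, 3, 5, 2, 1, 0])

counts = s.value_counts()                # sorted by count, descending
shares = s.value_counts(normalize=True)  # fractions that sum to 1
by_value = s.value_counts().sort_index() # sorted by the value itself

print(counts[3])           # 4
print(by_value.tolist())   # [1, 2, 1, 4, 1, 1]
```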
