Pandas Time Series

2017-10-09



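pandas provides a standard set of tools and algorithms for working with time series.

Data types and operations

datetime in the Python standard library

The classes in the datetime module (datetime, date, time, and timedelta) store dates and times and support conversion and arithmetic. Calendar helpers live in the separate calendar module.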

from datetime import datetime
now = datetime.now()
now

datetime.datetime(2017, 10, 9, 12, 41, 23, 916666)

delta = datetime(2010,2,2)-datetime(2010,2,1)
delta

datetime.timedelta(1)

now + delta

datetime.datetime(2017, 10, 10, 12, 41, 23, 916666)

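Subtracting one datetime object from another yields a timedelta object, which represents a span of time.

A datetime object and its string representation can be converted into each other. str() works, but the explicit methods are preferred: strftime() formats a datetime as a string, and datetime.strptime() parses a string back into a datetime. Together they cover both directions.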

str(now)

'2017-10-09 12:41:23.916666'

now.strftime('%Y-%m-%d')

'2017-10-09'

datetime.strptime('2010-01-01','%Y-%m-%d')

datetime.datetime(2010, 1, 1, 0, 0)
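As a small sketch of the two directions described above, the following round-trips a datetime through a string and reads a timedelta. The variable names and the format string are illustrative choices, not part of the original examples.

```python
from datetime import datetime

# Illustrative format string; any strftime/strptime pattern works the same way.
fmt = "%Y-%m-%d %H:%M:%S"

moment = datetime(2017, 10, 9, 12, 41, 23)
text = moment.strftime(fmt)            # datetime -> str
parsed = datetime.strptime(text, fmt)  # str -> datetime
round_trip_ok = parsed == moment

# Subtraction yields a timedelta; .days and .total_seconds() expose its length.
elapsed = datetime(2010, 2, 2) - datetime(2010, 1, 1)
elapsed_days = elapsed.days
elapsed_seconds = elapsed.total_seconds()
```

The round trip is only lossless when the format keeps every field you care about; the format above drops microseconds, which is why the sample value has none.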

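pandas Timestamp

The basic date-time object in pandas is Timestamp, a subclass of the standard library's datetime.datetime. It is therefore highly compatible with datetime objects, and pd.to_datetime() converts between them (usually from datetime to Timestamp).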

import pandas as pd

pd.to_datetime(now)

Timestamp('2017-10-09 12:41:23.916666')

import numpy as np

pd.to_datetime(np.nan)

NaT
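The compatibility with datetime can be checked directly. This is a minimal sketch with illustrative variable names; it assumes a reasonably recent pandas release.

```python
from datetime import datetime

import pandas as pd

stamp = pd.to_datetime(datetime(2017, 10, 9, 12, 41, 23))

# Timestamp subclasses datetime.datetime, so isinstance checks pass.
is_datetime = isinstance(stamp, datetime)

# to_pydatetime() converts back to a plain standard-library datetime.
back = stamp.to_pydatetime()

# Missing values become NaT ("not a time"), which pd.isna() recognizes.
missing = pd.to_datetime(float("nan"))
is_missing = pd.isna(missing)
```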

pandas time series

The most basic time series type in pandas is a Series whose index elements are timestamps (Timestamp).

dates = [datetime(2011,1,1),datetime(2011,1,2),datetime(2011,1,3)]
ts = pd.Series(np.random.randn(3),index=dates)
ts

2011-01-01   -0.233171
2011-01-02   -1.053316
2011-01-03   -0.448214
dtype: float64

type(ts)

pandas.core.series.Series

ts.index

DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], dtype='datetime64[ns]', freq=None)

ts.index[0]

Timestamp('2011-01-01 00:00:00')

Arithmetic between time series aligns automatically on the timestamps.
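A quick sketch of that alignment, using two made-up series whose date ranges only partly overlap:

```python
import pandas as pd

# Two illustrative series offset by one day.
left = pd.Series([1.0, 2.0, 3.0], index=pd.date_range("2011-01-01", periods=3))
right = pd.Series([10.0, 20.0, 30.0], index=pd.date_range("2011-01-02", periods=3))

# Values are matched by timestamp, not by position.
# Dates present in only one series produce NaN.
total = left + right
```

The result covers the union of both indexes (2011-01-01 through 2011-01-04), with NaN at the two dates that appear on only one side.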

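Indexing, selection, and subsetting

A time series is just a Series with a special index, so ordinary indexing operations still work. What sets it apart is the optimized handling of time-based lookups, such as indexing with date strings in various formats: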

ts['20110101']

-0.23317140272262557

ts['2011-01-01']

-0.23317140272262557

ts['01/01/2011']

-0.23317140272262557

For longer series, you can also pass just a year or a year and month to select a slice:

ts

2011-01-01   -0.233171
2011-01-02   -1.053316
2011-01-03   -0.448214
dtype: float64

ts['2011']

2011-01-01   -0.233171
2011-01-02   -1.053316
2011-01-03   -0.448214
dtype: float64

ts['2011-1-2':'2012-12']

2011-01-02   -1.053316
2011-01-03   -0.448214
dtype: float64
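The three-row series above is too short to show the benefit, so here is a sketch on a hypothetical two-year daily series. It uses .loc, which should behave the same as the bracket form shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering 2011-01-01 through 2012-12-31.
long_ts = pd.Series(np.arange(731), index=pd.date_range("2011-01-01", periods=731))

year_2012 = long_ts.loc["2012"]     # every day in 2012
may_2011 = long_ts.loc["2011-05"]   # every day in May 2011

# String slices include both endpoints.
window = long_ts.loc["2011-12-30":"2012-01-02"]
```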

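Generating date ranges

pd.date_range() generates a DatetimeIndex of a specified length. Pass a start and an end date, or a single date together with a periods argument. Both endpoints are included.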

pd.date_range('20100101','20100110')

DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
               '2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
               '2010-01-09', '2010-01-10'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start='20100101',periods=10)

DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
               '2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
               '2010-01-09', '2010-01-10'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(end='20100110',periods=10)

DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
               '2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
               '2010-01-09', '2010-01-10'],
              dtype='datetime64[ns]', freq='D')
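The examples above all use the default daily frequency. As a sketch, the freq argument accepts other offset aliases as well; "B" (business day) and "MS" (month start) are used here for illustration.

```python
import pandas as pd

# Business days only: weekends are skipped (2010-01-01 was a Friday).
business_days = pd.date_range("2010-01-01", "2010-01-10", freq="B")

# The first day of three consecutive months.
month_starts = pd.date_range("2010-01-01", periods=3, freq="MS")
month_labels = list(month_starts.strftime("%Y-%m-%d"))
```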

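Shifting (leading and lagging) data

Shifting means moving data forward or backward along the time axis. Both Series and DataFrame have a .shift() method that performs a plain shift and leaves the index unchanged: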

ts

2011-01-01   -0.233171
2011-01-02   -1.053316
2011-01-03   -0.448214
dtype: float64

ts.shift(2)

2011-01-01         NaN
2011-01-02         NaN
2011-01-03   -0.233171
dtype: float64

ts.shift(-2)

2011-01-01   -0.448214
2011-01-02         NaN
2011-01-03         NaN
dtype: float64

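Because shifting the data introduces NA values, an alternative is to shift the index and leave the data unchanged. This requires an extra freq argument that sets the frequency to shift by; the index moves by the shift count multiplied by freq: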

ts.shift(2,freq='D')

2011-01-03   -0.233171
2011-01-04   -1.053316
2011-01-05   -0.448214
Freq: D, dtype: float64

ts.shift(2,freq='3D')

2011-01-07   -0.233171
2011-01-08   -1.053316
2011-01-09   -0.448214
Freq: D, dtype: float64
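A common use of shift() is comparing each observation with the previous one. The sketch below computes a day-over-day relative change on made-up prices and also shows the index-shifting variant.

```python
import pandas as pd

# Made-up daily prices for illustration.
prices = pd.Series([100.0, 110.0, 99.0], index=pd.date_range("2011-01-01", periods=3))

# Lag the data by one row, then compare with the original.
# The first entry is NaN because nothing precedes it.
change = prices / prices.shift(1) - 1

# With freq, the index moves instead and no NaN is introduced.
moved = prices.shift(1, freq="D")
```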

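Periods and period arithmetic

A period differs from the timestamps discussed above: it denotes a span of time rather than an instant. In use the two are similar. The pd.Period constructor still takes a timestamp-like value plus a freq argument; freq specifies the length of the period, and the timestamp locates the period on the Common Era timeline.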

p = pd.Period(2010,freq='M')
p

Period('2010-01', 'M')

p + 2

Period('2010-03', 'M')

In the example above, the constructor received a year-level value and a monthly ('M') freq, so pandas interpreted 2010 as 2010-01.

The period_range function creates a regular range of periods:

pd.period_range('2010-01','2010-05',freq='M')

PeriodIndex(['2010-01', '2010-02', '2010-03', '2010-04', '2010-05'], dtype='period[M]', freq='M')

The PeriodIndex class stores a sequence of periods and can serve as an axis index in any pandas data structure:

pd.Series(np.random.randn(5),index=pd.period_range('201001','201005',freq='M'))

2010-01    1.770363
2010-02   -0.402647
2010-03   -0.562749
2010-04   -0.606754
2010-05   -0.368662
Freq: M, dtype: float64
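To make the "span rather than instant" idea concrete, this sketch inspects the boundaries of a period and converts a period-indexed series to timestamps and back. The data values are arbitrary.

```python
import pandas as pd

p = pd.Period("2010-01", freq="M")
start = p.start_time   # first instant of January 2010
end = p.end_time       # last instant of January 2010

# Arbitrary monthly data indexed by period.
monthly = pd.Series([1, 2, 3], index=pd.period_range("2010-01", periods=3, freq="M"))

# to_timestamp() places each value at the start of its period by default;
# to_period() converts the index back to periods.
as_stamps = monthly.to_timestamp()
back = as_stamps.to_period("M")
```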

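Resampling

pandas offers simple, powerful, and efficient resampling for frequency conversion, for example turning data sampled every second into data sampled every minute. This operation is very common in finance.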

rng = pd.date_range('1/1/2012', periods=10, freq='S')
rng

DatetimeIndex(['2012-01-01 00:00:00', '2012-01-01 00:00:01',
               '2012-01-01 00:00:02', '2012-01-01 00:00:03',
               '2012-01-01 00:00:04', '2012-01-01 00:00:05',
               '2012-01-01 00:00:06', '2012-01-01 00:00:07',
               '2012-01-01 00:00:08', '2012-01-01 00:00:09'],
              dtype='datetime64[ns]', freq='S')

ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts

2012-01-01 00:00:00    417
2012-01-01 00:00:01    192
2012-01-01 00:00:02     86
2012-01-01 00:00:03    393
2012-01-01 00:00:04    354
2012-01-01 00:00:05    234
2012-01-01 00:00:06    440
2012-01-01 00:00:07    248
2012-01-01 00:00:08     59
2012-01-01 00:00:09    335
Freq: S, dtype: int32

ts.resample('1Min').sum() # aggregate (sum) the per-second data into 1-minute bins

2012-01-01    2758
Freq: T, dtype: int32
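The ten-second sample above fits into a single one-minute bin. As a sketch with deterministic data spanning two minutes, the same call can be paired with different aggregations. The lowercase aliases "s" and "1min" are used here on the assumption that they are accepted across pandas versions.

```python
import pandas as pd

# 120 one-second samples with values 0..119, covering two full minutes.
rng = pd.date_range("2012-01-01", periods=120, freq="s")
per_second = pd.Series(range(120), index=rng)

# Downsample to one-minute bins with two different aggregations.
per_minute_sum = per_second.resample("1min").sum()
per_minute_mean = per_second.resample("1min").mean()
```

Each bin is labeled by its left edge, so the two results are indexed at 00:00:00 and 00:01:00.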

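Converting other value types to datetime

Converting date-like strings and integers: an integer such as 20010100000000 is easily misread as an epoch timestamp and converted incorrectly, so pass an explicit format.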

a = pd.DataFrame([[20010101,100000,'aaa'],[20010201,230100,'bbb']])
a

	0	1	2
0	20010101	100000	aaa
1	20010201	230100	bbb

pd.to_datetime(a[0],format='%Y%m%d')

0   2001-01-01
1   2001-02-01
Name: 0, dtype: datetime64[ns]
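The frame above also holds a time-of-day column that the example leaves unused. The sketch below shows the pitfall with a bare integer and then one possible way to combine both columns. Reading column 1 as HHMMSS is an assumption about the data, and the zero-padding step is an added precaution.

```python
import pandas as pd

a = pd.DataFrame([[20010101, 100000, "aaa"], [20010201, 230100, "bbb"]])

# Pitfall: without a format, the integer is read as nanoseconds since 1970.
naive = pd.to_datetime(20010101)

# Assumption: column 0 is YYYYMMDD and column 1 is HHMMSS.
# Zero-pad the time so a value like 90000 still reads as 09:00:00.
combined = a[0].astype(str) + a[1].astype(str).str.zfill(6)
stamps = pd.to_datetime(combined, format="%Y%m%d%H%M%S")
```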
