日期维度表是数据仓库设计中的重要部分,在之前一篇MySQL 制作日期表有介绍使用SQL语言制作,本文来一波如何使用Python生成日期维度表。
- 导入必要的库
需要导入datetime模块,以便在Python中处理日期;导入pandas模块,便于展示,如果遇到没有次模块问题,可前往终端进行pip/conda install 模块;
from datetime import date, timedelta import pandas as pd
进行测试,定义生成日期维度表的函数
日期信息变量可以进行打印测试,待测试没有问题后,进行封装,定义一个名为
- generate_pub_calendar的函数,该函数接受起始日期和结束日期作为参数,并返回日期相关信息。
def generate_pub_calendar(start_date, end_date): date_data_info = [] current_date = start_date # 使用循环生成日期区间的信息 while current_date <= end_date: calendar_year = current_date.year year_start_dt = date(current_date.year, 1, 1) year_end_dt = date(current_date.year, 12, 31) calendar_week = current_date.strftime('%W') # calendar_year_week 年份和年中第几周拼接,乘出位数相加,用作排序和定位年周 calendar_year_week = current_date.year * 100 + current_date.isocalendar()[1] calendar_weekday = current_date.weekday()+1 calendar_weekday_name = current_date.strftime('%A') week_begin_dt = current_date - timedelta(days=current_date.weekday()) week_end_dt = week_begin_dt + timedelta(days=6) calendar_month = current_date.month calendar_month_name = current_date.strftime('%B') month_begin_dt = date(current_date.year, current_date.month, 1) # month_end_dt 先得出下一个月第一天 减去一天 即可 month_end_dt = date(current_date.year + int(current_date.month/12), current_date.month % 12 + 1, 1) - timedelta(days=1) calendar_year_month = current_date.year*100 + current_date.month # calendar_quarter //(向下取整除,它会返回整除结果的整数部分) calendar_quarter = (current_date.month - 1)//3 + 1 # quarter_begin_dt 根据当前季度数*3月份数再加上1个月 quarter_begin_dt = date(current_date.year, ((current_date.month - 1) // 3) * 3 + 1, 1) # quarter_end_dt 根据 month_end_dt 得出逻辑 和 calendar_quarter 配合即可得出 quarter_end_dt = date(current_date.year + int(((current_date.month - 1)//3 + 1) * 3/12),(((current_date.month - 1)//3 + 1) * 3) % 12 + 1, 1) - timedelta(days=1) # calendar_year_quarter 根据 calendar_year、calendar_quarter 做拼接即可 calendar_year_quarter = current_date.year * 100 + ((current_date.month - 1)//3 + 1) # week_in_calendar_quarter 根据 calendar_week 减去 quarter_begin_dt 所在的年周 即可得出 # week_in_calendar_quarter = current_date.isocalendar()[1] - date(current_date.year, ((current_date.month - 1) // 3) * 3 + 1, 1).isocalendar()[1] week_in_calendar_quarter = int(current_date.strftime('%W')) - int(date(current_date.year, ((current_date.month - 1) // 3) * 3 + 1, 1).strftime('%W')) # calendar_year_quarter_week 根据 calendar_year、calendar_quarter 、 week_in_calendar_quarter 做拼接即可 calendar_year_quarter_week = (current_date.year * 100 + ((current_date.month - 1)//3 + 1)) * 100 + (int(current_date.strftime('%W')) - int(date(current_date.year, ((current_date.month - 1) // 3) * 3 + 1, 1).strftime('%W'))) # day_in_calendar_year 当前日期 减 year_start_dt 即可 day_in_calendar_year = (current_date - date(current_date.year, 1, 1)).days day_in_calendar_quarter = (current_date - date(current_date.year, ((current_date.month - 1) // 3) * 3 + 1, 1)).days day_in_calendar_month = current_date.day # 追加日期区间的信息数据 date_data_info.append({ 'calendar_date': current_date, 'calendar_year': calendar_year, 'year_start_dt': year_start_dt, 'year_end_dt': year_end_dt, 'calendar_week': calendar_week, 'calendar_year_week': calendar_year_week, 'calendar_weekday': calendar_weekday, 'calendar_weekday_name': calendar_weekday_name, 'week_begin_dt': week_begin_dt, 'week_end_dt': week_end_dt, 'calendar_month': calendar_month, 'calendar_month_name': calendar_month_name, 'month_begin_dt': month_begin_dt, 'month_end_dt': month_end_dt, 'calendar_year_month': calendar_year_month, 'calendar_quarter': calendar_quarter, 'quarter_begin_dt': quarter_begin_dt, 'quarter_end_dt': quarter_end_dt, 'calendar_year_quarter': calendar_year_quarter, 'week_in_calendar_quarter': week_in_calendar_quarter, 'calendar_year_quarter_week': calendar_year_quarter_week, 'day_in_calendar_year': day_in_calendar_year, 'day_in_calendar_quarter': day_in_calendar_quarter, 'day_in_calendar_month': day_in_calendar_month }) current_date += timedelta(days=1) return date_data_info
在这个函数中,我们使用循环和datetime模块提供的功能来计算相关日期信息,最后将这些信息存储为字典。
- 调用函数并生成日期维度表
start_date = date(2023, 1, 1) end_date = date(2023, 12, 31) pub_calendar_data = generate_pub_calendar(start_date, end_date) df = pd.DataFrame(pub_calendar_data) print(df.tail(10).to_markdown(index=False))
通过调用这个函数,我们可以生成包含各种日期维度信息的日期维度表。
日期维度表为数据分析和业务智能提供了强大的基础,生成日期维度表是数据分析工作中的基础步骤之一,其实实现思路基本一致,只要你理清思路了,无论使用何种语言和工具,都可实现,大家可以实操起来,看看都掌握了几种工具。