# Data Analysis | NumPy in Practice (1): Analyzing Ride Durations for a Bike-Share Service

#### Hands-on

##### Data collection

"Duration (ms)","Start date","End date","Start station number","Start station","End station number","End station","Bike number","Member type"

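    # Data collection
    import os
    import numpy as np

    # data_path and data_filenames are expected to be defined elsewhere in the script
    def data_collection():
        data_arr_list = []
        for data_filename in data_filenames:
            file = os.path.join(data_path, data_filename)
            # Assumed loading step (missing from the original listing): read every field as text, skipping the header row
            data_arr = np.loadtxt(file, delimiter=',', dtype='str', skiprows=1)
            data_arr_list.append(data_arr)
        return data_arr_list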


##### Data cleaning

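    # Data cleaning
    def data_clean(data_arr_list):
        duration_min_list = []
        for data_arr in data_arr_list:
            # Keep only the first column, "Duration (ms)"
            data_arr = data_arr[:, 0]
            # Strip the double quotes around each value (np.char is the public alias of np.core.defchararray)
            duration_ms = np.char.replace(data_arr, '"', '')
            # Convert milliseconds to minutes
            duration_min = duration_ms.astype('float') / 1000 / 60
            duration_min_list.append(duration_min)
        return duration_min_list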
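
To make the cleaning step concrete, here is a minimal sketch on a tiny hand-made array. The sample values are invented for illustration and do not come from the dataset; `np.char.replace` performs the same element-wise replacement as the call in `data_clean`.

```python
import numpy as np

# Hypothetical sample: three quoted duration strings in milliseconds,
# shaped like the first column of the CSV (values invented for illustration).
sample = np.array(['"60000"', '"90000"', '"120000"'])

# Strip the double quotes element-wise, then convert milliseconds to minutes.
duration_ms = np.char.replace(sample, '"', '')
duration_min = duration_ms.astype('float') / 1000 / 60

print(duration_min)
```

The quotes have to go first because a string such as `"60000"` (quotes included) cannot be parsed as a float.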

##### Data analysis

    # Data analysis
    def mean_data(duration_min_list):
        duration_mean_list = []
        for duration_min in duration_min_list:
            # Average ride duration (in minutes) for one data file
            duration_mean = np.mean(duration_min)
            duration_mean_list.append(duration_mean)
        return duration_mean_list

##### Presenting the results

    # Presenting the results
    import matplotlib.pyplot as plt

    def show_data(duration_mean_list):
        plt.figure()
        # Tick labels: Q1, Q2, Q3, Q4
        name_list = ['第一季度', '第二季度', '第三季度', '第四季度']
        plt.bar(range(len(duration_mean_list)), duration_mean_list, tick_label=name_list)
        plt.show()

#### Pitfalls I ran into

##### Data reading (1)

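If the file is not read with care during data collection, the Duration values carry an extra `b` prefix by the time they reach data cleaning, and converting them to numbers raises an error.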
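
The effect is easy to reproduce without the dataset. The sketch below uses a made-up raw value and assumes the `b` comes from a bytes object being turned into text with `str()` instead of being decoded; the actual cause in a given script may differ.

```python
# Hypothetical raw field as read in bytes mode (value invented for illustration).
raw = b'"60000"'

# str() on bytes keeps the literal b'...' wrapper, so the text is not numeric.
wrong = str(raw).replace('"', '')
try:
    float(wrong)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# Decoding yields plain text that cleans and converts as expected.
right = raw.decode('utf-8').replace('"', '')
duration_ms = float(right)

print(repr(wrong), conversion_failed, duration_ms)
```

Decoding the bytes (or reading the file as text in the first place) removes the prefix before the cleaning step ever sees it.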

##### Data reading (2)

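    # 500 GB file, special case: the whole file is a single line
    # NOTE: the function name, its signature and the 4096-character chunk size
    # are assumptions; the original listing lost those lines.
    def read_records(f, newline, chunk_size=4096):
        buf = ""
        while True:
            # Emit every complete record currently held in the buffer
            while newline in buf:
                pos = buf.index(newline)
                yield buf[:pos]
                buf = buf[pos + len(newline):]
            chunk = f.read(chunk_size)
            if not chunk:
                # End of file reached: emit whatever is left
                yield buf
                break
            buf += chunk

    with open("input.txt") as f:
        # "{|}" is a placeholder separator; pass the delimiter the file actually uses
        for line in read_records(f, "{|}"):
            print(line)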
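
##### Pitfalls when using matplotlib.pyplot

matplotlib's default fonts do not include Chinese glyphs, so Chinese tick labels such as those in `name_list` do not render correctly. The fixes below switch to the SimHei font.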

Solution 1: edit the configuration file

(1) Locate the `matplotlibrc` file (a file search will find it).

(2) Edit `font.serif` and `font.sans-serif` (lines 205 and 206 in my copy) so that SimHei comes first:

    font.serif: SimHei, Bitstream Vera Serif, New Century Schoolbook, Century Schoolbook L, Utopia, ITC Bookman, Bookman, Nimbus Roman No9 L, Times New Roman, Times, Palatino, Charter, serif
    font.sans-serif: SimHei, Bitstream Vera Sans, Lucida Grande, Verdana, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif
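
Instead of searching the disk, matplotlib can report which `matplotlibrc` it actually loaded. A small sketch using `matplotlib.matplotlib_fname()`; the printed path and font list depend on the installation.

```python
import matplotlib

# Path of the matplotlibrc file that this matplotlib installation is using.
rc_path = matplotlib.matplotlib_fname()
print(rc_path)

# Current value of the sans-serif font list that the edit above changes.
print(matplotlib.rcParams['font.sans-serif'])
```

Printing the font list again after editing the file and restarting Python is a quick way to confirm that the change was picked up.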

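Solution 2: set the fonts in code

    import matplotlib

    # Put SimHei first so that Chinese labels render
    matplotlib.rcParams['font.sans-serif'] = ['SimHei']
    matplotlib.rcParams['font.family'] = 'sans-serif'

    # Keep the minus sign rendering correctly on the axes after the font change
    matplotlib.rcParams['axes.unicode_minus'] = False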
