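Meteorological data has always been highly valuable and is widely used in research across many fields. It covers many indicators, including air temperature, air pressure, relative humidity, precipitation, evaporation, wind direction and speed, and sunshine duration. However, datasets that contain all of these indicators are hard to obtain, and even once obtained they cannot be shared freely.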
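For large-scale crawling you need to write your own crawler. I previously wrote one for Shenzhen, and it crawls Shenzhen's weather data with essentially no problems.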
    import calendar
    import csv
    import re

    import demjson   # third-party JSON parser used to decode the response
    import requests

    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
    }


    # input: date as str, e.g. '20170601'
    def get_url(date):
        url = 'https://www.timeanddate.com/scripts/cityajax.php?n=china/shenzhen&mode=historic'
        url += '&hd=' + date
        url += '&month=' + str(int(date[4:6]))
        url += '&year=' + date[:4] + '&json=1'
        return url


    # input: date as str, e.g. '20170601'
    # yields one row per observation time on that day
    def crawl_single_day(date):
        response = requests.get(get_url(date), headers=headers)
        response_list = demjson.decode(response.text)
        for weather in response_list:
            cells = weather['c']
            w_time = re.search(r'^\d+:\d+', cells[0]['h']).group(0)
            w_temperature = re.search(r'^-?\d+', cells[2]['h']).group(0)
            w_weather = re.search(r'^(.*?)\.', cells[3]['h']).group(1)
            # Calm conditions are reported as the text 'No wind'
            if cells[4]['h'] == 'No wind':
                w_wind_speed = '0'
            else:
                w_wind_speed = re.search(r'^\d+', cells[4]['h']).group(0)
            w_wind_direction = re.search(r'title="(.*?)"', cells[5]['h']).group(1)
            w_humidity = cells[6]['h']
            w_barometer = re.search(r'^\d+', cells[7]['h']).group(0)
            # Visibility may be missing ('N/A'); otherwise keep the leading number
            w_visibility = cells[8]['h']
            if w_visibility != 'N/A':
                w_visibility = re.search(r'^\d+', w_visibility).group(0)
            yield [date, w_time, w_temperature, w_weather, w_wind_speed,
                   w_wind_direction, w_humidity, w_barometer, w_visibility]


    # input: year and month as int, e.g. year=2017, month=6
    def crawl_single_month(year, month):
        _, num_day = calendar.monthrange(year, month)
        for day in range(1, num_day + 1):
            # Zero-pad month and day to build 'YYYYMMDD'
            yield from crawl_single_day(f'{year}{month:02d}{day:02d}')


    if __name__ == "__main__":
        header = ('date time temperature weather wind_speed wind_direction '
                  'humidity barometer visibility').split()

        # July to December 2017
        with open('weather0.csv', 'w', encoding='utf-8', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(header)
            for month in range(7, 13):
                writer.writerows(crawl_single_month(2017, month))

        # A single day: 2021-04-01
        with open('weather1.csv', 'w', encoding='utf-8', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(header)
            writer.writerows(crawl_single_day('20210401'))
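The script crawls either whole months or a single day. To cover an arbitrary date range, the 'YYYYMMDD' strings that crawl_single_day expects could be generated with the standard datetime module. This is only a minimal sketch: the helper name iter_dates is hypothetical and is not part of the original script.

```python
from datetime import date, timedelta


def iter_dates(start, end):
    """Yield 'YYYYMMDD' strings for every day from start to end, inclusive.

    iter_dates is a hypothetical helper, not part of the original crawler.
    """
    current = start
    while current <= end:
        yield current.strftime('%Y%m%d')
        current += timedelta(days=1)


# Example: a range that crosses the end of February in a leap year.
dates = list(iter_dates(date(2020, 2, 28), date(2020, 3, 1)))
print(dates)  # ['20200228', '20200229', '20200301']
```

Each string could then be passed to crawl_single_day, probably with a short pause between requests so the site is not overloaded.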
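The CSV file obtained by crawling the Shenzhen weather data for 20210401 looks like this:
[Figure: screenshot of the resulting CSV file]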
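Once such a CSV exists, it can be read back with the standard csv module. The sketch below computes a mean temperature per day. It assumes the column names written by the crawler's header row; the sample rows are made up for illustration, and daily_mean_temperature is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the same column layout as weather1.csv.
# The values are invented for illustration, not real observations.
SAMPLE_CSV = """date,time,temperature,weather,wind_speed,wind_direction,humidity,barometer,visibility
20210401,00:00,22,Clear,7,East,88%,1012,N/A
20210401,01:00,24,Clear,0,East,90%,1012,10
20210402,00:00,20,Fog,4,North,94%,1013,2
"""


def daily_mean_temperature(rows):
    """Return {date: mean temperature} from an iterable of dict rows."""
    temps = defaultdict(list)
    for row in rows:
        # The crawler stores temperature as a string of digits, e.g. '22'.
        temps[row['date']].append(int(row['temperature']))
    return {day: sum(values) / len(values) for day, values in temps.items()}


means = daily_mean_temperature(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(means)  # {'20210401': 23.0, '20210402': 20.0}
```

For a real file, the io.StringIO object would be replaced by a file opened with open('weather1.csv', encoding='utf-8', newline='').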
Of course, if you need larger volumes of data, you can obtain meteorological data through the 地理遥感生态网 platform.
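The meteorological data published by the 地理遥感生态网 platform (http://www.gisrs.cn) covers many indicators, including air temperature, air pressure, relative humidity, precipitation, evaporation, wind direction and speed, sunshine duration, and solar radiation.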
Level-1 directory    File name
PRS                  SURF_CLI_CHN_MUL_DAY-PRS-10004-YYYYMM.TXT (station pressure)
TEM                  SURF_CLI_CHN_MUL_DAY-TEM-12001-YYYYMM.TXT (air temperature)
RHU                  SURF_CLI_CHN_MUL_DAY-RHU-13003-YYYYMM.TXT (relative humidity)
PRE                  SURF_CLI_CHN_MUL_DAY-PRE-13011-YYYYMM.TXT (precipitation)
EVP                  SURF_CLI_CHN_MUL_DAY-EVP-13240-YYYYMM.TXT (evaporation)
WIN                  SURF_CLI_CHN_MUL_DAY-WIN-11002-YYYYMM.TXT (wind direction and speed)
SSD                  SURF_CLI_CHN_MUL_DAY-SSD-14032-YYYYMM.TXT (sunshine duration)
GST                  SURF_CLI_CHN_MUL_DAY-GST-12030-0cm-YYYYMM.TXT (0 cm ground temperature)
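The file names listed above follow a regular pattern: a fixed prefix, the element abbreviation (which is also the level-1 directory name), a numeric element code, and the year and month. As a rough sketch, the name for a given element and month could be built as follows. The dictionary ELEMENT_CODES and the function build_file_name are hypothetical names introduced here for illustration; only the codes themselves come from the list.

```python
# Element codes copied from the file-name list; the dictionary and the
# function below are hypothetical helpers, not part of any official tool.
ELEMENT_CODES = {
    'PRS': '10004',      # station pressure
    'TEM': '12001',      # air temperature
    'RHU': '13003',      # relative humidity
    'PRE': '13011',      # precipitation
    'EVP': '13240',      # evaporation
    'WIN': '11002',      # wind direction and speed
    'SSD': '14032',      # sunshine duration
    'GST': '12030-0cm',  # 0 cm ground temperature
}


def build_file_name(element, year, month):
    """Return the expected file name for one element and one month."""
    code = ELEMENT_CODES[element]  # raises KeyError for unknown elements
    return f'SURF_CLI_CHN_MUL_DAY-{element}-{code}-{year:04d}{month:02d}.TXT'


name = build_file_name('PRS', 2017, 6)
print(name)  # SURF_CLI_CHN_MUL_DAY-PRS-10004-201706.TXT
```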
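Copyright notice: This is an original article by CSDN blogger 「地理遥感生态网」, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.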