Pandas apply function is slow_Q&A - Alibaba Cloud Developer Community (阿里云开发者社区)

def f(x):
    d = {}
    d['fips_2'] = x["fips"].values[0]
    d['apn_2'] = x["apn"].values[0]
    d['most_recent_sale'] = x["recording_date"].nlargest(1).iloc[-1]
    d['second_most_recent_sale'] = x["recording_date"].nlargest(2).iloc[-1]
    d['third_most_recent_sale'] = x["recording_date"].nlargest(3).iloc[-1]
    d['most_recent_price'] = x.loc[x["recording_date"] == d["most_recent_sale"], "price"].values[0]
    d['second_most_recent_price'] = x.loc[x["recording_date"] == d["second_most_recent_sale"], "price"].values[0]
    d['third_most_recent_price'] = x.loc[x["recording_date"] == d["third_most_recent_sale"], "price"].values[0]
    d['second_grantor'] = x.loc[x["recording_date"] == d["most_recent_sale"], "seller"].values[0]
    d['prior_grantor'] = x.loc[x["recording_date"] == d["second_most_recent_sale"], "seller"].values[0]
    d['type'] = x["type"].values[0]
    print(x["apn"].values[0])
    return pd.Series(d, index=['apn_2', 'most_recent_sale', 'second_most_recent_sale', 'most_recent_price', 'second_most_recent_price', 'second_grantor', 'type'])

df_grouped = year_past_df.groupby("apn").apply(f)

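One improvement is to drop the repeated nlargest calls and sort once at the start. I don't know all of the columns because no sample dataset was provided, but something like this might work: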

import pandas as pd

def f(x):
    # Sort once so the most recent sale is the last row of the group.
    # Assumes every group has at least three rows; iloc[-3] raises IndexError otherwise.
    x = x.sort_values("recording_date")
    d = {}
    d['fips_2'] = x["fips"].values[0]
    d['apn_2'] = x["apn"].values[0]
    # The question stores the sale date in "recording_date".
    d['most_recent_sale'] = x["recording_date"].iloc[-1]
    d['second_most_recent_sale'] = x["recording_date"].iloc[-2]
    d['third_most_recent_sale'] = x["recording_date"].iloc[-3]
    d['most_recent_price'] = x["price"].iloc[-1]
    d['second_most_recent_price'] = x["price"].iloc[-2]
    d['third_most_recent_price'] = x["price"].iloc[-3]
    d['second_grantor'] = x["seller"].iloc[-1]
    d['prior_grantor'] = x["seller"].iloc[-2]
    d['type'] = x["type"].values[0]
    return pd.Series(d, index=['apn_2', 'most_recent_sale', 'second_most_recent_sale', 'most_recent_price', 'second_most_recent_price', 'second_grantor', 'type'])

df_grouped = year_past_df.groupby("apn").apply(f)
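
As a quick sanity check that sorting once gives the same dates as the repeated nlargest calls, here is a minimal sketch on a single made-up group. The `sales` frame and its values are hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical data for one parcel; column names follow the question.
sales = pd.DataFrame({
    "recording_date": pd.to_datetime(
        ["2015-03-01", "2019-07-15", "2011-01-20", "2017-05-05"]),
    "price": [200_000, 310_000, 150_000, 260_000],
})

# Original approach: one nlargest call per rank.
via_nlargest = [sales["recording_date"].nlargest(k).iloc[-1] for k in (1, 2, 3)]

# Suggested approach: sort once, then index from the end.
ordered = sales.sort_values("recording_date")
via_sort = [ordered["recording_date"].iloc[-k] for k in (1, 2, 3)]

# Both lists hold the three most recent dates, newest first.
same_dates = via_nlargest == via_sort

# Prices come from the same sorted rows, so no equality lookup is needed.
prices_by_recency = ordered["price"].iloc[::-1].head(3).tolist()
```

Because the prices are read from the already sorted rows, the `x.loc[x["recording_date"] == ...]` lookups from the question are no longer needed.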

Another option is to sort the whole dataset once up front and then use an agg call like this:

# Keys must be existing column names; the question stores the sale date in "recording_date".
# The iloc[-2] / iloc[-3] lambdas assume every group has at least three rows.
agg_dir = {
    'fips': 'first',
    'recording_date': ['last', lambda x: x.iloc[-2], lambda x: x.iloc[-3]],
    'price': ['last', lambda x: x.iloc[-2], lambda x: x.iloc[-3]],
    'seller': ['last', lambda x: x.iloc[-2]],
    'type': 'first'
}
df_grouped = year_past_df.sort_values("recording_date").groupby("apn").agg(agg_dir)
# Flatten the resulting two-level column index; the order follows agg_dir.
df_grouped.columns = ['fips_2', 'most_recent_sale', 'second_most_recent_sale',
                      'third_most_recent_sale', 'most_recent_price', 'second_most_recent_price',
                      'third_most_recent_price', 'second_grantor', 'prior_grantor', 'type']
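
A further variation, not part of the answer above, is to avoid per-group Python functions entirely by numbering rows within each group and pivoting. The sketch below is only an illustration: the sample frame, the helper column `recency`, and the output names such as `price_0` are assumptions, and it has not been benchmarked against the other two approaches.

```python
import pandas as pd

# Hypothetical sample; column names follow the question.
year_past_df = pd.DataFrame({
    "apn": ["A", "A", "A", "A", "B", "B"],
    "recording_date": pd.to_datetime(
        ["2011-01-20", "2015-03-01", "2017-05-05",
         "2019-07-15", "2012-02-02", "2018-08-08"]),
    "price": [150, 200, 260, 310, 90, 120],
    "seller": ["s1", "s2", "s3", "s4", "t1", "t2"],
})

# Sort newest first, then number rows within each apn: 0 = most recent sale.
ranked = year_past_df.sort_values("recording_date", ascending=False)
ranked["recency"] = ranked.groupby("apn").cumcount()

# Keep the three most recent sales per apn and pivot recency into columns.
top3 = ranked[ranked["recency"] < 3]
wide = (top3.set_index(["apn", "recency"])
            [["recording_date", "price", "seller"]]
            .unstack("recency"))

# Flatten the (column, recency) pairs into names such as "price_0".
wide.columns = [f"{col}_{recency}" for col, recency in wide.columns]
```

Unlike the `iloc[-3]` versions, groups with fewer than three sales do not raise here; the missing slots are simply left as NaN or NaT.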
