【数学建模-某肿瘤疾病诊疗的经济学分析】数据分析

简介: 本文提供了针对一个肿瘤疾病诊疗经济学分析的数学建模案例,其中包括了数据清洗、特征工程、模型分析等步骤,并提供了相关的代码和最终报告的下载链接。

相关信息

1【数学建模-某肿瘤疾病诊疗的经济学分析】数据分析
2 【数学建模-某肿瘤疾病诊疗的经济学分析】数据清洗和特征工程
3 【数学建模-某肿瘤疾病诊疗的经济学分析】第一问模型分析
4 【代码下载】
5【30页的论文下载】

题目

江西省数学建模-某肿瘤疾病诊疗的经济学分析

基于病人的基本数据,疾病类型(主诉和并发,是否手术),住院天数和费用等,数据清洗并建立数学模型做如下分析:
1、建立根据不同疾病的分类模型。建立诊疗费用与疾病类型的数学关系,并进行预测和检验。
2、建立数学模型分析诊疗费用与各类疾病的亚群的特征,比如,高费用人群的年龄,性别,住院日期和相关数据的相关性,尝试对特定的亚群建立预测模型并进行验证。
3、如果该疾病纳入医保,尝试给出根据疾病类型、建议年龄段和国家承担的经济费用的方案并对相关方案合理性和经济性作出评估。

1.png

1 数据集解析

提供的数据集,包含患者序号、患者性别、出生日期、患者入院日期、患者出院日期、主要诊断编码名称、其他诊断、其他手术、住院总费用、住院天数、DRGS分组编码、DRGS分组名称、ADRG名称、费用异常标识。

2 数据集主要特征分析

import numpy as np
import pandas as pd
train_data_file = './cdata.csv'
if __name__ =="__main__":
    t_data = pd.read_csv(train_data_file)#, names=['id', 'sex','born','intime','outtime','maindiag','elsediag','surgery','fee','days','drgsid','drgs','adrgid','adrg','highfee'])
    t_data.columns = ['id', 'sex','born','intime','outtime','maindiag','elsediag','surgery','fee','days','drgsid','drgs','adrgid','adrg','highfee']
    print()
    t_data.describe()

(1)数据长度:17739
(2)主要诊断类别:183种

def maindiag_extract(data):
    text_len =[]
    datalen = len(data)
    for i in range(0,datalen):
        one_lines = ''.join(list(data['maindiag'][i]))
        text_id = one_lines.strip().split("|")
        text_len.append(text_id[0])
    all_category = list(set(text_len))
    print(all_category)
    print(len(all_category))
    print()

(3)次要诊断类别:803

def elsediag_extract(data):
    text_len =[]
    datalen = len(data)
    for i in range(0,datalen):
        nontext = data['elsediag'][i]
        if pd.isnull(nontext):
            continue
        one_lines = ''.join(list(nontext))
        text = one_lines.strip().split(",")
        for j in range(len(text)):
            text_id = text[j].strip().split("|")
            text_len.append(text_id[0])
    all_category = list(set(text_len))
    print(all_category)
    print(len(all_category))

(4)DRGs类别数:72类

def drgs_extract(data):
    text_len =[]
    datalen = len(data)
    for i in range(0,datalen):
        text_id = data['drgsid'][i]
        text_len.append(text_id)
    all_category = list(set(text_len))
    print(all_category)
    print(len(all_category))
    print()

(5)DRGS分组平均费用分布分析

import numpy as np
import pandas as pd
# import tensorflow as tf
from category_encoders.target_encoder import TargetEncoder
import matplotlib.pyplot as plt
import statsmodels.api as sm 
def fee_range(data):
    text_len =[]
    # category =[]
    category={}
    feelist =[]
    datalen = len(data)
    for i in range(0,datalen):
        text_id = data['drgsid'][i]
        data_fee = data['fee'][i]
        feelist.append(data_fee)
        category[text_id] =list(set(feelist))   
    ncate ={} 
    for k in category.keys():
        # 取每个分组下的费用平均
        ncate[k] = np.mean(category[k])

    a_cate = dict(sorted(ncate.items(), key=lambda x: x[1], reverse=True))
    x = list(a_cate.keys())
    y = list(a_cate.values())
    plt.scatter(x, y, alpha=0.9)  # 绘制散点图,透明度为0.6(这样颜色浅一点,比较好看)
    plt.show()
    print(a_cate)  
    print()
if __name__ =="__main__":
    t_data = pd.read_csv(train_data_file)#, names=['id', 'sex','born','intime','outtime','maindiag','elsediag','surgery','fee','days','drgsid','drgs','adrgid','adrg','highfee'])
    t_data.columns = ['id', 'sex','born','intime','outtime','maindiag','elsediag','surgery','fee','days','drgsid','drgs','adrgid','adrg','highfee']
    fee_range(t_data)
    print()

可以看出,DRGS其实是大体上是划分了费用的取用的。

2.png

(6)DRGS分组类别分布

def box_line(data):
    text_len =[]
    # category =[]
    category={}
    cate_box = pd.DataFrame()
    datalen = len(data)
    # feelist =[]
    for i in range(0,datalen):
        text_id = data['drgsid'][i]
        data_fee = data['fee'][i]
        if text_id in category.keys():
            templist = list(category[text_id])
            templist.append(data_fee)
            category[text_id] =list(set(templist))
        else:
            category[text_id] = [data_fee]
    pxy = {}
    for k in category.keys():
        pxy[k] = len(category[k])
        # print(k,len(category[k]))
    resultxy = dict(sorted(pxy.items(), key=lambda x: x[1]))
    x = list(resultxy.keys())
    y = list(resultxy.values())
    for j in resultxy.keys():
        print(j,resultxy[j])
    plt.xlabel('DRGs')
    plt.title('Distribution of the number of grouping categories ')
    plt.ylabel('The amount of DRGS')
    plt.xticks([])
    # x = [i for i in range(len(y))]
    plt.scatter(x, y, alpha=0.9)  # 绘制散点图,透明度为0.6(这样颜色浅一点,比较好看)
    plt.show()
    print()

3.png

DA13 1
DE11 1
DK13 1
DR15 1
DR11 1
GK35 1
IJ13 1
IU35 1
IU31 1
JB23 1
KR13 1
LT13 1
LZ13 1
QR15 1
QS31 1
QT11 1
RA21 1
RA31 1
RA35 1
RD15 1
RU15 1
KR11 2
RA23 2
RT15 2
RV15 2
EJ15 3
ET13 3
RD13 3
RD11 3
RS15 3
RS13 3
RT11 3
XT19 3
BU11 4
EJ13 4
QS43 4
RA33 4
XJ19 6
RA41 12
JR15 13
ED13 14
HR13 16
QR13 17
JR13 19
QT13 19
ER11 25
RT13 35
RU23 35
IU33 40
RU11 43
BU13 46
DR13 51
RA45 54
QS33 55
GR11 59
RV11 68
RA43 96
RE15 170
GR15 214
GR13 217
RC15 223
RE11 241
RC11 243
ER15 288
XS29 378
ER13 412
XT39 420
RW19 469
RV13 2465
RU13 2829
RC13 3910
RE13 4272

(7)DRGS分组中费用范围箱线图

def box_line(data):
    text_len =[]
    # category =[]
    category={}
    cate_box = pd.DataFrame()
    datalen = len(data)
    # feelist =[]
    for i in range(0,datalen):
        text_id = data['drgsid'][i]
        data_fee = data['fee'][i]
        if text_id in category.keys():
            templist = list(category[text_id])
            templist.append(data_fee)
            category[text_id] =list(set(templist))
        else:
            category[text_id] = [data_fee]
    pxy = {}
    for k in category.keys():
        pxy[k] = len(category[k])
        # print(k,len(category[k]))
    sordict = dict(sorted(pxy.items(), key=lambda x: x[1]))
    resultxy ={}
    for k in sordict.keys():
        resultxy[k] = category[k]
    for k in resultxy.keys():
        templi = list(resultxy[k])
        templen = len(templi)
        if 4272 > templen:
            for i in range(4272-templen):
                templi.append(np.nan)
        cate_box[k] = templi
    cate_box.plot.box(title="Fee-categroy")
    plt.grid(linestyle="--", alpha=0.3)
    plt.show()

    print()

4.png


5.png

(8)ADRG的类别分布,39种类别

def drgs_box_line(data):
    category={}
    cate_box = pd.DataFrame()
    datalen = len(data)
    # feelist =[]
    for i in range(0,datalen):
        text_id = data['adrgid'][i]
        data_fee = data['fee'][i]
        if text_id in category.keys():
            templist = list(category[text_id])
            templist.append(data_fee)
            category[text_id] =list(set(templist))
        else:
            category[text_id] = [data_fee]
    pxy = {}
    for k in category.keys():
        pxy[k] = len(category[k])
        # print(k,len(sordict[k]))
    sordict = dict(sorted(pxy.items(), key=lambda x: x[1]))
    resultxy ={}
    # ADRG计算类别排序
    for k in sordict.keys():
        # resultxy[k] = category[k]
        print(k,sordict[k])

DA1 1
DE1 1
DK1 1
GK3 1
IJ1 1
JB2 1
LT1 1
LZ1 1
ET1 3
KR1 3
RA2 3
XT1 3
QS4 4
RA3 6
RS1 6
XJ1 6
EJ1 7
RD1 7
ED1 14
HR1 16
QR1 18
QT1 20
JR1 32
RU2 35
RT1 40
IU3 42
BU1 50
DR1 53
QS3 56
RA4 162
XS2 378
XT3 420
RW1 469
GR1 490
ER1 725
RV1 2535
RU1 2872
RC1 4376
RE1 4682

(9)ADRG与费用的箱线图

def drgs_box_line(data):
    category={}
    cate_box = pd.DataFrame()
    datalen = len(data)
    # feelist =[]
    for i in range(0,datalen):
        text_id = data['adrgid'][i]
        data_fee = data['fee'][i]
        if text_id in category.keys():
            templist = list(category[text_id])
            templist.append(data_fee)
            category[text_id] =list(set(templist))
        else:
            category[text_id] = [data_fee]
    pxy = {}
    for k in category.keys():
        pxy[k] = np.mean(category[k])
        # print(k,len(sordict[k]))
    sordict = dict(sorted(pxy.items(), key=lambda x: x[1]))
    resultxy ={}
    for k in sordict.keys():
        resultxy[k] = category[k]
        # print(k,sordict[k])
    for k in resultxy.keys():
        templi = list(resultxy[k])
        templen = len(templi)
        if 4682 > templen:
            for i in range(4682-templen):
                templi.append(np.nan)
        cate_box[k] = templi
    cate_box.plot.box(title="Fee-categroy")
    plt.grid(linestyle="--", alpha=0.3)
    plt.title('Relationship between ADRG and medical fee')
    plt.xlabel('ADRG')
    plt.ylabel('medical fee')
    plt.show()

6.png

(10)ADRG中ER1、GR1、QS3等每个类别中的样本数据分布,都呈现相似曲线上升。

7.png

总结:
(1)数据长度:17739行
(2)主要诊断类别:183种
(3)DRGs类别数:72种
(4)次要诊断类别:803
(5)ADRG的类别:39种
(6)ADRG编码和DRGS编码无缺失值,但是分布很不均匀,有的类别,只有1个样本,有的类别有4682种。对训练模型来说很不友好。
(7)最后一列属性,是费用异常,可以看到有高费用异常和低费用异常,暂且不知道这些属性有何意义
(8)ADRG中每个类别中的样本数据分布,都呈现相似曲线上升。

3 数据集亚群特征分析

参考类似的病例分析案例,需要分析年龄、性别、有无并发症、住院时长等特征https://www.cn-healthcare.com/articlewm/20181214/content-1042985.html

8.png

(1)年龄与平均费用关系折线图

def age_static(data):
    age_fee ={}
    datalen = len(data)
    for i in range(0,datalen):
        born_year = data['born'][i]
        if born_year=='0 AM':
            continue
        else:
            intime = ''.join(data['intime'][i])
            in_year = intime.strip().split("/")
            age = int(in_year[2])-int(born_year)
            data_fee = data['fee'][i]
            if age in age_fee.keys():
                templist = list(age_fee[age])
                templist.append(data_fee)
                age_fee[age] =list(templist)
            else:
                age_fee[age] = [data_fee] 
    # 计算平均费用
    avg_age_fee ={}
    for k in age_fee.keys():
        avg = np.mean(list(age_fee[k]))
        avg_age_fee[k] = avg
    sort_avg_fee = dict(sorted(avg_age_fee.items(), key=lambda x: x[0]))
    print(sort_avg_fee)
    x = list(sort_avg_fee.keys())
    y = list(sort_avg_fee.values())
    plt.plot(x,y,'b--',label='age-fee')
    plt.title('Relationship between age and cost')
    plt.xlabel('age')
    plt.ylabel('medical-fee')
    plt.show()
    print()

9.png

(2)阶段年龄分布柱状图

30: 25658.83080291971,
40: 25232.891867549668
50: 26072.089125503106
60: 27377.498989296368
70: 32492.331597490345
90: 36317.296185236126}

10.png

def age_static(data):
    age_fee ={}
    datalen = len(data)
    for i in range(0,datalen):
        born_year = data['born'][i]
        if born_year=='0 AM':
            continue
        else:
            intime = ''.join(data['intime'][i])
            in_year = intime.strip().split("/")
            age = int(in_year[2])-int(born_year)
            data_fee = data['fee'][i]
            if age in age_fee.keys():
                templist = list(age_fee[age])
                templist.append(data_fee)
                age_fee[age] =list(templist)
            else:
                age_fee[age] = [data_fee] 
    # 计算平均费用
    avg_age_fee ={}
    for k in age_fee.keys():
        avg = np.mean(list(age_fee[k]))
        avg_age_fee[k] = avg
    sort_avg_fee = dict(sorted(avg_age_fee.items(), key=lambda x: x[0]))
    #绘制直方图,阶段年龄与平均费用的
    li30 =[]
    li40 =[]
    li50 =[]
    li60 =[]
    li70 =[]
    limax =[]
    n_age_fee = {}
    for k in age_fee.keys():
        age = int(k)
        if age <=30:
            li30.extend(age_fee[k])
        elif age <=40:
            li40.extend(age_fee[k])
        elif age <=50:
            li50.extend(age_fee[k])
        elif age <=60:
            li60.extend(age_fee[k])
        elif age <=70:
            li70.extend(age_fee[k])
        else:
            limax.extend(age_fee[k])
    n_age_fee[30] = li30
    n_age_fee[40] = li40
    n_age_fee[50] = li50
    n_age_fee[60] = li60
    n_age_fee[70] = li70
    n_age_fee[90] = limax
    # 计算平均费用
    level_age_fee ={}
    for k in n_age_fee.keys():
        avg = np.mean(list(n_age_fee[k]))
        level_age_fee[k] = avg
    sort_level_age_fee = dict(sorted(level_age_fee.items(), key=lambda x: x[0]))
    x = ['<=30','31-40','41-50','51-60','61-70','>=70']
    y = list(sort_level_age_fee.values())
    plt.title('Relationship between age_range and cost')
    plt.xlabel('age_range')
    plt.ylabel('medical-fee')
    for a,b in zip(x,y):  
        plt.text(a, b+0.05, '%.0f' % b, ha='center', va= 'bottom',fontsize=11)
    plt.bar(x, y)
    plt.show()
    print()

11.png

def age_static(data):
age_fee ={}
    datalen = len(data)
    for i in range(0,datalen):
        born_year = data['born'][i]
        if born_year=='0 AM':
            continue
        else:
            intime = ''.join(data['intime'][i])
            in_year = intime.strip().split("/")
            age = int(in_year[2])-int(born_year)
            data_fee = data['fee'][i]
            if age in age_fee.keys():
                templist = list(age_fee[age])
                templist.append(data_fee)
                age_fee[age] =list(templist)
            else:
                age_fee[age] = [data_fee] 
    # 计算平均费用
    avg_age_fee ={}
    for k in age_fee.keys():
        avg = np.mean(list(age_fee[k]))
        avg_age_fee[k] = avg
    sort_avg_fee = dict(sorted(avg_age_fee.items(), key=lambda x: x[0]))
    #绘制直方图,阶段年龄与平均费用的
    li30 =[]
    li40 =[]
    li50 =[]
    li60 =[]
    li70 =[]
    limax =[]
    n_age_fee = {}
    for k in age_fee.keys():
        age = int(k)
        if age <=30:
            li30.extend(age_fee[k])
        elif age <=40:
            li40.extend(age_fee[k])
        elif age <=50:
            li50.extend(age_fee[k])
        elif age <=60:
            li60.extend(age_fee[k])
        elif age <=70:
            li70.extend(age_fee[k])
        else:
            limax.extend(age_fee[k])
    n_age_fee[30] = len(li30)
    n_age_fee[40] = len(li40)
    n_age_fee[50] = len(li50)
    n_age_fee[60] = len(li60)
    n_age_fee[70] = len(li70)
    n_age_fee[90] = len(limax)
    # 计算平均费用
    sort_level_age_fee = dict(sorted(n_age_fee.items(), key=lambda x: x[0]))
    x = ['<=30','31-40','41-50','51-60','61-70','>=70']
    y = list(sort_level_age_fee.values())
    plt.title('Relationship between age_range and population')
    plt.xlabel('age_range')
    plt.ylabel('population')
    for a,b in zip(x,y):  
        plt.text(a, b+0.05, '%.0f' % b, ha='center', va= 'bottom',fontsize=11)
    plt.bar(x, y)
    plt.show()
    print()

(4)性别和人口以及费用关系的柱状图

12.png

13.png

def sex_static(data):
    sex_fee ={}
    datalen = len(data)
    male =[]
    female =[]
    for i in range(0,datalen):
        sex = data['sex'][i]
        if sex=='未知':
            continue
        elif sex=='男':
            male.append(data['fee'][i])
        else:
            female.append(data['fee'][i])
        sex_fee['male'] = male
        sex_fee['female'] = female
    # 计算平均费用
    avg_sex_fee ={}
    for k in sex_fee.keys():
        avg = np.mean(list(sex_fee[k]))
        avg_sex_fee[k] = avg
    n_sex_fee={}
    n_sex_fee['male'] = len(sex_fee['male'])
    n_sex_fee['female'] = len(sex_fee['female'])

    x = ['male','female']
    y1 = list(avg_sex_fee.values())
    y2 = list(n_sex_fee.values())
    plt.figure()
    plt.title('Relationship between gender and cost')
    plt.xlabel('gender')
    plt.ylabel('medical-fee')
    for a,b in zip(x,y1):  
        plt.text(a, b+0.05, '%.0f' % b, ha='center', va= 'bottom',fontsize=11)
    plt.bar(x, y1)
    plt.figure()
    plt.title('Relationship between gender and population')
    plt.xlabel('gender')
    plt.ylabel('population')
    for a,b in zip(x,y2):  
        plt.text(a, b+0.05, '%.0f' % b, ha='center', va= 'bottom',fontsize=11)
    plt.bar(x, y2)
    plt.show()
    print()

(5)住院时长与平均费用的关系

def duration_static(data):
    duration_fee ={}
    datalen = len(data)
    for i in range(0,datalen):
        intime  = ''.join(data['intime'][i]).strip()
        outtime = ''.join(data['outtime'][i]).strip()
        data_fee = data['fee'][i]
        date1=datetime.datetime.strptime(outtime[0:10],"%m/%d/%Y")
        date2=datetime.datetime.strptime(intime[0:10],"%m/%d/%Y")
        day =(date1-date2).days
        if int(day)>300:
            continue
        if day in duration_fee.keys():
            templist = list(duration_fee[day])
            templist.append(data_fee)
            duration_fee[day] =list(templist)
        else:
            duration_fee[day] = [data_fee] 
    # 计算平均费用
    avg_duration_fee ={}
    for k in duration_fee.keys():
        avg = np.mean(list(duration_fee[k]))
        avg_duration_fee[k] = avg
    sort_avg_fee = dict(sorted(avg_duration_fee.items(), key=lambda x: x[0]))
    #绘制直方图,阶段年龄与平均费用的
    li01 =[]
    li25 =[]
    li69 =[]
    li60 =[]
    li100 =[]
    n_duration_fee = {}
    a_duration_fee ={}
    for k in duration_fee.keys():
        day = int(k)
        if day ==0 or day ==1:
            li01.extend(duration_fee[k])
        elif day <=5:
            li25.extend(duration_fee[k])
        elif day <=9:
            li69.extend(duration_fee[k])
        elif day <=60:
            li60.extend(duration_fee[k])
        else:
            li100.extend(duration_fee[k])
    n_duration_fee[1] = len(li01)
    n_duration_fee[25] = len(li25)
    n_duration_fee[69] = len(li69)
    n_duration_fee[60] = len(li60)
    n_duration_fee[100] = len(li100)
    # 计算平均费用
    a_duration_fee[1] = np.mean(li01)
    a_duration_fee[25] = np.mean(li25)
    a_duration_fee[69] = np.mean(li69)
    a_duration_fee[60] = np.mean(li60)
    a_duration_fee[100] = np.mean(li100)
    sort_level_duration_fee = dict(sorted(a_duration_fee.items(), key=lambda x: x[0]))
    '''
    x1 = list(sort_avg_fee.keys())
    x2 = ['0-1','2-5','6-9','10-60','>=60']
    y1 = list(sort_avg_fee.values())
    y2 = list(sort_level_duration_fee.values())
    plt.title('Relationship between hospital-time and medical fee')
    plt.xlabel('day range')
    plt.ylabel('medical fee')
    # plt.xticks([])
    plt.scatter(x1, y1, alpha=0.9)  # 绘制散点图,透明度为0.6(这样颜色浅一点,比较好看)
    plt.figure()
    plt.title('Relationship between hospital-time and population')
    plt.xlabel('day range')
    plt.ylabel('population')
    for a,b in zip(x2,y2):  
        plt.text(a, b+0.05, '%.0f' % b, ha='center', va= 'bottom',fontsize=11)
    plt.bar(x2, y2)
    plt.show()
    '''
    x3 = ['0-1','2-5','6-9','10-60','>=60']
    y3 = list(sort_level_duration_fee.values())
    plt.title('Relationship between hospital-time and medical fee')
    plt.xlabel('day range')
    plt.ylabel('medical fee')
    for a,b in zip(x3,y3):  
        plt.text(a, b+0.05, '%.0f' % b, ha='center', va= 'bottom',fontsize=11)
    plt.bar(x3, y3)
    plt.show()
    print()

14.png


15.png

(6)住院时长与费用的关系

16.png

17.png

更新的住院时长与费用的关系,除以了区间的天数
(7)有无并发症与费用的关系,有无并发症与人口数量的关系

def complication_static(data):
    com_fee ={}
    datalen = len(data)
    serious =[]
    general =[]
    non = []
    for i in range(0,datalen):
        drgsid = ''.join(data['drgsid'][i]).strip()
        lastid = int(drgsid[-1])
        if lastid==1:
            serious.append(data['fee'][i])
        elif lastid==3:
            general.append(data['fee'][i])
        elif lastid==5:
            non.append(data['fee'][i])
        else:
            continue
        com_fee['serious'] = serious
        com_fee['general'] = general
        com_fee['non'] = non
    # 计算平均费用
    avg_com_fee ={}
    for k in com_fee.keys():
        avg = np.mean(list(com_fee[k]))
        avg_com_fee[k] = avg
    n_com_fee={}
    n_com_fee['serious'] = len(com_fee['serious'])
    n_com_fee['general'] = len(com_fee['general'])
    n_com_fee['non'] = len(com_fee['non'])
    x = ['serious','general','non']
    y1 = list(avg_com_fee.values())
    y2 = list(n_com_fee.values())
    plt.figure()
    plt.title('Relationship between complication and medical fee')
    plt.xlabel('complication')
    plt.ylabel('medical fee')
    for a,b in zip(x,y1):  
        plt.text(a, b+0.05, '%.0f' % b, ha='center', va= 'bottom',fontsize=11)
    plt.bar(x, y1)
    plt.figure()
    plt.title('Relationship between complication and population')
    plt.xlabel('complication')
    plt.ylabel('population')
    for a,b in zip(x,y2):  
        plt.text(a, b+0.05, '%.0f' % b, ha='center', va= 'bottom',fontsize=11)
    plt.bar(x, y2)
    plt.show()
    print()

18.png


19.png

总结:
(1)费用与年龄呈非线性关系,年龄越大,平均费用越高
(2)男性比女性人数多,男性比女性平均费用高
(3)并发症分为三种,严重、一般、无,根据数据分析发现,一般的并发症费用较低,严重的并发症费用最高,得一般并发症的人数最多。
(4)住院时长与费用呈线性关系,住院费用随着住院时长而线性增长。

目录
相关文章
|
3月前
|
数据采集 存储 数据挖掘
【优秀python数据分析案例】基于Python书旗网小说网站数据采集与分析的设计与实现
本文介绍了一个基于Python的书旗网小说网站数据采集与分析系统,通过自动化爬虫收集小说数据,利用Pandas进行数据处理,并通过Matplotlib和Seaborn等库进行数据可视化,旨在揭示用户喜好和市场趋势,为图书出版行业提供决策支持。
312 6
【优秀python数据分析案例】基于Python书旗网小说网站数据采集与分析的设计与实现
|
1月前
|
数据挖掘 UED
ChatGPT数据分析——探索性分析
ChatGPT数据分析——探索性分析
|
1月前
|
数据可视化 数据挖掘 数据处理
ChatGPT数据分析应用——热力图分析
ChatGPT数据分析应用——热力图分析
|
1月前
|
数据挖掘
ChatGPT在常用的数据分析方法中的应用(分组分析)
ChatGPT在常用的数据分析方法中的应用(分组分析)
|
1月前
|
机器学习/深度学习 数据采集 数据可视化
如何理解数据分析及数据的预处理,分析建模,可视化
如何理解数据分析及数据的预处理,分析建模,可视化
49 0
|
1月前
|
数据挖掘
ChatGPT在常用的数据分析方法中的应用(对比分析)
ChatGPT在常用的数据分析方法中的应用(对比分析)
|
2月前
|
机器学习/深度学习 人工智能 数据挖掘
数据分析师是在多个行业中专门从事数据搜集、整理和分析的专业人员
数据分析师是在多个行业中专门从事数据搜集、整理和分析的专业人员
38 3
|
3月前
|
数据采集 数据可视化 数据挖掘
【2021 年 MathorCup 高校数学建模挑战赛—赛道A二手车估价问题】1 数据分析及可视化
介绍了2021年MathorCup高校数学建模挑战赛赛道A的二手车估价问题,包括数据的读取、宏观查看、缺失值和异常值的检查、数据分布和相关性的分析,以及特征类别的统计,为建立二手车估价模型提供了数据预处理和分析的基础。
52 5
|
3月前
|
前端开发 Java JSON
Struts 2携手AngularJS与React:探索企业级后端与现代前端框架的完美融合之道
【8月更文挑战第31天】随着Web应用复杂性的提升,前端技术日新月异。AngularJS和React作为主流前端框架,凭借强大的数据绑定和组件化能力,显著提升了开发动态及交互式Web应用的效率。同时,Struts 2 以其出色的性能和丰富的功能,成为众多Java开发者构建企业级应用的首选后端框架。本文探讨了如何将 Struts 2 与 AngularJS 和 React 整合,以充分发挥前后端各自优势,构建更强大、灵活的 Web 应用。
58 0