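ML / NB: Text classification on a news dataset using a pure statistical method, kNN, naive Bayes (Gaussian / multivariate Bernoulli / multinomial), linear discriminant analysis (LDA), and the perceptron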

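Summary: classify the articles of a news text dataset by category with a pure statistical method, kNN, naive Bayes (Gaussian / multivariate Bernoulli / multinomial), linear discriminant analysis (LDA), and the perceptron.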

Design approach

[Figure: design approach; image not preserved]


Output

F:\Program Files\Python\Python36\lib\site-packages\gensim\utils.py:1209: UserWarning: detected Windows; aliasing chunkize to chunkize_serial

 warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 1293 entries, 0 to 1292

Data columns (total 6 columns):

#   Column      Non-Null Count  Dtype

---  ------      --------------  -----

0   Unnamed: 0  1293 non-null   int64

1   content     1292 non-null   object

2   id          1293 non-null   int64

3   tags        1293 non-null   object

4   time        1293 non-null   object

5   title       1293 non-null   object

dtypes: int64(2), object(4)

memory usage: 60.7+ KB

None

  Unnamed: 0                                            content  \

0           0   牵动人心的雄安新区规划细节内容和出台时间表敲定。日前,北京商报记者从业内获悉,京津冀协同发...  

1           1  去年以来,多个城市先后发布了多项楼市调控政策。在限购、限贷甚至限售的政策“组合拳”下,房地产...  

2           2  在今年中国国际自行车展上,上海凤凰自行车总裁王朝阳表示,共享单车的到来把我们打懵了,影响更是...  

3           3  25家上市银行迎来了一年一度的“分红季”,21世纪经济报道记者根据公开信息梳理发现,25家银...  

4           4  说起卷饼,大家其实并不陌生,这个来自中原的传统美食,发展至今也衍生出各种各样的种类,卷边的制...  

                   id                                  tags  \

0  6428905748545732865   ['财经', '白洋淀', '城市规划', '徐匡迪', '太行山']  

1  6428954136200855810   ['财经', '碧桂园', '万科集团', '投资', '广州恒大']  

2  6420576443738784002    ['财经', '自行车', '凤凰', '王朝阳', '汽车展览']  

3  6429007290541031681  ['财经', '银行', '工商银行', '兴业银行', '交通银行']  

4  6397481672254619905     ['财经', '小吃', '装修', '市场营销', '手工艺']  

                 time                   title  

0  2017-06-07 22:52:55  雄安新区规划“骨架”敲定,方案有望9月底出炉  

1  2017-06-08 08:01:13       “红五月”不红 房企资金链压力攀升  

2  2017-05-16 12:03:00      凤凰自行车总裁:共享单车把我们打懵了  

3  2017-06-08 07:00:00    25家银行分红季派出3536亿“大红包”  

4  2017-03-15 07:03:22      五万以下的小本餐饮项目,卷饼赚钱最稳  

chinese_pattern re.compile('[\\u4e00-\\u9fff]+')
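The `chinese_pattern` printed above matches runs of characters in the range U+4E00 to U+9FFF, so digits, Latin letters, and punctuation are dropped before word segmentation. A minimal sketch of that filtering step follows; the helper name `extract_chinese` is an illustrative assumption, not the author's code, and the sample string is one of the titles from the data preview.

```python
import re

# Same pattern as in the log: one or more characters in U+4E00..U+9FFF.
chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')


def extract_chinese(text):
    """Return the runs of Chinese characters found in text (hypothetical helper)."""
    return chinese_pattern.findall(text)


sample = "25家银行分红季派出3536亿“大红包”"
chinese_runs = extract_chinese(sample)
print(chinese_runs)
```

Each returned run can then be passed to a segmenter such as jieba, which the log shows being loaded right after this step.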

Building prefix dict from F:\File_Jupyter\实用代码\naive_bayes(简单贝叶斯)\jieba_dict\dict.txt.big ...

Loading model from cache C:\Users\niu\AppData\Local\Temp\jieba.ue3752d4e13420d2dc6b66831a5a4ab13.cache

Loading model cost 1.326 seconds.

Prefix dict has been built succesfully.

dictionary

<class 'gensim.corpora.dictionary.Dictionary'> Dictionary(46351 unique tokens: ['一个', '一个个', '一举一动', '一些', '一体']...)

<class 'method'> <bound method Dictionary.doc2bow of <gensim.corpora.dictionary.Dictionary object at 0x000001BDC62291D0>>
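The `Dictionary` above maps each of the 46351 unique tokens to an integer id, and `doc2bow` turns a token list into sparse `(token_id, count)` pairs like those shown in the `corpus` column. The pure-Python stand-in below illustrates the idea without gensim; `build_vocab` and `doc_to_bow` are hypothetical names, and gensim's actual id assignment may differ.

```python
from collections import Counter


def build_vocab(docs):
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def doc_to_bow(tokens, vocab):
    """Return sorted (token_id, count) pairs, skipping out-of-vocabulary tokens."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())


docs = [["自行车", "凤凰", "自行车", "总裁"], ["银行", "分红", "银行"]]
vocab = build_vocab(docs)
bows = [doc_to_bow(d, vocab) for d in docs]
print(bows)
```

The sparse form stores only the tokens that occur in a document, which matters when the vocabulary has tens of thousands of entries.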

F:\Program Files\Python\Python36\lib\site-packages\numpy\core\_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

 return array(a, dtype, copy=False, order=order)

  Unnamed: 0                                            content  \

0           0   牵动人心的雄安新区规划细节内容和出台时间表敲定。日前,北京商报记者从业内获悉,京津冀协同发...  

1           1  去年以来,多个城市先后发布了多项楼市调控政策。在限购、限贷甚至限售的政策“组合拳”下,房地产...  

2           2  在今年中国国际自行车展上,上海凤凰自行车总裁王朝阳表示,共享单车的到来把我们打懵了,影响更是...  

                   id                                 tags  \

0  6428905748545732865  ['财经', '白洋淀', '城市规划', '徐匡迪', '太行山']  

1  6428954136200855810  ['财经', '碧桂园', '万科集团', '投资', '广州恒大']  

2  6420576443738784002   ['财经', '自行车', '凤凰', '王朝阳', '汽车展览']  

                 time                   title  \

0  2017-06-07 22:52:55  雄安新区规划“骨架”敲定,方案有望9月底出炉  

1  2017-06-08 08:01:13       “红五月”不红 房企资金链压力攀升  

2  2017-05-16 12:03:00      凤凰自行车总裁:共享单车把我们打懵了  

                                          doc_words  \

0  [牵动人心, 雄安, 新区, 规划, 细节, 内容, 出台, 时间表, 敲定, 日前, 北京...  

1  [去年, 以来, 多个, 城市, 先后, 发布, 多项, 楼市, 调控, 政策, 限购, 限...  

2  [今年, 中国, 国际, 自行车, 展上, 上海, 凤凰, 自行车, 总裁, 王, 朝阳, ...  

                                             corpus  \

0  [(0, 6), (1, 1), (2, 1), (3, 3), (4, 2), (5, 2...  

1  [(0, 1), (3, 3), (13, 1), (17, 1), (41, 1), (5...  

2  [(15, 1), (53, 1), (167, 1), (262, 1), (396, 1...  

                                              tfidf  

0  [(0, 0.005554342859788116), (1, 0.007470250835...  

1  [(0, 0.002081356679198299), (3, 0.012288034179...  

2  [(15, 0.057457146244872616), (53, 0.0543395377...  
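The `tfidf` column reweights each `(token_id, count)` pair so that tokens occurring in many documents count for less. The sketch below uses one common scheme: term count times log2(N / df), followed by L2 normalisation per document. As far as I know this matches the defaults of gensim's `TfidfModel`, but treat that as an assumption; `tfidf_weights` and the toy corpus are made up for illustration.

```python
import math


def tfidf_weights(bows):
    """Weight (token_id, count) pairs by count * log2(N / df), then L2-normalise."""
    n_docs = len(bows)
    # Document frequency: in how many documents each token id appears.
    df = {}
    for bow in bows:
        for tid, _ in bow:
            df[tid] = df.get(tid, 0) + 1

    weighted = []
    for bow in bows:
        vec = [(tid, cnt * math.log2(n_docs / df[tid])) for tid, cnt in bow]
        # Tokens present in every document get weight 0 and are dropped.
        vec = [(tid, w) for tid, w in vec if w > 0.0]
        norm = math.sqrt(sum(w * w for _, w in vec))
        weighted.append([(tid, w / norm) for tid, w in vec] if norm else [])
    return weighted


bows = [[(0, 2), (1, 1)], [(0, 1), (2, 3), (3, 1)], [(0, 1), (1, 1)]]
tfidf = tfidf_weights(bows)
print(tfidf)
```

Token 0 appears in all three toy documents, so it carries no discriminative weight and disappears from every vector.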

after abs 4.7683716e-07

foo: (1293, 1293)

dis2TSNE_Visual:  (1293, 2)

{'养生': 0, '科技': 1, '财经': 2, '游戏': 3, '育儿': 4, '汽车': 5}

data_frame.keyword_index: 1    379

2    287

5    283

4    148

3    141

0     55

Name: keyword_index, dtype: int64
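The mapping above assigns an integer `keyword_index` to each of the six categories: health (养生), technology (科技), finance (财经), games (游戏), parenting (育儿), and automobiles (汽车). Judging from the sample rows, the category is the first entry of each stringified `tags` list. The sketch below relies on that assumption, and `label_of` is a hypothetical helper.

```python
import ast
from collections import Counter

category_to_index = {'养生': 0, '科技': 1, '财经': 2, '游戏': 3, '育儿': 4, '汽车': 5}


def label_of(tags_str):
    """Parse the stringified tag list and map its first tag to a class index."""
    tags = ast.literal_eval(tags_str)
    return category_to_index[tags[0]]


# Three `tags` values copied from the data previews in this post.
rows = [
    "['财经', '白洋淀', '城市规划', '徐匡迪', '太行山']",
    "['科技', '华硕', '华硕ZenFone', '台湾', '手机']",
    "['游戏', '猴子', '王者荣耀', '黄忠', '游戏']",
]
labels = [label_of(r) for r in rows]
label_counts = Counter(labels)
print(labels, label_counts)
```

`ast.literal_eval` is used because the CSV stores each tag list as a string rather than as a Python list.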

  Unnamed: 0                                            content  \

0           0   牵动人心的雄安新区规划细节内容和出台时间表敲定。日前,北京商报记者从业内获悉,京津冀协同发...  

1           1  去年以来,多个城市先后发布了多项楼市调控政策。在限购、限贷甚至限售的政策“组合拳”下,房地产...  

2           2  在今年中国国际自行车展上,上海凤凰自行车总裁王朝阳表示,共享单车的到来把我们打懵了,影响更是...  

                   id                                 tags  \

0  6428905748545732865  ['财经', '白洋淀', '城市规划', '徐匡迪', '太行山']  

1  6428954136200855810  ['财经', '碧桂园', '万科集团', '投资', '广州恒大']  

2  6420576443738784002   ['财经', '自行车', '凤凰', '王朝阳', '汽车展览']  

                 time                   title  \

0  2017-06-07 22:52:55  雄安新区规划“骨架”敲定,方案有望9月底出炉  

1  2017-06-08 08:01:13       “红五月”不红 房企资金链压力攀升  

2  2017-05-16 12:03:00      凤凰自行车总裁:共享单车把我们打懵了  

                                          doc_words  \

0  [牵动人心, 雄安, 新区, 规划, 细节, 内容, 出台, 时间表, 敲定, 日前, 北京...  

1  [去年, 以来, 多个, 城市, 先后, 发布, 多项, 楼市, 调控, 政策, 限购, 限...  

2  [今年, 中国, 国际, 自行车, 展上, 上海, 凤凰, 自行车, 总裁, 王, 朝阳, ...  

                                             corpus  \

0  [(0, 6), (1, 1), (2, 1), (3, 3), (4, 2), (5, 2...  

1  [(0, 1), (3, 3), (13, 1), (17, 1), (41, 1), (5...  

2  [(15, 1), (53, 1), (167, 1), (262, 1), (396, 1...  

                                              tfidf   visual01   visual02  \

0  [(0, 0.005554342859788116), (1, 0.007470250835... -65.903542 -14.433964  

1  [(0, 0.002081356679198299), (3, 0.012288034179... -29.659267 -14.811647  

2  [(15, 0.057457146244872616), (53, 0.0543395377... -22.118195 -48.148167  

  keyword_index  

0              2  

1              2  

2              2  

Childcare,label_category_ID_pos.tfidf)[:20]: ['孩子', '家长', '教育', '学习', '男孩子', '成绩', '爸爸', '分享', '帮助', '方法', '小学', '数学', '交流', '男孩', '妈妈', '成长', '父母', '懂', '免费', '翼航']

Childcare,label_category_ID_neg.tfidf)[:20]: []
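The 20 words listed for the Childcare (育儿) class look like the highest-weighted terms within that class. One plausible way to obtain such a list is to sum the TF-IDF weights of every document in the class and keep the `k` largest totals. This is a guess at the approach rather than the author's code, and `top_tokens` is a hypothetical name.

```python
import heapq
from collections import defaultdict


def top_tokens(tfidf_docs, k):
    """Sum TF-IDF weights per token id over a class and return the k heaviest ids."""
    totals = defaultdict(float)
    for doc in tfidf_docs:
        for tid, weight in doc:
            totals[tid] += weight
    # Rank by total weight; break ties in favour of the smaller token id.
    best = heapq.nlargest(k, totals.items(), key=lambda kv: (kv[1], -kv[0]))
    return [tid for tid, _ in best]


# Toy TF-IDF vectors for the documents of one class.
class_docs = [[(0, 0.1), (5, 0.8)], [(5, 0.5), (7, 0.6)], [(0, 0.2), (7, 0.1)]]
hot = top_tokens(class_docs, 2)
print(hot)
```

Mapping the returned ids back through the dictionary would give readable words like those printed above.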

train_index MatrixSimilarity<646 docs, 46329 features>

hot_words shape: 6 300

{0: {1536, 7681, 17410, 17411, 17415, 6664, 17420, 15886, 4623, 17935, 4625, 5139, 4631, 17916, 17437, 544, 16422, 5671, 1065, 4650, 4651, 4653, 4690, 16943, 4657, 17458, 15921, 51, 7222, 17464, 17465, 10299, 15932, 64, 6209, 66, 17474, 4680, 8264, 8266, 40008, 6730, 8273, 6738, 5203, 5206, 18005, 15958, 597, 15960, 18009, 7258, 4697, 7260, 16989, 3674, 91, 87, 16993, 18020, 616, 4714, 5228, 40044, 1646, 4720, 3185, 15986, 34928, 5236, 113, 34936, 6777, 126, 15999, 127, 4737, 40067, 5252, 643, 4739, 13444, 8840, 1157, 133, 4749, 3219, 10388, 17562, 5278, 46239, 5287, 3751, 167, 680, 6827, 4784, 16048, 16050, 180, 46260, 16054, 6839, 4792, 2743, 4789, 17083, 16060, 4790, 16062, 43200, 5315, 46276, 46279, 17098, 6860, 5836, 16081, 43219, 1237, 1750, 15575, 8921, 2266, 6877, 12511, 12512, 21216, 226, 4834, 6884, 16101, 4838, 742, 2280, 2281, 227, 7915, 6886, 6893, 2798, 6894, 5870, 4849, 242, 1779, 4852, 21215, 44791, 4864, 3329, 258, 4865, 4866, 44805, 4877, 21264, 4882, 274, 8986, 8987, 796, 32029, 4382, 21277, 4896, 1825, 801, 3363, 36644, 1830, 4393, 36138, 303, 815, 4401, 12594, 21299, 7986, 820, 310, 1337, 21307, 4411, 317, 33598, 5953, 17730, 5954, 10050, 17733, 17734, 25927, 21320, 17739, 4939, 21324, 4942, 33615, 6885, 16210, 6071, 18261, 5976, 860, 16740, 16745, 2922, 4969, 17263, 6512, 33649, 16242, 2419, 17775, 373, 1398, 880, 1916, 17276, 16255, 1920, 43394, 3974, 4999, 396, 8080, 16788, 18325, 1942, 16279, 1433, 43418, 36252, 17311, 43425, 16802, 7585, 15959, 7594, 36268, 4525, 7597, 5551, 6063, 36272, 36275, 4533, 16309, 18358, 36280, 1465, 441, 7611, 16825, 16829, 4538, 2488, 2495, 8129, 4545, 4547, 16836, 4549, 7621, 1484, 1997, 11214, 1999, 16846, 16847, 4563, 7636, 14293, 7638, 4567, 16855, 17369, 16861, 478, 16351, 18400, 17377, 993, 9699, 5085, 6111, 7645, 6119, 6124, 17903, 1011, 4597, 6646, 16376, 6138, 16891, 16892, 7165, 4606}, 1: {0, 3, 11785, 2569, 32779, 9227, 526, 21519, 530, 4116, 533, 11805, 2590, 2591, 3105, 7203, 1571, 8740, 1574, 
12836, 1062, 1577, 2553, 4654, 1071, 2094, 30257, 51, 30260, 53, 28213, 24633, 1082, 1087, 68, 8779, 78, 12367, 11859, 2647, 91, 13916, 13917, 15455, 608, 9825, 1634, 12387, 13412, 613, 12391, 28267, 12396, 109, 9836, 12399, 11884, 12401, 12400, 12403, 627, 117, 629, 9847, 628, 17020, 637, 9855, 639, 12418, 643, 1668, 133, 3715, 14470, 1160, 12424, 11912, 9867, 33420, 10376, 655, 12433, 148, 150, 3735, 1176, 12440, 154, 21659, 1180, 3742, 10399, 11936, 1185, 31904, 675, 13472, 167, 1704, 7337, 11946, 171, 172, 8876, 8878, 2734, 1200, 1709, 2226, 8877, 180, 1155, 697, 12475, 189, 8894, 1215, 1218, 4291, 708, 709, 3271, 2760, 6354, 2771, 1748, 213, 3798, 727, 730, 20187, 44767, 225, 2786, 2787, 13028, 1765, 1254, 13543, 26344, 740, 11497, 1771, 3819, 13549, 11502, 751, 1775, 752, 242, 21743, 12524, 759, 11511, 2809, 2812, 35581, 257, 8962, 771, 259, 15623, 1288, 3849, 12048, 1810, 786, 788, 3862, 793, 7450, 798, 24862, 7458, 12579, 31524, 31523, 7459, 1322, 810, 25391, 12081, 1329, 820, 3386, 1850, 9023, 319, 835, 9029, 325, 4424, 330, 12107, 13134, 846, 3409, 3924, 1878, 854, 344, 11609, 5978, 1883, 11612, 343, 11615, 358, 4457, 362, 875, 1385, 1900, 4462, 3439, 12144, 369, 3438, 1396, 38773, 28025, 2428, 13305, 13183, 12161, 12674, 1922, 34690, 2438, 1926, 13193, 907, 9100, 911, 13204, 1431, 10135, 2456, 44956, 925, 413, 32670, 1952, 928, 23455, 5540, 1956, 1447, 12200, 1448, 1452, 8109, 12205, 1965, 9651, 2486, 5559, 1464, 956, 1982, 959, 3522, 12235, 976, 3025, 10194, 1491, 12244, 465, 30675, 5585, 472, 470, 10714, 475, 3027, 478, 1503, 479, 5089, 483, 2532, 995, 9190, 5607, 1512, 1513, 9703, 10728, 494, 1518, 1520, 2545, 1007, 1524, 501, 503, 1017, 1534}, 2: {0, 3, 520, 1547, 12300, 2062, 3599, 1040, 26641, 18, 25616, 2577, 13846, 2583, 4121, 25114, 1051, 1052, 25629, 1054, 1567, 2591, 3105, 3616, 4126, 1060, 4125, 1062, 1063, 26663, 1577, 13863, 1066, 1580, 45, 1071, 51, 3123, 53, 2614, 3125, 1082, 2622, 66, 2627, 11843, 1093, 1606, 1605, 3651, 3146, 1100, 
26701, 1614, 1102, 592, 3577, 35410, 2639, 2644, 3159, 25688, 1626, 91, 3162, 1119, 608, 21089, 1634, 102, 2662, 31848, 2665, 11881, 27242, 12907, 1131, 1132, 15388, 2672, 3185, 1138, 627, 43124, 2675, 113, 1657, 2682, 3194, 127, 3715, 1668, 133, 3717, 135, 2696, 3209, 1162, 1158, 1676, 2701, 11916, 1167, 138, 1169, 148, 2710, 1174, 152, 1177, 22167, 26779, 21659, 157, 158, 1183, 30880, 1185, 26784, 2209, 2724, 3232, 672, 167, 4256, 8876, 685, 4269, 1202, 2226, 691, 1205, 3253, 1207, 2231, 2242, 4291, 14026, 27340, 1740, 1231, 14032, 24273, 3284, 1749, 213, 727, 217, 730, 2266, 14044, 1246, 1248, 225, 1254, 742, 745, 3819, 14060, 12013, 750, 1775, 242, 1780, 1268, 759, 760, 249, 33536, 1281, 261, 262, 2311, 1290, 267, 37132, 5902, 1810, 7958, 39191, 280, 793, 43813, 1318, 807, 295, 45354, 1324, 28461, 1838, 28462, 815, 1329, 820, 1333, 317, 2366, 39743, 832, 2365, 45378, 835, 330, 1356, 845, 334, 1359, 4433, 4438, 854, 14168, 1370, 1883, 1372, 1371, 860, 863, 3935, 3937, 1378, 11618, 3426, 870, 358, 3942, 361, 874, 362, 875, 28010, 3438, 2416, 369, 880, 14196, 886, 4472, 1403, 894, 895, 2432, 385, 904, 905, 27528, 907, 909, 911, 1431, 409, 1433, 925, 1950, 415, 928, 413, 13731, 3494, 20902, 937, 1452, 942, 1968, 1973, 1464, 1977, 956, 34240, 3009, 32706, 14278, 3015, 456, 1993, 973, 975, 976, 465, 466, 1491, 14290, 2512, 1494, 472, 475, 480, 3554, 995, 2532, 3048, 1513, 23529, 3564, 494, 498, 500, 501, 503, 1017, 3070}, 3: {1536, 0, 10242, 3, 37889, 1029, 10248, 2569, 9740, 9745, 10770, 17938, 2577, 10257, 9238, 3094, 9752, 9751, 9754, 30235, 9243, 18425, 9246, 2590, 24096, 9249, 9250, 9251, 4643, 10272, 9252, 5666, 3616, 3625, 4133, 4136, 1071, 9264, 4657, 51, 9267, 22583, 10808, 40504, 10304, 6210, 3650, 37444, 68, 9284, 6731, 9293, 31823, 2133, 9303, 601, 91, 43615, 608, 9314, 10338, 25709, 1646, 10349, 6257, 7794, 27763, 11381, 9337, 7801, 637, 3709, 639, 11391, 9345, 7299, 3715, 1668, 41606, 11401, 11402, 4233, 9868, 10893, 142, 5259, 9872, 25744, 25741, 148, 
10389, 34455, 3735, 8345, 8857, 154, 10396, 1178, 7839, 10399, 8554, 1704, 10409, 9900, 10412, 2734, 14512, 10416, 7858, 9394, 9904, 6325, 2232, 1721, 38589, 8894, 6336, 1220, 9925, 11461, 3271, 9420, 719, 14544, 2773, 3286, 3287, 214, 20187, 9438, 26335, 6048, 13534, 226, 3811, 19172, 1766, 2280, 36585, 14575, 2801, 9457, 10993, 10485, 23797, 759, 27896, 5882, 8443, 23803, 1790, 767, 8962, 9476, 7433, 6924, 2316, 2318, 3853, 14608, 4371, 9494, 8983, 6425, 793, 362, 6433, 7458, 2339, 810, 1835, 8493, 6447, 1329, 28466, 44855, 9527, 1338, 10044, 317, 3390, 10047, 41280, 31554, 2372, 9029, 11592, 9547, 3916, 9042, 10066, 3925, 343, 10072, 5978, 860, 8030, 10079, 10593, 9572, 2916, 9061, 3430, 6501, 4969, 10089, 30571, 10603, 11117, 9582, 10607, 6505, 14193, 28529, 14707, 7197, 369, 11639, 23929, 894, 1919, 3459, 11652, 2438, 10631, 907, 10642, 9109, 2454, 14743, 2456, 29594, 11164, 6559, 9631, 3999, 1951, 14754, 14756, 31653, 9638, 31654, 33704, 45984, 3500, 31661, 1453, 1455, 9645, 9649, 41394, 9651, 9652, 10165, 30718, 2999, 31672, 1982, 9662, 44483, 11205, 2505, 5581, 10704, 465, 977, 31699, 9172, 4053, 9174, 31703, 4567, 470, 10714, 475, 5076, 478, 480, 23008, 9186, 30692, 9190, 9703, 10216, 491, 30699, 1005, 2542, 31726, 1007, 494, 25586, 10222, 18417, 10736, 8178, 3064, 1529, 509, 1534}, 4: {0, 5121, 4098, 3, 3078, 7175, 1543, 1545, 22027, 5131, 14, 4623, 4625, 22547, 533, 2588, 2590, 1570, 4643, 2597, 5669, 5159, 6183, 2602, 45, 6702, 18937, 5168, 5169, 48, 4657, 3063, 51, 1590, 12343, 5686, 5689, 2105, 1586, 5175, 5694, 6721, 68, 2630, 29767, 29778, 4692, 2133, 5204, 6740, 601, 7258, 91, 5722, 5214, 4703, 608, 3679, 2143, 101, 6758, 5224, 616, 7277, 2158, 4723, 5236, 6267, 1660, 637, 639, 4737, 4739, 5252, 133, 1668, 4606, 23688, 5768, 17035, 2188, 5772, 38034, 5779, 3220, 6805, 2199, 1688, 5273, 154, 155, 1694, 4767, 5280, 5278, 5284, 1191, 1704, 167, 3754, 5802, 5290, 3751, 3247, 5296, 3257, 5818, 5823, 3265, 708, 5318, 5830, 4294, 1738, 5841, 5330, 4825, 
4316, 734, 6369, 5349, 4838, 4326, 2280, 4329, 46315, 6380, 29660, 44269, 5871, 5873, 242, 7927, 759, 760, 2812, 1277, 8448, 3329, 4866, 2304, 4869, 5382, 7430, 3848, 3339, 2318, 782, 3857, 5906, 26513, 788, 2841, 7450, 4382, 1825, 7458, 801, 37156, 4393, 810, 7979, 3886, 815, 4911, 4401, 7986, 1329, 820, 5942, 3896, 8506, 2874, 317, 5441, 835, 5445, 5958, 6578, 5964, 5965, 4942, 8016, 8024, 344, 4952, 860, 1884, 29533, 8545, 8037, 3430, 6504, 7017, 2922, 4457, 362, 5998, 2928, 373, 374, 2935, 1398, 8057, 6011, 6015, 32127, 384, 4994, 8579, 4996, 8072, 396, 6541, 5006, 6540, 5009, 1938, 1427, 7571, 2965, 1942, 6039, 1940, 7574, 2970, 409, 7068, 7575, 8606, 5014, 5018, 7585, 5017, 6561, 7588, 1447, 3497, 6058, 5547, 1965, 6065, 4529, 21939, 4531, 6069, 5043, 5559, 7096, 1465, 6074, 3515, 4533, 6077, 5054, 7103, 448, 6080, 6076, 4547, 8132, 4552, 4555, 1484, 39372, 39374, 4561, 6611, 5078, 470, 1496, 5081, 472, 7131, 4572, 7133, 5598, 5086, 4576, 4577, 6111, 478, 4580, 1508, 480, 1503, 5096, 1506, 4584, 23019, 493, 494, 498, 5108, 18935, 1529, 6138, 7163, 10238, 5119}, 5: {0, 14849, 512, 3, 11266, 14853, 2053, 23047, 1527, 2569, 15370, 14861, 13, 19471, 2577, 11793, 14867, 18423, 533, 15384, 14875, 15388, 11807, 15396, 4132, 1574, 14890, 14893, 14896, 14897, 1586, 51, 1590, 14911, 1088, 15429, 14406, 23111, 16968, 14921, 14925, 16461, 14929, 15442, 8789, 14934, 2647, 3161, 7770, 91, 14940, 9308, 14937, 14943, 608, 6755, 1124, 13924, 14950, 5219, 14947, 9325, 3697, 14961, 11893, 14968, 12408, 15485, 637, 5247, 1668, 1157, 23172, 647, 15492, 15498, 5773, 19087, 13969, 9362, 15506, 1681, 148, 11926, 1176, 2713, 155, 1180, 15517, 1692, 20124, 10401, 19105, 675, 674, 19109, 167, 1704, 11946, 15019, 12458, 1709, 682, 9091, 2224, 15025, 20656, 176, 180, 7858, 12982, 15031, 15543, 41136, 14013, 2239, 1729, 708, 9413, 21700, 712, 15562, 15051, 2765, 15057, 15061, 9942, 15063, 21718, 22747, 15068, 15069, 32475, 13535, 15583, 15074, 227, 19683, 2789, 1766, 13542, 13036, 2799, 
752, 3312, 13552, 242, 26867, 1268, 15618, 759, 2809, 763, 28924, 2812, 10495, 2817, 2818, 14083, 769, 259, 15622, 2823, 1288, 8962, 15109, 19720, 15629, 19213, 3345, 786, 788, 280, 25375, 2337, 15650, 804, 15653, 3366, 807, 2349, 15151, 7984, 1329, 21810, 820, 12602, 1338, 317, 11582, 5953, 2370, 835, 323, 15688, 1864, 15693, 854, 13142, 344, 15705, 4955, 860, 23899, 11615, 863, 15199, 15711, 13155, 15205, 872, 4457, 15722, 362, 15724, 875, 3438, 15215, 369, 883, 19828, 24437, 374, 29179, 9593, 19834, 15227, 894, 19326, 13186, 35203, 2436, 15749, 389, 19847, 15750, 19849, 2438, 1922, 6028, 909, 15752, 2446, 13200, 2448, 409, 21923, 9644, 14766, 22959, 14771, 23989, 12728, 9145, 14778, 14779, 3000, 12733, 7102, 3007, 9665, 14786, 12226, 2498, 14789, 8645, 15301, 15305, 15818, 461, 976, 5585, 977, 1489, 15358, 472, 1496, 42457, 2524, 478, 19422, 480, 15330, 15843, 20452, 26084, 6631, 14827, 492, 15343, 3571, 14836, 15348, 19446, 14839, 11765, 1017, 14843, 14844, 14846}}

word_bagNum shape: 6 50

{0: [1536, 7681, 17410, 17411, 17415, 6664, 17420, 15886, 4623, 17935, 4625, 5139, 4631, 17916, 17437, 544, 16422, 5671, 1065, 4650, 4651, 4653, 4690, 16943, 4657, 17458, 15921, 51, 7222, 17464, 17465, 10299, 15932, 64, 6209, 66, 17474, 4680, 8264, 8266, 40008, 6730, 8273, 6738, 5203, 5206, 18005, 15958, 597, 15960], 1: [0, 3, 11785, 2569, 32779, 9227, 526, 21519, 530, 4116, 533, 11805, 2590, 2591, 3105, 7203, 1571, 8740, 1574, 12836, 1062, 1577, 2553, 4654, 1071, 2094, 30257, 51, 30260, 53, 28213, 24633, 1082, 1087, 68, 8779, 78, 12367, 11859, 2647, 91, 13916, 13917, 15455, 608, 9825, 1634, 12387, 13412, 613], 2: [0, 3, 520, 1547, 12300, 2062, 3599, 1040, 26641, 18, 25616, 2577, 13846, 2583, 4121, 25114, 1051, 1052, 25629, 1054, 1567, 2591, 3105, 3616, 4126, 1060, 4125, 1062, 1063, 26663, 1577, 13863, 1066, 1580, 45, 1071, 51, 3123, 53, 2614, 3125, 1082, 2622, 66, 2627, 11843, 1093, 1606, 1605, 3651], 3: [1536, 0, 10242, 3, 37889, 1029, 10248, 2569, 9740, 9745, 10770, 17938, 2577, 10257, 9238, 3094, 9752, 9751, 9754, 30235, 9243, 18425, 9246, 2590, 24096, 9249, 9250, 9251, 4643, 10272, 9252, 5666, 3616, 3625, 4133, 4136, 1071, 9264, 4657, 51, 9267, 22583, 10808, 40504, 10304, 6210, 3650, 37444, 68, 9284], 4: [0, 5121, 4098, 3, 3078, 7175, 1543, 1545, 22027, 5131, 14, 4623, 4625, 22547, 533, 2588, 2590, 1570, 4643, 2597, 5669, 5159, 6183, 2602, 45, 6702, 18937, 5168, 5169, 48, 4657, 3063, 51, 1590, 12343, 5686, 5689, 2105, 1586, 5175, 5694, 6721, 68, 2630, 29767, 29778, 4692, 2133, 5204, 6740], 5: [0, 14849, 512, 3, 11266, 14853, 2053, 23047, 1527, 2569, 15370, 14861, 13, 19471, 2577, 11793, 14867, 18423, 533, 15384, 14875, 15388, 11807, 15396, 4132, 1574, 14890, 14893, 14896, 14897, 1586, 51, 1590, 14911, 1088, 15429, 14406, 23111, 16968, 14921, 14925, 16461, 14929, 15442, 8789, 14934, 2647, 3161, 7770, 91]}

after all_words, word_bag shape: 6 300

{0: [1536, 7681, 17410, 17411, 17415, 6664, 17420, 15886, 4623, 17935, 4625, 5139, 4631, 17916, 17437, 544, 16422, 5671, 1065, 4650, 4651, 4653, 4690, 16943, 4657, 17458, 15921, 51, 7222, 17464, 17465, 10299, 15932, 64, 6209, 66, 17474, 4680, 8264, 8266, 40008, 6730, 8273, 6738, 5203, 5206, 18005, 15958, 597, 15960, 0, 3, 11785, 2569, 32779, 9227, 526, 21519, 530, 4116, 533, 11805, 2590, 2591, 3105, 7203, 1571, 8740, 1574, 12836, 1062, 1577, 2553, 4654, 1071, 2094, 30257, 51, 30260, 53, 28213, 24633, 1082, 1087, 68, 8779, 78, 12367, 11859, 2647, 91, 13916, 13917, 15455, 608, 9825, 1634, 12387, 13412, 613, 0, 3, 520, 1547, 12300, 2062, 3599, 1040, 26641, 18, 25616, 2577, 13846, 2583, 4121, 25114, 1051, 1052, 25629, 1054, 1567, 2591, 3105, 3616, 4126, 1060, 4125, 1062, 1063, 26663, 1577, 13863, 1066, 1580, 45, 1071, 51, 3123, 53, 2614, 3125, 1082, 2622, 66, 2627, 11843, 1093, 1606, 1605, 3651, 1536, 0, 10242, 3, 37889, 1029, 10248, 2569, 9740, 9745, 10770, 17938, 2577, 10257, 9238, 3094, 9752, 9751, 9754, 30235, 9243, 18425, 9246, 2590, 24096, 9249, 9250, 9251, 4643, 10272, 9252, 5666, 3616, 3625, 4133, 4136, 1071, 9264, 4657, 51, 9267, 22583, 10808, 40504, 10304, 6210, 3650, 37444, 68, 9284, 0, 5121, 4098, 3, 3078, 7175, 1543, 1545, 22027, 5131, 14, 4623, 4625, 22547, 533, 2588, 2590, 1570, 4643, 2597, 5669, 5159, 6183, 2602, 45, 6702, 18937, 5168, 5169, 48, 4657, 3063, 51, 1590, 12343, 5686, 5689, 2105, 1586, 5175, 5694, 6721, 68, 2630, 29767, 29778, 4692, 2133, 5204, 6740, 0, 14849, 512, 3, 11266, 14853, 2053, 23047, 1527, 2569, 15370, 14861, 13, 19471, 2577, 11793, 14867, 18423, 533, 15384, 14875, 15388, 11807, 15396, 4132, 1574, 14890, 14893, 14896, 14897, 1586, 51, 1590, 14911, 1088, 15429, 14406, 23111, 16968, 14921, 14925, 16461, 14929, 15442, 8789, 14934, 2647, 3161, 7770, 91], 1: [1536, 7681, 17410, 17411, 17415, 6664, 17420, 15886, 4623, 17935, 4625, 5139, 4631, 17916, 17437, 544, 16422, 5671, 1065, 4650, 4651, 4653, 4690, 16943, 4657, 17458, 15921, 51, 
7222, 17464, 17465, 10299, 15932, 64, 6209, 66, 17474, 4680, 8264, 8266, 40008, 6730, 8273, 6738, 5203, 5206, 18005, 15958, 597, 15960, 0, 3, 11785, 2569, 32779, 9227, 526, 21519, 530, 4116, 533, 11805, 2590, 2591, 3105, 7203, 1571, 8740, 1574, 12836, 1062, 1577, 2553, 4654, 1071, 2094, 30257, 51, 30260, 53, 28213, 24633, 1082, 1087, 68, 8779, 78, 12367, 11859, 2647, 91, 13916, 13917, 15455, 608, 9825, 1634, 12387, 13412, 613, 0, 3, 520, 1547, 12300, 2062, 3599, 1040, 26641, 18, 25616, 2577, 13846, 2583, 4121, 25114, 1051, 1052, 25629, 1054, 1567, 2591, 3105, 3616, 4126, 1060, 4125, 1062, 1063, 26663, 1577, 13863, 1066, 1580, 45, 1071, 51, 3123, 53, 2614, 3125, 1082, 2622, 66, 2627, 11843, 1093, 1606, 1605, 3651, 1536, 0, 10242, 3, 37889, 1029, 10248, 2569, 9740, 9745, 10770, 17938, 2577, 10257, 9238, 3094, 9752, 9751, 9754, 30235, 9243, 18425, 9246, 2590, 24096, 9249, 9250, 9251, 4643, 10272, 9252, 5666, 3616, 3625, 4133, 4136, 1071, 9264, 4657, 51, 9267, 22583, 10808, 40504, 10304, 6210, 3650, 37444, 68, 9284, 0, 5121, 4098, 3, 3078, 7175, 1543, 1545, 22027, 5131, 14, 4623, 4625, 22547, 533, 2588, 2590, 1570, 4643, 2597, 5669, 5159, 6183, 2602, 45, 6702, 18937, 5168, 5169, 48, 4657, 3063, 51, 1590, 12343, 5686, 5689, 2105, 1586, 5175, 5694, 6721, 68, 2630, 29767, 29778, 4692, 2133, 5204, 6740, 0, 14849, 512, 3, 11266, 14853, 2053, 23047, 1527, 2569, 15370, 14861, 13, 19471, 2577, 11793, 14867, 18423, 533, 15384, 14875, 15388, 11807, 15396, 4132, 1574, 14890, 14893, 14896, 14897, 1586, 51, 1590, 14911, 1088, 15429, 14406, 23111, 16968, 14921, 14925, 16461, 14929, 15442, 8789, 14934, 2647, 3161, 7770, 91], 2: [1536, 7681, 17410, 17411, 17415, 6664, 17420, 15886, 4623, 17935, 4625, 5139, 4631, 17916, 17437, 544, 16422, 5671, 1065, 4650, 4651, 4653, 4690, 16943, 4657, 17458, 15921, 51, 7222, 17464, 17465, 10299, 15932, 64, 6209, 66, 17474, 4680, 8264, 8266, 40008, 6730, 8273, 6738, 5203, 5206, 18005, 15958, 597, 15960, 0, 3, 11785, 2569, 32779, 9227, 526, 21519, 530, 
4116, 533, 11805, 2590, 2591, 3105, 7203, 1571, 8740, 1574, 12836, 1062, 1577, 2553, 4654, 1071, 2094, 30257, 51, 30260, 53, 28213, 24633, 1082, 1087, 68, 8779, 78, 12367, 11859, 2647, 91, 13916, 13917, 15455, 608, 9825, 1634, 12387, 13412, 613, 0, 3, 520, 1547, 12300, 2062, 3599, 1040, 26641, 18, 25616, 2577, 13846, 2583, 4121, 25114, 1051, 1052, 25629, 1054, 1567, 2591, 3105, 3616, 4126, 1060, 4125, 1062, 1063, 26663, 1577, 13863, 1066, 1580, 45, 1071, 51, 3123, 53, 2614, 3125, 1082, 2622, 66, 2627, 11843, 1093, 1606, 1605, 3651, 1536, 0, 10242, 3, 37889, 1029, 10248, 2569, 9740, 9745, 10770, 17938, 2577, 10257, 9238, 3094, 9752, 9751, 9754, 30235, 9243, 18425, 9246, 2590, 24096, 9249, 9250, 9251, 4643, 10272, 9252, 5666, 3616, 3625, 4133, 4136, 1071, 9264, 4657, 51, 9267, 22583, 10808, 40504, 10304, 6210, 3650, 37444, 68, 9284, 0, 5121, 4098, 3, 3078, 7175, 1543, 1545, 22027, 5131, 14, 4623, 4625, 22547, 533, 2588, 2590, 1570, 4643, 2597, 5669, 5159, 6183, 2602, 45, 6702, 18937, 5168, 5169, 48, 4657, 3063, 51, 1590, 12343, 5686, 5689, 2105, 1586, 5175, 5694, 6721, 68, 2630, 29767, 29778, 4692, 2133, 5204, 6740, 0, 14849, 512, 3, 11266, 14853, 2053, 23047, 1527, 2569, 15370, 14861, 13, 19471, 2577, 11793, 14867, 18423, 533, 15384, 14875, 15388, 11807, 15396, 4132, 1574, 14890, 14893, 14896, 14897, 1586, 51, 1590, 14911, 1088, 15429, 14406, 23111, 16968, 14921, 14925, 16461, 14929, 15442, 8789, 14934, 2647, 3161, 7770, 91], 3: [1536, 7681, 17410, 17411, 17415, 6664, 17420, 15886, 4623, 17935, 4625, 5139, 4631, 17916, 17437, 544, 16422, 5671, 1065, 4650, 4651, 4653, 4690, 16943, 4657, 17458, 15921, 51, 7222, 17464, 17465, 10299, 15932, 64, 6209, 66, 17474, 4680, 8264, 8266, 40008, 6730, 8273, 6738, 5203, 5206, 18005, 15958, 597, 15960, 0, 3, 11785, 2569, 32779, 9227, 526, 21519, 530, 4116, 533, 11805, 2590, 2591, 3105, 7203, 1571, 8740, 1574, 12836, 1062, 1577, 2553, 4654, 1071, 2094, 30257, 51, 30260, 53, 28213, 24633, 1082, 1087, 68, 8779, 78, 12367, 11859, 2647, 
91, 13916, 13917, 15455, 608, 9825, 1634, 12387, 13412, 613, 0, 3, 520, 1547, 12300, 2062, 3599, 1040, 26641, 18, 25616, 2577, 13846, 2583, 4121, 25114, 1051, 1052, 25629, 1054, 1567, 2591, 3105, 3616, 4126, 1060, 4125, 1062, 1063, 26663, 1577, 13863, 1066, 1580, 45, 1071, 51, 3123, 53, 2614, 3125, 1082, 2622, 66, 2627, 11843, 1093, 1606, 1605, 3651, 1536, 0, 10242, 3, 37889, 1029, 10248, 2569, 9740, 9745, 10770, 17938, 2577, 10257, 9238, 3094, 9752, 9751, 9754, 30235, 9243, 18425, 9246, 2590, 24096, 9249, 9250, 9251, 4643, 10272, 9252, 5666, 3616, 3625, 4133, 4136, 1071, 9264, 4657, 51, 9267, 22583, 10808, 40504, 10304, 6210, 3650, 37444, 68, 9284, 0, 5121, 4098, 3, 3078, 7175, 1543, 1545, 22027, 5131, 14, 4623, 4625, 22547, 533, 2588, 2590, 1570, 4643, 2597, 5669, 5159, 6183, 2602, 45, 6702, 18937, 5168, 5169, 48, 4657, 3063, 51, 1590, 12343, 5686, 5689, 2105, 1586, 5175, 5694, 6721, 68, 2630, 29767, 29778, 4692, 2133, 5204, 6740, 0, 14849, 512, 3, 11266, 14853, 2053, 23047, 1527, 2569, 15370, 14861, 13, 19471, 2577, 11793, 14867, 18423, 533, 15384, 14875, 15388, 11807, 15396, 4132, 1574, 14890, 14893, 14896, 14897, 1586, 51, 1590, 14911, 1088, 15429, 14406, 23111, 16968, 14921, 14925, 16461, 14929, 15442, 8789, 14934, 2647, 3161, 7770, 91], 4: [1536, 7681, 17410, 17411, 17415, 6664, 17420, 15886, 4623, 17935, 4625, 5139, 4631, 17916, 17437, 544, 16422, 5671, 1065, 4650, 4651, 4653, 4690, 16943, 4657, 17458, 15921, 51, 7222, 17464, 17465, 10299, 15932, 64, 6209, 66, 17474, 4680, 8264, 8266, 40008, 6730, 8273, 6738, 5203, 5206, 18005, 15958, 597, 15960, 0, 3, 11785, 2569, 32779, 9227, 526, 21519, 530, 4116, 533, 11805, 2590, 2591, 3105, 7203, 1571, 8740, 1574, 12836, 1062, 1577, 2553, 4654, 1071, 2094, 30257, 51, 30260, 53, 28213, 24633, 1082, 1087, 68, 8779, 78, 12367, 11859, 2647, 91, 13916, 13917, 15455, 608, 9825, 1634, 12387, 13412, 613, 0, 3, 520, 1547, 12300, 2062, 3599, 1040, 26641, 18, 25616, 2577, 13846, 2583, 4121, 25114, 1051, 1052, 25629, 1054, 1567, 
2591, 3105, 3616, 4126, 1060, 4125, 1062, 1063, 26663, 1577, 13863, 1066, 1580, 45, 1071, 51, 3123, 53, 2614, 3125, 1082, 2622, 66, 2627, 11843, 1093, 1606, 1605, 3651, 1536, 0, 10242, 3, 37889, 1029, 10248, 2569, 9740, 9745, 10770, 17938, 2577, 10257, 9238, 3094, 9752, 9751, 9754, 30235, 9243, 18425, 9246, 2590, 24096, 9249, 9250, 9251, 4643, 10272, 9252, 5666, 3616, 3625, 4133, 4136, 1071, 9264, 4657, 51, 9267, 22583, 10808, 40504, 10304, 6210, 3650, 37444, 68, 9284, 0, 5121, 4098, 3, 3078, 7175, 1543, 1545, 22027, 5131, 14, 4623, 4625, 22547, 533, 2588, 2590, 1570, 4643, 2597, 5669, 5159, 6183, 2602, 45, 6702, 18937, 5168, 5169, 48, 4657, 3063, 51, 1590, 12343, 5686, 5689, 2105, 1586, 5175, 5694, 6721, 68, 2630, 29767, 29778, 4692, 2133, 5204, 6740, 0, 14849, 512, 3, 11266, 14853, 2053, 23047, 1527, 2569, 15370, 14861, 13, 19471, 2577, 11793, 14867, 18423, 533, 15384, 14875, 15388, 11807, 15396, 4132, 1574, 14890, 14893, 14896, 14897, 1586, 51, 1590, 14911, 1088, 15429, 14406, 23111, 16968, 14921, 14925, 16461, 14929, 15442, 8789, 14934, 2647, 3161, 7770, 91], 5: [1536, 7681, 17410, 17411, 17415, 6664, 17420, 15886, 4623, 17935, 4625, 5139, 4631, 17916, 17437, 544, 16422, 5671, 1065, 4650, 4651, 4653, 4690, 16943, 4657, 17458, 15921, 51, 7222, 17464, 17465, 10299, 15932, 64, 6209, 66, 17474, 4680, 8264, 8266, 40008, 6730, 8273, 6738, 5203, 5206, 18005, 15958, 597, 15960, 0, 3, 11785, 2569, 32779, 9227, 526, 21519, 530, 4116, 533, 11805, 2590, 2591, 3105, 7203, 1571, 8740, 1574, 12836, 1062, 1577, 2553, 4654, 1071, 2094, 30257, 51, 30260, 53, 28213, 24633, 1082, 1087, 68, 8779, 78, 12367, 11859, 2647, 91, 13916, 13917, 15455, 608, 9825, 1634, 12387, 13412, 613, 0, 3, 520, 1547, 12300, 2062, 3599, 1040, 26641, 18, 25616, 2577, 13846, 2583, 4121, 25114, 1051, 1052, 25629, 1054, 1567, 2591, 3105, 3616, 4126, 1060, 4125, 1062, 1063, 26663, 1577, 13863, 1066, 1580, 45, 1071, 51, 3123, 53, 2614, 3125, 1082, 2622, 66, 2627, 11843, 1093, 1606, 1605, 3651, 1536, 0, 10242, 
3, 37889, 1029, 10248, 2569, 9740, 9745, 10770, 17938, 2577, 10257, 9238, 3094, 9752, 9751, 9754, 30235, 9243, 18425, 9246, 2590, 24096, 9249, 9250, 9251, 4643, 10272, 9252, 5666, 3616, 3625, 4133, 4136, 1071, 9264, 4657, 51, 9267, 22583, 10808, 40504, 10304, 6210, 3650, 37444, 68, 9284, 0, 5121, 4098, 3, 3078, 7175, 1543, 1545, 22027, 5131, 14, 4623, 4625, 22547, 533, 2588, 2590, 1570, 4643, 2597, 5669, 5159, 6183, 2602, 45, 6702, 18937, 5168, 5169, 48, 4657, 3063, 51, 1590, 12343, 5686, 5689, 2105, 1586, 5175, 5694, 6721, 68, 2630, 29767, 29778, 4692, 2133, 5204, 6740, 0, 14849, 512, 3, 11266, 14853, 2053, 23047, 1527, 2569, 15370, 14861, 13, 19471, 2577, 11793, 14867, 18423, 533, 15384, 14875, 15388, 11807, 15396, 4132, 1574, 14890, 14893, 14896, 14897, 1586, 51, 1590, 14911, 1088, 15429, 14406, 23111, 16968, 14921, 14925, 16461, 14929, 15442, 8789, 14934, 2647, 3161, 7770, 91]}

features_data_frame.shape: (6, 255)
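Each class contributes 50 token ids, and concatenating the six lists gives the 300 ids printed above. Several ids (for example 0, 3, and 51) recur across classes, which is presumably why `features_data_frame` ends up with 255 columns rather than 300. A small sketch of that de-duplication step, using a made-up `word_bag` and a hypothetical `shared_features` helper:

```python
def shared_features(word_bag):
    """Union the per-class token ids, keeping first-seen order."""
    seen = {}
    for class_id in sorted(word_bag):
        for tid in word_bag[class_id]:
            seen.setdefault(tid, None)
    return list(seen)


# Toy example: 9 ids in total, 6 of them distinct.
word_bag = {0: [1536, 51, 64], 1: [0, 3, 51], 2: [0, 3, 66]}
features = shared_features(word_bag)
print(len(features), features)
```

Keeping one shared feature list means every class and every document is described over the same columns.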

0 30

1 185

2 139

3 66

4 69

5 157

class_Proportion:

[0.04643962848297214, 0.28637770897832815, 0.21517027863777088, 0.1021671826625387, 0.10681114551083591, 0.24303405572755418]
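The six counts printed above sum to 646, the same number of documents reported for `train_index`, and each entry of `class_Proportion` is a count divided by that total. These proportions can serve as the class priors P(c) in naive Bayes. The variable names below are illustrative.

```python
# Per-class counts copied from the log above.
class_counts = {0: 30, 1: 185, 2: 139, 3: 66, 4: 69, 5: 157}

total = sum(class_counts.values())
class_proportion = [class_counts[c] / total for c in sorted(class_counts)]

print(total)
print(class_proportion)
```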

test_data_frame.head(2)

     Unnamed: 0                                            content  \

854         854  据Mobileexpose报道,华硕已经正式向媒体发出邀请,定于6月14日在台湾举办记者会,...  

101         101   6月6日,王者荣耀猴三棍重做引起王者峡谷一阵轩然大波,毕竟这个强势的猴子已经陪伴我们好几个...  

                     id                                   tags  \

854  6429089676803440897  ['科技', '华硕', '华硕ZenFone', '台湾', '手机']  

101  6429098400347586818       ['游戏', '猴子', '王者荣耀', '黄忠', '游戏']  

                   time                     title  \

854  2017-06-07 10:11:00        华硕ZenFone AR宣布本月发售  

101  2017-06-07 10:39:20  猴子重做之后是加强还是削弱?狂到站对面泉水拿双杀  

                                            doc_words  \

854  [报道, 华硕, 已经, 正式, 媒体, 发出, 邀请, 定于, 月, 日, 台湾, 举办,...  

101  [月, 日, 王者, 荣耀, 猴三棍, 重, 做, 引起, 王者, 峡谷, 一阵, 轩然大波...  

                                               corpus  \

854  [(142, 1), (362, 1), (472, 1), (475, 1), (494,...  

101  [(0, 2), (68, 3), (133, 1), (184, 1), (226, 1)...  

                                                tfidf   visual01   visual02  \

854  [(142, 0.13953435619531032), (362, 0.046441336...  21.684397 -30.567736  

101  [(0, 0.012838015508020575), (68, 0.04742284222...  67.188065  21.183245  

    keyword_index  

854              1  

101              3  

print the first sample

Unnamed: 0                                                     854

content          据Mobileexpose报道,华硕已经正式向媒体发出邀请,定于6月14日在台湾举办记者会,...

id                                             6429089676803440897

tags                         ['科技', '华硕', '华硕ZenFone', '台湾', '手机']

time                                           2017-06-07 10:11:00

title                                           华硕ZenFone AR宣布本月发售

doc_words        [报道, 华硕, 已经, 正式, 媒体, 发出, 邀请, 定于, 月, 日, 台湾, 举办,...

corpus           [(142, 1), (362, 1), (472, 1), (475, 1), (494,...

tfidf            [(142, 0.13953435619531032), (362, 0.046441336...

visual01                                                   21.6844

visual02                                                  -30.5677

keyword_index                                                    1

Name: 854, dtype: object

test_data_frame.iloc[0].corpus:  [(142, 1), (362, 1), (472, 1), (475, 1), (494, 1), (530, 1), (872, 1), (909, 1), (1254, 1), (1312, 1), (1878, 1), (2577, 1), (2783, 1), (2979, 1), (3697, 1), (5508, 1), (9052, 1), (12204, 1), (12256, 1), (12591, 1), (12936, 1), (12991, 1), (13128, 1), (13194, 1), (13244, 1), (13317, 1), (31670, 1), (31683, 1), (33417, 1)]
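The corpus column stores each document as a sparse list of (token_id, count) pairs. The sketch below imitates that representation in plain Python, assuming a vocabulary that assigns ids in first-seen order. The helpers build_vocab and to_bow are hypothetical names for illustration; they are not the library calls used by the original script, and the ids they produce will differ from those in the log.

```python
from collections import Counter

def build_vocab(tokenized_docs):
    """Assign an integer id to each distinct token, in first-seen order."""
    vocab = {}
    for doc in tokenized_docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bow(tokens, vocab):
    """Return sorted (token_id, count) pairs; tokens not in vocab are skipped."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())

# Tiny tokenized documents, loosely modelled on the doc_words column.
docs = [["王者", "荣耀", "王者", "峡谷"], ["华硕", "台湾", "手机"]]
vocab = build_vocab(docs)

print(to_bow(docs[0], vocab))                    # [(0, 2), (1, 1), (2, 1)]
print(to_bow(["华硕", "手机", "未知词"], vocab))  # [(3, 1), (5, 1)]
```

Only tokens that actually occur are stored, which keeps documents compact when the vocabulary has tens of thousands of entries.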

[1.45708072e-43 1.78656934e-66 7.12148875e-63 1.71090490e-53 4.71385662e-54 2.08405934e-64]

[-35.34436300647761, -16.431856044032266, -20.267559000416433, -22.405433968586664, -27.97121661401147, -18.05089965903481]
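The two arrays above appear to be per-class scores for the first test document: the first holds very small raw values (down to about 1e-66), the second holds values on a logarithmic scale. Products of many small probabilities can underflow to zero in floating point, which is why naive Bayes scores are commonly summed in log space. The following is a minimal sketch of that idea, not the original predict_text_ByMax function; the function name, the floor parameter, and all probabilities are invented for illustration.

```python
import math

def predict_log_space(doc_bow, priors, word_probs, floor=1e-9):
    """Return (best_class, log_scores) for a sparse bag-of-words document.

    doc_bow    : list of (token_id, count) pairs, like the corpus column
    priors     : list of P(class)
    word_probs : list of dicts, word_probs[c][token_id] = P(token | class c)
    floor      : hypothetical fallback probability for unseen tokens
    """
    log_scores = []
    for c, prior in enumerate(priors):
        score = math.log(prior)
        for token_id, count in doc_bow:
            p = word_probs[c].get(token_id, floor)
            score += count * math.log(p)  # log of p ** count
        log_scores.append(score)
    best = max(range(len(log_scores)), key=log_scores.__getitem__)
    return best, log_scores

# Toy numbers, invented for illustration only.
priors = [0.3, 0.7]
word_probs = [{142: 0.02, 362: 0.001}, {142: 0.001, 362: 0.03}]
doc = [(142, 1), (362, 2)]

best, scores = predict_log_space(doc, priors, word_probs)
print(best, scores)

# A raw product of two tiny probabilities underflows to exactly 0.0.
print(1e-200 * 1e-200)
```

Taking the argmax over log scores selects the same class as the argmax over the raw products would, because the logarithm is monotonic.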

F:\File_Jupyter\实用代码\naive_bayes(简单贝叶斯)\TextClassPrediction_kNN_NB_LDA_P.py:346: SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

 test_data_frame['predicted_class'] = test_data_frame['corpus'].apply(predict_text_ByMax)       # predict all test documents

test_data_frame:       Unnamed: 0                                            content  \

854          854  据Mobileexpose报道,华硕已经正式向媒体发出邀请,定于6月14日在台湾举办记者会,...  

101          101   6月6日,王者荣耀猴三棍重做引起王者峡谷一阵轩然大波,毕竟这个强势的猴子已经陪伴我们好几个...  

738          738  骗子往往都很会讲故事,比如以下这些硅谷骗局:验血公司Theranos,号称只要从指尖抽几滴血...  

511          511  专访 Whyd 创始人 孟崨在学校,他是最调皮,却又成绩最好的学生,让老师头疼不已。在公司,...  

725          725  据介绍,喜马拉雅FM会员月费为18元,年度会员188元,价格与视频网站会员价格相仿。在会员福...  

...          ...                                                ...  

805          805  每经记者 王海慜 每经编辑 叶峰今日盘中,昨日领涨的中小创出现休整,而昨日暂时休整的一批龙头...  

448          448  中国人买什么都喜欢大的,房子要买面积大的、手机要买屏大的,买车自然也是要挑选空间大的。抛开拉...  

782          782  中证网讯 (记者 徐金忠)6月7日,国能电动汽车瑞典有限公司(NEVS)亮相CES亚洲消费电...  

1264        1264  目前日系豪华品牌讴歌已经开启了国产之路,在推出CDX车型后,讴歌在国内的知名度一度飙升。CD...  

1195        1195  近日有爆料称,乐视位于北京达美中心的办公地因未及时缴纳办公地费用已被停止物业一切服务;物业公...  

                      id                                   tags  \

854   6429089676803440897  ['科技', '华硕', '华硕ZenFone', '台湾', '手机']  

101   6429098400347586818       ['游戏', '猴子', '王者荣耀', '黄忠', '游戏']  

738   6413133652368982274     ['科技', '厨卫电器', '榨汁机', '小家电', '硅谷']  

511   6428827159980867842     ['科技', '智能家居', '音箱', '苹果公司', '法国']  

725   6428841852455354625                  ['科技', '喜马拉雅山', '科技']  

...                   ...                                    ...  

805   6429151552733069569                           ['财经', '财经']  

448   6415852634885341441    ['汽车', 'SUV', '国产车', '概念车', '汽车用品']  

782   6428858665063383297   ['科技', '新能源汽车', '电动汽车', '新能源', '经济']  

1264  6427822755417194753    ['汽车', '日本汽车', '讴歌汽车', 'SUV', '空调']  

1195  6429093420292210945                     ['科技', '乐视', '科技']  

                    time                        title  \

854   2017-06-07 10:11:00           华硕ZenFone AR宣布本月发售  

101   2017-06-07 10:39:20     猴子重做之后是加强还是削弱?狂到站对面泉水拿双杀  

738   2017-04-26 10:41:39                绝!他用一台榨汁机骗了8亿  

511   2017-06-08 11:06:00    他的智能音箱一上市,苹果公司就推出了HomePod  

725   2017-06-07 18:37:00  喜马拉雅FM推出“付费会员”,当天召集超221万名会员  

...                   ...                          ...  

805   2017-06-08 14:30:00          盘中近20家龙头白马股集体创下历史新高  

448   2017-05-03 18:37:20      别瞎找了!10万左右尺寸最大的SUV都在这里了  

782   2017-06-07 19:12:00      倡导移动出行新概念 NEVS两款概念量产车亮相  

1264  2017-06-08 09:54:40        居然还有一款车,最低配和中高配看不出差别?  

1195  2017-06-08 10:45:00     乐视被爆未及时缴物业费,员工或将被阻止进大楼办公  

                                             doc_words  \

854   [报道, 华硕, 已经, 正式, 媒体, 发出, 邀请, 定于, 月, 日, 台湾, 举办,...  

101   [月, 日, 王者, 荣耀, 猴三棍, 重, 做, 引起, 王者, 峡谷, 一阵, 轩然大波...  

738   [骗子, 往往, 很会, 讲故事, 以下, 硅谷, 骗局, 验血, 公司, 号称, 指尖, ...  

511   [专访, 创始人, 孟, 崨, 学校, 最, 调皮, 却, 成绩, 最好, 学生, 老师, ...  

725   [据介绍, 喜马拉雅, 会员, 月费, 元, 年度, 会员, 元, 价格, 视频, 网站, ...  

...                                                 ...  

805   [每经, 记者, 王海, 慜, 每经, 编辑, 叶峰, 今日, 盘中, 昨日, 领涨, 中小...  

448   [中国, 人买, 喜欢, 房子, 买, 面积, 手机, 买, 屏大, 买车, 自然, 挑选,...  

782   [中证网, 讯, 记者, 徐金忠, 月, 日, 国, 电动汽车, 瑞典, 有限公司, 亮相,...  

1264  [目前, 日系, 豪华, 品牌, 讴歌, 已经, 开启, 国产, 路, 推出, 车型, 后,...  

1195  [近日, 爆料, 称, 乐视, 位于, 北京, 达美, 中心, 办公地, 因未, 及时, 缴...  

                                                corpus  \

854   [(142, 1), (362, 1), (472, 1), (475, 1), (494,...  

101   [(0, 2), (68, 3), (133, 1), (184, 1), (226, 1)...  

738   [(0, 2), (45, 1), (48, 1), (133, 2), (155, 1),...  

511   [(0, 10), (13, 2), (14, 2), (20, 1), (45, 1), ...  

725   [(30, 1), (102, 1), (142, 1), (154, 1), (189, ...  

...                                                 ...  

805   [(113, 1), (167, 1), (169, 1), (214, 1), (258,...  

448   [(4, 2), (8, 1), (14, 1), (51, 6), (53, 2), (6...  

782   [(15, 2), (30, 1), (53, 7), (93, 1), (143, 1),...  

1264  [(0, 1), (20, 1), (51, 1), (176, 1), (225, 1),...  

1195  [(57, 1), (111, 1), (191, 1), (361, 1), (476, ...  

                                                 tfidf   visual01   visual02  \

854   [(142, 0.13953435619531032), (362, 0.046441336...  21.684397 -30.567736  

101   [(0, 0.012838015508020575), (68, 0.04742284222...  67.188065  21.183245  

738   [(0, 0.008984009118453712), (45, 0.01791359767... -22.855194 -11.270862  

511   [(0, 0.04361196171462796), (13, 0.028607388065... -22.198786  12.217076  

725   [(30, 0.05815947983270004), (102, 0.0450585853...  26.268911  21.240065  

...                                                 ...        ...        ...  

805   [(113, 0.030899018921031703), (167, 0.02103003... -66.232071   0.221611  

448   [(4, 0.04071064284477513), (8, 0.0235138776022...  41.836094 -44.539528  

782   [(15, 0.03392075672049564), (30, 0.03003603467... -26.810091 -29.602842  

1264  [(0, 0.009883726180653873), (20, 0.04080153677...  36.279522 -52.474297  

1195  [(57, 0.09668298763559263), (111, 0.1255406499...  -6.373239  16.101738  

     keyword_index  predicted_class  

854               1                1  

101               3                3  

738               1                1  

511               1                2  

725               1                1  

...             ...              ...  

805               2                2  

448               5                5  

782               1                1  

1264              5                5  

1195              1                1  

[647 rows x 13 columns]

SModel_CS_acc_score: 0.7047913446676971
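The accuracy score is presumably the fraction of test rows whose predicted_class equals keyword_index; the reported 0.7047913446676971 is consistent with 456 correct predictions out of 647 rows. The sketch below applies that definition to the ten rows visible in the printout above. The accuracy helper is an illustrative name, not a function from the original script.

```python
# keyword_index and predicted_class for the ten rows displayed above
# (the full test set has 647 rows; only these ten are visible in the printout).
true_labels = [1, 3, 1, 1, 1, 2, 5, 1, 5, 1]
predicted = [1, 3, 1, 2, 1, 2, 5, 1, 5, 1]

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy(true_labels, predicted))  # 9 of the 10 visible rows match
print(456 / 647)                         # agrees with the reported score
```

The single mismatch among the visible rows is row 511, labelled 1 but predicted as 2.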

300

label_category_ID 2

一个

一些

概念

经营

补贴

股市

增持

成本

乳业

万吨

train_data_frame.corpus[0]

[(0, 6), (1, 1), (2, 1), (3, 3), (4, 2), (5, 2), (6, 1), (7, 1), (8, 2), (9, 1), (10, 3), (11, 1), (12, 2), (13, 2), (14, 2), (15, 1), (16, 1), (17, 2), (18, 1), (19, 1), (20, 2), (21, 1), (22, 2), (23, 2), (24, 1), (25, 1), (26, 1), (27, 1), (28, 1), (29, 2), (30, 3), (31, 4), (32, 3), (33, 1), (34, 1), (35, 1), (36, 7), (37, 1), (38, 1), (39, 2), (40, 3), (41, 1), (42, 1), (43, 1), (44, 1), (45, 2), (46, 1), (47, 1), (48, 1), (49, 2), (50, 4), (51, 21), (52, 3), (53, 7), (54, 1), (55, 2), (56, 1), (57, 4), (58, 2), (59, 1), (60, 5), (61, 1), (62, 1), (63, 1), (64, 2), (65, 1), (66, 3), (67, 1), (68, 2), (69, 2), (70, 1), (71, 1), (72, 1), (73, 1), (74, 2), (75, 1), (76, 1), (77, 1), (78, 1), (79, 2), (80, 1), (81, 1), (82, 1), (83, 4), (84, 7), (85, 2), (86, 3), (87, 1), (88, 9), (89, 1), (90, 1), (91, 8), (92, 3), (93, 1), (94, 4), (95, 1), (96, 2), (97, 1), (98, 7), (99, 1), (100, 2), (101, 1), (102, 1), (103, 1), (104, 1), (105, 1), (106, 1), (107, 1), (108, 1), (109, 2), (110, 1), (111, 2), (112, 1), (113, 1), (114, 1), (115, 1), (116, 1), (117, 1), (118, 1), (119, 1), (120, 1), (121, 2), (122, 1), (123, 1), (124, 1), (125, 1), (126, 5), (127, 1), (128, 4), (129, 1), (130, 1), (131, 1), (132, 2), (133, 2), (134, 1), (135, 5), (136, 1), (137, 1), (138, 3), (139, 1), (140, 1), (141, 1), (142, 1), (143, 1), (144, 1), (145, 2), (146, 1), (147, 1), (148, 2), (149, 4), (150, 1), (151, 1), (152, 2), (153, 2), (154, 1), (155, 3), (156, 1), (157, 1), (158, 1), (159, 1), (160, 1), (161, 2), (162, 1), (163, 1), (164, 1), (165, 2), (166, 1), (167, 3), (168, 1), (169, 1), (170, 3), (171, 3), (172, 1), (173, 2), (174, 1), (175, 1), (176, 2), (177, 5), (178, 1), (179, 1), (180, 1), (181, 1), (182, 1), (183, 1), (184, 4), (185, 1), (186, 1), (187, 1), (188, 1), (189, 3), (190, 1), (191, 14), (192, 2), (193, 2), (194, 2), (195, 1), (196, 3), (197, 1), (198, 1), (199, 11), (200, 6), (201, 1), (202, 1), (203, 2), (204, 1), (205, 8), (206, 2), (207, 2), (208, 2), (209, 1), (210, 
1), (211, 1), (212, 1), (213, 1), (214, 1), (215, 1), (216, 3), (217, 1), (218, 1), (219, 2), (220, 2), (221, 1), (222, 1), (223, 1), (224, 1), (225, 17), (226, 1), (227, 1), (228, 1), (229, 1), (230, 1), (231, 1), (232, 2), (233, 1), (234, 1), (235, 3), (236, 1), (237, 1), (238, 2), (239, 1), (240, 1), (241, 1), (242, 1), (243, 2), (244, 2), (245, 1), (246, 1), (247, 2), (248, 2), (249, 2), (250, 1), (251, 1), (252, 2), (253, 1), (254, 1), (255, 1), (256, 1), (257, 1), (258, 3), (259, 3), (260, 1), (261, 3), (262, 2), (263, 1), (264, 1), (265, 6), (266, 1), (267, 3), (268, 1), (269, 1), (270, 3), (271, 2), (272, 1), (273, 2), (274, 1), (275, 1), (276, 5), (277, 1), (278, 4), (279, 4), (280, 25), (281, 2), (282, 2), (283, 2), (284, 7), (285, 1), (286, 1), (287, 2), (288, 2), (289, 1), (290, 1), (291, 1), (292, 1), (293, 3), (294, 2), (295, 1), (296, 3), (297, 1), (298, 3), (299, 2), (300, 1), (301, 1), (302, 1), (303, 2), (304, 1), (305, 1), (306, 1), (307, 2), (308, 2), (309, 1), (310, 1), (311, 1), (312, 1), (313, 1), (314, 1), (315, 1), (316, 7), (317, 2), (318, 2), (319, 1), (320, 1), (321, 1), (322, 1), (323, 1), (324, 1), (325, 4), (326, 1), (327, 2), (328, 1), (329, 1), (330, 3), (331, 3), (332, 1), (333, 2), (334, 2), (335, 1), (336, 1), (337, 2), (338, 1), (339, 1), (340, 1), (341, 1), (342, 1), (343, 1), (344, 2), (345, 1), (346, 1), (347, 2), (348, 1), (349, 2), (350, 5), (351, 2), (352, 3), (353, 1), (354, 4), (355, 1), (356, 1), (357, 2), (358, 4), (359, 2), (360, 2), (361, 1), (362, 9), (363, 2), (364, 2), (365, 1), (366, 1), (367, 7), (368, 1), (369, 4), (370, 2), (371, 1), (372, 1), (373, 1), (374, 1), (375, 1), (376, 1), (377, 1), (378, 2), (379, 1), (380, 3), (381, 1), (382, 2), (383, 1), (384, 3), (385, 26), (386, 1), (387, 1), (388, 1), (389, 3), (390, 1), (391, 2), (392, 1), (393, 4), (394, 4), (395, 4), (396, 2), (397, 1), (398, 40), (399, 2), (400, 4), (401, 1), (402, 1), (403, 2), (404, 1), (405, 1), (406, 2), (407, 1), (408, 1), (409, 3), 
(410, 1), (411, 1), (412, 2), (413, 7), (414, 4), (415, 2), (416, 1), (417, 1), (418, 1), (419, 3), (420, 1), (421, 1), (422, 1), (423, 1), (424, 1), (425, 1), (426, 1), (427, 2), (428, 1), (429, 1), (430, 1), (431, 1), (432, 5), (433, 1), (434, 1), (435, 1), (436, 1), (437, 1), (438, 1), (439, 1), (440, 1), (441, 1), (442, 1), (443, 3), (444, 3), (445, 2), (446, 5), (447, 1), (448, 1), (449, 1), (450, 4), (451, 1), (452, 2), (453, 2), (454, 1), (455, 4), (456, 1), (457, 1), (458, 1), (459, 2), (460, 1), (461, 1), (462, 5), (463, 2), (464, 1), (465, 5), (466, 74), (467, 2), (468, 1), (469, 1), (470, 2), (471, 22), (472, 2), (473, 1), (474, 1), (475, 2), (476, 2), (477, 2), (478, 2), (479, 1), (480, 1), (481, 1), (482, 1), (483, 2), (484, 1), (485, 1), (486, 2), (487, 1), (488, 2), (489, 1), (490, 1), (491, 1), (492, 4), (493, 1), (494, 2), (495, 4), (496, 2), (497, 1), (498, 1), (499, 1), (500, 1), (501, 5), (502, 1), (503, 13), (504, 4), (505, 3), (506, 1), (507, 7), (508, 1), (509, 1), (510, 1), (511, 1), (512, 1), (513, 1), (514, 2), (515, 1), (516, 3), (517, 4), (518, 1), (519, 1), (520, 1), (521, 1), (522, 1), (523, 1), (524, 1), (525, 1), (526, 2), (527, 2), (528, 1), (529, 1), (530, 1), (531, 1), (532, 1), (533, 1), (534, 1), (535, 2), (536, 5), (537, 2), (538, 1), (539, 1), (540, 1), (541, 7), (542, 1), (543, 1), (544, 1), (545, 2), (546, 1), (547, 3), (548, 2), (549, 1), (550, 1), (551, 2), (552, 1), (553, 2), (554, 1), (555, 1), (556, 2), (557, 1), (558, 2), (559, 5), (560, 2), (561, 1), (562, 1), (563, 1), (564, 1), (565, 1), (566, 1), (567, 7), (568, 2), (569, 1), (570, 2), (571, 1), (572, 1), (573, 1), (574, 4), (575, 1), (576, 2), (577, 2), (578, 1), (579, 2), (580, 1), (581, 1), (582, 1), (583, 2), (584, 1), (585, 1), (586, 1), (587, 4), (588, 1), (589, 4), (590, 2), (591, 1), (592, 1), (593, 1), (594, 2), (595, 1), (596, 1), (597, 1), (598, 1), (599, 1), (600, 1), (601, 1), (602, 1), (603, 1), (604, 1), (605, 1), (606, 1), (607, 1), (608, 2), (609, 
1), (610, 2), (611, 1), (612, 1), (613, 11), (614, 1), (615, 1), (616, 3), (617, 1), (618, 1), (619, 1), (620, 1), (621, 1), (622, 1), (623, 1), (624, 32), (625, 2), (626, 1), (627, 8), (628, 1), (629, 3), (630, 3), (631, 1), (632, 1), (633, 4), (634, 1), (635, 1), (636, 2), (637, 1), (638, 3), (639, 2), (640, 1), (641, 1), (642, 1), (643, 3), (644, 5), (645, 4), (646, 1), (647, 1), (648, 3), (649, 1), (650, 1), (651, 1), (652, 1), (653, 1), (654, 1), (655, 2), (656, 1), (657, 7), (658, 1), (659, 2), (660, 1), (661, 2), (662, 1), (663, 1), (664, 1), (665, 1), (666, 1), (667, 1), (668, 4), (669, 1), (670, 1), (671, 3), (672, 1), (673, 1), (674, 2), (675, 1), (676, 1), (677, 1), (678, 1), (679, 1), (680, 2), (681, 2), (682, 1), (683, 1), (684, 1), (685, 3), (686, 1), (687, 1), (688, 1), (689, 1), (690, 4), (691, 1), (692, 2), (693, 3), (694, 1), (695, 2), (696, 1), (697, 1), (698, 2), (699, 1), (700, 1), (701, 4), (702, 1), (703, 1), (704, 2), (705, 1), (706, 1), (707, 1), (708, 1), (709, 2), (710, 1), (711, 3), (712, 1), (713, 1), (714, 4), (715, 1), (716, 1), (717, 1), (718, 2), (719, 1), (720, 1), (721, 2), (722, 1), (723, 1), (724, 4), (725, 1), (726, 1), (727, 1), (728, 1), (729, 2), (730, 12), (731, 2), (732, 1), (733, 2), (734, 3), (735, 1), (736, 26), (737, 1), (738, 5), (739, 1), (740, 2), (741, 5), (742, 2), (743, 3), (744, 3), (745, 2), (746, 1), (747, 3), (748, 2), (749, 2), (750, 2), (751, 1), (752, 1), (753, 2), (754, 1), (755, 1), (756, 1), (757, 1), (758, 1), (759, 4), (760, 1), (761, 1), (762, 1), (763, 1), (764, 1), (765, 2), (766, 1), (767, 1), (768, 1), (769, 2), (770, 8), (771, 2), (772, 4), (773, 1), (774, 8), (775, 3), (776, 1), (777, 1), (778, 3), (779, 1), (780, 1), (781, 1), (782, 5), (783, 2), (784, 2), (785, 1), (786, 4), (787, 1), (788, 1), (789, 1), (790, 1), (791, 1), (792, 1), (793, 4), (794, 1), (795, 1), (796, 1), (797, 5), (798, 3), (799, 5), (800, 3), (801, 1), (802, 1), (803, 1), (804, 1), (805, 2), (806, 2), (807, 2), (808, 1), 
(809, 1), (810, 1), (811, 1), (812, 1), (813, 1), (814, 1), (815, 3), (816, 1), (817, 2), (818, 1), (819, 1), (820, 11), (821, 1), (822, 1), (823, 2), (824, 3), (825, 1), (826, 1), (827, 1), (828, 1), (829, 1), (830, 3), (831, 4), (832, 46), (833, 1), (834, 1), (835, 2), (836, 2), (837, 1), (838, 1), (839, 2), (840, 2), (841, 1), (842, 1), (843, 2), (844, 2), (845, 2), (846, 1), (847, 1), (848, 2), (849, 1), (850, 1), (851, 1), (852, 3), (853, 1), (854, 1), (855, 6), (856, 1), (857, 1), (858, 1)]

[33. 74. 73. 31. 47. 48.]

<class 'numpy.ndarray'>

SModel_acc_score: 0.8114374034003091

kNNC_acc_score: 0.8160741885625966

GNBC_acc_score: 0.6352395672333848

MNBC_acc_score: 0.6352395672333848

BNBC_acc_score: 0.29675425038639874

LDAC_acc_score: 0.8238021638330757

PerceptronC_acc_score: 0.8222565687789799

