ML/FE: Automatic feature generation/derivation with the featuretools tool on a single-CSV dataset (automatically split into two DataFrame tables)



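Contents

Automatic feature generation/derivation with the featuretools tool on a single-CSV dataset (automatically split into two DataFrame tables)

Design approach

1. Define the dataset

2. DFS design

Output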

feature_matrix_cats_df.csv

feature_matrix_nums.csv


 

 

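Recommended articles

Py/featuretools: A detailed guide to the featuretools library (introduction, installation, and usage)

ML/FE: Automatic feature generation/derivation with the featuretools tool on a single-CSV dataset (automatically split into two DataFrame tables)

ML/FE: Automatic feature generation/derivation with the featuretools tool on a single-CSV dataset (automatically split into two DataFrame tables): implementation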

 

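Automatic feature generation/derivation with the featuretools tool on a single-CSV dataset (automatically split into two DataFrame tables)

Design approach

1. Define the dataset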

import numpy as np
import pandas as pd

# Values in "sex": '男' = male, '女' = female.
# Values in "hobbey": basketball, badminton, table tennis.
contents = {
    "name":   ['Bob', 'LiSa', 'Mary', 'Alan'],
    "ID":     [1, 2, 3, 4],
    "age":    [np.nan, 28, 38, ''],          # np.nan prints as NaN; '' prints as blank
    "born":   [pd.NaT, pd.Timestamp("1990-01-01"), pd.Timestamp("1980-01-01"), ''],  # pd.NaT prints as NaT
    "sex":    ['男', '女', '女', '男'],
    "hobbey": ['打篮球', '打羽毛球', '打乒乓球', ''],  # '' prints as blank
    "money":  [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
}
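The dictionary above mixes NaN, NaT, and empty strings, and the output section shows it split into a numeric table (nums_df) and a categorical table (cats_df). The article does not show how that split is done, so the following is only a minimal pandas sketch under assumptions: the helper name split_tables and its cleaning steps are hypothetical, and it keeps all four rows, whereas the cats_df printed in the output has only three.

```python
import numpy as np
import pandas as pd

contents = {
    "name": ["Bob", "LiSa", "Mary", "Alan"],
    "ID": [1, 2, 3, 4],
    "age": [np.nan, 28, 38, ""],
    "born": [pd.NaT, pd.Timestamp("1990-01-01"), pd.Timestamp("1980-01-01"), ""],
    "sex": ["男", "女", "女", "男"],
    "hobbey": ["打篮球", "打羽毛球", "打乒乓球", ""],
    "money": [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
}


def split_tables(raw_df):
    """Hypothetical helper: normalize missing values, then split by column type."""
    clean_df = raw_df.copy()
    # Empty strings become NaN / NaT so every column has one consistent dtype.
    clean_df["age"] = pd.to_numeric(clean_df["age"], errors="coerce")
    clean_df["born"] = pd.to_datetime(clean_df["born"], errors="coerce")
    clean_df["hobbey"] = clean_df["hobbey"].mask(clean_df["hobbey"] == "")
    # Numeric columns go to one table, categorical/datetime columns to the other;
    # "ID" appears in both so the tables can be related later.
    nums_df = clean_df[["name", "ID", "age", "money", "weight"]]
    cats_df = clean_df[["ID", "hobbey", "sex", "born"]]
    return nums_df, cats_df


nums_df, cats_df = split_tables(pd.DataFrame(contents))
print(nums_df)
print(cats_df)
```

Keeping "ID" in both tables is the design choice that matters here: it is the key used to relate the two entities in the DFS step.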

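2. DFS design

  • (1) Specify a dictionary containing all the entities in the dataset.
  • (2) Specify how the entities are related: when two entities have a one-to-many relationship, the entity on the "one" side is called the "parent entity".
  • (3) Run Deep Feature Synthesis (DFS): the minimum input to DFS is a set of entities, a list of relationships, and the "target_entity" to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.
    For example, creating a feature matrix for each customer in the data yields dozens of new features that describe that customer's behavior.
  • (4) Change the target entity: one reason DFS is so powerful is that it can create a feature matrix for any entity in the data. For example, we could build features for sessions instead.
  • (5) Understand the feature output: in general, Featuretools refers to generated features by their feature names.
    To make features easier to understand, Featuretools provides two additional tools, featuretools.graph_feature() and featuretools.describe_feature(),
    which help explain what a feature is and the steps Featuretools took to generate it.
  • (6) Feature lineage graphs:
    a feature lineage graph visually walks through the feature generation process. Starting from the base data, it shows step by step the primitives applied and the intermediate features generated to create the final feature.
  • (7) Feature descriptions: Featuretools can also automatically generate English sentence descriptions of features. Feature descriptions help explain what a feature is, and can be further improved by including manually defined custom definitions.
    For details on customizing the automatically generated descriptions, see "Generating Feature Descriptions".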
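To make the aggregation and transform steps concrete, here is a plain-pandas sketch of what the primitives listed in the output compute on this data. It is not Featuretools itself: groupby and join stand in for DFS, the variable names are illustrative, and the two input tables are copied from the nums_df and cats_df printed in the output section.

```python
import numpy as np
import pandas as pd

# Child table ("nums"): one row per person.
nums_df = pd.DataFrame({
    "name": ["Bob", "LiSa", "Mary", "Alan"],
    "ID": [1, 2, 3, 4],
    "age": [np.nan, 28.0, 38.0, np.nan],
    "money": [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
})

# Parent table ("cats"): the three rows shown in the output, indexed by ID.
cats_df = pd.DataFrame({
    "ID": [4, 1, 2],
    "hobbey": [np.nan, "打篮球", "打羽毛球"],
    "sex": ["男", "男", "女"],
    "born": pd.to_datetime([None, None, "1990-01-01"]),
}).set_index("ID")

# Aggregation primitives: summarize the child rows that belong to each parent ID.
grouped = nums_df.groupby("ID")[["age", "money", "weight"]]
aggs = grouped.agg(["max", "mean", "min", "skew", "std", "sum"])
aggs.columns = [f"{func.upper()}(nums.{col})" for col, func in aggs.columns]
aggs.insert(0, "COUNT(nums)", nums_df.groupby("ID").size())

# Transform primitives: derive new columns from a single parent column.
born = cats_df["born"]
dates = pd.DataFrame({
    "DAY(born)": born.dt.day,
    "MONTH(born)": born.dt.month,
    "WEEKDAY(born)": born.dt.weekday,
    "YEAR(born)": born.dt.year,
})

# Target = parent table: one row per ID.
feature_matrix_cats = cats_df[["hobbey", "sex"]].join(aggs).join(dates)

# Target = child table: parent features are carried down with a "cats." prefix.
feature_matrix_nums = nums_df.set_index("name").join(
    feature_matrix_cats.add_prefix("cats."), on="ID"
)

print(feature_matrix_cats.shape, feature_matrix_nums.shape)
```

Two details of the printed output follow directly from this: a sum over only-missing values is 0.0 while max and mean stay NaN, and with a single child row per ID the standard deviation and skewness are undefined, so those columns are all NaN.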

 

 

Output

   name  ID  age       born sex hobbey  money  weight
0   Bob   1  NaN        NaT   男    打篮球  200.0   140.5
1  LiSa   2   28 1990-01-01   女   打羽毛球  240.0   120.8
2  Mary   3   38 1980-01-01   女   打乒乓球  290.0   169.4
3  Alan   4             NaT   男         300.0   155.6
-------------------------------------------
nums_df:----------------------------------
   name  ID   age  money  weight
0   Bob   1   NaN  200.0   140.5
1  LiSa   2  28.0  240.0   120.8
2  Mary   3  38.0  290.0   169.4
3  Alan   4   NaN  300.0   155.6
cats_df:----------------------------------
   ID hobbey sex        born
0   4    NaN   男         NaN
1   1    打篮球   男         NaN
2   2   打羽毛球   女  1990-01-01
---------------------------------DFS design:-----------------------------------
feature_matrix_nums
       ID   age  money  weight cats.hobbey cats.sex  cats.COUNT(nums)  \
name
Bob    1   NaN  200.0   140.5         打篮球        男               1.0
LiSa   2  28.0  240.0   120.8        打羽毛球        女               1.0
Mary   3  38.0  290.0   169.4         NaN      NaN               NaN

      cats.MAX(nums.age)  cats.MAX(nums.money)  cats.MAX(nums.weight)  \
name
Bob                  NaN                 200.0                  140.5
LiSa                28.0                 240.0                  120.8
Mary                 NaN                   NaN                    NaN

      cats.MEAN(nums.age)  cats.MEAN(nums.money)  cats.MEAN(nums.weight)  \
name
Bob                   NaN                  200.0                   140.5
LiSa                 28.0                  240.0                   120.8
Mary                  NaN                    NaN                     NaN

      cats.MIN(nums.age)  cats.MIN(nums.money)  cats.MIN(nums.weight)  \
name
Bob                  NaN                 200.0                  140.5
LiSa                28.0                 240.0                  120.8
Mary                 NaN                   NaN                    NaN

      cats.SKEW(nums.age)  cats.SKEW(nums.money)  cats.SKEW(nums.weight)  \
name
Bob                   NaN                    NaN                     NaN
LiSa                  NaN                    NaN                     NaN
Mary                  NaN                    NaN                     NaN

      cats.STD(nums.age)  cats.STD(nums.money)  cats.STD(nums.weight)  \
name
Bob                  NaN                   NaN                    NaN
LiSa                 NaN                   NaN                    NaN
Mary                 NaN                   NaN                    NaN

      cats.SUM(nums.age)  cats.SUM(nums.money)  cats.SUM(nums.weight)  \
name
Bob                  0.0                 200.0                  140.5
LiSa                28.0                 240.0                  120.8
Mary                 NaN                   NaN                    NaN

      cats.DAY(born)  cats.MONTH(born)  cats.WEEKDAY(born)  cats.YEAR(born)
name
Bob              NaN               NaN                 NaN              NaN
LiSa             1.0               1.0                 0.0           1990.0
Mary             NaN               NaN                 NaN              NaN
features_defs_nums: 29 [<Feature: ID>, <Feature: age>, <Feature: money>, <Feature: weight>, <Feature: cats.hobbey>, <Feature: cats.sex>, <Feature: cats.COUNT(nums)>, <Feature: cats.MAX(nums.age)>, <Feature: cats.MAX(nums.money)>, <Feature: cats.MAX(nums.weight)>, <Feature: cats.MEAN(nums.age)>, <Feature: cats.MEAN(nums.money)>, <Feature: cats.MEAN(nums.weight)>, <Feature: cats.MIN(nums.age)>, <Feature: cats.MIN(nums.money)>, <Feature: cats.MIN(nums.weight)>, <Feature: cats.SKEW(nums.age)>, <Feature: cats.SKEW(nums.money)>, <Feature: cats.SKEW(nums.weight)>, <Feature: cats.STD(nums.age)>, <Feature: cats.STD(nums.money)>, <Feature: cats.STD(nums.weight)>, <Feature: cats.SUM(nums.age)>, <Feature: cats.SUM(nums.money)>, <Feature: cats.SUM(nums.weight)>, <Feature: cats.DAY(born)>, <Feature: cats.MONTH(born)>, <Feature: cats.WEEKDAY(born)>, <Feature: cats.YEAR(born)>]
feature_matrix_cats_df
    hobbey sex  COUNT(nums)  MAX(nums.age)  MAX(nums.money)  MAX(nums.weight)  \
ID
4     NaN   男            1            NaN            300.0             155.6
1     打篮球   男            1            NaN            200.0             140.5
2    打羽毛球   女            1           28.0            240.0             120.8

    MEAN(nums.age)  MEAN(nums.money)  MEAN(nums.weight)  MIN(nums.age)  \
ID
4              NaN             300.0              155.6            NaN
1              NaN             200.0              140.5            NaN
2             28.0             240.0              120.8           28.0

    MIN(nums.money)  MIN(nums.weight)  SKEW(nums.age)  SKEW(nums.money)  \
ID
4             300.0             155.6             NaN               NaN
1             200.0             140.5             NaN               NaN
2             240.0             120.8             NaN               NaN

    SKEW(nums.weight)  STD(nums.age)  STD(nums.money)  STD(nums.weight)  \
ID
4                 NaN            NaN              NaN               NaN
1                 NaN            NaN              NaN               NaN
2                 NaN            NaN              NaN               NaN

    SUM(nums.age)  SUM(nums.money)  SUM(nums.weight)  DAY(born)  MONTH(born)  \
ID
4             0.0            300.0             155.6        NaN          NaN
1             0.0            200.0             140.5        NaN          NaN
2            28.0            240.0             120.8        1.0          1.0

    WEEKDAY(born)  YEAR(born)
ID
4             NaN         NaN
1             NaN         NaN
2             0.0      1990.0
features_defs_cats_df: 25 [<Feature: hobbey>, <Feature: sex>, <Feature: COUNT(nums)>, <Feature: MAX(nums.age)>, <Feature: MAX(nums.money)>, <Feature: MAX(nums.weight)>, <Feature: MEAN(nums.age)>, <Feature: MEAN(nums.money)>, <Feature: MEAN(nums.weight)>, <Feature: MIN(nums.age)>, <Feature: MIN(nums.money)>, <Feature: MIN(nums.weight)>, <Feature: SKEW(nums.age)>, <Feature: SKEW(nums.money)>, <Feature: SKEW(nums.weight)>, <Feature: STD(nums.age)>, <Feature: STD(nums.money)>, <Feature: STD(nums.weight)>, <Feature: SUM(nums.age)>, <Feature: SUM(nums.money)>, <Feature: SUM(nums.weight)>, <Feature: DAY(born)>, <Feature: MONTH(born)>, <Feature: WEEKDAY(born)>, <Feature: YEAR(born)>]
<Feature: SUM(nums.age)>
The sum of the "age" of all instances of "nums" for each "ID" in "cats".

 

 

feature_matrix_cats_df.csv


ID,hobbey,sex,COUNT(nums),MAX(nums.age),MAX(nums.money),MAX(nums.weight),MEAN(nums.age),MEAN(nums.money),MEAN(nums.weight),MIN(nums.age),MIN(nums.money),MIN(nums.weight),SKEW(nums.age),SKEW(nums.money),SKEW(nums.weight),STD(nums.age),STD(nums.money),STD(nums.weight),SUM(nums.age),SUM(nums.money),SUM(nums.weight),DAY(born),MONTH(born),WEEKDAY(born),YEAR(born)
4,,男,1,,300,155.6,,300,155.6,,300,155.6,,,,,,,0,300,155.6,,,,
1,打篮球,男,1,,200,140.5,,200,140.5,,200,140.5,,,,,,,0,200,140.5,,,,
2,打羽毛球,女,1,28,240,120.8,28,240,120.8,28,240,120.8,,,,,,,28,240,120.8,1,1,0,1990

Field descriptions

  1. <Feature: hobbey> : The "hobbey".
  2. <Feature: sex> : The "sex".
  3. <Feature: COUNT(nums)> : The number of all instances of "nums" for each "ID" in "cats".
  4. <Feature: MAX(nums.age)> : The maximum of the "age" of all instances of "nums" for each "ID" in "cats".
  5. <Feature: MAX(nums.money)> : The maximum of the "money" of all instances of "nums" for each "ID" in "cats".
  6. <Feature: MAX(nums.weight)> : The maximum of the "weight" of all instances of "nums" for each "ID" in "cats".
  7. <Feature: MEAN(nums.age)> : The average of the "age" of all instances of "nums" for each "ID" in "cats".
  8. <Feature: MEAN(nums.money)> : The average of the "money" of all instances of "nums" for each "ID" in "cats".
  9. <Feature: MEAN(nums.weight)> : The average of the "weight" of all instances of "nums" for each "ID" in "cats".
  10. <Feature: MIN(nums.age)> : The minimum of the "age" of all instances of "nums" for each "ID" in "cats".
  11. <Feature: MIN(nums.money)> : The minimum of the "money" of all instances of "nums" for each "ID" in "cats".
  12. <Feature: MIN(nums.weight)> : The minimum of the "weight" of all instances of "nums" for each "ID" in "cats".
  13. <Feature: SKEW(nums.age)> : The skewness of the "age" of all instances of "nums" for each "ID" in "cats".
  14. <Feature: SKEW(nums.money)> : The skewness of the "money" of all instances of "nums" for each "ID" in "cats".
  15. <Feature: SKEW(nums.weight)> : The skewness of the "weight" of all instances of "nums" for each "ID" in "cats".
  16. <Feature: STD(nums.age)> : The standard deviation of the "age" of all instances of "nums" for each "ID" in "cats".
  17. <Feature: STD(nums.money)> : The standard deviation of the "money" of all instances of "nums" for each "ID" in "cats".
  18. <Feature: STD(nums.weight)> : The standard deviation of the "weight" of all instances of "nums" for each "ID" in "cats".
  19. <Feature: SUM(nums.age)> : The sum of the "age" of all instances of "nums" for each "ID" in "cats".
  20. <Feature: SUM(nums.money)> : The sum of the "money" of all instances of "nums" for each "ID" in "cats".
  21. <Feature: SUM(nums.weight)> : The sum of the "weight" of all instances of "nums" for each "ID" in "cats".
  22. <Feature: DAY(born)> : The day of the month of the "born".
  23. <Feature: MONTH(born)> : The month of the "born".
  24. <Feature: WEEKDAY(born)> : The day of the week of the "born".
  25. <Feature: YEAR(born)> : The year of the "born".

 

 

feature_matrix_nums.csv


name,ID,age,money,weight,cats.hobbey,cats.sex,cats.COUNT(nums),cats.MAX(nums.age),cats.MAX(nums.money),cats.MAX(nums.weight),cats.MEAN(nums.age),cats.MEAN(nums.money),cats.MEAN(nums.weight),cats.MIN(nums.age),cats.MIN(nums.money),cats.MIN(nums.weight),cats.SKEW(nums.age),cats.SKEW(nums.money),cats.SKEW(nums.weight),cats.STD(nums.age),cats.STD(nums.money),cats.STD(nums.weight),cats.SUM(nums.age),cats.SUM(nums.money),cats.SUM(nums.weight),cats.DAY(born),cats.MONTH(born),cats.WEEKDAY(born),cats.YEAR(born)
Bob,1,,200,140.5,打篮球,男,1,,200,140.5,,200,140.5,,200,140.5,,,,,,,0,200,140.5,,,,
LiSa,2,28,240,120.8,打羽毛球,女,1,28,240,120.8,28,240,120.8,28,240,120.8,,,,,,,28,240,120.8,1,1,0,1990
Mary,3,38,290,169.4,,,,,,,,,,,,,,,,,,,,,,,,,
Alan,4,,300,155.6,,男,1,,300,155.6,,300,155.6,,300,155.6,,,,,,,0,300,155.6,,,,

 

Field descriptions

  1. <Feature: ID> : The "ID".
  2. <Feature: age> : The "age".
  3. <Feature: money> : The "money".
  4. <Feature: weight> : The "weight".
  5. <Feature: cats.sex> : The "sex" for the instance of "cats" associated with this instance of "nums".
  6. <Feature: cats.hobbey> : The "hobbey" for the instance of "cats" associated with this instance of "nums".
  7. <Feature: cats.COUNT(nums)> : The number of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  8. <Feature: cats.MAX(nums.age)> : The maximum of the "age" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  9. <Feature: cats.MAX(nums.money)> : The maximum of the "money" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  10. <Feature: cats.MAX(nums.weight)> : The maximum of the "weight" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  11. <Feature: cats.MEAN(nums.age)> : The average of the "age" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  12. <Feature: cats.MEAN(nums.money)> : The average of the "money" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  13. <Feature: cats.MEAN(nums.weight)> : The average of the "weight" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  14. <Feature: cats.MIN(nums.age)> : The minimum of the "age" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  15. <Feature: cats.MIN(nums.money)> : The minimum of the "money" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  16. <Feature: cats.MIN(nums.weight)> : The minimum of the "weight" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  17. <Feature: cats.SKEW(nums.age)> : The skewness of the "age" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  18. <Feature: cats.SKEW(nums.money)> : The skewness of the "money" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  19. <Feature: cats.SKEW(nums.weight)> : The skewness of the "weight" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  20. <Feature: cats.STD(nums.age)> : The standard deviation of the "age" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  21. <Feature: cats.STD(nums.money)> : The standard deviation of the "money" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  22. <Feature: cats.STD(nums.weight)> : The standard deviation of the "weight" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  23. <Feature: cats.SUM(nums.age)> : The sum of the "age" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  24. <Feature: cats.SUM(nums.money)> : The sum of the "money" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  25. <Feature: cats.SUM(nums.weight)> : The sum of the "weight" of all instances of "nums" for each "ID" in "cats" for the instance of "cats" associated with this instance of "nums".
  26. <Feature: cats.DAY(born)> : The day of the month of the "born" for the instance of "cats" associated with this instance of "nums".
  27. <Feature: cats.MONTH(born)> : The month of the "born" for the instance of "cats" associated with this instance of "nums".
  28. <Feature: cats.WEEKDAY(born)> : The day of the week of the "born" for the instance of "cats" associated with this instance of "nums".
  29. <Feature: cats.YEAR(born)> : The year of the "born" for the instance of "cats" associated with this instance of "nums".

