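FG Feature Operator Configuration Guide
This guide describes how to configure each feature operator in Feature Generator (FG), with example configurations for common feature-engineering scenarios.
1. Basic Feature Operators
1.1 ID Feature
The ID feature handles discrete (categorical) features and supports both single-valued and multi-valued inputs.
Configuration example: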
{ "features": [ { "feature_type": "id_feature", "feature_name": "user_vip_level", "expression": "user:vip_level", "need_prefix": true, "default_value": "0" }, { "feature_type": "id_feature", "feature_name": "user_gender", "expression": "user:gender", "need_prefix": false, "default_value": "unknown" }, { "feature_type": "id_feature", "feature_name": "item_category", "expression": "item:category", "need_prefix": true, "separator": "|" }, { "feature_type": "id_feature", "feature_name": "user_tags", "expression": "user:tags", "need_prefix": true, "separator": ",", "value_dimension": 3 } ] }
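Field reference:
| Field | Required | Description |
| feature_type | Yes | Always id_feature |
| feature_name | Yes | Name of the output feature |
| expression | Yes | Source of the input field, in the form domain:field_name |
| need_prefix | No | Whether to prepend feature_name to each output value; defaults to false |
| separator | No | Delimiter for multi-valued input; defaults to \u001D |
| default_value | No | Value used when the input is empty |
| value_dimension | No | Output dimension; 0 means no truncation |
Test data example: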
{ "input": { "vip_level": {"type": "optional_string", "values": ["黄金", "钻石", "白银", "铂金", null]}, "gender": {"type": "optional_string", "values": ["男", "女", null, "男", "女"]}, "category": {"type": "optional_string", "values": ["数码|手机", "服饰|运动", "家居|厨具"]}, "tags": {"type": "optional_string", "values": ["运动,音乐,美食", "科技,游戏,阅读"]} }, "expected_output": { "user_vip_level": ["黄金", "钻石", "白银", "铂金", "0"], "user_gender": ["男", "女", "unknown", "男", "女"], "item_category": [["数码", "手机"], ["服饰", "运动"], ["家居", "厨具"]], "user_tags": [["运动", "音乐", "美食"], ["科技", "游戏", "阅读"]] } }
1.2 Raw Feature
The raw feature handles continuous numeric features and supports several normalization methods.
Configuration example:
{ "features": [ { "feature_type": "raw_feature", "feature_name": "item_price_norm", "expression": "item:price", "value_type": "float" }, { "feature_type": "raw_feature", "feature_name": "item_ctr_log", "expression": "item:ctr", "normalizer": "method=log10" }, { "feature_type": "raw_feature", "feature_name": "item_rating_norm", "expression": "item:rating", "normalizer": "method=minmax,min=0.0,max=5.0" }, { "feature_type": "raw_feature", "feature_name": "item_sales_zscore", "expression": "item:sales", "normalizer": "method=zscore,mean=1000.0,standard_deviation=500.0" }, { "feature_type": "raw_feature", "feature_name": "item_discount", "expression": "item:discount", "value_type": "double", "default_value": "1.0" } ] }
Normalization methods:
| Method | Configuration example | Formula |
| log10 | method=log10 | x = log10(x) |
| minmax | method=minmax,min=0,max=5 | x = (x - min) / (max - min) |
| zscore | method=zscore,mean=0,standard_deviation=1 | x = (x - mean) / standard_deviation |
| expression | method=expression,expr=sigmoid(x) | Custom expression |
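As a rough illustration of the formulas in the table above, the following Python sketch parses a normalizer string and applies the log10, minmax, and zscore methods. The function name `normalize` and the parsing approach are assumptions made for this example; it is not FG's implementation, and the expression method is left out.

```python
import math


def normalize(x, spec):
    """Apply a normalizer spec such as 'method=minmax,min=0.0,max=5.0' to x.

    Hypothetical helper for illustration only.
    """
    # Split "k1=v1,k2=v2" into a dict of string parameters.
    params = dict(part.split("=", 1) for part in spec.split(","))
    method = params["method"]
    if method == "log10":
        return math.log10(x)
    if method == "minmax":
        lo, hi = float(params["min"]), float(params["max"])
        return (x - lo) / (hi - lo)
    if method == "zscore":
        mean = float(params["mean"])
        std = float(params["standard_deviation"])
        return (x - mean) / std
    raise ValueError(f"unsupported method: {method}")


print(normalize(100.0, "method=log10"))                    # 2.0
print(normalize(4.0, "method=minmax,min=0.0,max=5.0"))     # 0.8
print(normalize(1500.0, "method=zscore,mean=1000.0,standard_deviation=500.0"))  # 1.0
```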
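2. Expression Feature
The expression feature evaluates arithmetic and conditional expressions over its input variables and comes with a library of built-in functions.
Configuration example: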
{ "features": [ { "feature_type": "expr_feature", "feature_name": "age_price_interaction", "expression": "age * price", "variables": ["user:age", "item:price"], "value_type": "float", "value_dimension": 1 }, { "feature_type": "expr_feature", "feature_name": "ctr_sigmoid", "expression": "sigmoid(ctr)", "variables": ["item:ctr"], "value_type": "float" }, { "feature_type": "expr_feature", "feature_name": "rating_price_ratio", "expression": "rating / (price / 100 + 1)", "variables": ["item:rating", "item:price"], "value_type": "double", "default_value": "0.0" }, { "feature_type": "expr_feature", "feature_name": "age_bucket", "expression": "age >= 18 ? (age >= 60 ? 3 : (age >= 35 ? 2 : 1)) : 0", "variables": ["user:age"], "value_type": "int32" }, { "feature_type": "expr_feature", "feature_name": "log_sales", "expression": "log10(sales + 1)", "variables": ["item:sales"], "value_type": "float" } ] }
Built-in functions:
| Category | Functions | Description |
| Math | sin, cos, tan, log, log10, exp, sqrt | Basic math operations |
| Statistical | sigmoid, round, floor, ceil, abs | Numeric processing |
| Vector | min, max, sum, avg, len, dot | Aggregation |
| Distance | sphere_dist, haversine, euclid_dist | Geographic and vector distance |
| Reduction | reduce_min, reduce_max, reduce_sum, reduce_mean | Vector reduction |
Test data:
{ "input": { "age": {"type": "optional_float", "values": [25.0, 35.0, 45.0, 15.0, 70.0]}, "price": {"type": "optional_float", "values": [99.0, 599.0, 1999.0, 50.0, 299.0]}, "ctr": {"type": "optional_float", "values": [0.1, 0.5, 0.9, 0.01, 0.99]}, "rating": {"type": "optional_float", "values": [4.5, 3.8, 5.0, 2.5, 4.0]}, "sales": {"type": "optional_float", "values": [99.0, 999.0, 9999.0, 10.0, 100.0]} } }
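To make the expected values easier to check by hand, the sketch below re-implements two of the expressions from the configuration above (age_bucket and log_sales) in plain Python and evaluates them on the test data. The helper names are hypothetical, and FG's own evaluator may differ in numeric type and rounding.

```python
import math


def age_bucket(age):
    # Mirrors: age >= 18 ? (age >= 60 ? 3 : (age >= 35 ? 2 : 1)) : 0
    if age < 18:
        return 0
    if age >= 60:
        return 3
    return 2 if age >= 35 else 1


def log_sales(sales):
    # Mirrors: log10(sales + 1)
    return math.log10(sales + 1)


ages = [25.0, 35.0, 45.0, 15.0, 70.0]
sales = [99.0, 999.0, 9999.0, 10.0, 100.0]

print([age_bucket(a) for a in ages])  # [1, 2, 2, 0, 3]
print([round(log_sales(s), 3) for s in sales])
```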
3. Combo Feature
The combo feature takes the Cartesian product of several fields to build crossed features.
Configuration example:
{ "features": [ { "feature_type": "combo_feature", "feature_name": "gender_category_combo", "expression": ["user:gender", "item:category_id"], "need_prefix": true }, { "feature_type": "combo_feature", "feature_name": "age_brand_combo", "expression": ["user:age_group", "item:brand"], "need_prefix": true, "separator": "|" }, { "feature_type": "combo_feature", "feature_name": "city_category_combo", "expression": ["user:city", "item:category_id"], "need_prefix": false, "default_value": "unknown" }, { "feature_type": "combo_feature", "feature_name": "level_brand_combo", "expression": ["user:vip_level", "item:brand", "item:category_id"], "need_prefix": true } ] }
Output examples:
| gender | category_id | Output (need_prefix=true) |
| 男 | 数码 | gender_category_combo_男_数码 |
| 女 | 美妆 | gender_category_combo_女_美妆 |
| age_group | brand (multi-valued) | Output |
| 青年 | Apple|iPhone | [age_brand_combo_青年_Apple, age_brand_combo_青年_iPhone] |
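The crossing behavior shown in the output tables can be sketched with `itertools.product`. The function name `combo` and the use of "_" as the joiner are inferred from the example outputs above rather than taken from FG's source, so treat this as an approximation.

```python
from itertools import product


def combo(feature_name, fields, separator="|", need_prefix=True):
    """Cross several fields; each field may be multi-valued via `separator`.

    Hypothetical helper: the "_" joiner is inferred from the example outputs.
    """
    value_lists = [field.split(separator) for field in fields]
    crossed = ["_".join(values) for values in product(*value_lists)]
    if need_prefix:
        crossed = [f"{feature_name}_{value}" for value in crossed]
    return crossed


print(combo("gender_category_combo", ["男", "数码"]))
# ['gender_category_combo_男_数码']
print(combo("age_brand_combo", ["青年", "Apple|iPhone"]))
# ['age_brand_combo_青年_Apple', 'age_brand_combo_青年_iPhone']
```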
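4. Lookup and Match Features
4.1 Lookup Feature
The lookup feature retrieves a value from a dictionary (map) by key and can merge the results when several keys are given.
Configuration example: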
{ "features": [ { "feature_type": "lookup_feature", "feature_name": "brand_tier", "map": "user:brand_tier_map", "key": "item:brand", "need_discrete": true, "default_value": "1" }, { "feature_type": "lookup_feature", "feature_name": "category_score", "map": "user:category_score_map", "key": "item:category", "need_discrete": false, "combiner": "sum", "value_type": "float", "default_value": "0.0" }, { "feature_type": "lookup_feature", "feature_name": "tag_pref", "map": "user:tag_pref_map", "key": "item:tags", "need_discrete": false, "combiner": "mean", "separator": ",", "value_type": "float", "default_value": "0.0" }, { "feature_type": "lookup_feature", "feature_name": "brand_level", "map": "user:brand_level_map", "key": "item:brand", "need_prefix": true, "need_key": true, "default_value": "unknown" } ] }
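Field reference:
| Field | Description |
| map | The dictionary field; its type is map or multi-valued string |
| key | The key to look up |
| need_discrete | true outputs a discrete feature name; false outputs a numeric value |
| combiner | How to merge results when there are multiple keys: sum/mean/max/min |
| need_prefix | Whether to prepend feature_name as a prefix |
| need_key | Whether to include the key in the output |
Test data: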
{ "input": { "brand_tier_map": { "type": "map_string_string", "values": [ {"Nike": "3", "Adidas": "3", "LV": "4", "Uniqlo": "2"}, {"Apple": "5", "Huawei": "5", "Xiaomi": "4"} ] }, "brand": {"type": "optional_string", "values": ["Nike", "LV", "Apple"]} } }
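For the numeric case (need_discrete set to false), a multi-key lookup with a combiner might behave roughly as follows. The function `lookup` and the sample `tag_pref` map are hypothetical and only illustrate the combiner semantics; how FG treats missing keys in detail is not specified in this guide.

```python
def lookup(mapping, keys, combiner="sum", default=0.0):
    """Look up each key in `mapping` and merge the hits with `combiner`.

    Hypothetical helper: keys that are absent from the map are skipped,
    and `default` is returned when nothing matches.
    """
    hits = [float(mapping[key]) for key in keys if key in mapping]
    if not hits:
        return default
    if combiner == "sum":
        return sum(hits)
    if combiner == "mean":
        return sum(hits) / len(hits)
    if combiner == "max":
        return max(hits)
    if combiner == "min":
        return min(hits)
    raise ValueError(f"unsupported combiner: {combiner}")


# Illustrative data, not taken from the test data above.
tag_pref = {"运动": 0.8, "音乐": 0.5, "美食": 0.3}
item_tags = "运动,音乐,旅行".split(",")  # "旅行" is not in the map

print(lookup(tag_pref, item_tags, combiner="max"))   # 0.8
print(lookup(tag_pref, item_tags, combiner="mean"))  # 0.65
print(lookup(tag_pref, ["旅行"], combiner="sum"))     # 0.0
```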
4.2 Match Feature
The match feature performs lookups in a two-level nested dictionary.
Configuration example:
{ "features": [ { "feature_type": "match_feature", "feature_name": "user_brand_purchase_history", "category": "item:category_name", "item": "item:brand", "user": "user:purchase_history_map", "match_type": "hit", "need_discrete": false }, { "feature_type": "match_feature", "feature_name": "user_category_ctr", "category": "ALL", "item": "item:category_id", "user": "user:category_ctr_map", "match_type": "hit", "need_discrete": true }, { "feature_type": "match_feature", "feature_name": "user_brand_pref", "category": "item:main_category", "item": "item:brand", "user": "user:brand_pref_map", "match_type": "hit", "need_discrete": false, "value_type": "float" } ] }
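Data format:
The user field encodes a two-level nested dictionary with four delimiters:
- | separates the items of the first-level map
- ^ separates a first-level key from its value
- , separates the items of the second-level map
- : separates a second-level key from its value
Example: 服饰^Nike:3,Adidas:2|数码^Apple:1,Huawei:2
This parses to: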
{ "服饰": {"Nike": 3, "Adidas": 2}, "数码": {"Apple": 1, "Huawei": 2} }
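The delimiter rules above can be expressed as a small parser. This is a sketch only: `parse_nested_map` and `match_hit` are hypothetical names, the values are assumed to be integers as in the example, and the "hit" behavior (return the value stored under category and item, if any) is an assumption about what match_type "hit" means.

```python
def parse_nested_map(text):
    """Parse 'cat^k:v,k:v|cat^k:v' into a two-level dict (values as int)."""
    result = {}
    for first_level_item in text.split("|"):
        category, inner = first_level_item.split("^", 1)
        result[category] = {}
        for pair in inner.split(","):
            key, value = pair.split(":", 1)
            result[category][key] = int(value)
    return result


def match_hit(user_map, category, item):
    """Return the value stored under category -> item, or None if absent."""
    return user_map.get(category, {}).get(item)


user_map = parse_nested_map("服饰^Nike:3,Adidas:2|数码^Apple:1,Huawei:2")
print(user_map)
# {'服饰': {'Nike': 3, 'Adidas': 2}, '数码': {'Apple': 1, 'Huawei': 2}}
print(match_hit(user_map, "服饰", "Nike"))   # 3
print(match_hit(user_map, "数码", "Nike"))   # None
```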
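5. Overlap Feature
The overlap feature measures how much two text sequences overlap and is commonly used to compute search relevance.
Basic configuration: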
{ "features": [ { "feature_type": "overlap_feature", "feature_name": "query_in_title", "query": "user:query_text", "title": "item:title", "method": "is_contain", "separator": " " }, { "feature_type": "overlap_feature", "feature_name": "query_title_equal", "query": "user:query_text", "title": "item:title", "method": "is_equal", "separator": " " }, { "feature_type": "overlap_feature", "feature_name": "query_index_in_title", "query": "user:query_text", "title": "item:title", "method": "index_of", "separator": " " } ] }
Advanced overlap analysis (including proximity measures):
{ "features": [ { "feature_type": "overlap_feature", "feature_name": "query_category_overlap", "query": "user:tokenized_query", "title": "item:category_tags", "method": "query_common_ratio", "separator": " " }, { "feature_type": "overlap_feature", "feature_name": "title_query_overlap", "query": "user:tokenized_query", "title": "item:category_tags", "method": "title_common_ratio", "separator": " " }, { "feature_type": "overlap_feature", "feature_name": "query_proximity_min_cover", "query": "user:tokenized_query", "title": "item:category_tags", "method": "proximity_min_cover", "separator": " " }, { "feature_type": "overlap_feature", "feature_name": "query_proximity_min_dist", "query": "user:tokenized_query", "title": "item:category_tags", "method": "proximity_min_dist", "separator": " " } ] }
Method reference:
| Method | Description | Output range |
| query_common_ratio | Fraction of query terms that also appear in the title | [0, 1] |
| title_common_ratio | Fraction of title terms that match a query term | [0, 1] |
| is_contain | Whether the whole query is contained in the title (order preserved) | 0/1 |
| is_equal | Whether the query and the title are identical | 0/1 |
| index_of | Position of the first occurrence of the query in the title | -1 or >=0 |
| proximity_min_cover | Length of the shortest title span that covers all query terms | [0, len(title)] |
| proximity_min_dist | Minimum pairwise distance between query terms | [0, len(title)+1] |
| proximity_max_dist | Maximum pairwise distance between query terms | [0, len(title)+1] |
| proximity_avg_dist | Average pairwise distance between query terms | [0, len(title)+1] |
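Three of the methods in the table can be approximated in a few lines of Python. The sketch below is one plausible reading of the descriptions, not FG's implementation: in particular, it assumes index_of works at the term level and requires the query terms to appear contiguously, and the English sample strings are made up for the example.

```python
def query_common_ratio(query, title, sep=" "):
    """Fraction of query terms that also occur in the title."""
    query_terms = query.split(sep)
    title_terms = set(title.split(sep))
    return sum(term in title_terms for term in query_terms) / len(query_terms)


def title_common_ratio(query, title, sep=" "):
    """Fraction of title terms that also occur in the query."""
    query_terms = set(query.split(sep))
    title_terms = title.split(sep)
    return sum(term in query_terms for term in title_terms) / len(title_terms)


def index_of(query, title, sep=" "):
    """Term index where the query first appears contiguously in the title, else -1."""
    query_terms = query.split(sep)
    title_terms = title.split(sep)
    for start in range(len(title_terms) - len(query_terms) + 1):
        if title_terms[start:start + len(query_terms)] == query_terms:
            return start
    return -1


title = "nike red running shoes men"
print(query_common_ratio("red running shoes", title))  # 1.0
print(title_common_ratio("red running shoes", title))  # 0.6
print(index_of("red running shoes", title))            # 1
print(index_of("red shoes", title))                    # -1
```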
6. Sequence Features
6.1 Sequence Feature
The sequence feature processes a user's historical behavior sequence.
Configuration example:
{ "features": [ { "feature_type": "sequence_feature", "feature_name": "user_click_item_seq", "sequence_name": "click_seq", "sequence_length": 10, "sequence_pk": "user:user_id", "sequence_table": "click_seq", "sequence_delim": ";", "features": [ { "feature_name": "item_id", "feature_type": "id_feature", "expression": "item_id", "need_prefix": false } ] }, { "feature_type": "sequence_feature", "feature_name": "user_purchase_item_seq", "sequence_name": "purchase_seq", "sequence_length": 5, "sequence_pk": "user:user_id", "sequence_table": "purchase_seq", "sequence_delim": ",", "features": [ { "feature_name": "item_id", "feature_type": "id_feature", "expression": "item_id", "need_prefix": false } ] } ] }
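Field reference:
| Field | Description |
| sequence_name | Name of the sequence |
| sequence_length | Maximum sequence length; elements beyond it are truncated |
| sequence_pk | Primary-key field, for example user_id |
| sequence_delim | Delimiter between sequence elements |
| features | Sub-feature configurations applied to each element of the sequence |
6.2 Sequence Combine Feature
The sequence combine feature merges the multi-valued elements of a sequence using a combiner.
Configuration example: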
{ "features": [ { "feature_type": "sequence_combine_feature", "feature_name": "user_behavior_weighted_sum", "expression": "user:behavior_seq", "combiner": "sum", "need_discrete": false, "separator": "," }, { "feature_type": "sequence_combine_feature", "feature_name": "user_behavior_mean", "expression": "user:behavior_seq", "combiner": "mean", "separator": "," }, { "feature_type": "sequence_combine_feature", "feature_name": "user_action_score", "expression": "user:action_events", "combiner": "sum", "separator": "|", "sequence_delim": ";", "value_map": { "expo": 1, "click": 2, "buy": 4, "cart": 3 } }, { "feature_type": "sequence_combine_feature", "feature_name": "user_behavior_max", "expression": "user:behavior_seq", "combiner": "max", "separator": "," } ] }
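Combiner types:
| Type | Description |
| sum | Sum |
| mean | Mean |
| max | Maximum |
| min | Minimum |
| count | Count |
Value map:
When the input is a string of behavior events (for example "expo|click|buy"), value_map converts each event to a number before the values are combined.
Example: with input "expo|click|buy", value_map {"expo": 1, "click": 2, "buy": 4}, and the sum combiner, the output is 1 + 2 + 4 = 7.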
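The value_map example can be reproduced with a short Python sketch. The function `combine_events` is a hypothetical helper; skipping events that are missing from value_map and returning 0 for an empty result are assumptions, since this guide does not say how FG handles those cases.

```python
def combine_events(events, value_map, separator="|", combiner="sum"):
    """Map each event to a number via value_map, then merge with combiner.

    Hypothetical helper: unknown events are skipped (an assumption).
    """
    values = [value_map[event] for event in events.split(separator) if event in value_map]
    if combiner == "count":
        return len(values)
    if not values:
        return 0
    if combiner == "sum":
        return sum(values)
    if combiner == "mean":
        return sum(values) / len(values)
    if combiner == "max":
        return max(values)
    if combiner == "min":
        return min(values)
    raise ValueError(f"unsupported combiner: {combiner}")


value_map = {"expo": 1, "click": 2, "buy": 4, "cart": 3}
print(combine_events("expo|click|buy", value_map))                  # 7
print(combine_events("expo|click|buy", value_map, combiner="max"))  # 4
print(combine_events("expo|click|buy", value_map, combiner="count"))  # 3
```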
7. Text Processing Features
7.1 Tokenize Feature
The tokenize feature splits text into tokens.
Configuration example:
{ "features": [ { "feature_type": "tokenize_feature", "feature_name": "item_title_tokens", "expression": "item:title", "vocab_file": "tokenizer.json", "output_type": "word" }, { "feature_type": "tokenize_feature", "feature_name": "item_title_word_ids", "expression": "item:title", "vocab_file": "tokenizer.json", "output_type": "word_id" }, { "feature_type": "tokenize_feature", "feature_name": "item_desc_tokens", "expression": "item:description", "vocab_file": "tokenizer.json", "output_type": "word", "default_value": "" } ] }
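Field reference:
| Field | Description |
| vocab_file | Path to the tokenizer vocabulary file |
| output_type | word (output tokens) or word_id (output token IDs) |
| output_delim | Output delimiter (used by offline jobs) |
| tokenizer_type | sentencepiece or another huggingface tokenizer |
Supported vocabulary formats:
- tokenizer.json (BPE)
- bert-base-chinese-vocab.json (WordPiece)
- spiece.model (SentencePiece)
7.2 Text Normalizer Feature
The text normalizer feature normalizes text, including case conversion, conversion between Traditional and Simplified Chinese, and conversion between full-width and half-width characters.
Configuration example: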
{ "features": [ { "feature_type": "text_normalizer", "feature_name": "query_normalized", "expression": "user:query_text", "parameter": 15 }, { "feature_type": "text_normalizer", "feature_name": "title_normalized", "expression": "item:title", "parameter": 60, "remove_space": true }, { "feature_type": "text_normalizer", "feature_name": "description_lower", "expression": "item:description", "parameter": 4, "max_length": 500 }, { "feature_type": "text_normalizer", "feature_name": "content_splitchars", "expression": "item:content", "parameter": 512 } ] }
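Parameter values (can be combined by addition):
| Value | Function |
| 2 | Lowercase to uppercase |
| 4 | Uppercase to lowercase |
| 8 | Full-width to half-width |
| 16 | Traditional to Simplified Chinese |
| 32 | Remove special symbols |
| 512 | Split Chinese text into single characters (space-separated) |
Common combinations:
- 15 = 1+2+4+8 (not 4+8+16+32, which is 60); the value 1 is not defined in the table above
- 60 = 4+8+16+32: uppercase to lowercase + full-width to half-width + Traditional to Simplified Chinese + remove special symbols
- 4: uppercase to lowercase only
- 512: split Chinese text into single characters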
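Because the parameter values combine by addition, it can help to decompose a given parameter back into its flags before using it. The sketch below assumes that parameter is a bitmask over the values in the table; the `FLAGS` dictionary and the `decode` function are hypothetical names for this example.

```python
# Flag values copied from the table above; the names are descriptive labels only.
FLAGS = {
    2: "lowercase to uppercase",
    4: "uppercase to lowercase",
    8: "full-width to half-width",
    16: "Traditional to Simplified Chinese",
    32: "remove special symbols",
    512: "split Chinese text into single characters",
}


def decode(parameter):
    """Return (known flag values set in parameter, remainder not covered by the table)."""
    known = [bit for bit in FLAGS if parameter & bit]
    remainder = parameter - sum(known)
    return known, remainder


print(decode(60))   # ([4, 8, 16, 32], 0)
print(decode(15))   # ([2, 4, 8], 1)
print(decode(512))  # ([512], 0)

for bit in decode(60)[0]:
    print(bit, FLAGS[bit])
```

A non-zero remainder means the parameter sets a bit that the table does not describe, which is worth checking against the FG reference before relying on it.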
8. Advanced Feature Operators
8.1 BM25 Feature
The BM25 feature computes a search relevance score.
Configuration example:
{ "features": [ { "feature_type": "bm25_feature", "feature_name": "query_title_bm25", "query": "user:tokenized_query", "document": "item:tokenized_title", "k1": 1.2, "b": 0.75, "separator": " ", "avg_doc_length": 10, "document_number": 1000, "term_doc_freq_dict": { "iPhone": 5, "15": 10, "手机": 20, "Apple": 3, "运动鞋": 15, "跑步": 12, "Nike": 4 } }, { "feature_type": "bm25_feature", "feature_name": "query_desc_bm25", "query": "user:tokenized_query", "document": "item:tokenized_desc", "k1": 2.0, "b": 0.5, "separator": " ", "avg_doc_length": 20, "document_number": 500, "term_doc_freq_dict": { "高品质": 10, "舒适": 15, "智能": 20, "拍照": 15 } } ] }
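Field reference:
| Field | Description |
| query | Query field |
| document | Document field |
| k1 | BM25 parameter that controls term-frequency saturation; typically 1.2-2.0 |
| b | BM25 parameter that controls document-length normalization; typically 0.5-0.75 |
| avg_doc_length | Average document length |
| document_number | Total number of documents |
| term_doc_freq_dict | Document-frequency dictionary: each key is a term and each value is the number of documents that contain it |
| term_doc_freq_file | Path to a document-frequency file (use either this or term_doc_freq_dict) |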
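To show how the fields above feed into a score, here is a sketch of the textbook Okapi BM25 formula in Python. It is not taken from FG: the IDF variant used here, ln((N - n + 0.5) / (n + 0.5) + 1), is one common choice, and FG may use a different one, so absolute scores can differ. The function name `bm25` and the sample query and document are assumptions.

```python
import math
from collections import Counter


def bm25(query, document, doc_freq, n_docs, avg_len, k1=1.2, b=0.75, sep=" "):
    """Okapi BM25 score of `document` for `query` (textbook formulation)."""
    doc_terms = document.split(sep)
    term_freq = Counter(doc_terms)
    # Length normalization: longer-than-average documents are penalized via b.
    length_norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
    score = 0.0
    for term in query.split(sep):
        f = term_freq.get(term, 0)
        if f == 0:
            continue  # term absent from the document contributes nothing
        n = doc_freq.get(term, 0)  # number of documents containing the term
        idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)
        score += idf * f * (k1 + 1) / (f + length_norm)
    return score


# Document frequencies copied from the first configuration example above.
doc_freq = {"iPhone": 5, "15": 10, "手机": 20, "Apple": 3}
score = bm25("iPhone 手机", "Apple iPhone 15 手机", doc_freq, n_docs=1000, avg_len=10)
print(round(score, 4))
```

Rarer terms (smaller document frequency) get a larger IDF, so in this example the match on "iPhone" contributes more than the match on "手机".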
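8.2 KV Dot Product Feature
The KV dot product feature computes the dot product of two key-value vectors, or the size of the intersection of two sets.
Configuration example: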
{ "features": [ { "feature_type": "kv_dot_product", "feature_name": "user_item_interest_score", "query": "user:interest_kv", "document": "item:feature_kv", "separator": ",", "kv_delimiter": ":" }, { "feature_type": "kv_dot_product", "feature_name": "user_tag_item_tag_sim", "query": "user:user_tags_kv", "document": "item:item_tags_kv", "separator": "|", "kv_delimiter": "=" }, { "feature_type": "kv_dot_product", "feature_name": "user_category_pref", "query": "user:category_pref_kv", "document": "item:category_score_kv", "separator": ";", "kv_delimiter": ":", "default_value": "0.0" } ] }
Data format examples:
| Input type | Format example |
| Comma-separated KV | "运动:0.8,音乐:0.5,美食:0.3" |
| Pipe-separated KV | "运动=0.9|…" |
| Semicolon-separated KV | "数码:0.9;服饰:0.7" |
| Plain tag list | ["a", "b", "c"] (a missing value defaults to 1.0) |
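The following sketch shows how a KV dot product could be computed from the string formats above, including the case where a key has no value and defaults to 1.0 (which turns the dot product into an intersection count). The helpers `parse_kv` and `kv_dot` and the sample inputs are hypothetical.

```python
def parse_kv(text, separator=",", kv_delimiter=":"):
    """Parse 'k:v,k:v' into a dict; a key without a value gets weight 1.0."""
    vector = {}
    for part in text.split(separator):
        key, _, value = part.partition(kv_delimiter)
        vector[key] = float(value) if value else 1.0
    return vector


def kv_dot(query, document, separator=",", kv_delimiter=":"):
    """Sum of products over the keys that appear in both inputs."""
    q = parse_kv(query, separator, kv_delimiter)
    d = parse_kv(document, separator, kv_delimiter)
    return sum(weight * d[key] for key, weight in q.items() if key in d)


# Weighted vectors: shared keys are 运动 (0.5 * 2) and 美食 (1 * 3).
print(kv_dot("运动:0.5,音乐:0.25,美食:1", "运动:2,美食:3"))  # 4.0
# Plain tag lists: every weight is 1.0, so the result is the intersection size.
print(kv_dot("a,b,c", "b,c,d"))  # 2.0
```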
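8.3 String Replace Feature
The string replace feature replaces every occurrence of each matching substring with the specified text.
Configuration example: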
{ "features": [ { "feature_type": "str_replace_feature", "feature_name": "category_normalized", "expression": "item:category_raw", "replacements": { "手机": "智能手机", "电脑": "笔记本电脑", "平板": "平板电脑" } }, { "feature_type": "str_replace_feature", "feature_name": "brand_normalized", "expression": "item:brand_raw", "replacements": { "苹果": "Apple", "华为": "Huawei", "小米": "Xiaomi", "联想": "Lenovo" } }, { "feature_type": "str_replace_feature", "feature_name": "text_cleaned", "expression": "user:input_text", "replacements": { "|": "", " ": "", "\t": "" } } ] }
Advanced configuration (using a replacement file):
{ "feature_type": "str_replace_feature", "feature_name": "norm_str", "expression": ["user:profile"], "default_value": "", "replace_file": "synonyms.txt", "replacements": { "|": "", "aa": "x", "a": "X" }, "value_dimension": 1 }
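Replacement file format:
One replacement rule per line, with the original text and the replacement text separated by a tab:
original text\treplacement text
手机\t智能手机
电脑\t笔记本电脑
8.4 Regex Replace Feature
The regex replace feature replaces text that matches a regular expression.
Configuration example: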
{ "features": [ { "feature_type": "regex_replace_feature", "feature_name": "phone_cleaned", "expression": "user:phone_input", "regex_pattern": "[^0-9]", "replacement": "", "icase": false, "replace_all": true }, { "feature_type": "regex_replace_feature", "feature_name": "email_cleaned", "expression": "user:email_input", "regex_pattern": "\\s+", "replacement": "", "icase": false, "replace_all": true }, { "feature_type": "regex_replace_feature", "feature_name": "text_normalized", "expression": "user:text_input", "regex_pattern": "[\\|\\#\\(.*\\)]", "replacement": " ", "icase": true, "replace_all": true }, { "feature_type": "regex_replace_feature", "feature_name": "html_tag_removed", "expression": "item:html_content", "regex_pattern": "<[^>]+>", "replacement": "", "icase": false, "replace_all": true } ] }
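Field reference:
| Field | Description |
| regex_pattern | Regular expression pattern |
| replacement | Replacement text; an empty string deletes the matched text |
| replace_all | Whether to replace all matches; defaults to true |
| icase | Whether to ignore case; defaults to false |
Common patterns:
| Purpose | Regular expression |
| Keep digits only (matches every non-digit character for removal) | [^0-9] |
| Remove whitespace | \s+ |
| Remove HTML tags | <[^>]+> |
| Remove special symbols | [|\#\(\)\[\]] |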
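The patterns above can be tried out with Python's `re` module before they go into a configuration. This is only an approximation of the operator: FG's regex dialect may differ from Python's, and `regex_replace` is a hypothetical wrapper whose parameters mirror the configuration fields.

```python
import re


def regex_replace(text, pattern, replacement, icase=False, replace_all=True):
    """Replace matches of `pattern` in `text` using Python's re module."""
    flags = re.IGNORECASE if icase else 0
    count = 0 if replace_all else 1  # in re.sub, count=0 means "replace all"
    return re.sub(pattern, replacement, text, count=count, flags=flags)


print(regex_replace("+86 (138) 0013-8000", r"[^0-9]", ""))     # 8613800138000
print(regex_replace("<p>Hello <b>FG</b></p>", r"<[^>]+>", ""))  # Hello FG
print(regex_replace("a-b-c", r"-", "", replace_all=False))     # ab-c
```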
8.5 Bool Mask Feature
The bool mask feature filters sequence elements by a boolean mask, similar to tf.boolean_mask.
Configuration example:
{ "features": [ { "feature_type": "bool_mask_feature", "feature_name": "valid_click_seq", "expression": ["user:click_item_seq", "user:click_valid_mask"], "value_type": "string" }, { "feature_type": "bool_mask_feature", "feature_name": "high_price_items", "expression": ["user:view_item_prices", "user:high_price_mask"], "value_type": "float", "sequence_delim": "," }, { "feature_type": "bool_mask_feature", "feature_name": "recent_actions", "expression": ["user:action_seq", "user:recent_mask"], "value_type": "string", "sequence_delim": "|" } ] }
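How it works:
- The first field in expression is the sequence to filter
- The second field in expression is the mask (boolean values or 0/1)
- Elements whose mask is true/1 are kept; elements whose mask is false/0 are dropped
Example:
| click_item_seq | click_valid_mask | Output |
| "1001,1002,1003,1004,1005" | "1,0,1,1,0" | ["1001", "1003", "1004"] |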
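The filtering rule above amounts to a zip-and-filter over the two sequences. A minimal Python sketch follows; `bool_mask` is a hypothetical helper, and raising an error on a length mismatch is an assumption rather than documented FG behavior.

```python
def bool_mask(sequence, mask, sequence_delim=","):
    """Keep the elements of `sequence` whose mask entry is 1 or true."""
    items = sequence.split(sequence_delim)
    flags = mask.split(sequence_delim)
    if len(items) != len(flags):
        raise ValueError("sequence and mask must have the same length")
    return [item for item, flag in zip(items, flags)
            if flag.strip().lower() in ("1", "true")]


print(bool_mask("1001,1002,1003,1004,1005", "1,0,1,1,0"))
# ['1001', '1003', '1004']
```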
8.6 Slice Feature
The slice feature slices an array using Python-style slice syntax.
Configuration example:
{ "features": [ { "feature_type": "slice_feature", "feature_name": "recent_10_clicks", "expression": "user:click_seq", "slice": "0:10:1", "value_type": "string" }, { "feature_type": "slice_feature", "feature_name": "last_5_items", "expression": "user:click_seq", "slice": "-5:", "value_type": "string" }, { "feature_type": "slice_feature", "feature_name": "even_position_items", "expression": "user:click_seq", "slice": "::2", "value_type": "string" }, { "feature_type": "slice_feature", "feature_name": "reversed_seq", "expression": "user:click_seq", "slice": "::-1", "value_type": "string" }, { "feature_type": "slice_feature", "feature_name": "first_item", "expression": "user:click_seq", "slice": "0", "value_type": "string" } ] }
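Slice syntax:
| Slice | Description | Example (input [1,2,3,4,5]) |
| 0:10:1 | From index 0 up to index 10 (exclusive), step 1 | [1,2,3,4,5] |
| -5: | Last 5 elements | [1,2,3,4,5] |
| ::2 | From start to end, step 2 (even indices) | [1,3,5] |
| ::-1 | Reverse the sequence | [5,4,3,2,1] |
| 0 | Single index (first element) | 1 |
| 1:3 | From index 1 up to index 3 (exclusive) | [2,3] |
| :2 | From the start up to index 2 (exclusive) | [1,2] |
| 2: | From index 2 to the end | [3,4,5] |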
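Since the syntax follows Python's, each row of the table can be checked directly with Python's built-in `slice`. The sketch below converts a slice string into a `slice` object; `apply_slice` is a hypothetical helper, and it assumes that FG's semantics match Python's for out-of-range bounds.

```python
def apply_slice(values, spec):
    """Apply a slice string such as '0:10:1', '-5:', '::2' or '0' to a list."""
    if ":" not in spec:
        return values[int(spec)]  # single index returns one element
    # Empty fields (as in '::2') become None, like omitted slice bounds.
    parts = [int(part) if part else None for part in spec.split(":")]
    return values[slice(*parts)]


data = [1, 2, 3, 4, 5]
for spec in ["0:10:1", "-5:", "::2", "::-1", "0", "1:3", ":2", "2:"]:
    print(spec, apply_slice(data, spec))
```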
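Appendix: Common Configuration Fields
The following fields are shared by most feature operators:
| Field | Type | Description |
| feature_type | string | Feature operator type (required) |
| feature_name | string | Output feature name (required) |
| expression | string/array | Source of the input field or fields |
| value_type | string | Output value type: string/int32/int64/float/double |
| default_value | string | Value used when the input is empty |
| value_dimension | int | Output dimension; 0 means no truncation |
| separator | string | Delimiter for multi-valued input; defaults to \u001D |
| stub_type | bool | true means the feature is only an intermediate result and is not output to the model |
| normalizer | string | Normalization configuration |
| num_buckets | int | Number of buckets (for discretization) |
| hash_bucket_size | int | Hash bucket size |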
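Summary
FG provides feature operators that cover scenarios ranging from basic feature processing to more involved feature engineering:
- Basic operators: id_feature and raw_feature handle discrete and continuous features
- Computation operators: expr_feature evaluates arithmetic and conditional expressions
- Cross operators: combo_feature builds Cartesian-product features
- Lookup operators: lookup_feature and match_feature query dictionaries
- Text relevance operators: overlap_feature and bm25_feature score text relevance
- Sequence operators: sequence_feature and sequence_combine_feature process behavior sequences
- Text preprocessing operators: tokenize_feature and text_normalizer prepare text
- Transformation operators: str_replace_feature and regex_replace_feature perform string replacement
- Utility operators: bool_mask_feature and slice_feature provide array operations
Choose the operator that matches your business scenario to keep feature-engineering configuration concise.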