Feature Generator (FG) Feature Operator Configuration Guide

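Summary: This document walks through how to configure each type of Feature Generator (FG) feature operator across 9 categories: basic (ID/raw features), computation (expressions), crossing (combos), lookup (Lookup/Match), text relevance (overlap/BM25), sequence, preprocessing (tokenization/normalization), string transformation (string/regex replacement), and utilities (boolean mask/slice), with detailed configuration examples and explanations.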


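This document describes how to configure the various Feature Generator (FG) feature operators, to help you quickly pick up feature engineering configuration for different scenarios.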

Reference: Feature Generator Quick Start



1. Basic Feature Operators

1.1 ID Features

ID features handle discrete (categorical) features and support both single-value and multi-value inputs.

Configuration example:

{
  "features": [
    {
      "feature_type": "id_feature",
      "feature_name": "user_vip_level",
      "expression": "user:vip_level",
      "need_prefix": true,
      "default_value": "0"
    },
    {
      "feature_type": "id_feature",
      "feature_name": "user_gender",
      "expression": "user:gender",
      "need_prefix": false,
      "default_value": "unknown"
    },
    {
      "feature_type": "id_feature",
      "feature_name": "item_category",
      "expression": "item:category",
      "need_prefix": true,
      "separator": "|"
    },
    {
      "feature_type": "id_feature",
      "feature_name": "user_tags",
      "expression": "user:tags",
      "need_prefix": true,
      "separator": ",",
      "value_dimension": 3
    }
  ]
}

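Configuration notes:

Field Description
feature_type Required; fixed value id_feature
feature_name Required; name of the output feature
expression Source of the input field, in the form domain:field_name
need_prefix Whether to prepend feature_name to the output value; default false
separator Separator for multi-value input; default \u001D
default_value Value used when the input is empty
value_dimension Output dimension; 0 means no truncation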

Test data example:

{
  "input": {
    "vip_level": {"type": "optional_string", "values": ["黄金", "钻石", "白银", "铂金", null]},
    "gender": {"type": "optional_string", "values": ["男", "女", null, "男", "女"]},
    "category": {"type": "optional_string", "values": ["数码|手机", "服饰|运动", "家居|厨具"]},
    "tags": {"type": "optional_string", "values": ["运动,音乐,美食", "科技,游戏,阅读"]}
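  },
  "expected_output": {
    "user_vip_level": ["user_vip_level_黄金", "user_vip_level_钻石", "user_vip_level_白银", "user_vip_level_铂金", "user_vip_level_0"],
    "user_gender": ["男", "女", "unknown", "男", "女"],
    "item_category": [["item_category_数码", "item_category_手机"], ["item_category_服饰", "item_category_运动"], ["item_category_家居", "item_category_厨具"]],
    "user_tags": [["user_tags_运动", "user_tags_音乐", "user_tags_美食"], ["user_tags_科技", "user_tags_游戏", "user_tags_阅读"]]
  }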
}
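To make the parameters concrete, here is a minimal Python sketch of how one id_feature input value could be turned into output tokens. It is not FG's implementation: the function name `id_feature`, joining the prefix with `_`, applying the prefix to `default_value`, and truncating before prefixing are all assumptions.

```python
from typing import List, Optional

def id_feature(value: Optional[str], feature_name: str, need_prefix: bool = False,
               separator: str = "\u001D", default_value: Optional[str] = None,
               value_dimension: int = 0) -> List[str]:
    """Hypothetical approximation of id_feature: split, fill default, truncate, prefix."""
    if value is None or value == "":
        # Empty input falls back to default_value (no output if none is configured).
        tokens = [] if default_value is None else [default_value]
    else:
        tokens = [t for t in value.split(separator) if t != ""]
    if value_dimension > 0:
        # value_dimension > 0 keeps only the first N tokens; 0 means no truncation.
        tokens = tokens[:value_dimension]
    if need_prefix:
        # Assumption: the prefix is joined with "_".
        tokens = [f"{feature_name}_{t}" for t in tokens]
    return tokens

print(id_feature("运动,音乐,美食,科技", "user_tags", need_prefix=True,
                 separator=",", value_dimension=3))
```

With four tags and value_dimension 3, only the first three tags survive, each prefixed with the feature name.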

1.2 Raw Features

Raw features handle continuous numeric features and support several normalization methods.

Configuration example:

{
  "features": [
    {
      "feature_type": "raw_feature",
      "feature_name": "item_price_norm",
      "expression": "item:price",
      "value_type": "float"
    },
    {
      "feature_type": "raw_feature",
      "feature_name": "item_ctr_log",
      "expression": "item:ctr",
      "normalizer": "method=log10"
    },
    {
      "feature_type": "raw_feature",
      "feature_name": "item_rating_norm",
      "expression": "item:rating",
      "normalizer": "method=minmax,min=0.0,max=5.0"
    },
    {
      "feature_type": "raw_feature",
      "feature_name": "item_sales_zscore",
      "expression": "item:sales",
      "normalizer": "method=zscore,mean=1000.0,standard_deviation=500.0"
    },
    {
      "feature_type": "raw_feature",
      "feature_name": "item_discount",
      "expression": "item:discount",
      "value_type": "double",
      "default_value": "1.0"
    }
  ]
}

Normalization methods:

Method Config example Formula
log10 method=log10 x = log10(x)
minmax method=minmax,min=0,max=5 x = (x - min) / (max - min)
zscore method=zscore,mean=0,standard_deviation=1 x = (x - mean) / standard_deviation
expression method=expression,expr=sigmoid(x) Custom expression
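The three closed-form normalizers in the table can be sketched in Python as follows, parsing the same `method=...,key=value` string. The helper name `normalize` is hypothetical, the `expression` method is deliberately not handled, and FG's behavior for inputs outside a formula's domain (for example `log10` of a non-positive value) is not described in this document.

```python
import math

def normalize(x: float, spec: str) -> float:
    """Apply a normalizer spec such as 'method=minmax,min=0.0,max=5.0' (sketch)."""
    # Split "k=v,k=v" into a dict; this simple parser assumes no commas inside values.
    params = dict(item.split("=", 1) for item in spec.split(","))
    method = params["method"]
    if method == "log10":
        return math.log10(x)  # requires x > 0 in Python
    if method == "minmax":
        lo, hi = float(params["min"]), float(params["max"])
        return (x - lo) / (hi - lo)
    if method == "zscore":
        mean = float(params["mean"])
        std = float(params["standard_deviation"])
        return (x - mean) / std
    raise ValueError(f"method not covered by this sketch: {method}")

print(normalize(4.5, "method=minmax,min=0.0,max=5.0"))
print(normalize(1500.0, "method=zscore,mean=1000.0,standard_deviation=500.0"))
```

For item_rating_norm, a rating of 4.5 maps to 0.9; for item_sales_zscore, sales of 1500 map to 1.0.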

2. Expression Features

Expression features support arithmetic and conditional (ternary) logic, and come with a built-in function library.

Configuration example:

{
  "features": [
    {
      "feature_type": "expr_feature",
      "feature_name": "age_price_interaction",
      "expression": "age * price",
      "variables": ["user:age", "item:price"],
      "value_type": "float",
      "value_dimension": 1
    },
    {
      "feature_type": "expr_feature",
      "feature_name": "ctr_sigmoid",
      "expression": "sigmoid(ctr)",
      "variables": ["item:ctr"],
      "value_type": "float"
    },
    {
      "feature_type": "expr_feature",
      "feature_name": "rating_price_ratio",
      "expression": "rating / (price / 100 + 1)",
      "variables": ["item:rating", "item:price"],
      "value_type": "double",
      "default_value": "0.0"
    },
    {
      "feature_type": "expr_feature",
      "feature_name": "age_bucket",
      "expression": "age >= 18 ? (age >= 60 ? 3 : (age >= 35 ? 2 : 1)) : 0",
      "variables": ["user:age"],
      "value_type": "int32"
    },
    {
      "feature_type": "expr_feature",
      "feature_name": "log_sales",
      "expression": "log10(sales + 1)",
      "variables": ["item:sales"],
      "value_type": "float"
    }
  ]
}

Built-in functions:

Category Functions Description
Math sin, cos, tan, log, log10, exp, sqrt Basic math operations
Numeric sigmoid, round, floor, ceil, abs Numeric processing
Vector min, max, sum, avg, len, dot Aggregation
Distance sphere_dist, haversine, euclid_dist Geographic/vector distance
Reduction reduce_min, reduce_max, reduce_sum, reduce_mean Vector reduction

Test data:

{
  "input": {
    "age": {"type": "optional_float", "values": [25.0, 35.0, 45.0, 15.0, 70.0]},
    "price": {"type": "optional_float", "values": [99.0, 599.0, 1999.0, 50.0, 299.0]},
    "ctr": {"type": "optional_float", "values": [0.1, 0.5, 0.9, 0.01, 0.99]},
    "rating": {"type": "optional_float", "values": [4.5, 3.8, 5.0, 2.5, 4.0]},
    "sales": {"type": "optional_float", "values": [99.0, 999.0, 9999.0, 10.0, 100.0]}
  }
}
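When checking FG output against the test data above, a plain-Python recomputation of the same formulas can serve as a reference. This sketch only mirrors the arithmetic; it assumes `sigmoid` is the standard logistic function 1 / (1 + e^-x) and ignores the `value_type` casts and `default_value` handling.

```python
import math

# Test data from the example above (one value per row).
age = [25.0, 35.0, 45.0, 15.0, 70.0]
price = [99.0, 599.0, 1999.0, 50.0, 299.0]
ctr = [0.1, 0.5, 0.9, 0.01, 0.99]
rating = [4.5, 3.8, 5.0, 2.5, 4.0]
sales = [99.0, 999.0, 9999.0, 10.0, 100.0]

def sigmoid(x: float) -> float:
    # Assumed to be the standard logistic function.
    return 1.0 / (1.0 + math.exp(-x))

def age_bucket(a: float) -> int:
    # Same logic as: age >= 18 ? (age >= 60 ? 3 : (age >= 35 ? 2 : 1)) : 0
    if a < 18:
        return 0
    if a >= 60:
        return 3
    return 2 if a >= 35 else 1

age_price_interaction = [a * p for a, p in zip(age, price)]
ctr_sigmoid = [sigmoid(c) for c in ctr]
rating_price_ratio = [r / (p / 100 + 1) for r, p in zip(rating, price)]
age_buckets = [age_bucket(a) for a in age]
log_sales = [math.log10(s + 1) for s in sales]

print(age_buckets)  # [1, 2, 2, 0, 3]
```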

3. Combo Features

Combo features generate the Cartesian product of the values of several fields, implementing feature crossing.

Configuration example:

{
  "features": [
    {
      "feature_type": "combo_feature",
      "feature_name": "gender_category_combo",
      "expression": ["user:gender", "item:category_id"],
      "need_prefix": true
    },
    {
      "feature_type": "combo_feature",
      "feature_name": "age_brand_combo",
      "expression": ["user:age_group", "item:brand"],
      "need_prefix": true,
      "separator": "|"
    },
    {
      "feature_type": "combo_feature",
      "feature_name": "city_category_combo",
      "expression": ["user:city", "item:category_id"],
      "need_prefix": false,
      "default_value": "unknown"
    },
    {
      "feature_type": "combo_feature",
      "feature_name": "level_brand_combo",
      "expression": ["user:vip_level", "item:brand", "item:category_id"],
      "need_prefix": true
    }
  ]
}

Output example:

gender category_id Output (need_prefix=true)
男 数码 gender_category_combo_男_数码
女 美妆 gender_category_combo_女_美妆

age_group brand (multi-value) Output
青年 Apple|iPhone [age_brand_combo_青年_Apple, age_brand_combo_青年_iPhone]
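The crossing shown in the output example can be sketched with `itertools.product`. This is an illustration, not FG's code: the helper name `combo_feature` and the choice to join values with `_` (also when need_prefix is false) are assumptions, and `default_value` is not modeled, so an empty field yields no output.

```python
from itertools import product
from typing import List

def combo_feature(feature_name: str, fields: List[List[str]],
                  need_prefix: bool = True) -> List[str]:
    """Return one crossed token per element of the Cartesian product of the fields."""
    crossed = ("_".join(values) for values in product(*fields))
    if need_prefix:
        return [f"{feature_name}_{token}" for token in crossed]
    return list(crossed)

# Multi-value brand split with separator "|", as in age_brand_combo.
print(combo_feature("age_brand_combo", [["青年"], "Apple|iPhone".split("|")]))
```

Because the output is a Cartesian product, its size is the product of the field sizes, so crossing several multi-value fields can grow quickly.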

4. Lookup and Match Features

4.1 Lookup Features

Lookup features look up values in a dictionary by key and support merging the results of multiple keys.

Configuration example:

{
  "features": [
    {
      "feature_type": "lookup_feature",
      "feature_name": "brand_tier",
      "map": "user:brand_tier_map",
      "key": "item:brand",
      "need_discrete": true,
      "default_value": "1"
    },
    {
      "feature_type": "lookup_feature",
      "feature_name": "category_score",
      "map": "user:category_score_map",
      "key": "item:category",
      "need_discrete": false,
      "combiner": "sum",
      "value_type": "float",
      "default_value": "0.0"
    },
    {
      "feature_type": "lookup_feature",
      "feature_name": "tag_pref",
      "map": "user:tag_pref_map",
      "key": "item:tags",
      "need_discrete": false,
      "combiner": "mean",
      "separator": ",",
      "value_type": "float",
      "default_value": "0.0"
    },
    {
      "feature_type": "lookup_feature",
      "feature_name": "brand_level",
      "map": "user:brand_level_map",
      "key": "item:brand",
      "need_prefix": true,
      "need_key": true,
      "default_value": "unknown"
    }
  ]
}

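Configuration notes:

Field Description
map Dictionary field, of type map or multi-value string
key Key(s) to look up
need_discrete true outputs a discrete feature value; false outputs a number
combiner How to merge the results of multiple keys: sum/mean/max/min
need_prefix Whether to prepend feature_name
need_key Whether to include the key in the output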

Test data:

{
  "input": {
    "brand_tier_map": {
      "type": "map_string_string",
      "values": [
        {"Nike": "3", "Adidas": "3", "LV": "4", "Uniqlo": "2"},
        {"Apple": "5", "Huawei": "5", "Xiaomi": "4"}
      ]
    },
    "brand": {"type": "optional_string", "values": ["Nike", "LV", "Apple"]}
  }
}
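The interaction of need_discrete, combiner, need_key, and need_prefix can be sketched as below. This is a hypothetical helper, not FG's API. It assumes that the combiner runs over the keys that hit only, that tokens are joined with `_`, and that `default_value` is returned as-is without a prefix when nothing hits.

```python
from statistics import mean
from typing import Dict, List, Optional, Union

COMBINERS = {"sum": sum, "mean": mean, "max": max, "min": min}

def lookup_feature(feature_name: str, mapping: Dict[str, str], keys: List[str],
                   need_discrete: bool, combiner: str = "sum",
                   default_value: Optional[str] = None, need_prefix: bool = False,
                   need_key: bool = False) -> Union[List[str], float, None]:
    """Hypothetical lookup: collect values for the keys that exist in the map."""
    hits = [(k, mapping[k]) for k in keys if k in mapping]
    if need_discrete:
        if not hits:
            return [] if default_value is None else [default_value]
        outputs = []
        for key, value in hits:
            token = f"{key}_{value}" if need_key else value
            outputs.append(f"{feature_name}_{token}" if need_prefix else token)
        return outputs
    if not hits:
        return None if default_value is None else float(default_value)
    # Numeric mode: merge the hit values with the configured combiner.
    return float(COMBINERS[combiner]([float(v) for _, v in hits]))

tier_map = {"Nike": "3", "Adidas": "3", "LV": "4", "Uniqlo": "2"}
print(lookup_feature("brand_tier", tier_map, ["Nike"], need_discrete=True, default_value="1"))
```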

4.2 Match Features

Match features query a two-level nested dictionary.

Configuration example:

{
  "features": [
    {
      "feature_type": "match_feature",
      "feature_name": "user_brand_purchase_history",
      "category": "item:category_name",
      "item": "item:brand",
      "user": "user:purchase_history_map",
      "match_type": "hit",
      "need_discrete": false
    },
    {
      "feature_type": "match_feature",
      "feature_name": "user_category_ctr",
      "category": "ALL",
      "item": "item:category_id",
      "user": "user:category_ctr_map",
      "match_type": "hit",
      "need_discrete": true
    },
    {
      "feature_type": "match_feature",
      "feature_name": "user_brand_pref",
      "category": "item:main_category",
      "item": "item:brand",
      "user": "user:brand_pref_map",
      "match_type": "hit",
      "need_discrete": false,
      "value_type": "float"
    }
  ]
}

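Data format:

The user field encodes a two-level nested dictionary with the following delimiters:

  • | separates the items of the first-level map
  • ^ separates the key and value of a first-level item
  • , separates the items of the second-level map
  • : separates the key and value of a second-level item

Example: 服饰^Nike:3,Adidas:2|数码^Apple:1,Huawei:2

Parses to: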

{
  "服饰": {"Nike": 3, "Adidas": 2},
  "数码": {"Apple": 1, "Huawei": 2}
}
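The delimiter rules above translate directly into a small parser. The sketch below parses the user string and performs a "hit" lookup; the helper names are hypothetical, values are kept as strings, and treating `"category": "ALL"` as "search every first-level category" is an assumption, not documented FG behavior.

```python
from typing import Dict, Optional

def parse_two_level_map(raw: str) -> Dict[str, Dict[str, str]]:
    """Parse 'cat^k:v,k:v|cat^k:v' into a nested dict (values kept as strings)."""
    result: Dict[str, Dict[str, str]] = {}
    for item in raw.split("|"):
        if "^" not in item:
            continue  # skip empty or malformed first-level items
        category, inner = item.split("^", 1)
        result[category] = dict(kv.split(":", 1) for kv in inner.split(",") if ":" in kv)
    return result

def match_hit(user_map: Dict[str, Dict[str, str]], category: str,
              item: str) -> Optional[str]:
    """Return the value stored under category -> item, or None when there is no hit."""
    if category == "ALL":
        # Assumption: "ALL" means search every first-level category.
        for inner in user_map.values():
            if item in inner:
                return inner[item]
        return None
    return user_map.get(category, {}).get(item)

m = parse_two_level_map("服饰^Nike:3,Adidas:2|数码^Apple:1,Huawei:2")
print(match_hit(m, "服饰", "Nike"))
```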

5. Overlap Features

Overlap features measure how much two text sequences overlap and are commonly used for search relevance.

Basic configuration:

{
  "features": [
    {
      "feature_type": "overlap_feature",
      "feature_name": "query_in_title",
      "query": "user:query_text",
      "title": "item:title",
      "method": "is_contain",
      "separator": " "
    },
    {
      "feature_type": "overlap_feature",
      "feature_name": "query_title_equal",
      "query": "user:query_text",
      "title": "item:title",
      "method": "is_equal",
      "separator": " "
    },
    {
      "feature_type": "overlap_feature",
      "feature_name": "query_index_in_title",
      "query": "user:query_text",
      "title": "item:title",
      "method": "index_of",
      "separator": " "
    }
  ]
}

Advanced overlap analysis (with proximity measures):

{
  "features": [
    {
      "feature_type": "overlap_feature",
      "feature_name": "query_category_overlap",
      "query": "user:tokenized_query",
      "title": "item:category_tags",
      "method": "query_common_ratio",
      "separator": " "
    },
    {
      "feature_type": "overlap_feature",
      "feature_name": "title_query_overlap",
      "query": "user:tokenized_query",
      "title": "item:category_tags",
      "method": "title_common_ratio",
      "separator": " "
    },
    {
      "feature_type": "overlap_feature",
      "feature_name": "query_proximity_min_cover",
      "query": "user:tokenized_query",
      "title": "item:category_tags",
      "method": "proximity_min_cover",
      "separator": " "
    },
    {
      "feature_type": "overlap_feature",
      "feature_name": "query_proximity_min_dist",
      "query": "user:tokenized_query",
      "title": "item:category_tags",
      "method": "proximity_min_dist",
      "separator": " "
    }
  ]
}

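Method types:

Method Description Output range
query_common_ratio Fraction of query terms that appear in the title [0, 1]
title_common_ratio Fraction of title terms that are matched by the query [0, 1]
is_contain Whether the whole query appears in the title, in order 0/1
is_equal Whether the query and the title are identical 0/1
index_of Position of the first occurrence of the query in the title (-1 if absent) -1 or >=0
proximity_min_cover Length of the shortest title span that covers all query terms [0, len(title)]
proximity_min_dist Minimum pairwise distance between query terms [0, len(title)+1]
proximity_max_dist Maximum pairwise distance between query terms [0, len(title)+1]
proximity_avg_dist Average pairwise distance between query terms [0, len(title)+1]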
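Three of the methods can be sketched over already-tokenized term lists to show what they measure. These are illustrations under assumptions, not FG's code: duplicate terms are counted per occurrence in the ratios, proximity_min_cover counts window length in terms, and it returns 0 when some query term is missing from the title.

```python
from typing import Dict, List

def query_common_ratio(query: List[str], title: List[str]) -> float:
    """Fraction of query terms that appear anywhere in the title."""
    if not query:
        return 0.0
    title_set = set(title)
    return sum(1 for t in query if t in title_set) / len(query)

def title_common_ratio(query: List[str], title: List[str]) -> float:
    """Fraction of title terms that also appear in the query."""
    if not title:
        return 0.0
    query_set = set(query)
    return sum(1 for t in title if t in query_set) / len(title)

def proximity_min_cover(query: List[str], title: List[str]) -> int:
    """Shortest title window (in terms) containing every distinct query term."""
    needed = set(query)
    if not needed or not needed.issubset(title):
        return 0  # assumption: 0 when the query cannot be covered
    best, covered, left = len(title), 0, 0
    counts: Dict[str, int] = {}
    for right, term in enumerate(title):
        if term in needed:
            counts[term] = counts.get(term, 0) + 1
            if counts[term] == 1:
                covered += 1
        # Shrink the window from the left while it still covers every term.
        while covered == len(needed):
            best = min(best, right - left + 1)
            leaving = title[left]
            if leaving in needed:
                counts[leaving] -= 1
                if counts[leaving] == 0:
                    covered -= 1
            left += 1
    return best

q = "iphone 15 手机".split(" ")
t = "apple iphone 15 pro 手机".split(" ")
print(query_common_ratio(q, t), title_common_ratio(q, t), proximity_min_cover(q, t))
```

For this query and title, all three query terms appear (ratio 1.0), 3 of the 5 title terms match (0.6), and the shortest covering span is "iphone 15 pro 手机" (4 terms).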

6. Sequence Features

6.1 Sequence Features

Sequence features handle a user's historical behavior sequences.

Configuration example:

{
  "features": [
    {
      "feature_type": "sequence_feature",
      "feature_name": "user_click_item_seq",
      "sequence_name": "click_seq",
      "sequence_length": 10,
      "sequence_pk": "user:user_id",
      "sequence_table": "click_seq",
      "sequence_delim": ";",
      "features": [
        {
          "feature_name": "item_id",
          "feature_type": "id_feature",
          "expression": "item_id",
          "need_prefix": false
        }
      ]
    },
    {
      "feature_type": "sequence_feature",
      "feature_name": "user_purchase_item_seq",
      "sequence_name": "purchase_seq",
      "sequence_length": 5,
      "sequence_pk": "user:user_id",
      "sequence_table": "purchase_seq",
      "sequence_delim": ",",
      "features": [
        {
          "feature_name": "item_id",
          "feature_type": "id_feature",
          "expression": "item_id",
          "need_prefix": false
        }
      ]
    }
  ]
}

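Configuration notes:

Field Description
sequence_name Sequence name
sequence_length Maximum sequence length; longer sequences are truncated
sequence_pk Primary key field, such as user_id
sequence_delim Delimiter between sequence elements
features Sub-feature configuration applied to each element of the sequence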
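The roles of sequence_delim, sequence_length, and the per-element sub-features can be sketched as below. The helper is hypothetical; in particular, keeping the first sequence_length elements (rather than the last) is an assumption, since the document only says that longer sequences are truncated.

```python
from typing import Callable, List

def sequence_feature(raw: str, sequence_delim: str, sequence_length: int,
                     sub_feature: Callable[[str], str] = lambda x: x) -> List[str]:
    """Split a raw sequence, keep at most sequence_length elements, apply a sub-feature."""
    elements = [e for e in raw.split(sequence_delim) if e]
    # Assumption: truncation keeps the leading elements.
    return [sub_feature(e) for e in elements[:sequence_length]]

print(sequence_feature("1001;1002;1003", ";", 2))
```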

6.2 Sequence Combine Features

Sequence combine features merge the multiple values in a sequence into a single value.

配置示例:

{
  "features": [
    {
      "feature_type": "sequence_combine_feature",
      "feature_name": "user_behavior_weighted_sum",
      "expression": "user:behavior_seq",
      "combiner": "sum",
      "need_discrete": false,
      "separator": ","
    },
    {
      "feature_type": "sequence_combine_feature",
      "feature_name": "user_behavior_mean",
      "expression": "user:behavior_seq",
      "combiner": "mean",
      "separator": ","
    },
    {
      "feature_type": "sequence_combine_feature",
      "feature_name": "user_action_score",
      "expression": "user:action_events",
      "combiner": "sum",
      "separator": "|",
      "sequence_delim": ";",
      "value_map": {
        "expo": 1,
        "click": 2,
        "buy": 4,
        "cart": 3
      }
    },
    {
      "feature_type": "sequence_combine_feature",
      "feature_name": "user_behavior_max",
      "expression": "user:behavior_seq",
      "combiner": "max",
      "separator": ","
    }
  ]
}

Combiner types:

Type Description
sum Sum
mean Mean
max Maximum
min Minimum
count Count

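Value map:

When the input is a string of behavior events (such as "expo|click|buy"), value_map maps each event to a number before the values are combined.

Example: with input "expo|click|buy" and value_map {"expo": 1, "click": 2, "buy": 4}, the sum combiner outputs 1 + 2 + 4 = 7.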
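The value_map example above can be reproduced with a short Python sketch. The helper names are hypothetical; skipping events that are missing from value_map, returning 0.0 for an empty input, and treating sequence_delim as a delimiter between elements that are each combined separately are assumptions.

```python
from statistics import mean
from typing import Dict, List, Optional

COMBINERS = {"sum": sum, "mean": mean, "max": max, "min": min, "count": len}

def sequence_combine(raw: str, combiner: str = "sum", separator: str = ",",
                     value_map: Optional[Dict[str, float]] = None) -> float:
    """Split one element on separator, optionally map events to numbers, then combine."""
    tokens = [t for t in raw.split(separator) if t]
    if value_map is not None:
        # Events missing from value_map are skipped (assumption).
        values = [float(value_map[t]) for t in tokens if t in value_map]
    else:
        values = [float(t) for t in tokens]
    if not values:
        return 0.0
    return float(COMBINERS[combiner](values))

def sequence_combine_per_element(raw: str, sequence_delim: str, **kwargs) -> List[float]:
    """Assumption: sequence_delim separates elements that are combined independently."""
    return [sequence_combine(e, **kwargs) for e in raw.split(sequence_delim) if e]

events = {"expo": 1, "click": 2, "buy": 4, "cart": 3}
print(sequence_combine("expo|click|buy", "sum", "|", events))  # 7.0
```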


7. Text Processing Features

7.1 Tokenize Features

Tokenize features split text into tokens.

Configuration example:

{
  "features": [
    {
      "feature_type": "tokenize_feature",
      "feature_name": "item_title_tokens",
      "expression": "item:title",
      "vocab_file": "tokenizer.json",
      "output_type": "word"
    },
    {
      "feature_type": "tokenize_feature",
      "feature_name": "item_title_word_ids",
      "expression": "item:title",
      "vocab_file": "tokenizer.json",
      "output_type": "word_id"
    },
    {
      "feature_type": "tokenize_feature",
      "feature_name": "item_desc_tokens",
      "expression": "item:description",
      "vocab_file": "tokenizer.json",
      "output_type": "word",
      "default_value": ""
    }
  ]
}

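Configuration notes:

Field Description
vocab_file Path to the tokenizer vocabulary file
output_type word (output tokens) or word_id (output token IDs)
output_delim Output delimiter (used by offline jobs)
tokenizer_type sentencepiece or another Hugging Face tokenizer

Supported vocabulary formats: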

  • tokenizer.json (BPE)
  • bert-base-chinese-vocab.json (WordPiece)
  • spiece.model (SentencePiece)

7.2 Text Normalizer Features

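Text normalizer features normalize text, including case conversion, traditional-to-simplified Chinese conversion, and full-width to half-width conversion.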

Configuration example:

{
  "features": [
    {
      "feature_type": "text_normalizer",
      "feature_name": "query_normalized",
      "expression": "user:query_text",
      "parameter": 15
    },
    {
      "feature_type": "text_normalizer",
      "feature_name": "title_normalized",
      "expression": "item:title",
      "parameter": 60,
      "remove_space": true
    },
    {
      "feature_type": "text_normalizer",
      "feature_name": "description_lower",
      "expression": "item:description",
      "parameter": 4,
      "max_length": 500
    },
    {
      "feature_type": "text_normalizer",
      "feature_name": "content_splitchars",
      "expression": "item:content",
      "parameter": 512
    }
  ]
}

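Parameter values (combine them by adding):

Value Function
2 Lowercase to uppercase
4 Uppercase to lowercase
8 Full-width to half-width
16 Traditional to simplified Chinese
32 Remove special symbols
512 Split Chinese text into single characters (space-separated)

Common combinations:

  • 60 = 4+8+16+32 = uppercase to lowercase + full-width to half-width + traditional to simplified + remove special symbols
  • 15 = 1+2+4+8, not 4+8+16+32; the value 1 is not described in the table above, and 15 sets both 2 (lowercase to uppercase) and 4 (uppercase to lowercase)
  • 4 = uppercase to lowercase only
  • 512 = split Chinese characters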
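Because each parameter value is a power of two, a combined value is the bitwise OR of its flags (equal to their sum when no flag repeats). The sketch below decodes a value against the table above; it is only an arithmetic aid, not an FG API, and the dictionary and function names are hypothetical.

```python
from typing import List, Tuple

# Flag values from the table above.
NORMALIZER_FLAGS = {
    2: "lowercase to uppercase",
    4: "uppercase to lowercase",
    8: "full-width to half-width",
    16: "traditional to simplified Chinese",
    32: "remove special symbols",
    512: "split Chinese text into single characters",
}

def decode_parameter(parameter: int) -> Tuple[List[int], int]:
    """Split a parameter into documented flags and any leftover undocumented bits."""
    known = [bit for bit in sorted(NORMALIZER_FLAGS) if parameter & bit]
    leftover = parameter & ~sum(NORMALIZER_FLAGS)
    return known, leftover

for p in (60, 15, 4, 512):
    known, leftover = decode_parameter(p)
    print(p, [NORMALIZER_FLAGS[b] for b in known], "undocumented bits:", leftover)
```

Decoding 15 this way shows the leftover bit 1 and both case-conversion flags, which is why 15 and 60 are not equivalent.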

8. Advanced Feature Operators

8.1 BM25 Features

BM25 features compute a search relevance score between a query and a document.

Configuration example:

{
  "features": [
    {
      "feature_type": "bm25_feature",
      "feature_name": "query_title_bm25",
      "query": "user:tokenized_query",
      "document": "item:tokenized_title",
      "k1": 1.2,
      "b": 0.75,
      "separator": " ",
      "avg_doc_length": 10,
      "document_number": 1000,
      "term_doc_freq_dict": {
        "iPhone": 5,
        "15": 10,
        "手机": 20,
        "Apple": 3,
        "运动鞋": 15,
        "跑步": 12,
        "Nike": 4
      }
    },
    {
      "feature_type": "bm25_feature",
      "feature_name": "query_desc_bm25",
      "query": "user:tokenized_query",
      "document": "item:tokenized_desc",
      "k1": 2.0,
      "b": 0.5,
      "separator": " ",
      "avg_doc_length": 20,
      "document_number": 500,
      "term_doc_freq_dict": {
        "高品质": 10,
        "舒适": 15,
        "智能": 20,
        "拍照": 15
      }
    }
  ]
}

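Configuration notes:

Field Description
query Query field
document Document field
k1 BM25 parameter controlling term-frequency saturation; typically 1.2-2.0
b BM25 parameter controlling document-length normalization; typically 0.5-0.75
avg_doc_length Average document length
document_number Total number of documents
term_doc_freq_dict Document-frequency dictionary: the key is a term, the value is the number of documents containing that term
term_doc_freq_file Path to a document-frequency file (use either this or term_doc_freq_dict)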
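To show how the parameters fit together, here is a sketch of the standard Okapi BM25 score. It uses the Lucene-style IDF log(1 + (N - df + 0.5) / (df + 0.5)). The IDF variant FG uses, and how it treats repeated query terms or terms missing from the dictionary, are not documented here; those choices are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List

def bm25(query: List[str], document: List[str], term_doc_freq: Dict[str, int],
         document_number: int, avg_doc_length: float,
         k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 with a Lucene-style IDF (sketch, not FG's exact formula)."""
    tf = Counter(document)
    doc_len = len(document)
    score = 0.0
    for term in query:
        if term not in tf:
            continue  # terms absent from the document contribute nothing
        df = term_doc_freq.get(term, 0)  # assumption: unknown terms have df = 0
        idf = math.log(1.0 + (document_number - df + 0.5) / (df + 0.5))
        freq = tf[term]
        # k1 controls saturation; b scales the document-length normalization.
        norm = k1 * (1 - b + b * doc_len / avg_doc_length)
        score += idf * freq * (k1 + 1) / (freq + norm)
    return score

freq = {"iPhone": 5, "15": 10, "手机": 20, "Apple": 3}
doc = ["Apple", "iPhone", "15", "手机"]
print(bm25(["iPhone", "15"], doc, freq, document_number=1000, avg_doc_length=10))
```

Rarer terms (smaller document frequency) get a larger IDF, so with equal term frequency "iPhone" (df 5) contributes more than "手机" (df 20).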

8.2 KV Dot Product Features

KV dot product features compute the dot product of two key-value vectors, or the size of the intersection of two sets.

Configuration example:

{
  "features": [
    {
      "feature_type": "kv_dot_product",
      "feature_name": "user_item_interest_score",
      "query": "user:interest_kv",
      "document": "item:feature_kv",
      "separator": ",",
      "kv_delimiter": ":"
    },
    {
      "feature_type": "kv_dot_product",
      "feature_name": "user_tag_item_tag_sim",
      "query": "user:user_tags_kv",
      "document": "item:item_tags_kv",
      "separator": "|",
      "kv_delimiter": "="
    },
    {
      "feature_type": "kv_dot_product",
      "feature_name": "user_category_pref",
      "query": "user:category_pref_kv",
      "document": "item:category_score_kv",
      "separator": ";",
      "kv_delimiter": ":",
      "default_value": "0.0"
    }
  ]
}

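Data format examples:

Input type Format example
Comma-separated KV "运动:0.8,音乐:0.5,美食:0.3"
Pipe-separated KV "运动=0.9|..." (separator |, kv_delimiter =)
Semicolon-separated KV "数码:0.9;服饰:0.7"
Plain tag list ["a", "b", "c"] (a missing value defaults to 1.0)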
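The formats above reduce to one parser plus a sparse dot product. In this sketch, a plain tag list gives every key weight 1.0, so the dot product of two tag lists equals the size of their intersection. The helper names, and the rule that a repeated key keeps its last value, are assumptions.

```python
from typing import Dict, List, Union

def parse_kv(raw: Union[str, List[str]], separator: str = ",",
             kv_delimiter: str = ":") -> Dict[str, float]:
    """Parse 'k:v,k:v' or a plain tag list into a dict; missing values default to 1.0."""
    items = raw if isinstance(raw, list) else [x for x in raw.split(separator) if x]
    parsed: Dict[str, float] = {}
    for item in items:
        if kv_delimiter in item:
            key, value = item.split(kv_delimiter, 1)
            parsed[key] = float(value)
        else:
            parsed[item] = 1.0
    return parsed

def kv_dot_product(query, document, separator: str = ",", kv_delimiter: str = ":") -> float:
    """Sum of weight products over keys present on both sides."""
    q = parse_kv(query, separator, kv_delimiter)
    d = parse_kv(document, separator, kv_delimiter)
    return float(sum(weight * d[key] for key, weight in q.items() if key in d))

print(kv_dot_product("运动:0.8,音乐:0.5,美食:0.3", "运动:0.5,美食:1.0"))
print(kv_dot_product(["a", "b", "c"], ["b", "c", "d"]))  # 2.0
```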

8.3 String Replace Features

String replace features replace every occurrence of each matching substring with the specified text.

Configuration example:

{
  "features": [
    {
      "feature_type": "str_replace_feature",
      "feature_name": "category_normalized",
      "expression": "item:category_raw",
      "replacements": {
        "手机": "智能手机",
        "电脑": "笔记本电脑",
        "平板": "平板电脑"
      }
    },
    {
      "feature_type": "str_replace_feature",
      "feature_name": "brand_normalized",
      "expression": "item:brand_raw",
      "replacements": {
        "苹果": "Apple",
        "华为": "Huawei",
        "小米": "Xiaomi",
        "联想": "Lenovo"
      }
    },
    {
      "feature_type": "str_replace_feature",
      "feature_name": "text_cleaned",
      "expression": "user:input_text",
      "replacements": {
        "|": "",
        " ": "",
        "\t": ""
      }
    }
  ]
}

Advanced configuration (with a replacement file):

{
  "feature_type": "str_replace_feature",
  "feature_name": "norm_str",
  "expression": ["user:profile"],
  "default_value": "",
  "replace_file": "synonyms.txt",
  "replacements": {
    "|": "",
    "aa": "x",
    "a": "X"
  },
  "value_dimension": 1
}

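Replacement file format:

One rule per line, with the original text and the replacement separated by a tab:

original_text\treplacement_text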
手机\t智能手机
电脑\t笔记本电脑
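The sketch below parses a tab-separated replacement file and applies the rules in a single left-to-right pass. Keys are tried longest first, and replaced text is never re-scanned, so "手机" becoming "智能手机" cannot trigger further rules. This matching policy and the helper names are assumptions; the document does not say how FG orders overlapping keys or how it merges replace_file with the inline replacements.

```python
import io
import re
from typing import Dict, Iterable

def load_replace_file(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'original<TAB>replacement' lines into a rule dict."""
    rules: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        src, dst = line.split("\t", 1)
        if src:
            rules[src] = dst
    return rules

def str_replace(text: str, replacements: Dict[str, str]) -> str:
    """Replace every occurrence in one pass, preferring longer keys at the same position."""
    keys = sorted((k for k in replacements if k), key=len, reverse=True)
    if not keys:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], text)

rules = load_replace_file(io.StringIO("手机\t智能手机\n电脑\t笔记本电脑\n"))
print(str_replace("手机壳", rules))                              # 智能手机壳
print(str_replace("aaa|b", {"|": "", "aa": "x", "a": "X"}))     # xXb
```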

8.4 Regex Replace Features

Regex replace features replace text that matches a regular expression.

Configuration example:

{
  "features": [
    {
      "feature_type": "regex_replace_feature",
      "feature_name": "phone_cleaned",
      "expression": "user:phone_input",
      "regex_pattern": "[^0-9]",
      "replacement": "",
      "icase": false,
      "replace_all": true
    },
    {
      "feature_type": "regex_replace_feature",
      "feature_name": "email_cleaned",
      "expression": "user:email_input",
      "regex_pattern": "\\s+",
      "replacement": "",
      "icase": false,
      "replace_all": true
    },
    {
      "feature_type": "regex_replace_feature",
      "feature_name": "text_normalized",
      "expression": "user:text_input",
      "regex_pattern": "[\\|\\#\\(\\)\\[\\]]",
      "replacement": " ",
      "icase": true,
      "replace_all": true
    },
    {
      "feature_type": "regex_replace_feature",
      "feature_name": "html_tag_removed",
      "expression": "item:html_content",
      "regex_pattern": "<[^>]+>",
      "replacement": "",
      "icase": false,
      "replace_all": true
    }
  ]
}

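Configuration notes:

Field Description
regex_pattern Regular expression pattern
replacement Replacement text; if empty, matched text is deleted
replace_all Whether to replace all matches; default true
icase Whether matching is case-insensitive; default false

Common regex examples:

Purpose Regular expression
Keep digits only (delete non-digits) [^0-9]
Remove whitespace \s+
Remove HTML tags <[^>]+>
Remove special symbols [|\#\(\)\[\]]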
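The four fields map naturally onto Python's `re.sub`, which makes the examples easy to try locally. This sketch uses Python's regex dialect. FG's regex engine and its syntax for backreferences in `replacement` may differ, so treat it as an approximation; the helper name is hypothetical.

```python
import re

def regex_replace(text: str, regex_pattern: str, replacement: str = "",
                  icase: bool = False, replace_all: bool = True) -> str:
    """Approximate regex_replace_feature with Python's re module."""
    flags = re.IGNORECASE if icase else 0
    # count=0 replaces every match; count=1 replaces only the first.
    count = 0 if replace_all else 1
    return re.sub(regex_pattern, replacement, text, count=count, flags=flags)

print(regex_replace("138-0013 8000", r"[^0-9]"))        # 13800138000
print(regex_replace("<p>Hello</p>", r"<[^>]+>"))         # Hello
print(regex_replace("tag|x#y", r"[|\#\(\)\[\]]", " "))   # tag x y
```

Note that inside a character class such as `[...]`, `.` and `*` are literal characters, so a pattern meant to delete a parenthesized group needs `\(.*?\)` outside a class instead.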

8.5 Bool Mask Features

Bool mask features filter sequence elements using boolean values, similar to tf.boolean_mask.

Configuration example:

{
  "features": [
    {
      "feature_type": "bool_mask_feature",
      "feature_name": "valid_click_seq",
      "expression": ["user:click_item_seq", "user:click_valid_mask"],
      "value_type": "string"
    },
    {
      "feature_type": "bool_mask_feature",
      "feature_name": "high_price_items",
      "expression": ["user:view_item_prices", "user:high_price_mask"],
      "value_type": "float",
      "sequence_delim": ","
    },
    {
      "feature_type": "bool_mask_feature",
      "feature_name": "recent_actions",
      "expression": ["user:action_seq", "user:recent_mask"],
      "value_type": "string",
      "sequence_delim": "|"
    }
  ]
}

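How it works:

  • The first field of expression is the sequence to filter
  • The second field of expression is the mask (boolean or 0/1)
  • Elements whose mask is true/1 are kept; elements whose mask is false/0 are dropped

Example:

click_item_seq click_valid_mask Output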
"1001,1002,1003,1004,1005" "1,0,1,1,0" ["1001", "1003", "1004"]

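The filtering rule can be sketched as a zip over the sequence and its mask. The helper name is hypothetical, and raising an error on a length mismatch is an assumption (tf.boolean_mask likewise requires matching shapes).

```python
from typing import List, Sequence, Union

def bool_mask(values: Union[str, Sequence], mask: Union[str, Sequence],
              sequence_delim: str = ",") -> List:
    """Keep elements whose mask entry is true/1; drop those whose entry is false/0."""
    items = values.split(sequence_delim) if isinstance(values, str) else list(values)
    flags = mask.split(sequence_delim) if isinstance(mask, str) else list(mask)
    if len(items) != len(flags):
        raise ValueError("sequence and mask must have the same length")
    return [item for item, flag in zip(items, flags)
            if str(flag).strip().lower() in ("1", "true")]

print(bool_mask("1001,1002,1003,1004,1005", "1,0,1,1,0"))  # ['1001', '1003', '1004']
```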
8.6 Slice Features

Slice features slice an array using Python-style slice syntax.

Configuration example:

{
  "features": [
    {
      "feature_type": "slice_feature",
      "feature_name": "recent_10_clicks",
      "expression": "user:click_seq",
      "slice": "0:10:1",
      "value_type": "string"
    },
    {
      "feature_type": "slice_feature",
      "feature_name": "last_5_items",
      "expression": "user:click_seq",
      "slice": "-5:",
      "value_type": "string"
    },
    {
      "feature_type": "slice_feature",
      "feature_name": "even_position_items",
      "expression": "user:click_seq",
      "slice": "::2",
      "value_type": "string"
    },
    {
      "feature_type": "slice_feature",
      "feature_name": "reversed_seq",
      "expression": "user:click_seq",
      "slice": "::-1",
      "value_type": "string"
    },
    {
      "feature_type": "slice_feature",
      "feature_name": "first_item",
      "expression": "user:click_seq",
      "slice": "0",
      "value_type": "string"
    }
  ]
}

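Slice syntax:

Slice Description Example (input [1,2,3,4,5])
0:10:1 From index 0 to 10 (exclusive), step 1 [1,2,3,4,5]
-5: Last 5 elements [1,2,3,4,5]
::2 Whole sequence, step 2 (even indices) [1,3,5]
::-1 Reversed sequence [5,4,3,2,1]
0 Single index (first element) 1
1:3 Index 1 to 3 (exclusive) [2,3]
:2 From the start to index 2 (exclusive) [1,2]
2: From index 2 to the end [3,4,5]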
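Since the syntax is Python-style, a slice string can be turned into a native `slice` object, which reproduces every row of the table. The helper names are hypothetical; this only illustrates the syntax, not FG's parser.

```python
from typing import Sequence, Union

def parse_slice(spec: str) -> Union[int, slice]:
    """Turn '0', '1:3', '-5:', '::-1', ... into an int index or a slice object."""
    spec = spec.strip()
    if ":" not in spec:
        return int(spec)  # a bare number selects a single element
    parts = spec.split(":")
    if len(parts) > 3:
        raise ValueError(f"invalid slice: {spec}")
    # Empty parts become None, matching Python's own slice defaults.
    return slice(*(int(p) if p else None for p in parts))

def slice_feature(values: Sequence, spec: str):
    return values[parse_slice(spec)]

data = [1, 2, 3, 4, 5]
for spec in ("0:10:1", "-5:", "::2", "::-1", "0", "1:3", ":2", "2:"):
    print(spec, slice_feature(data, spec))
```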

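Appendix: Common Configuration Fields

The following fields are shared by most feature operators:

Field Type Description
feature_type string Feature operator type (required)
feature_name string Output feature name (required)
expression string/array Source of the input field(s)
value_type string Output value type: string/int32/int64/float/double
default_value string Value used when the input is empty
value_dimension int Output dimension; 0 means no truncation
separator string Multi-value separator; default \u001D
stub_type bool true means the feature is only an intermediate result and is not output to the model
normalizer string Normalization configuration
num_buckets int Number of bins (for discretization)
hash_bucket_size int Hash bucket size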

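Summary

FG provides a rich set of feature operators, covering scenarios from basic feature processing to complex feature engineering:

  1. Basic operators: id_feature and raw_feature handle discrete and continuous features
  2. Computation operators: expr_feature evaluates complex expressions
  3. Cross operators: combo_feature builds Cartesian-product feature crosses
  4. Lookup operators: lookup_feature and match_feature query dictionaries
  5. Text operators: overlap_feature and bm25_feature measure text relevance
  6. Sequence operators: sequence_feature and sequence_combine_feature handle behavior sequences
  7. Preprocessing operators: tokenize_feature and text_normalizer preprocess text
  8. Transformation operators: str_replace_feature and regex_replace_feature replace strings
  9. Utility operators: bool_mask_feature and slice_feature provide array operations

Choosing the operators that fit your business scenario lets you complete feature engineering tasks efficiently.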
