Analyzing Human Resources with a Resume Entity Recognition Model - Alibaba Cloud Developer Community

2022-08-17

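Summary: Our HR colleague says campus recruiting season is here and resumes are pouring in. After skimming a few, I have to say that resumes these days are fiercely competitive. That made me want to analyze this year's campus-recruitment submissions as a whole. I happened to find that ModelScope offers a resume entity recognition model that recognizes several important entity types in resumes (https://modelscope.cn/#/models/damo/nlp_raner_named-entity-recognition_chinese-base-resume/summary), so I gave it a try. Real submitted resumes obviously cannot be made public, so here I use public resume data shared by helpful community members (https://paperswithcode.com/dataset/resume-n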

-------------------Main text starts here---------------------

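Analysis workflow

Run the resume text through the resume entity recognition model to extract entities; for the calling convention, follow the official code sample.
Store the results in Hive and analyze the data there.
Connect FineBI to display the data.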

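Analysis results

I picked three entity types: major, education level, and job title. (Hmm, I really wanted to pick school, but this model does not distinguish schools from companies.)

There are 1508 records in total: 20 with a recognized major, 108 with an education level, and 695 with a job title. (Hmm, why would anyone leave out their major?)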
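To put those counts in proportion, here is a quick calculation of each entity type's share of the 1508 records. The numbers come straight from the paragraph above; the variable and function names are only illustrative.

```python
# Counts reported in the text; the names below are illustrative only.
total_records = 1508
records_with_entity = {"major": 20, "education": 108, "title": 695}


def coverage(counts, total):
    """Return each entity type's share of the total as a percentage (one decimal)."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = coverage(records_with_entity, total_records)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

So only about 1.3% of the records carry a recognized major, against roughly 46% with a job title.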

ODS(hive)=>DWS(hive)=>APP(mysql)

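Without further ado, here are the charts:

Education levels are mostly junior college (大专) and above, with bachelor's degrees the most common. That is probably because the data consists of resumes from people already in work; today's campus-recruitment resumes would be stack after stack of master's degrees.

The job titles all look very senior, probably because the data comes from publicly available resumes, and nobodies like me would not be publishing ours.

Majors cluster in economics and management. Looking at all those managers and directors among the titles, I have to ask: do those of us who studied programming still stand a chance?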

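Finally, my overall impressions:

Recognition accuracy is quite high. Majors, education levels, and job titles are recognized well, with almost no misrecognitions; it just runs a bit slowly (my little PC was trembling).
It is only an extraction model and cannot normalize synonyms. For example, it returns 大学本科, 本科, and 本科学历 (all meaning a bachelor's degree) as separate values, which is not quite enough for BI.
There are few entity types, and the ORG type is coarse: it cannot distinguish schools from companies. That seems to be how the original training data is labeled?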
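The synonym problem in the second point can be worked around with a small post-processing step before the data reaches the BI layer. The sketch below is one possible approach and is not part of the model: the mapping table and the `normalize_education` helper are hypothetical, built only from the three variants mentioned above.

```python
# Hypothetical mapping from extracted surface forms to one canonical label.
# Only the three variants mentioned in the text are covered; extend as needed.
EDUCATION_ALIASES = {
    "大学本科": "本科",
    "本科学历": "本科",
    "本科": "本科",
}


def normalize_education(span: str) -> str:
    """Map an extracted EDU span to a canonical label, or return it unchanged."""
    cleaned = span.strip()
    return EDUCATION_ALIASES.get(cleaned, cleaned)


spans = ["大学本科", "本科", " 本科学历 ", "大专"]
normalized = [normalize_education(s) for s in spans]
print(normalized)
```

A lookup table like this only handles variants that have been seen before; unknown spans pass through untouched, so they still show up in the BI layer and can be added to the table later.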

Appendix

Calling the model

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
import json

# Load the resume NER pipeline from ModelScope
ner_pipeline = pipeline(Tasks.named_entity_recognition, 'damo/nlp_raner_named-entity-recognition_chinese-base-resume')

# Run the model on each input line and write one JSON result per line
with open("./test.txt", "r", encoding="utf-8") as f, open("./result.txt", "w", encoding="utf-8") as result_file:
    for line in f:
        result = ner_pipeline(line)
        result_file.write(json.dumps(result) + "\n")

NER results

result.txt
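For readers who want to inspect result.txt before building the ODS file, the sketch below groups the recognized spans by entity type. It assumes each line is a JSON object with an "output" list whose items carry "type" and "span" keys, which is what the ODS script in this appendix reads. The sample line is made up for illustration and is not real model output.

```python
import json
from collections import defaultdict


def group_spans_by_type(result_line: str) -> dict:
    """Group entity spans by their type for one JSON line of NER output."""
    grouped = defaultdict(list)
    for entity in json.loads(result_line).get("output", []):
        grouped[entity["type"]].append(entity["span"])
    return dict(grouped)


# Made-up sample line in the assumed format (not real model output).
sample_line = json.dumps({"output": [
    {"type": "NAME", "span": "张三"},
    {"type": "EDU", "span": "本科"},
    {"type": "TITLE", "span": "董事"},
    {"type": "TITLE", "span": "总经理"},
]})

grouped = group_spans_by_type(sample_line)
print(grouped)
```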

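Generate the ODS file and load it into Hive

import json

# Each recognized entity becomes its own row; columns that do not apply are filled with "-1".
# Entities of other types (for example ORG) therefore produce a row of all "-1".
with open("./result.txt", "r", encoding="utf-8") as f, open("ods.csv", "w", encoding="utf-8") as ods_f:
    for line in f:
        # result.txt was written with json.dumps, so json.loads is safer than eval here
        output = json.loads(line).get("output") or []
        print(output)
        for type_list in output:
            dict_one = {type_list.get("type"): type_list.get("span")}
            name = dict_one.get("NAME", "-1")
            occupation = dict_one.get("PRO", "-1")
            education = dict_one.get("EDU", "-1")
            title = dict_one.get("TITLE", "-1")
            # Tab-separated columns: name, occupation, education, title
            s1 = name + "\t" + occupation + "\t" + education + "\t" + title + "\n"
            ods_f.write(s1)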

ods.csv
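Before loading ods.csv into Hive, it can help to check how many rows actually carry each field. The sketch below assumes the file is tab-separated with four columns in the order name, occupation, education, title, and that "-1" marks a missing value, as in the script above. The `count_filled` helper and the sample rows are invented for illustration.

```python
import csv
import io

COLUMNS = ["name", "occupation", "education", "title"]
MISSING = "-1"


def count_filled(tsv_file) -> dict:
    """Count, per column, the rows whose value is not the '-1' placeholder."""
    counts = {col: 0 for col in COLUMNS}
    for row in csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        for col, value in zip(COLUMNS, row):
            if value != MISSING:
                counts[col] += 1
    return counts


# Invented sample rows standing in for the real ods.csv.
sample = io.StringIO(
    "张三\t-1\t-1\t-1\n"
    "-1\t-1\t本科\t-1\n"
    "-1\t-1\t-1\t董事\n"
    "-1\t-1\t-1\t总经理\n"
)
filled = count_filled(sample)
print(filled)
```

With the real file, pass `open("ods.csv", encoding="utf-8", newline="")` instead of the in-memory sample.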

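-- Create the database and table

-- MySQL syntax, so presumably run on the MySQL side (the target of the sqoop export)
CREATE DATABASE jianli DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Hive: the table needs a partition column and a tab delimiter for the load below to work.
-- Column order follows ods.csv (name, occupation, education, title).
CREATE TABLE jianli_ods (
  name VARCHAR(30),
  occupation VARCHAR(30),
  education VARCHAR(30),
  title VARCHAR(30)
)
PARTITIONED BY (create_day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/root/ods.csv' INTO TABLE jianli_ods PARTITION (create_day='2022-08-16');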

Generate the DWS layer (run in Hive)

-- Create the database and table

USE jianli;

CREATE TABLE jianli_app (
  group_type VARCHAR(30),
  occupation_name VARCHAR(30),
  occupation_count INT,
  education_name VARCHAR(30),
  education_count INT,
  title_name VARCHAR(30),
  title_count INT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

-- group_type '1': counts per major (occupation)
INSERT INTO jianli.jianli_app (group_type, occupation_name, occupation_count, education_name, education_count, title_name, title_count)
SELECT '1' AS group_type, occupation AS occupation_name, count(name) AS occupation_count, '-1' AS education_name, 0 AS education_count, '-1' AS title_name, 0 AS title_count
FROM jianli.jianli_ods
GROUP BY occupation;

-- group_type '2': counts per education level
INSERT INTO jianli.jianli_app (group_type, occupation_name, occupation_count, education_name, education_count, title_name, title_count)
SELECT '2' AS group_type, '-1' AS occupation_name, 0 AS occupation_count, education AS education_name, count(name) AS education_count, '-1' AS title_name, 0 AS title_count
FROM jianli.jianli_ods
GROUP BY education;

-- group_type '3': counts per job title
INSERT INTO jianli.jianli_app (group_type, occupation_name, occupation_count, education_name, education_count, title_name, title_count)
SELECT '3' AS group_type, '-1' AS occupation_name, 0 AS occupation_count, '-1' AS education_name, 0 AS education_count, title AS title_name, count(name) AS title_count
FROM jianli.jianli_ods
GROUP BY title;

sqoop export \
  --connect jdbc:mysql://xx.xx.xx.xx:3306/jianli \
  --username root --password xxxx \
  --table jianli_app \
  --hcatalog-database jianli \
  --hcatalog-table jianli_app \
  -m 1
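The three INSERT statements above all follow the same pattern: group the ODS rows by one column and pad the other columns with placeholders. A rough Python equivalent, handy for spot-checking the Hive output on a small sample, might look like the sketch below. The `build_app_rows` function and the sample rows are hypothetical; ODS rows are assumed to be (name, occupation, education, title) tuples.

```python
from collections import Counter


def build_app_rows(ods_rows):
    """Mimic the three GROUP BY inserts: one output row per distinct value per column."""
    app_rows = []
    # (group_type, column index in the ODS row): 1=occupation, 2=education, 3=title
    for group_type, col in (("1", 1), ("2", 2), ("3", 3)):
        value_counts = Counter(row[col] for row in ods_rows)
        for value, count in sorted(value_counts.items()):
            # Layout: group_type, occupation_name, occupation_count,
            #         education_name, education_count, title_name, title_count
            row = [group_type, "-1", 0, "-1", 0, "-1", 0]
            row[2 * col - 1] = value  # the matching *_name slot
            row[2 * col] = count      # the matching *_count slot
            app_rows.append(tuple(row))
    return app_rows


# Invented sample rows, one per recognized entity.
ods_rows = [
    ("张三", "-1", "-1", "-1"),
    ("-1", "-1", "本科", "-1"),
    ("-1", "-1", "本科", "-1"),
    ("-1", "-1", "-1", "董事"),
]
app_rows = build_app_rows(ods_rows)
for row in app_rows:
    print(row)
```

As in the SQL, the "-1" placeholder forms a group of its own in every pass, so those rows probably need to be filtered out on the BI side.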

Connect MySQL to FineBI
